Apart from playing board games with fixed, limited rules and variations, reinforcement learning (RL) has found almost no production-grade applications in the real world. Why is that?
Because it is prohibitively compute-intensive. In real-world applications, the number of samples it needs to learn from grows exponentially with the size of the problem and quickly exceeds anything that can be collected in practice.
That is why RL came and went without anyone even noticing.
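To make the exponential growth concrete, here is a rough back-of-the-envelope sketch. It assumes a tabular learner that must try every state-action pair at least once, with a state made of independent discrete variables. The function name and the numbers (10 values per variable, 4 actions) are hypothetical and chosen only for illustration.

```python
def tabular_size(values_per_variable: int, num_variables: int, num_actions: int) -> int:
    """Count the (state, action) pairs in a tabular problem.

    Assumption: the state is a tuple of `num_variables` discrete variables,
    each taking `values_per_variable` values, so the number of states is
    values_per_variable ** num_variables.
    """
    return (values_per_variable ** num_variables) * num_actions


if __name__ == "__main__":
    # Illustrative numbers only: 10 values per variable, 4 actions.
    for num_variables in (2, 4, 8, 16):
        pairs = tabular_size(10, num_variables, 4)
        print(f"{num_variables:>2} state variables -> {pairs:,} state-action pairs")
```

Each added state variable multiplies the table by another factor of 10, so even a lower bound of one sample per pair becomes impractical after a handful of variables.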
We have implemented RL deterministically, so that it requires only a modest, polynomial number of samples. That makes it practical to deploy RL in real-world industrial applications.