A counterexample for Q-learning, discussed in "Non-delusional Q-learning and value-iteration" (Lu et al., 2018).
Lu, Tyler, Dale Schuurmans, and Craig Boutilier. "Non-delusional Q-learning and value-iteration." Advances in Neural Information Processing Systems 31 (NeurIPS 2018).