In addition, they show a counter-intuitive scaling limit: their reasoning effort increases with problem complexity up to a point, then declines despite an ample token budget. By comparing LRMs with their standard LLM counterparts under equal inference compute, we identify a few performance regimes: (1) low-complexity tasks where