GPT-4 was launched in March 2023 to widespread praise for its substantial improvement over GPT-3.5, OpenAI’s previous best model, which had powered ChatGPT’s initial release. Nevertheless, there are valid reasons to be sceptical about GPT-4’s ability to reason. This paper critiques how the NLP community currently formulates reasoning problems and how it evaluates LLM reasoning performance. It also introduces 21 reasoning problems and comprehensively analyses GPT-4’s performance on them. The paper concludes that, despite flashes of analytical brilliance, GPT-4 is currently incapable of reasoning.







