ChatGPT Fails to Recognize Retracted or Discredited Research in Literature Reviews, Study Finds
A new study has found that ChatGPT routinely fails to recognize when academic articles have been retracted or contain serious errors, raising concerns about its reliability in academic research. The research, led by Professor Mike Thelwall and Dr. Irini Katsirea and published in the journal Learned Publishing, is part of a broader project on unreliable science and media misrepresentation that began in October 2024.

The team analyzed 217 high-profile academic studies that had been retracted or were otherwise considered concerning, all of which had high Altmetric scores indicating significant public or academic attention. They asked ChatGPT to evaluate the quality of each article 30 times, producing 6,510 assessments in total. Across all of these evaluations, ChatGPT never mentioned that any of the articles had been retracted or were known to contain errors. Instead, it rated 190 of them as high quality, describing them as world-leading or internationally excellent. The few negative critiques it offered cited academic weaknesses such as poor methodology or weak conclusions, not the fact that the studies had been retracted or discredited. In just five cases, the model noted that the article's topic was controversial, but it still did not flag the retraction or the scientific flaws.

In a follow-up test, the researchers extracted 61 claims from the retracted papers and asked ChatGPT whether each was true, repeating the query 10 times per claim. The model affirmed the claims in two-thirds of cases, including at least one statement that had been proven false more than ten years earlier.

The findings highlight a serious risk in using large language models like ChatGPT for literature reviews or academic analysis. The researchers stress that users must verify information from these systems, even when the responses sound confident and well-informed.
Professor Thelwall expressed concern over the results, calling ChatGPT's failure to recognize retracted research alarming. He hopes the study will prompt developers to improve the reliability of AI systems and remind users to approach AI-generated content with skepticism.