Did you know that ChatGPT can pass the U.S. Medical Licensing Examination (USMLE)? According to two recent papers, two artificial intelligence (AI) models, including ChatGPT, have passed the exam.
The papers described several strategies for using large language models to take the USMLE, which consists of the Step 1, Step 2 CK, and Step 3 tests.
ChatGPT is an AI chatbot, created by OpenAI, that generates long-form text in response to queries from human users. It gained popularity after multiple social media posts outlined potential applications for the technology in healthcare settings, frequently with unsatisfactory outcomes.
The first study, posted on medRxiv in December, examined ChatGPT's performance on the USMLE without any additional training or reinforcement before the examinations. The findings revealed "new and surprising evidence" that this AI technology was capable of meeting the challenge, according to Victor Tseng, MD, of Ansible Health in Mountain View, California, and colleagues.
ChatGPT performed at better than 50% accuracy across all of the assessments, and exceeded 60% in most analyses, according to Tseng and colleagues. Although the USMLE passing score varies from year to year, the authors noted, the passing threshold is typically around 60%.
They reported that ChatGPT was able to show “a high level of concordance and insight in its explanations,” noting that the tool “performed at or near the passing threshold for all three assessments without any specialised training or reinforcement.”
These findings, they wrote, "indicate that massive language models may have the potential to support clinical decision-making and medical education."
The second paper, also released in December and posted on arXiv, assessed the performance of another large language model, Flan-PaLM, on the USMLE. According to AI researcher Vivek Natarajan and colleagues, the main distinction between the two models was that Flan-PaLM was extensively tuned for the tests using a collection of medical question-and-answer datasets known as MultiMedQA.
When answering USMLE questions, Flan-PaLM had an accuracy rate of 67.6%, which was nearly 17 percentage points better than the previous record set by PubMed GPT.
Large language models "provide a tremendous opportunity to rethink the development of medical AI and make it easier, safer, and more egalitarian to employ," Natarajan and colleagues found.
New research publications exploring the utility of the technology in medicine have begun to include ChatGPT and other AI systems as subjects, and occasionally as co-authors.
Naturally, healthcare professionals have voiced concerns about these developments as well, particularly because ChatGPT has been named as an author on research articles. A recent article in Nature highlighted the unease that the emerging technology has stirred among prospective coworkers and co-authors.
One argument against the use of AI tools in research questioned whether these programmes could actually contribute anything of value to scholarship; another noted that these tools cannot consent to be co-authors in the first place.
According to the Nature report, the editor of one of the papers that listed ChatGPT as an author said it was an error that would be corrected. Nevertheless, experts have since written a number of publications praising these AI programmes as helpful resources for clinical decision-making, medical research, and even education.
Natarajan and colleagues concluded that large language models could become useful tools in medicine. Their hope was that the findings would "spark further conversations and collaborations between patients, consumers, AI researchers, clinicians, social scientists, ethicists, policymakers, and other interested people in order to responsibly translate these early research findings to improve healthcare."