According to a recent study published in the BMJ, artificial intelligence (AI) is not yet ready to replace doctors in carrying out complicated tests or procedures.
Researchers in the UK report that an AI program failed a key radiology exam that serves as a qualifying standard for medical trainees.
Doctors increasingly use AI for specific tasks, such as reading X-rays and scans to assist in diagnosing a variety of illnesses.
The fact that the software in this study failed one of the required radiological exams suggests that the technology is not yet prepared to take the place of human doctors in more serious cases.
A commercially available AI tool was tested against 26 radiologists, the majority of whom were between the ages of 31 and 40; 62% of the human participants were women.
Each human candidate had passed the Fellowship of the Royal College of Radiologists (FRCR) exam within the previous year. UK trainees must pass the exam in order to become radiology consultants.
Based on one of the three modules that make up the qualifying FRCR paper, the study authors created ten “mock” rapid reporting exams to gauge candidates’ accuracy and speed.
Each mock test contained 30 radiographs matching, or exceeding, the level of difficulty and breadth of knowledge of the actual FRCR exam.
Candidates had 35 minutes to correctly interpret at least 27 (90%) of the 30 images in order to pass. The researchers trained the AI candidate to evaluate radiographs of the chest and bones (musculoskeletal) for a variety of problems, such as fractures, swelling, dislocated joints, and collapsed lungs.
After images the AI could not read were excluded from the study, it passed two of the 10 mock FRCR exams with an average overall accuracy of 79.5%. The average radiologist, by comparison, passed four of the 10 mock exams with an average accuracy of 84.8%.
The AI candidate’s sensitivity, or its ability to accurately identify patients with a given disease, was 83.6%, compared to 84.1% for the radiologists put to the test.
Its specificity, or its ability to correctly identify people free of a particular condition, was 75.2%, compared with 87.34% for the radiologists.
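To make these two metrics concrete, here is a minimal sketch of how sensitivity and specificity are computed from a confusion matrix. The counts below are invented for illustration only and are not the study's data:

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of genuinely abnormal radiographs flagged as abnormal."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg: int, false_pos: int) -> float:
    """Share of genuinely normal radiographs cleared as normal."""
    return true_neg / (true_neg + false_pos)

# Hypothetical counts for a single 30-image mock exam
tp, fn = 14, 2   # abnormal images: correctly flagged vs missed
tn, fp = 11, 3   # normal images: correctly cleared vs falsely flagged

print(f"sensitivity = {sensitivity(tp, fn):.1%}")  # 14/16 -> 87.5%
print(f"specificity = {specificity(tn, fp):.1%}")  # 11/14 -> 78.6%
```

Note that the two can diverge sharply: a reader who flags every image as abnormal scores 100% sensitivity but 0% specificity, which is why the study reports both.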
The AI candidate was correct on 134 (91%) of the questions and incorrect on the remaining 14 (9%). Of the 20 radiographs (out of 300) that more than half of the radiologists interpreted incorrectly, the AI candidate got 10 (50%) wrong and the other 10 right.
The researchers also found that radiologists overestimated the AI's performance, predicting it would outperform them on at least three of the 10 mock tests and perform almost as well as they did on average. Neither proved to be the case.
“On this occasion, the artificial intelligence candidate was unable to pass any of the 10 mock examinations when marked against similarly strict criteria to its human counterparts, but it could pass two of the mock examinations if special dispensation was made by the RCR to exclude images that it had not been trained on,” the researchers wrote in a media release.
The researchers “strongly suggested” more training and review, especially for instances the AI deemed “non-interpretable,” such as abdominal radiographs and those of the axial skeleton, or the bones of a vertebrate’s head and trunk.
Although the scientists say human input remains essential at this stage of the technology, AI may still ease doctors' workloads. Artificial intelligence, the researchers add, "has untapped potential to further assist efficiency and diagnostic accuracy" in addressing a variety of healthcare demands.
However, the experts say that doing so appropriately "implies educating physicians and the public better about the limitations of artificial intelligence and making these more transparent."
The experts note a limitation: because the mock tests were not invigilated or sat under exam conditions, the radiologists may not have felt the same pressure to perform as they would in a genuine exam.
Nevertheless, the study provides a wide range of scores and outcomes for analysis and stands as one of the more thorough head-to-head comparisons between radiologists and AI.