AI fails radiology qualifying exam, human touch still superior


Human radiologists still interpret scans such as MRIs more accurately than their artificial intelligence counterparts. (Photo: Bloomberg)

Artificial intelligence (AI) is currently unable to pass one of the qualifying radiology examinations, suggesting that this promising technology is not yet ready to replace doctors, finds a study in the Christmas issue of The BMJ.

AI is increasingly being used for some tasks that doctors do, such as interpreting radiographs (x-rays and scans) to help diagnose a range of conditions.

But can AI pass the Fellowship of the Royal College of Radiologists (FRCR) examination, which trainees in the United Kingdom must pass to qualify as radiology consultants?

To find out, researchers compared the performance of a commercially available AI tool with that of 26 radiologists (aged 31 to 40 years; 62% female), all of whom had passed the FRCR exam the previous year.

They developed 10 "mock" exams based on the rapid reporting module, one of the three modules that make up the qualifying FRCR examination, which is designed to test candidates' speed and accuracy.

Each mock exam consisted of 30 radiographs at the same or a higher level of difficulty and breadth of knowledge than expected in the real FRCR exam. To pass, candidates had to correctly interpret at least 27 (90%) of the 30 images within 35 minutes.

The AI candidate had been trained to assess chest and bone (musculoskeletal) radiographs for several conditions including fractures, swollen and dislocated joints, and collapsed lungs.

Allowances were made for images of body parts that the AI candidate had not been trained on; these images were deemed "uninterpretable."

When uninterpretable images were excluded from the analysis, the AI candidate achieved an average overall accuracy of 79.5% and passed two of 10 mock FRCR exams, while the radiologists achieved an average accuracy of 84.8% and, on average, passed four of 10 mock examinations.

The sensitivity (ability to correctly identify patients with a condition) of the AI candidate was 83.6% and its specificity (ability to correctly identify patients without a condition) was 75.2%, compared with 84.1% and 87.3%, respectively, across all radiologists.

Of the 148 (out of 300) radiographs that more than 90% of radiologists interpreted correctly, the AI candidate was correct on 134 (91%) and incorrect on the remaining 14 (9%).

Of the 20 (out of 300) radiographs that more than half of the radiologists interpreted incorrectly, the AI candidate was incorrect on 10 (50%) and correct on the remaining 10.

Interestingly, the radiologists slightly overestimated the likely performance of the AI candidate, assuming that it would perform almost as well as themselves on average and outperform them in at least three of the 10 mock exams.

However, this was not the case.

The researchers say: “On this occasion, the AI candidate was unable to pass any of the 10 mock examinations when marked against similarly strict criteria to its human counterparts, but it could pass two of the mock examinations if special dispensation was made by the RCR to exclude images that it had not been trained on.”

These are observational findings, and the researchers acknowledge that they evaluated only one AI tool and used mock exams that were not timed or supervised, so the radiologists may not have felt as much pressure to do their best as they would in a real exam.

Nevertheless, this study is one of the more comprehensive cross-comparisons between radiologists and AI, providing a broad range of scores and results for analysis.

Further training and revision are strongly recommended, they add, particularly for cases the AI considers “non-interpretable,” such as abdominal radiographs and those of the axial skeleton.

AI may facilitate workflows, but human input is still crucial, argue researchers in a linked editorial.

They acknowledge that using AI “has untapped potential to further facilitate efficiency and diagnostic accuracy to meet an array of healthcare demands” but say doing so appropriately “implies educating physicians and the public better about the limitations of AI and making these more transparent.”

Research on this subject is buzzing, they add, and this study highlights that one foundational aspect of radiology practice – passing the FRCR examination necessary for the licence to practise – still benefits from the human touch.
