Four challenges in developing AI algorithms for medical imaging
Posted 10th May 2019 by Joshua Broomfield
Unsurprisingly, there is a lot of hype surrounding AI. Readily available deep learning packages make it easy to create models, so we can expect many of them to emerge. Anyone with access to sufficient labelled data can start building models.
With so much emerging on the scene and the inherent technical challenges involved in extracting data from images, what we really want to be focusing on is this question: how robust and reliable is the application itself?
When seeking to answer this question, four main challenges present themselves.
1. The generalizability of deep learning models
Anyone who builds AI models is building them from limited data that is assumed to be representative of the future. Although deep learning models are called ‘artificial intelligence’, they don’t have intelligence per se. They simply learn to produce an output from the raw data on which they were trained. The initial question is “how well do these models work on data from the setting in which they were trained?”
The challenge is then whether a model will perform as well when it is run on new data. For example, if a model is built on data acquired in one part of the world and then taken to an institution in another part of the world, will it produce the correct output?
If the data from which a deep learning model is built are representative of all the other data it is going to see, the model is likely to work very well. But populations vary tremendously around the globe, and even within a given geographic region. The population of patients seen in private practice, for example, differs from that seen in tertiary care centers.
The distribution of diseases also varies: if a disease is rare in the training set but more common where the model is applied, the model may not perform as well.
This problem is referred to as the ‘generalizability of models’, and there have already been some published studies showing that the performance of AI models decreases in new datasets compared to the data in which they were developed and validated.
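As a rough illustration of how such a drop can arise, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the two sites, the single image-derived feature, the scanner offset of 1.5, and the simple threshold "model" are all hypothetical and are not taken from any published study.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def sample_site(n_per_class, scanner_offset):
    """Simulate one image-derived feature per patient at a hypothetical site.

    Healthy patients are centred on 0 and diseased patients on 2; the
    scanner_offset shifts every measurement, mimicking a different scanner.
    """
    healthy = [(random.gauss(0.0, 1.0) + scanner_offset, 0) for _ in range(n_per_class)]
    diseased = [(random.gauss(2.0, 1.0) + scanner_offset, 1) for _ in range(n_per_class)]
    return healthy + diseased

def fit_threshold(data):
    """'Train' by placing the cut-off midway between the two class means."""
    values0 = [x for x, y in data if y == 0]
    values1 = [x for x, y in data if y == 1]
    return (sum(values0) / len(values0) + sum(values1) / len(values1)) / 2

def accuracy(data, threshold):
    """Fraction of patients for which 'feature above threshold' matches the label."""
    return sum((x > threshold) == bool(y) for x, y in data) / len(data)

site_a_train = sample_site(2000, scanner_offset=0.0)
site_a_test = sample_site(2000, scanner_offset=0.0)   # same site, unseen patients
site_b_test = sample_site(2000, scanner_offset=1.5)   # different site and scanner

threshold = fit_threshold(site_a_train)
internal_acc = accuracy(site_a_test, threshold)
external_acc = accuracy(site_b_test, threshold)

print(f"internal accuracy: {internal_acc:.2f}")
print(f"external accuracy: {external_acc:.2f}")
```

The model is unchanged between the two evaluations; only the data distribution differs. Real models and real shifts are far more complex, but the pattern of strong internal results and weaker external results is the one described above.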
I see generalizability as probably the top priority challenge to address because no clinician is going to want to trust a model that isn’t going to work as well as advertised or assessed when it was developed.
2. The acquisition of sufficient data for representative models
A related challenge that developers face is how to acquire sufficient data to build representative models.
It’s expensive to acquire data; companies pay hospitals or collaborators to contribute and generate large data sets. But ideally, you want to get your hands on data from everywhere, all over the world.
Building any of these AI algorithms requires a lot of data because they’re complicated models. They have many parameters, and overfitting is a very common problem that is best countered by training on very large numbers of examples, preferably millions.
How can you do that? That’s a major challenge. Institutions don’t willingly share data for free because it’s expensive to acquire, and there are also regulatory and privacy concerns.
3. Collecting multiple types of data
Another challenge is that even if you get your hands on enough data, it’s generally going to be only one type of data, such as image data. But only a limited number of medical problems are solvable by looking at images alone.
If the task is detecting a disease from an image (classifying a cancer type from a pathology image, or detecting an abnormality in a radiology image), then a single data type is probably sufficient. But many problems need data from more than a single modality. Particularly in problems such as clinical prediction, more context about the patient, provided by medical records and pathology data, is usually needed.
As soon as you start talking about multiple types of data, the challenge of getting that data at large scale grows tremendously. This is much more difficult because you need linked data across data types for each patient. Patient data is fragmented: for example, images are stored in one system and clinical data in another entirely. Linking them all together requires enormous institutional effort. When that is scaled up across multiple institutions, it becomes extraordinarily challenging.
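To make the linkage problem concrete, here is a minimal sketch of joining records across three separate systems. The patient IDs, file names, and field values are invented, and the sketch assumes a shared patient identifier already exists across systems, which in practice is often the hardest part.

```python
# Hypothetical records from three separate systems, keyed by a shared patient ID.
imaging = {"P001": "ct_0001.dcm", "P002": "ct_0002.dcm", "P004": "ct_0004.dcm"}
clinical = {"P001": {"age": 61}, "P002": {"age": 48}, "P003": {"age": 70}}
pathology = {"P001": "adenocarcinoma", "P003": "benign", "P004": "benign"}

def link_records(*sources):
    """Return one merged record per patient who appears in every source."""
    shared_ids = set(sources[0]).intersection(*sources[1:])
    return {pid: tuple(src[pid] for src in sources) for pid in sorted(shared_ids)}

linked = link_records(imaging, clinical, pathology)
all_ids = set(imaging) | set(clinical) | set(pathology)

print(f"{len(linked)} of {len(all_ids)} patients have all three data types")
```

Even in this toy case, each system holds three patients, yet only one patient has a complete record. Every additional data type can only shrink the usable set further.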
4. Lacking the infrastructure to assess algorithms
An increasingly important challenge is that most clinical settings don’t have the computational infrastructure to assess how well an algorithm is going to work in their population. Hospitals are generally not equipped to collect the data needed to assess how well AI algorithms are working. This matters because, if these algorithms are being used in practice, it is important to know that they are effective, since many AI algorithms do not necessarily generalize, as noted above.
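The core of such a local assessment need not be elaborate. The sketch below assumes a hospital has collected the algorithm's binary outputs alongside confirmed diagnoses for a sample of its own patients. It reports sensitivity and specificity with Wilson score intervals. The function names and the sample counts are hypothetical, chosen only for illustration.

```python
import math

def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half

def audit(predictions, truths):
    """Sensitivity and specificity, each with an interval, on local cases."""
    pairs = list(zip(predictions, truths))
    tp = sum(1 for p, t in pairs if p and t)          # true positives
    fn = sum(1 for p, t in pairs if not p and t)      # missed cases
    tn = sum(1 for p, t in pairs if not p and not t)  # true negatives
    fp = sum(1 for p, t in pairs if p and not t)      # false alarms
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "sensitivity": (tp / (tp + fn), wilson_interval(tp, tp + fn)),
        "specificity": (tn / (tn + fp), wilson_interval(tn, tn + fp)),
    }

# Made-up local sample: 10 confirmed positives and 40 confirmed negatives.
truths = [1] * 10 + [0] * 40
predictions = [1] * 8 + [0] * 2 + [0] * 36 + [1] * 4

report = audit(predictions, truths)
for name, (value, (low, high)) in report.items():
    print(f"{name}: {value:.2f} (95% CI {low:.2f} to {high:.2f})")
```

With only ten positive cases, the interval around sensitivity is very wide. A meaningful local assessment therefore depends on collecting enough confirmed cases, which is exactly the data-collection capability most hospitals lack.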
Identifying and overcoming the foregoing challenges will be crucial in the development of AI models which are measurably and reproducibly beneficial in clinical settings.
To use mammography as an example, in the early days when computer-assisted diagnosis (CAD) came on the scene, people were not systematically measuring its impact on clinical outcomes. There was a significant amount of hype about the value of these algorithms and many practices adopted them on the promise of their benefit based on FDA approval.
However, researchers subsequently undertook historical studies looking at the impact of these algorithms on clinical outcomes. These studies found that the algorithms produced many false positives and were not improving clinical outcomes.
Without systematic measurement of accuracy, robustness, and reliability, deep learning models could potentially have no benefit, or even a detrimental impact, on patient care. I believe that tackling the challenges outlined above, through research and the adoption of processes to measure clinical impact, will enable us to avoid the hazards and produce beneficial AI algorithms.
Daniel Rubin is Professor of Biomedical Data Science and Radiology at Stanford University, USA.