Developing Deep Learning Models for Pathology Analysis
Posted 9th March 2020 by Liv Sewell
Ahead of the 6th Digital Pathology & AI Congress: USA, Dr Saeed Hassanpour introduces us to the subject of his presentation: the opportunities and challenges in developing deep learning-based tools for histology.
Harnessing advances in artificial intelligence (AI) for pathology
In the last decade, there has been massive progress in the artificial intelligence (AI) field, particularly in the domain of deep learning. This progress presents new opportunities for various domains dealing with images, particularly medical imaging. At the Hassanpour lab, we are harnessing advances in AI to enable pathologists to analyze and understand their data. These advances are especially well suited to histology images. We are developing deep learning models that can be used for histological characterization of microscopy images, which is critical for the diagnosis, prognosis, and treatment of many patients.
There is a huge volume of histology slides produced every day and there are simply not enough pathologists, especially in developing countries or rural settings, to read the slides and draw meaningful conclusions for clinical practice. Even if there were enough pathologists, reading and manually characterizing these slides is a difficult and subjective task, which makes it a bottleneck. The process would benefit from deep learning models that can accurately and reliably characterize and classify digital pathology images, and that is where our work comes in. Of course, building these models requires the domain expertise of pathologists, replicated validation, and technical innovations to benefit clinicians.
From computational models to precision medicine: developing tools for pathologists and clinicians
We have been building an attention-based deep neural network model which looks very promising. The model is of particular use in overcoming the shortage of pathologists because building it does not require any new detailed annotations from pathologists. Instead, it relies on annotations at the whole-slide level that can be retrieved from pathology reports and electronic medical records.
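As a rough illustration of how attention can work with slide-level labels only, the sketch below pools patch feature vectors into a single slide vector using softmax attention weights. It is a generic attention-pooling example, not the lab's actual model: the function names, the scoring vector `w`, and the toy feature values are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(patch_features, w):
    """Combine per-patch feature vectors into one slide-level vector.

    patch_features: list of equal-length feature vectors, one per patch.
    w: hypothetical scoring vector (in a real model this would be learned).
    """
    # One relevance score per patch: dot product with the scoring vector.
    scores = [sum(f_d * w_d for f_d, w_d in zip(f, w)) for f in patch_features]
    weights = softmax(scores)
    dim = len(patch_features[0])
    # Weighted average of patch features; only a slide-level label is
    # needed to train on top of this vector.
    slide_vector = [
        sum(a * f[d] for a, f in zip(weights, patch_features))
        for d in range(dim)
    ]
    return slide_vector, weights


# Toy input: three patches with two features each (made-up numbers).
patches = [[0.1, 0.0], [0.9, 0.2], [0.2, 0.1]]
w = [4.0, 0.0]
slide_vector, weights = attention_pool(patches, w)
```

The attention weights also indicate which patches dominated the slide-level representation, which is one reason this family of models lends itself to visualization.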
We have also developed and published generative image translation models for histology data augmentation. These generative adversarial models generate synthetic near-real images for rare classes, based on images from more common classes, to improve deep learning models, and ultimately overcome the class imbalance bottlenecks in datasets.
We have been training models to characterize whole slides and we are now running prospective clinical trials to test the integration of an AI model with an easy-to-use neural network visualization and graphical interface. We want to show that the use of this system will increase the efficiency and accuracy of pathologists in reading the slides in the lab. A key issue in harnessing AI technology to improve the lab process is avoiding the ‘black box’ effect, which can accompany deep learning-based tools, by making the model’s decision-making process transparent. We are using neural network visualization methods to identify the decisive features and regions that contributed to the model’s decision, and developing an easy-to-use graphical interface that makes this information accessible. Pathologists could use the tool to support clinical decision making – for example, to pre-screen or as a second opinion.
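One simple way to see which regions drive a model's output is occlusion sensitivity: blank out one patch at a time and record how much the score drops. The sketch below shows the idea only. The text does not say which visualization methods the lab uses, and `toy_model`, the patch values, and the blank patch are hypothetical stand-ins rather than a real network.

```python
def occlusion_importance(model, patches, blank):
    """Score each patch by how much the model output drops when it is hidden.

    model: any callable mapping a list of patches to a single score.
    patches: list of patches (here, short lists of numbers).
    blank: replacement patch used to hide a region.
    """
    full_score = model(patches)
    drops = []
    for i in range(len(patches)):
        occluded = patches[:i] + [blank] + patches[i + 1:]
        drops.append(full_score - model(occluded))
    return drops


def toy_model(patches):
    """Hypothetical stand-in for a trained network: mean of per-patch maxima."""
    return sum(max(p) for p in patches) / len(patches)


# Toy input: three patches with made-up intensity values.
patches = [[0.1, 0.2], [0.9, 0.8], [0.3, 0.1]]
importance = occlusion_importance(toy_model, patches, blank=[0.0, 0.0])
most_decisive = importance.index(max(importance))
```

A graphical interface could then overlay these per-patch values on the slide as a heat map, so that the patches with the largest drops stand out.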
We are also developing a precision medicine model that will combine information extracted from histology with clinical history, family history, genetic information, and other biomarkers for comprehensive risk assessment. The model will be able to predict prognosis and also identify the best course of treatment for patients, especially in the domain of cancer.
Bottlenecks, data, and communication: The challenges in developing deep learning models for pathology image analysis
The majority of our work is focused on supervised learning, and that means that we need reliable annotations to be able to train these models. For these annotations to be reliable, they need to be generated by domain experts such as clinicians and pathologists, which is a time-consuming task and can be a challenging bottleneck in developing the models.
Accessing high-quality, balanced datasets to train and validate our models can be tough. For example, for subtype classification of different types of tumors, we need high-quality, balanced datasets, but the distribution of the data is driven by different factors, such as the prevalence of subtypes. Sometimes these subtypes are rare, and examples are difficult to gain access to. In addition, it would be great if we could train our model on our training set and then validate it on multi-institutional datasets. Access to such large and diverse datasets is a challenge in the healthcare domain. We are currently working on that by getting access to large-scale, multi-institutional datasets from different organizations.
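To make the idea of multi-institutional validation concrete, the sketch below reports accuracy separately for each contributing site, so a model that only performs well on its home institution is easy to spot. The site names, labels, and predictions are invented toy values, and `accuracy_by_site` is a hypothetical helper, not part of any published pipeline.

```python
from collections import defaultdict


def accuracy_by_site(records):
    """Compute classification accuracy separately for each institution.

    records: iterable of (site, true_label, predicted_label) tuples.
    Returns a dict mapping site name to accuracy in [0, 1].
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for site, true_label, predicted_label in records:
        total[site] += 1
        correct[site] += int(true_label == predicted_label)
    return {site: correct[site] / total[site] for site in total}


# Made-up toy records: (institution, true subtype, predicted subtype).
records = [
    ("site_a", "subtype_1", "subtype_1"),
    ("site_a", "subtype_2", "subtype_2"),
    ("site_b", "subtype_1", "subtype_2"),
    ("site_b", "subtype_2", "subtype_2"),
]
per_site = accuracy_by_site(records)
```

A large gap between sites would suggest that the model has picked up institution-specific characteristics rather than features that generalize.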
For these models to actually assist clinical practice, we need to establish their reproducibility and generalizability, and clinical trials are the next step for us. We hope that the clinical trials and published results will make progress in one of the biggest challenges in this domain: building trust. We are working to build and communicate explainable models that have transparent processes and understandable limitations to enable clinicians and patients to trust AI tools in diagnosis.
Other challenges are technical: histology images are usually high-resolution, and their sheer size compared to general images makes them hard to process. But recent progress in both hardware and software has reduced this bottleneck. For example, upgrading to NVIDIA Quadro RTX 8000 GPU cards has helped us because of their improved memory capacity and efficiency.
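A common way to cope with very large images is to split them into fixed-size tiles that fit in GPU memory and process the tiles individually. The sketch below only computes tile coordinates. The tile size, stride, and image dimensions are illustrative assumptions, and the text does not state whether the lab tiles its slides this way.

```python
def tile_coordinates(width, height, tile, stride):
    """Return the top-left (x, y) corner of every full tile in an image.

    Tiles that would extend past the right or bottom edge are skipped,
    so a leftover strip narrower than one tile is not covered.
    """
    if width < tile or height < tile:
        return []
    xs = range(0, width - tile + 1, stride)
    ys = range(0, height - tile + 1, stride)
    return [(x, y) for y in ys for x in xs]


# Illustrative numbers only: a 1000 x 1000 pixel region, 256-pixel tiles.
coords = tile_coordinates(width=1000, height=1000, tile=256, stride=256)
```

Using a stride smaller than the tile size would produce overlapping tiles, which trades extra computation for better coverage near tile borders.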
The future: innovation and transforming patient care
One of the important challenges in this domain is establishing the reproducibility and generalizability of these models. As I look to the future, I would love to see more consortiums and initiatives working together to build publicly available, high-quality, multi-institutional datasets that can be used for developing and validating image analysis models. This would be hugely useful for research in this domain: it would fuel innovation, enable the development of new algorithms and models, and ultimately improve patient care.
Saeed Hassanpour is Associate Professor in the Department of Biomedical Data Science, Computer Science, and Epidemiology at Dartmouth College in New Hampshire.