Machine Learning Innovation to aid Clinical Decision Making in Pathology

“A medical image alone doesn’t add value unless it is tied to a clinical decision-making process” – Jochen Lennerz

The Breast Cancer Scanning Initiative (BCSI) scans histological slides of patients with high-risk breast lesions to generate annotated images. We then apply a Machine Learning algorithm to the slides to determine whether surgery is necessary.

High-Risk Lesions: Surgery or Not?

When a woman feels a lump in her breast or mammography screening identifies a lesion, a biopsy is performed to determine whether or not it is cancer. Even when cancer is ruled out, a set of lesions is known to be associated with a high risk of breast cancer or of progression to invasive cancer; these are called ‘high-risk lesions’.

The affected woman and her treating physician need to make a difficult management decision: to undergo surgery or to watch and wait. Being diagnosed with a high-risk lesion is frightening, and understandably most women opt for surgery just to know what it was and to “take it all out”.

In 10-12% of cases, surgery reveals either cancer or something else considered high risk, so the surgery can be classified as “necessary”. From another perspective, however, you could say that almost 90% of surgeries are unnecessary, and they come with all sorts of complications (medical and aesthetic) as well as a financial burden.

The clinical decision is a fundamental management problem and represents an ideal setting in which to apply a Machine Learning algorithm to predict which cases lead to a “necessary” or an “unnecessary” surgery. Regina Barzilay (Massachusetts Institute of Technology) and a team at Massachusetts General Hospital have built a machine learning algorithm that is able to reliably predict upgrading, that is, whether a high-risk lesion turns out to be cancer at surgery.

Specifically, the model could have avoided 30% of surgeries for benign disease while still identifying 97.4% of malignancies (Bahl et al., 2017, Radiology). Interestingly, one of the strongest predictors in this model was the pathology report. Currently, however, the algorithm takes only the pathologic diagnosis into account, not the histologic image itself.
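
To make these percentages more tangible, here is a minimal arithmetic sketch in Python. The cohort size of 1,000 and the 11% “necessary” rate (the midpoint of the 10-12% range above) are assumptions chosen purely for illustration, as is the simplification that every “necessary” surgery is a malignancy; the function name `project_outcomes` is hypothetical. Only the 30% and 97.4% figures come from the study cited above.

```python
def project_outcomes(n_surgeries, necessary_rate,
                     benign_avoided_rate, malignancy_detection_rate):
    """Project surgery counts for a hypothetical cohort.

    Simplifying assumption: every "necessary" surgery is a malignancy
    and every "unnecessary" surgery is a benign case.
    """
    necessary = n_surgeries * necessary_rate
    unnecessary = n_surgeries - necessary
    return {
        "necessary": necessary,
        "unnecessary": unnecessary,
        # Benign cases the model would have spared from surgery.
        "surgeries_avoided": unnecessary * benign_avoided_rate,
        # Malignancies the model still flags for surgery.
        "malignancies_identified": necessary * malignancy_detection_rate,
        # Malignancies the model would have wrongly spared.
        "malignancies_missed": necessary * (1 - malignancy_detection_rate),
    }


# Hypothetical cohort: 1,000 surgeries, 11% "necessary" (illustrative only).
outcome = project_outcomes(1000, 0.11, 0.30, 0.974)
for name, value in outcome.items():
    print(f"{name}: {value:.1f}")
```

Under these assumptions, roughly 267 of 1,000 surgeries would be avoided at the cost of about 3 missed malignancies. The real trade-off depends on the actual case mix, which this sketch does not model.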

The BCSI is seeking to supplement this existing Machine Learning model with images derived from digital slides. The promise is that once digitised, the histologic features can be part of the clinical decision-making process.

At first glance this sounds straightforward: scan thousands of slides from these patients and feed them to the algorithm. In practice, however, the combination of physical storage and generating high-quality pixel data and metadata is far more complicated.

Importantly, a medical image alone doesn’t add value unless it contains relevant metadata and is tied to a clinical decision-making process. This is a challenge because there are thousands of slides that have to be sorted, annotated, and then scanned to develop a ground truth which can then be used to supplement the model.

Challenges and Hurdles in the BCSI

There are really two parts to the project: first, creating the scans and annotations, and second, applying the model. In terms of overall work effort, once the scans are created and annotated, applying the model can be achieved relatively quickly.

When it came to creating the scans, there were two major hurdles that we had to overcome. The first was the generation of pixel data, and the second was the metadata.

  1. Pixel Data

It’s clear from some of the bigger studies that creating well-annotated data sets for machine learning models equates to 80-85% of the work: bringing the data into the correct format, cleaning out what isn’t informative, sorting through the files, applying naming conventions and essentially creating a data model that works for the model in question.

  2. Metadata

The biggest learning point from my perspective is the integral importance of the metadata.

For example, one way of viewing the metadata is to label the images case 1, 2, 3, etc. However, case 1 will have a specific diagnosis and the metadata will have to account for the diagnosis, into what data group(s) it falls, and which pixels on the slide determine this.

The annotation is easy when you point out the lesion to someone looking through the microscope or at your screen; however, annotating a digitized slide in such a way that the data itself clearly defines the relevant pixel information without human supervision is currently the biggest hurdle.

Seeking data standardisation

The lack of a standardized file format is currently a major problem, in particular when it comes to using these models for clinical diagnosis. Individual pathologists may describe the same thing with different wording and different emphases. As a pathologist, I love this creativity, but the dichotomy is that we need standardisation in order to access the data in a more discrete way.

Creating more discrete datasets is not only a lot of work but also illustrates the fundamental change in data annotation in pathology practice. The critical issue is that without the metadata, the pixel data is essentially useless. For example, a ‘case’ is referred to as breast cancer; however, where exactly is the cancer (i.e., in which block, slide, level, and region)?
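
One way to picture what such discrete metadata might look like is a small record type that ties a diagnosis to a block, slide, level, and pixel region. The Python sketch below is an illustration only, not the BCSI data model: the class names, field names, and example values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Region:
    """Hypothetical rectangular region in slide pixel coordinates."""
    x: int
    y: int
    width: int
    height: int
    label: str


@dataclass
class SlideAnnotation:
    """Hypothetical record linking a diagnosis to its location."""
    case_id: str
    block: str
    slide: str
    level: int
    diagnosis: str
    data_groups: List[str] = field(default_factory=list)
    regions: List[Region] = field(default_factory=list)

    def is_localised(self) -> bool:
        # A diagnosis only becomes usable ground truth once at least
        # one pixel region on the slide is attached to it.
        return bool(self.diagnosis) and len(self.regions) > 0


# Example values are made up for illustration.
ann = SlideAnnotation(
    case_id="case-1",
    block="A",
    slide="2",
    level=3,
    diagnosis="atypical ductal hyperplasia",
    data_groups=["high-risk lesion"],
    regions=[Region(x=1200, y=800, width=256, height=256, label="lesion")],
)
print(ann.is_localised())
```

A record labelled only "case 1, breast cancer", with no regions attached, would fail the `is_localised` check, which is the gap between a case-level label and usable pixel-level ground truth described above.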

When it comes to pixel data annotation, the separation of metadata and pixel data seems inadequate. This emphasises the critical need for more integrated solutions. The BCSI team is currently working on implementing the DICOM standard to convert pixel data and metadata into DICOM-compliant files (Herrmann et al., 2018). DICOM allows efficient access to image data as well as the associated metadata; it does, however, require a bit more computational expertise than proprietary solutions.

Finally, it is vital to have these solutions built into existing interoperable infrastructure, in particular when trying to move to clinical practice. Given that we are replacing not only the microscope but the entire mode of histopathology-based diagnostic pathology, it is important to provide scientific input into regulatory decision making. We have recently started an initiative to gain input from a variety of subject matter experts from industry, academia, patient advocacy and regulatory bodies.

We hope that the BCSI will provide a concrete clinical use-case to explore regulatory pathways and serve as a blueprint to move the field forward and improve patient care.


Jochen Lennerz is Associate Chief, Department of Pathology at Massachusetts General Hospital.


Join Jochen at the 6th Digital Pathology & AI Congress: Europe to learn about new innovations in this exciting new realm of pathology.

