Digital pathology and bioinformatics: two sides of the same story

The last decade saw an impressive uptake of digital pathology technologies, fuelled by technological and methodological developments in whole slide image acquisition, processing and analysis. Besides the continuous improvements in scanner technology – coupled with decreasing storage prices – the development of machine learning methods led to a new generation of algorithms for image segmentation and interpretation.

However, digital pathology is but one of the many modalities available for investigating a given biological reality (Figure 1). Take, for example, a case-control study for discovering a new biomarker in cancer. More often than not, from a single case, several different samples are obtained and used to produce molecular profiles, pathology and immunohistochemical slides, etc., data which is accompanied by clinical and patient survival information.

From an analytical perspective, finding biomarkers by combining all these data is an extremely challenging problem – not only because of its high dimensionality but also because of the difficulty of bringing the various modalities into a format suitable for statistical analysis.

Figure 1. Data puzzle in life sciences

One approach to the problem would be to follow separate analysis paths for each modality and to integrate the results only at later stages. By analysing the digital slides, one could obtain, for example, a summary describing the distribution of several predefined parameters (e.g. nuclear morphology or the spatial distribution of tumour cells), which can later be correlated with clinical and molecular data.

Such an approach is used in "Quantitative Image Analysis of Cellular Heterogeneity in Breast Tumors Complements Genomic Profiling" [1] to explore the heterogeneity of breast tumours, and it clearly demonstrates the benefits of exploiting the complementarity between pathology imaging and molecular profiling. The advantage of this approach is that the image features (and the discovered correlations) may be easily interpreted, since they can be related to known pathology parameters.
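As a rough illustration of this "analyse separately, integrate late" idea, the sketch below collapses per-nucleus measurements into one summary value per slide and then rank-correlates that value with a molecular measurement from the same case. It is not the method of the cited study: the case names, the nuclear areas, the expression values and the choice of Spearman correlation are all assumptions made for the example.

```python
from math import sqrt
from statistics import mean, pstdev


def summarise_slide(nuclear_areas):
    """Collapse per-nucleus areas from one slide into summary statistics."""
    return {"mean_area": mean(nuclear_areas), "sd_area": pstdev(nuclear_areas)}


def ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy data: nuclear areas per slide and one expression value per case.
slides = {
    "case_A": [40.0, 42.0, 41.0, 39.0],
    "case_B": [55.0, 70.0, 48.0, 90.0],
    "case_C": [45.0, 50.0, 47.0, 60.0],
    "case_D": [60.0, 95.0, 52.0, 110.0],
}
expression = {"case_A": 1.2, "case_B": 3.1, "case_C": 2.0, "case_D": 4.5}

cases = sorted(slides)
sd_area = [summarise_slide(slides[c])["sd_area"] for c in cases]
expr = [expression[c] for c in cases]
rho = spearman(sd_area, expr)
print(f"Spearman correlation between nuclear area spread and expression: {rho:.2f}")
```

In this toy data the cases with more variable nuclei also have higher expression, so the correlation comes out close to 1. With real data, the summary features would come from an image analysis pipeline, and many feature/gene pairs would be tested, which calls for multiple-testing correction.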

An alternative direction of investigation is to use a less (pathologist-)supervised approach, in which the image data is summarised in terms of generic image features, and any correlations with clinical outcome or molecular profiles are mined for afterwards. This avenue has the potential advantage of uncovering previously unknown image features of diagnostic importance, at the risk of reduced interpretability of the results.

We have started to explore this second path of research, focusing on colon and breast cancer data. For example, in "Joint analysis of histopathology image features and gene expression in breast cancer" [2], such a joint analysis of image and molecular data led to the development of a combined image + gene expression risk score for breast cancer. In that case, most of the image features could be recognised as representing various proliferation patterns – a known marker for high-risk breast tumours.

A similar approach was taken in "Image-based surrogate biomarkers for molecular subtypes of colorectal cancer" [3], where the objective was to establish links between molecular subtypes of colon cancer and histopathology images. The fact that an image-based classifier could reproduce, with a high degree of accuracy, the discoveries from transcriptomics data is remarkable and indicates that many connections between these modalities still await discovery. See here for some examples.
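To make the idea of an image-based surrogate for a molecular label more concrete, here is a minimal sketch: a nearest-centroid classifier trained on generic image feature vectors, evaluated by leave-one-out agreement with the molecular subtype. This is not the classifier used in the cited paper; the feature values, the subtype names and the choice of nearest-centroid classification are all assumptions made for illustration.

```python
from math import dist
from statistics import mean


def centroids(features, labels):
    """Compute the mean feature vector of each label."""
    grouped = {}
    for vec, lab in zip(features, labels):
        grouped.setdefault(lab, []).append(vec)
    return {lab: [mean(col) for col in zip(*vecs)] for lab, vecs in grouped.items()}


def predict(vec, cents):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(cents, key=lambda lab: dist(vec, cents[lab]))


def leave_one_out_accuracy(features, labels):
    """Fraction of cases whose label is recovered when that case is held out."""
    hits = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        hits += predict(features[i], centroids(train_x, train_y)) == labels[i]
    return hits / len(features)


# Hypothetical generic image features (two per slide) and molecular subtypes.
image_features = [
    [0.10, 0.90], [0.20, 0.80], [0.15, 0.85],
    [0.90, 0.10], [0.80, 0.20], [0.85, 0.30],
]
molecular_subtype = ["subtype_1"] * 3 + ["subtype_2"] * 3

accuracy = leave_one_out_accuracy(image_features, molecular_subtype)
print(f"Leave-one-out agreement with molecular subtype: {accuracy:.2f}")
```

The held-out evaluation matters here: agreement measured on the same slides used to fit the classifier would overstate how well the image features track the molecular subtype.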

These are only a few results from what seems to be an emerging direction of research that bridges digital pathology and bioinformatics. The importance of the cross-talk between the two fields has begun to be recognised. Clearly, the two research communities need to communicate better and integrate their approaches. This integration should start at the early experiment design stages, to ensure that the data produced is used most efficiently and that the number of questions that can be addressed is maximised. Questions such as tissue sampling strategies, for both histopathology and molecular profiling, must be addressed from the beginning to support meaningful joint analyses of these modalities.

Given the pace at which both fields advance, the future is bright. And challenging.

Vlad Popovici is Assistant Professor at Masaryk University. He will be speaking at the 3rd Digital Pathology Congress: Asia Pacific on Gene Expression to Tissue Architecture and Back.




  1. Yuan, Y. et al. Quantitative Image Analysis of Cellular Heterogeneity in Breast Tumors Complements Genomic Profiling. Science Translational Medicine 4, 157ra143 (2012).
  2. Popovici, V. et al. Joint analysis of histopathology image features and gene expression in breast cancer. BMC Bioinformatics 17, 209 (2016).
  3. Popovici, V., Budinská, E., Dušek, L., Kozubek, M. & Bosman, F. Image-based surrogate biomarkers for molecular subtypes of colorectal cancer. Bioinformatics (2017). doi:10.1093/bioinformatics/btx027
