Are smaller biobanks still relevant in the era of 500,000 plus cohorts?


Shona Kerr, MRC Human Genetics Unit, writes:

Large-scale national cohorts and biobanks, linked to detailed genomic, phenotypic and clinical data, are being created and developed through investment from many governments worldwide. These data-rich resources are exemplified by the 500,000 research volunteers in the UK Biobank, a cohort established primarily to investigate the genetic and lifestyle determinants of diseases of middle and later life. Now accessible to all health researchers worldwide, the UK Biobank is supporting an unprecedented range of novel insights into the biology of disease, from varicose veins to lifespan.

With genomic medicine increasingly becoming a key part of routine healthcare, it is predicted that more than 60 million people will have their genome sequenced in a healthcare context by 2025 [1]. The question is: do smaller research biobanks still have a role to play in this era of big data?

Most biobanks created before or around the time of completion of the Human Genome Project in 2003 are of modest size. From now on, there may be little appetite from investors, or scientific justification, for the creation of new small biobanks, but it can be argued that if a biobank already exists, it may still be of considerable value.

There are several factors that should be taken into consideration when assessing the relevance of an existing biobank. Firstly, does the resource have the capacity to answer specific research questions, or are more samples and/or data required to achieve sufficient statistical power? Secondly, is the informed consent that was taken from the participants still fit for purpose? Ideally, it should be broad and enduring and include consent for genomics and commercial use. In addition, a mechanism should be in place for linkage to routine electronic health record data, which allows passive, lifelong follow-up of participants' health trajectories and makes it easier to request additional consent.

The samples in the biobank must be accessible through responsible and responsive governance mechanisms, such as the “FAIR” principles (Findable, Accessible, Interoperable and Reusable [2]). Efforts are underway, particularly across Europe, to make biobanks more easily accessible. For example, the central hub of the UKCRC Tissue Directory and Coordination Centre acts as the UK node of the European-wide Biobanking and Biomolecular Resources Research Infrastructure European Research Infrastructure Consortium (BBMRI-ERIC) network. [2]

The stored samples and data must also be high quality. Smaller biobanks could be more agile in providing samples from selected participants than larger initiatives that may permit analysis only of samples from the entire cohort. This policy is designed to avoid piecemeal depletion of finite materials, but smaller biobanks are more likely to be able to support smaller scale, but still well powered, biomarker and “omics” studies.

New statistical approaches, such as two-step Mendelian Randomisation, allow measures made in samples from a small biobank to be investigated for causality in a larger biobank that has genetic data but lacks those measures. Plasma protein levels are one example: the genetic variants associated with them, protein quantitative trait loci or “pQTLs,” can be derived in the small biobank and then tested against outcomes in the larger one.
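
As a rough illustration of the idea, the sketch below computes a Wald ratio, a standard single-variant estimator used in summary-data Mendelian randomisation. It is not necessarily the two-step method referred to above. The function name, parameter names and example numbers are hypothetical: the variant-protein effect stands in for a pQTL estimated in a small biobank, and the variant-outcome effect for an association estimated in a larger one.

```python
import math


def wald_ratio(beta_exposure, se_exposure, beta_outcome, se_outcome):
    """Estimate the causal effect of an exposure on an outcome from one variant.

    beta_exposure, se_exposure: variant effect on the exposure (e.g. a pQTL
        effect on a plasma protein level) and its standard error.
    beta_outcome, se_outcome: the same variant's effect on the outcome and
        its standard error, typically taken from a separate, larger cohort.
    Returns (estimate, standard_error).
    """
    if beta_exposure == 0:
        raise ValueError("The variant must be associated with the exposure.")

    # Ratio of the variant-outcome effect to the variant-exposure effect.
    estimate = beta_outcome / beta_exposure

    # Second-order delta-method standard error, assuming the two effect
    # estimates are independent (i.e. they come from non-overlapping samples).
    variance = (se_outcome ** 2) / (beta_exposure ** 2) + (
        (beta_outcome ** 2) * (se_exposure ** 2) / (beta_exposure ** 4)
    )
    return estimate, math.sqrt(variance)


# Made-up numbers, for illustration only.
estimate, se = wald_ratio(
    beta_exposure=0.5, se_exposure=0.05, beta_outcome=0.1, se_outcome=0.02
)
print(f"causal estimate = {estimate:.3f}, standard error = {se:.3f}")
```

The estimate is only interpretable as causal if the usual instrumental-variable assumptions hold, for example that the variant affects the outcome only through the protein.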

The difficulty of obtaining certain samples, especially rare ones, should also be taken into account. For example, carefully stored tumour samples collected in a longitudinal sequence of disease progression could be essential for certain research projects, but are difficult and time-consuming to obtain.

Lastly, does the biobank contain data from unique populations? Samples from large multi-generational families, special populations such as geographical isolates, or rare disease cohorts, may be of immense value for specific research projects.

Relevant opportunities that are less likely to be available in new large cohorts include the provision of samples in a longitudinal series, samples collected before onset of incident disease for predictive biomarker validation, and recall-by-genotype study designs allowing detailed clinic measurements of selected volunteer participants.

It seems appropriate for smaller biobanks to increase their visibility by registering in national and international online catalogues such as the BBMRI-ERIC Directory [3], to make their data and samples accessible and curated to high standards, and to focus on their strengths including those outlined above.

New prospective biobanks containing samples and data from at least half a million people are providing unparalleled research opportunities across all aspects of biomedicine. As a result, the investment of resources to sustain smaller biobanks may no longer be justifiable, although closure of a biobank can itself be a complex and challenging process. But if robust data and sample access processes are in place, many existing small biobanks can still have an important role in research and development in the life sciences.


Shona Kerr is Project Manager of the QTL in Health and Disease group at the MRC Human Genetics Unit. She is also Associate Director of the Edinburgh CRF Genetics Core.


See the full reference list here.
