
What’s Driving Change in Drug Discovery and Development?

Machine learning, AI and big data

The past decade has witnessed huge advances in both scientific research and technological innovations within the drug discovery market. Here Dr James Willans, Chief Technology Officer of the not-for-profit organisation and educational charity, Lhasa Limited, discusses two significant areas that have the potential to further transform the industry over the coming years.

Artificial Intelligence (AI) and Machine Learning from Big Data

According to estimates by the Association of the British Pharmaceutical Industry, it takes 12 years and costs an average of £1.15bn to bring a drug to market. As costs continue to rise, AI has the potential to increase efficiency, for example by learning from the existing body of knowledge and data more effectively than any human could hope to do unaided.

The distinction between AI and machine learning is often poorly made. It is best to think of AI as the ability of machines to emulate human decision making, and of machine learning as an enabling technology that helps fulfil that ambition. Machine learning involves computers discovering patterns in the data they are ‘fed’ and using those patterns to build models.

At present, machine learning requires data of a minimum size and consistency to produce reliable results. Unfortunately, in many cases, there are simply too few well-structured and consistent data sets to allow machine learning to work outside narrow areas of chemical space. However, new methods are becoming increasingly effective at learning from much larger sets of unstructured data to build statistically-based models, which an expert can then verify as being scientifically relevant. Experts can use these as predictive models in their own right, or as a spur to further knowledge discovery – essentially trust or verify!

Acceptance of In Silico Predictions

In silico predictions can offer many benefits over other testing methods in addition to a desirable reduction in animal testing. They are often cheaper to conduct (in both time and money), and they can also be more reproducible and relevant than other methods. The lack of reproducibility of wet assays is a significant cost, since it drives the need for replicate studies. In addition, any in vitro or animal in vivo model is only a surrogate for human toxicity and may not accurately predict the effects that would be seen in man.

This latter challenge also applies to in silico models and is one of the most difficult questions to answer. A large step towards defining what is needed has been captured by the OECD’s five principles for the validation of (Q)SAR models, but our experience within Lhasa suggests that these are still not sufficient. What is needed is a shift in thinking from ‘when can the model be used?’ to ‘can I trust this specific prediction?’.

In our experience, users require:

  • A biological (mechanistic) explanation of the model
  • An explanation of how the predictions have been derived covering the algorithm used or the rules invoked, including any training sets and assumptions about how predicted values can be modelled
  • The model’s historical performance using both internal and external validation sets to show how well the model can capture the endpoint
  • A measure of confidence in the case of any specific prediction – how likely is the model to give the right answer given appropriate measures of similarity to analogues known to the model (builder) and the consistency with which they show a similar outcome?
  • A sufficient quantity of transparent supporting data and knowledge that allows an expert to review and decide when to accept or overturn the model’s prediction
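The fourth requirement, a per-prediction measure of confidence, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than a description of any Lhasa method: compounds are represented as plain sets of hypothetical structural-feature names, similarity is the Tanimoto coefficient over those sets, and confidence is summarised as the mean similarity of the nearest analogues together with the similarity-weighted fraction of them that share the predicted outcome.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical representation: a compound is a set of structural-feature names.
Fingerprint = FrozenSet[str]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def prediction_confidence(
    query: Fingerprint,
    training: List[Tuple[Fingerprint, bool]],
    predicted_label: bool,
    k: int = 3,
) -> Tuple[float, float]:
    """Return (mean similarity, weighted agreement) over the k nearest analogues.

    Mean similarity asks "does the model know compounds like this one?".
    Weighted agreement asks "do those analogues consistently show the
    predicted outcome?". Both values lie between 0 and 1.
    """
    scored = sorted(
        ((tanimoto(query, fp), label) for fp, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(score for score, _ in scored)
    if not scored or total == 0:
        return 0.0, 0.0  # no usable analogues: no basis for confidence
    agreeing = sum(score for score, label in scored if label == predicted_label)
    return total / len(scored), agreeing / total


# Illustrative data only: feature names and outcomes are invented.
training_set = [
    (frozenset({"a", "b", "c"}), True),
    (frozenset({"a", "b"}), True),
    (frozenset({"x", "y"}), False),
]
query_compound = frozenset({"a", "b", "c"})
mean_similarity, agreement = prediction_confidence(
    query_compound, training_set, predicted_label=True, k=2
)
```

A prediction backed by close analogues that agree with one another scores high on both numbers, while a compound far from anything in the training set scores low on the first, which is a signal for the expert to look more closely before accepting the result.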

The level of transparency and accuracy required will depend upon the specific decision being made. Prioritisation decisions will have much lower thresholds than ‘regulatory decisions’, which are made by regulators as the final arbiters before exposure to humans is permitted. Sufficient support is then required to both make and defend a decision in order to minimise human risk. If in silico predictions are to replace either in vitro or in vivo experiments, then the accuracy of a negative prediction is crucial. If a negative prediction is to be accepted, then there must be some clear means of understanding the risk of missing potential new routes to toxicity.
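One common way to quantify the accuracy of negative predictions is the negative predictive value: of all the compounds a model called negative, the fraction that were also negative when tested. The sketch below is a generic illustration using invented data and a hypothetical function name. It is not taken from the article or from any regulatory guideline, and it does not address the harder problem of novel routes to toxicity that are absent from the validation set.

```python
from typing import List, Optional, Tuple


def negative_predictive_value(
    results: List[Tuple[bool, bool]],
) -> Optional[float]:
    """Compute TN / (TN + FN) from (predicted_toxic, observed_toxic) pairs.

    Returns None when the model made no negative predictions, because the
    value is undefined in that case.
    """
    true_negatives = sum(
        1 for predicted, observed in results if not predicted and not observed
    )
    false_negatives = sum(
        1 for predicted, observed in results if not predicted and observed
    )
    negatives_called = true_negatives + false_negatives
    if negatives_called == 0:
        return None
    return true_negatives / negatives_called


# Invented validation results: (model predicted toxic?, assay observed toxic?)
validation = (
    [(False, False)] * 9   # correctly predicted negatives
    + [(False, True)] * 1  # missed toxic compound (false negative)
    + [(True, True)] * 4   # correctly predicted positives
    + [(True, False)] * 2  # false alarms, which do not affect this metric
)
npv = negative_predictive_value(validation)
```

With these invented numbers, nine of the ten negative calls are correct. Whether such a value is high enough depends on the decision being supported, in line with the differing thresholds for prioritisation and regulatory decisions.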

So far, in silico predictions have been accepted by regulators for the prediction of genotoxic impurities under ICH M7 guidelines, and it is expected that other endpoints will follow – currently skin sensitisation is close, with in silico models being able to suggest which assay or combination of assays can be used in lieu of animal testing.

As the application and acceptance of in silico models increase, decisions about whether to progress a compound can be made earlier and more efficiently.



James Willans will be presenting his observations on the pharmaceutical industry’s convergence towards cloud computing at the Global Pharma R&D Informatics Congress.

Download the agenda to read the full abstract.

2 Responses to “What’s Driving Change in Drug Discovery and Development?”

  1. It was interesting to learn that machine learning requires data of a minimum size and consistency to produce reliable results. It seems like in addition to machine learning, it would be really important to have people working hands-on with the drugs to be able to see successful results. I wonder what kind of experience is needed to be able to work in that type of field.

  2. It was really interesting to learn that AI is the ability for computers to emulate human decision making. I imagine that it would be really helpful for someone who is trying to develop drugs to have access to AI. It is really interesting to think about what it takes to develop a drug that can help lots of people.

