Wherever you look in today’s rapidly evolving pharma industry, the shadow of artificial intelligence (AI) isn’t far away.
The high failure rate for experimental drugs, coupled with the sheer cost and time commitment required to back a drug candidate through R&D and commercialisation, makes the promise of AI all the more enticing to drug developers, particularly in data-heavy applications such as drug discovery.
AI brings with it the prospect of using complex machine learning algorithms to screen for disease targets and drug candidates with a speed and accuracy that would be impossible for human researchers, potentially saving pharma and biotech firms billions in drug development costs.
The AI hype bubble
The attractiveness of the proposition has been borne out in the stacks of pharma and biotech investment that have been flowing towards tech and machine learning-focused start-ups in the last few years. From Merck’s AI partnerships with Numerate and Atomwise to GlaxoSmithKline’s $43m collaboration with Exscientia and the rise of AI-centric scientific innovators such as BenevolentAI, pharma AI has become a lucrative business, even before substantial evidence of its impact on drug discovery has emerged.
As with any exciting up-and-coming technology, AI in pharma has been prone to overhype, with the complex realities of using machine learning models in the drug development process still unable to compete with the extravagant promises coming from the tech world.
“A little knowledge is a dangerous thing,” says Elsevier’s consulting director of text and data analytics Jabe Wilson, a 30-year veteran in the AI field. “I think some of the generic AI systems have not really reached their potential in some cases. I know some stories about pharma companies that have worked with different platforms, which have then found out they’ve had to do a great deal of work in curating the information themselves to feed into the platform.
“There’s a lot of hype being talked in the business literature about AI tools. They certainly have potential to speed up the performance of looking for patents, of sifting vast amounts of data. There’s the potential there, and then where that hype meets the road is when you have to put teams together to really create and tune the tools for the context and the use case. That’s where there’s been a challenge.”
Pharma AI: no free lunch
In recent months, a string of events in healthcare AI has begun to prick the hype bubble. At the beginning of 2018, mathematician and founder of AI expert network Startcrowd Mostapha Benhenda published a piece criticising the overhype surrounding AI systems for drug discovery, arguing that “pretty often, AI researchers overhype their achievements, to say the least”. The piece presented examples of “overhyped” AI research from Harvard and Stanford universities and Insilico Medicine, all of which, he argued, contained flaws limiting their impact for drug discovery.
And then, of course, there was the high-profile failure of IBM’s Watson for Oncology application, a cognitive computing cloud platform designed to sift through patient data and medical studies to provide treatment recommendations for cancer patients. As initially reported in July 2018, internal IBM documents revealed that the system had a tendency to return “unsafe and incorrect treatment recommendations”. The brunt of the blame for the failures was placed on the raw data fed to Watson for training purposes, which included hypothetical patient data rather than real-world cases.
“I think that’s the critical piece,” says Wilson. “Where these generalist systems can fail is not having the components. It could be not having the dictionaries and the ontologies necessary to extract the semantic data that you need, or not having enough of the content to process through those ontologies to get your semantic data. We’re [also] very keen to try and help our customers be aware of the bias in terms of the data that’s input to the models. That’s really critical, because it’s one thing to have bad data leading to you not being able to make predictions. But the worst thing is that you can end up with biased data that leads you to make biased predictions that negatively impact certain populations.”
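Wilson’s warning can be made concrete with a very simple check. The sketch below is purely illustrative (the dataset, column names and model are invented, and it is not Elsevier’s tooling): it asks whether a patient subgroup is under-represented in a training set, and whether a model trained on that data performs worse for that subgroup.

```python
# A minimal, invented example (not Elsevier's tooling): check whether one patient
# subgroup is under-represented in the training data, then compare model accuracy
# per subgroup on held-out data. All column names and values are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "biomarker":  [1.2, 0.4, 2.1, 0.9, 1.8, 0.3, 2.4, 1.1, 0.7, 1.9, 0.5, 2.2],
    "age":        [54, 61, 47, 70, 58, 66, 49, 73, 60, 52, 68, 45],
    "population": ["A"] * 9 + ["B"] * 3,      # subgroup B is heavily under-represented
    "responded":  [1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1],
})

# 1. Representation: how much of the training data does each subgroup account for?
print(df["population"].value_counts(normalize=True))

# 2. Per-group performance: does the model do worse for the scarce subgroup?
X, y, groups = df[["biomarker", "age"]], df["responded"], df["population"]
X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    X, y, groups, test_size=0.5, stratify=groups, random_state=0)

model = LogisticRegression().fit(X_tr, y_tr)
for group in sorted(g_te.unique()):
    mask = g_te == group
    print(group, accuracy_score(y_te[mask], model.predict(X_te[mask])))
```

Neither check fixes anything by itself, but together they surface the kind of skew that, at the scale of a real clinical dataset, can quietly turn into the biased predictions Wilson describes.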
It’s an issue that brings to mind Wolpert and Macready’s ‘no free lunch’ theorem in machine learning, which states that “any two optimisation algorithms are equivalent when their performance is averaged across all possible problems” – in other words, no general AI system – like IBM’s Watson – offers a shortcut that solves every problem; on any specific task it will be outperformed by a model designed for that specialist purpose.
“One of the things I’m hearing more and more as I speak to people in the industry is that an important aspect of working with machine learning models is choosing the right architecture, choosing the right type of machine learning model, as well as the training data,” Wilson says. “People are interested in whether their partners – the suppliers or whoever – can help them choose the right machine learning model for their particular problem.”
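Both points, the theorem and Wilson’s advice on model choice, can be illustrated in a few lines of Python. In the hedged sketch below, the datasets, candidate models and metric are arbitrary stand-ins; the only message is that the best-performing model is decided per problem, so the comparison has to be run against your own data.

```python
# Illustrative sketch: the same candidate models, cross-validated on two synthetic
# problems with different structure. The 'winner' is decided per problem, which is
# the practical reading of the no free lunch theorem; all choices here are arbitrary.
from sklearn.datasets import make_classification, make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

problems = {
    "mostly linear": make_classification(n_samples=400, n_features=20,
                                         n_informative=3, random_state=0),
    "non-linear (two moons)": make_moons(n_samples=400, noise=0.3, random_state=0),
}

candidates = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=2000)),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

for problem_name, (X, y) in problems.items():
    scores = {name: cross_val_score(model, X, y, cv=5).mean()
              for name, model in candidates.items()}
    print(problem_name)
    for name, score in scores.items():
        print(f"  {name:<20} mean CV accuracy: {score:.3f}")
    print("  best on this problem:", max(scores, key=scores.get))
```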
The human element in the AI system
These sorts of issues have been at the front of Elsevier researchers’ minds as they developed the company’s own Entellect system, a cloud-based data platform launched this year, designed to bring together clinical data from thousands of unstructured sources before adding context and connecting drug, target and disease data to give AI-enabled research teams a leg up in drug discovery and R&D.
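As a rough illustration of what ‘connecting drug, target and disease data’ can mean in practice (this is not Entellect’s actual data model or API, and the assertions below are hand-written examples), a handful of curated relationships can be loaded into a small graph so that questions such as ‘which diseases share a target with this drug?’ become simple traversals.

```python
# Illustrative only: this is not Entellect's data model or API, just a small
# graph built with networkx from hand-written drug/target/disease assertions.
import networkx as nx

G = nx.Graph()

drug_target = [("imatinib", "ABL1"), ("imatinib", "KIT"), ("sunitinib", "KIT")]
target_disease = [("ABL1", "chronic myeloid leukaemia"),
                  ("KIT", "gastrointestinal stromal tumour")]

for drug, target in drug_target:
    G.add_node(drug, kind="drug")
    G.add_node(target, kind="target")
    G.add_edge(drug, target, relation="modulates")

for target, disease in target_disease:
    G.add_node(disease, kind="disease")
    G.add_edge(target, disease, relation="implicated_in")

# Which diseases sit within two hops of a given drug, i.e. share one of its targets?
reachable = nx.single_source_shortest_path_length(G, "imatinib", cutoff=2)
diseases = {node for node in reachable if G.nodes[node].get("kind") == "disease"}
print(diseases)
```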
“[Entellect] comes as a logical outcome from our heritage,” says Wilson. “We’ve been creating these databases for a long time, and working with customers on using them as tools. We created this tool for our own products and professional services, and that is then something that we can make available to our customers in the life sciences process.”
By focusing the Entellect project on open design and data curation and governance, Elsevier is hoping to empower clinical research teams to make their own decisions on what data to trust and how best to move forward. This emphasises the importance of getting dedicated data scientists together with pharma subject experts to achieve what Wilson calls “informed, subject-focused outcomes”.
“We’re not expecting these AI systems to be able to replace people,” he says. “You really need these systems within the context of a workflow. You need biologists, pharmacologists, and to bring those together with the data scientists. I like the analogy of Lego blocks; you’re building this system, this toy, and you need the data to plug together, you need the people to plug together, to get this system that you can then answer questions with.”
It’s a philosophy that has driven Elsevier’s recent ‘data-thon’, which brought together data scientists, pharma groups and clinical consultants to work on drug repurposing opportunities for treating chronic pancreatitis, a rare inflammatory condition that affects an estimated five to 12 people per 100,000 in developed countries. The data-thon allowed the data scientists to use their most favoured tools – from JupyterHub notebooks to programming languages such as Python and R – to develop functions based on the data, with pharma experts on hand to advise on relevance.
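By way of illustration only, the kind of function a data-thon team might prototype in such a notebook could be as simple as the sketch below, which ranks candidate drugs by how strongly their known targets overlap with genes linked to the disease of interest; the drug names and associations shown are placeholders rather than outputs of the event.

```python
# Purely illustrative: a toy scoring function of the kind a data-thon team might
# prototype in a notebook. Drug names and target sets are placeholders, not
# validated repurposing findings for chronic pancreatitis.

# Genes reported in association with the disease of interest (illustrative set).
disease_genes = {"PRSS1", "SPINK1", "CFTR", "CTRC"}

# Known targets of candidate drugs (hypothetical).
drug_targets = {
    "drug_A": {"CFTR", "SLC26A9"},
    "drug_B": {"PRSS1", "CTRC", "F2"},
    "drug_C": {"HTR2A"},
}

def repurposing_score(targets, disease_genes):
    """Fraction of a drug's known targets that are also disease-associated genes."""
    if not targets:
        return 0.0
    return len(targets & disease_genes) / len(targets)

ranked = sorted(drug_targets,
                key=lambda drug: repurposing_score(drug_targets[drug], disease_genes),
                reverse=True)
for drug in ranked:
    print(drug, round(repurposing_score(drug_targets[drug], disease_genes), 2))
```

A real exercise would, of course, draw the gene and target sets from curated databases and weigh the evidence behind each association, which is exactly where the pharma experts in the room earn their keep.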
“What’s been so lovely about this data-thon is seeing people come together, sparking each other’s insights and interests, and being able to work together on these machine learning models,” Wilson enthuses. “By the end of the year we’ll have the outcomes validated, and then potentially with our partners, we might even be able to move to some clinical trials to see where we can build on the value. You’re using predictive tools to create new knowledge. It’s like an art form, so you need the subject matter experts and the data scientists working together on platforms like Entellect, which has all the Lego blocks that you can then use to build your predictive tool.”
AI in drug discovery: tools for the future
For all the hype in the industry, it’s clear we’re still a long way from realising the potential AI offers to pharma R&D and drug discovery. The way forward lies in moving “quickly but carefully”, as espoused by Benhenda: investing in the right machine learning models for particular problems and building the interdisciplinary teams necessary to validate and make the most of the data.
Data discipline will also be incredibly important as the pharma industry builds its R&D tools for the future. Wilson believes data auditing – essentially checking the workings of a given machine learning model – will become increasingly vital.
“I think in the future we’re going to see that if a machine learning model has been used in defining an outcome – like a drug and a treatment – then it will be necessary to audit that model to ask, ‘How did you come up with this answer?’ You can do that by circling back to the other known scientific data. So I think that’s certainly one of the avenues that we’re looking at very actively right now.”
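Wilson does not spell out the mechanics, but one widely used way to start answering that question is permutation importance: perturb each input feature in turn and measure how much the model’s held-out performance drops. The sketch below shows the idea with scikit-learn on a standard public dataset; it stands in for, rather than reproduces, whatever auditing approach Elsevier is pursuing.

```python
# Illustrative only: permutation importance as one way to probe what a trained
# model relied on. This stands in for, rather than reproduces, Elsevier's own
# auditing work; the dataset is a standard public one used for convenience.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_tr, X_te, y_tr, y_te = train_test_split(data.data, data.target, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Shuffle each feature in turn and measure the drop in held-out accuracy.
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
ranked = sorted(zip(data.feature_names, result.importances_mean), key=lambda kv: -kv[1])
for name, importance in ranked[:5]:
    print(f"{name:<25} {importance:.3f}")
```

Feature attributions of this kind are only a starting point; as Wilson suggests, the harder step is circling back from them to the known science to check that the model’s reasoning holds up.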
High-profile failures and a certain scepticism around the reliability of AI-generated drug discovery conclusions may have scuffed pharma AI’s gleaming reputation, but any damage caused is superficial. Buffing it out will involve a relentless focus on putting the right algorithm in the right hands, and for the right application.
“There’s a lot of hype, and these generic systems potentially are not able to deliver if they don’t have the background and the insights baked into them,” says Wilson. “But when you do have that, there really is huge opportunity there.”