Using Machine Learning and Natural Language Processing to Accelerate Research

  • March 5th, 2020

Key Opinion Leaders (KOLs) can greatly accelerate the capital-intensive drug discovery and development process. Their publications help speed up discovery and development, and they also serve as a conduit of information between pharma companies and consumers.
They are a valuable source of consumer signals and insights on market shifts, and the pharma community trusts them to spread awareness of available treatment options among consumers.

However, KOLs often publish in their regional languages and in different databases, making it difficult to identify these KOLs and their publications. Fortunately, the right problem-solving mindset and an advanced analytical toolset can be combined to overcome linguistic barriers and catalyze drug development.

The Problem

The R&D branch of our partner, a leading pharma company, was looking for a way to search across both English and Japanese databases for specific KOLs. With no tool in place, the company relied solely on searches in English databases to find these leaders.

The R&D team regularly pulls publications on different medical topics to find Key Opinion Leaders for company events and conventions. The same author sometimes publishes in different languages and in different regional databases.

The company needed a system that could pull related publications in different languages based on the search strategies used in English searches, and that could eliminate the need for subject matter expert intervention.

The Mu Sigma Approach

We used Machine Learning and Natural Language Processing to create a single-point search that automates and enhances the search strategy across databases in both languages.

We developed an intelligent solution that saves time in both search strategy optimization and language translation. This involved creating a search method layered on top of an existing search engine to pull related articles in Japanese from an English search.

The Solution

Our process for creating the solution:

•   We started by pulling the Unified Medical Language System (UMLS) Metathesaurus and created word embeddings for both English and Japanese. A word embedding represents each word as a real-valued vector. Because the Metathesaurus groups similar terms under a concept with a unique ID, this introduced medical context into the text.

•   We used Multilingual Unsupervised and Supervised Embeddings (MUSE) to align all the embeddings using a bilingual dictionary.

•   Finally, with all mappings and embeddings aligned, we implemented the nearest neighbor algorithm to fetch the closest results. This enabled us to query in English and obtain the equivalent Japanese terms along with the terms closest to them.
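
The alignment step can be illustrated with a small sketch. It is not the project's actual code: the 2-D vectors, the three terms, and the function names `fit_rotation` and `rotate` are all hypothetical. With a bilingual dictionary as supervision, alignment of this kind is commonly framed as an orthogonal Procrustes problem, usually solved with an SVD on high-dimensional vectors. In two dimensions, restricted to rotations, the same least-squares problem has a closed form, which keeps the example dependency-free.

```python
import math

# Hypothetical 2-D toy embeddings; real embeddings have hundreds of dimensions.
english = {"heart": (1.0, 0.0), "kidney": (0.0, 1.0), "liver": (0.6, 0.8)}
japanese = {"心臓": (0.0, 1.0), "腎臓": (-1.0, 0.0), "肝臓": (-0.8, 0.6)}

# Hypothetical bilingual dictionary used as supervision ("liver" is held out).
dictionary = [("heart", "心臓"), ("kidney", "腎臓")]


def fit_rotation(pairs, src, tgt):
    """Return the angle that best rotates source vectors onto target vectors.

    Minimizes the summed squared distance between each rotated source vector
    and its dictionary translation (2-D Procrustes, rotations only).
    """
    dot_sum = 0.0
    cross_sum = 0.0
    for src_word, tgt_word in pairs:
        (x1, x2), (y1, y2) = src[src_word], tgt[tgt_word]
        dot_sum += x1 * y1 + x2 * y2
        cross_sum += x1 * y2 - x2 * y1
    return math.atan2(cross_sum, dot_sum)


def rotate(vec, theta):
    """Rotate a 2-D vector counter-clockwise by theta radians."""
    x1, x2 = vec
    c, s = math.cos(theta), math.sin(theta)
    return (c * x1 - s * x2, s * x1 + c * x2)


theta = fit_rotation(dictionary, english, japanese)

# Map a word that was NOT in the dictionary into the Japanese space.
mapped_liver = rotate(english["liver"], theta)
print(mapped_liver, japanese["肝臓"])
```

The held-out term "liver" lands next to its Japanese counterpart even though the pair was never in the dictionary, which is the property that lets an English query reach Japanese terms after alignment.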

Techniques Used

•   Natural Language Processing – Using word embeddings, we captured medical context from text data. Word embeddings are frequently used in document classification, machine translation, recommendation systems, etc.

•   MUSE – MUSE is a technique for cross-lingual mapping of word embeddings.

•   K-Nearest Neighbor – By implementing the machine learning technique of K-Nearest Neighbor in the search engine, we were able to fetch the words closest to the input word or phrase in both English and Japanese.
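
As a rough illustration of the K-Nearest Neighbor lookup, the sketch below ranks terms by cosine similarity in a shared English/Japanese space. The vectors, the vocabulary, and the function names `cosine` and `nearest_terms` are hypothetical, and cosine similarity is an assumed choice of distance, not a detail confirmed by the project. It also scans the whole vocabulary for simplicity, which a production system would likely replace with an index.

```python
import heapq
import math

# Hypothetical vectors in an already-aligned English/Japanese space.
aligned = {
    "diabetes": (0.90, 0.10, 0.00),
    "糖尿病": (0.88, 0.12, 0.02),
    "insulin": (0.70, 0.30, 0.10),
    "インスリン": (0.68, 0.33, 0.10),
    "fracture": (0.00, 0.20, 0.95),
    "骨折": (0.02, 0.18, 0.96),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_terms(query, vectors, k=3):
    """Return the k terms most similar to `query`, best match first."""
    query_vec = vectors[query]
    scored = (
        (cosine(query_vec, vec), term)
        for term, vec in vectors.items()
        if term != query  # do not return the query itself
    )
    return [term for _, term in heapq.nlargest(k, scored)]


print(nearest_terms("diabetes", aligned, k=3))
```

Because both languages live in one space, a single English query returns the Japanese equivalent alongside related terms in either language, and those terms can then be fed into the existing search engine.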

The Output

By combining Natural Language Processing and Machine Learning techniques, Mu Sigma was able to create a system that:

•   Solved the presented problem
•   Created an intelligent solution that enhances the inserted search strategy automatically
•   Considered more than exact term matches, unlike generic solutions, when fetching the closest results for a query
•   Could be scaled to languages beyond English and Japanese

The Impact

This system resulted in significant cost savings and other gains for the R&D department:

•   A 17% increase in the number of KOLs identified for a specific disease area
•   Reduced effort in building Japanese search strategies
•   Lower costs by reducing the dependency on Subject Matter Experts

Once scaled to other key languages, this search will enable the pharma leader to collate research rapidly through an intuitive English interface, lowering response time.
