August 2018: Deep Learning for Medical Information Extraction

Our August session featured Pengtao Xie, PhD, who recently completed his PhD in the Machine Learning Department at Carnegie Mellon University and is Director of Data Services and Solutions and Research Scientist at Petuum, Inc. He presented his team’s work, titled “Deep Learning for Medical Information Extraction”.

We all learnt about the complexity of machine learning models and how they can be used to solve real-world electronic health record (EHR) data problems with a high degree of accuracy.

A summary of the two papers (in press) he discussed is below:

1) Named Entity Recognition (NER).

Title: “Effective use of bidirectional language modeling for biomedical named entity recognition”

Background: There is an increased need for text mining in the biomedical field due to the rapid growth in the number of publications, scientific articles, reports, medical records, etc. that are readily accessible in electronic format. To transform unstructured collections of medical text into structured information and link them, information extraction systems must accurately identify different biomedical entities such as chemical ingredients, genes, proteins, medications, diseases, symptoms, etc. The task of identifying such entities in text and tagging them as members of predefined categories (diseases, chemicals, genes, etc.) is referred to as NER. Designing an NER system with high precision and recall for the biomedical domain is very challenging because high-quality labeled data is limited and the language in that data varies widely, including ambiguous abbreviations, non-standardized descriptions, and long entity names. An NER system can be framed as a supervised machine learning (ML) task in which the training data consists of a label for each token in a text.
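To make the idea of per-token labels concrete, here is a minimal Python sketch. It assumes the widely used BIO tagging scheme (B = beginning of an entity, I = inside, O = outside), which the summary above does not name explicitly. The example sentence, the entity types, and the helper `bio_labels` are illustrative and are not taken from the paper.

```python
from typing import List, Tuple


def bio_labels(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Turn entity spans into one label per token.

    Each span is (start, end_exclusive, entity_type) in token indices.
    The BIO scheme used here is an assumption, not something the paper confirms.
    """
    labels = ["O"] * len(tokens)          # default: token is outside any entity
    for start, end, entity_type in spans:
        labels[start] = f"B-{entity_type}"        # first token of the entity
        for i in range(start + 1, end):
            labels[i] = f"I-{entity_type}"        # remaining tokens of the entity
    return labels


# Hypothetical clinical sentence with two annotated entities.
tokens = ["Patient", "denies", "chest", "pain", "after", "taking", "aspirin", "."]
spans = [(2, 4, "Symptom"), (6, 7, "Medication")]
labels = bio_labels(tokens, spans)

for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```

A supervised NER model is then trained to predict the label column from the token column, so the multi-token entity "chest pain" is recovered by joining a B- label with the I- labels that follow it.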

2) Relation Extraction (RE).

Title: “Relation Extraction of medical entities and attributes on Electronic Medical Records”

This paper presents the development and testing of a deep learning model for medical relation extraction in clinical notes. The novel deep learning approach automatically extracts relations for medical entities from EHRs. Specifically, it uses a model based on a convolutional neural network (CNN), which captures both salient syntactic features and latent semantic features from the text descriptions, despite their differences in language style. The model was evaluated on a real-patient dataset and achieved better performance than existing baselines on the tasks of extracting relations and deciding negations. It also shows significant potential for helping doctors with downstream tasks.
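As a rough illustration of how a CNN-based text classifier of this general kind works, the sketch below slides filters over token embeddings, max-pools each filter's responses, and applies a softmax over relation labels. It is a generic pattern with untrained random weights and is not the architecture from the paper. The dimensions, the ReLU activation, and the label names in `RELATIONS` are all assumptions made for illustration.

```python
import math
import random
from typing import List

Vector = List[float]


def conv_max_pool(embeddings: List[Vector], filters: List[List[Vector]]) -> Vector:
    """Slide each filter over the token sequence and keep its strongest response.

    Assumes the sequence is at least as long as the filter width.
    """
    pooled = []
    for filt in filters:                          # filt has shape width x dim
        width = len(filt)
        responses = []
        for start in range(len(embeddings) - width + 1):
            window = embeddings[start:start + width]
            score = sum(
                w * x
                for filt_row, emb_row in zip(filt, window)
                for w, x in zip(filt_row, emb_row)
            )
            responses.append(max(0.0, score))     # ReLU activation
        pooled.append(max(responses))             # max-pooling over positions
    return pooled


def softmax(scores: Vector) -> Vector:
    """Convert raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(embeddings: List[Vector], filters: List[List[Vector]],
             out_weights: List[Vector]) -> Vector:
    """Return one probability per relation label."""
    features = conv_max_pool(embeddings, filters)
    scores = [sum(w * f for w, f in zip(row, features)) for row in out_weights]
    return softmax(scores)


# Hypothetical sizes and label set; random weights stand in for a trained model.
rng = random.Random(0)
DIM, WIDTH, N_FILTERS, N_TOKENS = 4, 3, 5, 7
RELATIONS = ["has_attribute", "negated", "no_relation"]

embeddings = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(N_TOKENS)]
filters = [[[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(WIDTH)]
           for _ in range(N_FILTERS)]
out_weights = [[rng.uniform(-1, 1) for _ in range(N_FILTERS)] for _ in RELATIONS]

probs = classify(embeddings, filters, out_weights)
for relation, p in zip(RELATIONS, probs):
    print(f"{relation}: {p:.3f}")
```

In a trained model the filters would learn to respond to short token patterns that signal a relation or a negation, and max-pooling lets such a pattern count wherever it appears in the text.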

Petuum, which has a significant focus on and leadership in healthcare, has been named a 2018 Technology Pioneer by the World Economic Forum.

Link to Petuum’s healthcare research and publications:

https://www.petuum.com/healthcare.html