AI Powered Automatic Medical Coding of Electronic Health Records | Robert Bosch Centre for Data Science and Artificial Intelligence

--Sudarsun Santhiappan, Jeshuren Chelladurai--

Medical coding (codifying medical records) has essentially remained a human-powered job since its inception. BUDDI.AI aims to change that by partially automating the process using AI technology. To introduce BUDDI.AI, a talk on “AI Powered Automatic Medical Coding of Electronic Health Records” was organized as part of the second RBCDSAI AImpact Seminar on 12th August 2021. The talk was delivered by Sudarsun Santhiappan (Co-founder of BUDDI.AI) and Jeshuren Chelladurai (Research Scientist at BUDDI.AI).

Sudarsun began the talk by explaining medical records and medical coding. Medical records, he explained, are printed or handwritten documents that contain a patient’s history of illness, diagnoses, medications, surgical procedures and so on. Medical coding is the task of codifying these records to simplify and summarize them, since they are otherwise lengthy and run over many pages. He added that medical codes are important for physicians as well as for insurance companies, which require them when settling insurance claims. However, medical coding is a tedious, human-powered process, and automating it even partially can help insurance companies settle claims within the stipulated period of 3-6 months. This was the main motivation behind developing the BUDDI.AI tool.

Turning to the major challenges in developing an AI-based tool for medical coding, he listed the lack of explainability, the large label space (the large number of medical codes) and the difficulty of obtaining large amounts of labelled data for AI research because of privacy concerns. To automate medical coding, the team decided to mimic the work of medical coders by understanding which portions of a medical record they specifically look at while coding. The team also learned about the information retrieval tools that medical coders use to associate medical entities with relevant ICD codes. Sudarsun pointed out that BUDDI.AI was conceived as a hybrid system in which the majority of the medical coding is done by a statistical machine learning model and a sizable chunk of the task is done through human intellect, so that both scalability and accuracy are achieved.

Jeshuren then went into the finer details of the tool, explaining its pipeline. In the first step, a contextualized query and named entities are retrieved from the medical records using the GrabQC function. Next, the extracted queries and entities are used to generate a contextual graph. To find the relevant information in this graph, it is given as input to a graph neural network model, which automatically classifies nodes as relevant or non-relevant. Finally, the relevant nodes are concatenated to form a contextualized query, which is passed to the information retrieval (IR) system to predict the ICD codes.
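
The four pipeline steps described above can be illustrated with a small, self-contained Python sketch. This is not BUDDI.AI's implementation: every function name, the entity vocabulary, and the three-entry ICD index are hypothetical, the graph neural network is replaced by a caller-supplied relevance function, and the IR system is replaced by simple token overlap. The initial query is passed in by hand, which is also an assumption. The sketch only shows how data might flow from record to code.

```python
from collections import defaultdict

# Toy ICD-10 index standing in for the IR system (illustrative subset only).
ICD_INDEX = {
    "E11.9": "type 2 diabetes mellitus without complications",
    "I10": "essential primary hypertension",
    "J45.909": "unspecified asthma uncomplicated",
}

# Hypothetical vocabulary of medical entities the extractor can recognize.
KNOWN_ENTITIES = ["type 2 diabetes", "hypertension", "asthma", "metformin"]


def extract_entities(record):
    """Step 1 (stand-in): find known entity strings in the record text."""
    text = record.lower()
    return [e for e in KNOWN_ENTITIES if e in text]


def build_context_graph(query, entities):
    """Step 2: star graph linking the query node to each extracted entity."""
    graph = defaultdict(set)
    for entity in entities:
        graph[query].add(entity)
        graph[entity].add(query)
    return graph


def predict_relevant_nodes(graph, query, relevance_fn):
    """Step 3 (stand-in for the GNN): keep the neighbors judged relevant."""
    return [node for node in sorted(graph[query]) if relevance_fn(node)]


def contextualize(query, relevant_nodes):
    """Step 4a: concatenate the query with the relevant nodes."""
    return " ".join([query] + relevant_nodes)


def retrieve_icd(contextual_query, top_k=1):
    """Step 4b (stand-in IR): rank codes by token overlap with the query."""
    q_tokens = set(contextual_query.lower().split())
    scored = [
        (len(q_tokens & set(description.split())), code)
        for code, description in ICD_INDEX.items()
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [code for score, code in scored[:top_k] if score > 0]


def run_pipeline(record, query, relevance_fn):
    """Chain the four steps: extract, build graph, filter, retrieve."""
    entities = extract_entities(record)
    graph = build_context_graph(query, entities)
    relevant = predict_relevant_nodes(graph, query, relevance_fn)
    return retrieve_icd(contextualize(query, relevant))


record = "Patient with hypertension, on metformin for type 2 diabetes."
codes = run_pipeline(record, "diabetes", lambda node: "diabetes" in node)
print(codes)
```

In this toy run the relevance function keeps only the "type 2 diabetes" node, so the contextualized query becomes more specific than the bare query "diabetes" before it reaches retrieval. In the system described in the talk, that filtering decision is made by a trained graph neural network rather than a hand-written rule.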

Jeshuren then said that various experiments were carried out, covering the choice of graph neural network, query-level comparisons and comparisons of machine learning models. He concluded the talk by mentioning that, in the future, the team plans to use reinforcement learning to contextualize the query automatically without the need for relevance labels, and to extend hard attention for deep learning (DL) methods.

The video is available on our YouTube channel: Link.