What’s OpenBioLLM-70B? A Breakthrough in Medical AI


The field of medical AI has witnessed remarkable advancements in recent years, with the development of powerful language models and datasets driving progress. In this article, we will explore the journey of MedMCQA, a groundbreaking medical question-answering dataset, and its role in shaping the landscape of medical AI. We will examine the challenges faced during its publication, its impact on the research community, and how it paved the way for the development of OpenBioLLM-70B, a state-of-the-art biomedical language model that has surpassed industry giants such as GPT-4, Gemini, Med-PaLM-1, Med-PaLM-2, and Meditron in performance.

OpenBioLLM-70B: India's Contribution to Cutting-Edge Biomedical Language Models

The Genesis of MedMCQA

Our idea for creating medical language models originated in 2020, drawing inspiration from the widely used models BlueBERT and BioBERT.

BioBERT pre-trained biomedical language model

Upon analyzing the datasets used for training and fine-tuning in these papers, I noticed that they lacked diversity. They mostly consisted of PubMed articles and relation-extraction documents. This observation led me to recognize the need for a comprehensive and diverse dataset for the medical AI community.

BlueBERT medical model

Motivated by this goal, I started working on a dataset that would later be published under the name MedMCQA. The MedMCQA paper contains a collection of questions and answers from the Indian medical domain, sourced from NEET and AIIMS exams, as well as mock questions. By curating this dataset, we aimed to provide a valuable resource for researchers and developers working on medical AI applications. The idea was to enable them to train and evaluate models on a wide range of challenging medical questions. The development of MedMCQA marked the beginning of our journey toward building medical language models.

Challenges and Perseverance: The Journey to Publication

Interestingly, the journey of MedMCQA was not without its challenges. Despite being carefully written in 2021, the paper faced numerous rejections from top NLP conferences during the peer-review process. As almost a year passed without the paper being accepted for publication, I began to feel nervous and unsure about the quality of our work. At one point, I even considered abandoning the idea of publishing the paper altogether. However, one of my co-authors suggested giving it a final try by submitting it to an ACM conference. With renewed determination, we decided to take this last shot and submit our work to the conference.

After the paper's acceptance, it started gaining significant recognition within the medical AI community. Gradually, MedMCQA became the largest medical question-answering dataset available. Researchers and developers from various organizations started incorporating it into their language-model use cases. Notable examples include Meta, which used MedMCQA for pre-training and evaluating its Galactica model. Meanwhile, Google used the dataset in the pre-training and evaluation of its state-of-the-art medical language models, Med-PaLM-1 and Med-PaLM-2. Additionally, the official OpenAI and Microsoft paper on GPT-4 also employed MedMCQA to evaluate the model's performance on medical applications.

MedMCQA Research Paper

In the Med-PaLM paper, which showcases Google's best medical model, a closer look at the datasets used in training reveals that our Indian dataset, MedMCQA, made the biggest contribution among the medical datasets used. This highlights the significant impact of Indian research labs in the field of large language models (LLMs) and underscores the importance of our work in advancing medical AI research on a global scale.

Instruction finetuning data mixture

The Birth of an Idea: Specialized BERT Models for Medical Domains

In the MedMCQA paper, we presented subject-wise accuracy for the first time in the medical AI field, providing a comprehensive evaluation across roughly 20 medical subjects taught during preparation for the NEET and AIIMS exams in India. This approach ensured that the dataset was diverse and representative of the various disciplines within the medical domain. Additionally, we tested numerous open-ended medical question-answering models and published the results in the paper, establishing a benchmark for future research.
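A subject-wise breakdown like the one reported in the paper can be computed in a few lines. This is a minimal sketch under assumed field names (`subject`, `prediction`, `answer`); the actual MedMCQA evaluation scripts may structure records differently.

```python
from collections import defaultdict

def subject_wise_accuracy(records):
    """Compute accuracy per medical subject.

    Each record is assumed to be a dict with 'subject', 'prediction',
    and 'answer' keys (hypothetical format, for illustration only).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r["subject"]] += 1
        if r["prediction"] == r["answer"]:
            correct[r["subject"]] += 1
    return {s: correct[s] / total[s] for s in total}

# Toy example with made-up predictions:
records = [
    {"subject": "Radiology", "prediction": "B", "answer": "B"},
    {"subject": "Radiology", "prediction": "A", "answer": "C"},
    {"subject": "Surgery",   "prediction": "D", "answer": "D"},
]
print(subject_wise_accuracy(records))
# {'Radiology': 0.5, 'Surgery': 1.0}
```

Reporting per-subject rather than aggregate accuracy is what exposed the gaps discussed next: no single model dominated every subject.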

While analyzing the subject-wise accuracy, I had an intriguing thought: since no single model could achieve the highest accuracy across all medical subjects, why not build separate models and embeddings for each subject? At the time, I was working with BERT, as large language models (LLMs) were not yet widely popular. This idea led me to consider creating specialized BERT models for different medical domains, such as BERT-Radiology, BERT-Biochemistry, BERT-Medicine, BERT-Surgery, and so on.

Fine-grained evaluation per subject
Source: https://proceedings.mlr.press/v174/pal22a.html

Data Collection and the Evolution from BERT to OpenBioLLM-70B

To pursue this idea, I needed datasets specific to each medical subject, which marked the beginning of my data-collection journey. Although the data-collection efforts began in 2021, the initial plan was to create specialized BERT models for each domain. However, as the project evolved and LLMs gained prominence, the collected data was ultimately used to fine-tune the Llama-3 model, which later became the foundation for OpenBioLLM-70B. In the development of OpenBioLLM-70B, we used two types of datasets: an instruct dataset and a DPO (Direct Preference Optimization) dataset.

To generate a portion of the instruct dataset, we collaborated with medical students who provided valuable insights and contributions. We then used this initial dataset to generate additional synthetic data for fine-tuning the model, which helped broaden the training data and improve its performance.
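Instruct data of this kind is typically stored as chat-style records. The sketch below shows one plausible wrapper for turning a student-written question-answer pair into that format; the field names, system prompt, and example content are illustrative assumptions, not the project's actual schema.

```python
def to_instruct_record(question, answer,
                       system="You are a helpful medical assistant."):
    """Wrap a question-answer pair in a chat-style instruct format
    commonly used for supervised fine-tuning (hypothetical schema)."""
    return {
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]
    }

# A made-up seed pair of the kind a medical student might contribute:
seed = [
    ("What imaging modality is preferred for suspected acute stroke?",
     "Non-contrast CT is typically performed first to rule out hemorrhage."),
]
dataset = [to_instruct_record(q, a) for q, a in seed]
```

Synthetic expansion then amounts to paraphrasing or extending such seed records with a generator model before fine-tuning.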

Instruction Dataset from Medical Students

For the DPO dataset, we employed a novel approach to ensure the quality and relevance of the model's responses. We generated four responses from the model for each input and presented them to the medical students for evaluation. The students were then asked to select the best response based on their inter-annotator agreement. This helped us identify the most accurate and appropriate answers.

To mitigate potential biases in the selection process, we introduced a randomness factor by randomly sampling roughly 20 samples and swapping their labels from chosen to rejected and vice versa. This technique helped balance the dataset and prevent the experts from being overly biased toward their initial choices.
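The two steps above (pairing the annotators' best response against the others, then randomly swapping a handful of labels) can be sketched as follows. Field names (`prompt`, `best`, `others`) and the helper itself are hypothetical; only the overall procedure comes from the text.

```python
import random

def build_dpo_pairs(examples, n_swaps=20, seed=0):
    """Build (prompt, chosen, rejected) DPO records, then randomly swap
    chosen/rejected on up to n_swaps pairs to reduce annotator bias.

    Each example is assumed to hold a 'prompt', the annotators' 'best'
    response, and the remaining 'others' (illustrative field names).
    """
    rng = random.Random(seed)
    pairs = []
    for ex in examples:
        for other in ex["others"]:
            pairs.append({"prompt": ex["prompt"],
                          "chosen": ex["best"],
                          "rejected": other})
    # Swap labels on a small random subset of pairs.
    k = min(n_swaps, len(pairs))
    for i in rng.sample(range(len(pairs)), k):
        p = pairs[i]
        p["chosen"], p["rejected"] = p["rejected"], p["chosen"]
    return pairs
```

Deliberately injecting a few flipped preferences is a simple regularizer against annotators rubber-stamping their own first picks.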

As we continue to refine OpenBioLLM-70B, we are actively exploring additional techniques to further align the model with human preferences and improve its performance. Some of the ongoing experiments include multi-turn dialogue DPO settings.

Fine-tuning Llama-3: The Making of OpenBioLLM-70B

Before the release of Llama-3, I had already started working on fine-tuning other models, such as Mistral-7B and a few others. Surprisingly, the fine-tuned Starling model showed the best accuracy compared to the other models, even outperforming GPT-3.5. We were thrilled with the results and planned to release the models to the public.

However, just as we were about to release the Starling model, we learned that Llama-3 was scheduled to be released on the same day. Given the potential impact of Llama-3, we decided to postpone our release and wait for the Llama-3 model to become available. As soon as Llama-3 was released, I wasted no time in evaluating its performance in the medical domain. Within just 15 minutes of its release, I had already begun testing the model. Drawing on our previous experience and the datasets we had prepared, I quickly moved on to fine-tuning Llama-3, using the same data and hyperparameters we had used for the Starling model.
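For readers curious what such a fine-tuning setup looks like in practice, here is a minimal, illustrative QLoRA-style adapter configuration using Hugging Face `transformers` and `peft`. The hyperparameter values are assumptions for the sketch, not the authors' published recipe.

```python
# Illustrative QLoRA-style configuration (assumed values, not the
# actual OpenBioLLM training recipe).
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# Quantize the frozen base model to 4-bit (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Train only low-rank adapter matrices on the attention projections.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```

These two configs are then passed to `AutoModelForCausalLM.from_pretrained` and `get_peft_model` respectively before training; keeping the base weights frozen and quantized is what makes fine-tuning at the 70B scale tractable on a single multi-GPU node.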

OpenBioLLM-70B: India's biggest advancement in biomedical language models

Surpassing Industry Giants: OpenBioLLM-70B's Groundbreaking Performance

The results were astounding. The fine-tuned Llama-3 8B model delivered remarkable performance, surpassing our expectations. The combination of the powerful Llama-3 architecture and our carefully curated medical datasets proved to be a winning formula, setting the stage for the development of OpenBioLLM-70B.

Excited by the impressive performance of the 8B model, I convinced my supervisor to push the boundaries and work on the 70B model. Although it was not originally part of our planned experiments, the exceptional accuracy we observed motivated us to explore the potential of a larger model. We quickly prepared the environment to fine-tune the 70B model, which required 8 x H100 80GB GPUs. The fine-tuning process was computationally intensive, but once it was completed, we eagerly evaluated the model's performance. To our astonishment, the results were beyond our wildest expectations. At first, we couldn't believe what we were seeing! Our fine-tuned Llama-3 70B model was outperforming GPT-4 on various biomedical benchmarks.

This groundbreaking achievement marked a significant milestone in our journey to develop OpenBioLLM-70B.

Comparison of Performance Scores of Large Language Models on Diverse Medical Benchmarks.

Reaffirming Our Belief

I remember the excitement of sharing updates with my supervisor as our models continued to surpass the performance of industry giants. First, we had the Starling model beating GPT-3.5; then we outperformed Med-PaLM; and finally, we surpassed Gemini. The moment of truth arrived when I sent a message to my supervisor announcing that our model had beaten GPT-4. It was a claim so bold that none of us could believe it at first.

We quickly arranged a meeting in the middle of the night, as I often worked late hours. My supervisor congratulated me and urged me to verify the results multiple times to ensure their accuracy. Despite the audacity of the claim, we rigorously evaluated the model's performance several times. The results confirmed that we had indeed surpassed GPT-4, Gemini, Med-PaLM-1, Med-PaLM-2, Meditron, and every other model available worldwide at the time.

OpenBioLLM-70B had established itself as the best-performing biomedical language model in existence.

Subject-wise Accuracy of OpenBioLLM-70B

We shared the news on Twitter, and the post went viral. It was a series of firsts: OpenBioLLM-70B was the first model to outperform GPT-4 on these biomedical benchmarks and the first healthcare model to achieve such widespread recognition. Most importantly, it was the first Indian model to trend among the top 10 of the world's best models on Hugging Face, a list that included industry giants like Apple, Microsoft, and Meta.

A Serendipitous Encounter: Validating OpenBioLLM with Neurologists

On the same day that we achieved this milestone, I had an interesting encounter while traveling from Chennai to Dehradun. During the flight, I met two women who asked for help with their iPhone camera, a topic I wasn't particularly familiar with. However, seeing their need for assistance, I decided to try something different. Since we were on the plane with no internet access, I took out my MacBook, loaded the OpenBioLLM model locally, and handed it over to them. These women were unfamiliar with chatbots like ChatGPT, so the experience was completely new for them. They started by asking questions related to the iPhone, and to their surprise, the model provided quite satisfactory answers. Curious about the technology, they inquired about what it was. I explained that it was a chatbot specifically designed for healthcare.

Intrigued, they expressed their desire to test the model further and began asking in-depth questions, such as medication alternatives and symptom-related scenarios, all within a proper medical context. Surprised by the complexity of their questions, I politely asked about their background. They revealed that they were both experienced neurologists and doctors. I was stunned and realized that they were the perfect people to evaluate the model's performance.

They proceeded to test the model more thoroughly, and I could see the astonishment on their faces as they interacted with OpenBioLLM. When I asked them to rate the model on a scale of 0-5, they responded that it was a very good model and gave it a rating of 4. Additionally, they expressed their willingness to help with data collection and other aspects of the model's development. I learned that they were from a well-known hospital in Nellore called Narayana Medical College.

OpenBioLLM Medical AI

The Viral Success of OpenBioLLM and Its Impact on the Research Community

The news of OpenBioLLM's success spread like wildfire, with numerous blogs, videos, and articles covering the breakthrough. The viral attention was overwhelming at times, but it also opened up incredible opportunities for collaboration and knowledge sharing. I was honored to receive an invitation from Harvard University to present my work at their prestigious lab. Additionally, I had the privilege of giving a talk at the Edinburgh Core NLP Group on the same topic. Throughout this journey, I formed friendships with many talented researchers working on exciting projects, such as genomics LLMs and multimodal LLMs.

Working on the OpenBioLLM project was a true honor, but it's important to note that this is just the beginning. We have ignited a spark that is now growing into a blazing fire, inspiring researchers worldwide to believe in the possibility of achieving meaningful results through techniques like QLoRA and LoRA for fine-tuning large language models. I have been deeply moved by the countless messages of thanks and appreciation I have received from researchers and enthusiasts around the globe. It fills me with immense happiness to know that our work has made a significant contribution to the research community and has the potential to drive further advancements in the field.

Future Directions and Collaboration Opportunities

Looking ahead, I am committed to continuing my research journey and working on even more robust and innovative models. Some of the projects in the pipeline include vision-based models for medical applications, genomics and multimodal models, and many more exciting developments.

I am currently exploring several research topics and would be thrilled to collaborate with anyone interested in joining forces. I firmly believe that by working together and leveraging our collective expertise, we can push the boundaries of what is possible in biomedical AI and create solutions that have a lasting impact on healthcare and research. If any of these research areas resonate with you, or if you have ideas for collaboration, please don't hesitate to reach out. I am excited about the future of biomedical AI and the role we can play in shaping it.

The Importance of Developing Foundational Models in India

It is incredibly gratifying to know that many individuals and companies are using OpenBioLLM-70B in production and finding it useful. I have received numerous queries and appreciation messages from users who have benefited from the model's capabilities. As the first Indian LLM to achieve such widespread adoption, it feels great to have contributed something of value to the AI community.

Looking to the future, I hope that our country will produce more foundational models that can be applied across various domains. I believe that Indian researchers and entrepreneurs should focus on building robust and innovative models from the ground up, rather than relying solely on APIs. While using APIs is not inherently bad, it is important to push our limits and work on creating better and more advanced models.

Artificial Intelligence (AI) in India

A Call to Action: Leveraging India's Potential in AI Innovation

There have been instances where people claimed to release impressive models from India, but under the hood, they were merely using existing APIs. Instead, we should strive to develop our own state-of-the-art models that can compete on a global stage. In recent times, we have seen the emergence of remarkable language models for Indian languages, such as Tamil-Llama and Odia-Llama. These initiatives showcase the potential and talent within our country. Now, it is time for us to take the next step and work on models that can make a significant impact on a global scale. India has a wealth of diverse and unique datasets that can be leveraged to train powerful AI models.

By collecting and utilizing these datasets effectively, we can contribute something truly meaningful to the research community. Our country has the potential to become a hub for AI innovation, and it is up to us to seize this opportunity and drive progress in the field. I strongly encourage my fellow researchers and entrepreneurs to collaborate, share knowledge, and work toward building foundational models that can revolutionize various industries. By pooling our expertise and resources, we can create AI solutions that not only benefit our country but also have a lasting impact on the global stage.


The story of MedMCQA and OpenBioLLM-70B is a testament to the power of perseverance, innovation, and collaboration in the field of medical AI. From the initial challenges faced during the publication of MedMCQA to the groundbreaking success of OpenBioLLM-70B, this journey highlights the immense potential of Indian researchers and the importance of building foundational models within our country.

As we look to the future, it is crucial for Indian researchers and entrepreneurs to leverage our country's diverse datasets and expertise to create AI solutions that can make a global impact. By collaborating, sharing knowledge, and pushing the boundaries of what is possible, we can establish India as a hub for AI innovation and contribute meaningfully to the advancement of various industries, including healthcare.

The success of OpenBioLLM-70B is just the beginning. We are very excited about the future possibilities and collaborations that lie ahead. Together, let us embrace the challenge of building robust and innovative models that can revolutionize the field of AI and make a lasting difference in the world.
