
Marktechpost AI Newsletter: Predibase Researchers Present a Technical Report of 310 Fine-tuned LLMs that Rival GPT-4 + Google DeepMind Introduces Med-Gemini + many more...

Featured Research…

Predibase Researchers Present a Technical Report of 310 Fine-tuned LLMs that Rival GPT-4

Researchers from Predibase introduced LoRA Land, a comprehensive project that evaluates fine-tuned LLMs across various tasks. The research team used 10 base models and 31 tasks to fine-tune 310 models. The tasks included classic NLP, coding, knowledge-based reasoning, and math-based problems. This effort was supported by LoRAX, the open-source inference server designed specifically for serving multiple LoRA fine-tuned LLMs. The server enables the simultaneous use of multiple models by leveraging shared base weights and dynamic adapter loading, thus allowing numerous models to be deployed on a single GPU.

To validate the methodology, the research team fine-tuned each base model with LoRA and 4-bit quantization. The LoRA fine-tuned models outperformed their base models by an average of more than 34 points, and some surpassed GPT-4 by an average of 10 points across tasks. The researchers standardized their testing framework, keeping fine-tuning parameters and evaluation queries consistent to ensure a fair comparison across models. LoRAX’s deployment capabilities were also evaluated: with features like dynamic adapter loading and tiered weight caching, it sustained high concurrency levels while maintaining minimal latency.
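The low-rank adaptation idea behind these fine-tunes can be sketched in a few lines. This is a toy NumPy illustration of the LoRA math, not Predibase’s code; the layer sizes, rank, and scaling factor are illustrative assumptions.

```python
import numpy as np

# Toy LoRA sketch: instead of updating a full weight matrix W, train a
# low-rank pair (A, B) and add their scaled product as a delta.
d_out, d_in, r = 512, 512, 8            # hypothetical layer sizes and rank
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))      # frozen base weights
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection (zero init)
alpha = 16                                  # common LoRA scaling hyperparameter

def lora_forward(x):
    # Base path plus scaled low-rank update: (W + (alpha/r) * B @ A) @ x
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
y = lora_forward(x)

full_params = W.size
lora_params = A.size + B.size
print(f"trainable params: {lora_params} vs {full_params} "
      f"({100 * lora_params / full_params:.1f}%)")
```

The tiny trainable-parameter footprint (here about 3% of the full matrix) is what makes it practical for a server like LoRAX to share one set of base weights across many adapters on a single GPU.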

Editor’s Picks…

Google DeepMind Introduces Med-Gemini: A Groundbreaking Family of AI Models Revolutionizing Medical Diagnosis and Clinical Reasoning

The research team from Google Research, Google DeepMind, Google Cloud, and Verily introduced the Med-Gemini family of models, which extend the capabilities of the Gemini 1.0 and 1.5 architectures by integrating specialized components for medical tasks. Med-Gemini aims to address limitations in current AI models by improving clinical reasoning, multimodal understanding, and long-context processing. This new family of models surpasses previous benchmarks and sets a new standard in medical AI.

Med-Gemini builds on the Gemini architecture by introducing key innovations like uncertainty-guided web search for accurate medical question answering. This is coupled with customized encoders that can process health-related signals like electrocardiograms (ECGs). Med-Gemini also utilizes chain-of-reasoning techniques that help with processing and understanding long-context medical records. These models are fine-tuned to medical needs and can accurately answer complex medical questions by leveraging improved clinical reasoning.
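The uncertainty-guided search loop described above can be sketched generically: sample several answers, and only consult an external search tool when they disagree. The function names, threshold, and voting scheme below are illustrative assumptions, not Med-Gemini’s implementation.

```python
from collections import Counter

def answer_with_uncertainty_gate(sample_answers, web_search, threshold=0.7):
    """Sample several answers; if they disagree too much, consult search.

    sample_answers: callable returning a list of candidate answer strings.
    web_search: callable returning extra context used to re-answer.
    (Hypothetical sketch of uncertainty-guided retrieval, not Med-Gemini code.)
    """
    answers = sample_answers(context=None)
    top, count = Counter(answers).most_common(1)[0]
    confidence = count / len(answers)
    if confidence >= threshold:
        return top                      # samples agree: answer directly
    context = web_search()              # uncertain: fetch external evidence
    return Counter(sample_answers(context=context)).most_common(1)[0][0]

# Toy stand-ins for a model and a search tool:
def fake_model(context=None):
    return ["B", "B", "B"] if context else ["A", "B", "C"]

print(answer_with_uncertainty_gate(fake_model, lambda: "retrieved evidence"))
```

Here the disagreeing first samples fail the confidence gate, so the search path runs and the evidence-conditioned answers converge.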

Nvidia Publishes A Competitive Llama3-70B QA / Retrieval-Augmented Generation (RAG) Fine-Tune Model

Llama3-ChatQA-1.5-8B and Llama3-ChatQA-1.5-70B are the two versions of this state-of-the-art model, with 8 billion and 70 billion parameters, respectively. These models, first trained with Megatron-LM, have been converted to the Hugging Face format for accessibility and convenience. Llama3-ChatQA-1.5 builds on the success of ChatQA, a family of conversational QA models with performance levels comparable to GPT-4. ChatQA greatly improves zero-shot conversational QA outcomes with Large Language Models (LLMs) by introducing a unique two-stage instruction tuning strategy.
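Retrieval-augmented conversational QA models like this are typically prompted with retrieved context followed by the dialogue turns. The template below is a hypothetical sketch of that assembly step, not NVIDIA’s documented prompt format for Llama3-ChatQA-1.5.

```python
# Illustrative prompt assembly for retrieval-augmented conversational QA.
# The template is an assumption for demonstration, not the model's
# documented format.
def build_qa_prompt(context_chunks, turns):
    """turns: list of (role, text) tuples, role in {'User', 'Assistant'}."""
    context = "\n\n".join(context_chunks)
    dialogue = "\n".join(f"{role}: {text}" for role, text in turns)
    return (
        "System: Answer the question using only the context provided.\n\n"
        f"{context}\n\n{dialogue}\nAssistant:"
    )

prompt = build_qa_prompt(
    ["LoRAX serves many LoRA adapters on one GPU."],
    [("User", "How many GPUs does LoRAX need per adapter?")],
)
print(prompt)
```

Keeping context assembly separate from generation like this makes it easy to swap retrievers without touching the model call.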

This AI Paper by Scale AI Introduces GSM1k for Measuring Reasoning Accuracy in Large Language Models (LLMs)

Researchers from Scale AI have introduced GSM1k, a new benchmark created to measure overfitting and reasoning capabilities in LLMs. The researchers developed this benchmark by creating 1,250 elementary math problems that mirror the complexity and content of existing benchmarks like GSM8k. The benchmark aims to identify whether models rely on memorization or possess genuine reasoning capabilities by comparing model performances across similar but distinct datasets.

The methodology behind GSM1k involves generating a new dataset of 1,250 elementary math problems, designed to match the complexity of benchmarks like GSM8k and ensure comparable difficulty. The researchers employed human annotators to create problems requiring basic arithmetic and reviewed them through multiple quality checks. They then compared model results across GSM1k and GSM8k to measure performance differences, emphasizing whether models solve problems or merely memorize answers. This setup provides a clear picture of model capabilities and identifies systematic overfitting.
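The comparison the benchmark enables boils down to an accuracy gap between a public benchmark and a held-out look-alike. The sketch below shows that measurement with made-up numbers; it is an illustration of the idea, not Scale AI’s evaluation harness.

```python
# Sketch of the overfitting measurement GSM1k enables: compare accuracy on
# the public set (GSM8k-like) against a held-out look-alike (GSM1k-like).
# All predictions and answers below are fabricated for illustration.
def accuracy(predictions, answers):
    return sum(p == a for p, a in zip(predictions, answers)) / len(answers)

def overfit_gap(public_acc, holdout_acc):
    # A large positive gap suggests memorization of the public set;
    # a near-zero gap suggests the skill generalizes.
    return public_acc - holdout_acc

public_preds, public_gold = [3, 7, 12, 9], [3, 7, 12, 8]
holdout_preds, holdout_gold = [5, 2, 11, 6], [5, 4, 10, 6]

gap = overfit_gap(accuracy(public_preds, public_gold),
                  accuracy(holdout_preds, holdout_gold))
print(f"accuracy gap: {gap:.2f}")  # 0.75 - 0.50 = 0.25 here
```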

BiomedRAG: Elevating Biomedical Data Analysis with Retrieval-Augmented Generation in Large Language Models

Researchers from the University of Minnesota, the University of Illinois at Urbana-Champaign, and Yale University have introduced BiomedRAG, a novel retrieval-augmented generation model tailored specifically for the biomedical domain. This model adopts a simpler design than previous retrieval-augmented LLMs, directly incorporating chunks of relevant information into the model’s input. This approach simplifies retrieval and enhances accuracy by enabling the model to bypass noisy details, particularly in noise-intensive tasks like triple extraction and relation extraction.

BiomedRAG relies on a tailored chunk scorer to identify and retrieve the most pertinent information from diverse documents. This scorer is designed to align with the LLM’s internal structure, ensuring the retrieved data is highly relevant to the query. The model’s effectiveness lies in dynamically integrating the retrieved chunks, significantly improving performance across tasks such as text classification and link prediction. The research demonstrates that the model achieves superior results, with micro-F1 scores reaching 88.83 on the ChemProt corpus for triple extraction, highlighting its capability to support effective biomedical information systems.
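The high-level recipe — score candidate chunks against the query, then splice the top chunks directly into the model input — can be sketched as below. Word-overlap scoring here is a crude stand-in for the paper’s learned, LLM-aligned chunk scorer, and all names and texts are illustrative.

```python
# Toy sketch of the retrieve-then-splice recipe described above. The
# word-overlap scorer is a stand-in for BiomedRAG's learned chunk scorer.
def score_chunk(query, chunk):
    q, c = set(query.lower().split()), set(chunk.lower().split())
    return len(q & c) / len(q)          # fraction of query words covered

def retrieve(query, chunks, k=1):
    return sorted(chunks, key=lambda ch: score_chunk(query, ch),
                  reverse=True)[:k]

def build_input(query, chunks, k=1):
    # Incorporate retrieved chunks directly into the model's input.
    context = "\n".join(retrieve(query, chunks, k))
    return f"Context:\n{context}\n\nTask: {query}"

chunks = [
    "Aspirin inhibits COX-1 and COX-2 enzymes.",
    "The hospital cafeteria opens at 7 am.",
]
print(build_input("Which enzymes does aspirin inhibit?", chunks))
```

Because only the highest-scoring chunk reaches the input, noisy, irrelevant passages are filtered out before generation — the property the paper credits for its gains on noise-intensive extraction tasks.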