Marktechpost AI Newsletter: 01.AI Introduces Yi-1.5-34B Model + Meta AI Introduces Chameleon + TIGER-Lab Introduces MMLU-Pro Dataset and many more…


Featured Research…

01.AI Introduces Yi-1.5-34B Model: An Upgraded Version of Yi with a High-Quality Corpus of 500B Tokens and Fine-Tuned on 3M Diverse Fine-Tuning Samples

The recent Yi-1.5-34B model introduced by 01.AI marks another advance in the field of Artificial Intelligence. Positioned as a major improvement over its predecessors, the model sits between Llama 3 8B and 70B in scale and promises better performance in several areas, such as multimodal capability, code production, and logical reasoning. The team of researchers explores the details of the Yi-1.5-34B model, how it was built, and its potential impact on the AI community.

The Yi-34B model served as the basis for the development of Yi-1.5-34B. Yi-34B was recognized for its strong performance and functioned as an unofficial benchmark in the AI community, and thanks to improved training and optimization, Yi-1.5-34B carries on that tradition. The model's intensive training regimen is reflected in its continued pre-training on 500 billion additional tokens, for 4.1 trillion tokens in total.
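For readers who want to try the model, a minimal sketch of loading it with Hugging Face transformers is below; the repo id "01-ai/Yi-1.5-34B-Chat" is assumed here, so check 01.AI's hub page for the exact name and hardware requirements.

```python
# Minimal sketch: running the Yi-1.5-34B chat variant with Hugging Face transformers.
# The repo id "01-ai/Yi-1.5-34B-Chat" is an assumption; verify it on the 01.AI hub page.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "01-ai/Yi-1.5-34B-Chat"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

messages = [{"role": "user", "content": "Explain why the sum of two even numbers is even."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```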

Editor's Picks…

Researchers from Cerebras & Neural Magic Introduce Sparse Llama: The First Production LLM based on Llama at 70% Sparsity

Researchers from Neural Magic, Cerebras Systems, and IST Austria have introduced a novel approach to create sparse foundational versions of large language models. They specifically targeted the LLaMA-2 7B model, aiming to combine the SparseGPT pruning method with sparse pretraining techniques. This innovative method seeks to achieve high sparsity levels while preserving or enhancing the model's accuracy. The researchers' approach involves initially pruning the model to 50% sparsity, followed by further iterative training and pruning steps to reach 70% sparsity.

The method begins with sparse pretraining on subsets of high-quality datasets such as SlimPajama and The Stack. The sparse pretraining process includes fine-tuning with per-layer distillation, ensuring the model retains high accuracy across various complex tasks, including chat, code generation, and instruction following. This detailed process involves training the 50% sparse model until convergence and then pruning it further to achieve the 70% target. The weights are pruned and frozen, and sparsity masks are enforced during training to maintain the desired sparsity levels. This iterative process is crucial for maintaining high recovery levels after fine-tuning.
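The core mechanics of "prune, freeze the mask, keep training" can be illustrated with a short PyTorch sketch. This is a simplified stand-in, not the paper's pipeline: it uses plain magnitude pruning where the authors use SparseGPT, and the layer names and sparsity target are illustrative.

```python
# Hedged sketch of sparsity-mask enforcement during training (magnitude pruning stand-in
# for SparseGPT; not the authors' exact method).
import torch

def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Boolean mask that keeps the largest-magnitude weights at the given sparsity level."""
    k = int(weight.numel() * sparsity)                       # number of weights to zero out
    threshold = weight.abs().flatten().kthvalue(k).values    # k-th smallest magnitude
    return weight.abs() > threshold

def prune_model(model: torch.nn.Module, sparsity: float = 0.7) -> dict:
    """One-off pruning pass: compute and apply a fixed mask per linear layer."""
    masks = {}
    for name, module in model.named_modules():
        if isinstance(module, torch.nn.Linear):
            masks[name] = magnitude_mask(module.weight.data, sparsity)
            module.weight.data *= masks[name]
    return masks

def enforce_sparsity(model: torch.nn.Module, masks: dict) -> None:
    """Re-apply the frozen masks after every optimizer step so pruned weights stay zero."""
    with torch.no_grad():
        for name, module in model.named_modules():
            if name in masks:
                module.weight.data *= masks[name]
```

In a training loop, `enforce_sparsity` would be called right after `optimizer.step()`, which mirrors the idea of frozen sparsity masks being enforced throughout sparse pretraining and fine-tuning.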

Researchers from Columbia University and Databricks Conducted a Comparative Study of LoRA and Full Finetuning in Large Language Models

Researchers from Columbia University and Databricks Mosaic AI have explored methods for adapting large language models to new domains, comparing full finetuning with parameter-efficient techniques like Low-Rank Adaptation (LoRA). Full finetuning adjusts all model parameters, which is computationally expensive. In contrast, LoRA aims to save memory by only modifying a small subset of parameters, thereby reducing the computational load. Despite its popularity, the effectiveness of LoRA compared to full finetuning has been a topic of debate, especially in challenging domains such as programming and mathematics, where precise performance improvements are critical.
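To make the "small subset of parameters" point concrete, here is a minimal sketch of attaching LoRA adapters with the peft library. The base model id, rank, and target modules are illustrative choices, not the study's exact configuration.

```python
# Minimal LoRA sketch with peft; hyperparameters are illustrative, not the paper's setup.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", torch_dtype="auto")

lora_config = LoraConfig(
    r=16,                                 # low-rank dimension of the adapter matrices
    lora_alpha=32,                        # scaling factor applied to the adapter output
    target_modules=["q_proj", "v_proj"],  # which attention projections receive adapters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()        # typically well under 1% of all parameters
```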

The researchers discovered that LoRA generally underperformed compared to full finetuning in programming and mathematics tasks. For example, in the programming domain, full finetuning achieved a peak HumanEval score of 0.263 at 20 billion tokens, while the best LoRA configuration reached only 0.175 at 16 billion tokens. Similarly, in the mathematics domain, full finetuning achieved a peak GSM8K score of 0.642 at 4 epochs, whereas the best LoRA configuration achieved 0.622 at the same point. Despite this underperformance, LoRA provided a beneficial form of regularization, which helped maintain the base model's performance on tasks outside the target domain. This regularization effect was stronger than common techniques like weight decay and dropout, making LoRA advantageous when retaining base model performance is crucial.


Meta AI Introduces Chameleon: A New Family of Early-Fusion Token-based Foundation Models that Set a New Bar for Multimodal Machine Learning

Meta researchers present Chameleon, a mixed-modal foundation model that facilitates generating and reasoning with interleaved textual and image sequences, enabling comprehensive multimodal document modeling. Unlike traditional models, Chameleon employs a unified architecture, treating both modalities equally by tokenizing images akin to text. This approach, termed early fusion, allows seamless reasoning across modalities but poses optimization challenges. To address these, the researchers propose architectural enhancements and training techniques, adapting the transformer architecture and finetuning strategies.
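The early-fusion idea can be sketched as follows: images are quantized into discrete codebook ids and spliced directly into the text token stream, so a single autoregressive transformer sees one mixed sequence. The tokenizer objects and special ids below are hypothetical stand-ins, not Meta's released code.

```python
# Illustrative early-fusion sketch: text and image tokens share one sequence.
# `text_tokenizer` and `image_tokenizer` are hypothetical stand-ins for real tokenizers.
BOI, EOI = 65534, 65535   # assumed special ids marking the start/end of an image span

def build_mixed_sequence(segments, text_tokenizer, image_tokenizer):
    """segments: list of ("text", str) or ("image", image_tensor) pairs, in document order."""
    ids = []
    for kind, payload in segments:
        if kind == "text":
            ids.extend(text_tokenizer.encode(payload))
        else:
            # An image (e.g., 512x512 pixels) becomes 1024 discrete codes
            # drawn from an 8192-entry codebook, per the paper's setup.
            ids.append(BOI)
            ids.extend(image_tokenizer.encode(payload))
            ids.append(EOI)
    return ids
```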

Researchers developed a novel image tokenizer that encodes 512 × 512 images into 1024 tokens drawn from an 8192-entry codebook, focusing on licensed images and doubling the number of face-containing images during pre-training; the tokenizer still struggles with text-heavy image reconstruction. They also trained a BPE tokenizer with a 65,536-token vocabulary, including the image tokens, using the sentencepiece library over a subset of the training data. Chameleon addressed stability issues with QK-Norm, dropout, and z-loss regularization during training, enabling successful training on Meta's RSC. The inference pipeline streamlines mixed-modal generation using PyTorch and xformers, supporting both streaming and non-streaming modes with token masking for conditional logic.
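Of the stability tricks mentioned, QK-Norm is easy to illustrate: queries and keys are normalized before the dot product so attention logits cannot grow unchecked. The sketch below is a simplified single-head version under assumed shapes, not Chameleon's actual attention implementation.

```python
# Hedged sketch of QK-Norm: LayerNorm on queries and keys before the attention dot product.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKNormAttention(nn.Module):
    """Single-head self-attention with QK-Norm for training stability."""
    def __init__(self, dim: int):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.q_norm = nn.LayerNorm(dim)   # normalizing q and k bounds the attention logits,
        self.k_norm = nn.LayerNorm(dim)   # which helps avoid divergence at scale
        self.out = nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k = self.q_norm(q), self.k_norm(k)
        attn = F.softmax(q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5), dim=-1)
        return self.out(attn @ v)
```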

Tired of MMLU? Have current models already hit the ceiling? It's time to upgrade MMLU! TIGER-Lab Introduces MMLU-Pro Dataset for Comprehensive Benchmarking of Large Language Models' Capabilities and Performance

Researchers from TIGER-Lab have introduced the MMLU-Pro dataset to address the limitations of the original MMLU benchmark. The new dataset is designed to provide a more rigorous and comprehensive benchmark for evaluating LLMs. MMLU-Pro significantly increases the number of answer options from four to ten per question, raising the complexity and realism of the evaluation, and includes more reasoning-focused questions to address the shortcomings of the original MMLU dataset. The effort involves leading AI research labs and academic institutions and aims to set a new standard in AI evaluation.

The construction of the MMLU-Pro dataset involved a meticulous process to ensure its robustness and effectiveness. Researchers began by filtering the original MMLU dataset to retain only the most challenging and relevant questions. They then augmented the number of answer options per question from four to ten using GPT-4, a state-of-the-art AI model. This augmentation process was not merely about adding more options; it involved generating plausible distractors that require discriminative reasoning to navigate. The dataset sources questions from high-quality STEM websites, theorem-based QA datasets, and college-level science exams. Each question underwent rigorous review by a panel of over ten experts to ensure accuracy, fairness, and complexity, making the MMLU-Pro a robust tool for benchmarking.
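A quick way to inspect the benchmark is to load it with the datasets library and format a ten-option question as a prompt. The repo id "TIGER-Lab/MMLU-Pro" and the field names used below ("question", "options", "answer") are assumed from the public release, so verify them against the dataset card.

```python
# Hedged sketch: loading MMLU-Pro and formatting one 10-option question.
# Repo id and field names are assumptions based on the public release.
from datasets import load_dataset

mmlu_pro = load_dataset("TIGER-Lab/MMLU-Pro", split="test")

example = mmlu_pro[0]
letters = "ABCDEFGHIJ"                     # up to ten answer options per question
prompt = example["question"] + "\n" + "\n".join(
    f"{letters[i]}. {opt}" for i, opt in enumerate(example["options"])
)
print(prompt)
print("Gold answer:", example["answer"])
```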