
Marktechpost AI Newsletter: Cohere AI Releases Aya23 Models + Microsoft Introduces Phi Silica + LLMWare.ai Selected for 2024 GitHub Accelerator + OpenRLHF and many more...


Foundation Model Transparency Index from @StanfordCRFM

Featured Research…

Cohere AI Releases Aya23 Models: Transformative Multilingual NLP with 8B and 35B Parameter Models

Researchers from Cohere For AI have introduced the Aya-23 models, designed to significantly enhance multilingual capabilities in NLP. The Aya-23 family includes models with 8 billion and 35 billion parameters, making them some of the largest and most powerful multilingual models available. The two models are as follows (a minimal generation sketch follows the list):

  • Aya-23-8B: With 8 billion parameters, it is a highly capable model for multilingual text generation. It supports 23 languages, including Arabic, Chinese, English, French, German, and Spanish, and is optimized for generating accurate and contextually relevant text in these languages.

  • Aya-23-35B: With 35 billion parameters, it provides even greater capacity for handling complex multilingual tasks. It supports the same 23 languages and offers enhanced performance in maintaining consistency and coherence in generated text, making it suitable for applications requiring high precision and extensive linguistic coverage.
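
As a concrete illustration of how an instruction-tuned checkpoint like this is typically used, here is a minimal multilingual generation sketch with Hugging Face transformers. The model ID and the availability of a chat template are assumptions about how the 8B release is packaged, not confirmed details from Cohere.

```python
# Minimal multilingual generation sketch. The checkpoint id below and the presence of a
# chat template are assumptions; adjust to the official release.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CohereForAI/aya-23-8B"  # assumed Hugging Face model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# A prompt in one of the 23 supported languages (German here).
messages = [{"role": "user", "content": "Erkläre in zwei Sätzen, was ein Sprachmodell ist."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.3)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```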

Editor’s Picks…

Microsoft Introduces Phi Silica: A 3.3 Billion Parameter AI Model Transforming Efficiency and Performance in Personal Computing

Microsoft researchers introduced Phi Silica, a small language model specifically designed for the Neural Processing Units (NPUs) in their new Copilot+ PCs. Phi Silica is part of the Phi family of models and is intended to deliver high-performance AI capabilities while consuming minimal power. This design allows the CPU and GPU to remain available for other tasks, enhancing the overall computing experience.

Phi Silica stands out with its 3.3 billion parameters, making it the smallest model in the Phi family. Despite its compact size, Phi Silica achieves impressive performance metrics. It boasts a first-token latency of 650 tokens per second and consumes only 1.5 Watts of power. This efficiency ensures that the PC’s CPU and GPU are not burdened, allowing for smoother operation of other applications. Phi Silica’s token generation also reuses the NPU’s KV cache and runs on the CPU, producing approximately 27 tokens per second.
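
Phi Silica itself runs on the Copilot+ NPU and is not something you can script directly, but the two figures quoted above correspond to standard metrics: time-to-first-token and decode throughput. The sketch below shows a hypothetical way to measure both for any local causal language model via transformers; the model ID is a placeholder, and this is not the Phi Silica runtime.

```python
# Hypothetical harness for the metrics quoted above: time-to-first-token and decode
# throughput (tokens/second). This benchmarks a generic local model via transformers,
# NOT the Phi Silica NPU runtime, which is not publicly scriptable.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-2"  # placeholder small model for illustration
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)

inputs = tok("Summarize why small on-device models are useful.", return_tensors="pt")

# Time-to-first-token: prompt processing plus generation of a single token.
start = time.perf_counter()
model.generate(**inputs, max_new_tokens=1, do_sample=False)
ttft = time.perf_counter() - start

# Decode throughput: generate a longer continuation and divide new tokens by wall time.
start = time.perf_counter()
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
elapsed = time.perf_counter() - start
new_tokens = out.shape[-1] - inputs["input_ids"].shape[-1]

print(f"time to first token: {ttft * 1000:.0f} ms")
print(f"decode throughput:   {new_tokens / elapsed:.1f} tokens/s")
```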

This AI Paper Introduces KernelSHAP-IQ: Weighted Least Square Optimization for Shapley Interactions

Researchers from Bielefeld University, LMU Munich, and Paderborn University have introduced a novel method called KernelSHAP-IQ to address these challenges. The method extends KernelSHAP to higher-order Shapley Interaction Indices (SII), using a weighted least squares (WLS) optimization approach to accurately capture and quantify interactions beyond the first order. In doing so, it provides a more detailed and precise framework for model interpretability. This advancement is significant because it accounts for complex feature interactions that are often present in sophisticated models but missed by traditional first-order methods.

KernelSHAP-IQ constructs an optimal approximation of the Shapley Interaction Index using iterative k-additive approximations: it starts with first-order interactions and incrementally includes higher-order ones. The method leverages weighted least squares (WLS) optimization to capture feature interactions accurately. The approach was tested on various datasets, including the California Housing regression dataset, a sentiment analysis model using IMDB reviews, and image classifiers such as ResNet18 and a Vision Transformer. By sampling subsets and solving WLS problems, KernelSHAP-IQ provides a detailed representation of feature interactions, ensuring computational efficiency and precise interpretability.
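
To make the weighted-least-squares idea concrete, the sketch below implements plain first-order KernelSHAP exactly for a small number of features: enumerate coalitions, weight them with the Shapley kernel, and solve a WLS problem for the attributions. It is a simplified illustration of the optimization that KernelSHAP-IQ builds on, not the paper's higher-order SII estimator; `value_fn` is a stand-in for evaluating the model on a feature subset.

```python
# First-order KernelSHAP via weighted least squares (exact, small feature counts only).
# Illustrates the WLS formulation that KernelSHAP-IQ extends to higher-order interactions.
import numpy as np
from itertools import combinations
from math import comb

def shapley_kernel_weight(M, s):
    # Shapley kernel weight for a coalition of size s out of M features
    return (M - 1) / (comb(M, s) * s * (M - s))

def kernel_shap(value_fn, M):
    """Exact first-order KernelSHAP. value_fn(S) returns the model's value
    for the feature subset S, given as a tuple of feature indices."""
    rows, targets, weights = [], [], []
    for s in range(1, M):                        # all proper, non-empty coalitions
        for S in combinations(range(M), s):
            z = np.zeros(M)
            z[list(S)] = 1.0
            rows.append(z)
            targets.append(value_fn(S) - value_fn(()))
            weights.append(shapley_kernel_weight(M, s))
    # Enforce the efficiency constraint (grand coalition) with a very large weight.
    rows.append(np.ones(M))
    targets.append(value_fn(tuple(range(M))) - value_fn(()))
    weights.append(1e7)
    Z, y, w = np.array(rows), np.array(targets), np.array(weights)
    # Weighted least squares: phi = (Z^T W Z)^{-1} Z^T W y
    ZtW = Z.T * w
    return np.linalg.solve(ZtW @ Z, ZtW @ y)

# Toy additive game: feature i contributes i + 1, so Shapley values are [1, 2, 3, 4].
phi = kernel_shap(lambda S: float(sum(i + 1 for i in S)), M=4)
print(np.round(phi, 3))  # -> [1. 2. 3. 4.]
```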

LLMWare.ai Selected for 2024 GitHub Accelerator: Enabling the Next Wave of Innovation in Enterprise RAG with Small Specialized Language Models

It’s exciting to note that LLMWare.ai has been selected as one of 11 outstanding projects shaping the future of open-source AI and invited to join the 2024 GitHub Accelerator.

LLMWare has been unique in its focus on small, specialized language models, recognizing early that as model technology improved, small models would offer many advantages: ease of integration into enterprise processes, enormous benefits in privacy and security, and tremendous cost and speed gains when adapted to almost any enterprise back-end process. Using smaller models well, however, requires considerable expertise and a different set of underlying technologies and capabilities. To support and enable this vision of privately deployed, decentralized AI, LLMWare has launched, at a breakneck pace over the last eight months, both a comprehensive enterprise-grade RAG platform (llmware) and a growing collection of its own specialized models fine-tuned for key enterprise automation tasks under the BLING, DRAGON, SLIM, and Industry-Bert brands.
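
For a flavor of what "small, specialized models for enterprise back-end processes" looks like in practice, here is a hedged sketch that runs a RAG-style, context-grounded question against a BLING checkpoint through Hugging Face transformers. The model ID and the prompt format are assumptions about how these small instruction-tuned models are packaged, not a walkthrough of the llmware platform's own API.

```python
# Hedged sketch: context-grounded Q&A with a small specialized model via transformers.
# The checkpoint id and the "<human>/<bot>" prompt format are assumptions, not
# documentation of the llmware platform's API.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "llmware/bling-1b-0.1"  # assumed BLING checkpoint id on Hugging Face
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

context = (
    "Services Agreement, Section 4.2: Either party may terminate this agreement "
    "with ninety (90) days written notice."
)
question = "How many days of notice are required to terminate the agreement?"

# RAG-style prompt: the retrieved passage is passed in as grounding context.
prompt = f"<human>: {context}\n{question}\n<bot>:"
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```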

OpenRLHF: An Open-Source AI Framework Enabling Efficient Reinforcement Learning from Human Feedback RLHF Scaling

Current RLHF approaches often involve dividing the LLM across multiple GPUs for training, but this strategy is not without its drawbacks. Firstly, excessive partitioning can lead to memory fragmentation on individual GPUs, resulting in a reduced effective batch size for training and thus slowing down the overall process. Secondly, the communication overhead between the fragmented parts creates bottlenecks, analogous to a team constantly exchanging messages, which ultimately hinders efficiency.

In response to these challenges, researchers propose a groundbreaking RLHF framework named OpenRLHF. OpenRLHF leverages two key technologies: Ray, a distributed task scheduler, and vLLM, a distributed inference engine. Ray functions as a sophisticated project manager, intelligently allocating the LLM across GPUs without excessive partitioning, thereby optimizing memory utilization and accelerating training by enabling larger batch sizes per GPU. Meanwhile, vLLM enhances computation speed by leveraging the parallel processing capabilities of multiple GPUs, akin to a network of high-performance computers collaborating on a complex problem.
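
The division of labor described above, with Ray placing workers on their own GPUs and vLLM handling fast rollout generation, can be sketched conceptually as follows. This is a hypothetical, simplified illustration built on the public Ray and vLLM APIs, not OpenRLHF's actual implementation; the model ID is a placeholder.

```python
# Conceptual sketch of the Ray + vLLM split: Ray schedules a generation actor onto its
# own GPU, and vLLM serves fast batched rollouts that a separate trainer would consume.
# Not OpenRLHF's actual code.
import ray
from vllm import LLM, SamplingParams

@ray.remote(num_gpus=1)
class RolloutWorker:
    """Ray actor owning a vLLM engine for RLHF rollout generation."""

    def __init__(self, model_name: str):
        self.engine = LLM(model=model_name)
        self.params = SamplingParams(temperature=0.8, max_tokens=128)

    def generate(self, prompts):
        outputs = self.engine.generate(prompts, self.params)
        return [o.outputs[0].text for o in outputs]

ray.init()
worker = RolloutWorker.remote("facebook/opt-1.3b")  # placeholder model id
rollouts = ray.get(worker.generate.remote([
    "Explain RLHF in one paragraph.",
    "Write a polite refusal to an unsafe request.",
]))
# In a full RLHF loop, these rollouts would be scored by a reward model and passed to a
# PPO trainer, which Ray would schedule on separate GPUs without over-partitioning the LLM.
print(rollouts[0][:200])
```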

This Machine Learning Paper from Stanford and the University of Toronto Proposes Observational Scaling Laws: Highlighting the Surprising Predictability of Complex Scaling Phenomena

Researchers from Stanford University, University of Toronto, and Vector Institute introduced observational scaling laws to improve language model performance predictions. This method uses publicly available models to create scaling laws, reducing the need for extensive training. By leveraging existing data from approximately 80 models, the researchers could build a generalized scaling law that accounts for variations in training compute efficiencies. This innovative approach offers a cost-effective and efficient way to predict model performance across different scales and capabilities, setting it apart from traditional scaling methods.

The methodology analyzes performance data from about 80 publicly available language models, drawn from sources such as the Open LLM Leaderboard and evaluated on standardized benchmarks such as MMLU, ARC-C, and HellaSwag. The researchers hypothesized that model performance could be mapped to a low-dimensional capability space. They developed a generalized scaling law by examining variations in training compute efficiencies among different model families. This involved using principal component analysis (PCA) to identify key capability measures and fitting these measures into a log-linear relationship with compute resources, enabling accurate, high-resolution performance predictions.
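
The core computation, mapping many models' benchmark scores into a low-dimensional capability space with PCA and then relating the leading capability dimension log-linearly to training compute, is straightforward to sketch. The snippet below uses randomly generated placeholder scores purely to show the shape of the pipeline; it does not reproduce the paper's data or fitted scaling laws.

```python
# Sketch of the observational-scaling-law pipeline: PCA over a models-by-benchmarks score
# matrix, then a log-linear fit of the leading capability component against training
# compute. All numbers below are random placeholders, not real benchmark data.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_models, n_benchmarks = 80, 6           # ~80 public models; benchmarks like MMLU, ARC-C, HellaSwag

log_compute = rng.uniform(20, 26, n_models)                # placeholder log10(FLOPs) per model
capability = 0.9 * log_compute + rng.normal(0, 0.5, n_models)
scores = capability[:, None] * rng.uniform(0.5, 1.5, n_benchmarks) \
         + rng.normal(0, 1.0, (n_models, n_benchmarks))    # placeholder benchmark scores

# Step 1: extract low-dimensional capability measures from the score matrix.
pca = PCA(n_components=3)
pcs = pca.fit_transform(scores)
print("variance explained:", np.round(pca.explained_variance_ratio_, 2))

# Step 2: fit the leading capability measure as a log-linear function of compute.
slope, intercept = np.polyfit(log_compute, pcs[:, 0], deg=1)
print(f"PC1 ~ {slope:.2f} * log10(compute) + {intercept:.2f}")

# Predicting a (hypothetical) new model's capability measure from its training compute:
print("predicted PC1 at 10^27 FLOPs:", slope * 27 + intercept)
```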