🚀 What is Trending in AI Research?: Open Interpreter + AVIS + TinyLlama + LLaSM + Qwen-VL and Qwen-VL-Chat ... What is Trending in AI Tools?

This newsletter brings AI research news that is much more technical than most resources but still digestible and applicable.

Open Interpreter lets LLMs run code (Python, JavaScript, Shell, and more) locally. After installing it, you can chat with Open Interpreter through a ChatGPT-like interface in your terminal by running $ interpreter. It gives developers a broad range of capabilities: creating and editing content in formats such as photos, videos, and PDFs; controlling a Chrome browser for research and automation; and plotting, cleaning, and analyzing large datasets to support data-driven decisions.
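
Open Interpreter asks for user confirmation before it executes code. The sketch below shows the general idea of running model-generated code locally behind such a confirmation step and returning the output so the model can react to it. It does not use Open Interpreter's API: run_generated_code, RUNNERS, and ask_user are hypothetical names, and only Python and shell are wired up.

```python
import subprocess
import sys
from typing import Callable

# Hypothetical mapping from a language tag (as an LLM might emit it)
# to the local command that executes a code string.
RUNNERS = {
    "python": [sys.executable, "-c"],
    "shell": ["sh", "-c"],
}


def ask_user(code: str) -> bool:
    """Show the code and only run it if the user explicitly types 'y'."""
    answer = input(f"Run this code?\n{code}\n[y/N] ")
    return answer.strip().lower() == "y"


def run_generated_code(language: str, code: str,
                       approve: Callable[[str], bool] = ask_user,
                       timeout: float = 30.0) -> str:
    """Execute model-generated code locally after approval; return its output."""
    if language not in RUNNERS:
        raise ValueError(f"unsupported language: {language}")
    if not approve(code):
        return "(skipped by user)"
    result = subprocess.run(RUNNERS[language] + [code],
                            capture_output=True, text=True, timeout=timeout)
    # Return stdout and stderr together so the model also sees error messages
    # and can try to fix its own code on the next turn.
    return result.stdout + result.stderr


if __name__ == "__main__":
    print(run_generated_code("python", "print(2 + 3)"))
```

Passing approve=lambda code: True skips the prompt, which is convenient for scripted tests but removes the safety check that makes local execution reasonable in the first place.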

Researchers from UCLA and Google propose AVIS (Autonomous Visual Information Seeking), a visual question answering framework for questions that cannot be answered from the image alone and instead require external knowledge. The method uses a Large Language Model (LLM) to make decisions dynamically and to plan the use of external tools, such as APIs, to gather the necessary information. The system has three main components: a planner that decides which tool to use next, a reasoner that analyzes the information obtained, and a working memory that retains it. The authors also conducted user studies to understand how humans make decisions on similar tasks and used this data in two ways: to build a transition graph that limits the actions available at each state, and to provide in-context examples that improve the LLM's decisions. The approach achieves state-of-the-art results on benchmarks such as InfoSeek and OK-VQA.
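
To make the planner / reasoner / working-memory loop concrete, here is a toy sketch in which a transition graph restricts which tool may be called next. The tool names, the graph, and the rule-based planner and reasoner are hypothetical stand-ins for the LLM-driven components; this is not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical transition graph: which actions are allowed after the last
# action taken ("START" before any tool has been called).
TRANSITIONS: Dict[str, List[str]] = {
    "START": ["object_detection", "image_search"],
    "object_detection": ["image_search", "web_search"],
    "image_search": ["web_search", "answer"],
    "web_search": ["answer"],
}


@dataclass
class WorkingMemory:
    """Keeps (action, useful output) pairs gathered so far."""
    steps: List[Tuple[str, str]] = field(default_factory=list)

    def used(self, action: str) -> bool:
        return any(a == action for a, _ in self.steps)


def planner(state: str, memory: WorkingMemory) -> str:
    # Stand-in for the LLM planner: pick the first allowed action not yet used.
    for action in TRANSITIONS[state]:
        if not memory.used(action):
            return action
    return "answer"


def reasoner(output: str) -> bool:
    # Stand-in for the LLM reasoner: keep only non-empty tool outputs.
    return output != ""


def run_agent(question: str, tools: Dict[str, Callable[[str], str]],
              max_steps: int = 5) -> str:
    memory = WorkingMemory()
    state = "START"
    for _ in range(max_steps):
        action = planner(state, memory)
        if action == "answer":
            break
        output = tools[action](question)
        if reasoner(output):
            memory.steps.append((action, output))
        state = action  # the graph, not the memory, decides what comes next
    # Stand-in for answer generation: join the evidence that was kept.
    return " | ".join(out for _, out in memory.steps)
```

With tools that return "landmark" for object detection, an empty string for image search, and "built in 1889" for web search, run_agent returns "landmark | built in 1889": the empty result is discarded by the reasoner, and the graph forces the loop to finish after web search.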

This paper compares ChatGPT's performance with that of students in 32 university-level courses. Using two AI-text classifiers, the study also examines how detectable ChatGPT's text is. It additionally surveys students and educators in five countries about their views on using such tools for schoolwork. The results show that ChatGPT performs comparably to, or better than, students in many courses. Importantly, current AI-text classifiers cannot reliably flag AI-generated text, mainly because they produce false positives (labeling human-written answers as AI-generated) and because AI-generated text is easy to edit. The survey points to an emerging consensus among students to use the tool, and among educators to treat its use as plagiarism. These findings could inform policies on the role of AI in education.

In the push for more efficient and scalable language models, one notable project is TinyLlama. Led by a research assistant at the Singapore University of Technology and Design (SUTD), it aims to pre-train a 1.1-billion-parameter model on 3 trillion tokens within 90 days, using a modest setup of 16 A100-40G GPUs. If it succeeds, it would push the limits of what compact language models can do. Meta's LLaMA and Llama 2 have already shown strong capabilities at relatively small sizes; TinyLlama takes the idea further. With 4-bit quantization, the 1.1-billion-parameter model occupies only about 550MB of RAM, which makes it attractive for applications with limited computational resources.
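
The headline numbers can be checked with simple arithmetic. The sketch below assumes the 550MB figure counts only the weights stored at 4 bits each (no activations, KV cache, or runtime overhead) and that the 90-day budget is spent evenly across all 16 GPUs; the helper names are illustrative.

```python
# Back-of-the-envelope checks for the TinyLlama figures.
PARAMS = 1_100_000_000       # 1.1 billion parameters
TOKENS = 3_000_000_000_000   # 3 trillion training tokens
DAYS = 90
GPUS = 16


def weight_memory_mb(params: int, bits_per_param: int) -> float:
    """Memory for the weights alone, in MB (10**6 bytes)."""
    return params * bits_per_param / 8 / 1_000_000


def tokens_per_second_per_gpu(tokens: int, days: int, gpus: int) -> float:
    """Average training throughput needed to finish on schedule."""
    return tokens / (days * 24 * 3600) / gpus


if __name__ == "__main__":
    print(f"16-bit weights: {weight_memory_mb(PARAMS, 16):.0f} MB")  # 2200 MB
    print(f"4-bit weights: {weight_memory_mb(PARAMS, 4):.0f} MB")    # 550 MB
    rate = tokens_per_second_per_gpu(TOKENS, DAYS, GPUS)
    print(f"required throughput: {rate:,.0f} tokens/s per GPU")     # 24,113
```

The same weights in 16-bit precision would need about 2.2GB, so the 550MB figure depends on quantization. The schedule implies each A100 must sustain roughly 24,000 training tokens per second for the full 90 days.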

Alibaba has introduced two open-source large vision-language models (LVLMs): Qwen-VL and Qwen-VL-Chat. Qwen-VL builds on Alibaba's 7-billion-parameter language model, Tongyi Qianwen (Qwen-7B), and accepts both images and text prompts as input. Qwen-VL-Chat goes further and targets more complex interactions. Trained with alignment techniques, it can compose poems and stories based on input images and solve math questions that appear within images.

Most multi-modal large language models focus on vision and language, leaving instruction following through speech comparatively unexplored. This paper introduces a framework called Large Language and Speech Model (LLaSM) to address this gap. LLaSM is an end-to-end trained multi-modal model that combines speech and language understanding, enabling cross-modal conversation. The model follows instructions given as speech or as text, offering a more natural and convenient form of human-AI interaction. The authors also release an initial dataset, LLaSM-Audio-Instructions, to support further research and evaluation of multi-modal speech-and-language instruction following.
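
One common way to feed speech into an LLM is to encode the audio into feature frames, project each frame into the LLM's embedding space with a small adaptor, and place the projected frames in the same sequence as the text token embeddings. The sketch below shows only that generic pattern in plain Python; SpeechAdaptor, build_llm_input, and the dimensions are hypothetical, and LLaSM's actual encoder, adaptor, and token layout may differ.

```python
import random
from typing import List

Vector = List[float]


def linear(x: Vector, weight: List[Vector], bias: Vector) -> Vector:
    """y = W x + b, where W has shape (out_dim, in_dim)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(weight, bias)]


class SpeechAdaptor:
    """Hypothetical linear adaptor from speech-encoder frames to LLM embeddings."""

    def __init__(self, speech_dim: int, llm_dim: int, seed: int = 0):
        rng = random.Random(seed)
        # Small random init; in a real model these weights are learned in training.
        self.weight = [[rng.gauss(0.0, 0.02) for _ in range(speech_dim)]
                       for _ in range(llm_dim)]
        self.bias = [0.0] * llm_dim

    def __call__(self, frames: List[Vector]) -> List[Vector]:
        return [linear(frame, self.weight, self.bias) for frame in frames]


def build_llm_input(speech_frames: List[Vector], text_embeddings: List[Vector],
                    adaptor: SpeechAdaptor) -> List[Vector]:
    """Place projected speech frames before the text embeddings in one sequence.

    Simplification: real systems usually mark audio positions with special tokens.
    """
    return adaptor(speech_frames) + text_embeddings
```

Because the adaptor outputs vectors of the LLM's embedding size, the language model can attend over speech and text positions in a single sequence, which is what makes cross-modal conversation possible within one model.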

What is Trending in AI Tools?

