  • AI News: 🐶 Bark - Text2Speech | Wouldn't it be great if GPTs could learn about new APIs? | Generative Disco - generation of videos for music visualization

This newsletter brings you AI research news that is more technical than most resources but still digestible and applicable.

Wouldn't it be great if GPTs could learn about new APIs? With LlamaAcademy, you can teach GPTs to call Stripe, Notion, or even your own product's API. Instead of hosting API documentation, you can host an API implementation! Just point LlamaAcademy at your API docs, run the script, and -- shazam! -- a new LLaMA model will be created for you. You can host that model on your server, and users can call your bespoke mini-GPT to write their API glue code.

Generative Disco: A new AI system developed by researchers at Columbia University and Hugging Face. It generates videos for music visualization by combining a large language model with a text-to-image model. Given a music clip in waveform format, the system helps users write prompts that connect sound, language, and images. Those prompts then parameterize the generation of video clips that match the user's desired outcome. For instance, a user exploring the system's capabilities might brainstorm prompts to create a video that shows "dancing at the disco" within a specific time frame.

Bark: A transformer-based text-to-speech model created by Suno. Bark can generate highly realistic, multilingual speech as well as other audio - including music, background noise, and simple sound effects. The model can also produce nonverbal communication, like laughing, sighing, and crying.

This AI Paper From NVIDIA Provides the Recipe to Reproduce RETRO up to 9.5B Parameters While Retrieving From a Text Corpus With 330B Tokens. Researchers at NVIDIA study RETRO in depth because, to the best of their knowledge, it is the only retrieval-augmented autoregressive LM that supports large-scale pretraining with retrieval on massive corpora containing hundreds of billions or trillions of tokens. Their investigation points to autoregressive LMs with retrieval as a promising direction for future foundation models: they outperform standard GPT models in perplexity, text generation quality, and downstream task performance, particularly on knowledge-intensive tasks such as open-domain QA.

CancerGPT: A few-shot learning approach based on LLMs to predict the synergy of drug pairs in rare tissues that lack structured data and features. CancerGPT (~124M parameters) performs comparably to a much larger fine-tuned GPT-3 model (~175B parameters) on drug pair synergy prediction. The research covered seven rare tissues from different cancer types and showed that the LLM-based prediction model achieved significant accuracy with very few or zero samples.

Researchers at Stanford Introduce Gisting: A Novel Technique for Efficient Prompt Compression in Language Models. The paper explains how gisting trains an LM to compress prompts into smaller sets of “gist” tokens. Fine-tuning or distillation can also reduce the cost of a prompt by training a model that behaves like the original one without the prompt, but the model then has to be retrained for every new prompt, which is far from ideal. Gisting instead uses a meta-learning approach to predict gist tokens from a prompt, so the model does not need retraining for each task and can generalize to unseen instructions without additional training.
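
To make the compression idea concrete, here is a minimal sketch of the kind of attention mask that can force a prompt through gist tokens. It is an illustration under stated assumptions, not the authors' code: the sequence layout (prompt tokens, then gist tokens, then the remaining input and output tokens) and the names gist_attention_mask and render are hypothetical. Tokens that come after the gist tokens are blocked from looking at the original prompt, so whatever they need from it has to be carried by the gist tokens.

```python
def gist_attention_mask(n_prompt, n_gist, n_rest):
    """Build a boolean mask; mask[i][j] is True if position i may attend to j.

    Assumed layout (hypothetical, for illustration only):
    [prompt tokens][gist tokens][remaining input/output tokens]
    """
    total = n_prompt + n_gist + n_rest
    gist_end = n_prompt + n_gist
    mask = []
    for i in range(total):
        row = []
        for j in range(total):
            causal = j <= i  # standard left-to-right attention
            # Positions after the gist tokens may not see the raw prompt.
            hidden_prompt = i >= gist_end and j < n_prompt
            row.append(causal and not hidden_prompt)
        mask.append(row)
    return mask


def render(mask):
    """Draw the mask as text: 'x' means attention is allowed, '.' means blocked."""
    return "\n".join("".join("x" if ok else "." for ok in row) for row in mask)


if __name__ == "__main__":
    # 3 prompt tokens, 1 gist token, 2 remaining tokens
    print(render(gist_attention_mask(3, 1, 2)))
```

In the printed grid, the gist row still sees the whole prompt, while the last two rows see only the gist token and each other. At inference time, that is what would let the gist tokens be cached and reused in place of the full prompt.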

MOSS: An open-source, tool-augmented conversational language model from Fudan University. MOSS can follow users' instructions to perform various natural language tasks, including question answering, text generation, summarization, and code generation. It can also challenge incorrect premises and reject inappropriate requests. During the research preview, usage of MOSS is free.

