AI News: 🚀 What does ChatGPT return about human values? | Developer Successfully Made GPT-4-Powered Voice Assistant Write Code | Do models like GPT-4 behave safely when given the ability to act? ...

This newsletter brings AI research news that is much more technical than most resources but still digestible and applicable.

SparseFormer: A neural architecture that performs visual recognition with a limited number of tokens, processed by a Transformer in the latent space. To imitate human eye behavior, researchers from Show Lab at the National University of Singapore, Tencent AI Lab, and Nanjing University designed SparseFormer to focus these sparse latent tokens on discriminative foregrounds and perform recognition sparsely. As a first step toward sparse visual architectures, SparseFormer yields consistently promising results on challenging image classification and video classification benchmarks, with a good performance-throughput tradeoff.

Meet AUDIT: An instruction-guided audio editing model based on latent diffusion models. Researchers recently introduced AUDIT, a latent diffusion model that edits audio clips according to human instructions. Audio editing means changing an input audio signal to produce an edited audio output, and covers tasks such as adding background sound effects, replacing background music, repairing incomplete audio, and enhancing low-quality audio. AUDIT takes both the input audio and the human instruction as conditions and generates the edited audio output.

ChatAvatar: Generate your own avatar without logging in. Progressive generation of animatable 3D faces under text guidance. ChatAvatar is now available on Hugging Face.

Another Large Language Model! Meet IGEL: An Instruction-Tuned German LLM Family. IGEL is the Instruction-tuned German large Language Model for Text. IGEL version 001 (Instruct-igel-001) is an early proof of concept meant to test whether it is feasible to build a German instruction-tuned model from a combination of existing open-source models and a German-translated instruction dataset. The first version of IGEL is based on BigScience BLOOM, which Malte Ostendorff adapted to German. IGEL is designed to perform a range of natural language understanding tasks, including sentiment analysis, language translation, and question answering, accurately and dependably.

What does ChatGPT return about human values? Exploring ChatGPT's Value Bias: Large Language Models (LLMs) have raised concerns about potential ideological biases and discrimination in their generated text. To investigate this, a group of researchers ran an experiment to test for value biases in ChatGPT, using the Schwartz basic value theory. They prompted ChatGPT to generate text multiple times via the OpenAI API, then analyzed the resulting corpus for value content using a bag-of-words approach and a theory-driven value dictionary. The researchers found little evidence of explicit value bias in the generated text: the values were carried through into the outputs with high fidelity, in line with the theoretical predictions of the psychological model. However, they did observe some merging of socially oriented values, which could suggest that these values are less distinct at a linguistic level, or that they reflect underlying universal human motivations.
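
For readers who want to see what a bag-of-words pass with a value dictionary looks like, here is a minimal sketch. It is not the researchers' code: the four categories are real Schwartz values, but the word lists, the function names (`tokenize`, `value_counts`, `value_shares`), and the normalization into shares are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary: the keys are four of Schwartz's basic values,
# but the word lists are invented for illustration only.
VALUE_DICTIONARY = {
    "benevolence": {"help", "care", "kind", "loyal"},
    "universalism": {"equality", "justice", "nature", "tolerance"},
    "achievement": {"success", "ambition", "capable", "win"},
    "security": {"safe", "stable", "order", "protect"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def value_counts(text, dictionary=VALUE_DICTIONARY):
    """Count how many tokens in `text` match each value's word list."""
    tokens = Counter(tokenize(text))
    return {
        value: sum(tokens[word] for word in words)
        for value, words in dictionary.items()
    }


def value_shares(texts, dictionary=VALUE_DICTIONARY):
    """Sum the counts over a corpus and normalize them to a share per value."""
    totals = Counter()
    for text in texts:
        totals.update(value_counts(text, dictionary))
    grand_total = sum(totals.values())
    if grand_total == 0:
        # No dictionary word appeared anywhere in the corpus.
        return {value: 0.0 for value in dictionary}
    return {value: totals[value] / grand_total for value in dictionary}
```

Running `value_shares` over a list of generated texts gives one share per value category. Word order is ignored entirely, which is what makes it a bag-of-words method.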

Do models like GPT-4 behave safely when given the ability to act? A research group developed the Machiavelli benchmark to measure deception, power-seeking tendencies, and other unethical behaviors in complex interactive environments that simulate the real world. The benchmark consists of 134 text-based Choose Your Own Adventure games containing over half a million scenes. The games abstract away low-level control and navigation, and instead spotlight high-level social decisions alongside real-world goals.

SegGPT: A model, developed by researchers from China, that segments objects in images and videos through in-context learning. The model unifies different segmentation tasks by transforming them into a common image format, which yields a generalist framework. Training is framed as an in-context coloring problem with random color mapping, so the model has to rely on context rather than on specific colors to accomplish diverse tasks. The model performs well on a range of tasks, including few-shot semantic segmentation, video object segmentation, semantic segmentation, and panoptic segmentation, and can segment both in-domain and out-of-domain targets.
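
To make "random color mapping" concrete, here is a small sketch of the idea, not SegGPT's actual training code. It assumes masks are nested lists of integer label ids and that a prompt mask and a target mask in the same training sample share one randomly drawn palette. The function name `color_masks_with_shared_palette` is hypothetical.

```python
import random


def color_masks_with_shared_palette(masks, rng):
    """Paint every label id in `masks` with a randomly drawn RGB color.

    All masks passed in one call share the same palette, so a label has the
    same color in the prompt and in the target. A new call draws a new
    palette, so no label is tied to a fixed color across samples.
    """
    labels = sorted({label for mask in masks for row in mask for label in row})
    palette = {}
    used = set()
    for label in labels:
        color = tuple(rng.randrange(256) for _ in range(3))
        # Redraw on the rare collision so two labels never share a color.
        while color in used:
            color = tuple(rng.randrange(256) for _ in range(3))
        used.add(color)
        palette[label] = color
    colored = [[[palette[label] for label in row] for row in mask] for mask in masks]
    return colored, palette


# Usage: a 2x2 prompt mask and a 2x2 target mask with labels 0 and 1.
prompt_mask = [[0, 1], [1, 1]]
target_mask = [[1, 0], [0, 0]]
colored, palette = color_masks_with_shared_palette(
    [prompt_mask, target_mask], random.Random(0)
)
```

Because the palette changes from sample to sample, the only way to know which color a region should get is to look at the prompt example, which is the in-context behavior the summary describes.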

Developer Successfully Made GPT-4-Powered Voice Assistant Write Code: By now, everyone knows the power of ChatGPT: it can handle many tasks for you and offer ideas on almost anything you can think of. But AI tools developer Mckay Wrigley went a step further and hooked a GPT-4 model up to an Apple Watch so that its voice assistant could write code for him. In the video posted on Twitter, Wrigley asks Siri to go to his chatbot repository and add a reset button that brings the chat back to its original state. The assistant does just that: in less than a minute, you can see a result that does exactly what was asked.

