
🚀 What is Trending in AI Research?: Text-Driven 3D Editing, Legal Reasoning Benchmarks, Trustworthiness of GPT Models & More!

This newsletter covers AI research news that is more technical than most resources, yet still digestible and applicable.

➡️ How can we edit 3D objects in a localized manner using text without distorting their form? This research paper from Korea introduces Blending-NeRF, a novel framework based on Neural Radiance Fields (NeRF). The model uses two NeRF networks: one pre-trained and one editable. By incorporating new blending operations and leveraging the CLIP vision-language model, Blending-NeRF enables text-driven, localized edits to 3D objects, such as adding new elements, modifying textures, and removing parts. The authors show through extensive experiments that the method produces natural-looking, localized edits from text prompts.
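At its core, CLIP-driven editing works by scoring a rendered view of the edited object against the text prompt and minimizing the mismatch. The sketch below illustrates only that scoring step, using plain cosine similarity on toy embedding vectors; the function names and vectors are illustrative assumptions, not Blending-NeRF's actual API.

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def clip_guidance_loss(image_embedding, text_embedding):
    # A CLIP-style guidance loss: the more the rendered edit's embedding
    # aligns with the text prompt's embedding, the lower the loss.
    return 1.0 - cosine_similarity(image_embedding, text_embedding)

# Toy vectors standing in for CLIP's image and text encoder outputs.
render_emb = np.array([0.9, 0.1, 0.0])
prompt_emb = np.array([1.0, 0.0, 0.0])
loss = clip_guidance_loss(render_emb, prompt_emb)
```

In the method itself, a loss of this kind is backpropagated into the editable NeRF's parameters while the pre-trained NeRF stays frozen, which is what keeps the edit localized.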

➡️ How can we assess the capabilities of large language models (LLMs) in performing legal reasoning tasks? This paper introduces LegalBench, a benchmark comprising 162 tasks in six different categories of legal reasoning. Designed collaboratively by legal professionals, the benchmark aims to measure practical and intellectually engaging reasoning skills. The paper also aligns LegalBench with established legal frameworks to facilitate cross-disciplinary discussions between lawyers and LLM developers. An empirical evaluation of 20 LLMs is included to showcase the research potential of LegalBench.


➡️ How trustworthy are Generative Pre-trained Transformer (GPT) models like GPT-3.5 and GPT-4, especially when deployed in sensitive sectors like healthcare and finance? This paper offers a thorough evaluation framework that gauges these models on a range of criteria—such as toxicity, bias, robustness, privacy, and ethics. It uncovers significant vulnerabilities; GPT models can produce toxic, biased outputs and can compromise privacy. Interestingly, while GPT-4 generally outperforms GPT-3.5, it shows increased vulnerability to being misled by specific user inputs. The study aims to fill the trustworthiness research gap in large language models.

➡️ How can we fairly evaluate Large Language Models (LLMs) when their performance varies due to the order of multiple-choice options and the way prompts are worded? This paper investigates LLMs' sensitivity to option order in multiple-choice questions, revealing significant performance gaps that range from 13% to 75% on various benchmarks. The study attributes this sensitivity to "positional bias" when LLMs are uncertain between the top choices. By identifying patterns that either amplify or mitigate this bias, the researchers propose calibration methods that improve model performance by up to 8 percentage points.
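One simple calibration strategy consistent with the paper's finding is to average a model's option scores over every ordering, so any spurious positional bonus spreads evenly across options. A minimal sketch, assuming a toy scorer with a hard-coded first-position bonus in place of a real LLM:

```python
import itertools

def calibrate_by_permutation(score_fn, options):
    """Average option scores over all orderings to cancel positional bias.

    score_fn(ordered_options) -> one score per position.
    Returns a per-option score averaged across every permutation.
    """
    totals = {opt: 0.0 for opt in options}
    perms = list(itertools.permutations(options))
    for perm in perms:
        scores = score_fn(list(perm))
        for opt, s in zip(perm, scores):
            totals[opt] += s
    return {opt: totals[opt] / len(perms) for opt in options}

# Toy scorer: option "B" is genuinely best, but whichever option sits in
# position 0 receives a spurious +0.3 bonus (the "positional bias").
true_quality = {"A": 0.2, "B": 0.7, "C": 0.1}
def biased_scorer(ordered):
    return [true_quality[o] + (0.3 if i == 0 else 0.0)
            for i, o in enumerate(ordered)]

calibrated = calibrate_by_permutation(biased_scorer, ["A", "B", "C"])
```

Averaging over all 6 orderings gives each option the bonus exactly 1/3 of the time, so the ranking reflects true quality again. Enumerating every permutation is exponential in the option count, so practical calibration schemes sample orderings or exploit the paper's observed bias patterns instead.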

➡️ How can text-based character generation produce high-quality 3D avatars that are not only realistic but also easily animatable? This paper introduces TADA, a novel method that marries a 2D diffusion model with an animatable parametric body model derived from SMPL-X. Utilizing hierarchical rendering and score distillation sampling, TADA produces 3D avatars with superior geometry and textures that align seamlessly, especially in facial regions. The method allows for editable and animatable characters, outperforming existing techniques in both qualitative and quantitative metrics.
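Score distillation sampling, which TADA builds on, can be summarized in one line: nudge the rendered image along the diffusion model's denoising direction, skipping the U-Net Jacobian. A minimal numpy sketch with a mock noise predictor; the predictor, step size, and shapes are illustrative assumptions, not TADA's implementation:

```python
import numpy as np

def sds_gradient(noise, predicted_noise, weight=1.0):
    # SDS approximates the gradient wrt the rendered image as
    # w(t) * (eps_hat - eps), dropping the diffusion model's Jacobian.
    return weight * (predicted_noise - noise)

rng = np.random.default_rng(0)
rendered = rng.normal(size=(4, 4))  # stands in for a differentiable render
target = np.zeros((4, 4))           # what the (mock) diffusion model prefers

# Mock denoiser: its noise prediction deviates from the true noise in
# proportion to how far the render is from the preferred image.
noise = rng.normal(size=(4, 4))
predicted = noise + 0.1 * (rendered - target)

grad = sds_gradient(noise, predicted)
updated = rendered - 0.5 * grad     # one gradient-descent step on the render
```

Each step moves the render closer to what the denoiser prefers; in TADA this update flows into the SMPL-X-derived avatar's geometry and texture parameters rather than raw pixels.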

➡️ How can we achieve scalable and high-performing unified speech-to-speech translation that supports multiple languages? This paper from Meta AI introduces SeamlessM4T, a single model capable of handling various translation modes—speech-to-speech, speech-to-text, text-to-speech, and text-to-text—for up to 100 languages. Built on 1 million hours of open speech audio and a multimodal corpus called SeamlessAlign, the model sets new benchmarks in translation quality, showing significant improvements over previous cascaded systems. The system is also tested for robustness, gender bias, and added toxicity, and all resources are open-sourced.

