🚀 AI News: Introducing OmniMotion, FinGPT, and MUSICGEN; Probing LLMs for Causal Inference, Text-Driven Video Editing & The Launch of Mind2Web...

This newsletter brings AI research news that is more technical than most resources, but still digestible and applicable.

➡️ A team of researchers from Cornell University, Google Research, and UC Berkeley has developed OmniMotion, a holistic and globally coherent motion representation. It enables precise estimation of the motion of every pixel across the full duration of a video. OmniMotion encodes a video with a quasi-3D canonical volume and tracks pixels by mapping correspondences between local and canonical space. Thorough evaluations on the TAP-Vid benchmark and real-world video show that OmniMotion significantly surpasses the previous best-performing methods, both quantitatively and qualitatively.
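The local-to-canonical mapping idea can be sketched with a toy example. This is not OmniMotion's actual method (which learns invertible networks over a quasi-3D volume); here simple 2D translations stand in for the learned bijections, just to show how composing frame i's forward map with frame j's inverse map tracks a pixel:

```python
# Toy sketch of the tracking idea: each frame i has a bijection M_i from
# local frame coordinates into a shared canonical space; a pixel is tracked
# from frame i to frame j by composing M_j^{-1} with M_i. Plain translations
# stand in for the learned invertible mappings.

def make_translation(dx, dy):
    fwd = lambda p: (p[0] + dx, p[1] + dy)   # local -> canonical
    inv = lambda p: (p[0] - dx, p[1] - dy)   # canonical -> local
    return fwd, inv

def track(pixel, frame_i, frame_j, maps):
    """Map `pixel` in frame_i to its corresponding location in frame_j."""
    fwd_i, _ = maps[frame_i]
    _, inv_j = maps[frame_j]
    canonical = fwd_i(pixel)   # lift into the canonical space
    return inv_j(canonical)    # drop back into frame_j coordinates

# Three frames whose per-frame maps are unit translations along x.
maps = [make_translation(float(t), 0.0) for t in range(3)]
print(track((5.0, 5.0), 0, 2, maps))  # -> (3.0, 5.0)
```

Because every map is a bijection through the same canonical space, tracking is consistent in both directions: mapping the result back from frame 2 to frame 0 recovers a coherent correspondence.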

➡️ Meet FinGPT: A freely accessible large language model (LLM) tailored for the financial sector. This model operates on a data-centric basis, presenting researchers and practitioners with user-friendly tools for creating financial LLMs. In a unique approach, FinGPT effectively utilizes pre-existing LLMs and refines them for particular financial applications. This strategy drastically reduces the cost of adaptation and computational demands when compared to models such as BloombergGPT, delivering a solution that is more accessible, adaptable, and cost-efficient for financial language modeling. This enables regular updates, guaranteeing model precision and relevance — a key necessity in the rapidly evolving, time-critical finance domain.
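One common way to refine an existing LLM cheaply, in the spirit of the low-cost adaptation described above, is low-rank adaptation (LoRA). The summary does not pin down FinGPT's exact recipe, so treat this pure-Python sketch as illustrative of the technique, not of FinGPT's implementation:

```python
# Minimal sketch of low-rank adaptation (LoRA): the pretrained weight W is
# frozen, and only two small factors A (d x r) and B (r x d) are trained,
# cutting trainable parameters for the layer from d*d down to 2*d*r.

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_forward(x, W, A, B, scale=1.0):
    """y = x @ W + scale * (x @ A @ B); W frozen, A and B trainable."""
    base = matmul(x, W)
    delta = matmul(matmul(x, A), B)
    return [[b + scale * d for b, d in zip(brow, drow)]
            for brow, drow in zip(base, delta)]

d, r = 4, 1                          # rank-1 adapter on a 4x4 layer
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen
A = [[1.0] for _ in range(d)]        # d x r, trainable
B = [[0.5] * d]                      # r x d, trainable
x = [[1.0, 0.0, 0.0, 0.0]]
print(lora_forward(x, W, A, B))      # -> [[1.5, 0.5, 0.5, 0.5]]
```

Because only A and B change during fine-tuning, the adapter can be retrained or swapped frequently, which is what makes the "regular updates" mentioned above affordable.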

➡️ Meta AI has unveiled MUSICGEN, a language model (LM) that operates over multiple streams of compressed, discrete musical representations, also known as tokens. In contrast to earlier models, MUSICGEN uses a single-stage transformer LM together with efficient token interleaving patterns, removing the need to stack several models hierarchically or via upsampling. Following this approach, the researchers show that MUSICGEN can generate high-quality samples. Furthermore, it can be conditioned on textual descriptions or melodic features, offering improved control over the generated output.
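A token interleaving pattern of the kind described above can be sketched as a "delay" layout: codebook stream k is shifted k steps later, so a single transformer can predict one token per codebook at each step instead of needing a separate model per level. This sketch is based only on the description above, not on Meta's implementation:

```python
# Flatten K parallel codebook streams into one stepped sequence by delaying
# stream k by k positions; PAD marks positions where a stream has not
# started (or has already ended).

PAD = None

def delay_interleave(codebooks):
    """codebooks: list of K equal-length token streams -> list of steps,
    each step holding one (possibly PAD) token per codebook."""
    K, T = len(codebooks), len(codebooks[0])
    steps = []
    for t in range(T + K - 1):
        step = []
        for k, stream in enumerate(codebooks):
            src = t - k                       # stream k is delayed by k
            step.append(stream[src] if 0 <= src < T else PAD)
        steps.append(step)
    return steps

tokens = [[11, 12, 13],   # codebook 0
          [21, 22, 23],   # codebook 1
          [31, 32, 33]]   # codebook 2
for step in delay_interleave(tokens):
    print(step)
```

The whole sequence is only T + K - 1 steps long instead of T * K, which is why a single-stage model over this layout stays efficient.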

➡️ Can Large Language Models Infer Causation from Correlation? This AI research proposes the first benchmark dataset to test the pure causal inference skills of large language models (LLMs). The research team introduced a novel task, CORR2CAUSE, to infer causation from correlation, and collected a large-scale dataset of more than 400K samples. They also show that LLMs can be re-purposed for this task through fine-tuning, but future work needs to address the out-of-distribution generalization problem.
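Why is the task hard? A tiny simulation (illustrative only, not from the paper) shows the core pitfall: a confounder Z that causes both X and Y makes them strongly correlated even though neither causes the other, so correlation alone cannot settle the causal question:

```python
import random

# Z -> X and Z -> Y, with no edge between X and Y. X and Y still correlate
# strongly, so "X causes Y" cannot be read off the correlation.

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

random.seed(0)
zs = [random.gauss(0, 1) for _ in range(5000)]
xs = [z + random.gauss(0, 0.3) for z in zs]   # Z -> X
ys = [z + random.gauss(0, 0.3) for z in zs]   # Z -> Y (no X -> Y edge)
print(round(pearson(xs, ys), 2))              # strongly correlated anyway
```

A model solving CORR2CAUSE-style questions has to distinguish this confounded structure from a genuine X → Y relationship, which is exactly the kind of reasoning the benchmark probes.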

➡️ Meet ControlVideo: A Novel AI Method For Text-Driven Video Editing. Researchers from Tsinghua University, Renmin University of China, ShengShu, and Pazhou Laboratory introduce ControlVideo, a cutting-edge method built on a pretrained text-to-image diffusion model for faithful and reliable text-driven video editing. Drawing inspiration from ControlNet, ControlVideo strengthens the guidance from the source video by adding visual conditions, such as Canny edge maps, HED boundaries, and depth maps, for all frames as extra inputs. These visual conditions are processed by a ControlNet pretrained on the diffusion model. Compared with the text- and attention-based strategies used in existing text-driven video editing approaches, such conditions offer a more precise and flexible form of video control.
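To give a feel for the per-frame structural conditions involved, here is a crude gradient-magnitude edge map in pure Python. It is a stand-in for a real detector like Canny (which also smooths, uses double thresholds, and tracks edges by hysteresis), purely to illustrate what an "edge map condition" extracts from a frame:

```python
# Crude edge map: mark a pixel where the local intensity gradient is steep.
# A real pipeline would run Canny (e.g. OpenCV) per frame instead.

def edge_map(frame, threshold=0.5):
    """frame: 2D list of grayscale values in [0, 1] -> binary edge map."""
    h, w = len(frame), len(frame[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w - 1):
            gx = frame[y][x + 1] - frame[y][x]   # horizontal gradient
            gy = frame[y + 1][x] - frame[y][x]   # vertical gradient
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                edges[y][x] = 1
    return edges

# A bright square on a dark background: edges appear along its border.
frame = [[1.0 if 2 <= x <= 5 and 2 <= y <= 5 else 0.0 for x in range(8)]
         for y in range(8)]
print(sum(map(sum, edge_map(frame))))
```

Feeding one such map per frame into the conditioning branch is what lets the edit track the source video's structure instead of drifting frame to frame.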

➡️ Researchers from The Ohio State University introduce Mind2Web, the first dataset for developing and evaluating generalist agents for the web that can follow language instructions to complete complex tasks on any website. Existing datasets for web agents either use simulated websites or cover only a limited set of websites and tasks, and are thus unsuitable for generalist web agents. With over 2,000 open-ended tasks collected from 137 websites spanning 31 domains, plus crowdsourced action sequences for the tasks, Mind2Web provides three necessary ingredients for building generalist web agents: 1) diverse domains, websites, and tasks; 2) use of real-world websites instead of simulated and simplified ones; and 3) a broad spectrum of user interaction patterns.
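Based only on the description above, a Mind2Web-style task record could be modeled as an instruction tied to a real website plus its crowdsourced action sequence. The field names here are hypothetical, not the dataset's actual schema:

```python
from dataclasses import dataclass, field

# Hypothetical record layout for a web-agent task (illustrative schema).

@dataclass
class Action:
    operation: str        # e.g. "CLICK", "TYPE", "SELECT"
    target: str           # element the action applies to
    value: str = ""       # text typed / option chosen, if any

@dataclass
class Task:
    website: str
    domain: str
    instruction: str
    actions: list[Action] = field(default_factory=list)

task = Task(
    website="example-airline.com",   # illustrative, not a real dataset entry
    domain="Travel",
    instruction="Book a one-way flight from NYC to SFO for next Friday.",
    actions=[
        Action("CLICK", "One-way radio button"),
        Action("TYPE", "From field", "NYC"),
        Action("TYPE", "To field", "SFO"),
    ],
)
print(len(task.actions))
```

The open-ended instruction plus grounded action sequence is what lets a generalist agent be both trained and evaluated on websites it has never seen.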

➡️ Microsoft AI Unveils LLaVA-Med: An Efficiently Trained Large Language and Vision Assistant Revolutionizing Biomedical Inquiry, Delivering Advanced Multimodal Conversations in Under 15 Hours. The team proposes a novel curriculum learning approach to fine-tuning a large general-domain vision-language model, using a large-scale, broad-coverage biomedical figure-caption dataset extracted from PubMed Central, with GPT-4 used to self-instruct open-ended instruction-following data from the captions.

Why you can’t afford to be skeptical of this investment opportunity (730,000+ people aren’t)

Note: This section is supported and sponsored by Masterworks

Picture this: an investment platform delivering tens of millions of dollars a year to investors, while realizing net returns of 17.8%, 21.5% and 35%, and giving everyday people access to a market that was previously only available to billionaires. Too good to be true, right?


All of the above, and more, has been made possible by Masterworks, an award-winning platform for blue-chip art investing (think Banksy, Basquiat, and Picasso). To date, Masterworks has sold over $45 million worth of artwork and distributed the net proceeds to investors.

All of Masterworks’ offerings are qualified with the SEC, making it simple for investors with no experience in art to benefit from this $1.7 trillion asset class – at a fraction of the cost.

See important disclosures at masterworks.com/cd