↗️ AI/ML Research Updates: Punica (An AI System to Serve Multiple LoRA Models); JARVIS-1 (Open-World Multi-Task Agents); TransLO (Window-Based Masked Point Transformer Framework); LLaVA-Plus and many more research trends

This newsletter brings AI research news that is much more technical than most resources but still digestible and applicable.

Hey Folks!

This newsletter will discuss some cool AI research papers and AI tools. Happy learning!

👉 What is Trending in AI/ML Research?

How can multiple low-rank adaptation (LoRA) models be served efficiently on a shared GPU cluster? "Punica" addresses this with a CUDA kernel design that batches GPU operations across different LoRA models. As a result, a GPU needs to hold only a single copy of the base pre-trained model while serving multiple LoRA models, which improves GPU efficiency in both memory and computation. Punica's scheduler manages multi-tenant LoRA workloads across the shared GPU cluster. In the reported evaluations, Punica achieves 12x higher throughput for serving multiple LoRA models than state-of-the-art LLM serving systems, while adding only 2ms of latency per token.
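
To make the "one base model, many adapters" idea concrete, here is a minimal sketch in plain Python. It is not Punica's CUDA kernel or its API. It only shows the arithmetic a batched LoRA step has to produce: every request in the batch shares the same base weight, and each request adds its own low-rank update selected by an adapter index. All names (`batched_lora_forward`, `adapter_ids`, `lora_a`, `lora_b`) are hypothetical and chosen for illustration.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def vec_mat(x: Vector, m: Matrix) -> Vector:
    """Multiply a row vector by a matrix (len(x) must equal the row count of m)."""
    return [sum(x[i] * m[i][j] for i in range(len(x))) for j in range(len(m[0]))]


def batched_lora_forward(
    xs: List[Vector],
    adapter_ids: List[int],
    base_w: Matrix,
    lora_a: List[Matrix],
    lora_b: List[Matrix],
) -> List[Vector]:
    """Compute y = x @ W + (x @ A_i) @ B_i for each request in one batch.

    base_w is stored once and shared by every request (assumption: a single
    linear layer stands in for the whole base model). lora_a[i] and lora_b[i]
    are the small low-rank matrices of adapter i.
    """
    outputs = []
    for x, idx in zip(xs, adapter_ids):
        base_out = vec_mat(x, base_w)            # shared base weight
        low_rank = vec_mat(x, lora_a[idx])       # project down to rank r
        delta = vec_mat(low_rank, lora_b[idx])   # project back up
        outputs.append([b + d for b, d in zip(base_out, delta)])
    return outputs
```

The Python loop here is only a stand-in. In a real serving system the per-request adapter lookup and the two small matrix products would be fused into batched GPU operations, which is the part the paper's kernel design addresses.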

How can agents achieve human-like planning and control in an open-world environment with infinite tasks? This paper introduces "JARVIS-1", an agent for Minecraft that perceives multimodal input (visual observations and textual instructions) and executes complex tasks. Built on pre-trained multimodal language models, JARVIS-1 maps observations and instructions to plans, which are then dispatched to goal-conditioned controllers. A key component is its multimodal memory, which combines pre-trained knowledge with in-game experience to improve planning. In the reported tests, JARVIS-1 performed strongly across more than 200 Minecraft tasks and showed a significant improvement on long-horizon tasks such as the diamond pickaxe challenge. Its ability to self-improve through lifelong learning points toward more general intelligence and autonomy.

How can the transformer architecture, successful in 2D vision tasks, be adapted to 3D vision, given the challenges posed by point clouds? This paper introduces TransLO, which processes large-scale point clouds efficiently by projecting them onto a 2D surface and applying a local transformer with linear complexity. Its key components are a Window-based Masked transformer with Self Attention (WMSA), which captures long-range dependencies, and a Masked Cross-Frame Attention (MCFA) module, which associates frames and estimates pose. A binary mask is proposed to handle the sparsity of point clouds. According to the paper, TransLO is the first transformer-based LiDAR odometry network. It outperforms existing learning-based methods and even beats LOAM in most evaluations on the KITTI odometry dataset, with an average rotation RMSE of 0.500°/100m and a translation RMSE of 0.993%.
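
For readers who want a feel for how a binary mask interacts with window attention, here is a minimal sketch in plain Python. It is not TransLO's implementation. It assumes a single window of projected pixels, uses the raw features as queries, keys, and values (no learned projections), and treats `mask[i] == 0` as "no LiDAR point landed on this pixel". The function name and arguments are hypothetical.

```python
import math
from typing import List


def masked_window_attention(feats: List[List[float]], mask: List[int]) -> List[List[float]]:
    """Self-attention inside one window, ignoring positions where mask == 0.

    feats: one feature vector per pixel in the window.
    mask:  1 if the pixel holds a real point, 0 if it is empty.
    Empty pixels neither attend nor get attended to; their output is zero.
    """
    dim = len(feats[0])
    scale = 1.0 / math.sqrt(dim)
    valid = [i for i, m in enumerate(mask) if m == 1]

    out = []
    for q_idx, q in enumerate(feats):
        if mask[q_idx] == 0:
            out.append([0.0] * dim)
            continue
        # Scaled dot-product scores against valid keys only.
        scores = [scale * sum(a * b for a, b in zip(q, feats[k])) for k in valid]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([
            sum(w / total * feats[k][d] for w, k in zip(weights, valid))
            for d in range(dim)
        ])
    return out
```

Because each window has a fixed size, the cost per window is constant, so the total cost grows linearly with the number of windows rather than quadratically with the number of points.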

How can a multimodal assistant be enhanced to better interact with and fulfill real-world tasks? "LLaVA-Plus" addresses this by expanding the capabilities of large multimodal models. It maintains a skill repository of pre-trained vision and vision-language models and activates the appropriate tools in response to user inputs. Trained on multimodal instruction-following data, LLaVA-Plus handles tasks involving visual understanding, generation, external knowledge retrieval, and combinations of these. Empirical evidence shows that LLaVA-Plus not only surpasses its predecessor, LLaVA, in existing capabilities but also introduces new ones. A distinguishing feature is that the image query is grounded directly and stays actively engaged throughout the human-AI interaction, which significantly improves tool use and opens up new scenarios.
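
The "skill repository" pattern can be pictured as a registry of named tools plus a step that decides which one to call. The sketch below is purely illustrative and is not LLaVA-Plus code: in the paper the multimodal model itself decides which tool to activate, whereas here a hypothetical keyword rule (`choose_skill`) stands in for that decision, and the skills are stub functions that return strings instead of running real vision models.

```python
from typing import Callable, Dict

SkillFn = Callable[[str], str]


class SkillRepository:
    """Hypothetical registry mapping skill names to callables."""

    def __init__(self) -> None:
        self._skills: Dict[str, SkillFn] = {}

    def register(self, name: str, fn: SkillFn) -> None:
        self._skills[name] = fn

    def invoke(self, name: str, query: str) -> str:
        if name not in self._skills:
            raise KeyError(f"unknown skill: {name}")
        return self._skills[name](query)


def choose_skill(user_input: str) -> str:
    """Stand-in for the model's tool choice (assumption: simple keyword rules)."""
    text = user_input.lower()
    if "detect" in text or "find" in text:
        return "detection"
    if "draw" in text or "generate" in text:
        return "generation"
    return "caption"


def build_demo_repository() -> SkillRepository:
    """Register stub skills; real skills would wrap pre-trained models."""
    repo = SkillRepository()
    for name in ("detection", "generation", "caption"):
        # Bind name as a default argument so each stub keeps its own label.
        repo.register(name, lambda query, n=name: f"[{n}] {query}")
    return repo
```

A caller would pick a skill with `choose_skill(user_input)` and pass the result to `repo.invoke(...)`. Swapping the keyword rule for a model-generated tool call is the part that the paper's instruction-following training covers.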

Featured AI Tools For You

  • Rask AI: Rask AI is a video localization tool with voice cloning that translates and dubs videos into 130+ languages. [Speech AI and Video Editor]

  • Codium: Automated test generation for faster and more confident development. [Coding]

  • Semrush: Semrush is an all-in-one online marketing platform that helps businesses improve their online visibility and grow their traffic. [Marketing]

  • SaneBox*: AI-powered email management that saves you time and brings sanity back to your inbox. Voted one of the Best Productivity Apps for 2023 on PCMag. Sign up today and save $25 on any subscription. [Email and Productivity]

  • Leap: Boost your app with Leap's AI APIs and SDKs for image, model, and text generation, editing, and refinement. [Image Generator]

  • Threado AI: Threado AI is an intelligent sidekick that provides instant support anywhere, trained on your knowledge base and powered by AI. [Customer Support]

  • Adcreative AI*: Boost your advertising and social media game with AdCreative.ai - the ultimate Artificial Intelligence solution. [Marketing and Sales]

  • GPTConsole: GPTConsole makes AI app development easy with Pixie, an agent that automatically creates full-fledged web and mobile apps. [AI Assistant]

  • Retouch4me: Retouch4me's plugins make photo retouching a breeze, ensuring professional results every time. [Photo Editing]

  • Tugan.ai: Tugan AI turns articles, videos, and sales pages into engaging content for newsletters, social media, and email. [Social Media and Marketing]

*We earn a small affiliate commission when you buy this product through the link.

Sponsorship: For newsletter sponsorship, please reach us at [email protected]

Marktechpost Media Inc. is a California-based Artificial Intelligence News Platform with 2 Million+ AI Tech Readers/Viewers

Who is Marktechpost's Audience?

Our audience consists of Data Engineers, MLOps Engineers, Data Scientists, ML Engineers, ML Researchers, Data Analysts, Software Developers, Architects, IT Managers, Software Engineers/SDEs, CTOs, Directors/VPs of Data Science, CEOs, PhD Researchers, Postdocs, and Tech Investors.

Who should try our Advertisement or Sponsorship package?

We encourage companies that are developing AI software, deep learning tools, machine learning tools, NLP tools, MLOps tools, data science tools, AIOps, DataOps, big data tools, AI chips/hardware, GPUs, TPUs, CPUs, and SaaS products to try it.