👉 What is Trending in AI/ML Research?

How can multiple low-rank adaptation (LoRA) models be served efficiently on a shared GPU cluster? "Punica" addresses this with a new CUDA kernel design that batches GPU operations across different LoRA models. As a result, a GPU needs to hold only one copy of the base pre-trained model while serving many LoRA models, which substantially reduces memory and compute cost. Punica's scheduler manages multi-tenant LoRA workloads across the shared GPU cluster. In comparative evaluations, Punica achieves 12 times higher throughput when serving multiple LoRA models than existing state-of-the-art LLM serving systems, while adding only 2 ms of latency per token.
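
To make the batching idea concrete, here is a minimal sketch in plain Python. It is not Punica's CUDA kernel or API; it only illustrates the arithmetic of sharing one base weight matrix across requests that each use a different LoRA adapter. The function names, the adapter IDs, and the tiny matrices are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def batched_lora_forward(
    base_w: Matrix,
    adapters: Dict[str, Tuple[Matrix, Matrix]],
    batch: List[Vector],
    adapter_ids: List[str],
    scale: float = 1.0,
) -> List[Vector]:
    """Compute y_i = W x_i + scale * B_i (A_i x_i) for every request i.

    base_w is stored once and reused by every request; only the small
    low-rank pair (A, B) differs per request (hypothetical layout).
    """
    if len(batch) != len(adapter_ids):
        raise ValueError("one adapter id is required per request")
    # Shared pass: the same base weights serve the whole batch.
    base_out = [matvec(base_w, x) for x in batch]
    outputs = []
    for x, y, adapter_id in zip(batch, base_out, adapter_ids):
        a, b = adapters[adapter_id]        # A: r x d_in, B: d_out x r
        delta = matvec(b, matvec(a, x))    # low-rank update for this request
        outputs.append([yi + scale * di for yi, di in zip(y, delta)])
    return outputs


# Two tenants with different rank-1 adapters share one 2x2 base matrix.
base = [[1.0, 0.0], [0.0, 1.0]]
adapters = {
    "tenant_a": ([[1.0, 0.0]], [[1.0], [0.0]]),
    "tenant_b": ([[0.0, 1.0]], [[0.0], [2.0]]),
}
result = batched_lora_forward(
    base, adapters, [[1.0, 2.0], [3.0, 4.0]], ["tenant_a", "tenant_b"]
)
```

In a real serving system the per-request loop would be fused into a single GPU kernel; the loop here only shows which quantities are shared and which are per-adapter.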

How can agents achieve human-like planning and control in an open-world environment with a practically unlimited range of tasks? This paper introduces "JARVIS-1", an agent for the Minecraft universe that perceives multimodal input (visual observations and textual instructions) and executes complex tasks. Built on pre-trained multimodal language models, JARVIS-1 converts its inputs into plans, which are then passed to goal-conditioned controllers. A key innovation is its multimodal memory, which combines pre-trained knowledge with actual in-game experience to improve planning. In tests, JARVIS-1 performed strongly on over 200 Minecraft tasks and showed a significant improvement on long-horizon tasks such as the diamond pickaxe challenge. Its ability to self-improve through lifelong learning suggests potential for more general intelligence and autonomy.

How can the transformer architecture, successful in 2D vision tasks, be adapted for 3D vision, given the challenges posed by point clouds? This paper introduces TransLO, which processes large-scale point clouds efficiently by projecting them onto a 2D surface and applying a local transformer with linear complexity. Its key components are a window-based masked transformer with self-attention (WMSA) for capturing long-range dependencies and a masked cross-frame attention (MCFA) module for frame association and pose estimation. A binary mask is proposed to address the sparsity of point clouds. According to the authors, TransLO is the first transformer-based LiDAR odometry network. It outperforms existing learning-based methods and even outperforms LOAM in most evaluations on the KITTI odometry dataset, achieving an average rotation RMSE of 0.500°/100m and a translation RMSE of 0.993%.
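
The role of a binary mask in window attention can be sketched as follows. This is a generic masked-attention illustration in plain Python, not TransLO's implementation; it assumes the mask marks which projected positions in a window actually contain a LiDAR point, and that empty positions are simply excluded from the softmax. The function name and the data layout are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def masked_window_attention(
    queries: List[Vector],
    keys: List[Vector],
    values: List[Vector],
    valid: List[int],
) -> List[Vector]:
    """Scaled dot-product attention inside one window.

    valid[j] is 1 if position j holds a real point and 0 if it is empty
    (assumed meaning of the binary mask); empty positions get zero weight.
    """
    dim = len(queries[0])
    out_dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) if m else None
            for k, m in zip(keys, valid)
        ]
        live = [s for s in scores if s is not None]
        if not live:
            # Entire window is empty: nothing to attend to.
            outputs.append([0.0] * out_dim)
            continue
        top = max(live)  # subtract the max for numerical stability
        weights = [math.exp(s - top) if s is not None else 0.0 for s in scores]
        total = sum(weights)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) / total
             for j in range(out_dim)]
        )
    return outputs


# The third position is empty, so its value (100.0) never leaks into the output.
attended = masked_window_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]],
    values=[[1.0], [3.0], [100.0]],
    valid=[1, 1, 0],
)
```

Because attention is restricted to a fixed-size window, the cost grows linearly with the number of windows rather than quadratically with the number of points.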

How can a multimodal assistant be enhanced to better interact with and fulfill real-world tasks? "LLaVA-Plus" addresses this by expanding the capabilities of large multimodal models. It maintains a skill repository of pre-trained vision and vision-language models and activates the relevant tools in response to user inputs. Trained on multimodal instruction-following data, LLaVA-Plus handles tasks involving visual understanding, generation, external knowledge retrieval, and combinations of these. Empirical results show that LLaVA-Plus not only surpasses its predecessor, LLaVA, in existing capabilities but also adds new ones. A key feature is that the image query is grounded directly and stays actively engaged throughout the human-AI interaction, which significantly improves tool use and opens up new scenarios.
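
The skill-repository pattern described above can be sketched as a simple registry. This is an illustrative toy, not LLaVA-Plus code: the class name, the action format, and the stub skills are all hypothetical, and in the real system the multimodal model itself decides which tool to call.

```python
from typing import Callable, Dict


class SkillRepository:
    """Maps skill names to callables (hypothetical interface)."""

    def __init__(self) -> None:
        self._skills: Dict[str, Callable[[str], str]] = {}

    def register(self, name: str, fn: Callable[[str], str]) -> None:
        self._skills[name] = fn

    def run(self, action: Dict[str, str]) -> str:
        """Execute a structured action such as one emitted by a planner model."""
        name = action["skill"]
        if name not in self._skills:
            raise KeyError(f"unknown skill: {name}")
        return self._skills[name](action["argument"])


# Stub skills standing in for pre-trained vision models.
repo = SkillRepository()
repo.register("detect", lambda image: f"boxes for {image}")
repo.register("caption", lambda image: f"caption for {image}")

# An assumed action format; the tool output would be fed back to the assistant.
tool_output = repo.run({"skill": "detect", "argument": "cat.png"})
```

Keeping tools behind a common interface means new skills can be added without changing the code that dispatches them.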

Featured AI Tools For You

  • Rask AI: Rask AI is a video localization tool with voice cloning that translates and dubs videos into 130+ languages. [Speech AI and Video Editor]

  • Codium: Automated test generation for faster and more confident development. [Coding]

  • Semrush: Semrush is an all-in-one online marketing platform that helps businesses improve their online visibility and grow their traffic. [Marketing]

  • SaneBox*: AI-powered email management that saves you time and brings sanity back to your inbox. Voted one of the Best Productivity Apps for 2023 by PCMag. Sign up today and save $25 on any subscription. [Email and Productivity]

  • Leap: Boost your app with Leap's AI APIs and SDKs for image, model, and text generation, editing, and refinement. [Image Generator]

  • Threado AI: Threado AI is an intelligent sidekick that provides instant support anywhere, trained on your knowledge base and powered by AI. [Customer Support]

  • Adcreative AI*: Boost your advertising and social media game with AdCreative.ai - the ultimate Artificial Intelligence solution. [Marketing and Sales]

  • GPTConsole: GPTConsole makes AI app development easy with Pixie, an agent that automatically creates full-fledged web and mobile apps. [AI Assistant]

  • Retouch4me: Retouch4me's plugins make photo retouching a breeze, ensuring professional results every time. [Photo Editing]

  • Tugan.ai: Tugan AI turns articles, videos, and sales pages into engaging content for newsletters, social media, and email. [Social Media and Marketing]

