
Cool AI Research from NVIDIA AI
NVIDIA AI Released DiffusionRenderer: An AI Model for Editable, Photorealistic 3D Scenes from a Single Video: In a groundbreaking new paper, researchers at NVIDIA, the University of Toronto, the Vector Institute, and the University of Illinois Urbana-Champaign have unveiled an AI framework that can take a single video, reconstruct editable 3D scene representations—geometry, material, and lighting (G-buffers)—and then use these to synthesize photorealistic renderings under new lighting conditions. Unlike traditional physically based rendering (PBR), which requires accurate 3D geometry and light transport simulation, DiffusionRenderer approximates these effects through data-driven neural models. The system includes two core components: a video-based inverse renderer trained on synthetic data that generalizes well to real-world videos, and a forward renderer trained jointly on synthetic and auto-labeled real-world videos using environment-aware conditioning.
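At a high level, the pipeline chains an inverse renderer (video in, G-buffers out) with a forward renderer (G-buffers plus a target lighting condition in, relit frames out). The sketch below is only a toy illustration of that data flow, not NVIDIA's implementation: the tiny convolutional stubs, tensor shapes, channel counts, and the env_code conditioning are placeholder assumptions standing in for the paper's video diffusion models.

```python
# Toy sketch of the two-stage idea described above (NOT the authors' code):
# an inverse renderer maps video frames to per-pixel G-buffers, and a
# forward renderer maps G-buffers plus a new lighting code back to frames.
import torch
import torch.nn as nn

class InverseRenderer(nn.Module):
    """Placeholder: frames (B, T, 3, H, W) -> G-buffers (B, T, C, H, W)."""
    def __init__(self, gbuffer_channels: int = 10):
        super().__init__()
        self.net = nn.Conv2d(3, gbuffer_channels, kernel_size=3, padding=1)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        b, t, c, h, w = frames.shape
        return self.net(frames.reshape(b * t, c, h, w)).reshape(b, t, -1, h, w)

class ForwardRenderer(nn.Module):
    """Placeholder: G-buffers + lighting code -> RGB frames."""
    def __init__(self, gbuffer_channels: int = 10, env_dim: int = 16):
        super().__init__()
        self.net = nn.Conv2d(gbuffer_channels + env_dim, 3, kernel_size=3, padding=1)

    def forward(self, gbuffers: torch.Tensor, env_code: torch.Tensor) -> torch.Tensor:
        b, t, c, h, w = gbuffers.shape
        # Broadcast the lighting code over every pixel and frame, a crude
        # stand-in for the paper's environment-aware conditioning.
        env = env_code.view(b, 1, -1, 1, 1).expand(b, t, -1, h, w)
        x = torch.cat([gbuffers, env], dim=2)
        return self.net(x.reshape(b * t, -1, h, w)).reshape(b, t, 3, h, w)

# Relighting = invert once, then forward-render under a different lighting code.
frames = torch.rand(1, 8, 3, 64, 64)      # a short input video clip
new_lighting = torch.rand(1, 16)          # code for the target environment map
gbuffers = InverseRenderer()(frames)
relit = ForwardRenderer()(gbuffers, new_lighting)
print(relit.shape)                        # torch.Size([1, 8, 3, 64, 64])
```

Because the scene is factored into explicit G-buffers between the two stages, edits such as swapping materials or inserting objects can in principle be applied to those buffers before the forward pass.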
What sets DiffusionRenderer apart is its ability to perform practical editing tasks such as relighting, material tweaking, and realistic object insertion using only a single video as input. It surpasses existing neural rendering baselines on multiple benchmarks—producing high-fidelity shadows, reflections, and specularities even in complex scenes without requiring explicit 3D reconstruction. The model leverages Stable Video Diffusion as a backbone and is trained with techniques like domain embeddings and LoRA adaptation to bridge the synthetic-to-real domain gap. This positions it as a compelling alternative to classical PBR and NeRF-based pipelines, especially for real-world applications requiring flexible and efficient scene manipulation.
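The LoRA adaptation mentioned above is a standard low-rank fine-tuning technique; the snippet below sketches a generic LoRA-wrapped linear layer just to make the idea concrete. How DiffusionRenderer actually wires LoRA into its Stable Video Diffusion backbone is not detailed here, so the class name, rank, and scaling choices are illustrative assumptions.

```python
# Minimal, generic LoRA layer (not the authors' code): the pretrained weight
# stays frozen and only a small low-rank update B @ A is trained.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen linear layer with a trainable low-rank update: W x + scale * B(A(x))."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                       # pretrained weights stay frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)    # A
        self.up = nn.Linear(rank, base.out_features, bias=False)     # B
        nn.init.zeros_(self.up.weight)                    # zero-init so training starts at the pretrained behavior
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.up(self.down(x))

# Example: only the low-rank factors receive gradients.
layer = LoRALinear(nn.Linear(320, 320), rank=8)
out = layer(torch.rand(2, 320))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(out.shape, trainable)   # torch.Size([2, 320]) 5120
```

Freezing the backbone and training only these small factors is what makes this kind of adaptation cheap enough to apply on top of a large video diffusion model.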
Why does this AI research matter? It addresses a longstanding challenge in computer vision and graphics: generating editable, photorealistic 3D scenes from limited real-world input. Traditional rendering pipelines depend on precise 3D geometry and lighting data, which are costly and difficult to obtain. DiffusionRenderer eliminates these constraints by leveraging video diffusion models to jointly solve inverse and forward rendering, enabling tasks like relighting and object insertion from just a single video. This opens up practical, scalable pathways for advanced scene editing in film, virtual production, AR/VR, and design without requiring complex capture setups.
Other AI News
Salesforce AI’s GTA1 Tops OSWorld: Salesforce AI has released GTA1, a powerful GUI agent that sets a new record on the OSWorld benchmark with a 45.2% success rate—surpassing OpenAI’s CUA. It features test-time planning with a multimodal judge and reinforcement learning-based grounding. Now open-sourced for the community.
Google Open Sourced MedGemma 27B Multimodal and MedSigLIP: Google DeepMind has released MedGemma 27B Multimodal and MedSigLIP—two new open-source models designed for advanced medical reasoning across text and images. With strong performance in diagnosis, report generation, and classification tasks, they offer scalable, privacy-conscious tools for real-world clinical applications.