DEV Community

Jimmy Guerrero for Voxel51

Posted on Nov 22, 2024

ECCV 2024 - Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models

#computervision #ai #datascience #machinelearning

In this talk, I will introduce our recent work on open-vocabulary 3D semantic understanding. We propose a novel method, Diff2Scene, which leverages frozen representations from text-to-image generative models for open-vocabulary 3D semantic segmentation and visual grounding. Diff2Scene requires no labeled 3D data and effectively identifies objects, appearances, locations, and their compositions in 3D scenes.

ECCV 2024 Paper: Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models

About the Speaker

Xiaoyu Zhu is a Ph.D. student at the Language Technologies Institute, School of Computer Science, Carnegie Mellon University. Her research interests are computer vision, multimodal learning, and generative models.
