3rd Workshop on Urban Scene Modeling — June 3, Full Day, Mile High 3B
Structured, Semantic, and Synthetic 3D Habitats (USM3D) - CVPR 2026
The 3rd Urban Scene Modeling (USM3D) Workshop at CVPR 2026 focuses on methods that reconstruct, structure, and semantically organize real-world built environments from heterogeneous visual and 3D data. We emphasize scene representations that move beyond raw point clouds and meshes toward high-level, editable, and task-ready models.
Following the success of the 2024 and 2025 editions, USM3D 2026 aims to bridge state-of-the-art 3D scene modeling with structured, semantic 3D reconstruction by bringing together researchers across photogrammetry, computer vision, generative models, learned representations, and computer graphics. The workshop includes invited talks, a peer-reviewed paper track, and public challenges on large-scale urban datasets to support benchmarking, reproducible pipelines, and integration across vision, graphics, photogrammetry, and 3D learning.
News
- June 10, 2026: An archive of S23DR results and write-ups from all years is now available.
- June 3, 2026: Challenge winners announced! S23DR winners | Building3D winners
- May 18, 2026: Room number available: Mile High 3B.
- May 4, 2026: Accepted papers list is available on this website.
- March 13, 2026: Competitions started! 🎉
- December 22, 2025: 2026 proposal accepted! 🎉
Workshop Schedule
- 09:00am - 09:10am — Welcome and introduction
- 09:10am - 09:50am — Keynote 1: Matthias Nießner
- 09:50am - 10:30am — Building3D Challenge (Winner talks)
- 10:30am - 10:45am — Coffee break
- 10:45am - 11:25am — Keynote 2: Florent Lafarge
  Title: Two Decades of 3D Building Reconstruction: Paradigms, Progress, and Prospects
- 11:25am - 12:05pm — S23DR Challenge (Winner talks)
- 12:05pm - 01:00pm — Lunch
- 01:00pm - 01:40pm — Keynote 3: Marc Pollefeys
- 01:40pm - 03:10pm — Paper Oral Presentations
  - City-Mesh3R: Simulation-Ready City-Scale 3D Mesh Reconstruction from Multi-View Images
    Sayan Paul; Sourav Ghosh; Siddharth Katageri; Soumyadip Maity; Sanjana Sinha; Brojeshwar Bhowmick
  - DALES 2: A Renovated Aerial LiDAR Benchmark for 3D Scene Understanding
    Moussa Bendjilali; Claire Peyran; Kaaviya Velumani; Antoine Mauri; Nicola Luminari; Pierre Alliez
  - EditSSC: Toward Editable Semantic Occupancy Scenes with Unconditional Diffusion Models
    Fatima BALDE; Raoul de Charette; Alexandre Boulch
  - GeoPriorPC: Nadir-view to 3D Point Cloud Reconstruction for buildings via Two-Stage Diffusion Priors
    Youssef Korny; Sunghwan Yoo; Daniel Panangian; Ksenia Bittner; Andreas Wichmann; Gunho Sohn
  - GS4City: Hierarchical Semantic Gaussian Splatting via City-Model Priors
    Qilin Zhang; Jinyu Zhu; Olaf Wysocki; Benjamin Busam; Boris Jutzi
  - Text Reconstruction in 3D Scenes, An Empirical Study of Gaussian Splatting vs. Neural Radiance Fields
    Agastya Todi; Pasan Gunawardena; Zhengming Yu; Wenping Wang
  - MeshSplatting: Differentiable Rendering with Opaque Meshes
    Jan Held
  - Smaller and Faster 3DGS via Post-Training Dictionary Learning
    Jiarong Gong; Jonas Unger; Ehsan Miandji
  - SynGlSat: Geographically Equitable Synthesis of Global Satellite Imagery
    Umur Ciftci; Liam Melchior; Ilke Demir
- 03:10pm - 03:25pm — Coffee break
- 03:25pm - 04:05pm — Keynote 4: Vasileios Balntas
- 04:05pm - 04:45pm — Keynote 5: Angel Xuan Chang
- 04:45pm - 05:25pm — Keynote 6: Daniel Barath
- 05:25pm - 05:55pm — Collaboration and Discussion Session (interactive panel)
- 05:55pm - 06:00pm — Closing Remarks
Keynote Speakers
Florent Lafarge
Florent Lafarge: Researcher, Inria
Florent Lafarge is a researcher at Inria in the Titane research group. His research spans computer vision, geometry processing, and remote sensing. He works on the analysis and geometric modeling of 3D environments from physical measurements, typically multi-view imagery and laser scanning. His favorite topics include surface reconstruction and approximation, city modeling, piecewise-planar geometry, and spatial point processes.
Matthias Nießner
Matthias Nießner: Professor, Technical University of Munich; Co-founder, Synthesia and SpAItial
Matthias Nießner is a Professor at the Technical University of Munich, where he leads the Visual Computing Lab. His work spans computer vision, graphics, and machine learning, focusing on 3D reconstruction, scene understanding, and AI-based video synthesis. He has authored 150+ papers and co-founded Synthesia Inc. and SpAItial to develop foundational models for 3D world understanding and generation.
Angel Xuan Chang
Angel Xuan Chang: Associate Professor, Simon Fraser University
Angel Xuan Chang's research focuses on connecting language to 3D representations of shapes and scenes and grounding language for embodied agents in indoor environments. She has worked on methods for synthesizing 3D scenes and shapes from natural language, and on datasets for 3D scene understanding.
Vasileios Balntas
Vasileios Balntas: Honorary Research Associate, Imperial College London
Vasileios Balntas is an Honorary Research Associate at Imperial College London. Previously, he was Senior Research Science Manager at Meta Reality Labs Research and Head of Research at Scape Technologies. His work spans 3D vision, localization, and scene understanding. At Meta, he contributed to SceneScript, a research method that represents and infers scene geometry with an autoregressive structured language model. SceneScript encodes egocentric images or point clouds (for example, from Project Aria) into a latent representation of the physical space, then decodes that representation into a compact parametric scene description, similar in spirit to CAD, that can be interpreted as a 3D layout with objects (walls, doors, windows, and object structure). The approach is learned end to end and was trained on large-scale synthetic indoor environments aligned with Project Aria sensing.
Marc Pollefeys
Marc Pollefeys: Professor, ETH Zurich
Marc Pollefeys has been a full professor in the Department of Computer Science at ETH Zurich since 2007, where he leads the Computer Vision and Geometry lab. He is also the director of the Microsoft Spatial AI Lab in Zurich, heading a team of scientists working on spatial perception algorithms for AI assistants and robotics. He was previously with the Department of Computer Science at the University of North Carolina at Chapel Hill, where he started as an assistant professor in 2002 and became an associate professor in 2005. Before that, he was a postdoctoral researcher at the Katholieke Universiteit Leuven in Belgium, where he also received his M.S. and Ph.D. degrees in 1994 and 1999, respectively. His main area of research is computer vision, but he is also active in robotics, machine learning, and computer graphics. One of his main research goals is to develop flexible approaches to capture visual representations of real-world objects, scenes, and events. Dr. Pollefeys has received several prizes for his research, including a Marr Prize, an NSF CAREER Award, a Packard Fellowship, and a European Research Council Starting Grant. He is the author or co-author of more than 300 peer-reviewed publications. He was General Chair of ICCV 2019 and ECCV 2014 and Program Co-Chair of CVPR 2009. Prof. Pollefeys has served on the editorial boards of the IEEE Transactions on Pattern Analysis and Machine Intelligence, the International Journal of Computer Vision, and Foundations and Trends in Computer Graphics and Computer Vision. Several of Prof. Pollefeys' … United States. He is a Fellow of the IEEE and the ACM.
Daniel Barath
Daniel Barath: Senior Researcher, Computer Vision and Geometry Group, ETH Zürich
Daniel Barath was a member of the Visual Recognition Group at FEE, Czech Technical University in Prague, Czech Republic, and of the Machine Perception Research Laboratory at the Institute for Computer Science and Control (HUN-REN SZTAKI), Budapest, Hungary, until 2021. He is currently a senior researcher in the Computer Vision and Geometry Group at ETH Zürich and a visiting researcher in the Semantic Perception Group at Google. His research interests are robust model estimation, minimal methods, and scene reconstruction and understanding in computer vision. He has co-organized tutorials at CVPR 2020 and 2022, ICPR 2020, ICCV 2023, and 3DV 2024.
Accepted Papers
CVPRW Proceedings
- City-Mesh3R: Simulation-Ready City-Scale 3D Mesh Reconstruction from Multi-View Images
  Sayan Paul; Sourav Ghosh; Siddharth Katageri; Soumyadip Maity; Sanjana Sinha; Brojeshwar Bhowmick
- DALES 2: A Renovated Aerial LiDAR Benchmark for 3D Scene Understanding
  Moussa Bendjilali; Claire Peyran; Kaaviya Velumani; Antoine Mauri; Nicola Luminari; Pierre Alliez
- EditSSC: Toward Editable Semantic Occupancy Scenes with Unconditional Diffusion Models
  Fatima BALDE; Raoul de Charette; Alexandre Boulch
- GeoPriorPC: Nadir-view to 3D Point Cloud Reconstruction for buildings via Two-Stage Diffusion Priors
  Youssef Korny; Sunghwan Yoo; Daniel Panangian; Ksenia Bittner; Andreas Wichmann; Gunho Sohn
- GS4City: Hierarchical Semantic Gaussian Splatting via City-Model Priors
  Qilin Zhang; Jinyu Zhu; Olaf Wysocki; Benjamin Busam; Boris Jutzi
- Text Reconstruction in 3D Scenes, An Empirical Study of Gaussian Splatting vs. Neural Radiance Fields
  Agastya Todi; Pasan Gunawardena; Zhengming Yu; Wenping Wang
Non-archival
- MeshSplatting: Differentiable Rendering with Opaque Meshes
  Jan Held
- Smaller and Faster 3DGS via Post-Training Dictionary Learning
  Jiarong Gong; Jonas Unger; Ehsan Miandji
- SynGlSat: Geographically Equitable Synthesis of Global Satellite Imagery
  Umur Ciftci; Liam Melchior; Ilke Demir
Call for Papers
We invite submissions of original research related to structured, semantic, and synthetic 3D reconstruction and modeling of human environments. Topics of interest include, but are not limited to:
- Structured 3D reconstruction/modeling of human environments from sparse, noisy, or partial point clouds and images
- Semantic, instance, and panoptic segmentation and parsing of 3D point clouds and images
- Fusion of images and point clouds to improve the structure of human-centric 3D scene models
- Structured representations of 3D scenes (e.g., CAD, B-Rep, wireframe, procedural models)
- Learning priors for structured 3D modeling and structural consistency
- Generative models for image generation and realistic textures
- Intersection of world models and 3D reconstruction for realistic generation from sparse 3D data
- Multiview 3D matching and registration for complex spaces
- Pose estimation and structured 3D recovery from sparse image sets
- Differentiable rendering and occlusion reasoning in human environments
- Cross-disciplinary influence of 3D representations for built environments
- Benchmarks and datasets for large-scale 3D modeling of human environments
We will accept submissions in two tracks: extended abstracts (≤4 pages) and full papers (≤8 pages), both in the standard CVPR format. Accepted submissions will be presented as posters, and some will be selected for spotlight talks.
Where to Submit
Submission site: https://cmt3.research.microsoft.com/USM2026
Important Dates
- Paper submission deadline: March 24, 2026 (Anywhere on Earth)
- Notification to authors: April 1, 2026
- Camera-ready deadline (to USM3D): April 8, 2026
Challenges
S23DR Challenge (HoHo)
The S23DR challenge involves recovering 3D models with structural details from multi-view images captured at ground level via mobile devices. The HoHo dataset consists of 26k anonymized US house exteriors with structured annotations, wireframes, sparse point clouds, camera poses, and derived semantic signals. Raw images are not released due to licensing and privacy constraints.
Evaluation
Participants output a spatial graph for each house, with edge semantics defined by neighboring faces. As in 2024 and 2025, evaluation runs on a private Hugging Face server, with separate test sets for the public and private leaderboards. Submissions are scored with the Hybrid Structure Score metric.
Important Dates
March 13, 2026: Dataset release and competition start
May 25, 2026: Submission deadline
Winners announced: At workshop
Prize fund: $12 000
Dataset: HoHo2026
Competition: S23DR 2026
Building3D Challenge
We are delighted to announce that the Building3D Challenge, held as part of the 3rd USM3D Workshop, will feature a newly released dataset: BuildingWorld. BuildingWorld is a comprehensive, structured 3D building dataset designed to address the limited stylistic diversity of existing building datasets. It covers buildings from geographically and architecturally diverse regions, including North America, Europe, Asia, Africa, and Oceania, offering a globally representative dataset for urban-scale foundation modeling and analysis. Specifically, BuildingWorld provides about five million LOD2 building models collected from diverse sources, accompanied by real and simulated airborne LiDAR point clouds.
In this challenge, participants are expected to train or build their models using the BuildingWorld dataset. All submissions must be made through the Hugging Face platform. The total prize pool remains at $2 000, and only submissions that outperform the baseline method will be considered for monetary awards.
Important Dates
March 13, 2026: Dataset release and competition start
May 25, 2026: Submission deadline
Winners announced: At workshop
Datasets: BuildingWorld and HuggingFace BuildingWorld
Competition website: Building3D Challenge 2026
Sponsorships
We are actively recruiting sponsors (including corporate and institutional partners). If you are interested in sponsoring USM3D 2026, please email usm3d@jackml.com.
Organizers
Ruisheng Wang, Professor
Shenzhen University
Professor at Shenzhen University with research in photogrammetry and computer vision, focused on large-scale urban modeling. Recipient of the ISPRS Samuel Gamble Award and multiple industry awards.
Jack Langerman, Applied Researcher
Apple
Applied researcher at Apple focused on interpretability, steerability, and alignment. Previously led structured geometry research at HOVER and founded the Generative Machine Learning Group.
Dmytro Mishkin
HOVER Inc. / FEE, Czech Technical University in Prague
Researcher at HOVER and Czech Technical University in Prague.
Ilke Demir, Founder & CEO
Cauth AI
Founder and CEO of Cauth AI, previously at Intel Labs. Her work spans proceduralization of 3D data, trusted media, and generative research in graphics and vision.
Tolga Birdal, Assistant Professor
Imperial College London
UKRI Future Leaders Fellow researching geometric machine learning and 3D computer vision, with a focus on geometric inference and learning.
Sean (Xiang) Ma, Head of Research
Amazon Web Services
Head of research and senior applied science manager at AWS, with extensive experience in mapping, localization, and autonomous driving.
Yang Wang, Associate Professor
Concordia University
Associate professor at Concordia University, with prior roles at the University of Manitoba and at Huawei Canada (chief scientist in computer vision).
Shangfeng Huang, Researcher
University of Calgary
Researcher focusing on 3D building reconstruction from aerial point clouds for digital twins, with expertise in wireframe and B-rep reconstruction.
Yuzhong Huang, Senior CV Engineer
HOVER Inc.
Senior computer vision engineer at HOVER with research interests in CAD reconstruction from sensor data and images. Co-organized USM3D 2025.
Links
This is a CVPR 2026 workshop
The Microsoft CMT service was used for managing the peer-reviewing process for this conference. This service was provided for free by Microsoft and they bore all expenses, including costs for Azure cloud services as well as for software development and support.
