3rd Workshop on Urban Scene Modeling: Structured, Semantic, and Synthetic 3D Habitats

3rd Workshop on Urban Scene Modeling — June 3, Full Day, Mile High 3B

Structured, Semantic, and Synthetic 3D Habitats (USM3D) - CVPR 2026

The 3rd Urban Scene Modeling (USM3D) Workshop at CVPR 2026 focuses on methods that reconstruct, structure, and semantically organize real-world built environments from heterogeneous visual and 3D data. We emphasize scene representations that move beyond raw point clouds and meshes toward high-level, editable, and task-ready models.

Following the success of the 2024 and 2025 editions, USM3D 2026 aims to bridge state-of-the-art 3D scene modeling with structured, semantic 3D reconstruction by bringing together researchers across photogrammetry, computer vision, generative models, learned representations, and computer graphics. The workshop includes invited talks, a peer-reviewed paper track, and public challenges on large-scale urban datasets to support benchmarking, reproducible pipelines, and integration across vision, graphics, photogrammetry, and 3D learning.

News

June 10, 2026: Archive link with all years of S23DR results and writeups.
June 3, 2026: Challenge winners announced! S23DR winners | Building3D winners
May 18, 2026: Room number available: Mile High 3B.
May 4, 2026: Accepted papers list is available on this website.
March 13, 2026: Competitions started! 🎉
December 22, 2025: 2026 proposal accepted! 🎉

Workshop Schedule

09:00am - 09:10am — Welcome and introduction
09:10am - 09:50am — Keynote 1: Matthias Nießner
09:50am - 10:30am — Building3D Challenge (Winner talks)
10:30am - 10:45am — Coffee break
10:45am - 11:25am — Keynote 2: Florent Lafarge
Title: Two Decades of 3D Building Reconstruction: Paradigms, Progress, and Prospects
11:25am - 12:05pm — S23DR Challenge (Winner talks)
12:05pm - 01:00pm — Lunch
01:00pm - 01:40pm — Keynote 3: Marc Pollefeys
01:40pm - 03:10pm — Paper Oral Presentation
03:10pm - 03:25pm — Coffee break
03:25pm - 04:05pm — Keynote 4: Vasileios Balntas
04:05pm - 04:45pm — Keynote 5: Angel Xuan Chang
04:45pm - 05:25pm — Keynote 6: Daniel Barath
05:25pm - 05:55pm — Collaboration and Discussion Session (≈ interactive panel)
05:55pm - 06:00pm — Closing Remarks

Keynote Speakers

Florent Lafarge

Florent Lafarge: Researcher, Inria

Florent Lafarge is a researcher at Inria in the Titane research group. His research spans computer vision, geometry processing, and remote sensing. He works on the analysis and geometric modeling of 3D environments from physical measurements, typically multi-view imagery and laser scanning. His favorite topics include surface reconstruction and approximation, city modeling, piecewise-planar geometry, and spatial point processes.

Matthias Nießner

Matthias Nießner: Professor, Technical University of Munich; Founder, Synthesia; SpAItial

Matthias Nießner is a Professor at the Technical University of Munich, where he leads the Visual Computing Lab. His work spans computer vision, graphics, and machine learning, focusing on 3D reconstruction, scene understanding, and AI-based video synthesis. He has authored 150+ papers and co-founded Synthesia Inc. and SpAItial to develop foundational models for 3D world understanding and generation.

Angel Xuan Chang

Angel Xuan Chang: Associate Professor, Simon Fraser University

Angel Xuan Chang's research focuses on connecting language to 3D representations of shapes and scenes and grounding language for embodied agents in indoor environments. She has worked on methods for synthesizing 3D scenes and shapes from natural language, and on datasets for 3D scene understanding.

Vasileios Balntas

Vasileios Balntas: Honorary Research Associate, Imperial College London

Vasileios Balntas is an Honorary Research Associate at Imperial College London. Previously he was Senior Research Science Manager at Meta Reality Labs Research and Head of Research at Scape Technologies. His work spans 3D vision, localization, and scene understanding. At Meta he contributed to SceneScript, a research method that represents and infers scene geometry with an autoregressive structured language model. SceneScript takes egocentric images or point clouds (for example from Project Aria), encodes them into a latent representation of the physical space, and decodes that into a compact parametric scene description, similar in spirit to CAD, which can be interpreted as 3D layout and objects (walls, doors, windows, and object structure). The approach uses end-to-end learning and was trained on large-scale synthetic indoor environments aligned with Project Aria sensing.

Marc Pollefeys

Marc Pollefeys: Professor, ETH Zurich

Marc Pollefeys has been a full professor in the Dept. of Computer Science of ETH Zurich since 2007, where he leads the Computer Vision and Geometry lab. He is also the director of the Microsoft Spatial AI Lab in Zurich, heading a team of scientists working on spatial perception algorithms for AI assistants and robotics. He was previously associated with the Dept. of Computer Science of the University of North Carolina at Chapel Hill, where he started as an assistant professor in 2002 and became an associate professor in 2005. Before this he was a postdoctoral researcher at the Katholieke Universiteit Leuven in Belgium, where he also received his M.S. and Ph.D. degrees in 1994 and 1999, respectively. His main area of research is computer vision, but he is also active in robotics, machine learning, and computer graphics. One of his main research goals is to develop flexible approaches to capture visual representations of real-world objects, scenes, and events. Dr. Pollefeys has received several prizes for his research, including a Marr prize, an NSF CAREER award, a Packard Fellowship, and a European Research Council Starting Grant. He is the author or co-author of more than 300 peer-reviewed publications. He was the General Chair of ICCV 2019 and ECCV 2014 and Program Co-Chair for CVPR 2009. Prof. Pollefeys has served on the Editorial Board of the IEEE Transactions on Pattern Analysis and Machine Intelligence, the International Journal of Computer Vision, and Foundations and Trends in Computer Graphics and Computer Vision. Several of Prof. Poll[...] United States. He is a fellow of the IEEE and ACM.

Daniel Barath

Daniel Barath: Senior researcher of the Computer Vision and Geometry Group, ETH Zürich

Daniel Barath was a member of the Visual Recognition Group, FEE, Czech Technical University, Prague, Czech Republic, and the Machine Perception Research Laboratory at the Institute for Computer Science and Control (HUN-REN SZTAKI), Budapest, Hungary until 2021. Currently, he is a senior researcher of the Computer Vision and Geometry Group at ETH Zürich and a visiting researcher at Google in the Semantic Perception Group. His research interests are robust model estimation, minimal methods, scene reconstruction and understanding in computer vision. He has co-organized tutorials on various topics at CVPR 2020, 2022, ICPR 2020, ICCV 2023, and 3DV 2024.

Accepted Papers

CVPRW Proceedings

City-Mesh3R: Simulation-Ready City-Scale 3D Mesh Reconstruction from Multi-View Images
Sayan Paul; Sourav Ghosh; Siddharth Katageri; Soumyadip Maity; Sanjana Sinha; Brojeshwar Bhowmick
DALES 2: A Renovated Aerial LiDAR Benchmark for 3D Scene Understanding
Moussa Bendjilali; Claire Peyran; Kaaviya Velumani; Antoine Mauri; Nicola Luminari; Pierre Alliez
EditSSC: Toward Editable Semantic Occupancy Scenes with Unconditional Diffusion Models
Fatima BALDE; Raoul de Charette; Alexandre Boulch
GeoPriorPC: Nadir-view to 3D Point Cloud Reconstruction for buildings via Two-Stage Diffusion Priors
Youssef Korny; Sunghwan Yoo; Daniel Panangian; Ksenia Bittner; Andreas Wichmann; Gunho Sohn
GS4City: Hierarchical Semantic Gaussian Splatting via City-Model Priors
Qilin Zhang; Jinyu Zhu; Olaf Wysocki; Benjamin Busam; Boris Jutzi
Text Reconstruction in 3D Scenes, An Empirical Study of Gaussian Splatting vs. Neural Radiance Fields
Agastya Todi; Pasan Gunawardena; Zhengming Yu; Wenping Wang

Non-archival

MeshSplatting: Differentiable Rendering with Opaque Meshes
Jan Held
Smaller and Faster 3DGS via Post-Training Dictionary Learning
Jiarong Gong; Jonas Unger; Ehsan Miandji
SynGlSat: Geographically Equitable Synthesis of Global Satellite Imagery
Umur Ciftci; Liam Melchior; Ilke Demir

Call for Papers

We invite submissions of original research related to structured, semantic, and synthetic 3D reconstruction and modeling of human environments. Topics of interest include, but are not limited to:

Structured 3D reconstruction/modeling of human environments from sparse, noisy, or partial point clouds and images
Semantic, instance, and panoptic segmentation and parsing of 3D point clouds and images
Fusion of images and point clouds to improve structure of human-centric 3D scene modeling
Structured representations of 3D scenes (e.g., CAD, B-Rep, wireframe, procedural models)
Learning priors for structured 3D modeling and structural consistency
Generative models for image generation and realistic textures
Intersection of world models and 3D reconstruction for realistic generation with sparse 3D
Multiview 3D matching and registration for complex spaces
Pose estimation and structured 3D recovery from sparse image sets
Differentiable rendering and occlusion reasoning in human environments
Cross-disciplinary influence of 3D representations for built environments
Benchmarks and datasets for large-scale 3D modeling of human environments

We will accept submissions on two tracks: extended abstracts (≤4 pages) and full papers (≤8 pages) in the standard CVPR format. Accepted submissions will be presented as posters, and some will be selected for spotlight talks.

Where to Submit

Submission site: https://cmt3.research.microsoft.com/USM2026

Important Dates

Paper submission deadline: March 24, 2026 (Anywhere on Earth)
Notification to authors: April 1, 2026
Camera-ready deadline (to USM3D): April 8, 2026

Challenges

S23DR Challenge (HoHo)

The S23DR challenge involves recovering 3D models with structural details from multi-view images captured at ground level via mobile devices. The HoHo dataset consists of 26k anonymized US house exteriors with structured annotations, wireframes, sparse point clouds, camera poses, and derived semantic signals. Raw images are not released due to licensing and privacy constraints.

Evaluation

Participants output a spatial graph for each challenge, with edge semantics defined by neighboring faces. As in 2024 and 2025, evaluation takes place on a private HuggingFace server using separate test sets for public and private leaderboards. The Hybrid Structure Score metric will be used for evaluation.

Important Dates

March 13, 2026: Dataset release and competition start
May 25, 2026: Submission deadline
Winners announced: At workshop
Prize fund: $12 000

Dataset: HoHo2026

Competition: S23DR 2026

Building3D Challenge

We are delighted to announce that the Building3D Challenge, held as part of the 3rd USM3D Workshop, will feature a newly released dataset—BuildingWorld. BuildingWorld is a comprehensive and structured 3D building dataset designed to bridge the gap in stylistic diversity. It encompasses buildings from geographically and architecturally diverse regions—including North America, Europe, Asia, Africa, and Oceania—offering a globally representative dataset for urban-scale foundation modeling and analysis. Specifically, BuildingWorld provides about five million LOD2 building models collected from diverse sources, accompanied by real and simulated airborne LiDAR point clouds.

In this challenge, participants are expected to train or build their models using the BuildingWorld dataset. All submissions must be made through the Hugging Face platform. The total prize pool remains at $2 000, and only submissions that outperform the baseline method will be considered for monetary awards.

Important Dates

March 13, 2026: Dataset release and competition start
May 25, 2026: Submission deadline
Winners announced: At workshop

Datasets: BuildingWorld and HuggingFace BuildingWorld

Competition website: Building3D Challenge 2026

Sponsorships

We are actively recruiting sponsors (including corporate and institutional partners). If you are interested in sponsoring USM3D 2026, please email usm3d@jackml.com.

Organizers

Ruisheng Wang: Professor, Shenzhen University

Professor at Shenzhen University with research in photogrammetry and computer vision, focused on large-scale urban modeling. Recipient of the ISPRS Samuel Gamble Award and multiple industry awards.

Jack Langerman: Applied Researcher, Apple

Applied researcher at Apple focused on interpretability, steerability, and alignment. Previously led structured geometry research at Hover and founded the Generative Machine Learning Group.

Dmytro Mishkin: HOVER Inc. / FEE, Czech Technical University in Prague

Researcher at HOVER and Czech Technical University in Prague.

Ilke Demir: Founder & CEO, Cauth AI

Founder and CEO of Cauth AI, previously at Intel Labs. Her work spans proceduralization of 3D data, trusted media, and generative research in graphics and vision.

Tolga Birdal: Assistant Professor, Imperial College London

UKRI Future Leaders Fellow researching geometric machine learning and 3D computer vision, with a focus on geometric inference and learning.

Sean (Xiang) Ma: Head of Research, Amazon Web Services

Head of research and senior applied science manager at AWS, with extensive experience in mapping, localization, and autonomous driving.

Yang Wang: Associate Professor, Concordia University

Associate professor at Concordia University, with prior roles at the University of Manitoba and at Huawei Canada as chief scientist in computer vision.

Shangfeng Huang: Researcher, University of Calgary

Researcher focusing on 3D building reconstruction from aerial point clouds for digital twins, with expertise in wireframe and B-rep reconstruction.

Yuzhong Huang: Senior CV Engineer, HOVER Inc.

Senior computer vision engineer at HOVER with research interests in CAD reconstruction from sensor data and images. Co-organized USM3D 2025.
