
nvidia/PhysicalAI-WorldModel-Synthetic-Autonomous-Driving-Scenarios

General · nvidia · 158.9K downloads
other · 7.5 TB · language:en · license:other · size_categories:100K<n<1M · modality:video · region:us


# download instantly
mlforge datasets pull nvidia/PhysicalAI-WorldModel-Synthetic-Autonomous-Driving-Scenarios

Dataset details

Task: General
Language: en
License: other
Size: 7.5 TB
Creator: nvidia
Downloads: 158.9K
Source: huggingface_datasets
Updated: 2026-06-09
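The card lists a total size of 7.5 TB and the size category "100K<n<1M". Together these give a rough bound on average storage per dataset row, which helps when planning disk space or a partial download. The sketch below rests on two assumptions. First, "TB" is taken as decimal terabytes (10^12 bytes). Second, a "row" is whatever unit the size category counts, which may be a multi-camera clip, a single camera file, or something else; the card does not say.

```python
# Back-of-the-envelope average size per dataset row, from the card's
# "7.5 TB" size and "100K<n<1M" size category.
# Assumptions: TB = 10**12 bytes; a "row" is whatever the size category
# counts (not necessarily one multi-camera clip).
TOTAL_BYTES = 7.5e12
MIN_ROWS, MAX_ROWS = 100_000, 1_000_000


def avg_bytes_per_row(total_bytes: float, n_rows: int) -> float:
    """Average storage per row if total_bytes is spread evenly over n_rows."""
    return total_bytes / n_rows


upper = avg_bytes_per_row(TOTAL_BYTES, MIN_ROWS)  # fewest rows -> largest average
lower = avg_bytes_per_row(TOTAL_BYTES, MAX_ROWS)  # most rows -> smallest average
print(f"average per row: {lower / 1e6:.1f} MB to {upper / 1e6:.1f} MB")
# prints "average per row: 7.5 MB to 75.0 MB"
```

Because the size category is a strict range, the true average lies roughly between these two values. It is only an average: individual clips can be much larger or smaller.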

About nvidia/PhysicalAI-WorldModel-Synthetic-Autonomous-Driving-Scenarios

Dataset Description: PhysicalAI-WorldModel-Synthetic-Autonomous-Driving-Scenarios is a large-scale synthetic video dataset of autonomous-driving scenes generated with NVIDIA's internal Omniverse simulation platform. Each clip is a temporally consistent multi-camera surround capture of one ego vehicle and the surrounding traffic participants, paired with per-camera VLM captions.

The dataset is designed to fill gaps in real-world driving data along two axes:
(1) Targeted long-tail coverage of safety-critical scenarios: emergency-vehicle interactions, nudging around parked obstacles, cut-ins from adjacent lanes, weather-degraded visibility, and pedestrian crossings with non-standard trajectories. These scenarios are authored declaratively from natural-language prompts via a Scenario Agent.
(2) Environment variation: each authored scenario is expanded into deterministic permutations over time of day, cloud coverage, visibility, road material, and vehicle and pedestrian asset choices, so the same underlying interaction is observed under varied environment conditions.

As a fully synthetic dataset, PhysicalAI-WorldModel-Synthetic-Autonomous-Driving-Scenarios exhibits a sim-to-real appearance gap relative to real driving footage. A subset of authored agent behaviors may also appear unnatural (e.g., emergency vehicles cutting through dense traffic when open space is available nearby); refining behavior priors during scenario authoring is an ongoing area of work. Users training safety-critical autonomy systems should pair this dataset with real-world driving data and validate behaviors in real-
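Axis (2) describes expanding one authored scenario into deterministic permutations of its environment. The sketch below shows one way such an expansion could work. It takes a Cartesian product over environment axes, then optionally draws a subsample seeded by the scenario ID, so the same inputs always yield the same variants. The axis names, their values, `Variant`, and `expand_scenario` are all hypothetical. The card does not document how NVIDIA's pipeline enumerates or samples permutations.

```python
from __future__ import annotations

import itertools
import random
from dataclasses import dataclass

# Hypothetical axes and values, for illustration only; the generator's real
# axes, values, and permutation scheme are not documented on this card.
ENV_AXES = {
    "time_of_day": ["dawn", "noon", "dusk", "night"],
    "cloud_coverage": ["clear", "overcast"],
    "visibility": ["clear", "fog"],
    "road_material": ["asphalt", "concrete"],
    "asset_set": ["assets_a", "assets_b"],  # stand-in for vehicle/pedestrian asset choices
}


@dataclass(frozen=True)
class Variant:
    scenario_id: str
    env: tuple[tuple[str, str], ...]  # (axis, value) pairs, axes sorted by name


def expand_scenario(
    scenario_id: str,
    axes: dict[str, list[str]] = ENV_AXES,
    max_variants: int | None = None,
    seed: int = 0,
) -> list[Variant]:
    """Expand one scenario into environment permutations, deterministically."""
    names = sorted(axes)  # fixed axis order -> stable enumeration order
    combos = [
        tuple(zip(names, values))
        for values in itertools.product(*(axes[name] for name in names))
    ]
    if max_variants is not None and max_variants < len(combos):
        # A string seed is hashed deterministically by random.Random, so the
        # same (scenario_id, seed) always selects the same subset.
        rng = random.Random(f"{scenario_id}:{seed}")
        combos = rng.sample(combos, max_variants)
    return [Variant(scenario_id, combo) for combo in combos]


full = expand_scenario("cut_in_001")
print(len(full))  # prints 64 (4 * 2 * 2 * 2 * 2)
subset = expand_scenario("cut_in_001", max_variants=8, seed=1)
```

Every variant shares the same `scenario_id`, so the underlying interaction stays fixed while only environment conditions change. The full product grows multiplicatively with each added axis. A seeded subsample is one way to cap the count per scenario without losing reproducibility.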