nvidia/PhysicalAI-WorldModel-Synthetic-Autonomous-Driving-Scenarios
mlforge datasets pull nvidia/PhysicalAI-WorldModel-Synthetic-Autonomous-Driving-Scenarios
Dataset details
About nvidia/PhysicalAI-WorldModel-Synthetic-Autonomous-Driving-Scenarios
Dataset Description: PhysicalAI-WorldModel-Synthetic-Autonomous-Driving-Scenarios is a large-scale synthetic video dataset of autonomous-driving scenes generated with NVIDIA's internal Omniverse simulation platform. Each clip is a temporally consistent multi-camera surround capture of one ego vehicle and surrounding traffic participants, paired with per-camera VLM captions.

The dataset is designed to fill gaps in real-world driving data along two axes: (1) targeted long-tail coverage of safety-critical scenarios (emergency-vehicle interactions, nudging around parked obstacles, cut-ins from adjacent lanes, weather-degraded visibility, and pedestrian crossings with non-standard trajectories), authored declaratively from natural-language prompts via a Scenario Agent; and (2) environment variation, where each authored scenario is expanded into deterministic permutations over time of day, cloud coverage, visibility, road material, and vehicle and pedestrian asset choices, so the same underlying interaction is observed under varied environment conditions.

As a fully synthetic dataset, PhysicalAI-WorldModel-Synthetic-Autonomous-Driving-Scenarios exhibits a sim-to-real appearance gap relative to real driving footage. A subset of authored agent behaviors may also appear unnatural (e.g., emergency vehicles cutting through dense traffic when open space is available nearby); refining behavior priors during scenario authoring is an ongoing area of work. Users training safety-critical autonomy systems should pair this dataset with real-world driving data and validate behaviors in real-