Source robotics video data. Enrich it. Contract verified contributors.
HumanoidLayer connects robotics teams with reviewed data providers, annotators, and QA reviewers, turning raw video and robot logs into licensed, labeled, buyer-ready physical AI datasets.
Built for humanoids, manipulation, teleoperation, egocentric video, RGB-D, robot learning workflows, annotation QA, and procurement-aware data contracts.
Buyer demand
Open structured data briefs by task, modality, format, budget, and license constraints.
Platform QA
Review annotations, source rights, commercial flags, sensitive content, and dataset readiness.
Contributor network
Route collection, annotation, enrichment, and review work to verified contributors.
Dataset owners and annotators can join the contributor network for reviewed collection, labeling, QA, and future revenue-share workflows.
Modality
Robot type
Task
License
Environment
DROID
phase labels recommended
Format
RLDS
License
CC-BY 4.0
Enrichment
ready
ALOHA
bimanual segmentation ready
Format
LeRobot
License
Apache 2.0
Enrichment
ready
Open X-Embodiment
subset screening
Format
RLDS
License
Mixed
Enrichment
ready
A three-sided workflow for robotics data supply, annotation, QA, and buyer contracts.
The new HumanoidLayer MVP shows the full supply chain: providers bring data, annotators enrich it, reviewers verify it, and robotics teams buy or request the exact data they need.
Providers upload or collect raw video data
Egocentric video, RGB-D, teleoperation, robot logs, manipulation demos, and source links enter a reviewed supply pipeline.
Annotators enrich inside HumanoidLayer
Object labels, action phases, task metadata, success states, language fields, and format tags are created in the workspace.
Reviewers verify quality and rights
QA review, license checks, source attribution, sensitive-content flags, and version readiness become buyer trust signals.
Robotics teams contract for buyer-ready data
Teams search existing datasets or open data briefs for task-specific collection through the platform.
raw data
Supply intake
Providers and collectors upload source links, large video files, robot logs, and sample clips.
enrichment
Annotation workspace
Annotators add object labels, action phases, task metadata, temporal segments, and instruction fields.
trust layer
QA and licensing
Reviewers validate labels, source rights, commercial posture, sensitive content, and dataset readiness.
contracts
Buyer marketplace
Robotics teams discover verified data products or open briefs for new task-specific collection.
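As a rough illustration of the temporal segments added in the annotation workspace, the sketch below groups per-frame action-phase labels into contiguous segments. The function name, the phase names, and the frame rate are assumptions made for this example, not part of any HumanoidLayer API.

```python
from itertools import groupby


def segment_phases(frame_labels, fps=30.0):
    """Collapse per-frame phase labels into contiguous segments.

    Hypothetical helper: each segment records the phase, the frame range
    (end exclusive), and the matching time span in seconds.
    """
    segments = []
    start = 0
    for phase, run in groupby(frame_labels):
        length = len(list(run))
        end = start + length
        segments.append({
            "phase": phase,
            "start_frame": start,
            "end_frame": end,
            "start_s": start / fps,
            "end_s": end / fps,
        })
        start = end
    return segments


# Illustrative labels for a short pick-and-place clip captured at 10 fps.
labels = ["reach"] * 3 + ["grasp"] * 2 + ["place"] * 5
for seg in segment_phases(labels, fps=10.0):
    print(seg["phase"], seg["start_frame"], seg["end_frame"])
```

Run-length grouping like this only works when frame labels already exist. Real demonstrations would typically need an annotator or a model to produce those labels first.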
Built for the questions robotics teams ask before they can use data.
A useful robotics dataset is not just a link. Buyers need provenance, licensing posture, schema clarity, enrichment options, and a credible path from discovery to access.
Provenance before procurement
Every dataset card foregrounds source organization, official links, review status, and attribution requirements.
License clarity by default
Commercial suitability, non-commercial flags, mixed subsets, and link-only states are explicit instead of buried.
Structured access workflows
Buyer CTAs route to access, enrichment, source review, or custom collection requests with clear next steps.
Format normalization
Datasets are compared and delivered around robotics formats such as LeRobot, HDF5, RLDS, WebDataset, and Parquet.
Enrichment scopes
Labels, metadata, segmentation, QA, and conversion are framed as data infrastructure work, not generic annotation labor.
A path when open data falls short
Custom collection briefs capture task, embodiment, environment, modality, timeline, and budget signals for real pilots.
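One way to picture a dataset card that makes provenance and license posture explicit is the minimal sketch below. The class names, field names, and posture labels are hypothetical, chosen to mirror the wording above. The review statuses in the examples are illustrative and not actual review outcomes.

```python
from dataclasses import dataclass, field
from enum import Enum


class LicensePosture(Enum):
    # Hypothetical labels mirroring the states named on this page.
    COMMERCIAL = "commercial"
    NON_COMMERCIAL = "non-commercial"
    MIXED = "mixed"
    LINK_ONLY = "link-only"


@dataclass
class DatasetCard:
    name: str
    source_org: str
    license_name: str
    posture: LicensePosture
    review_status: str = "pending"        # assumed values: "pending" or "reviewed"
    attribution_required: bool = True
    official_links: list = field(default_factory=list)
    formats: list = field(default_factory=list)


def can_offer_for_commercial_use(card: DatasetCard) -> bool:
    """Conservative check: only reviewed cards with a commercial posture pass."""
    return card.review_status == "reviewed" and card.posture is LicensePosture.COMMERCIAL


# Illustrative cards; review_status values are assumptions for the example.
bridge = DatasetCard("BridgeData V2", "UC Berkeley", "CC-BY 4.0",
                     LicensePosture.COMMERCIAL, review_status="reviewed",
                     formats=["TFDS", "raw"])
oxe = DatasetCard("Open X-Embodiment", "Open X-Embodiment collaboration", "Mixed",
                  LicensePosture.MIXED, review_status="reviewed", formats=["RLDS"])

print(can_offer_for_commercial_use(bridge))  # True
print(can_offer_for_commercial_use(oxe))     # False
```

Treating mixed and link-only postures as not commercially offerable by default keeps the check conservative. A subset-level review could relax that for individual subsets.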
Robotics data is scattered, inconsistent, and hard to use.
Raw data is rarely the limiting asset by itself. Searchability, licensing, format alignment, task metadata, and annotations decide whether a robotics team can actually use it.
Scattered sources
Open robotics datasets live across labs, Hugging Face, GitHub, university websites, and private links.
Inconsistent formats
Teams waste time converting between HDF5, RLDS, LeRobot, WebDataset, TFDS, Parquet, and custom schemas.
Missing annotations
Many datasets lack task labels, object labels, action phases, success/failure tags, scene metadata, license clarity, or QA signals.
Data is the new oil. HumanoidLayer is the refinery.
Raw data alone is not enough. Robotics teams need searchable, licensed, formatted, annotated, and model-ready datasets. HumanoidLayer turns raw robotics data into usable physical AI fuel.
Stage 1
Raw robotics data
Stage 2
Structured dataset catalog
Stage 3
Enriched training-ready data products
Discover
Find relevant public and private robotics datasets by task, embodiment, modality, format, and license.
Verify
Track source attribution, license terms, commercial flags, and subset-level restrictions.
Normalize
Turn inconsistent metadata into comparable dataset cards, manifests, and schema notes.
Annotate
Add labels for objects, actions, task phases, success states, affordances, and scene metadata.
Enrich
Run QA, deduplicate data, generate instructions, and prepare data for model workflows.
Deliver
Package robotics-ready datasets in LeRobot, HDF5, RLDS, WebDataset, Parquet, or custom formats.
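The Normalize step can be sketched as a small mapping from free-text format fields to canonical format names. The alias table and the function name below are assumptions for illustration. A real catalog would likely need a longer alias list and human review of anything that falls into "custom".

```python
# Hypothetical alias table: lowercase tag -> canonical format name.
CANONICAL_FORMATS = {
    "lerobot": "LeRobot",
    "hdf5": "HDF5",
    "h5": "HDF5",
    "rlds": "RLDS",
    "webdataset": "WebDataset",
    "wds": "WebDataset",
    "parquet": "Parquet",
    "tfds": "TFDS",
    "mp4": "MP4",
}


def normalize_formats(raw: str) -> list:
    """Split a free-text format field and map each tag to a canonical name.

    Unknown tags become "custom"; duplicates are dropped, order is preserved.
    """
    result = []
    for part in raw.replace(",", "/").split("/"):
        tag = part.strip()
        if not tag:
            continue
        canonical = CANONICAL_FORMATS.get(tag.lower(), "custom")
        if canonical not in result:
            result.append(canonical)
    return result


print(normalize_formats("TFDS / RLDS / LeRobot"))    # ['TFDS', 'RLDS', 'LeRobot']
print(normalize_formats("hdf5, h5, my-lab-schema"))  # ['HDF5', 'custom']
```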
For robotics and physical AI teams
Search from the task backwards: embodiment, modality, environment, license, format, and enrichment need.
Browse curated open datasets
Request dataset access packages
Request enrichment and additional annotations
Launch custom data collection pilots
Compare datasets by modality and license
Convert datasets into preferred formats
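Searching by format and license can be pictured as a plain filter over catalog entries. The sketch below uses format and license values from the starting catalog on this page, with DROID simplified to CC-BY 4.0. The `search` function and the dictionary layout are assumptions for illustration, not the platform's actual search API.

```python
# A few entries from the starting catalog, reduced to the fields being filtered.
CATALOG = [
    {"name": "DROID", "formats": {"TFDS", "RLDS", "LeRobot"}, "license": "CC-BY 4.0"},
    {"name": "BridgeData V2", "formats": {"TFDS", "raw"}, "license": "CC-BY 4.0"},
    {"name": "Open X-Embodiment", "formats": {"RLDS"}, "license": "Mixed"},
    {"name": "Egocentric-100K", "formats": {"WebDataset", "MP4"}, "license": "Apache 2.0"},
]


def search(catalog, fmt=None, licenses=None):
    """Return names of entries matching an optional format and license allow-list."""
    results = []
    for entry in catalog:
        if fmt is not None and fmt not in entry["formats"]:
            continue
        if licenses is not None and entry["license"] not in licenses:
            continue
        results.append(entry["name"])
    return results


print(search(CATALOG, fmt="RLDS"))
# ['DROID', 'Open X-Embodiment']
print(search(CATALOG, fmt="RLDS", licenses={"CC-BY 4.0", "Apache 2.0"}))
# ['DROID']
```

Passing an explicit license allow-list means a "Mixed" dataset never slips through a permissive-only search. It would need subset-level screening before it could match.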
Dataset categories we are indexing and refining
The initial taxonomy is built for robotics search, procurement, enrichment, and model workflow planning.
Humanoid manipulation
Pretraining and evaluating whole-body manipulation policies.
Modality
Video, RGB-D, proprioception
Example formats
LeRobot, HDF5, RLDS
Bimanual teleoperation
Learning coordinated two-arm tasks and teleop transfer.
Modality
Video, actions, proprioception
Example formats
HDF5, LeRobot
Egocentric human task video
Mining human workflows for task plans and hand-object cues.
Modality
POV video, metadata, camera intrinsics
Example formats
WebDataset, MP4, Parquet
RGB-D scene capture
Scene understanding, manipulation context, and 3D perception.
Modality
RGB-D, depth, camera pose
Example formats
TFDS, RLDS, custom
Robot arm manipulation
Policy training for grasp, place, push, and insert tasks.
Modality
Images, actions, states
Example formats
HDF5, RLDS, LeRobot
Dexterous hand and grasping
Grasp strategy, contact-rich control, and hand-object labels.
Modality
Video, tactile, force/torque
Example formats
HDF5, Parquet
Warehouse and logistics
Picking, packing, sorting, exception handling, and inspection.
Modality
Video, scans, task metadata
Example formats
MP4, WebDataset, Parquet
Kitchen and household tasks
Open-vocabulary household manipulation and long-horizon tasks.
Modality
RGB-D, actions, language
Example formats
RLDS, TFDS, LeRobot
Tool use
Learning tool affordances, sequence logic, and recovery behavior.
Modality
Video, hand-object labels, action phases
Example formats
HDF5, MP4, Parquet
Assembly and maintenance
Task phase segmentation and industrial workflow modeling.
Modality
Egocentric video, labels, metadata
Example formats
WebDataset, Parquet
Simulation and synthetic demos
Benchmarking, sim-to-real, and controlled policy comparison.
Modality
Simulation states, actions, images
Example formats
HDF5, benchmark-specific, custom
Navigation and SLAM
Embodied navigation, mapping, and scene memory evaluation.
Modality
Video, depth, poses, maps
Example formats
ROS bags, custom, Parquet
Curated starting catalog
Seeded with license-aware robotics datasets that can be searched, reviewed, enriched, or routed into access workflows.
DROID
DROID research consortium · v1.0.0
Large-scale in-the-wild robot manipulation dataset.
Type
In-the-wild robot manipulation
Size
76K+ trajectories, 350h
Format
TFDS / RLDS / LeRobot
License
CC-BY 4.0, verify against source
BridgeData V2
UC Berkeley · v1.0.0
Low-cost robot manipulation dataset.
Type
Low-cost robot manipulation
Size
60K trajectories
Format
TFDS / raw
License
CC-BY 4.0
Open X-Embodiment
Open X-Embodiment collaboration · v1.0.0
Cross-robot dataset across many robot embodiments.
Type
Cross-robot embodied dataset
Size
1M+ episodes, 22 robot types, 500+ skills
Format
RLDS
License
Mixed
ALOHA
ALOHA project community · v1.0.0
Bimanual teleoperation and mobile manipulation datasets.
Type
Bimanual teleoperation
Size
Varies by subset
Format
HDF5 / LeRobot
License
Apache 2.0 for selected LeRobot-hosted subsets
LIBERO
LIBERO research project · v1.0.0
Lifelong robot learning benchmark.
Type
Lifelong robot learning benchmark
Size
130 tasks, 6.5K demos
Format
HDF5 (simulation benchmark)
License
Open benchmark, verify before mirroring
RoboNet
RoboNet project · v1.0.0
Multi-robot manipulation dataset.
Type
Multi-robot manipulation
Size
15M frames, 7 robot platforms
Format
custom
License
Verify before mirroring
RoboMimic / MimicGen
RoboMimic and MimicGen communities · v1.0.0
Imitation learning framework and generated demonstration datasets.
Type
Imitation learning and generated demos
Size
50K+ demos
Format
HDF5
License
MIT for framework / verify dataset subset
Egocentric-100K
Egocentric data project · v1.0.0
Large-scale egocentric manual labor video dataset.
Type
Egocentric manual labor video
Size
100K+ hours, 10.8B frames
Format
WebDataset / MP4
License
Apache 2.0
HumanoidLayer does not claim ownership of third-party open datasets. We index, curate, normalize metadata, and provide access workflows according to each dataset's license. Some datasets may be link-only until licensing is verified. Commercial use depends on the original license and subset restrictions.
Turn existing datasets into higher-value training assets
HumanoidLayer can enrich public or private datasets with additional labels, metadata, and structure so they become easier to train on, evaluate, and compare.
Metadata cleanup
Normalizes source, license, task, robot, modality, and environment metadata.
Action and object labeling
Adds action verbs, object categories, tool references, and interaction tags.
Temporal segmentation
Splits long demonstrations into task phases, attempts, recoveries, and outcomes.
Language instruction generation
Creates concise task instructions and natural-language episode descriptions.
Format conversion
Packages datasets into LeRobot, HDF5, RLDS, WebDataset, Parquet, or custom schemas.
QA and validation
Flags duplicates, corrupt files, missing metadata, low-quality clips, and schema drift.
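Two of the QA checks above, duplicate detection and missing metadata, can be sketched in a few lines. The required-field list follows the metadata cleanup scope named above. The clip layout (`id`, `payload`, `metadata`) and the function name are assumptions for this example.

```python
import hashlib

# Metadata fields taken from the cleanup scope listed above.
REQUIRED_FIELDS = ("source", "license", "task", "robot", "modality", "environment")


def qa_flags(clips):
    """Return {clip_id: [issues]} for exact-duplicate payloads and missing metadata."""
    flags = {}
    first_seen = {}  # content hash -> id of the first clip with that content
    for clip in clips:
        issues = []
        digest = hashlib.sha256(clip["payload"]).hexdigest()
        if digest in first_seen:
            issues.append(f"duplicate of {first_seen[digest]}")
        else:
            first_seen[digest] = clip["id"]
        missing = [name for name in REQUIRED_FIELDS if not clip["metadata"].get(name)]
        if missing:
            issues.append("missing metadata: " + ", ".join(missing))
        if issues:
            flags[clip["id"]] = issues
    return flags


# Illustrative clips; payload bytes stand in for real video files.
full = {"source": "lab-a", "license": "CC-BY 4.0", "task": "pick",
        "robot": "arm", "modality": "video", "environment": "kitchen"}
clips = [
    {"id": "clip-1", "payload": b"frames-a", "metadata": full},
    {"id": "clip-2", "payload": b"frames-a", "metadata": full},
    {"id": "clip-3", "payload": b"frames-b", "metadata": {"source": "lab-a", "task": "pick"}},
]
print(qa_flags(clips))
```

Hashing raw bytes only catches exact copies. Re-encoded or trimmed duplicates would need a perceptual or embedding-based comparison, which this sketch does not attempt.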
When open data is not enough
Robotics teams can request custom data collection programs for specific tasks, environments, embodiments, and modalities.
Kitchen manipulation pilot
Pick-and-place sprint
Tool-use demonstration set
Warehouse exception handling
Egocentric inspection workflow
Bimanual teleoperation capture
Hand-centric grasp data
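A custom collection request can be thought of as a structured brief whose gaps are easy to spot before scoping begins. The sketch below is hypothetical: the field names follow the brief dimensions mentioned on this page (task, embodiment, environment, modality, format, license, timeline, budget), and the units are assumptions.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DataBrief:
    # All fields optional so a partially filled brief can still be recorded.
    task: Optional[str] = None
    embodiment: Optional[str] = None
    environment: Optional[str] = None
    modality: Optional[str] = None
    target_format: Optional[str] = None
    license_constraints: Optional[str] = None
    timeline_weeks: Optional[int] = None   # assumed unit: weeks
    budget_usd: Optional[int] = None       # assumed unit: US dollars


def missing_fields(brief: DataBrief) -> list:
    """List the brief fields that still need an answer, in declaration order."""
    return [f.name for f in fields(brief) if getattr(brief, f.name) is None]


brief = DataBrief(task="Kitchen manipulation pilot", embodiment="bimanual arm",
                  modality="RGB-D", target_format="LeRobot")
print(missing_fields(brief))
# ['environment', 'license_constraints', 'timeline_weeks', 'budget_usd']
```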
A reviewed supply network behind the buyer platform
Dataset owners and collectors can submit robotics-relevant data for review, but buyer trust, license clarity, and dataset quality take priority over contributor growth.
Qualified submissions are reviewed before listing or bonus eligibility.
The early review bonus is a limited supply-growth mechanic, not the product promise. Availability may vary by country, quality, originality, licensing, and verification status.
Submit existing datasets
Add metadata and license information
Join future paid data collection programs
Help enrich datasets with annotations
Build reputation as a verified robotics data contributor
How it works
Two workflows, one structured data layer for robotics.
For buyers
Search or request
Select dataset or enrichment need
Review license and format
Receive structured access or proposal
For contributors
Submit dataset
Verification and license review
Metadata normalization
Listing, enrichment, or future paid opportunities
Built for data that can actually be used
HumanoidLayer treats metadata, licensing, QA, and sensitive-content flags as part of the product, not a footnote.
license checks
source attribution
format metadata
modality tags
task taxonomy
annotation QA
contributor verification
dataset versioning
non-commercial restriction flags
PII and sensitive content warnings
Start a real robotics data conversation.
Bring the task, embodiment, modality, target format, and license constraints. HumanoidLayer will help identify usable datasets, enrichment paths, or custom collection options.
Buyer workflow
Find, license-screen, package, enrich, normalize, or source robotics data for model workflows.
Contributor intake
Submit original robotics-relevant datasets for review and future verified programs.