Managed robotics data marketplace

Source robotics video data. Enrich it. Contract verified contributors.

HumanoidLayer connects robotics teams with reviewed data providers, annotators, and QA reviewers, turning raw video and robot logs into licensed, labeled, buyer-ready physical AI datasets.

Built for humanoids, manipulation, teleoperation, egocentric video, RGB-D, robot learning workflows, annotation QA, and procurement-aware data contracts.

Buyer demand

Open structured data briefs by task, modality, format, budget, and license constraints.

Platform QA

Review annotations, source rights, commercial flags, sensitive content, and dataset readiness.

Contributor network

Route collection, annotation, enrichment, and review work to verified contributors.

Dataset owners and annotators can join the contributor network for reviewed collection, labeling, QA, and future revenue-share workflows.

Refinery console
catalog sync active

Search facets

Modality: RGB-D, actions, language

Robot type: humanoid, bimanual, arm

Task: tool use, grasping, household

License: Apache 2.0, CC-BY, mixed

Environment: kitchen, warehouse, lab

8 seed datasets · 6 formats · 12 task classes

DROID

phase labels recommended

metadata verified

Format

RLDS

License

CC-BY 4.0

Enrichment

ready

ALOHA

bimanual segmentation ready

subset review

Format

LeRobot

License

Apache 2.0

Enrichment

ready

Open X-Embodiment

subset screening

license matrix

Format

RLDS

License

Mixed

Enrichment

ready

Operating model

A three-sided workflow for robotics data supply, annotation, QA, and buyer contracts.

The new HumanoidLayer MVP shows the full supply chain: providers bring data, annotators enrich it, reviewers verify it, and robotics teams buy or request the exact data they need.

Providers upload or collect raw video data

Egocentric video, RGB-D, teleoperation, robot logs, manipulation demos, and source links enter a reviewed supply pipeline.

Annotators enrich inside HumanoidLayer

Object labels, action phases, task metadata, success states, language fields, and format tags are created in the workspace.

Reviewers verify quality and rights

QA review, license checks, source attribution, sensitive-content flags, and version readiness become buyer trust signals.

Robotics teams contract for buyer-ready data

Teams search existing datasets or open data briefs for task-specific collection through the platform.

raw data

Supply intake

Providers and collectors upload source links, large video files, robot logs, and sample clips.

enrichment

Annotation workspace

Annotators add object labels, action phases, task metadata, temporal segments, and instruction fields.

trust layer

QA and licensing

Reviewers validate labels, source rights, commercial posture, sensitive content, and dataset readiness.

contracts

Buyer marketplace

Robotics teams discover verified data products or open briefs for new task-specific collection.

Buyer trust

Built for the questions robotics teams ask before they can use data.

A useful robotics dataset is not just a link. Buyers need provenance, licensing posture, schema clarity, enrichment options, and a credible path from discovery to access.

Provenance before procurement

Every dataset card foregrounds source organization, official links, review status, and attribution requirements.

License clarity by default

Commercial suitability, non-commercial flags, mixed subsets, and link-only states are explicit instead of buried.

Structured access workflows

Buyer CTAs route to access, enrichment, source review, or custom collection requests with clear next steps.

Format normalization

Datasets are compared and delivered around robotics formats such as LeRobot, HDF5, RLDS, WebDataset, and Parquet.

Enrichment scopes

Labels, metadata, segmentation, QA, and conversion are framed as data infrastructure work, not generic annotation labor.

Path when open data fails

Custom collection briefs capture task, embodiment, environment, modality, timeline, and budget signals for real pilots.
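
As an illustration of the brief fields listed above, a custom collection brief could be modeled as a small record that reports which signals are still missing. This is a minimal sketch, not HumanoidLayer's actual schema: the class name `CollectionBrief`, its field names, and the example values are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CollectionBrief:
    """Hypothetical record for a custom collection brief (not a real API)."""

    task: str = ""
    embodiment: str = ""
    environment: str = ""
    modalities: list[str] = field(default_factory=list)
    timeline_weeks: int | None = None
    budget_usd: float | None = None

    def missing_signals(self) -> list[str]:
        """Return the names of brief fields that have not been filled in."""
        checks = {
            "task": bool(self.task.strip()),
            "embodiment": bool(self.embodiment.strip()),
            "environment": bool(self.environment.strip()),
            "modalities": len(self.modalities) > 0,
            "timeline_weeks": self.timeline_weeks is not None,
            "budget_usd": self.budget_usd is not None,
        }
        return [name for name, present in checks.items() if not present]


# Illustrative values only; the budget figure is made up for the example.
brief = CollectionBrief(
    task="pick-and-place",
    embodiment="bimanual",
    environment="kitchen",
    modalities=["RGB-D", "actions"],
    budget_usd=25000.0,
)
print(brief.missing_signals())  # ['timeline_weeks']
```

Keeping every signal as an explicit field means an incomplete brief can be flagged before it is routed to collectors.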

The problem

Robotics data is scattered, inconsistent, and hard to use.

Raw data is rarely the limiting asset by itself. Searchability, licensing, format alignment, task metadata, and annotations decide whether a robotics team can actually use it.

Scattered sources

Open robotics datasets live across labs, Hugging Face, GitHub, university websites, and private links.

Inconsistent formats

Teams waste time converting between HDF5, RLDS, LeRobot, WebDataset, TFDS, Parquet, and custom schemas.

Missing annotations

Many datasets lack task labels, object labels, action phases, success/failure tags, scene metadata, license clarity, or QA signals.

Refinery model

Data is the new oil. HumanoidLayer is the refinery.

Raw data alone is not enough. Robotics teams need searchable, licensed, formatted, annotated, and model-ready datasets. HumanoidLayer turns raw robotics data into usable physical AI fuel.

Stage 1

Raw robotics data

Stage 2

Structured dataset catalog

Stage 3

Enriched training-ready data products

Discover

Find relevant public and private robotics datasets by task, embodiment, modality, format, and license.

Verify

Track source attribution, license terms, commercial flags, and subset-level restrictions.

Normalize

Turn inconsistent metadata into comparable dataset cards, manifests, and schema notes.

Annotate

Add labels for objects, actions, task phases, success states, affordances, and scene metadata.

Enrich

Run QA, deduplicate data, generate instructions, and prepare data for model workflows.

Deliver

Package robotics-ready datasets in LeRobot, HDF5, RLDS, WebDataset, Parquet, or custom formats.
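
The Verify step in the list above can be pictured as a small rule that turns review inputs into the label shown on a dataset card. The sketch below is illustrative only: the three labels are copied from the catalog cards on this page, while the function name `license_posture`, the `PERMISSIVE` set, and the rule order are assumptions rather than the platform's real review logic.

```python
from __future__ import annotations

from enum import Enum


class Posture(Enum):
    # Labels copied from the catalog cards on this page.
    COMMERCIAL_LIKELY = "Commercial use likely permitted"
    CHECK_SUBSET = "Check subset license"
    LINK_ONLY = "Link-only until verified"


# Assumption: licenses treated as permissive for this sketch only.
PERMISSIVE = {"Apache 2.0", "CC-BY 4.0", "MIT"}


def license_posture(
    license_name: str | None, verified: bool, mixed_subsets: bool
) -> Posture:
    """Map review inputs to a card label. The rule order is hypothetical."""
    if license_name is None or not verified:
        return Posture.LINK_ONLY
    if mixed_subsets or license_name not in PERMISSIVE:
        return Posture.CHECK_SUBSET
    return Posture.COMMERCIAL_LIKELY


print(license_posture("Apache 2.0", verified=True, mixed_subsets=False).value)
print(license_posture("Mixed", verified=True, mixed_subsets=True).value)
print(license_posture(None, verified=False, mixed_subsets=False).value)
```

Unverified entries fall through to the link-only state first, so a permissive license name alone never produces a commercial label in this sketch.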

Buyer workflow

For robotics and physical AI teams

Search from the task backwards: embodiment, modality, environment, license, format, and enrichment need.

Browse curated open datasets

Request dataset access packages

Request enrichment and additional annotations

Launch custom data collection pilots

Compare datasets by modality and license

Convert datasets into preferred formats
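
Searching from the task backwards amounts to filtering dataset cards by facets. The sketch below shows one way that could look, using modality, format, and license values taken from three cards on this page. The `DatasetCard` type and the `search` function are hypothetical names for illustration, not a published HumanoidLayer API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetCard:
    """Hypothetical card holding the facets a buyer filters on."""

    name: str
    modalities: frozenset[str]
    formats: frozenset[str]
    license: str


# Facet values copied from the catalog cards on this page.
CATALOG = [
    DatasetCard(
        "DROID",
        frozenset({"RGB-D", "proprioception", "actions", "language"}),
        frozenset({"TFDS", "RLDS", "LeRobot"}),
        "CC-BY 4.0",
    ),
    DatasetCard(
        "ALOHA",
        frozenset({"video", "actions", "proprioception"}),
        frozenset({"HDF5", "LeRobot"}),
        "Apache 2.0",
    ),
    DatasetCard(
        "Open X-Embodiment",
        frozenset({"RGB", "actions", "proprioception", "language"}),
        frozenset({"RLDS"}),
        "Mixed",
    ),
]


def search(catalog, *, modalities=(), fmt=None, licenses=None):
    """Return cards that have every requested modality, format, and license."""
    results = []
    for card in catalog:
        if not set(modalities) <= card.modalities:
            continue
        if fmt is not None and fmt not in card.formats:
            continue
        if licenses is not None and card.license not in licenses:
            continue
        results.append(card)
    return results


hits = search(CATALOG, modalities=["actions", "language"], fmt="RLDS")
print([card.name for card in hits])  # ['DROID', 'Open X-Embodiment']

strict = search(CATALOG, modalities=["actions"], licenses={"CC-BY 4.0", "Apache 2.0"})
print([card.name for card in strict])  # ['DROID', 'ALOHA']
```

Each facet narrows the result set independently, so a buyer can add a license constraint without restating the modality or format filters.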

Indexing scope

Dataset categories we are indexing and refining

The initial taxonomy is built for robotics search, procurement, enrichment, and model workflow planning.

Humanoid manipulation

Pretraining and evaluating whole-body manipulation policies.

Modality

Video, RGB-D, proprioception

Example formats

LeRobot, HDF5, RLDS

Bimanual teleoperation

Learning coordinated two-arm tasks and teleop transfer.

Modality

Video, actions, proprioception

Example formats

HDF5, LeRobot

Egocentric human task video

Mining human workflows for task plans and hand-object cues.

Modality

POV video, metadata, camera intrinsics

Example formats

WebDataset, MP4, Parquet

RGB-D scene capture

Scene understanding, manipulation context, and 3D perception.

Modality

RGB-D, depth, camera pose

Example formats

TFDS, RLDS, custom

Robot arm manipulation

Policy training for grasp, place, push, and insert tasks.

Modality

Images, actions, states

Example formats

HDF5, RLDS, LeRobot

Dexterous hand and grasping

Grasp strategy, contact-rich control, and hand-object labels.

Modality

Video, tactile, force/torque

Example formats

HDF5, Parquet

Warehouse and logistics

Picking, packing, sorting, exception handling, and inspection.

Modality

Video, scans, task metadata

Example formats

MP4, WebDataset, Parquet

Kitchen and household tasks

Open-vocabulary household manipulation and long-horizon tasks.

Modality

RGB-D, actions, language

Example formats

RLDS, TFDS, LeRobot

Tool use

Learning tool affordances, sequence logic, and recovery behavior.

Modality

Video, hand-object labels, action phases

Example formats

HDF5, MP4, Parquet

Assembly and maintenance

Task phase segmentation and industrial workflow modeling.

Modality

Egocentric video, labels, metadata

Example formats

WebDataset, Parquet

Simulation and synthetic demos

Benchmarking, sim-to-real, and controlled policy comparison.

Modality

Simulation states, actions, images

Example formats

HDF5, benchmark, custom

Navigation and SLAM

Embodied navigation, mapping, and scene memory evaluation.

Modality

Video, depth, poses, maps

Example formats

ROS bags, custom, Parquet

Starting catalog

Curated starting catalog

Seeded with license-aware robotics datasets that can be searched, reviewed, enriched, or routed into access workflows.

Open Full Catalog

DROID

DROID research consortium · v1.0.0

Commercial use likely permitted

Large-scale in-the-wild robot manipulation dataset.

Type

In-the-wild robot manipulation

Size

76K+ trajectories, 350h

Format

TFDS / RLDS / LeRobot

License

CC-BY 4.0, pending source verification

RGB-D
proprioception
actions
language
manipulation
household
teleoperation
Attribution required
Source link placeholder until final verification
Enrichment recommended
0 files · 76K+ trajectories, 350h · No rating yet · 0 comments

BridgeData V2

UC Berkeley · v1.0.0

Commercial use likely permitted

Low-cost robot manipulation dataset.

Type

Low-cost robot manipulation

Size

60K trajectories

Format

TFDS / raw

License

CC-BY 4.0

RGB-D
actions
proprioception
language
manipulation
household
grasping
Attribution required
Source link placeholder
Enrichment available
0 files · 60K trajectories · No rating yet · 0 comments

Open X-Embodiment

Open X-Embodiment collaboration · v1.0.0

Check subset license

Cross-robot dataset across many robot embodiments.

Type

Cross-robot embodied dataset

Size

1M+ episodes, 22 robot types, 500+ skills

Format

RLDS

License

Mixed

RGB
actions
proprioception
language
manipulation
grasping
household
Check subset license
Link-only until subset licensing is verified
Enrichment custom review
0 files · 1M+ episodes, 22 robot types, 500+ skills · No rating yet · 0 comments

ALOHA

ALOHA project community · v1.0.0

Commercial use likely permitted

Bimanual teleoperation and mobile manipulation datasets.

Type

Bimanual teleoperation

Size

Varies by subset

Format

HDF5 / LeRobot

License

Apache 2.0 for selected LeRobot-hosted subsets

video
actions
proprioception
bimanual
teleoperation
manipulation
Non-commercial subsets excluded
Source link placeholder
Enrichment available
0 files · Varies by subset · No rating yet · 0 comments

LIBERO

LIBERO research project · v1.0.0

Link-only until verified

Lifelong robot learning benchmark.

Type

Lifelong robot learning benchmark

Size

130 tasks, 65K demos

Format

benchmark / simulation / HDF5

License

Open benchmark, verify before mirroring

simulation
images
actions
states
manipulation
benchmarking
household
Link-only until verified
Link-only until license review is complete
Enrichment limited
0 files · 130 tasks, 65K demos · No rating yet · 0 comments

RoboNet

RoboNet project · v1.0.0

Link-only until verified

Multi-robot manipulation dataset.

Type

Multi-robot manipulation

Size

15M frames, 7 robot platforms

Format

custom

License

Verify before mirroring

video
actions
manipulation
grasping
Link-only until verified
Enrichment custom review
0 files · 15M frames, 7 robot platforms · No rating yet · 0 comments

RoboMimic / MimicGen

RoboMimic and MimicGen communities · v1.0.0

Commercial use likely permitted

Imitation learning framework and generated demonstration datasets.

Type

Imitation learning and generated demos

Size

50K+ demos

Format

HDF5

License

MIT for framework / verify dataset subset

simulation
actions
images
states
manipulation
benchmarking
assembly
Check subset license
Source link placeholder
Enrichment available
0 files · 50K+ demos · No rating yet · 0 comments

Egocentric-100K

Egocentric data project · v1.0.0

Commercial use likely permitted

Large-scale egocentric manual labor video dataset.

Type

Egocentric manual labor video

Size

100K+ hours, 10.8B frames

Format

WebDataset / MP4

License

Apache 2.0

video
metadata
camera intrinsics
egocentric
tool use
warehouse
Attribution required
Source link placeholder
Enrichment recommended
0 files · 100K+ hours, 10.8B frames · No rating yet · 0 comments

License and ownership notice

HumanoidLayer does not claim ownership of third-party open datasets. We index, curate, normalize metadata, and provide access workflows according to each dataset's license. Some datasets may be link-only until licensing is verified. Commercial use depends on the original license and subset restrictions.

Dataset enrichment

Turn existing datasets into higher-value training assets

HumanoidLayer can enrich public or private datasets with additional labels, metadata, and structure so they become easier to train on, evaluate, and compare.

Request Dataset Enrichment

Metadata cleanup

Normalizes source, license, task, robot, modality, and environment metadata.

Action and object labeling

Adds action verbs, object categories, tool references, and interaction tags.

Temporal segmentation

Splits long demonstrations into task phases, attempts, recoveries, and outcomes.

Language instruction generation

Creates concise task instructions and natural-language episode descriptions.

Format conversion

Packages datasets into LeRobot, HDF5, RLDS, WebDataset, Parquet, or custom schemas.

QA and validation

Flags duplicates, corrupt files, missing metadata, low-quality clips, and schema drift.
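
Two of the QA checks above, duplicate detection and missing metadata, can be sketched in a few lines. This example assumes each episode is a dict with an `id`, a raw `payload`, and a `metadata` dict. That record shape, the function name `find_qa_flags`, and the choice of required keys (borrowed from the metadata cleanup item above) are all assumptions. Hashing catches byte-identical duplicates only; near-duplicate clips would need a different technique.

```python
import hashlib

# Assumption: required keys mirror the fields named under "Metadata cleanup".
REQUIRED_KEYS = ("source", "license", "task", "robot", "modality", "environment")


def find_qa_flags(records):
    """Return (record_id, flag, detail) tuples for simple QA problems."""
    first_seen = {}  # payload digest -> id of the first record with that payload
    flags = []
    for rec in records:
        missing = [key for key in REQUIRED_KEYS if not rec["metadata"].get(key)]
        if missing:
            flags.append((rec["id"], "missing_metadata", missing))

        digest = hashlib.sha256(rec["payload"]).hexdigest()
        if digest in first_seen:
            flags.append((rec["id"], "duplicate_of", first_seen[digest]))
        else:
            first_seen[digest] = rec["id"]
    return flags


complete = {
    "source": "lab upload",
    "license": "Apache 2.0",
    "task": "grasping",
    "robot": "arm",
    "modality": "video",
    "environment": "lab",
}
records = [
    {"id": "ep-001", "payload": b"frames-a", "metadata": dict(complete)},
    {"id": "ep-002", "payload": b"frames-a", "metadata": {**complete, "license": ""}},
    {"id": "ep-003", "payload": b"frames-b", "metadata": dict(complete)},
]

flags = find_qa_flags(records)
for flag in flags:
    print(flag)
# ('ep-002', 'missing_metadata', ['license'])
# ('ep-002', 'duplicate_of', 'ep-001')
```

Returning flags as plain tuples keeps the check independent of any storage format, so the same pass could run over HDF5, RLDS, or Parquet episodes once they are loaded.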

Custom collection

When open data is not enough

Robotics teams can request custom data collection programs for specific tasks, environments, embodiments, and modalities.

Kitchen manipulation pilot

Pick-and-place sprint

Tool-use demonstration set

Warehouse exception handling

Egocentric inspection workflow

Bimanual teleoperation capture

Hand-centric grasp data

Request Custom Collection

Contributor network

A reviewed supply network behind the buyer platform

Dataset owners and collectors can submit robotics-relevant data for review, but contributor acquisition stays downstream of buyer trust, license clarity, and dataset quality.

Qualified submissions are reviewed before listing or bonus eligibility.

The early review bonus is a limited supply-growth mechanic, not the product promise. Availability may vary by country, quality, originality, licensing, and verification status.

Submit a Dataset

Submit existing datasets

Add metadata and license information

Join future paid data collection programs

Help enrich datasets with annotations

Build reputation as a verified robotics data contributor

How it works

Two workflows, one structured data layer for robotics.

For buyers

1. Search or request
2. Select dataset or enrichment need
3. Review license and format
4. Receive structured access or proposal

For contributors

1. Submit dataset
2. Verification and license review
3. Metadata normalization
4. Listing, enrichment, or future paid opportunities

Quality and trust

Built for data that can actually be used

HumanoidLayer treats metadata, licensing, QA, and sensitive-content flags as part of the product, not a footnote.

license checks

source attribution

format metadata

modality tags

task taxonomy

annotation QA

contributor verification

dataset versioning

non-commercial restriction flags

PII and sensitive content warnings

Start a real robotics data conversation.

Bring the task, embodiment, modality, target format, and license constraints. HumanoidLayer will help identify usable datasets, enrichment paths, or custom collection options.

Buyer workflow

Find, license-screen, package, enrich, normalize, or source robotics data for model workflows.

Contributor intake

Submit original robotics-relevant datasets for review and future verified programs.