Turn existing datasets into higher-value training assets.
HumanoidLayer enriches public or private robotics datasets with labels, metadata, QA, and consistent format structure so teams can train, evaluate, and compare models faster.
Designed for robot learning
Outputs are organized around robotics-specific fields: actions, objects, phases, outcomes, sensors, source context, formats, and license notes.
Metadata cleanup
Normalizes source, license, task, robot, modality, and environment metadata.
Input data
Public or private dataset folders, manifests, or links.
Output
Dataset card, schema map, license notes, and searchable metadata.
Best for
Datasets that are valuable but hard to inspect or compare.
Action and object labeling
Adds action verbs, object categories, tool references, and interaction tags.
Input data
Video, RGB-D, or robot episode data.
Output
Frame-, clip-, or episode-level labels in the buyer's preferred schema.
Best for
Manipulation, tool-use, warehouse, and household datasets.
Temporal segmentation
Splits long demonstrations into task phases, attempts, recoveries, and outcomes.
Input data
Continuous videos, teleoperation sessions, or demonstrations.
Output
Segment boundaries, phase labels, success/failure tags, and QA notes.
Best for
Long-horizon tasks and egocentric workflow video.
Language instruction generation
Creates concise task instructions and natural-language episode descriptions.
Input data
Episodes with video, actions, or metadata.
Output
Instruction fields, captions, and task taxonomies ready for vision-language-action (VLA) workflows.
Best for
Language-conditioned policy training and retrieval.
Format conversion
Packages datasets into LeRobot, HDF5, RLDS, WebDataset, Parquet, or custom schemas.
Input data
Raw assets, TFDS/RLDS datasets, HDF5 files, MP4 video, folders, or manifests.
Output
Versioned data pack with schema notes and validation report.
Best for
Teams that need data to match an existing training stack.
QA and validation
Flags duplicates, corrupt files, missing metadata, low-quality clips, and schema drift.
Input data
Dataset pack, manifest, or source archive.
Output
QA report, exclusion list, quality signals, and review notes.
Best for
Commercial delivery, procurement review, and benchmark hygiene.
Custom annotation workflow
Designs a domain-specific labeling workflow for robotics teams.
Input data
Buyer taxonomy, sample data, target model use case, and acceptance criteria.
Output
Annotation protocol, pilot batch, QA rubric, and production estimate.
Best for
New tasks, specialized embodiments, and proprietary data.
Request Dataset Enrichment
Send the dataset, target labels, desired output format, and context on how the data fits your model workflow. We will return a pilot scope, QA assumptions, and a delivery plan.
Enrichment request
Use this for public datasets in the catalog or private datasets your team already controls.
Need new data instead of enrichment?
When no existing dataset covers the target task, environment, embodiment, or modality, start a custom data collection pilot instead.