Hyphenbox turns raw egocentric/multi-modal data into training-ready datasets for general-purpose robotics.
Frame-accurate action segmentation over long-horizon manipulation. Scene context, object state, and contact sequences, annotated as the model will consume them.
Millimeter-level 3D hand reconstruction from 2D egocentric video. Stable under self-occlusion and close-range object interaction, the regime where off-the-shelf pose estimation fails.
Full-body pose reconstruction from a single egocentric camera. Physics-aware, temporally smooth, with root trajectory: the rig a policy can actually consume (sketched below).
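As a rough illustration of what "the model will consume" could mean in practice, here is a minimal sketch of one enriched episode as a data structure. Every name, field, and unit below (ActionSegment, HandPose, BodyPose, Episode, joints_m, root_xyz) is an assumption for the sketch, not Hyphenbox's actual delivery format.

```python
# Hypothetical schema for one training-ready episode (illustration only;
# names, units, and layout are assumptions, not the actual delivery format).
from dataclasses import dataclass, field


@dataclass
class ActionSegment:
    """One frame-accurate action span with scene, object, and contact context."""
    start_frame: int
    end_frame: int
    action: str                     # e.g. "open the left drawer"
    objects: list[str]              # objects involved, in interaction order
    contacts: list[str]             # contact events within the span


@dataclass
class HandPose:
    """Per-frame 3D hand reconstruction lifted from 2D egocentric video."""
    frame: int
    joints_m: list[tuple[float, float, float]]  # hand joints, camera frame, meters


@dataclass
class BodyPose:
    """Per-frame full-body pose with a root trajectory a policy can follow."""
    frame: int
    root_xyz: tuple[float, float, float]
    joint_rotations: list[tuple[float, float, float]]  # per-joint axis-angle


@dataclass
class Episode:
    """One enriched capture: dense action labels plus 3D hand and body tracks."""
    video_path: str
    segments: list[ActionSegment] = field(default_factory=list)
    hand_track: list[HandPose] = field(default_factory=list)
    body_track: list[BodyPose] = field(default_factory=list)
    human_verified: bool = False    # set only after expert review
```

The point is the shape, not the names: every label hangs off a frame index, so action spans, hand joints, and body pose stay aligned with the raw video a policy trainer already has.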
Capture is scaling exponentially. Training-ready data isn't. Models train on whatever reaches them as usable signal, not on hours of raw footage.
Across real homes, factories, and retail floors, partners are capturing egocentric video at scale, every day.
What ships to a model is a different story. Raw RGB isn't a training-ready dataset. It becomes one only after someone decides, frame by frame, what happened, in what order, with what body, with what contact.
That transformation is the bottleneck. We built the stack that closes it.
You already do the hard part. Fielding head-mounted rigs, recruiting operators, clearing consent, capturing the long tail of real homes, factories, and retail floors. None of that is trivial.
Labs don't train on hours of raw footage. They train on hours of enriched data. Turning one into the other is a research problem: dense action labels, 2D-to-3D hand, 2D-to-3D body, human-verified, frame by frame.
We sit on the research side of the pipeline. You keep scaling collection. We turn it into the dataset labs train on.
Video, depth, and other signals from real homes, factories, and retail floors.
→Our proprietary VLM proposes dense action segments and scene context.
→Millimeter-level 3D hand and full-body pose, lifted from 2D, temporally grounded.
→Every label passes through an expert reviewer. No crowdsourced annotation floor.
→Dense labels · 3D hand · 3D pose · human-verified. Model-consumable (loading sketch below).
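To make "model-consumable" concrete, here is a sketch of how a lab might pull only human-verified records out of a delivery, assuming episodes ship as JSONL with a `human_verified` flag and a `segments` list. The file layout and field names are assumptions for illustration, not the actual output format.

```python
# Illustration only: consuming an enriched delivery, assuming a JSONL layout
# with one episode record per line. Field names are hypothetical.
import json
from pathlib import Path


def iter_verified_episodes(root: str):
    """Yield episode records that cleared expert review."""
    for path in sorted(Path(root).glob("*.jsonl")):
        with path.open() as f:
            for line in f:
                record = json.loads(line)
                if record.get("human_verified"):  # keep only reviewed labels
                    yield record


if __name__ == "__main__":
    # Count frame-accurate action segments that are ready to train on.
    total = sum(len(ep.get("segments", [])) for ep in iter_verified_episodes("enriched/"))
    print(f"{total} verified action segments ready for training")
```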
Frontier robotics labs. Data-collection partners. Anyone whose roadmap is bottlenecked on training-ready data.