
Autonomous driving
Roads, signage, vehicles, pedestrians — captured across weather and time of day.
Rights-cleared, high-quality data you can use with confidence.
Training data you won't find anywhere else, collected fresh from consenting people. Every datapoint carries a rights-cleared trail, ready to train on with confidence.
From autonomous driving to physical AI and LLMs — we collect the rare data frontier AI development demands.

Household tasks, manipulation, assembly — first-person video with sensor logs.

Natural conversation, prompts, and long-form documents.

Diverse speakers, dialects, expressions — captured at studio quality.

Paired image-caption and video-transcript data to accelerate VLM training.

Curated data for specific scenes, styles, and subjects.
License confidence, quality, flexibility — the data infrastructure to take AI from research to production.
Rights-cleared datasets with commercial use covered. Reduce legal risk and move forward with confidence.
We learn your model architecture, domain, and accuracy targets, then design data specifically for them.
Image, video, voice, and text. We deliver across the modalities modern AI training demands.
First-party collection from a verified contributor network, with a rights-cleared trail on every datapoint.
Common questions from teams evaluating Minamoto.
Every datapoint comes with consent from the contributor covering AI model R&D, training, evaluation, output generation, and improvement of those models. Specific terms — redistribution, exclusivity, and similar — are set in your data license agreement.
Yes. Contributors consent to cross-border provision, and we follow APPI Article 27 (third-party provision) and Article 28 (provision to recipients outside Japan). See Sections 4 and 5 of our Privacy Policy for details.
Both off-the-shelf and custom collection are quoted per project. Modality, scale, quality bar, and timeline materially affect price. Book a demo with your requirements and we'll come back with a quote.
It depends on modality, scale, and quality bar. As a rough guide, a few hundred hours of audio typically lands in 4–8 weeks. We commit to a firm timeline after scoping the project with you.
Open datasets are usually "what was easy to collect", so provenance, consent, and licensing are often unclear. Minamoto data is collected to spec, with per-contributor consent, capture metadata, and third-party provision records that follow the data end to end.
We delete the data from our systems within 30 days and notify any buyers we've provided it to. Removing the influence of a specific datapoint from a trained model's weights is generally not feasible with current technology — a fact we make explicit in both the Contributor Agreement and your data license. See Section 11 of our Privacy Policy.
Tell us what your model needs to learn. We'll tell you how we can collect it in Japan.