To the source of intelligence.

AI training data unlike any other.

License-free, high-quality data you can use with confidence.

Training data you won't find anywhere else, collected first-hand from consenting contributors. Every datapoint carries a rights-cleared trail, so it is ready to train on.

Book a demo

AI use cases we power

From autonomous driving to physical AI and LLMs, we collect the rare data that frontier AI development demands.

Autonomous driving

Roads, signage, vehicles, pedestrians — captured across weather and time of day.

Physical AI & robotics

Household tasks, manipulation, assembly — first-person video with sensor logs.

LLMs & conversational AI

Natural conversation, prompts, and long-form documents.

Speech recognition & TTS

Diverse speakers, dialects, expressions — captured at studio quality.

Vision-language & multimodal

Paired image-caption and video-transcript data to accelerate VLM training.

Image & video models

Curated data for specific scenes, styles, and subjects.

A verified contributor network

We deliver data unavailable elsewhere through a verified contributor network spanning Japan. Need something specific? We can also collect to spec.

Become a contributor

Why Minamoto

License confidence, quality, flexibility — the data infrastructure to take AI from research to production.

License-free, commercially safe

Rights-cleared datasets with commercial use covered. Reduce legal risk and move forward with confidence.

Bespoke data design

We understand your model architecture, domain, and accuracy targets, then design data specifically for them.

Multimodal coverage

Image, video, voice, and text. We deliver across the modalities modern AI training demands.

Consent and traceable provenance

First-party collection from a verified contributor network, with a rights-cleared trail on every datapoint.

FAQ

Common questions from teams evaluating Minamoto.

What does "license-free" mean in practice?

Every datapoint comes with consent from the contributor covering AI model R&D, training, evaluation, output generation, and improvement of those models. Specific terms — redistribution, exclusivity, and similar — are set in your data license agreement.

Can APPI-compliant data be used to train AI models outside Japan?

Yes. Contributors consent to cross-border provision, and we follow APPI Article 27 (third-party provision) and Article 28 (provision to recipients outside Japan). See Sections 4 and 5 of our Privacy Policy for details.

What's your minimum order size and price range?

Both off-the-shelf and custom collection are quoted per project. Modality, scale, quality bar, and timeline materially affect price. Book a demo with your requirements and we'll come back with a quote.

How quickly can you complete a custom collection?

It depends on modality, scale, and quality bar. As a rough guide, a few hundred hours of audio is typically delivered in 4–8 weeks. We commit to a firm timeline after scoping the project with you.

How is this different from existing open datasets?

Open datasets are usually "what was easy to collect," so provenance, consent, and licensing are often unclear. Minamoto data is collected to spec, with per-contributor consent, capture metadata, and third-party provision records that follow the data end to end.

If a contributor requests deletion later, what happens to models already trained on the data?

We delete the data from our systems within 30 days and notify any buyers we've provided it to. Removing the influence of a specific datapoint from a trained model's weights is generally not feasible with current technology — a fact we make explicit in both the Contributor Agreement and your data license. See Section 11 of our Privacy Policy.

Let's talk about your data needs

Tell us what your model needs to learn. We'll tell you how we can collect it in Japan.