Minamoto

To the source of intelligence.

AI training data unlike any other.

License-free, high-quality data you can use with confidence.

Training data you won't find anywhere else, collected fresh from consenting people. Every datapoint carries a rights-cleared trail, so it is ready to train on.

AI use cases we power

From autonomous driving to physical AI and LLMs — we collect the rare data frontier AI development demands.

Autonomous driving

Roads, signage, vehicles, and pedestrians, captured across weather conditions and times of day.

Physical AI & robotics

Household tasks, manipulation, assembly — first-person video with sensor logs.

LLMs & conversational AI

Natural conversation, prompts, and long-form documents.

Speech recognition & TTS

Diverse speakers, dialects, expressions — captured at studio quality.

Vision-language & multimodal

Paired image-caption and video-transcript data to accelerate VLM training.

Image & video models

Curated data for specific scenes, styles, and subjects.

A verified contributor network

We deliver data unavailable elsewhere through a verified contributor network across Japan. Need something specific? We can also collect to spec.

Become a contributor

Download on the App Store / Get it on Google Play

Why Minamoto

License confidence, quality, flexibility — the data infrastructure to take AI from research to production.

License-free, commercially safe

Rights-cleared datasets with commercial use covered. Reduce legal risk and move forward with confidence.

Bespoke data design

We start from your model architecture, domain, and accuracy targets, then design data specifically for them.

Multimodal coverage

Image, video, voice, and text. We deliver across the modalities modern AI training demands.

Consent and traceable provenance

First-party collection from a verified contributor network, with a rights-cleared trail on every datapoint.

FAQ

Common questions from teams evaluating Minamoto.

What does "license-free" mean in practice?

Every datapoint comes with the contributor's consent covering AI model R&D, training, evaluation, output generation, and improvement of those models. Specific terms, such as redistribution and exclusivity, are set in your data license agreement.

Can APPI-compliant data be used to train AI models outside Japan?

Yes. Contributors consent to cross-border provision, and we follow Article 27 (provision to third parties) and Article 28 (provision to recipients outside Japan) of the APPI (Act on the Protection of Personal Information). See Sections 4 and 5 of our Privacy Policy for details.

What's your minimum order size and price range?

Both off-the-shelf and custom collection are quoted per project. Modality, scale, quality bar, and timeline materially affect price. Book a demo with your requirements and we'll come back with a quote.

How quickly can you complete a custom collection?

It depends on modality, scale, and quality bar. As a rough guide, a few hundred hours of audio typically lands in 4–8 weeks. We commit to a firm timeline after scoping the project with you.

How is this different from existing open datasets?

Open datasets are usually "what was easy to collect," and their provenance, consent, and licensing are often unclear. Minamoto data is collected to spec, with per-contributor consent, capture metadata, and third-party provision records following each datapoint end to end.

If a contributor requests deletion later, what happens to models already trained on the data?

We delete the data from our systems within 30 days and notify any buyers we've provided it to. Removing the influence of a specific datapoint from a trained model's weights is generally not feasible with current technology — a fact we make explicit in both the Contributor Agreement and your data license. See Section 11 of our Privacy Policy.

Let's talk about your data needs

Tell us what your model needs to learn. We'll tell you how we can collect it in Japan.