This directory contains the JSONL datasets for fine-tuning Mistral models with Audierne2026 documentation.
- dataset_train.jsonl: training dataset (90% of the data)
- dataset_val.jsonl: validation dataset (10% of the data)
- dataset_metadata.json: dataset statistics and metadata

Each line in the JSONL files is one JSON object in Mistral’s fine-tuning format, shown pretty-printed here for readability:
{
"messages": [
{"role": "system", "content": "Tu es O Capistaine..."},
{"role": "user", "content": "Question about Audierne2026"},
{"role": "assistant", "content": "Factual answer with sources"}
]
}
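Before uploading, it can help to confirm that every line parses and matches this shape. The sketch below uses hypothetical helpers (validate_record and validate_file are not part of the repository). It only checks the fields shown in the example above. The rule that each record needs at least one assistant message is this sketch's own assumption, not a quoted Mistral requirement.

```python
import json
from pathlib import Path

# Roles that appear in the example record; extend this if your data uses others.
EXPECTED_ROLES = ("system", "user", "assistant")


def validate_record(line: str) -> list[str]:
    """Return the problems found in one JSONL line (an empty list means it looks valid)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    messages = record.get("messages") if isinstance(record, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' list"]
    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i} is not an object")
            continue
        role, content = msg.get("role"), msg.get("content")
        if role not in EXPECTED_ROLES:
            problems.append(f"message {i} has unexpected role {role!r}")
        if not isinstance(content, str) or not content.strip():
            problems.append(f"message {i} has empty or non-string content")
    # Assumption of this sketch: a record with no assistant turn gives the model nothing to learn.
    if not any(isinstance(m, dict) and m.get("role") == "assistant" for m in messages):
        problems.append("no assistant message")
    return problems


def validate_file(path: Path) -> dict[int, list[str]]:
    """Map 1-based line numbers to their problems; blank lines are reported as invalid JSON."""
    report = {}
    with path.open(encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            problems = validate_record(line)
            if problems:
                report[lineno] = problems
    return report
```

For example, validate_file(Path("data/mistral/dataset_train.jsonl")) returns an empty dict when every line passes, and otherwise points at the offending line numbers.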
To regenerate the datasets:
# From repository root
python scripts/prepare_mistral_dataset.py
# With custom options
python scripts/prepare_mistral_dataset.py --split 0.85 --output data/mistral/custom.jsonl
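How prepare_mistral_dataset.py actually shuffles and splits the records is not documented here. The following is a hypothetical sketch that assumes --split is the fraction of records kept for training (0.9 by default, matching the 90/10 split above) and uses a fixed seed so repeated runs produce the same files. split_records, write_jsonl, and the seed value are illustrative names, not the script's API.

```python
import json
import random
from pathlib import Path


def split_records(records: list, train_fraction: float = 0.9, seed: int = 42):
    """Shuffle a copy of `records` with a fixed seed and cut it into (train, val)."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(records)  # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # local RNG: deterministic for a given seed
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def write_jsonl(path: Path, records: list) -> None:
    """Write one compact JSON object per line, keeping non-ASCII (French) text readable."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
```

With ensure_ascii=False, accented French text is written as-is rather than as \u escapes; both forms are valid JSON, but the former keeps the files easy to inspect.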
To upload both files with the Mistral CLI:
mistral files upload data/mistral/dataset_train.jsonl
mistral files upload data/mistral/dataset_val.jsonl
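Alternatively, upload through the Files API with curl (repeat with dataset_val.jsonl for the validation file):
curl -X POST "https://api.mistral.ai/v1/files" \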
-H "Authorization: Bearer $MISTRAL_API_KEY" \
-F "file=@data/mistral/dataset_train.jsonl" \
-F "purpose=fine-tune"
To upload from GitHub Actions instead, trigger the “Prepare Mistral Dataset” workflow with the upload_to_mistral input set to true.
The dataset is generated from:
- docs/*/README.md
- docs/*/contributions/*.md

Add MISTRAL_API_KEY to your repository secrets for automated uploads.
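A minimal sketch of how the two source patterns listed above could be collected with pathlib, assuming they are matched relative to the repository root. find_source_docs is a hypothetical helper, not the script's code.

```python
from pathlib import Path


def find_source_docs(repo_root: Path) -> list[Path]:
    """Collect docs/*/README.md and docs/*/contributions/*.md, sorted for stable output."""
    docs = repo_root / "docs"
    readmes = docs.glob("*/README.md")  # one README per docs subdirectory
    contributions = docs.glob("*/contributions/*.md")
    # A set guards against duplicates; sorting keeps the dataset order reproducible.
    return sorted({*readmes, *contributions})
```

Both patterns require a subdirectory under docs/, so a README.md placed directly in docs/ is not picked up, and neither are other Markdown files next to the per-directory READMEs.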