ConlangCrafter: Constructing Languages with a Multi-Hop LLM Pipeline (ACL 2026 Oral)

Project Page: conlangcrafter.github.io
Paper: arxiv.org/abs/2508.06094
Dataset: huggingface.co/datasets/malper/ConlangCrafter — 64 generated languages

We introduce a fully automated system for constructing languages (conlangs) using large language models. Our multi-stage pipeline creates coherent, diverse artificial languages with their own phonology, grammar, lexicon, and translation capabilities.

Quick Start

  1. Install dependencies:

    pip install -r requirements.txt
    # or: uv sync if using uv
  2. Set up API keys: copy .env.example to .env and add keys for whichever APIs you will use.

  3. Generate a language sketch (default model: gemini-2.5-pro):

    python src/run_pipeline.py
    # or: uv run src/run_pipeline.py
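Before running step 3, it can help to confirm which keys the .env file from step 2 actually defines. The following is a minimal sketch, assuming the file uses plain KEY=value lines. The key names are hypothetical placeholders and are not taken from this repository; check .env.example for the real ones.

```python
from pathlib import Path

# Hypothetical key names for illustration only; the real names are in .env.example.
EXPECTED_KEYS = ["GEMINI_API_KEY", "OPENAI_API_KEY", "TOGETHER_API_KEY"]


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, ignoring blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes so KEY="abc" and KEY=abc are treated alike.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def configured_keys(text: str, expected=EXPECTED_KEYS) -> list[str]:
    """Return the expected keys that are present with a non-empty value."""
    env = parse_env(text)
    return [key for key in expected if env.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    found = configured_keys(env_path.read_text()) if env_path.exists() else []
    print("Configured keys:", ", ".join(found) or "none")
```

Only the key for the API you plan to use needs to be present, so an incomplete list is not necessarily a problem.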

Configuration

Run python src/run_pipeline.py --help to see all options. Key flags:

python src/run_pipeline.py \
    --model gemini-2.5-pro \
    --custom-constraints "The language has only 3 vowels" \
    --temperature 0.8 \
    --qa-disabled        # QA self-refinement loops are on by default; this flag turns them off

To resume a previous run (e.g. starting from grammar after phonology completed):

python src/run_pipeline.py --language-id <id> --steps grammar,lexicon
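When launching or resuming runs from a script, the flags above can be assembled programmatically. This is a small sketch that uses only the flags shown in this README; the helper name `build_command` is hypothetical and not part of the repository.

```python
import shlex


def build_command(language_id=None, steps=None, model=None,
                  temperature=None, custom_constraints=None, qa_disabled=False):
    """Assemble an argv list for src/run_pipeline.py from the documented flags."""
    cmd = ["python", "src/run_pipeline.py"]
    if language_id is not None:
        cmd += ["--language-id", str(language_id)]
    if steps:
        # Steps are passed as one comma-separated value, e.g. grammar,lexicon.
        cmd += ["--steps", ",".join(steps)]
    if model is not None:
        cmd += ["--model", model]
    if temperature is not None:
        cmd += ["--temperature", str(temperature)]
    if custom_constraints is not None:
        cmd += ["--custom-constraints", custom_constraints]
    if qa_disabled:
        cmd.append("--qa-disabled")
    return cmd


if __name__ == "__main__":
    # "my-language" is a placeholder; substitute the id of an existing run.
    cmd = build_command(language_id="my-language", steps=["grammar", "lexicon"])
    print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would execute it; building a list rather than a single string avoids shell-quoting problems with constraints that contain spaces.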

Supported models are:

  • Google Gemini (e.g., gemini-2.5-pro, gemini-1.5-flash)
  • OpenAI models (e.g., o4-mini, gpt-4o, gpt-5)
  • DeepSeek via Together AI (e.g., deepseek-ai/DeepSeek-R1)

Pregenerated language sketches

You can load pregenerated language sketches from our dataset in this pipeline's format with this script:

python src/load_hf_languages.py

Translation

Translation is not run by default. To translate into a generated language, run the translation step separately. By default it translates the 10 sentences in configs/sentences_default.txt:

python src/run_pipeline.py --language-id <id> --steps translation

To translate a single custom sentence instead:

python src/run_pipeline.py --language-id <id> --steps translation --translation-sentence "Hello, world!"

Pass --translation-sketch-update to feed new vocabulary and grammar rules introduced during translation back into the sketch for each subsequent sentence, expanding the language as translation proceeds (constructive translation).
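The feedback loop behind constructive translation can be sketched as follows. This is an illustration of the idea only, not the repository's implementation: `translate_all` and the `translate` callable are hypothetical, with the callable standing in for the LLM call and returning both a translation and any new vocabulary or grammar it introduced.

```python
from typing import Callable, Iterable

# Hypothetical translator interface: (sketch, sentence) -> (translation, additions),
# where `additions` is text describing newly introduced words or rules ("" if none).
Translator = Callable[[str, str], tuple[str, str]]


def translate_all(sketch: str, sentences: Iterable[str], translate: Translator,
                  update_sketch: bool = False) -> tuple[list[str], str]:
    """Translate sentences in order, optionally growing the sketch as we go."""
    translations = []
    for sentence in sentences:
        translation, additions = translate(sketch, sentence)
        translations.append(translation)
        if update_sketch and additions:
            # Later sentences see what earlier translations introduced.
            sketch = f"{sketch}\n{additions}" if sketch else additions
    return translations, sketch


def fake_translate(sketch: str, sentence: str) -> tuple[str, str]:
    """Toy stand-in for the model: coins a word unless the sketch already has it."""
    word = sentence.lower()
    if word in sketch:
        return f"<known:{word}>", ""
    return f"<new:{word}>", f"lexicon: {word}"


if __name__ == "__main__":
    print(translate_all("", ["water", "water"], fake_translate, update_sketch=True))
    print(translate_all("", ["water", "water"], fake_translate, update_sketch=False))
```

With the update enabled, the second sentence reuses the word coined for the first; without it, each sentence is translated against the original sketch and the same word is coined twice.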

Improvements

This implementation includes minor improvements over the system used to produce the results in our paper:

  • QA loop: Degenerate outputs (e.g. JSON instead of text) are detected and skipped inline, rather than filtered out afterwards by post-hoc rejection sampling.
  • QA amend prompt: Prompt wording is slightly adjusted for consistency with our system.
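To make the inline skipping of degenerate outputs concrete, here is one way such a check could look. The README does not specify the actual detection criteria, so the heuristic below (an empty output, or an output that parses entirely as a JSON object or array) and both function names are assumptions for illustration.

```python
import json


def looks_like_json(text: str) -> bool:
    """Heuristic: True if the whole output parses as a JSON object or array."""
    stripped = text.strip()
    if not stripped or stripped[0] not in "{[":
        return False
    try:
        json.loads(stripped)
    except json.JSONDecodeError:
        return False
    return True


def first_usable(candidates):
    """Return the first non-degenerate candidate, skipping bad ones as they arrive."""
    for text in candidates:
        if not text.strip() or looks_like_json(text):
            continue  # degenerate output: skip it inline and try the next one
        return text
    return None


if __name__ == "__main__":
    outputs = ['{"phonemes": ["a", "i", "u"]}', "", "The language has three vowels."]
    print(first_usable(outputs))
```

Because `first_usable` accepts any iterable, it can consume a generator that requests a fresh model output only when the previous one was rejected, which is the practical difference from sampling a batch and filtering it afterwards.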

Citation

@article{conlangcrafter2025,
    title={ConlangCrafter: Constructing Languages with a Multi-Hop LLM Pipeline},
    author={Morris Alper and Moran Yanuka and Raja Giryes and Ga{\v{s}}per Begu{\v{s}}},
    year={2025},
    eprint={2508.06094},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2508.06094}
}

License

This project is licensed under the MIT License — see the LICENSE file for details.
