A command-line tool for generating glossaries from ebooks and documents, with language-specific filtering capabilities.
- Multi-format Support: Extract text from EPUB, PDF, and TXT files
- Dictionary Integration: Look up word definitions using the Kaikki.org API
- Language-Specific Filtering: Maintain separate filter lists for different languages
- Markdown Output: Generate glossaries in markdown format
cargo build --release

# Basic usage
glost generate book.epub
# Specify language and output file
glost generate --lang Swedish --output swedish_glossary.md book.epub
# Use custom filter file
glost generate --filter my_filters.txt book.epub

export YOUTUBE_API_KEY=<your_api_key>
glost youtube <video_uri>

Filter lists allow you to exclude words you already know from the generated glossary.
# Add words to filter (defaults to English)
glost filter add the and it is was were
# Add words for specific language
glost filter add --lang Swedish och att det är
# List all filtered words
glost filter list
# List words for specific language
glost filter list --lang Swedish
# Remove words from filter
glost filter remove --lang English the and
# Clear words for specific language
glost filter clear --lang Swedish
# Clear all filter lists
glost filter clear

The filter file uses a simple format:
- English words: word (no prefix, for backward compatibility)
- Other languages: language:word
- Comments: lines starting with #
Example:
# Filter list - Format: language:word or just word (defaults to English)
and
is
the
Swedish:och
Swedish:att
Swedish:det
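
The format above is simple enough to parse line by line. The following is a minimal sketch of one way to read it, not the code in src/filter.rs: the function names `parse_filter_list` and `is_filtered` are hypothetical, and the sketch assumes that matching is exact and case-sensitive and that only the first `:` on a line separates the language from the word.

```rust
use std::collections::{HashMap, HashSet};

/// Parse filter-file text into a map from language name to filtered words.
/// Assumption: unprefixed words are filed under "English", as the format
/// description states; everything else here is illustrative.
fn parse_filter_list(text: &str) -> HashMap<String, HashSet<String>> {
    let mut filters: HashMap<String, HashSet<String>> = HashMap::new();
    for line in text.lines() {
        let line = line.trim();
        // Skip blank lines and comment lines starting with '#'.
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        // Split on the first ':' only; no ':' means an English word.
        let (lang, word) = match line.split_once(':') {
            Some((lang, word)) => (lang.trim(), word.trim()),
            None => ("English", line),
        };
        // Ignore entries such as "Swedish:" that carry no word.
        if word.is_empty() {
            continue;
        }
        filters
            .entry(lang.to_string())
            .or_default()
            .insert(word.to_string());
    }
    filters
}

/// Return true if `word` is in the filter list for `lang` (exact match).
fn is_filtered(filters: &HashMap<String, HashSet<String>>, lang: &str, word: &str) -> bool {
    filters.get(lang).map_or(false, |words| words.contains(word))
}
```

Keeping one set per language means a word filtered for English is not automatically filtered for Swedish, which matches the per-language behaviour of the `glost filter` commands.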
Supported languages:
- Afrikaans
- Dutch
- English
- French
- German
- Italian
- Japanese
- Korean
- Mandarin
- Portuguese
- Russian
- Spanish
- Swedish
- src/main.rs - Entry point
- src/cli.rs - Command-line interface definitions
- src/commands.rs - Command handlers
- src/content.rs - File content extraction
- src/filter.rs - Filter list management
- src/glossary.rs - Glossary generation and output
- src/kaikki/ - Kaikki.org API integration
- src/language.rs - Language definitions and utilities