Python library for parsing NooJ's dictionary files.
pynooj is a Python library that parses NooJ dictionary files (.dic) and extracts lexical information including inflected forms, lemmas, grammatical categories, and morphological traits.
For more information about NooJ, visit the NooJ website.
pip install pynooj

from pynooj import read_dic
# Parse a NooJ dictionary file
entries = read_dic("path/to/dictionary.dic")
# Each entry is a dictionary containing:
# - "inflected form": the word form
# - "lemma": the base form (optional)
# - "category": grammatical category (e.g., "V", "N", "A")
# - "traits": morphological attributes
for entry in entries:
    # "lemma" is optional, so use .get() to avoid a KeyError when it is missing
    print(entry["inflected form"], "→", entry.get("lemma"))
    print(f" Category: {entry['category']}")
    print(f" Traits: {entry['traits']}")

NooJ dictionary files use a comma-separated format:
inflected_form,lemma,category+Trait1=Value1+Trait2=Value2
Examples:
amo,amare,V+Theme=INF+FLX=GP1_INF+GP=1
casa,casa,N+Number=SG+Gender=F
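
To make the line format concrete, here is a minimal sketch of how one such line maps onto the entry fields described above. It is not pynooj's actual implementation: `parse_line` and `by_category` are hypothetical helpers written for illustration, and they assume only the simple three-field form shown in the examples (no comments, escaped commas, or omitted lemma).

```python
def parse_line(line: str) -> dict:
    """Hypothetical helper: split one dictionary line into the documented fields."""
    # Split on the first two commas only: inflected form, lemma, then the rest.
    inflected, lemma, info = line.strip().split(",", 2)
    # The first "+"-separated token is the category; the rest are Trait=Value pairs.
    category, *pairs = info.split("+")
    traits = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        traits[key] = value
    return {
        "inflected form": inflected,
        "lemma": lemma,
        "category": category,
        "traits": traits,
    }


def by_category(entries: list, category: str) -> list:
    """Hypothetical helper: keep only the entries with the given category."""
    return [entry for entry in entries if entry["category"] == category]


lines = [
    "amo,amare,V+Theme=INF+FLX=GP1_INF+GP=1",
    "casa,casa,N+Number=SG+Gender=F",
]
entries = [parse_line(line) for line in lines]
nouns = by_category(entries, "N")
print(nouns[0]["inflected form"], nouns[0]["traits"])
```

For real dictionary files, prefer `read_dic`, which is the library's supported entry point.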
read_dic(path): Parses a NooJ dictionary file and returns a list of lexical entries.
Parameters:
- path (str): Path to the .dic file
Returns:
- List of dictionaries, each containing:
  - "inflected form": the word form (string)
  - "lemma": the base form (string, optional)
  - "category": grammatical category (string)
  - "traits": dictionary of morphological traits (dict)
To run the test suite with unittest:
python -m unittest discover -s tests

Ensure you have the necessary tools installed:
pip install build twine
- Update version in pyproject.toml
- Build the package:
python -m build
- Upload to PyPI:
twine upload dist/*
- Upload to TestPyPI (optional, for testing):
twine upload --repository testpypi dist/*
Contributions are welcome! Please feel free to submit issues and pull requests.
See the LICENSE file for details.