docs: update project documentation #84
Pull Request Overview
This PR removes deprecated functionality and updates documentation to reflect simplified API signatures. The changes primarily focus on removing the run_pipeline function wrapper and cleaning up function signatures in documentation.
- Removed the deprecated `run_pipeline` function and its associated tests
- Updated the regex pattern for phone number detection to prevent false positives
- Simplified documentation by removing parameter details that are no longer configurable
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| leksara/tests/test_chain.py | Removed import of run_pipeline and deleted two tests that used the deprecated function |
| leksara/core/chain.py | Removed the run_pipeline function wrapper |
| leksara/resources/regex_patterns/pii_patterns.json | Added negative lookbehind to phone pattern to avoid matching within longer numeric sequences |
| docs/features.md | Updated function signatures to remove parameters that are no longer exposed in the API |
| docs/examples.md | Updated example outputs to show numeric rating values instead of placeholder tokens |
| docs/api.md | Updated function signatures and removed reference to deprecated list_presets function |
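
The negative lookbehind mentioned for `pii_patterns.json` can be illustrated with a minimal sketch. The two patterns below are hypothetical stand-ins written for this example, not the actual contents of the JSON file; they only show how `(?<!\d)` stops a phone-like match from starting in the middle of a longer digit run.

```python
import re

# Hypothetical patterns for illustration only; NOT the real entries in pii_patterns.json.
PHONE_NAIVE = re.compile(r"(?:\+62|62|0)8\d{8,11}")
# Same pattern, but the match may not start directly after another digit.
PHONE_GUARDED = re.compile(r"(?<!\d)(?:\+62|62|0)8\d{8,11}")


def find_phones(pattern, text):
    """Return every substring of `text` matched by `pattern`."""
    return [m.group(0) for m in pattern.finditer(text)]


order_id = "Order 990812345678901234 shipped"  # long numeric ID, not a phone number
contact = "Call 081234567890 today"            # standalone phone-like number

print(find_phones(PHONE_NAIVE, order_id))    # false positive inside the ID
print(find_phones(PHONE_GUARDED, order_id))  # no match
print(find_phones(PHONE_GUARDED, contact))   # still matches the real number
```

The guarded variant rejects the order ID because the candidate match would begin right after a digit, while a number preceded by whitespace is still detected.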
| | `remove_digits(text)` | `str -> str` | Drops ASCII digits. | | ||
| | `remove_emoji(text)` | `str -> str` | Removes emoji code points. Falls back to identity when `emoji` package is missing. | | ||
| | `replace_url(text, mode="remove"\|"replace", placeholder="[URL]")` | `str -> str` | Detects URLs (with optional protocol) and either removes or replaces them with the placeholder. | | ||
| | `replace_url(text, mode="remove"\|"replace"")` | `str -> str` | Detects URLs (with optional protocol) and either removes or replaces them with the placeholder. | |
There is an extra double quote at the end of the signature. It should be `mode="remove"\|"replace"`.
| | `replace_url(text, mode="remove"\|"replace"")` | `str -> str` | Detects URLs (with optional protocol) and either removes or replaces them with the placeholder. | | |
| | `replace_url(text, mode="remove"\|"replace")` | `str -> str` | Detects URLs (with optional protocol) and either removes or replaces them with the placeholder. | |
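
To make the two documented modes concrete, here is a rough sketch of the behaviour the table row describes. `replace_url_sketch`, `URL_RE`, and the fixed `"[URL]"` token are assumptions made for this example; the real `replace_url` in leksara may use a different pattern and placeholder.

```python
import re

# Illustrative URL matcher with an optional protocol; an assumption, not leksara's pattern.
URL_RE = re.compile(r"(?:https?://)?(?:[\w-]+\.)+[a-z]{2,}(?:/\S*)?", re.IGNORECASE)

# Assumed fixed token, since the updated signature no longer exposes a placeholder argument.
PLACEHOLDER = "[URL]"


def replace_url_sketch(text, mode="remove"):
    """Hypothetical stand-in for the documented replace_url behaviour."""
    if mode == "remove":
        return URL_RE.sub("", text)
    if mode == "replace":
        return URL_RE.sub(PLACEHOLDER, text)
    raise ValueError(f"unsupported mode: {mode!r}")


print(replace_url_sketch("Visit https://example.com/docs now", mode="replace"))
print(replace_url_sketch("see example.com", mode="remove"))
```

Rejecting unknown modes with `ValueError` is a design choice of this sketch; the source does not say how the library handles them.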
| | `normalize_slangs(text, mode="replace"\|"remove")` | `mode`. | Substitutes colloquial slang with dictionary entries. | | ||
| | `expand_contraction(text)` | – | Expands Indonesian contractions; returns original value when not a string. | | ||
| | `word_normalization(text, method="stem", word_list=None, mode="keep")` | `method`: currently `"stem"`. `word_list`: iterable of tokens to protect or include depending on `mode`. `mode`: `keep`, `only`, `exclude`. | Applies stemming using Sastrawi when available; automatically masks placeholders before stemming. | | ||
| | `word_normalization(text, word_list=None, mode="keep")` | `word_list`: iterable of tokens to protect or include depending on `mode`. `mode`: `keep`, `only`. | Applies stemming using Sastrawi when available; automatically masks placeholders before stemming. | |
The documentation states that `mode` can be `keep` or `only`, but the original version listed three modes: `keep`, `only`, and `exclude`. If the `exclude` mode was removed, this is correct; otherwise, it should still be documented.
| | `word_normalization(text, word_list=None, mode="keep")` | `word_list`: iterable of tokens to protect or include depending on `mode`. `mode`: `keep`, `only`. | Applies stemming using Sastrawi when available; automatically masks placeholders before stemming. | | |
| | `word_normalization(text, word_list=None, mode="keep")` | `word_list`: iterable of tokens to protect or include depending on `mode`. `mode`: `keep`, `only`, `exclude`. | Applies stemming using Sastrawi when available; automatically masks placeholders before stemming. | |