docs: update project documentation #84

Merged
RedEye1605 merged 1 commit into main from docs on Oct 25, 2025

Conversation

@RedEye1605
Owner

No description provided.

Copilot AI review requested due to automatic review settings October 25, 2025 13:11

Copilot AI left a comment

Pull Request Overview

This PR removes deprecated functionality and updates documentation to reflect simplified API signatures. The changes primarily focus on removing the run_pipeline function wrapper and cleaning up function signatures in documentation.

  • Removed deprecated run_pipeline function and its associated tests
  • Updated regex pattern for phone number detection to prevent false positives
  • Simplified documentation by removing parameter details that are no longer configurable

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `leksara/tests/test_chain.py` | Removed import of `run_pipeline` and deleted two tests that used the deprecated function |
| `leksara/core/chain.py` | Removed the `run_pipeline` function wrapper |
| `leksara/resources/regex_patterns/pii_patterns.json` | Added negative lookbehind to phone pattern to avoid matching within longer numeric sequences |
| `docs/features.md` | Updated function signatures to remove parameters that are no longer exposed in the API |
| `docs/examples.md` | Updated example outputs to show numeric rating values instead of placeholder tokens |
| `docs/api.md` | Updated function signatures and removed reference to deprecated `list_presets` function |
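
To illustrate the negative-lookbehind change noted for `pii_patterns.json`, here is a minimal sketch. Both patterns below are hypothetical stand-ins written for this example and are not the patterns shipped in the repository; the sketch only shows how a `(?<!\d)` guard stops a phone-like match from starting in the middle of a longer digit run.

```python
import re

# Hypothetical patterns for illustration only; NOT the contents of pii_patterns.json.
NAIVE_PHONE = re.compile(r"(?:\+62|0)8\d{8,11}")
# Same pattern, but the match may not start right after another digit.
GUARDED_PHONE = re.compile(r"(?<!\d)(?:\+62|0)8\d{8,11}")


def find_phones(pattern, text):
    """Return every substring of `text` matched by `pattern`."""
    return [m.group(0) for m in pattern.finditer(text)]


order_id = "Order 991208123456789012 shipped"   # long numeric ID, not a phone number
contact = "Hubungi 081234567890 sekarang"       # an actual phone-like number

print(find_phones(NAIVE_PHONE, order_id))    # false positive inside the order ID
print(find_phones(GUARDED_PHONE, order_id))  # no match
print(find_phones(GUARDED_PHONE, contact))   # the phone number is still found
```

The lookbehind is zero-width, so it changes where a match may begin without consuming any characters from the text.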

Comment thread docs/api.md
| `remove_digits(text)` | `str -> str` | Drops ASCII digits. |
| `remove_emoji(text)` | `str -> str` | Removes emoji code points. Falls back to identity when `emoji` package is missing. |
| `replace_url(text, mode="remove"\|"replace", placeholder="[URL]")` | `str -> str` | Detects URLs (with optional protocol) and either removes or replaces them with the placeholder. |
| `replace_url(text, mode="remove"\|"replace"")` | `str -> str` | Detects URLs (with optional protocol) and either removes or replaces them with the placeholder. |

Copilot AI Oct 25, 2025

Extra double quote at the end of the signature. It should read `mode="remove"\|"replace"`.

Suggested change
| `replace_url(text, mode="remove"\|"replace"")` | `str -> str` | Detects URLs (with optional protocol) and either removes or replaces them with the placeholder. |
| `replace_url(text, mode="remove"\|"replace")` | `str -> str` | Detects URLs (with optional protocol) and either removes or replaces them with the placeholder. |
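
For readers unfamiliar with the two modes named in this signature, the behaviour described in the table row can be sketched roughly as follows. This is an assumption-laden illustration, not leksara's implementation: the function name `replace_url_sketch`, the URL regex, and the fixed `PLACEHOLDER` constant (taken from the `"[URL]"` default shown in the older signature) are all hypothetical.

```python
import re

# Hypothetical URL pattern: protocol is optional when the URL starts with "www.".
_URL = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)

# Assumed fixed placeholder, mirroring the "[URL]" default in the older signature.
PLACEHOLDER = "[URL]"


def replace_url_sketch(text, mode="remove"):
    """Remove URLs (mode="remove") or swap them for PLACEHOLDER (mode="replace")."""
    if mode not in ("remove", "replace"):
        raise ValueError(f"unsupported mode: {mode!r}")
    replacement = PLACEHOLDER if mode == "replace" else ""
    return _URL.sub(replacement, text)


print(replace_url_sketch("cek https://toko.example/p?id=1 ya", mode="replace"))
print(replace_url_sketch("cek www.toko.example ya", mode="remove"))
```

Note that in `remove` mode this sketch leaves the surrounding whitespace untouched, so a doubled space can remain where the URL was.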

Comment thread docs/api.md
| `normalize_slangs(text, mode="replace"\|"remove")` | `mode`. | Substitutes colloquial slang with dictionary entries. |
| `expand_contraction(text)` | – | Expands Indonesian contractions; returns original value when not a string. |
| `word_normalization(text, method="stem", word_list=None, mode="keep")` | `method`: currently `"stem"`. `word_list`: iterable of tokens to protect or include depending on `mode`. `mode`: `keep`, `only`, `exclude`. | Applies stemming using Sastrawi when available; automatically masks placeholders before stemming. |
| `word_normalization(text, word_list=None, mode="keep")` | `word_list`: iterable of tokens to protect or include depending on `mode`. `mode`: `keep`, `only`. | Applies stemming using Sastrawi when available; automatically masks placeholders before stemming. |

Copilot AI Oct 25, 2025

The documentation now lists `keep` and `only` as the values for `mode`, but the previous version listed three: `keep`, `only`, and `exclude`. If `exclude` was removed from the implementation, this is correct; otherwise it should still be documented.

Suggested change
| `word_normalization(text, word_list=None, mode="keep")` | `word_list`: iterable of tokens to protect or include depending on `mode`. `mode`: `keep`, `only`. | Applies stemming using Sastrawi when available; automatically masks placeholders before stemming. |
| `word_normalization(text, word_list=None, mode="keep")` | `word_list`: iterable of tokens to protect or include depending on `mode`. `mode`: `keep`, `only`, `exclude`. | Applies stemming using Sastrawi when available; automatically masks placeholders before stemming. |
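
The table row mentions that placeholders are masked automatically before stemming. A minimal sketch of that idea follows, under stated assumptions: `toy_stem` is a stand-in for a real stemmer such as Sastrawi (it only lowercases, drops non-letters, and strips a trailing "nya"), and the placeholder format `[UPPER_CASE]`, the mask-token scheme, and the function names are hypothetical rather than taken from leksara.

```python
import re

# Assumed placeholder shape, e.g. [URL] or [PHONE_NUMBER]; hypothetical.
_PLACEHOLDER = re.compile(r"\[[A-Z_]+\]")


def toy_stem(text):
    """Stand-in stemmer: lowercases, drops non-letters, strips a trailing 'nya'."""
    words = re.sub(r"[^a-z\s]", " ", text.lower()).split()
    return " ".join(w[:-3] if w.endswith("nya") and len(w) > 5 else w for w in words)


def _token(index):
    # Letters-only token, so the stemmer's digit/punctuation stripping leaves it intact.
    return "zzph" + "".join(chr(ord("a") + int(d)) for d in str(index)) + "zz"


def stem_with_masking(text, stem=toy_stem):
    """Mask placeholders, stem the rest, then restore the placeholders."""
    masks = {}

    def _mask(match):
        token = _token(len(masks))
        masks[token] = match.group(0)
        return f" {token} "

    stemmed = stem(_PLACEHOLDER.sub(_mask, text))
    return " ".join(masks.get(word, word) for word in stemmed.split())


sample = "Bukunya bagus, cek [URL] dan hubungi [PHONE_NUMBER]"
print(toy_stem(sample))           # placeholders are mangled by the stemmer
print(stem_with_masking(sample))  # placeholders survive unchanged
```

Without the masking step, the stand-in stemmer turns `[PHONE_NUMBER]` into two lowercase words, which is the kind of damage the masking is meant to prevent.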

@RedEye1605 RedEye1605 merged commit 6a35ed2 into main Oct 25, 2025
7 checks passed
@RedEye1605 RedEye1605 deleted the docs branch January 1, 2026 18:20
2 participants