@bbae0312 bbae0312 commented Dec 16, 2025

What does this PR do ?

Add fixes and improvements for Korean text normalization (TN): cardinal, decimal, ordinal, fraction, date, and post-processing.

Before your PR is "Ready for review"

Pre checks:

  • Have you signed your commits? Use git commit -s to sign.
  • Do all unittests finish successfully before sending PR?
    1. pytest or (if your machine does not have GPU) pytest --cpu from the root folder (given you marked your test cases accordingly @pytest.mark.run_only_on('CPU')).
    2. Sparrowhawk tests bash tools/text_processing_deployment/export_grammars.sh --MODE=test ...
  • If you are adding a new feature: Have you added test cases for both pytest and Sparrowhawk here.
  • Have you added __init__.py for every folder and subfolder, including data folder which has .TSV files?
  • Have you followed codeQL results and removed unused variables and imports (report is at the bottom of the PR in github review box) ?
  • Have you added the correct license header Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. to all newly added Python files?
  • If you copied nemo_text_processing/text_normalization/en/graph_utils.py your header's second line should be Copyright 2015 and onwards Google, Inc.. See an example here.
  • Remove import guards (try import: ... except: ...) if not already done.
  • If you added a new language or a new feature please update the NeMo documentation (lives in different repo).
  • Have you added your language support to tools/text_processing_deployment/pynini_export.py.

PR Type:

  • New Feature
  • Bugfix
  • Documentation
  • Test

If you haven't finished some of the above items you can still open "Draft" PR.

bbae0312 and others added 4 commits December 15, 2025 16:00
Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com>
…zation

Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com>

github-actions bot commented Jan 1, 2026

This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jan 1, 2026
@mgrafu mgrafu removed the Stale label Jan 5, 2026
@github-actions

This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jan 20, 2026
@github-actions

This PR was closed because it has been inactive for 7 days since being marked as stale.

@github-actions github-actions bot closed this Jan 27, 2026
@tbartley94 tbartley94 reopened this Jan 27, 2026
optional_sign = pynini.closure(pynutil.insert('negative: "true" ') + pynini.cross("-", ""), 0, 1)
final_graph = optional_sign + pynutil.insert('integer: "') + graph_num + pynutil.insert('"')
# Delete group separators when they appear between digits (e.g., "1,234" -> "1234")
delete_sep_between_digits = pynini.cdrewrite(
Member

Checking: is there any occurrence of European numbering in Korean text?

Author

It does show up sometimes, but not very often. I agree it may be better to drop it and keep the Korean cardinal grammar simpler.

Member

Okay, we can assume canonical numbering that follows the US standard.
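
For readers following this thread, the behavior under discussion can be sketched in plain Python. This is only an illustration under stated assumptions, not the PR's pynini grammar: the regex stands in for the `pynini.cdrewrite` rule in the snippet above, the helper names `strip_group_separators` and `tag_cardinal` are hypothetical, and the digit-to-Korean-word conversion done by `graph_num` is omitted, so the `integer` field keeps raw digits here.

```python
import re

# Hypothetical stand-in for the cdrewrite rule: delete "," only when it
# sits directly between two digits (US-style group separator).
_SEP_BETWEEN_DIGITS = re.compile(r"(?<=\d),(?=\d)")


def strip_group_separators(text: str) -> str:
    """Remove commas that appear between digits, e.g. "1,234" -> "1234"."""
    return _SEP_BETWEEN_DIGITS.sub("", text)


def tag_cardinal(token: str) -> str:
    """Build a tagged string shaped like the snippet's output.

    Assumption: the real grammar verbalizes the number via graph_num;
    this sketch keeps the digits as-is.
    """
    negative = token.startswith("-")
    digits = strip_group_separators(token[1:] if negative else token)
    prefix = 'negative: "true" ' if negative else ""
    return f'{prefix}integer: "{digits}"'


if __name__ == "__main__":
    print(tag_cardinal("-1,234"))
    print(strip_group_separators("1, 2"))
```

Because the rule only fires between digits, a comma followed by a space (as in ordinary punctuation) is left untouched, which matches the intent stated in the code comment under review.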

@github-actions github-actions bot removed the Stale label Jan 28, 2026
@tbartley94
Member

@bbae0312 Can you confirm that the tests are passing (Sparrowhawk and unit)?

bbae0312 and others added 2 commits January 30, 2026 10:27
Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com>
3 participants