⚡ Bolt: Pre-compile text extraction regex constants #218
This commit optimizes text extraction by hoisting locally defined regex patterns into compiled module-level and class-level constants. Specifically:
- Defined `_TITLE_PATTERNS` and `_COMPANY_PATTERNS` as module-level constants in `cli/utils/keyword_density.py` and modified `_extract_job_details` to use `pattern.search()`.
- Defined `_SALARY_PATTERNS`, `_JOB_TYPE_PATTERNS`, and `_EXPERIENCE_LEVEL_PATTERNS` as class constants in `cli/integrations/job_parser.py` and updated the extraction methods to use `pattern.search()`.

Impact: Faster parsing, since Python no longer rebuilds the pattern lists on every call or looks up cached compiled patterns through `re.search` each time.

Co-authored-by: anchapin <6326294+anchapin@users.noreply.github.com>
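The change described in this commit can be sketched as a before/after pair. This is a minimal illustration only: the pattern strings and the function names `extract_salary_before` and `extract_salary_after` are hypothetical, since the actual patterns in `job_parser.py` are not shown in this conversation.

```python
import re
from typing import Optional

# Hypothetical patterns for illustration; the real ones live in
# cli/integrations/job_parser.py and are not reproduced here.
# Compiled once, when the module is imported.
_SALARY_PATTERNS = [
    re.compile(r"\$\d[\d,]*(?:\s*-\s*\$\d[\d,]*)?", re.IGNORECASE),
    re.compile(r"\d+k\s*-\s*\d+k", re.IGNORECASE),
]


def extract_salary_before(text: str) -> Optional[str]:
    """Old shape: the list is rebuilt and re.search() consults its cache on every call."""
    patterns = [
        r"\$\d[\d,]*(?:\s*-\s*\$\d[\d,]*)?",
        r"\d+k\s*-\s*\d+k",
    ]
    for pattern in patterns:
        match = re.search(pattern, text, re.IGNORECASE)
        if match:
            return match.group(0)
    return None


def extract_salary_after(text: str) -> Optional[str]:
    """New shape: iterate over the pre-compiled constants and call pattern.search()."""
    for pattern in _SALARY_PATTERNS:
        match = pattern.search(text)
        if match:
            return match.group(0)
    return None
```

Both functions should return the same result for any input; only the per-call setup work differs.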
Reviewer's Guide
Pre-compiles frequently used text-extraction regexes into module/class-level constants and updates parsing helpers to reuse these compiled patterns, reducing per-call allocation and regex compilation overhead in JobParser and KeywordDensityGenerator.
Class diagram for precompiled regex usage in JobParser and KeywordDensityGenerator
classDiagram
class JobParser {
-_SALARY_PATTERNS: list
-_JOB_TYPE_PATTERNS: list
-_EXPERIENCE_LEVEL_PATTERNS: list
+__init__(cache_dir: OptionalPath)
-_extract_salary_from_text(text: str) OptionalStr
-_extract_job_type(html: str) OptionalStr
-_extract_experience_level(html: str) OptionalStr
}
class KeywordDensityGenerator {
-_TITLE_PATTERNS: list
-_COMPANY_PATTERNS: list
-config: Config
+__init__(config: Config)
-_extract_job_details(job_description: str) TupleStrStr
}
class Config
class OptionalPath
class OptionalStr
class TupleStrStr
KeywordDensityGenerator --> Config
JobParser --> OptionalPath
JobParser --> OptionalStr
KeywordDensityGenerator --> TupleStrStr
This commit runs `black` on the modified files to ensure they pass CI code quality checks. Co-authored-by: anchapin <6326294+anchapin@users.noreply.github.com>
Ran `black` using `py310` target-version which satisfies GitHub Actions' `black --check` requirements. Co-authored-by: anchapin <6326294+anchapin@users.noreply.github.com>
💡 What:
Optimized text parsing methods in `cli/utils/keyword_density.py` and `cli/integrations/job_parser.py` by hoisting inline list-based regular expressions into module-level and class-level pre-compiled regex constants (`re.compile`).

🎯 Why:
Prior to this change, functions like `_extract_salary_from_text` defined their patterns locally and called `re.search()` inside `for` loops. Even with Python's internal regex cache, this meant allocating a list of strings on each invocation and performing a cache lookup for every `re.search()` call. Pre-compiling the expressions avoids this per-call overhead, which should speed up keyword extraction, especially when processing large job descriptions or resume dumps.

📊 Impact:
Faster parsing speed and reduced allocation overhead on hot paths for `JobParser` and `KeywordDensityGenerator`.

🔬 Measurement:
Run the test suite with `pytest tests/test_keyword_density.py tests/test_job_parser_integration.py` to confirm the matching logic is still correct, and verify performance with a basic profiler.

PR created automatically by Jules for task 1641311279977203139 started by @anchapin
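The measurement step could be approximated with a small `timeit` comparison. This is a rough sketch under stated assumptions: the pattern strings, the sample text, and the helper names `search_inline`, `search_precompiled`, and `measure` are all hypothetical and do not come from the repository.

```python
import re
import timeit

# Hypothetical pattern strings, used only to compare the two call styles.
_PATTERN_STRINGS = [r"senior\s+\w+", r"full[- ]time", r"\d+\+?\s+years"]
_COMPILED = [re.compile(p, re.IGNORECASE) for p in _PATTERN_STRINGS]

# Hypothetical sample input standing in for a large job description.
SAMPLE = "Senior Engineer, full-time, 5+ years of experience. " * 50


def search_inline(text: str) -> list:
    """Rebuilds the list and goes through re.search() on every call."""
    patterns = [r"senior\s+\w+", r"full[- ]time", r"\d+\+?\s+years"]
    return [bool(re.search(p, text, re.IGNORECASE)) for p in patterns]


def search_precompiled(text: str) -> list:
    """Reuses the module-level compiled patterns."""
    return [bool(p.search(text)) for p in _COMPILED]


def measure(number: int = 2000) -> dict:
    """Return total seconds spent in each variant over `number` calls."""
    return {
        "inline": timeit.timeit(lambda: search_inline(SAMPLE), number=number),
        "precompiled": timeit.timeit(lambda: search_precompiled(SAMPLE), number=number),
    }


if __name__ == "__main__":
    for name, seconds in measure().items():
        print(f"{name}: {seconds:.4f}s")
```

The absolute numbers depend on the machine, the Python version, and the patterns themselves, so any speedup is best confirmed against the real extraction methods.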
Summary by Sourcery
Enhancements: