Description
The config loader in graphrag-common uses string.Template.substitute() to process environment variables in settings.yaml. However, substitute() treats every $ character as a placeholder prefix — including $ that happens to be part of a regex pattern or any other string value. This causes a hard crash with an unhelpful ValueError.
Reproduction
Create a settings.yaml with:
input:
  type: markitdown
  file_pattern: ".*\\.md$"
Run graphrag index --root <dir>.
Error
ValueError: Invalid placeholder in string: line 50, col 25
This happens because $ at the end of .*\.md$ (a valid regex anchor) is treated by string.Template as a template placeholder, and a bare $ without a valid identifier causes substitute() to raise ValueError.
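The failure can be reproduced without graphrag at all. The sketch below feeds the same three YAML lines straight to string.Template; the variable names are illustrative, and the empty mapping stands in for os.environ.

```python
from string import Template

# The YAML from the reproduction, kept as raw text (the loader substitutes
# environment variables before the YAML is parsed).
text = "\n".join([
    "input:",
    "  type: markitdown",
    r'  file_pattern: ".*\\.md$"',
])

try:
    Template(text).substitute({})
    error_message = None
except ValueError as error:
    # The trailing `$` is followed by a quote, not an identifier.
    error_message = str(error)

print(error_message)  # Invalid placeholder in string: line 3, col 25
```

The column matches the one in the report (col 25); only the line number differs because this snippet is shorter than a full settings.yaml.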
Root Cause
File: graphrag-common/graphrag_common/config/load_config.py, the _parse_env_variables function:
def _parse_env_variables(text: str) -> str:
    """Parse environment variables in the configuration text."""
    try:
        return Template(text).substitute(os.environ)
    except KeyError as error:
        msg = f"Environment variable not found: {error}"
        raise ConfigParsingError(msg) from error
substitute() raises KeyError only for missing env vars; for any invalid placeholder it raises ValueError, which this handler does not catch. So a bare $ in a regex pattern, a $ followed by a digit or punctuation (e.g. $1), or a malformed ${...} all crash the config loader.
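To make the gap in the handler concrete, here is a small sketch that mirrors the function above. ConfigParsingError is a stand-in class defined only for this example, the environment is passed in as a plain dict so the snippet is deterministic, and the outcome helper is hypothetical.

```python
from string import Template


class ConfigParsingError(Exception):
    """Stand-in for the project's exception type (assumed for this sketch)."""


def parse_env_variables(text: str, env: dict[str, str]) -> str:
    """Same logic as the current loader, with the environment injected."""
    try:
        return Template(text).substitute(env)
    except KeyError as error:
        msg = f"Environment variable not found: {error}"
        raise ConfigParsingError(msg) from error


def outcome(text: str, env: dict[str, str]) -> str:
    """Return the name of the exception raised, or 'ok' if none was."""
    try:
        parse_env_variables(text, env)
    except Exception as error:
        return type(error).__name__
    return "ok"


missing_var = outcome("api_key: ${API_KEY}", {})
bare_dollar = outcome(r'file_pattern: ".*\\.md$"', {})

print(missing_var)  # ConfigParsingError
print(bare_dollar)  # ValueError
```

Only the missing-variable case is translated into a config error; the invalid-placeholder case escapes as a raw ValueError.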
This affects:
- file_pattern with a regex ending in $ (very common, e.g. .*\.md$)
- Any other config value that happens to contain $
- Prompt paths containing $ characters
Proposed Fix
Replace substitute() with safe_substitute(), which silently leaves unrecognized placeholders as literal text instead of raising an error:
def _parse_env_variables(text: str) -> str:
    """Parse environment variables in the configuration text."""
    return Template(text).safe_substitute(os.environ)
safe_substitute() handles $$ escape sequences but does not raise an error on bare $ or missing keys — it leaves them unchanged as literal text. This is the expected behavior: a $ in a regex pattern should stay as $, not crash the config loader.
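The behavior described above can be checked with a short sketch. The variable name GRAPHRAG_API_KEY, its value, and the sample lines are illustrative only, and a plain dict is used in place of os.environ.

```python
from string import Template

# Hypothetical environment for the example.
env = {"GRAPHRAG_API_KEY": "secret"}

text = "\n".join([
    "api_key: ${GRAPHRAG_API_KEY}",   # known variable: substituted
    r'file_pattern: ".*\\.md$"',      # bare $: left as-is
    "price: $$5",                     # $$ escape: collapses to a single $
    "missing: ${NOT_SET}",            # unknown variable: left as-is
])

result = Template(text).safe_substitute(env)
print(result)
```

One consequence worth noting: an unset variable stays in the text as the literal ${NOT_SET} rather than producing an error at this stage.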
A KeyError or ValueError from substitute() is the wrong place to validate config values anyway — Pydantic schema validation in the config models already handles that properly.
Environment
- graphrag version: 3.0.9
- Python: 3.12