A test-covered implementation of chunk-wise long-text synthesis with two parallel pipelines inspired by Kimi-K2:
- ChunkWiseRephrasePipeline: faithful chunk-wise autoregressive rephrasing.
- ChunkWiseGenerationPipeline: plan-driven chunk-wise autoregressive long-form generation.
- Hierarchical no-overlap chunk splitting with overlap-aware stitching.
- Autoregressive generation with rolling prefix windows.
- Parallel workflows for rephrase and pure generation.
- Rephrase retries with pluggable fidelity verification.
- Generation section retries with issue-targeted repair prompts.
- Optional prompt compression for long-context section generation.
- Plan + state based long-form generation with consistency pass guard.
- Built-in quality checks for coverage, terminology, repetition, drift, and required entities.
- OpenAI-compatible backend with environment-based configuration.
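The overlap-aware stitching listed above can be sketched generically. The helper below is a minimal illustration, not the project's base.py implementation: it detects the longest suffix/prefix token overlap between consecutive chunks (bounded, like the pipeline's max_stitch_overlap_tokens setting) and drops the duplicate before concatenating.

```python
def stitch_chunks(prev: str, nxt: str, max_overlap_tokens: int = 64) -> str:
    """Concatenate two chunks, removing the longest token overlap where
    prev's suffix equals nxt's prefix (bounded by max_overlap_tokens)."""
    prev_tokens = prev.split()
    next_tokens = nxt.split()
    limit = min(max_overlap_tokens, len(prev_tokens), len(next_tokens))
    for size in range(limit, 0, -1):  # prefer the longest matching overlap
        if prev_tokens[-size:] == next_tokens[:size]:
            return " ".join(prev_tokens + next_tokens[size:])
    return " ".join(prev_tokens + next_tokens)


print(stitch_chunks("a b c d", "c d e f"))  # "c d" appears only once: a b c d e f
```

A whitespace split stands in for the pluggable tokenizer here; the real pipeline works on tokenizer output rather than raw words.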
The repository now follows explicit domain boundaries:

- pipelines/: orchestration only (rephrase.py, generation.py, shared helpers in base.py).
- prompts/: prompt rendering only (rephrase.py, generation.py, shared language helpers in base.py).
- quality/: quality and fidelity checks (fidelity.py, generation.py, shared text/token helpers in base.py).
- backends/: provider adapters (openai.py).
- core/: stable grouped API exports (protocols.py, types.py, config.py).
- Top-level domain modules remain focused (chunking.py, generation_state.py, generation_types.py, model.py).
Legacy wrapper modules were removed and should not be imported anymore: pipeline.py, prompting.py, fidelity.py, openai_backend.py, generation_pipeline.py, generation_prompting.py, generation_quality.py, tokenizer.py.
```
src/
  __init__.py              # unified package-level public exports
  chunking.py              # chunk split and overlap logic
  generation_state.py      # generation state table update logic
  generation_types.py      # generation dataclasses and result types
  model.py                 # model request/task protocols and adapters
  pipelines/
    __init__.py
    rephrase.py            # chunk-wise rephrase orchestration + PipelineConfig
    generation.py          # chunk-wise long-form generation orchestration
    base.py                # overlap detection and stitching
  prompts/
    __init__.py
    rephrase.py            # RewriteRequest + rephrase prompt rendering
    generation.py          # plan/section/repair/consistency prompt rendering
    base.py                # shared prompt language helpers
  quality/
    __init__.py
    fidelity.py            # fidelity verifier contracts and implementations
    generation.py          # generation quality checkers and consistency guard
    base.py                # shared token/text matching helpers
  backends/
    __init__.py
    openai.py              # OpenAI-compatible backend and configs
  core/
    __init__.py
    protocols.py           # Tokenizer/LLMModel/RewriteModel/FidelityVerifier
    types.py               # LLMRequest, RewriteRequest, GenerationPlan, SectionSpec
    config.py              # PipelineConfig, GenerationConfig, OpenAIBackendConfig
  tokenization/
    __init__.py            # tokenizer contracts and helpers
tests/
  test_*.py                # deterministic unittest coverage + refactor compatibility tests
scripts/
  run_live_openai_pipeline.py             # live rephrase runner
  run_live_openai_generation_pipeline.py  # live generation runner
  run_generation_ab_baseline.py           # one-shot vs chunk-wise baseline evaluation
```
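Both pipelines condition each step on a rolling prefix window (configured via prefix_window_tokens). The idea is simple enough to sketch: keep only the trailing N tokens of the text produced so far as context for the next chunk. This is an illustration under a whitespace-tokenizer assumption, not the project's implementation.

```python
def rolling_prefix(generated_text: str, prefix_window_tokens: int) -> str:
    """Return the trailing token window used to condition the next chunk."""
    tokens = generated_text.split()
    return " ".join(tokens[-prefix_window_tokens:])


history = " ".join(f"tok{i}" for i in range(2000))
print(len(rolling_prefix(history, 1024).split()))  # 1024
```

Because the window is bounded, prompt size stays constant no matter how long the accumulated document grows, which is what makes chunk-wise autoregressive synthesis viable for long targets.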
This project uses uv for environment and dependency management.
Install dependencies:

```shell
uv sync
```

Run the full offline test suite:

```shell
uv run python -m unittest discover -s tests -v
```

Run one module during iteration:

```shell
uv run python -m unittest tests.test_generation_pipeline -v
```

Validate refactor-era API boundaries and exports:

```shell
PYTHONPATH=src:tests uv run python -m unittest \
  tests.test_package_entrypoint \
  tests.test_core_api_compat \
  tests.test_pipelines_api -v
```

Run the live rephrase pipeline:

```shell
export LLM_API_KEY=your_key_here
uv run python scripts/run_live_openai_pipeline.py \
  --input tests/data/live_rephrase_input.txt \
  --output tests/data/rephrase_output.txt
```

Run the live generation pipeline:

```shell
export LLM_API_KEY=your_key_here
uv run python scripts/run_live_openai_generation_pipeline.py \
  --topic "Chunk-wise autoregressive long-form generation" \
  --objective "Create long-context training text" \
  --target-tokens 1800 \
  --audience "ML engineers" \
  --tone "neutral technical" \
  --output tests/data/generation_output.txt
```

You can also pass a manual plan JSON:

```shell
uv run python scripts/run_live_openai_generation_pipeline.py \
  --manual-plan-path tests/data/manual_plan.json \
  --output tests/data/generation_output.txt
```

Profile-based quick switch (default is coherence_first):

```shell
uv run python scripts/run_live_openai_generation_pipeline.py \
  --topic "Chunk-wise autoregressive long-form generation" \
  --objective "Create long-context training text" \
  --profile cost_first \
  --output tests/data/generation_output_cost_first.txt
```

The live integration test makes a real API request and is disabled by default:

```shell
export LLM_API_KEY=your_key_here
export RUN_LIVE_LLM_TESTS=1
uv run python -m unittest tests.test_openai_backend_live -v
```

Use the fixed cases file to build a reproducible baseline report:

```shell
export LLM_API_KEY=your_key_here
uv run python scripts/run_generation_ab_baseline.py \
  --cases tests/fixtures/generation_eval_cases.json \
  --output-dir tests/data/ab_eval_reports \
  --prompt-language en
```

Outputs:

- ab_baseline_report.json: machine-readable aggregate + per-case details
- ab_baseline_report.md: human-readable summary + manual scoring table
- <case_id>.json: per-case raw outputs and metrics
Recommended grouped imports:
```python
from pipelines import ChunkWiseRephrasePipeline, ChunkWiseGenerationPipeline, PipelineConfig
from prompts import RewriteRequest, render_rewrite_prompt, render_plan_prompt
from quality import FidelityVerifier, CompositeFidelityVerifier, NumericFactChecker
from backends import OpenAIBackendConfig, OpenAILLMModel, OpenAIRewriteModel
from core.protocols import Tokenizer, LLMModel, RewriteModel, FidelityVerifier
from core.types import LLMRequest, RewriteRequest, GenerationPlan, SectionSpec
from core.config import PipelineConfig, GenerationConfig, OpenAIBackendConfig
```
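As an illustration of the kind of check a numeric fact checker performs during fidelity verification, the sketch below flags rewrites that drop numbers present in the source. This is a hypothetical helper, not the project's NumericFactChecker.

```python
import re


def numbers_preserved(source: str, rewrite: str) -> bool:
    """Return True if every numeric literal in the source also appears in the rewrite."""
    src_nums = re.findall(r"\d+(?:\.\d+)?", source)
    out_nums = set(re.findall(r"\d+(?:\.\d+)?", rewrite))
    return all(n in out_nums for n in src_nums)


print(numbers_preserved("grew 12% to 3.5M", "rose 12 percent to 3.5 million"))  # True
```

A verifier like this can gate rephrase retries: a chunk whose rewrite fails the check is sent back to the model rather than stitched into the output.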
Compatibility package entrypoint is available at src:

```python
from src import ChunkWiseRephrasePipeline, PipelineConfig, RewriteRequest, WhitespaceTokenizer
```
Rephrase example (echo model):

```python
from core.config import PipelineConfig
from core.types import RewriteRequest
from pipelines import ChunkWiseRephrasePipeline
from tokenization import WhitespaceTokenizer


class EchoRewriteModel:
    def rewrite(self, request: RewriteRequest) -> str:
        return request.current_chunk


pipeline = ChunkWiseRephrasePipeline(
    model=EchoRewriteModel(),
    tokenizer=WhitespaceTokenizer(),
    config=PipelineConfig(
        chunk_size=256,
        length_mode="token",
        prefix_window_tokens=1024,
        max_stitch_overlap_tokens=64,
    ),
)

rewritten = pipeline.run("Your long document here.", style_instruction="Rewrite for clarity.")
print(rewritten)
```

Generation example (stub model, manual plan):

```python
from core.config import GenerationConfig
from core.types import GenerationPlan, LLMRequest, SectionSpec
from pipelines import ChunkWiseGenerationPipeline
from tokenization import WhitespaceTokenizer


class StubLLM:
    def generate(self, request: LLMRequest) -> str:
        if request.task == "section_generation":
            return "Section body with required entities and key points."
        if request.task == "consistency_pass":
            return "Section body with required entities and key points."
        raise ValueError("manual plan run should not call plan_generation")


plan = GenerationPlan(
    topic="Chunk-wise generation",
    objective="Teach the method",
    audience="ML engineers",
    tone="neutral technical",
    target_total_length=300,
    sections=[
        SectionSpec(
            title="Intro",
            key_points=["global anchor controls structure"],
            required_entities=["global anchor"],
            constraints=[],
            target_length=120,
        )
    ],
    terminology_preferences={"global anchor": "global anchor"},
    narrative_voice="third-person",
    do_not_include=[],
)

pipeline = ChunkWiseGenerationPipeline(
    model=StubLLM(),
    tokenizer=WhitespaceTokenizer(),
    config=GenerationConfig(prefix_window_tokens=800),
)

result = pipeline.run(manual_plan=plan)
print(result.final_text)
print(result.qc_report.coverage_missing)
```

Environment variables:
- LLM_API_KEY (required): API key.
- LLM_MODEL (optional): override model ID.
- LLM_BASE_URL (optional): override provider base URL.
Current defaults in src/backends/openai.py:
```python
DEFAULT_BASE_URL = "https://openrouter.ai/api/v1"
DEFAULT_MODEL = "stepfun/step-3.5-flash:free"
```
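Resolution of these settings follows the usual environment-with-defaults pattern. The sketch below uses the variable names and defaults documented above; resolve_backend_config itself is a hypothetical helper, not the backend's actual API.

```python
DEFAULT_BASE_URL = "https://openrouter.ai/api/v1"
DEFAULT_MODEL = "stepfun/step-3.5-flash:free"


def resolve_backend_config(env: dict) -> dict:
    """Read LLM_* variables (e.g. from os.environ), falling back to the defaults."""
    api_key = env.get("LLM_API_KEY")
    if not api_key:
        raise RuntimeError("LLM_API_KEY is required")
    return {
        "api_key": api_key,
        "model": env.get("LLM_MODEL", DEFAULT_MODEL),
        "base_url": env.get("LLM_BASE_URL", DEFAULT_BASE_URL),
    }


print(resolve_backend_config({"LLM_API_KEY": "k"})["base_url"])  # the default base URL
```

Passing the environment as a plain dict keeps the resolver trivially testable without touching the process environment.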
Live rephrase script flags (scripts/run_live_openai_pipeline.py):
- --chunk-size
- --length-mode (auto/token/char)
- --prefix-window-tokens
- --style
- --prompt-language (en/zh)
- --model
- --base-url
- --temperature
- --top-p
- --max-new-tokens
- --verbose
Live generation script flags (scripts/run_live_openai_generation_pipeline.py):
- --topic
- --objective
- --target-tokens
- --audience
- --tone
- --prompt-language (en/zh)
- --manual-plan-path
- --profile (coherence_first/cost_first)
- --prompt-compression (on/off): override profile
- --section-retry-strategy (off/balanced/aggressive): override profile
- --consistency-pass (on/off): override profile
- --consistency-guard (on/off): override profile
- --prefix-window-tokens
- --disable-consistency-pass (deprecated alias for --consistency-pass off)
- --enable-reasoning
- --model
- --base-url
- --temperature
- --top-p
- --max-new-tokens
- --verbose
- Error contains "not a valid model ID": set a provider-valid model, for example export LLM_MODEL=your_valid_model_id.
- Missing API key error: make sure LLM_API_KEY is exported in the current shell.