
Replace StandardisePostprocessor with Transform #1288

Open
padam-prakash wants to merge 1 commit into zinggAI:main from padam-prakash:Issue#1245-DocsTransformPhase

Conversation

@padam-prakash
Contributor

Remove docs/StandardisePostprocessor.md and add docs/Transform.md, which relocates and reworks the standardisation documentation into the Transform phase. Update docs/SUMMARY.md to point to Transform.md. The new document updates the examples and configuration keys (e.g. setTransformers / StandardiseTransformerType and the JSON "transformers" entry), expands the Python and JSON examples, and adds CLI usage for running the Transform phase.

padam-prakash requested a review from Copilot on March 31, 2026 05:45
padam-prakash marked this pull request as ready for review on March 31, 2026 05:45

Copilot AI left a comment


Pull request overview

This PR updates the documentation to move/rename “Standardise Postprocessor” guidance into a new “Transform” phase doc and updates the docs table of contents accordingly.

Changes:

  • Added docs/Transform.md with updated standardisation/transform documentation, examples, and CLI usage.
  • Updated docs/SUMMARY.md to link to the new Transform documentation.
  • Removed docs/StandardisePostprocessor.md.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| docs/Transform.md | New documentation for the Transform phase and standardisation usage, including Python/JSON/CLI examples. |
| docs/SUMMARY.md | Updates navigation entry from Standardise Postprocessor to Transform. |
| docs/StandardisePostprocessor.md | Removes the old Standardise Postprocessor document in favor of the new Transform doc. |


Comment on lines +81 to +82
```python
inputPipe = CsvPipe("input", "examples/febrl/input.csv")
outputPipe = CsvPipe("output", "examples/febrl/transformed_output.csv")
```

Copilot AI Mar 31, 2026


The example constructs CsvPipe(...) even though it imports Enterprise pipes (zinggEC.enterprise.common.epipes), where the documented CSV pipe is ECsvPipe. This mismatch is likely to confuse users or fail at runtime; please switch to the correct pipe class for the chosen API.

Suggested change
```diff
-inputPipe = CsvPipe("input", "examples/febrl/input.csv")
-outputPipe = CsvPipe("output", "examples/febrl/transformed_output.csv")
+inputPipe = ECsvPipe("input", "examples/febrl/input.csv")
+outputPipe = ECsvPipe("output", "examples/febrl/transformed_output.csv")
```

Copilot uses AI. Check for mistakes.
Comment on lines +51 to +54
```python
fname = EFieldDefinition("fname", "string", MatchType.FUZZY)
fname.setTransformers([StandardiseTransformerType("STANDARDISE", "nicknames_test")])
```

Copilot AI Mar 31, 2026


This doc introduces setTransformers(...) and StandardiseTransformerType(...), but there are no other references to these APIs in the repo, and the existing Enterprise API docs for EFieldDefinition document setPostProcessors(...)/StandardisePostprocessorType(...) instead. Please align the naming with the actual API (or update the Enterprise API docs in the same PR if the API truly changed).

Comment on lines +61 to +65
```python
from zinggEC.enterprise.common.EFieldDefinition import EFieldDefinition
from zinggEC.enterprise.common.StandardiseTransformerType import StandardiseTransformerType
from zinggEC.enterprise.common.MappingMatchType import MappingMatchType
from zinggEC.enterprise.common.epipes import *
from zinggES.enterprise.spark.ESparkClient import *
```

Copilot AI Mar 31, 2026


The imports in this example mix Enterprise modules (zinggEC.enterprise.common.*, zinggES.enterprise.spark.*) with later usage of CsvPipe(...) (OSS naming). Please keep the example consistent with one API surface (Enterprise: ECsvPipe, OSS: zingg.pipes.CsvPipe) to avoid copy/paste failures.
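One way to catch this kind of copy/paste failure before it ships is to lint doc snippets for mixed API surfaces. The sketch below is a hypothetical helper, not part of Zingg: it uses Python's standard `ast` module to collect imported module names and plain call names, and flags a bare `CsvPipe(...)` call when Enterprise modules are imported. The module prefixes (`zinggEC.`, `zinggES.`) and the class names come from this comment; the helper names and warning text are assumptions for illustration.

```python
import ast

# Module prefixes treated as the Enterprise API surface (assumption: taken from
# the imports quoted in this review; adjust if the docs use other packages).
ENTERPRISE_PREFIXES = ("zinggEC.", "zinggES.")


def imported_modules(tree: ast.AST) -> set[str]:
    """Return the module names used in import / from-import statements."""
    modules = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.add(node.module)
    return modules


def called_names(tree: ast.AST) -> set[str]:
    """Return the names of plain calls such as CsvPipe(...) (not attribute calls)."""
    return {
        node.func.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }


def mixed_pipe_warnings(snippet: str) -> list[str]:
    """Hypothetical check: flag OSS CsvPipe(...) calls in a snippet with Enterprise imports."""
    tree = ast.parse(snippet)
    uses_enterprise = any(
        module.startswith(ENTERPRISE_PREFIXES) for module in imported_modules(tree)
    )
    if uses_enterprise and "CsvPipe" in called_names(tree):
        return ["CsvPipe(...) used with Enterprise imports; did you mean ECsvPipe(...)?"]
    return []


snippet = '''
from zinggEC.enterprise.common.epipes import *
inputPipe = CsvPipe("input", "examples/febrl/input.csv")
'''
print(mixed_pipe_warnings(snippet))
```

Switching the call to `ECsvPipe(...)`, as the suggested change does, makes the helper return an empty list. Parsing with `ast` rather than searching with a regex avoids false hits on `CsvPipe(` inside comments or string literals.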

Comment on lines +81 to +82
```python
inputPipe = CsvPipe("input", "examples/febrl/input.csv")
outputPipe = CsvPipe("output", "examples/febrl/transformed_output.csv")
```

Copilot AI Mar 31, 2026


The example constructs CsvPipe(...) even though it imports Enterprise pipes (zinggEC.enterprise.common.epipes), where the documented CSV pipe is ECsvPipe. This mismatch is likely to confuse users or fail at runtime; please switch to the correct pipe class for the chosen API.

Suggested change
```diff
-inputPipe = CsvPipe("input", "examples/febrl/input.csv")
-outputPipe = CsvPipe("output", "examples/febrl/transformed_output.csv")
+inputPipe = ECsvPipe("input", "examples/febrl/input.csv")
+outputPipe = ECsvPipe("output", "examples/febrl/transformed_output.csv")
```

Comment on lines +112 to +116
```json
"fieldName": "job_title",
"matchType": "fuzzy",
"dataType": "string",
"transformers": "STANDARDISE_jobtitles"
}
```

Copilot AI Mar 31, 2026


The JSON example uses a transformers key on fieldDefinition, but there are no other references to this config key in the repo and the existing Enterprise docs reference postProcessors for standardisation. Please confirm the actual JSON config key supported by this repo’s CLI and align the example accordingly (or document edition/version differences explicitly).
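Until the supported key is confirmed, a small check can make the mismatch visible when validating doc examples. This is a hypothetical sketch, not Zingg code: it parses one fieldDefinition entry with the standard `json` module and reports which of the two candidate keys named in this thread (`transformers` from the new doc, `postProcessors` from the existing Enterprise docs) it contains. Which key the CLI actually accepts is exactly the open question, so the helper only reports and does not decide.

```python
import json

# Candidate keys discussed in this review; which one the CLI accepts is unconfirmed.
CANDIDATE_KEYS = ("transformers", "postProcessors")


def standardisation_keys(field_definition_json: str) -> list[str]:
    """Return the candidate standardisation keys present in one fieldDefinition entry."""
    entry = json.loads(field_definition_json)
    return [key for key in CANDIDATE_KEYS if key in entry]


# The entry quoted above, completed into a standalone JSON object for parsing.
entry = '''
{
    "fieldName": "job_title",
    "matchType": "fuzzy",
    "dataType": "string",
    "transformers": "STANDARDISE_jobtitles"
}
'''
print(standardisation_keys(entry))
```

For this entry the helper returns `['transformers']`; an entry that uses the Enterprise-style key instead returns `['postProcessors']`.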

Comment on lines +148 to +150
````markdown
Run the Transform phase with the following command:

```bash
````

Copilot AI Mar 31, 2026


The CLI example uses --phase transform, but the CLI’s phase registry in this repo (ZinggOptions) does not include a transform phase, so this command will fail validation. Please update the docs to use a valid phase name for this codebase’s CLI, or clearly call out that transform is only available in a different edition/version.

Suggested change
```diff
-Run the Transform phase with the following command:
-```bash
+In editions/versions of Zingg that support a separate Transform phase, you can run it with the following command:
+```bash
+# Note: the 'transform' phase is not registered in this repository's CLI (ZinggOptions) and may only be available in other editions/versions.
```
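The failure described in this comment is a name lookup against the CLI's registered phases. The sketch below mimics that check with `shlex` and `argparse` from the Python standard library. It is not Zingg's validation code (that lives in `ZinggOptions`), the command line is only illustrative, and the phase set is a hypothetical subset supplied by the caller rather than the real registry.

```python
import argparse
import shlex


def phase_is_registered(command: str, registered_phases: set[str]) -> bool:
    """Parse a CLI command line and check its --phase value against a known set."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--phase", required=True)
    # Drop the script name; parse_known_args leaves other options (e.g. --conf) unparsed.
    args, _unknown = parser.parse_known_args(shlex.split(command)[1:])
    return args.phase in registered_phases


# Hypothetical subset of phase names; the authoritative list is the CLI's registry.
phases = {"findTrainingData", "label", "train", "match"}

print(phase_is_registered("./scripts/zingg.sh --phase transform --conf config.json", phases))
print(phase_is_registered("./scripts/zingg.sh --phase match --conf config.json", phases))
```

The first call prints `False` and the second `True`, which is the distinction the reviewer is pointing at: a doc command is only valid if its phase name appears in the registry of the edition it documents.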

