NIFI-15209 JoltTransformRecord should not only take schema from first record by sammu97 · Pull Request #10545 · apache/nifi

sammu97 · 2025-11-18T20:09:03Z

This PR fixes an unwanted behaviour in JoltTransformRecord where fields are omitted when a batch of records within the same FlowFile produces transformed outputs with more than one schema. The current implementation of the processor retrieves the schema of the FIRST transformed record and applies that schema to all remaining transformations. A new property is introduced for JoltTransformRecord, where users can either keep the existing behaviour or use the new PARTITION_BY_SCHEMA strategy, which splits the transformed records into separate FlowFiles, one per distinct schema.
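As a rough illustration of the problem described above, the following sketch models a record as a plain `Map` and a schema as a set of field names. It does not use the NiFi Record API or any code from this PR; the class `FirstRecordSchemaSketch` and both of its methods are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only: not the processor's actual implementation.
final class FirstRecordSchemaSketch {

    // Mimics a writer that fixes its schema from the first record and
    // silently drops any field of later records that the schema lacks.
    static List<Map<String, Object>> writeWithFirstRecordSchema(List<Map<String, Object>> records) {
        List<Map<String, Object>> written = new ArrayList<>();
        if (records.isEmpty()) {
            return written;
        }
        Set<String> schema = records.get(0).keySet();
        for (Map<String, Object> record : records) {
            Map<String, Object> projected = new LinkedHashMap<>();
            for (String field : schema) {
                if (record.containsKey(field)) {
                    projected.put(field, record.get(field));
                }
            }
            written.add(projected);
        }
        return written;
    }

    // Groups records by their own set of field names, so that each group
    // could be written with a schema matching all of its fields.
    static Map<Set<String>, List<Map<String, Object>>> partitionBySchema(List<Map<String, Object>> records) {
        Map<Set<String>, List<Map<String, Object>>> partitions = new LinkedHashMap<>();
        for (Map<String, Object> record : records) {
            Set<String> key = new TreeSet<>(record.keySet());
            partitions.computeIfAbsent(key, k -> new ArrayList<>()).add(record);
        }
        return partitions;
    }
}
```

With two records `{id}` and `{id, extra}`, the first method loses `extra` from the second record, while the second method yields two groups, which corresponds to the idea of one FlowFile per schema.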

Summary

NIFI-15209

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Apache NiFi Jira issue created

Pull Request Tracking

Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
Pull Request commit message starts with Apache NiFi Jira issue number, such as NIFI-00000

Pull Request Formatting

Pull Request based on current revision of the main branch
Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

Build completed using ./mvnw clean install -P contrib-check
- JDK 21
- JDK 25

Licensing

New dependencies are compatible with the Apache License 2.0 according to the License Policy
New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

Documentation formatting appears as expected in rendered files

exceptionfactory

Thanks for working on this issue @sammu97. As a general note, it seems like it would be cleaner to avoid moving all of the test schema and JSON files to a new directory, in order to focus on the actual changes proposed. Can that be adjusted?

sammu97 · 2025-11-19T09:26:28Z

Yes, sure @exceptionfactory, I will handle this as soon as I can.

Also, I'm seeing that some checks are failing on code checkout. Is this due to the Cloudflare outage?

exceptionfactory · 2025-11-19T09:29:33Z

Yes, they were due to the outage. I have restarted the checks.

sammu97 · 2025-11-19T12:56:42Z

Looks like the build failed on some of the OSs; I suspect a file ordering issue. Will investigate and update the PR accordingly.

sammu97 · 2025-11-23T14:48:32Z

@exceptionfactory Had to make some fixes for Windows as the checks are usually omitted. However, any idea about the error for the Mac tests?

The template is not valid. .github/workflows/ci-workflow.yml (Line: 224, Col: 16): hashFiles('**/package-lock.json') failed. Fail to hash files under directory '/Users/runner/work/nifi/nifi'

ChrisSamo632 · 2025-11-24T08:31:41Z

@sammu97 the node cache issue in the build appears to have been an intermittent problem over the weekend. I spotted other PRs with similar errors, but things seem to be working again this morning. I've restarted the failed job on your PR and so far things look happier 🤞

sammu97 · 2025-11-24T08:42:18Z

@ChrisSamo632 Yep, seems like it's already past the step that was failing. Thanks!

sammu97 · 2025-11-24T12:54:02Z

@exceptionfactory Just a small note too. I've also amended the logic for the testNoRecords() test: with my change, if the Jolt transform has no records to process, no FlowFile is produced, since in my opinion there is nothing to write. Not sure what you think about this; should I keep the old logic?

sammu97 · 2026-01-09T18:11:21Z

Hey @exceptionfactory, I've resolved some conflicts due to the PR being stale, and also made some fixes relating to issues where records with the same schema were still being partitioned due to field ordering. Let me know what you think about the change, please.
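The field-ordering pitfall mentioned above can be shown with a small sketch. The PR's actual normalization logic is not reproduced here; `SchemaKeySketch` and its methods are hypothetical, and sorting field names is only one assumed way of making a schema key order-insensitive.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only: not the normalization code from this PR.
final class SchemaKeySketch {

    // Order-sensitive key: field names exactly as encountered.
    // Two records with the same fields in a different order get different keys.
    static List<String> orderedKey(List<String> fieldNames) {
        return new ArrayList<>(fieldNames);
    }

    // Normalized key: field names sorted, so ordering no longer matters.
    static List<String> normalizedKey(List<String> fieldNames) {
        List<String> sorted = new ArrayList<>(fieldNames);
        Collections.sort(sorted);
        return sorted;
    }
}
```

Using the ordered key, `[id, name]` and `[name, id]` would land in separate partitions; using the normalized key, they share one.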

Fix JoltTransformRecord partitioning and normalization

sammu97 · 2026-03-09T10:03:48Z

Hi @exceptionfactory , was wondering if you could take a look at this PR when possible please? Would love to have this fixed.

exceptionfactory

@sammu97 the current state of the pull request includes a number of unnecessary formatting changes, which need to be reverted.

On a general review, the set of changes introduces significant complexity, raising questions about the viability of the approach.

One general question before proceeding: is the intended use of this Processor with JSON, or some other format? If it is with JSON, then it may be better to consider a new Processor that deals directly with JSON, instead of the Record API.

sammu97 · 2026-03-11T08:45:24Z

Hi @exceptionfactory, you're right about the formatting. Will revert any unnecessary ones, apologies.

Regarding its intended use, we are specifically using JoltTransformRecord to transform NDJSON content. I agree the solution turned out a bit complex, though I am not sure about the alternatives, so if you have any ideas please go ahead. My primary concern here is that this issue is not easily noticeable and can do a lot of damage before it is noticed when making use of this processor.

exceptionfactory · 2026-03-11T13:46:29Z

Thanks for the reply @sammu97.

The source data being NDJSON is an important detail and may lead to a different solution.

Right now, the JoltTransformJSON Processor works with either the entire FlowFile, or with Attributes, neither of which align with NDJSON. However, if that Processor were changed to support handling NDJSON, applying the configured transform to each line of NDJSON, it sounds like that might be a better fit, avoiding any kind of schema inference issues. I might be able to put something together if that seems like a potential solution.
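A minimal sketch of the per-line idea, assuming the transform can be treated as a function from one JSON string to another. This is not the JoltTransformJSON implementation; `NdjsonLineTransformSketch` and `transformLines` are hypothetical names, and the `UnaryOperator<String>` stands in for whatever Jolt transform would be configured.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical illustration only: not the processor's actual implementation.
final class NdjsonLineTransformSketch {

    // Applies the transform to each non-blank line independently,
    // so no shared schema is ever inferred across lines.
    static String transformLines(String ndjson, UnaryOperator<String> transform) {
        List<String> output = new ArrayList<>();
        for (String line : ndjson.split("\\R")) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines between documents
            }
            output.add(transform.apply(line));
        }
        return String.join("\n", output);
    }
}
```

Because each line is handled on its own, a line with extra fields cannot be truncated to fit the shape of an earlier line.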

sammu97 · 2026-03-11T23:03:40Z

@exceptionfactory That solution works for me to be honest. Just one thing if we go this route though, I do still believe that this behaviour with the JoltTransformRecord should be documented somewhere, so that users are aware of it. What do you think?

dan-s1 · 2026-03-12T21:22:33Z

Correct me if I am wrong, but this or a similar conversation seems to have taken place on NIFI-14309.

exceptionfactory · 2026-03-13T02:17:15Z

@dan-s1, yes, this relates to the discussion on NIFI-14309, although that issue refers to the JSLT Processor, not the Jolt Processor. This discussion on NDJSON handling highlights a use case gap with record-oriented processing, so it seems best to introduce something more specific in this case for Jolt.

exceptionfactory · 2026-03-13T02:21:33Z

Thanks for the confirmation, and for your work in this pull request @sammu97. Yes, I agree it would be helpful to document the current behavior of JoltTransformRecord as it stands. The documentation for the Record Reader property mentions something about the record schema, so updating that property documentation in a separate pull request seems reasonable.

To address the primary goal, I submitted pull request #11001 for NIFI-15712 adding support for JSON Lines/NDJSON to the JoltTransformJSON Processor. After some minor refactoring, it was a straightforward addition, which sounds like it should fit the use case of widely varying JSON elements, avoiding the record schema challenges.

sammu97 · 2026-03-13T07:17:42Z

Awesome, thanks @exceptionfactory for the solution! Looking forward to having this deployed

sammu97 marked this pull request as draft November 18, 2025 20:09

exceptionfactory reviewed Nov 19, 2025

sammu97 marked this pull request as ready for review November 23, 2025 16:01

sammu97 requested a review from exceptionfactory December 3, 2025 14:23

sammu97 marked this pull request as draft January 9, 2026 15:18

sammu97 marked this pull request as ready for review January 9, 2026 18:10

sammu97 force-pushed the NIFI-15209 branch from 6121767 to 6dec2e0 March 7, 2026 20:53

NIFI-15209

9ab5ff4

Fix JoltTransformRecord partitioning and normalization

sammu97 force-pushed the NIFI-15209 branch from 6dec2e0 to 9ab5ff4 March 7, 2026 20:58

Checkstyle fixes

8c53159

exceptionfactory requested changes Mar 10, 2026

exceptionfactory closed this Mar 13, 2026
