-
Notifications
You must be signed in to change notification settings - Fork 1.2k
[WIP] Json decoder factory v2 #9272
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
scovich
wants to merge
1
commit into
apache:main
Choose a base branch
from
scovich:json-decoder-factory-v2
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the key part that might be deemed controversial. Is Tape really a good thing to expose publicly? It's been a while since I wrote it, but I remember it not being especially friendly as an API, and something that stands a good chance of being changed in future - e.g. to avoid copying strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question, and I don't remember seeing any discussion on the original PR I'm building on here:
Is there any way to allow users to customize parsing without exposing something? Other options might include:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree if we want to allow users to override decoding behavior we are going to have to given them direct access to Tape / Tape Element - I don't really see any way around it
@tustvold -- what strings are you referring to? I don't see any strings copied here:
https://github.com/apache/arrow-rs/blob/7e5076f1f775a6fd08a4d63389e26e2920fe3f6a/arrow-json/src/reader/tape.rs#L34-L33
arrow-rs/arrow-json/src/reader/tape.rs
Lines 96 to 101 in 7e5076f
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There were some discussions a while back about the way that it copies all strings from the source data being unfortunate. I'm not sure how this is avoidable with the push-based decoder interface, but it has been discussed.
TBC I am not against making Tape public, but it likely needs some TLC prior to that to ensure it is usable and vaguely future-proof. Even basic things like adding non_exhaustive, hiding methods like this that are a bit odd to expose, etc...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If someone could come up with a list of changes that would make us comfortable making Tape public, I can try to put up a PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@scovich is this something you can help drive? I think it is important but I really just don't have the bandwidth to give it the attention it deserves
From my perspective, the Arrow crates now permit making breaking API changes every 3 months (major releases) so if we expose a structure and then decide to make breaking changes to it, that isn't impossible
Thus in my mind, I think we should get something out and working
I don't suspect we are not likely to see changes to this interface unless we expose it and there are new use cases put forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I won't have time in the next couple of weeks, sadly. But yes I do want to see this through.
I do tend to agree we should get something out once it's reasonable, but we already know of several use cases and it's not obvious to me that even those known use cases are well-served by the current approach (speaking as somebody who wants to actually use whatever we come up with).
So: If we have something that works well for the use cases we've thought of, it's probably Good Enough and can evolve a quarter or two later as we discover more use cases or warts.
Is that a reasonable go/no-go criteria?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is my personal preferred approach -- let's get it out rather than waiting on something that is perfect