feat: iterate over more than one prefix (#109)#127
feat: iterate over more than one prefix (#109)#127wodny wants to merge 6 commits intoICRAR:masterfrom
Conversation
prepare to create prefixed_items_basecoro using the same code
keep set for prefixed_items
Reviewer's Guide by SourceryThis PR adds support for iterating over multiple prefixes in ijson by introducing a new Class diagram for the new prefixed_items functionalityclassDiagram
class ItemsCommonBasecoro {
+builder_t builder
+int pending_builder_reset
+PyObject *target_send
+PyObject *prefix
+int object_depth
+yajl2_state *module_state
}
class PrefixedItemsBasecoro {
+PyObject *target_send
+PyObject *prefix
+int object_depth
+int pending_builder_reset
+builder_t builder
+yajl2_state *module_state
+PyObject* prefixed_items_basecoro_send_impl(PyObject *self, PyObject *path, PyObject *event, PyObject *value)
}
class ItemsBasecoro {
+PyObject *target_send
+PyObject *prefix
+int object_depth
+int pending_builder_reset
+builder_t builder
+yajl2_state *module_state
}
ItemsCommonBasecoro <|-- PrefixedItemsBasecoro
ItemsCommonBasecoro <|-- ItemsBasecoro
PrefixedItemsBasecoro <|-- PrefixedItemsGen
PrefixedItemsBasecoro <|-- PrefixedItemsAsync
class PrefixedItemsGen {
+reading_generator_t reading_gen
}
class PrefixedItemsAsync {
+async_reading_generator *reading_generator
}
File-Level Changes
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey @wodny - I've reviewed your changes and they look great!
Here's what I looked at during the review
- 🟡 General issues: 1 issue found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟡 Complexity: 1 issue found
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| N_N(retval = items_common_send_top(self, path, event, value)); | ||
| if (retval == Py_None) | ||
| Py_RETURN_NONE; | ||
| PyObject *tuple = PyTuple_Pack(2, path, retval); |
There was a problem hiding this comment.
suggestion (bug_risk): Add NULL check after PyTuple_Pack
PyTuple_Pack can return NULL on error. Should check the return value and handle the error case appropriately.
| PyObject *tuple = PyTuple_Pack(2, path, retval); | |
| PyObject *tuple = PyTuple_Pack(2, path, retval); | |
| if (tuple == NULL) { | |
| return NULL; | |
| } |
There was a problem hiding this comment.
This is correct, but it is not the only place with lacking checks (kvitems_basecoro.c, parse_basecoro.c). Probably a separate commit correcting all of them at once would be beneficial.
There was a problem hiding this comment.
I have created a GitHub issue for your comment: #128
| ) | ||
|
|
||
|
|
||
| def _prefixed_items_pipeline(backend, prefix, map_type, config): |
There was a problem hiding this comment.
issue (complexity): Consider consolidating the parallel implementations into a single core prefixed pipeline with wrappers for non-prefixed functionality
The parallel implementations for prefixed and non-prefixed items create unnecessary complexity. We can simplify by making the prefixed version the core implementation and removing duplicate pipeline definitions. Here's how:
def _prefixed_items_pipeline(backend, prefix, map_type, config):
return (
(backend['prefixed_items_basecoro'], (prefix,), {'map_type': map_type}),
(backend['parse_basecoro'], [], {}),
(backend['basic_parse_basecoro'], [], config)
)
def _items_pipeline(backend, prefix, map_type, config):
# Reuse prefixed pipeline with wrapper to strip prefix
@utils.coroutine
def strip_prefix(target):
while True:
prefix, value = (yield)
target.send(value)
return _prefixed_items_pipeline(backend, prefix, map_type, config)[:-1] + \
((strip_prefix, [], {}),)This approach:
- Makes prefixed_items_pipeline the core implementation
- Removes duplicate pipeline logic
- Maintains all functionality while reducing code complexity
- Keeps the separation between prefixed and non-prefixed interfaces
The same pattern can be applied to the generator and async implementations.
rtobar
left a comment
There was a problem hiding this comment.
@wodny as promised in #109, I took some time to come back to this. Once again, thanks a lot! I was now able to read your description, and have a cursory look at the changes. I'll kick CI now to run, which might flag any potential obvious issues. In the meantime I have only some minor comments for things I saw on passing. From what I saw, I think the solution looks good, although I do wonder whether there's more code we can unify -- that can be done later though.
I missed seeing some tests. The test_items.py test suite should be a good starting point, and you'll probably want to add some tests that are specific to the new functionality. There's also a test that should be added to test_misc.py for the new API entry point.
You said you benchmarked this, so presumably you have some changes for benchmark.py too? And hopefully for dump.py as well -- both of which also have respective tests to be added. All of these should be fairly straightforward though. If you don't feel like adding them, I can do it after the fact. The tests are a must though, otherwise we can't really be sure things are working as they should.
| items = backend.items | ||
| items_coro = backend.items_coro | ||
| prefixed_items = backend.prefixed_items | ||
| prefixed_items_coro = backend.prefixed_items_coro |
| PyObject *retval; | ||
| N_N(retval = items_common_send_top(self, path, event, value)); | ||
| if (retval == Py_None) | ||
| Py_RETURN_NONE; |
There was a problem hiding this comment.
I think this is incrementing a reference to None unnecessarily, you could just return retval
| parse_coro = backend.parse_coro | ||
| items = backend.items | ||
| items_coro = backend.items_coro | ||
| prefixed_items = backend.prefixed_items |
There was a problem hiding this comment.
This is a good place as any to discuss the name. I'm not convinced by "prefixed items" doesn't convey the fact that now you can specify multiple prefixes. I originally thought of "multi_items" or something along those lines, to make it clearer that there was no restriction on the number of prefixes.
|
@wodny while you're here, could you rebase on top of the latest master? I updated the CI infra in the meanwhile and macos runners should be running correctly now |
I prepared some proposals to check if the way I tackle the problem from #109 is acceptable.
I implemented the changes in Python and
yajl2_cbackends.The commit stack contains two variants:
items(),items()and the newprefixed_items(), but has a drawback of performing 10% worse than the original.I benchmarked the code both using defaults and some input I parse in production which is a map of very long lists of maps for which using
items()is essential.Both versions use
items_common.hwhich containsstatic inlinefunctions used byitems()and the newprefixed_items(). An analogy to thebuilder.hfile. The code is all that I was able to move fromitems_basecoro.c. Because the code is shared, in the non-set-items()variant a macro is used to define the comparison method (PyObject_RichCompareBoolvsPySet_Contains). The shared function is divided into top and bottom halves so the calling side can do a specializedCORO_SENDin between and avoid mangling a generic tuple. Division into halves required one additional state information -pending_builder_reset.Before I did the
items_common.hI tried extending pipelines but the built-in tests changed my mind. All elements of the pipelines have input and output formats already defined and eg. the items basecoro is tested directly with events as inputs.I think I managed to avoid additional tuple creation/destruction with the exception of the pure Python code where I assumed it won't matter. I can always write two specialized functions.
I extended the list of
ifs inparse_basecoro.cto detect the newPrefixedItemsBasecoro:Everything else is just copying
items*.{c,h}as a template and adding theprefixedprefix ;) I made it in a separate commit ("add prefixed_items") because it's the biggest one and I didn't want to mix it with the actual changes.All tests (including memleaks) pass without modification.
I did not touch the
kvitems()part yet nor did I write any tests forprefixed_items()in case you decide the changes should be done in a completely different manner.Summary by Sourcery
Add support for iterating over multiple prefixes in the ijson library, enhancing data parsing capabilities. This includes new coroutine and generator implementations for prefixed_items, with shared code to maintain efficiency.
New Features:
Enhancements: