Improved fast serializer code generation logic for Enum type#54
Improved fast serializer code generation logic for Enum type#54volauvent merged 2 commits intolinkedin:masterfrom
Conversation
|
Cool result! And how does it compare with vanilla Avro now? Is it equal or better? Can we get a diff of the generated code? Actually, I'm wondering... should we make the build trigger the code-gen for a set of schemas we care about, and check it under source control (not in the main resource, but perhaps in tests or in some kind of |
|
Fast-avro 1.4 serialization speed is ~10% faster than vanilla 1.4 now for Enum array. Below is the benchmark result of vanilla avro: Here are fast-serializers of EnumArray schema:
Like the idea of source control of code-gen to review related changes more carefully and properly. |
|
Thanks for the gist... the change looks good. It looks quite weird to have a map lookup/set in this code. I wonder why it was done like this in the first place, and if that's a hint that there may be edge cases where this is important (I can't think of any, though). Also, you'll notice that there is yet another map lookup hidden in the |
FelixGV
left a comment
There was a problem hiding this comment.
Looks good, but I'd like to suggest some readability / code-style improvement while we're in this section of the code, if you don't mind...
|
|
||
| codeModel.ref(Schema.class).staticInvoke("parse").arg(enumSchema.toString())); | ||
| enumSchemaVarMap.put(enumSchemaFingerprint, enumSchemaVar); | ||
| valueToWrite = JExpr.invoke(enumSchemaVar, "getEnumOrdinal").arg(enumValueCasted.invoke("toString")); |
There was a problem hiding this comment.
I find this code to be a bit spaghetti... could we instead follow a pattern of
JVar enumSchemaVar = enumSchemaVarMap.computeIfAbsent(enumSchemaFingerprint, () ->
serializerClass.field(
JMod.PRIVATE | JMod.FINAL,
Schema.class,
getVariableName(enumSchema.getName() + "EnumSchema"),
codeModel.ref(Schema.class).staticInvoke("parse").arg(enumSchema.toString()))
);
valueToWrite = JExpr.invoke(enumSchemaVar, "getEnumOrdinal").arg(enumValueCasted.invoke("toString"));
... or something like that? Otherwise, the JExpr.invoke part duplicated twice, and since it is meta-code, that makes things extra confusing :D ...
There was a problem hiding this comment.
Thanks for the suggestion. Fixed.
There was a problem hiding this comment.
Beautiful! Love the red tint :D !
27c2c74 to
373916b
Compare
Support Avro 1.4 Enum type serialization here was initially introduced by @gaojieliu . Do you have any concern? |
Improved fast serialization speed by avoiding the cost of storing and retrieving Enum schemas in a Hashmap in Avro 1.4
373916b to
14b17a3
Compare
FelixGV
left a comment
There was a problem hiding this comment.
Thanks a lot Bingfeng! Looks great!
| implements FastSerializer<IndexedRecord> | ||
| { | ||
|
|
||
| private Map<Long, Schema> enumSchemaMap = new ConcurrentHashMap<Long, Schema>(); |
There was a problem hiding this comment.
Heh... deleting an unused variable... that sounds like the right call!
Improved fast serialization speed by avoiding the cost of storing and retrieving Enum schemas
in a Hashmap in Avro 1.4. We have seen ~44% improvement of serialization latency for Enum array.
This PR helps to resolve issue #50
JMH benchmark results of fast serialization time of an 200 elements EnumArray under Avro 1.4
Before
After