
MAX_COLUMN_BYTES is applied by default only if no annotation is … #59

Open
jakubZielAdform wants to merge 3 commits into master from get-rid-of-max-column-bytes-default-limit-for-string-fields-in-generated-encoders

@jakubZielAdform

Collaborator

…applied on a field
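The rule in the title can be sketched in a few lines. This is a minimal illustration, not the PR's code: `ColumnLimit`, `maxBytesFor`, and the `65000` value are hypothetical placeholders (the library's actual MAX_COLUMN_BYTES value is not shown here). An explicit length annotation is used as-is, and MAX_COLUMN_BYTES only fills in when the field has none.

```scala
object ColumnLimit {
  // Placeholder for illustration only; not the library's actual MAX_COLUMN_BYTES.
  val MaxColumnBytes: Int = 65000

  // Hypothetical helper: an annotated length wins unchanged (no cap applied);
  // the default limit is used only when the field carries no length annotation.
  def maxBytesFor(annotatedLength: Option[Int]): Int =
    annotatedLength.getOrElse(MaxColumnBytes)
}
```

With this rule, a field annotated with a length above the default keeps its annotated length instead of being capped.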

jakubZielAdform force-pushed the get-rid-of-max-column-bytes-default-limit-for-string-fields-in-generated-encoders branch 2 times, most recently from 5742e55 to d941d85 on July 2, 2026 10:06
jakubZielAdform force-pushed the get-rid-of-max-column-bytes-default-limit-for-string-fields-in-generated-encoders branch from d941d85 to 91a07a1 on July 2, 2026 10:07
Column(q"false", q"${fl.length}", q"pw.writeFixedString(r, ${fl.length}, ${fl.truncate})")
case ml: MaxLength =>
Column(q"false", q"-1", q"pw.writeVarString(r, ${ml.length}, ${ml.truncate})")
}.get

Contributor


If a field has a non-matching annotation, this .get will throw an exception. Can we use getOrElse with abort instead, so that we get a clear message rather than None.get?
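The excerpt does not show what `.get` is called on. Assuming it is the `Option` returned by a `collectFirst` over the field's annotations, the suggestion could look roughly like the sketch below. The types are simplified, hypothetical stand-ins (plain strings instead of quasiquote trees), and the `abort` parameter plays the role of the macro context's `c.abort(c.enclosingPosition, msg)`, which returns `Nothing` and so type-checks inside `getOrElse`.

```scala
// Hypothetical, simplified stand-ins for the generator's annotation and column types.
// The real code builds quasiquote Trees (q"..."); plain strings are used here.
final case class FixedLength(length: Int, truncate: Boolean)
final case class MaxLength(length: Int, truncate: Boolean)
final case class Column(nullable: String, fixedLength: String, writer: String)

object StringColumns {
  // `abort` stands in for c.abort(c.enclosingPosition, msg) in the macro:
  // it returns Nothing, so getOrElse(abort(...)) still yields a Column.
  def columnFor(fieldName: String, annotations: List[Any], abort: String => Nothing): Column =
    annotations
      .collectFirst {
        case fl: FixedLength =>
          Column("false", s"${fl.length}", s"pw.writeFixedString(r, ${fl.length}, ${fl.truncate})")
        case ml: MaxLength =>
          Column("false", "-1", s"pw.writeVarString(r, ${ml.length}, ${ml.truncate})")
      }
      // Fail with a message naming the field instead of an opaque None.get.
      .getOrElse(abort(s"Field '$fieldName' has no supported length annotation (expected FixedLength or MaxLength)"))
}
```

Inside the macro the error would then surface as a compile error pointing at the offending case class, rather than a NoSuchElementException thrown during expansion.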

case fl: FixedLength =>
Column(q"false", q"${fl.length}", q"pw.writeFixedString(r, ${fl.length}, ${fl.truncate})")
case ml: MaxLength =>
Column(q"false", q"-1", q"pw.writeVarString(r, ${ml.length}, ${ml.truncate})")

Contributor


What would the maximum allowed length of the field be?

Collaborator Author


imo, there should not be any max length when a user adds an annotation.
The case class processed here is supposed to be a mapping of the DB table and the ultimate source of truth.

Tbh, I think silent truncation is also a bad option; I suppose it is done because a single 'too big' record would fail loading of the entire file.

I think it would be nice to consider reading the schema at the beginning, treating it as the source of truth, and based on it removing 'too big' records from a batch. Of course, that would have to happen at runtime, not in preprocessing.

wdyt?
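The runtime idea above could be sketched as a partition of each batch against the table schema. This is a minimal sketch under stated assumptions: the schema is reduced to a map from column name to maximum size in bytes, records are maps of string values, sizes are measured as UTF-8 bytes, and `BatchFilter`, `fits`, and `split` are hypothetical names, not part of the project.

```scala
import java.nio.charset.StandardCharsets

object BatchFilter {
  // Hypothetical schema shape: column name -> max size in bytes,
  // as it might be read from the target table at the start of a load.
  type Schema = Map[String, Int]
  // Hypothetical record shape: column name -> string value.
  type Record = Map[String, String]

  // Assumes the target measures string sizes in UTF-8 bytes.
  private def byteLength(s: String): Int = s.getBytes(StandardCharsets.UTF_8).length

  /** True if every column fits the size declared in the schema.
    * Columns absent from the schema are treated as unbounded. */
  def fits(schema: Schema, record: Record): Boolean =
    record.forall { case (column, value) =>
      schema.get(column).forall(limit => byteLength(value) <= limit)
    }

  /** Splits a batch into (records to load, oversized records to reject),
    * so one oversized record does not fail the whole file. */
  def split(schema: Schema, batch: Seq[Record]): (Seq[Record], Seq[Record]) =
    batch.partition(fits(schema, _))
}
```

Rejected records could then be logged or routed elsewhere instead of being silently truncated.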

jakubZielAdform force-pushed the get-rid-of-max-column-bytes-default-limit-for-string-fields-in-generated-encoders branch 2 times, most recently from f4513fb to 76bd37a on July 3, 2026 13:51
jakubZielAdform force-pushed the get-rid-of-max-column-bytes-default-limit-for-string-fields-in-generated-encoders branch 2 times, most recently from 813c9bd to f53b5ac on July 4, 2026 22:59
s"KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://${dockerNetwork.ip}:$kafkaPort",
s"KAFKA_CONTROLLER_QUORUM_VOTERS=1@127.0.0.1:$kafkaControllerPort",
s"KAFKA_LOG_RETENTION_HOURS=${Int.MaxValue}",
s"KAFKA_MESSAGE_MAX_BYTES=${32_000_000}",

Collaborator Author


Allow bigger message sizes on the Kafka broker (this will be usable once the test classes are exposed via #60).
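Assuming the image maps KAFKA_MESSAGE_MAX_BYTES to the broker's message.max.bytes (as the common Kafka Docker images do), raising it alone may not be enough for test code to send such records: the Java producer also rejects records above its own max.request.size, which defaults to 1048576 bytes. A minimal sketch of matching producer settings follows; `LargeMessageProducerConfig` and `props` are hypothetical names, and only standard kafka-clients property keys are used.

```scala
import java.util.Properties

object LargeMessageProducerConfig {
  // Kept equal to KAFKA_MESSAGE_MAX_BYTES in the test broker's environment.
  val MaxMessageBytes: Int = 32_000_000

  // Hypothetical helper for test code. Without raising max.request.size
  // (default 1048576 bytes), the producer rejects larger records client-side,
  // regardless of the broker's message.max.bytes.
  def props(bootstrapServers: String): Properties = {
    val p = new Properties()
    p.setProperty("bootstrap.servers", bootstrapServers)
    p.setProperty("max.request.size", MaxMessageBytes.toString)
    p
  }
}
```

Serializers and other producer settings would still need to be added before passing these properties to a KafkaProducer.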

jakubZielAdform force-pushed the get-rid-of-max-column-bytes-default-limit-for-string-fields-in-generated-encoders branch from f53b5ac to 58108ec on July 4, 2026 23:26
2 participants