Fix unexpected EOF on log upload by akselaase · Pull Request #1978 · mendersoftware/mender

akselaase · 2026-06-23T11:07:09Z

Upon upgrading to Mender 5.1.0, we noticed that deployment log uploads would fail with a HTTP 400 "unexpected EOF" errors. Turns out this was an interaction with a patch we applied to the client and a latent bug in the upload retry code. Our patch simply logged a "Connected to ..." line with the resolved address of the remote. This however, caused the deployment log to grow in between the calculation of Content-Length and the actual log body streaming, causing a mismatch between declared and actual length. This caused server-side truncation of the uploaded JSON body, and thus JSON parsing failed with "unexpected EOF".

The fix is to avoid re-sanitizing the log file between content length calculation and body upload, such that the uploaded contents match the declared content length. If the upload fails, the client FSM logic will retry the whole task later (including re-sanitizing and calculating Content-Length).

Even though our patch exposed this issue, I think it's reachable on stock Mender as well by enabling debug logging, as there are some debug log statements in the same window IIRC.

Note that I initially developed this fix against 5.1.0, and have successfully tested it on our devices. When I rebased it on 5.1.x for upstreaming there were some conflicts with the large-log-file truncation logic (which necessitates the first commit). I have not tested these changes on top of 5.1.x in the field.

FileReader::Rewind() refused to rewind once the stream had reached EOF because the eofbit/failbit left by the previous read made the seek fail. Clear the stream state and seek to the configured start offset instead, so a reader created with a non-zero offset can be re-read from its start after reaching EOF. Ticket: None Changelog: None Signed-off-by: Aksel Lunde Aase <aksel.aase@disruptive-technologies.com>

Calling SanitizeLogs() as part of Rewind would invalidate the previous Content-Length calculation if the log has grown in the meantime. This would cause the server to reject the log upload with HTTP 400: unexpected EOF. The log file can grow between computing the Content-Length and streaming the body, because the deployment keeps logging to it while the upload is in flight. Ticket: None Changelog: Fix deployment log uploads being rejected in some cases. Signed-off-by: Aksel Lunde Aase <aksel.aase@disruptive-technologies.com>

mender-test-bot · 2026-06-23T11:07:20Z

Merging these commits will result in the following changelog entries:

Changelogs

mender (aksel.aase/log-upload-unexpected-eof)

New changes in mender since 5.1.x:

Bug Fixes

Fix deployment log uploads being rejected in some cases.

vpodzime

First of all -- thanks for the potential improvement!

I think this is a good change. But it made me think about this once again (last time was when I worked on the logs sanitation and later trimming). What is the point of recording deployment logs at the point where it's no longer possible to upload them? With this change any extra logs appended to the deployment logs file after starting the upload wont' be uploaded, right? And without it, they would be attempted to be uploaded with an unavoidable failure. So what is the point in recording them in the first place? I think we need to either:

call FinishDeploymentLogging() earlier to make sure deployment logs are finished/closed before being uploaded, or
make sure the total size is recalculated (and re-sent?) on a rewind.

akselaase · 2026-06-23T14:07:16Z

At least for us, continuing to log to the deployment log is valuable in at least two cases:

The deployment log upload fails temporarily (connectivity issues etc.,). The failures will be logged and uploaded on the later retries (re-sanitized).
The upload fails for some more permanent reason. They are still stored on the device, and we can later use mender-connect to inspect them on the device.

So its the "won't be uploaded" part that's inaccurate. There's still a chance they'll be uploaded (later retries), or at least accessed (through mender-connect or other means).

vpodzime · 2026-06-24T10:16:28Z

So its the "won't be uploaded" part that's inaccurate.

I see. Then I think the focus should be on how to make sure they are fully uploaded at the first possible attempt. To me that means we should make the Rewind() work with the original intention and fix the size mismatch.

akselaase · 2026-06-24T16:06:37Z

I'm not sure I get what you mean. Isn't this PR already doing that?

By fixing the size mismatch we either have to make the request body fit the declared content length (this PR), or make the content length fit the request body after its re-sanitized (when headers are already sent). The latter seems much more difficult, and I don't understand why you'd want to sanitize twice per upload anyway (once for calculating content length, then again when streaming the body).

It seems much more robust to just sanitize once when preparing the request, then use Rewind to seek to the correct offset in the sanitized file (without sanitizing again).

Please let me know if I'm missing something here.

vpodzime · 2026-06-25T07:58:33Z

The latter seems much more difficult

Absolutely. But isn't it the only way to make sure full deployment logs are uploaded? The way I understand it is that the Rewind() only happens in case of specific failures. Failures that should append messages to the deployment logs. So then we want these messages to be part of the deployment logs uploaded to the server. Which means that we need to re-sanitize the logs and sent a new Content-length header (perhaps by doing a completely new HTTP request if that's the only way).

Without that (and I believe that's what this PR is doing), incomplete deployment logs would be uploaded, masking the failure(s) from failed attempts. With complete deployment logs being available on the device. So this is an improvement for sure, I'm just trying to see (and suggest) if we could go one step further and implement the above behavior with complete deployment logs being uploaded.

akselaase · 2026-06-25T08:20:38Z

The way I understand it is that the Rewind() only happens in case of specific failures. Failures that should append messages to the deployment logs.

I think this is the misunderstanding. Rewind() happens unconditionally on each upload attempt, without any failure triggering it. I agree that any failing attempt should result in a new attempt including the failure logs, but that's not the worry here. Here's an excerpt of PushLogs on v5.1.0:

error::Error DeploymentClient::PushLogs(
	const string &deployment_id,
	const string &log_file_path,
	api::Client &client,
	LogsAPIResponseHandler api_handler) {
	auto logs_reader = make_shared<JsonLogMessagesReader>(log_file_path);
	auto err = logs_reader->SanitizeLogs(); // <-- (1) First call to SanitizeLogs
	if (err != error::NoError) {
		return err;
	}

	auto req = make_shared<api::APIRequest>();
	req->SetPath(http::JoinUrl(deployments_uri_prefix, deployment_id, logs_uri_suffix));
	req->SetMethod(http::Method::PUT);
	req->SetHeader("Content-Type", "application/json");
	req->SetHeader("Content-Length", to_string(logs_reader->TotalDataSize())); // <-- (2) Header length calculation
	req->SetHeader("Accept", "application/json");
	req->SetBodyGenerator([logs_reader]() {
		logs_reader->Rewind(); // <-- (3) Unconditional call to Rewind and second call to SanitizeLogs
                // If logs have been written to after (1) and before (3), which our patches do,
                // this will result in a mismatch of body size and content length.
                // As documented by SetBodyGenerator:
                //    "Make sure that the Content-Length set in headers matches the length of the body."
                // This means that we must not re-sanitize the log at this point, as that risks breaking that contract.
                // Therefore, remove SanitizeLogs from Rewind, and turn it into a pure seek to the proper offset.
		return logs_reader;
	});

Rewind is simply used as part of ordinary body streaming. Its wrong for it to re-sanitize logs at this point, because that is done just before. If this upload fails for any reason, the state machinery will schedule another attempt at PushLogs later, which will re-do the sanitation as desired.

Edit:

Note that we do not care that the log statements we do between (1) and (3) do not get included in the initial upload. They are only useful in case of failures, and in that case the upload will be retried later, including the relevant log entries.

vpodzime · 2026-06-25T09:05:38Z

I think this is the misunderstanding. Rewind() happens unconditionally on each upload attempt, without any failure triggering it.

🎯 ❗ This is it! And it's stupid! (I can say so because I wrote that code myself)

The Rewind() is required there because the body generator can be called multiple times by the HTTP code and it is expected to generate the same content repeatedly. And there are two issues:

it should do nothing on the first run of the generator (because the reader is properly initialized)
it should always generate the same content

Your change fixes the latter and makes the former issue harmless. But we should fix the former issue properly as well.

Thanks for your patience and keeping with me! ❤️

akselaase · 2026-06-25T09:11:31Z

No worries! Feel free to make edits to the branch if you had any specific changes in mind. I'll admit I'm a bit uneasy juggling all these offsets around to ensure we seek to the right place, and I believe you might have a better mental map of how that works. As far as I can tell the first bullet point is satisfied as well.

vpodzime · 2026-06-26T11:48:19Z

Hehe, understandable! I'll get (back) to this next week.

akselaase added 2 commits June 23, 2026 12:42

danielskinstad added the external contribution label Jun 23, 2026

danielskinstad requested review from michalkopczan and vpodzime June 23, 2026 11:16

vpodzime reviewed Jun 23, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix unexpected EOF on log upload#1978

Fix unexpected EOF on log upload#1978
akselaase wants to merge 2 commits into
mendersoftware:5.1.xfrom
akselaase:aksel.aase/log-upload-unexpected-eof

akselaase commented Jun 23, 2026

Uh oh!

mender-test-bot commented Jun 23, 2026

Uh oh!

vpodzime left a comment •

edited

Loading

Uh oh!

akselaase commented Jun 23, 2026

Uh oh!

vpodzime commented Jun 24, 2026

Uh oh!

akselaase commented Jun 24, 2026

Uh oh!

vpodzime commented Jun 25, 2026

Uh oh!

akselaase commented Jun 25, 2026 •

edited

Loading

Uh oh!

vpodzime commented Jun 25, 2026

Uh oh!

akselaase commented Jun 25, 2026

Uh oh!

vpodzime commented Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

Conversation

akselaase commented Jun 23, 2026

Uh oh!

mender-test-bot commented Jun 23, 2026

Changelogs

mender (aksel.aase/log-upload-unexpected-eof)

Bug Fixes

Uh oh!

vpodzime left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

akselaase commented Jun 23, 2026

Uh oh!

vpodzime commented Jun 24, 2026

Uh oh!

akselaase commented Jun 24, 2026

Uh oh!

vpodzime commented Jun 25, 2026

Uh oh!

akselaase commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vpodzime commented Jun 25, 2026

Uh oh!

akselaase commented Jun 25, 2026

Uh oh!

vpodzime commented Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

vpodzime left a comment •

edited

Loading

akselaase commented Jun 25, 2026 •

edited

Loading