Support enrolling newer RTL8720cm/CLIP firmware (protocolVer 4.9): publicKey whitespace fix + generic device fallback#81

Draft
aaronsb wants to merge 3 commits into
anszom:master from
aaronsb:teach/newer-firmware-enrollment

Conversation

@aaronsb

@aaronsb aaronsb commented Jun 26, 2026

Before you read

I wasn't sure how to introduce this into the project, because to enroll the machine I had to broaden some assumptions currently built into rethink's code: for instance, greedy vs. selective device enrollment, QA mode in headers, geo regions, crypto pairing, cert storage tics, etc. It's also not finished: I had to power-cycle the dryer after enrollment to put it in the active state, because I don't know what command to send back to trigger a reset. Anyway, it's here for reference, not necessarily to pull in.

What & why

Brings up a current-gen LG appliance, a BDH_D30007_US dryer (DeviceType 202, RTK_RTL8720cm
"CLIP" module, protocolVer 4.9, sw 2.11.263), fully local on rethink. It now completes SoftAP →
/route → cert → MQTT clip, enrolls, and streams telemetry with zero LG cloud contact (verified
across a full ~50-minute dryer cycle: one MQTT connection, 0 undeploys, ~960 telemetry packets).

Two focused code changes + a docs page. Each commit is self-explanatory.

1. rethink-setup: strip whitespace from the setup publicKey (the real blocker)

The hardcoded setup publicKey is a tab-indented template literal, so every base64 line carries a
leading TAB inside the PEM. Older firmware tolerates this; the RTL8720cm CLIP parser is strict —
with the indentation, getDeviceInfo's RSA-encrypt step fails:

encrypt_val: ''
extra: 'POWER_ON|...|encryptRes:ffff'

…so the device can't build a cert request and loops on /route forever — which looks like a /route
bug but isn't. A clean PEM (same key bytes, base64 at column 0) → encryptRes:0, a valid
encrypt_val, and it proceeds straight through /route/certificate → cert → MQTT.

2. ha_bridge: generic raw-capture fallback for unknown device types

Today an unknown modelId is dropped (device type ... unknown → return), so a brand-new appliance
can provision perfectly and still never enroll — and you get no captured data to build a class from.
The fallback enrolls any unknown thinq2 device via a minimal class that completes onboarding (so it
stays connected) and republishes raw packet hex to a diagnostic HA sensor. Adding real support then
becomes a decode exercise against live data instead of guesswork.
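As a rough sketch of the fallback idea, under loud assumptions: every name below (`DeviceHandler`, `RawCaptureHandler`, `handlerFor`, the `diagnostic/<id>/raw` topic) is hypothetical and chosen for illustration, not taken from rethink's ha_bridge code. It only shows the shape: look up a handler by modelId, and fall back to one that republishes packet hex instead of returning early.

```typescript
// Hypothetical types: they illustrate the dispatch shape, not rethink's real classes.
type Publish = (topic: string, payload: string) => void;

interface DeviceHandler {
  onPacket(packet: Uint8Array): void;
}

type HandlerFactory = (deviceId: string, publish: Publish) => DeviceHandler;

// Render bytes as lowercase hex, two digits per byte.
function toHex(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}

// Fallback handler: no decoding, just republish each raw packet as hex.
class RawCaptureHandler implements DeviceHandler {
  private deviceId: string;
  private publish: Publish;

  constructor(deviceId: string, publish: Publish) {
    this.deviceId = deviceId;
    this.publish = publish;
  }

  onPacket(packet: Uint8Array): void {
    // The topic layout is an assumption made for this example.
    this.publish(`diagnostic/${this.deviceId}/raw`, toHex(packet));
  }
}

// Known models get their own handler; unknown ones get the raw-capture fallback.
function handlerFor(
  modelId: string,
  deviceId: string,
  known: Map<string, HandlerFactory>,
  publish: Publish,
): DeviceHandler {
  const factory = known.get(modelId);
  return factory ? factory(deviceId, publish) : new RawCaptureHandler(deviceId, publish);
}

// Usage: an unknown model still produces output instead of being dropped.
const published: string[] = [];
const fallback = handlerFor("UNKNOWN_MODEL", "dev1", new Map(), (topic, payload) => {
  published.push(`${topic} ${payload}`);
});
fallback.onPacket(new Uint8Array([0x01, 0xab, 0xff]));
```

Where the hex ends up (an HA sensor, an MQTT topic, or a log) is only a matter of what `publish` does, so the same shape works behind an opt-in flag.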

3. docs/enrolling-newer-firmware.md

Ties it together and documents the findings that don't need code here: the legacy-TLS profile (your
issues #17/#18), the post-provision power-cycle this firmware needs to settle, and the things that
were NOT the problem (fake OTP, shared publicKey, minimal /route body, svcphase) so nobody
re-chases them.

Deliberately excluded / honest caveats

  • No changes to the OTP, the /route body, or TLS defaults here, to keep this focused. The OTP and the
    minimal /route body work as-is for this firmware; the legacy-TLS profile is covered by your issues #17 (Support newer OpenSSL versions) and #18 (Handshake failed).
  • Our working setup also ran svcphase: 'OP' (vs the debug-UART 'QA' default). We could not
    isolate whether OP is required for this firmware, so we left your default alone — flagging in
    case it matters for other newer modules.
  • The post-provision power-cycle is documented as a manual step; whether a clean software-triggerable
    reset exists is an open question.

What I'd value your input on

  • Does "strict PEM parser on newer CLIP firmware" match your experience with other modules, or is it
    specific to RTL8720cm?
  • Is there a cloud- or setup-triggerable clean reset that would avoid the manual power-cycle?
  • Preferred shape for the generic fallback — always-on, or behind an opt-in flag?

I can share full packet captures + decrypted MQTT clip transcripts (SoftAP setup, /route, cert,
deploy/completeProvisioning/_ack, telemetry). Device ID: fe8b2ea0-…-3034dbd055fe.


This was obviously drafted with Claude Code, with my edits and steering. I won't keep disclosing that in further work; assume it is co-written, or just me replying like a caveman or something.

aaronsb added 3 commits June 26, 2026 11:50
The hardcoded setup publicKey was built from a tab-indented template literal,
so each base64 line carried a leading TAB inside the PEM. Older LG firmware
tolerates this, but the RTL8720cm "CLIP" module (DeviceType 202, protocolVer
4.9 — e.g. the BDH_D30007_US dryer) uses a strict PEM parser: with the
indentation, getDeviceInfo's RSA-encrypt step fails (encrypt_val:'' /
extra ...encryptRes:ffff), the device can't build a cert request, and it loops
on /route indefinitely — which looks like a /route bug but isn't.

Cleaning the PEM (base64 at column 0, \n only; same key bytes) makes the device
return encryptRes:0 with a valid encrypt_val and proceed straight through
/route -> /route/certificate -> POST /device/:id/certificate -> MQTT.

Claude-Session: https://claude.ai/code/session_01LGQAZq7ycoMjk4WdckeAQZ

Today an unknown modelId is dropped (logs 'device type ... unknown' and
returns), so a brand-new appliance that provisions perfectly still never
enrolls in HA, gets no answer from the cloud side, and you get zero captured
data to build a device class from — the chicken-and-egg that keeps new models
unsupported.

Add a generic 'loose capture' fallback: an unknown thinq2 device enrolls via a
minimal class that completes onboarding (so it stays connected) and republishes
every raw packet's hex to a diagnostic HA sensor. Adding real support for a new
model then becomes a decode exercise against live data instead of guesswork.

Claude-Session: https://claude.ai/code/session_01LGQAZq7ycoMjk4WdckeAQZ

Ties together the two code changes (publicKey whitespace, generic fallback) and
documents the findings that don't need code here: the legacy-TLS profile (issues
anszom#17/anszom#18), the post-provision power-cycle this firmware needs to settle, and the
things that were NOT the problem (fake OTP, shared publicKey, minimal /route
body, svcphase) so nobody re-chases them.

Claude-Session: https://claude.ai/code/session_01LGQAZq7ycoMjk4WdckeAQZ
@anszom

anszom commented Jun 26, 2026

Owner

Good find! The whitespace issue may well have been the root cause of #58.

I'll need some time to work through the backlog of PRs and issues, so I'll make more comments later.

@anszom

anszom commented Jun 26, 2026

Owner

> Is there a cloud- or setup-triggerable clean reset that would avoid the manual power-cycle?

In some old notes I found a mention that a "resetDevice" command does exactly that. You can try it on your module.
That string is also present in the LG app, so it's plausible that they use it, at least for some devices.

```
encrypt_val: ''
extra: 'POWER_ON|...|encryptRes:ffff' # the device's RSA-encrypt step failed
```
The device then can't build a cert request, reboots onto Wi-Fi, polls `/route` a few times, sends a

My device polled only once.


## 4. Post-provision: power-cycle the appliance once

After `releaseDev` the device reboots itself, but on this firmware it then sits in "connecting…",

This did not occur on my machine; it stayed connected.

@diijkstra

Worked on my device with x-service-phase set to QA (modem version 2.11.306, clip_ble_v1.9.215).

I would suggest checking encrypt_val in the provisioning flow and aborting if it is an empty string; that will make failures easier to catch.

I am not sure about the reboot after provisioning; my device did not need it. Logs are at #58 (comment)
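The empty-`encrypt_val` check suggested above could look roughly like the following. It is a sketch under assumptions: the `DeviceInfoReply` shape is inferred only from the two fields quoted in this thread, and `assertEncryptSucceeded` is a hypothetical name, not an existing rethink function.

```typescript
// Assumed shape: only the two getDeviceInfo fields quoted in this thread.
interface DeviceInfoReply {
  encrypt_val: string;
  extra: string; // pipe-separated, e.g. 'POWER_ON|...|encryptRes:ffff'
}

// Pull the encryptRes code out of the pipe-separated `extra` field, if present.
function encryptStatus(extra: string): string | undefined {
  const prefix = "encryptRes:";
  const field = extra.split("|").find((part) => part.startsWith(prefix));
  return field?.slice(prefix.length);
}

// Hypothetical guard: stop provisioning early instead of letting the device
// loop on /route with no usable cert request.
function assertEncryptSucceeded(info: DeviceInfoReply): void {
  if (info.encrypt_val === "") {
    const status = encryptStatus(info.extra) ?? "unknown";
    throw new Error(
      `device returned an empty encrypt_val (encryptRes:${status}); aborting provisioning`,
    );
  }
}

// Usage with the failing reply quoted in the PR description.
let failure = "";
try {
  assertEncryptSucceeded({ encrypt_val: "", extra: "POWER_ON|...|encryptRes:ffff" });
} catch (err) {
  failure = (err as Error).message;
}
```

Failing at this point surfaces the real cause (the RSA-encrypt step) rather than the misleading symptom of repeated `/route` polling.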

@maciejsszmigiero

Collaborator

The Home Assistant instance does not seem to have much use for such a "placeholder" device. Why not just leave the packets streaming to clip/message/devices/ and let someone read them from there, much like tools/packet-parser.ts does?

This would also avoid polluting the HA dashboard or history.

@anszom

anszom commented Jun 27, 2026

Owner

> I am not sure about the reboot after provisioning; my device did not need it. Logs are at #58 (comment)

Indeed this is the first device that appears to require this. But if it works, then maybe we should simply send the reboot command to every device for maximum compatibility.

@anszom

anszom commented Jun 27, 2026

Owner

> The Home Assistant instance does not seem to have much use for such a "placeholder" device. Why not just leave the packets streaming to clip/message/devices/ and let someone read them from there, much like tools/packet-parser.ts does?

Agreed. I don't think mapping the raw packets to Home Assistant would provide any advantage.

Raw packets can also be observed through the management web UI & its underlying websocket.

@aaronsb

aaronsb commented Jun 27, 2026

Author

My intention for tracking raw packets was to collect as varied a history as possible (over a week or two of operation), which can then be dumped as one full body of data. For me, maintaining multiple endpoints on a home network is inconvenient.

I did some testing this morning: with x-service-phase set to QA, enrollment of the dryer fails repeatedly until it is set to OP. The newer firmware (4.9+) seems to want qic-qa-* routing and to reject common.lgthinq.com.

@aaronsb

aaronsb commented Jun 27, 2026

Author

Just thinking out loud here: if someone intentionally or unknowingly enrolls a machine that's not supported, I think it would be a much better experience if the machine could at least be captured (a phase one, if you will) and marked as not supported (...yet 😉), so that the traces can be exported. That was the intention of the code changes. As new versions of the backend gain opcode translation, the previously unrecognized opcodes get marked as supported and data arrives in HA.

It seems like a lot of extra steps to just discard opcodes. Anyway, that's why I marked this as a draft PR with a teach/ branch prefix: there's no reason to pull any of this in if it doesn't fit the architecture you have in mind.


4 participants