
esp_websocket_client: Add Kconfig options for PSRAM allocation (IDFGH-17011) #980

Open
shardt68 wants to merge 7 commits into espressif:master from
shardt68:feature/websocket_psram_stack_config

Conversation

@shardt68

@shardt68 shardt68 commented Dec 29, 2025

Description

This PR introduces two new Kconfig options to the esp_websocket_client component to allow memory allocation in external RAM (PSRAM):

  1. ESP_WS_CLIENT_TASK_STACK_IN_EXT_RAM: Enables allocation of the WebSocket task stack in PSRAM using xTaskCreateStaticPinnedToCore.
  2. ESP_WS_CLIENT_ALLOC_IN_EXT_RAM: Enables allocation of the esp_websocket_client structure and its internal configuration storage in PSRAM.

Motivation

In applications requiring multiple concurrent secure connections (e.g., TLS/HTTPS), internal RAM is a critical bottleneck. Each TLS session typically requires 16-20KB of internal RAM for buffers. By allowing the WebSocket client's task stack and internal structures to be moved to PSRAM, significant internal memory is freed for these critical network operations, improving overall system stability and preventing ESP_ERR_NO_MEM failures in memory-constrained scenarios.

Implementation Details

  • Uses heap_caps_calloc with MALLOC_CAP_SPIRAM for the client structure and config.
  • Implements xTaskCreateStaticPinnedToCore for the task stack when PSRAM is selected.
  • Safety: Includes a fallback mechanism that reverts to internal RAM allocation if PSRAM allocation fails at runtime.
  • Compatibility: The default behavior remains unchanged (internal RAM allocation).
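
A minimal sketch of the PSRAM-first allocation with fallback described in the list above. It is not the PR's code: `ws_calloc_prefer_psram` and `sim_psram_available` are hypothetical names, and `heap_caps_calloc` plus the `MALLOC_CAP_*` flags are redefined here only as off-target stand-ins so the logic can be exercised without ESP-IDF. On target they would come from `esp_heap_caps.h`, and `want_psram` would presumably be driven by the new Kconfig option.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Off-target stand-ins (assumptions). On ESP-IDF, heap_caps_calloc() and the
 * MALLOC_CAP_* flags are provided by esp_heap_caps.h instead. */
#define MALLOC_CAP_SPIRAM   (1u << 10)
#define MALLOC_CAP_INTERNAL (1u << 11)

static bool sim_psram_available = true;  /* flip to simulate a board without PSRAM */

static void *heap_caps_calloc(size_t n, size_t size, uint32_t caps)
{
    if ((caps & MALLOC_CAP_SPIRAM) && !sim_psram_available) {
        return NULL;                     /* no external RAM to allocate from */
    }
    return calloc(n, size);
}

/* Hypothetical helper: try PSRAM first when requested, then fall back to
 * internal RAM. *from_psram reports where the block actually landed. */
static void *ws_calloc_prefer_psram(size_t n, size_t size, bool want_psram,
                                    bool *from_psram)
{
    *from_psram = false;
    if (want_psram) {
        void *p = heap_caps_calloc(n, size, MALLOC_CAP_SPIRAM);
        if (p != NULL) {
            *from_psram = true;
            return p;
        }
        /* PSRAM request failed at runtime: fall through to internal RAM. */
    }
    return heap_caps_calloc(n, size, MALLOC_CAP_INTERNAL);
}
```

With `want_psram` set to false the helper behaves like a plain internal-RAM allocation, which matches the stated default behavior.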

Related

  • Related to general ESP-IDF best practices for PSRAM-enabled devices (ESP32-S3, ESP32-WROVER) to mitigate internal RAM fragmentation and exhaustion during heavy network I/O.

Testing

  • Hardware: Tested on ESP32-S3 (8MB Octal PSRAM).
  • Scenario: Verified stable WebSocket connection while simultaneously performing multiple concurrent HTTPS/TLS requests to external APIs.
  • Memory Verification: Confirmed via heap_caps_get_free_size that internal RAM usage decreased by the expected amount (~5KB for structure + stack size).
  • Stability: Verified correct task startup and proper resource cleanup during client destruction.
  • Fallback Test: Verified that the client still initializes correctly on a device with PSRAM disabled/unavailable by falling back to internal RAM.

Note

Medium Risk
Touches task creation and memory allocation/free paths, so mis-sizing the stack or allocation failures could prevent the client task from starting or could leak memory. Defaults remain unchanged unless the new Kconfig options are enabled.

Overview
Adds two new Kconfig options to move WebSocket client memory usage into PSRAM: allocating the esp_websocket_client/config storage via heap_caps_calloc and optionally allocating the WebSocket task stack in PSRAM.

When ESP_WS_CLIENT_TASK_STACK_IN_EXT_RAM is enabled, esp_websocket_client_start() switches to xTaskCreateStaticPinnedToCore with a PSRAM-backed stack and internal-RAM TCB, and ensures these buffers are freed during client teardown.

Written by Cursor Bugbot for commit 46b2d47.

@CLAassistant

CLAassistant commented Dec 29, 2025

CLA assistant check
All committers have signed the CLA.

@shardt68 shardt68 force-pushed the feature/websocket_psram_stack_config branch 2 times, most recently from 13277dd to 27c4e4b Compare December 29, 2025 12:14
@github-actions github-actions bot changed the title esp_websocket_client: Add Kconfig options for PSRAM allocation esp_websocket_client: Add Kconfig options for PSRAM allocation (IDFGH-17011) Dec 29, 2025
@espressif-bot espressif-bot added the Status: Opened Issue is new label Dec 29, 2025
@shardt68 shardt68 force-pushed the feature/websocket_psram_stack_config branch from 27c4e4b to 8a75888 Compare December 29, 2025 12:40
@shardt68 shardt68 force-pushed the feature/websocket_psram_stack_config branch from 8a75888 to b683518 Compare December 29, 2025 12:54
@shardt68 shardt68 force-pushed the feature/websocket_psram_stack_config branch from b683518 to 6c381b9 Compare December 29, 2025 13:03
@shardt68 shardt68 force-pushed the feature/websocket_psram_stack_config branch from 6c381b9 to e84c0eb Compare December 29, 2025 15:43
@shardt68 shardt68 force-pushed the feature/websocket_psram_stack_config branch 2 times, most recently from cbf6bad to 016bf62 Compare December 29, 2025 16:14
@shardt68 shardt68 force-pushed the feature/websocket_psram_stack_config branch from 016bf62 to 0ae9008 Compare December 29, 2025 16:46
@shardt68 shardt68 force-pushed the feature/websocket_psram_stack_config branch 2 times, most recently from 15cba04 to 9d9f1c9 Compare December 29, 2025 17:04
@shardt68 shardt68 force-pushed the feature/websocket_psram_stack_config branch from 9d9f1c9 to 391fb0b Compare December 29, 2025 17:20
@shardt68
Author

Hi @gabsuren,

I wanted to follow up on this PR to see if there is any feedback or if any additional information is needed from my side to move this forward.

These changes are quite important for memory-constrained applications using multiple concurrent TLS/HTTPS connections (like Spotify Connect implementations). By allowing the WebSocket task stack and client structure to be moved to PSRAM, we can significantly free up critical internal RAM.

Beyond the memory allocation, this PR also introduces more robust lifecycle management (using DESTRUCTION_IN_PROGRESS_BIT and deferred task deletion). This hardening has proven to be very stable and effectively prevents the race conditions during rapid reconnect/destroy cycles that were previously an issue.

The branch is currently conflict-free and has been extensively tested on ESP32-S3. Looking forward to your thoughts!

@gabsuren
Collaborator

hi @shardt68
Sorry for the delayed response. I’ll take a closer look at the PR soon.

In the meantime, I noticed some compilation issues in the target tests due to the PR changes, for example errors around WEBSOCKET_EVENT_HEADER_RECEIVED being undeclared, a mismatched #endif in esp_websocket_client.c, and a warning about websocket_header_hook being defined but not used.
These are causing the build to fail.

https://github.com/espressif/esp-protocols/actions/runs/20578631568/job/61449098113?pr=980

Best regards,
Suren

@shardt68
Author

Hi @gabsuren,

Thank you for your feedback! I've fixed the build issues and the client->run state inconsistency identified by the bot in the latest commit.

Regarding the scope of this PR: While the primary goal was to add PSRAM allocation support, we encountered several race conditions and lifecycle issues during testing (some of which were also highlighted by the Cursor review bot). To ensure the stability of the component—especially when using static task buffers in PSRAM—we felt it was necessary to harden the task lifecycle.

Key additions beyond the memory allocation include:

  • A deferred destruction mechanism (esp_websocket_client_destroy_task) and a DESTRUCTION_IN_PROGRESS_BIT to prevent use-after-free and deadlocks when destroying the client from within its own event loop or callback.
  • More consistent locking in start(), stop(), and destroy() to prevent state corruption during rapid reconnect cycles.
  • Switching to vTaskSuspend(NULL) for task exit to avoid the "task self-deletion" race condition.

We believe these changes are vital for a production-ready component, but we realize they touch core logic. Could you let us know if you are comfortable with this hardening/refactoring being part of this PR, or if you would prefer a more stripped-back version focused strictly on the PSRAM options?

Looking forward to your guidance!

@gabsuren
Collaborator

Hi @shardt68,
Thank you for the detailed explanation and the additional hardening work!

I've reviewed the recent commits, and I agree with your assessment. These changes significantly touch the core lifecycle and concurrency model of the client.

To answer your question: Yes, please split these changes into a separate PR.
We prefer to keep the PSRAM feature purely focused on the allocation changes. The hardening and refactoring (deferred destruction, race condition fixes, etc.) are valuable but should be reviewed and tested independently to ensure no regressions are introduced in the standard behavior before we merge them.

Also, regarding the version update in idf_component.yml: Please do not manually update the version in the YAML file. We handle versioning and releases through our own release process/tags, so that change should be reverted in this PR.

Could you please:

  • Revert this PR to focus solely on the PSRAM configuration options.
  • Open a new PR with the concurrency and lifecycle hardening fixes.

Thanks again for your contribution!


xSemaphoreTakeRecursive(client->lock, portMAX_DELAY);
if (client->task_handle) {
while (client->task_handle && eTaskGetState(client->task_handle) != eSuspended) {
Collaborator

Waiting for the task to suspend may block indefinitely if the task is blocked elsewhere. If the client is stuck in a network call or blocked on a mutex, stop() will hang the caller forever.
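
One way to address the concern above is to bound the wait. The following is a sketch under assumptions, not the component's code: `ws_wait_stopped` and the two hook types are hypothetical, and on target the predicate would wrap something like `eTaskGetState(handle) == eSuspended` while the delay hook would wrap `vTaskDelay()`. The fakes at the bottom exist only so the loop can be exercised without FreeRTOS.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hooks. On target these would wrap the FreeRTOS state query
 * and vTaskDelay(); here they are plain function pointers. */
typedef bool (*ws_is_stopped_fn)(void *ctx);
typedef void (*ws_delay_fn)(uint32_t ticks);

/* Poll until the predicate holds or the tick budget runs out.
 * Returns true if the task stopped in time, false on timeout. */
static bool ws_wait_stopped(ws_is_stopped_fn is_stopped, void *ctx,
                            ws_delay_fn delay, uint32_t timeout_ticks,
                            uint32_t poll_ticks)
{
    uint32_t waited = 0;
    if (poll_ticks == 0) {
        poll_ticks = 1;                 /* avoid a zero-length busy loop */
    }
    while (!is_stopped(ctx)) {
        if (waited >= timeout_ticks) {
            return false;               /* caller decides: log, retry, or fail */
        }
        delay(poll_ticks);
        waited += poll_ticks;
    }
    return true;
}

/* Off-target fakes: the "task" reports stopped after *ctx polls. */
static bool fake_stops_after(void *ctx)
{
    int *polls_left = (int *)ctx;
    if (*polls_left > 0) {
        (*polls_left)--;
        return false;
    }
    return true;
}

static void fake_delay(uint32_t ticks)
{
    (void)ticks;                        /* no real sleeping off-target */
}
```

A timeout only turns a hang into an error return; the caller still has to decide what to do with a task that never stopped, and its buffers must not be freed while it may still be running.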

}
}
vTaskDelete(NULL);
vTaskSuspend(NULL);
Collaborator

If the task crashes, blocks, or enters an infinite loop before it reaches line 1502, it will never become suspended, right?

xSemaphoreGiveRecursive(client->lock);

if (!already_destroying) {
if (xTaskCreate(esp_websocket_client_destroy_task, "ws_destroy", 4096, client, client->config->task_prio, NULL) != pdPASS) {
Collaborator

If the system is Out Of Memory (OOM), the helper task is not created. The client task suspends itself, but no one is left to free its allocated memory.

Collaborator

@gabsuren gabsuren left a comment

Left some notes. Please take a look.
The task cleanup logic is duplicated across 4 different locations. This increases maintenance burden and the risk that future changes will introduce bugs if not applied consistently to all 4 spots.
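
As an illustration of the deduplication suggested above, the cleanup could live in one helper that every exit path calls. This is a sketch with hypothetical names (`ws_task_mem_t`, `ws_task_mem_release`) and an illustrative struct, not the component's actual layout; on target `free()` could be `heap_caps_free()`.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical subset of the client state that owns the static task buffers.
 * Field names are illustrative only. */
typedef struct {
    void *task_stack_buf;   /* stack buffer (PSRAM when the option is enabled) */
    void *task_tcb_buf;     /* task control block (internal RAM) */
} ws_task_mem_t;

/* Single place that releases both buffers and clears the pointers, so calling
 * it twice, or from several exit paths, is harmless. */
static void ws_task_mem_release(ws_task_mem_t *mem)
{
    if (mem == NULL) {
        return;
    }
    free(mem->task_stack_buf);      /* free(NULL) is a no-op */
    mem->task_stack_buf = NULL;
    free(mem->task_tcb_buf);
    mem->task_tcb_buf = NULL;
}
```

The helper only covers releasing memory; it assumes the caller has already made sure the task is no longer running on that stack.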

@gabsuren
Collaborator

@shardt68 Regarding the concurrency issues you encountered:
May I ask what specific deadlock issues you faced during development? Can you provide example code and config so we can trace it?
This will help us design a safer hardening fix in a separate PR.

Thank you again!
Suren

@shardt68
Author

Hi @gabsuren,

Thank you for the guidance! I have updated the PR to focus strictly on the PSRAM allocation configuration. Specifically, I have:

  • Reverted the version bump in idf_component.yml and CHANGELOG.md.
  • Removed the concurrency hardening (deferred destruction, state bits, etc.).
  • Removed the dynamic buffer feature as it was out of scope for the PSRAM requirement.
  • Preserved the MALLOC_CAP_SPIRAM allocation for the client structure and the static task stack support.

Regarding your question about the deadlocks, here are the primary scenarios we encountered during development:

  1. Self-Destruction Deadlock (Task Context)
    The most critical deadlock occurs when esp_websocket_client_destroy() is called from within the client's own event handler (e.g., in response to an error).

Example Scenario:

static void websocket_event_handler(void *handler_args, esp_event_base_t base, int32_t event_id, void *event_data) {
    if (event_id == WEBSOCKET_EVENT_ERROR) {
        /* 
         * ❌ DEADLOCK: esp_websocket_client_destroy calls stop_wait_task() 
         * which waits for the STOPPED_BIT. 
         * The task is currently blocked in this callback, so it never reaches 
         * the end of its loop to set the STOPPED_BIT.
         */
        esp_websocket_client_destroy(client); 
    }
}
  2. Task Lifecycle Race (Rapid Start-Stop)
    When calling stop() followed immediately by start(), we found that client->task_handle could become inconsistent. If the previous task had not yet reached its exit point (vTaskDelete), the new xTaskCreate call would overwrite the handle while the old task was still cleaning up the transport or internal resources.

  3. API Handle "Check-then-use" Race
    In multi-threaded environments, one task might call esp_websocket_client_send_text() while another calls stop(). Previously, if a context switch occurred after the connection check but before the transport write, the system could attempt to use a transport handle that was just closed or invalidated.
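
The "check-then-use" scenario can be illustrated with a simplified, single-threaded model in which the connection check and the write share one critical section. Everything here is hypothetical (`ws_client_model_t`, `ws_send_locked`, `ws_stop_locked`); `lock_depth` merely stands in for a recursive mutex such as the one taken with xSemaphoreTakeRecursive, and the sketch does not reproduce the component's real send path.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified client. 'lock_depth' stands in for a recursive
 * mutex; a real implementation would block other tasks while it is held. */
typedef struct {
    int  lock_depth;
    bool connected;
    int  bytes_written;
} ws_client_model_t;

static void ws_lock(ws_client_model_t *c)   { c->lock_depth++; }
static void ws_unlock(ws_client_model_t *c) { c->lock_depth--; }

/* The check and the write happen inside one critical section, so a stop()
 * that takes the same lock cannot invalidate the transport between them.
 * Returns the number of bytes written, or -1 if not connected. */
static int ws_send_locked(ws_client_model_t *c, const char *data, int len)
{
    int ret = -1;
    (void)data;                       /* payload is not inspected in this model */
    ws_lock(c);
    if (c->connected) {
        c->bytes_written += len;      /* stand-in for the transport write */
        ret = len;
    }
    ws_unlock(c);
    return ret;
}

static void ws_stop_locked(ws_client_model_t *c)
{
    ws_lock(c);
    c->connected = false;             /* transport closed under the same lock */
    ws_unlock(c);
}
```

The point of the model is only the ordering: a send issued after stop() returns an error instead of touching a closed transport, and every lock is released on each path.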

I have already prepared these hardening fixes in a separate branch and will submit them as a new PR for independent review once this PSRAM feature is addressed.

Please let me know if this version looks good for merging!

Best regards,
shardt68

@shardt68 shardt68 force-pushed the feature/websocket_psram_stack_config branch from d2242b7 to 61a8100 Compare January 30, 2026 21:04
@gabsuren
Collaborator

gabsuren commented Feb 3, 2026

@shardt68 thank you for simplifying the PR to consider only the PSRAM part.
Left some comments for your consideration.

Best regards,
Suren

@gabsuren
Collaborator

@shardt68 sorry I missed your last message, thanks for providing an example.
Actually, this is a known constraint of the task lifecycle here :)

The deadlock occurs because esp_websocket_client_destroy() attempts to stop the WebSocket task and waits for it to finish. However, when you call this function from within the event handler (e.g., WEBSOCKET_EVENT_ERROR), you are running inside that very task. The task essentially waits for itself to finish, causing a deadlock.

Instead of destroying the client immediately, signal it to destroy itself upon exit.

❌ Incorrect (Causes Deadlock):

static void websocket_event_handler(...) {
    if (event_id == WEBSOCKET_EVENT_ERROR) {
        // DEADLOCK: Waits for this task to stop while running inside it
        esp_websocket_client_destroy(client); 
    }
}

✅ Correct:
Use esp_websocket_client_destroy_on_exit(). This sets a flag so the task cleans up its own resources gracefully when it exits the loop.

static void websocket_event_handler(...) {
    if (event_id == WEBSOCKET_EVENT_ERROR) {
        // Safe: Signals the task to clean up after it breaks the loop
        esp_websocket_client_destroy_on_exit(client);
    }
}

Best regards,
Suren
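
The destroy-on-exit pattern described in the comment above can be modeled off-target as follows. This is a hedged sketch, not the library's implementation: `ws_model_t`, `ws_request_destroy_on_exit`, `ws_task_loop` and the event constants are hypothetical, and the `destroyed` flag stands in for freeing the client.

```c
#include <stdbool.h>

/* Hypothetical model of a client task loop; names are illustrative only. */
typedef struct ws_model ws_model_t;
typedef void (*ws_handler_fn)(ws_model_t *c, int event);

struct ws_model {
    bool destroy_on_exit;   /* set by the handler, acted on by the task */
    bool destroyed;         /* set once resources were released */
    ws_handler_fn handler;
};

enum { WS_EVT_DATA = 1, WS_EVT_ERROR = 2 };

/* Safe from inside the handler: it records the request and returns at once,
 * so it never waits for the task it is running in. */
static void ws_request_destroy_on_exit(ws_model_t *c)
{
    c->destroy_on_exit = true;
}

/* Task body: dispatch events, then clean up after leaving the loop. */
static void ws_task_loop(ws_model_t *c, const int *events, int count)
{
    for (int i = 0; i < count; i++) {
        c->handler(c, events[i]);
    }
    if (c->destroy_on_exit) {
        c->destroyed = true;    /* stand-in for freeing the client */
    }
}

/* Example handler: asks for cleanup when an error event arrives. */
static void on_event(ws_model_t *c, int event)
{
    if (event == WS_EVT_ERROR) {
        ws_request_destroy_on_exit(c);
    }
}
```

Because the handler only sets a flag, the cleanup runs in the task's own exit path after the callback has returned, which is what avoids the self-wait.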

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

client->task_handle = xTaskCreateStaticPinnedToCore(
esp_websocket_client_task,
client->config->task_name ? client->config->task_name : "websocket_task",
client->config->task_stack / sizeof(StackType_t),

Stack size incorrectly divided, reducing it to 1/4

High Severity

In ESP-IDF's FreeRTOS, xTaskCreateStaticPinnedToCore expects ulStackDepth in bytes (unlike vanilla FreeRTOS which uses words). The code passes client->config->task_stack / sizeof(StackType_t), which divides the intended stack size by 4 on 32-bit platforms. With the default 4096-byte stack, the task only receives 1024 bytes, almost certainly causing a stack overflow. The value passed here needs to match what was allocated — just client->config->task_stack.
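
Whichever unit a given port uses, FreeRTOS requires the static stack buffer to be an array of StackType_t with at least as many elements as the depth passed to the create call. The sketch below expresses that invariant with hypothetical helpers (`ws_stack_bytes`, `ws_stack_args_consistent`); `elem_size` stands for `sizeof(StackType_t)` and is an assumption to verify against the port header of the target in use.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes a static stack buffer needs for a given depth,
 * where elem_size is sizeof(StackType_t) on the target port. */
static size_t ws_stack_bytes(uint32_t depth, size_t elem_size)
{
    return (size_t)depth * elem_size;
}

/* Non-zero when the depth handed to the task-create call fits inside the
 * buffer that was actually allocated. */
static int ws_stack_args_consistent(size_t allocated_bytes, uint32_t depth,
                                    size_t elem_size)
{
    return ws_stack_bytes(depth, elem_size) <= allocated_bytes;
}
```

If `sizeof(StackType_t)` is 1 on the port in use, dividing the byte count by it changes nothing; if it is 4, the same division quarters the depth. A compile-time or startup check along these lines would make the mismatch visible either way.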


@shardt68 shardt68 requested a review from gabsuren March 23, 2026 15:56