-
Notifications
You must be signed in to change notification settings - Fork 11
docs: specification detailing startup failure behaviour in the SDKs #112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,146 @@ | ||||||||||
| # Initialization & Configuration Validation Specification | ||||||||||
|
|
||||||||||
| This document specifies how Unleash SDKs MUST behave during initialization when configuration is missing or invalid. | ||||||||||
|
|
||||||||||
| ## Overview | ||||||||||
|
|
||||||||||
| Unleash SDKs are frequently configured via environment variables. At runtime, required configuration such as url and token may be null, undefined, or otherwise invalid. | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. did you use "such as" because you expect more required configs in future?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. isn't undefined too JS specific to be part of the spec? maybe "absent"?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. does invalid mean non conformant to the token and url spec?
This comment was marked as outdated.
Sorry, something went wrong.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Because it's late in the day and I don't have the entire list in my mind 😅 Need to do a teensy bit of digging tomorrow and figure out what exactly is required. I think it's just this but I'm not 100% sure
Cheers! Good feedback. Absent seems like the right thing here
I think I'm being a bit too vague here. Invalid in my mind means "can't talk to unleash"/"unleash doesn't like your token". I specifically don't want to go down the path of validating URLs because it's weird and hard to do. I'd like to reuse existing failure paths that we have in the SDK - we're already pretty robust around not crashing when upstream is unreachable
Agree, I'll figure out what this is and update
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the full list is a nice-to-have, it can be evolved or changed over time. Happy having it if it's cheap. |
||||||||||
|
|
||||||||||
| SDKs MUST be resilient in production environments. Misconfiguration MUST NOT cause the host application process to crash by default. Instead, SDKs MUST start in a degraded state, emit appropriate warnings or errors through existing mechanisms, and provide predictable default behavior. | ||||||||||
|
|
||||||||||
| This specification defines: | ||||||||||
|
|
||||||||||
| - Default (soft) initialization behavior | ||||||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't find "soft" and "hard" clarifying here. Disclaimer: I'm reading from top to bottom so maybe it becomes clearer later. If I forget to clarify please ping me. Edit:
Suggested change
|
||||||||||
| - Hard initialization behavior | ||||||||||
| - Debug/development mode behavior | ||||||||||
| - Public API behavior in degraded state | ||||||||||
| - Backup state loading semantics | ||||||||||
| - Event and logging requirements | ||||||||||
|
|
||||||||||
| ## Definitions | ||||||||||
|
|
||||||||||
| Required configuration parameters: | ||||||||||
|
|
||||||||||
| - **URL** - A URL specifying the base path of the Unleash client API | ||||||||||
| - **Token** - An Unleash token that allows access to the relevant client API | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I know we have very unfortunate name for our backend APIs that we call client API but maybe we can say "backend client API" to be less confusing to newcomers?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Gosh we are being burned - this needs to cover frontend SDKs here too so I think I need to be more clear here. This means "the URL that you put in the config at startup". I think some frontend SDKs do weird things like expect the full path too. Needs some clarity though, agree
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we landed on backend vs frontend. The fact that the backend lives under /client and that we used to call those APIs client is confusing. Maybe we could introduce a /backend url, and deprecate /client, but I don't want to go this route now. I think what Mateusz suggests makes sense for now, just be clearer with words :) |
||||||||||
|
|
||||||||||
| These parameters are required for communication with Unleash. | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's already said on line 22 |
||||||||||
|
|
||||||||||
| **Soft initialization**: | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Opinion. I find "soft" too vague. I'd prefer "lenient" because it means it tolerates misconfiguration. The opposite would be "strict".
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agree, that's a nice suggestion There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure abut "lenient". After looking for some alternatives I quite like |
||||||||||
|
|
||||||||||
| The default SDK initialization method. | ||||||||||
|
|
||||||||||
| **Hard initialization**: | ||||||||||
|
|
||||||||||
| An explicit, non-default initialization method that enforces successful initial synchronization before returning control. | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. non-default -> opt-in
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am not sure what returning control means here. Some APIs have both sync and async init.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agree, nice suggestion |
||||||||||
|
|
||||||||||
| **Debug/Development mode**: | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd call it development mode (although Rust would disagree). IMO development mode is the local-dev mode, while debug mode is when you set |
||||||||||
|
|
||||||||||
| A runtime mode provided by the host ecosystem (e.g., Rust debug_assertions, Node NODE_ENV !== "production"), indicating the application is not running in production. Only relevant where the ecosystem supports this idiomatically. | ||||||||||
|
|
||||||||||
| **Degraded State**: | ||||||||||
|
|
||||||||||
| The SDK is initialized and callable, but has not successfully synchronized with Unleash due to missing or invalid configuration. | ||||||||||
|
|
||||||||||
|
|
||||||||||
| ## Default (Soft) Initialization | ||||||||||
|
|
||||||||||
| ### Required Behavior | ||||||||||
|
|
||||||||||
| When initialized using the default initialization method: | ||||||||||
|
|
||||||||||
| 1) The SDK MUST NOT throw, panic, or terminate the host process due to missing or invalid required configuration. | ||||||||||
| 2) The SDK MUST start and enter a running state, even if url and/or token are missing or invalid. | ||||||||||
| 3) The SDK MUST emit its normal ready (or equivalent) event once initialization completes. | ||||||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||||||||||
|
|
||||||||||
| `ready` in this context means: | ||||||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it possible to add information about the degraded state to the event in some way? Even if only for users to create their own custom logs I think this could be helpful. |
||||||||||
| The SDK has initialized and public API methods are callable. | ||||||||||
|
Comment on lines
+56
to
+57
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we can steal some terminology from K8s here... liveness vs readiness. I think when you finish constructing the SDK, it's alive (liveness => true), while after you load from upstream, or backup, or bootstrap, it is "ready" (we could argue there are different levels of readiness, particularly if you have bootstrap>backup>upstream, maybe it's a level of freshness, IDK). But agree, after we attempted to fetch from Unleash, we should move into the ready state, because we did our best effort to retrieve toggles.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree that spec should distinguish between "fetched" successfully vs "gave up and loaded backup" so that users know if they're running on real data or defaults |
||||||||||
| It does NOT guarantee that flags have been fetched from Unleash, only that the SDK has made a best effort attempt to retrieve flags. | ||||||||||
|
|
||||||||||
| ### Validation Semantics | ||||||||||
|
|
||||||||||
| - Soft validation MUST check only for presence of required configuration parameters. | ||||||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: "soft" here may be confused/conflated with "Soft initialization" What about words like
Suggested change
|
||||||||||
| - SDKs MUST NOT reject URLs purely based on format validation. Communication success is determined by whether Unleash responds successfully. | ||||||||||
| - Missing or invalid required configuration MUST be surfaced through existing logging or error-event mechanisms. | ||||||||||
| - Configuration that is not required but missing should default to safe, known values. | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. do we want to touch non required params in this spec? If so, what should happen with optional but incorrect parameters?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Is this too vague? I did want to cover these, apparently I'm doing a bad job of it 😅
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think Mateusz point is that this is increasing the surface area of this definition, we could address optional params in a different contract. Maybe it's fine if we don't even mention them cause it doesn't alter this document's scope There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure what
Suggested change
|
||||||||||
|
|
||||||||||
| ### Network Behavior | ||||||||||
|
|
||||||||||
| - The SDK MAY proceed with its normal fetch or streaming scheduling. | ||||||||||
| - Resulting network failures MUST be surfaced via existing error/logging mechanisms. | ||||||||||
| - Repeated failures MUST NOT cause process termination. | ||||||||||
|
|
||||||||||
| Rate limiting and retry behavior are governed by existing SDK mechanisms and MUST NOT be bypassed. | ||||||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If the
I think it's a de facto offline mode and it would be a lot easier to explain why communication/retries should still be working if it was called as such. 🤔 But, maybe I'm wrong, explicit "offline" mode could make all of this even harder to understand. 🤔 So maybe "disconnected" mode? 🤷 |
||||||||||
|
|
||||||||||
| ### Backup State Loading | ||||||||||
|
|
||||||||||
| 1) The SDK MUST attempt to load persisted toggle state during initialization, regardless of configuration validity. | ||||||||||
| 2) Backup loading MUST occur before emitting the ready event. | ||||||||||
|
|
||||||||||
| Backup loading failure MUST NOT prevent SDK startup. | ||||||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||
|
|
||||||||||
| ## Degraded Public API Behavior | ||||||||||
|
|
||||||||||
| If the SDK has not successfully fetched toggle state from Unleash and bootstrapping and backup loading have failed: | ||||||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't see a section on Also I would find it clearer if you used Network in this sentence as the section describing it is called "Network Behavior". Or maybe add a small (network) clarifier?
Suggested change
|
||||||||||
|
|
||||||||||
| `isEnabled` (or equivalent SDK method) | ||||||||||
| - MUST return false | ||||||||||
| - If a caller-provided fallback value is supplied, that fallback MUST be returned. | ||||||||||
|
|
||||||||||
| `getVariant` (or equivalent SDK method) | ||||||||||
| - MUST return the SDK’s canonical default variant (disabled). | ||||||||||
| - If a caller-provided fallback variant is supplied, that fallback MUST be returned. | ||||||||||
|
|
||||||||||
| `listToggles` (or equivalent SDK method) | ||||||||||
| - MUST return an empty list. | ||||||||||
|
|
||||||||||
| ## Periodic Warnings | ||||||||||
|
|
||||||||||
| If required configuration is missing or invalid: | ||||||||||
| - The SDK MUST emit at least one clear warning during startup identifying which required parameters are missing. | ||||||||||
| - Continued connectivity failures MUST be surfaced using existing logging/event mechanisms. | ||||||||||
| - Emission MUST use existing rate-limiting or retry controls provided by the SDK. | ||||||||||
|
|
||||||||||
| ## Hard Initialization | ||||||||||
|
|
||||||||||
| SDKs MAY provide an explicit, non-default initialization method that enforces strict validation. | ||||||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I would follow this explanation and rename "Hard Initialization" to "Strict Initialization". |
||||||||||
|
|
||||||||||
| ### Required Behavior | ||||||||||
|
|
||||||||||
| When using a hard initialization method: | ||||||||||
|
|
||||||||||
| 1) url and token MUST be treated as required. | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this will be a breaking change for some SDKs since it's a default behavior nowadays. Are we gonna bump major versions for them?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we need to then yeah we will. But I don't intend to add hard/strict initialization to SDKs that don't have it until someone cries for it. This is more to clarify the behavior of SDKs that do have this right now. If we gotta break them, then so be it |
||||||||||
| 2) The SDK MUST attempt a full fetch from Unleash. If retries are supported in the SDK then the retry limit must be exceeded. | ||||||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is "fetch" cross platform? How about "network request" or something similar?
If there is a Maybe retries and retry-limit could be a new point
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there a missing 'not' here?
Suggested change
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If URL is missing should it still attempt a fetch? Isn't is wasteful with empty URL? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I thought the first point was about preventing that?
|
||||||||||
| 3) The method MUST NOT resolve/return until at least one successful toggle fetch has occurred. | ||||||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry if I'm missing something, but doesn't this contradict the "The method MUST reject, throw, or return an error."?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it the same behavior in the presence of a backup? |
||||||||||
|
|
||||||||||
| If the initial fetch fails: | ||||||||||
| - The method MUST reject, throw, or return an error. | ||||||||||
| - The SDK MUST NOT silently downgrade into degraded state. | ||||||||||
|
|
||||||||||
| Timeout behavior and retry semantics are governed by existing SDK implementation. | ||||||||||
|
|
||||||||||
| ## Debug / Development Mode Behavior | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Two modes (soft + hard):
Three modes (soft + hard + debug):
Basically: three modes adds a small (maybe big?) developer convenience at the cost of a meaningful increase in complexity and a less predictable soft init contract.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Opinion: I'm in explicit over implicit or configuration over convention camp. Two modes is configuration, three modes relies on environment convention. Works like magic until it doesn't match your actual intent There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Opinion: developer mode is great and adds a lot of developer convenience. But, I wonder if we can have that by just by changing our examples in README of the SDKs that support that. For example node sdk could have: import { initialize } from 'unleash-client';
const unleash = initialize({
url: 'https://YOUR-API-URL',
appName: 'my-node-name',
customHeaders: { Authorization: '<YOUR_API_TOKEN>' },
strictInitialization: process.env.NODE_ENV === 'development' # throws on errors
});It also has a clear advantage that each customer can tweak how NODE_ENV |
||||||||||
|
|
||||||||||
| SDK ecosystems that provide a clear debug or development mode (e.g., Rust debug_assertions, Node non-production environments) MAY enforce stricter validation semantics during soft initialization. | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does it mean we have 3 modes?
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I don't like "soft" as it seems like it isn't proper initialization for some reason. But, if we use
|
||||||||||
|
|
||||||||||
| When operating in debug/development mode: | ||||||||||
| - The SDK MAY treat missing required configuration (url, token) as fatal during soft initialization. | ||||||||||
| - This MAY result in throwing, panicking, or otherwise failing initialization. | ||||||||||
| - This behavior MUST NOT apply in production mode. | ||||||||||
|
|
||||||||||
| This provision exists to enable fail-fast behavior during local development while preserving resilience in production environments. | ||||||||||
|
|
||||||||||
| SDKs implementing this behavior MUST document: | ||||||||||
| - How debug/development mode is detected. | ||||||||||
| - That strict validation applies only in that mode. | ||||||||||
|
|
||||||||||
| ## Event and Logging Semantics | ||||||||||
|
|
||||||||||
| All validation and connectivity failures MUST be surfaced via existing SDK mechanisms: | ||||||||||
| - SDKs that emit error events MUST use those events. | ||||||||||
| - SDKs that rely on logging MUST log appropriately. | ||||||||||
| - SDKs MUST NOT introduce new event types unless separately specified. | ||||||||||
|
|
||||||||||
| At minimum: | ||||||||||
| - A clear warning identifying missing required configuration MUST be emitted during startup. | ||||||||||
| - Network failures resulting from invalid configuration MUST be surfaced normally. | ||||||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should it be called "required configuration parameters" and internally link to the definitions section?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure that seems like a good suggestion