[FFL-1945] Add flag evaluation metrics via OpenFeature hook#5599

Open
sameerank wants to merge 12 commits into master from
sameerank/FFL-1945/add-flag-eval-metrics

Conversation

@sameerank
Contributor

@sameerank sameerank commented Apr 16, 2026

What does this PR do?
Adds feature_flag.evaluations counter metric via OpenTelemetry to track flag evaluations.

Motivation:
Enable observability of feature flag usage through standardized OTel metrics.

Change log entry
Yes. Add flag evaluation metrics (feature_flag.evaluations) via OpenTelemetry for OpenFeature provider.

Additional Notes:

  • Uses OpenFeature hooks pattern (consistent with Python/Go SDKs)
  • Metric attributes: feature_flag.key, feature_flag.result.variant, feature_flag.result.reason
  • Conditional attributes: feature_flag.result.allocation_key, error.type
  • Gracefully degrades when OTel SDK is unavailable
  • Requires DD_METRICS_OTEL_ENABLED=true to enable metrics (consistent with the other dd-trace-* SDKs)
  • Requires openfeature-sdk >= 0.5.1 for provider hooks support
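For readers unfamiliar with the hooks pattern, here is a minimal stdlib-only Ruby sketch of the shape described above: a `finally` hook that records one count per attribute set. Class and attribute-handling details are illustrative, not the PR's actual code.

```ruby
# Illustrative sketch only: a hook whose finally method counts evaluations
# keyed by a (flag key, variant, reason) attribute hash.
class FlagEvalHookSketch
  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
  end

  # Mirrors the OpenFeature finally hook shape: extra keyword arguments
  # are accepted and ignored for interface compatibility.
  def finally(hook_context:, evaluation_details:, **_opts)
    attrs = {
      'feature_flag.key' => hook_context.fetch(:flag_key),
      'feature_flag.result.variant' => evaluation_details.fetch(:variant).to_s,
      'feature_flag.result.reason' => evaluation_details.fetch(:reason, 'unknown'),
    }
    @counts[attrs] += 1
  end
end

hook = FlagEvalHookSketch.new
hook.finally(
  hook_context: { flag_key: 'new-checkout' },
  evaluation_details: { variant: 'on', reason: 'targeting_match' },
  hints: {}
)
```

In the real implementation the counter is an OTel instrument rather than a Hash, but the flow is the same: the provider attaches the hook, and every evaluation ends with one `add(1, attributes: ...)` call.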

How to test the change?

% EXTRA_DOCKER_ARGS="--no-cache" ./build.sh ruby --weblog-variant rails72
# Debug logs match commit hash: DEBUG -- datadog: [datadog] (/usr/local/bundle/bundler/gems/dd-trace-rb-60aa06ee27f8/

% TEST_LIBRARY=ruby WEBLOG_VARIANT=rails72 ./run.sh FEATURE_FLAGGING_AND_EXPERIMENTATION
========================================================== test context ===========================================================
Scenario: FEATURE_FLAGGING_AND_EXPERIMENTATION
Logs folder: ./logs_feature_flagging_and_experimentation
Starting containers...
Agent: 7.77.2
Backend: datad0g.com
Library: ruby@2.32.0-dev
Weblog variant: rails72
Weblog system: Linux weblog 6.12.76-linuxkit #1 SMP Fri Mar  6 10:10:19 UTC 2026 aarch64 GNU/Linux

======================================================= test session starts =======================================================
collected 2454 items / 2422 deselected / 32 selected                                                                              
----------------------------------------------------------- tests setup -----------------------------------------------------------

tests/ffe/test_dynamic_evaluation.py ..
tests/ffe/test_exposures.py ...........
tests/ffe/test_flag_eval_metrics.py .................

------------------------------------------------- Wait for library interface (0s) -------------------------------------------------
------------------------------------------------- Wait for agent interface (30s) --------------------------------------------------
------------------------------------------------- Wait for backend interface (0s) -------------------------------------------------

tests/ffe/test_dynamic_evaluation.py ..                                                                                     [  6%]
tests/ffe/test_exposures.py ...........                                                                                     [ 40%]
tests/ffe/test_flag_eval_metrics.py .................                                                                       [ 93%]
tests/schemas/test_schemas.py ..                                                                                            [100%]

========================================= 32 passed, 2422 deselected in 221.51s (0:03:41) =========================================

Environment variables:

DD_EXPERIMENTAL_FLAGGING_PROVIDER_ENABLED=true  # Enable FFE
DD_METRICS_OTEL_ENABLED=true                     # Enable OTel metrics (required for flag eval metrics)

@sameerank sameerank added the AI Generated label (Largely based on code generated by an AI or LLM; this label is the same across all dd-trace-* repos) Apr 16, 2026
@github-actions

github-actions Bot commented Apr 16, 2026

Typing analysis

Note: Ignored files are excluded from the next sections.

Untyped methods

This PR introduces 2 partially typed methods, and clears 1 partially typed method. It increases the percentage of typed methods from 61.33% to 61.6% (+0.27%).

Partially typed methods (+2 / -1)

Introduced:
sig/datadog/open_feature/hooks/flag_eval_hook.rbs:11
└── def finally: (
          hook_context: ::OpenFeature::SDK::Hooks::HookContext,
          evaluation_details: ::OpenFeature::SDK::EvaluationDetails,
          **untyped _opts
        ) -> void
sig/datadog/open_feature/provider.rbs:46
└── def fetch_object_value: (
        flag_key: ::String,
        default_value: ::Array[untyped] | ::Hash[untyped, untyped],
        ?evaluation_context: ::OpenFeature::SDK::EvaluationContext?
      ) -> ::OpenFeature::SDK::Provider::ResolutionDetails
Cleared:
sig/datadog/open_feature/provider.rbs:44
└── def fetch_object_value: (
        flag_key: ::String,
        default_value: ::Array[untyped] | ::Hash[untyped, untyped],
        ?evaluation_context: ::OpenFeature::SDK::EvaluationContext?
      ) -> ::OpenFeature::SDK::Provider::ResolutionDetails

If you believe a method or an attribute is rightfully untyped or partially typed, you can add # untyped:accept on the line before the definition to remove it from the stats.

@datadog-prod-us1-6

datadog-prod-us1-6 Bot commented Apr 16, 2026

Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage (details)
Patch Coverage: 94.59%
Overall Coverage: 97.20% (-0.02%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: caca4d7 | Docs | Datadog PR Page | Give us feedback!

@sameerank sameerank force-pushed the sameerank/FFL-1945/add-flag-eval-metrics branch 2 times, most recently from 8a1f290 to 46c5c56 Compare April 16, 2026 17:47
@pr-commenter

pr-commenter Bot commented Apr 16, 2026

Benchmarks

Benchmark execution time: 2026-04-25 00:46:12

Comparing candidate commit caca4d7 in PR branch sameerank/FFL-1945/add-flag-eval-metrics with baseline commit 9a64076 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is unchanged for 45 metrics; 1 metric was unstable.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

  • 🟩 = significantly better candidate vs. baseline
  • 🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because often changes are not that big:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'

@sameerank sameerank force-pushed the sameerank/FFL-1945/add-flag-eval-metrics branch 9 times, most recently from a0d96ae to dca0c3d Compare April 21, 2026 07:17
@github-actions github-actions Bot added the core label (Involves Datadog core libraries) Apr 21, 2026
@sameerank sameerank force-pushed the sameerank/FFL-1945/add-flag-eval-metrics branch from dca0c3d to 576c7e4 Compare April 21, 2026 08:35
@sameerank sameerank force-pushed the sameerank/FFL-1945/add-flag-eval-metrics branch 8 times, most recently from 58fbef9 to ae9c63f Compare April 22, 2026 17:22
metadata = evaluation_details.flag_metadata
return nil unless metadata.is_a?(Hash)

metadata['allocation_key']
Contributor Author

In dd-trace-py this is a constant (METADATA_ALLOCATION_KEY) that is imported and used in the provider. Importing into Provider in this case isn't as simple, so I opted to use a string literal

@sameerank sameerank force-pushed the sameerank/FFL-1945/add-flag-eval-metrics branch from ae9c63f to 60aa06e Compare April 22, 2026 17:31
@sameerank sameerank added the otel label (OpenTelemetry-related changes) Apr 22, 2026
@sameerank sameerank added the openfeature label (A new component that provides the ability to configure feature flags) Apr 22, 2026
@sameerank sameerank marked this pull request as ready for review April 22, 2026 19:35
@sameerank sameerank requested review from a team as code owners April 22, 2026 19:35
Comment on lines +8 to +41
METER_NAME = 'ddtrace.openfeature'
METRIC_NAME = 'feature_flag.evaluations'
METRIC_UNIT = '{evaluation}'
METRIC_DESCRIPTION = 'Number of feature flag evaluations'

ATTR_FLAG_KEY = 'feature_flag.key'
ATTR_VARIANT = 'feature_flag.result.variant'
ATTR_REASON = 'feature_flag.result.reason'
ATTR_ALLOCATION_KEY = 'feature_flag.result.allocation_key'
ATTR_ERROR_TYPE = 'error.type'

ERROR_TYPE_MAP = {
  'FLAG_NOT_FOUND' => 'flag_not_found',
  'TYPE_MISMATCH' => 'type_mismatch',
  'PARSE_ERROR' => 'parse_error',
  'PROVIDER_NOT_READY' => 'provider_not_ready',
  'TARGETING_KEY_MISSING' => 'targeting_key_missing',
  'INVALID_CONTEXT' => 'invalid_context',
  'GENERAL' => 'general',
  'PROVIDER_FATAL' => 'general',
  'UNKNOWN_TYPE' => 'general',
}.freeze

REASON_MAP = {
  'TARGETING_MATCH' => 'targeting_match',
  'ERROR' => 'error',
  'DEFAULT' => 'default',
  'DISABLED' => 'disabled',
  'SPLIT' => 'split',
  'STATIC' => 'static',
  'UNKNOWN' => 'unknown',
}.freeze

EXCLUDE_ALLOCATION_KEY_REASONS = %w[ERROR DEFAULT DISABLED].freeze
Contributor

I think these could go in an ext.rb file

Contributor Author

I considered this but decided against because

  1. The existing ext.rb only contains provider state constants (ERROR, GENERAL, PROVIDER_FATAL, etc.), and these are a separate concern.
  2. These constants are internal to FlagEvalMetrics and not shared elsewhere, so keeping them colocated with their usage seemed reasonable.

I am reading this feedback as a suggestion, but I am amenable to moving them if you feel strongly about it.
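To make the colocation argument concrete, here is a hypothetical sketch (not the PR's code; the method name is invented) of how such constants are typically consumed when building the metric attribute hash, including the conditional allocation-key attribute:

```ruby
ATTR_FLAG_KEY = 'feature_flag.key'
ATTR_REASON = 'feature_flag.result.reason'
ATTR_ALLOCATION_KEY = 'feature_flag.result.allocation_key'
EXCLUDE_ALLOCATION_KEY_REASONS = %w[ERROR DEFAULT DISABLED].freeze

def build_attrs(flag_key:, reason:, allocation_key: nil)
  attrs = { ATTR_FLAG_KEY => flag_key, ATTR_REASON => reason.downcase }
  # allocation_key is only meaningful when an allocation was actually chosen
  unless allocation_key.nil? || EXCLUDE_ALLOCATION_KEY_REASONS.include?(reason)
    attrs[ATTR_ALLOCATION_KEY] = allocation_key
  end
  attrs
end

build_attrs(flag_key: 'checkout', reason: 'TARGETING_MATCH', allocation_key: 'alloc-1')
```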

@metrics = metrics
end

def finally(hook_context:, evaluation_details:, hints:)
Contributor

hints seems to be unused here. I suppose that the goal is to make it compatible with ::OpenFeature::SDK::Hooks::Hook, in that case I'd suggest to replace it with:

Suggested change
def finally(hook_context:, evaluation_details:, hints:)
def finally(hook_context:, evaluation_details:, **opts)

or something similarly named (rest, kwargs...) and also apply these changes in the .rbs and _spec.rb files.

Contributor Author

Yes, I was aiming for compatibility but your suggestion makes sense! Updated to use **_opts 35403da


@telemetry: Core::Telemetry::Component
@logger: Core::Logger
@counter: untyped
Contributor

Please type everything if possible. If it accepts any type, we have the type any defined in datadog.rbs. In that case, it seems to accept ::OpenTelemetry::Metrics::Instrument::Counter. You can define it in vendor/rbs/opentelemetry. We want to avoid untyped as much as possible

Contributor Author

Added vendor RBS types for OpenTelemetry metrics here: b4091cf. @counter is now typed as ::OpenTelemetry::Metrics::Instrument::Counter?


def ensure_meter_provider_initialized!: () -> void

def meter_provider_available?: (untyped meter_provider) -> bool
Contributor

Same here, it should probably be ::OpenTelemetry::Metrics::MeterProvider instead of untyped

Contributor Author

Also addressed in b4091cf

Comment thread sig/datadog/open_feature/provider.rbs Outdated

def build_flag_metadata: (
untyped result
) -> ::Hash[::String, untyped]
Contributor

Same here, result should be (ResolutionDetails | Core::FeatureFlags::ResolutionDetails) and the return type ::Hash[::String, ::String] as it is result.flag_metadata type

Contributor Author

Updated the type signature. Used Core::FeatureFlags::ResolutionDetails since that's what EvaluationEngine#fetch_value returns internally 8a8a378

Comment on lines +5 to +13
# Stub the OpenFeature SDK Hook module before loading the hook
module OpenFeature
module SDK
module Hooks
module Hook
end
end
end
end
Contributor

I believe it would be better to test it against the real OpenFeature::SDK::Hooks::Hook module. There is an openfeature appraisal available

Contributor Author

Moved these specs to run under the openfeature appraisal with the real SDK. Removed the stub module: bf7acd8

Comment on lines +12 to +17
# Create a fake OpenTelemetry module for stubbing
let(:otel_module) do
mod = Module.new
mod.define_singleton_method(:meter_provider) { nil }
mod
end
Contributor

Again, I think it would be better to test it against the real OpenTelemetry module. If you need both opentelemetry and openfeature, you should be able to add opentelemetry to the openfeature appraisal.

Contributor Author

@sameerank sameerank Apr 24, 2026

Removed the fake module for stubbing, and added opentelemetry-metrics-sdk to the openfeature appraisal bf7acd8

The `hints` parameter in the `finally` hook method is part of the
OpenFeature SDK Hook interface but is not used by our implementation.
Replace with `**_opts` to indicate it's intentionally unused while
maintaining interface compatibility.

Add vendor RBS types for OpenTelemetry metrics SDK:
- MeterProvider#meter method
- Meter class with create_counter
- Instrument::Counter with add method

Update flag_eval_metrics.rbs to use proper types instead of untyped:
- @counter: ::OpenTelemetry::SDK::Metrics::Instrument::Counter?
- get_or_create_counter return type
- meter_provider_available? parameter type

Type the result parameter as the union type returned by
EvaluationEngine#fetch_value:
  (ResolutionDetails | Core::FeatureFlags::ResolutionDetails)

Update return type to ::Hash[::String, ::String] to match
the flag_metadata type from the result.
@dd-octo-sts dd-octo-sts Bot requested a review from a team as a code owner April 24, 2026 06:04
- Add opentelemetry-sdk and opentelemetry-metrics-sdk to openfeature
  appraisal for all Ruby versions
- Update hook specs to require real OpenFeature SDK
- Update metrics specs to require real OTel SDK
- Make Hook module inclusion conditional to support SDK versions
  before 0.5.0 when Hooks::Hook was introduced
- Fix type checker issues with local variable assignments
@sameerank sameerank force-pushed the sameerank/FFL-1945/add-flag-eval-metrics branch from ca7429d to bf7acd8 Compare April 24, 2026 06:13
Comment on lines +21 to +22
metrics = @metrics
return unless metrics
Member

I think this local metrics variable is not needed here?

return false if meter_provider.nil?
return false unless defined?(::OpenTelemetry::SDK::Metrics::MeterProvider)

meter_provider.is_a?(::OpenTelemetry::SDK::Metrics::MeterProvider)
Member

I don't understand this - we are getting the meter provider from OTel configuration, but for some reason checking that it is an instance of some particular class? Why do we do this?

require 'opentelemetry-metrics-sdk'
require 'datadog/opentelemetry/metrics'
Datadog::OpenTelemetry::Metrics.initialize!(Datadog.send(:components))
rescue LoadError, NameError => e
Member

what could raise a NameError here?


attrs = {
ATTR_FLAG_KEY => flag_key,
ATTR_VARIANT => variant || '',
Member

why not variant.to_s?
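A quick check of the reviewer's point (illustrative, not from the PR): for a nil variant both spellings produce an empty string, but #to_s additionally normalizes non-string variants.

```ruby
# For nil the two forms agree:
(nil || '') == nil.to_s   # both yield ""
# But #to_s also covers non-string variants such as symbols:
:control.to_s             # "control" (a String)
(:control || '')          # :control (still a Symbol)
```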

attrs
end

def normalize_reason_with_upcase(reason)
Member

this method name is confusing - from the name it is not clear that it returns a tuple

Comment on lines +31 to +39
REASON_MAP = {
  'TARGETING_MATCH' => 'targeting_match',
  'ERROR' => 'error',
  'DEFAULT' => 'default',
  'DISABLED' => 'disabled',
  'SPLIT' => 'split',
  'STATIC' => 'static',
  'UNKNOWN' => 'unknown',
}.freeze
Member

this looks like a map of upcase strings to their downcased equivalents. Why do we even need that, and why can't we just downcase?
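The reviewer's observation can be verified directly: every value in the map is exactly its key downcased, so a plain String#downcase would produce the same result for all mapped reasons.

```ruby
REASON_MAP = {
  'TARGETING_MATCH' => 'targeting_match',
  'ERROR' => 'error',
  'DEFAULT' => 'default',
  'DISABLED' => 'disabled',
  'SPLIT' => 'split',
  'STATIC' => 'static',
  'UNKNOWN' => 'unknown',
}.freeze

# Every value is exactly its key downcased, so the table adds nothing
# over reason.to_s.downcase for these inputs:
REASON_MAP.all? { |upcased, downcased| upcased.downcase == downcased }
```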

Comment on lines +141 to +150
def normalize_reason_with_upcase(reason)
  return ['unknown', nil] if reason.nil?

  reason_str = reason.to_s
  return ['unknown', nil] if reason_str.empty?

  reason_upcase = reason_str.upcase
  normalized = REASON_MAP[reason_upcase] || reason_str.downcase
  [normalized, reason_upcase]
end
Member

I think this whole method could be removed if we try to understand what we really want to get from it. Do we need a tuple of strings, one downcased and one upcased, with fallback to 'unknown' or 'UNKNOWN'?
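Under that reading, a hypothetical simplification (sketch only, and note it differs from the original in one corner case: for nil input it returns 'UNKNOWN' as the second element rather than nil) could derive both cases from a single normalized string:

```ruby
# Sketch of the reviewer's suggested simplification: one upcase, one
# fallback, both tuple elements derived from the same string.
def normalize_reason(reason)
  upcased = reason.to_s.upcase
  upcased = 'UNKNOWN' if upcased.empty?
  [upcased.downcase, upcased]
end

normalize_reason(nil)                # ["unknown", "UNKNOWN"]
normalize_reason('Targeting_Match')  # ["targeting_match", "TARGETING_MATCH"]
```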

Comment on lines +153 to +158
return 'general' if error_code.nil?

error_str = error_code.to_s
return 'general' if error_str.empty?

ERROR_TYPE_MAP[error_str] || 'general'
Member

this whole method could be replaced with one #fetch:

ERROR_TYPE_MAP.fetch(error_code, 'general')
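To illustrate why the single #fetch covers the existing branches (sketch with a truncated map, not the PR's code): a default value handles both nil and unmapped codes, though symbol inputs would need a #to_s first since the map keys are strings.

```ruby
ERROR_TYPE_MAP = {
  'FLAG_NOT_FOUND' => 'flag_not_found',
  'TYPE_MISMATCH' => 'type_mismatch',
}.freeze

ERROR_TYPE_MAP.fetch(nil, 'general')               # "general" (nil is not a key)
ERROR_TYPE_MAP.fetch('SOMETHING_ELSE', 'general')  # "general"
ERROR_TYPE_MAP.fetch('FLAG_NOT_FOUND', 'general')  # "flag_not_found"
# If callers may pass symbols, normalize first:
ERROR_TYPE_MAP.fetch(:FLAG_NOT_FOUND.to_s, 'general')  # "flag_not_found"
```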

Comment thread lib/datadog/open_feature/component.rb Outdated
require_relative 'hooks/flag_eval_hook'
metrics = Metrics::FlagEvalMetrics.new(telemetry: @telemetry, logger: @logger)
Hooks::FlagEvalHook.new(metrics)
rescue LoadError, NameError
Member

why is there a NameError here?

Comment thread lib/datadog/open_feature/provider.rb Outdated

def hooks
  hook = Datadog.send(:components).open_feature&.flag_eval_hook
  hook ? [hook] : []
Member

[hook].compact

def get_or_create_counter
  @mutex.synchronize do
    return @counter if @counter
    return nil unless otel_metrics_enabled?
Member

Does it make sense to move that condition out of the synchronize? When could it be that we have a counter, but OTel metrics are not enabled? What will happen if someone disables OTel metrics, but the counter is already there?
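One hedged sketch of the shape the reviewer seems to suggest (hypothetical class and method names, not the PR's code): check the enablement flag before taking the lock, and keep only the memoization re-check inside it, so a disabled configuration always short-circuits even when a counter was previously built.

```ruby
# Illustrative memoization-under-mutex sketch.
class CounterCache
  attr_reader :builds

  def initialize(enabled:)
    @enabled = enabled
    @mutex = Mutex.new
    @counter = nil
    @builds = 0
  end

  def get_or_create_counter
    return nil unless @enabled      # cheap flag check outside the lock
    @mutex.synchronize do
      @counter ||= build_counter    # memo re-checked inside the lock
    end
  end

  private

  def build_counter
    @builds += 1
    Object.new                      # stands in for an OTel counter instrument
  end
end

cache = CounterCache.new(enabled: true)
cache.get_or_create_counter
cache.get_or_create_counter
cache.builds  # the counter is built exactly once
```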


ensure_meter_provider_initialized!
meter_provider = ::OpenTelemetry.meter_provider
return nil unless meter_provider && meter_provider_available?(meter_provider)
Member

Similar condition is checked in ensure call above, what is the case here? Do we expect it as a possibility that it gets unavailable between calls?

private

def create_flag_eval_hook
return nil unless defined?(::OpenFeature::SDK::Hooks::Hook)
Member

This method leaks knowledge about hooks, which is also duplicated in the hooks/flag_eval_hook file. This is a wrong split of responsibilities.


Labels

  • AI Generated: Largely based on code generated by an AI or LLM. This label is the same across all dd-trace-* repos
  • core: Involves Datadog core libraries
  • openfeature: A new component that provides the ability to configure feature flags
  • otel: OpenTelemetry-related changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants