[FFL-1945] Add flag evaluation metrics via OpenFeature hook#5599

Open
sameerank wants to merge 12 commits into master from
sameerank/FFL-1945/add-flag-eval-metrics

Conversation

@sameerank
Contributor

@sameerank sameerank commented Apr 16, 2026

What does this PR do?
Adds feature_flag.evaluations counter metric via OpenTelemetry to track flag evaluations.

Motivation:
Enable observability of feature flag usage through standardized OTel metrics.

Change log entry
Yes. Add flag evaluation metrics (feature_flag.evaluations) via OpenTelemetry for OpenFeature provider.

Additional Notes:

  • Uses OpenFeature hooks pattern (consistent with Python/Go SDKs)
  • Metric attributes: feature_flag.key, feature_flag.result.variant, feature_flag.result.reason
  • Conditional attributes: feature_flag.result.allocation_key, error.type
  • Gracefully degrades when OTel SDK is unavailable
  • Requires DD_METRICS_OTEL_ENABLED=true to enable metrics (consistent with the other dd-trace-* SDKs)
  • Requires openfeature-sdk >= 0.5.1 for provider hooks support
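For readers unfamiliar with the hooks pattern, here is a minimal stdlib-only Ruby sketch of the shape described above: a `finally` hook that records one count per attribute set. Class and attribute-handling details are illustrative, not the PR's actual code.

```ruby
# Illustrative sketch only: a hook whose finally method counts evaluations
# keyed by a (flag key, variant, reason) attribute hash.
class FlagEvalHookSketch
  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
  end

  # Mirrors the OpenFeature finally hook shape: extra keyword arguments
  # are accepted and ignored for interface compatibility.
  def finally(hook_context:, evaluation_details:, **_opts)
    attrs = {
      'feature_flag.key' => hook_context.fetch(:flag_key),
      'feature_flag.result.variant' => evaluation_details.fetch(:variant).to_s,
      'feature_flag.result.reason' => evaluation_details.fetch(:reason, 'unknown'),
    }
    @counts[attrs] += 1
  end
end

hook = FlagEvalHookSketch.new
hook.finally(
  hook_context: { flag_key: 'new-checkout' },
  evaluation_details: { variant: 'on', reason: 'targeting_match' },
  hints: {}
)
```

In the real implementation the counter is an OTel instrument rather than a Hash, but the flow is the same: the provider attaches the hook, and every evaluation ends with one `add(1, attributes: ...)` call.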

How to test the change?

% EXTRA_DOCKER_ARGS="--no-cache" ./build.sh ruby --weblog-variant rails72
# Debug logs match commit hash: DEBUG -- datadog: [datadog] (/usr/local/bundle/bundler/gems/dd-trace-rb-60aa06ee27f8/

% TEST_LIBRARY=ruby WEBLOG_VARIANT=rails72 ./run.sh FEATURE_FLAGGING_AND_EXPERIMENTATION
========================================================== test context ===========================================================
Scenario: FEATURE_FLAGGING_AND_EXPERIMENTATION
Logs folder: ./logs_feature_flagging_and_experimentation
Starting containers...
Agent: 7.77.2
Backend: datad0g.com
Library: ruby@2.32.0-dev
Weblog variant: rails72
Weblog system: Linux weblog 6.12.76-linuxkit #1 SMP Fri Mar  6 10:10:19 UTC 2026 aarch64 GNU/Linux

======================================================= test session starts =======================================================
collected 2454 items / 2422 deselected / 32 selected                                                                              
----------------------------------------------------------- tests setup -----------------------------------------------------------

tests/ffe/test_dynamic_evaluation.py ..
tests/ffe/test_exposures.py ...........
tests/ffe/test_flag_eval_metrics.py .................

------------------------------------------------- Wait for library interface (0s) -------------------------------------------------
------------------------------------------------- Wait for agent interface (30s) --------------------------------------------------
------------------------------------------------- Wait for backend interface (0s) -------------------------------------------------

tests/ffe/test_dynamic_evaluation.py ..                                                                                     [  6%]
tests/ffe/test_exposures.py ...........                                                                                     [ 40%]
tests/ffe/test_flag_eval_metrics.py .................                                                                       [ 93%]
tests/schemas/test_schemas.py ..                                                                                            [100%]

========================================= 32 passed, 2422 deselected in 221.51s (0:03:41) =========================================

Environment variables:

DD_EXPERIMENTAL_FLAGGING_PROVIDER_ENABLED=true  # Enable FFE
DD_METRICS_OTEL_ENABLED=true                     # Enable OTel metrics (required for flag eval metrics)

@sameerank sameerank added the AI Generated label (Largely based on code generated by an AI or LLM; this label is the same across all dd-trace-* repos) Apr 16, 2026
@github-actions

github-actions Bot commented Apr 16, 2026

Typing analysis

Note: Ignored files are excluded from the next sections.

Untyped methods

This PR introduces 2 partially typed methods, and clears 1 partially typed method. It increases the percentage of typed methods from 61.33% to 61.6% (+0.27%).

Partially typed methods (+2 / -1)

Introduced:
sig/datadog/open_feature/hooks/flag_eval_hook.rbs:11
└── def finally: (
          hook_context: ::OpenFeature::SDK::Hooks::HookContext,
          evaluation_details: ::OpenFeature::SDK::EvaluationDetails,
          **untyped _opts
        ) -> void
sig/datadog/open_feature/provider.rbs:46
└── def fetch_object_value: (
        flag_key: ::String,
        default_value: ::Array[untyped] | ::Hash[untyped, untyped],
        ?evaluation_context: ::OpenFeature::SDK::EvaluationContext?
      ) -> ::OpenFeature::SDK::Provider::ResolutionDetails
Cleared:
sig/datadog/open_feature/provider.rbs:44
└── def fetch_object_value: (
        flag_key: ::String,
        default_value: ::Array[untyped] | ::Hash[untyped, untyped],
        ?evaluation_context: ::OpenFeature::SDK::EvaluationContext?
      ) -> ::OpenFeature::SDK::Provider::ResolutionDetails

If you believe a method or an attribute is rightfully untyped or partially typed, you can add # untyped:accept on the line before the definition to remove it from the stats.

@datadog-prod-us1-6

datadog-prod-us1-6 Bot commented Apr 16, 2026

Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage (details)
Patch Coverage: 94.59%
Overall Coverage: 97.20% (-0.02%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: caca4d7 | Docs | Datadog PR Page | Give us feedback!

@sameerank sameerank force-pushed the sameerank/FFL-1945/add-flag-eval-metrics branch 2 times, most recently from 8a1f290 to 46c5c56 Compare April 16, 2026 17:47
@pr-commenter

pr-commenter Bot commented Apr 16, 2026

Benchmarks

Benchmark execution time: 2026-04-25 00:46:12

Comparing candidate commit caca4d7 in PR branch sameerank/FFL-1945/add-flag-eval-metrics with baseline commit 9a64076 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is unchanged for 45 metrics; 1 metric was unstable.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

  • 🟩 = significantly better candidate vs. baseline
  • 🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because often changes are not that big:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'

@sameerank sameerank force-pushed the sameerank/FFL-1945/add-flag-eval-metrics branch 9 times, most recently from a0d96ae to dca0c3d Compare April 21, 2026 07:17
@github-actions github-actions Bot added the core label (Involves Datadog core libraries) Apr 21, 2026
@sameerank sameerank force-pushed the sameerank/FFL-1945/add-flag-eval-metrics branch from dca0c3d to 576c7e4 Compare April 21, 2026 08:35
@sameerank sameerank force-pushed the sameerank/FFL-1945/add-flag-eval-metrics branch 8 times, most recently from 58fbef9 to ae9c63f Compare April 22, 2026 17:22
metadata = evaluation_details.flag_metadata
return nil unless metadata.is_a?(Hash)

metadata['allocation_key']
Contributor Author

In dd-trace-py this is a constant (METADATA_ALLOCATION_KEY) that is imported and used in the provider. Importing into Provider in this case isn't as simple, so I opted to use a string literal

@sameerank sameerank force-pushed the sameerank/FFL-1945/add-flag-eval-metrics branch from ae9c63f to 60aa06e Compare April 22, 2026 17:31
@sameerank sameerank added the otel label (OpenTelemetry-related changes) Apr 22, 2026
@sameerank sameerank added the openfeature label (A new component that provides the ability to configure feature flags) Apr 22, 2026
@sameerank sameerank marked this pull request as ready for review April 22, 2026 19:35
@sameerank sameerank requested review from a team as code owners April 22, 2026 19:35
Comment on lines +8 to +41
METER_NAME = 'ddtrace.openfeature'
METRIC_NAME = 'feature_flag.evaluations'
METRIC_UNIT = '{evaluation}'
METRIC_DESCRIPTION = 'Number of feature flag evaluations'

ATTR_FLAG_KEY = 'feature_flag.key'
ATTR_VARIANT = 'feature_flag.result.variant'
ATTR_REASON = 'feature_flag.result.reason'
ATTR_ALLOCATION_KEY = 'feature_flag.result.allocation_key'
ATTR_ERROR_TYPE = 'error.type'

ERROR_TYPE_MAP = {
  'FLAG_NOT_FOUND' => 'flag_not_found',
  'TYPE_MISMATCH' => 'type_mismatch',
  'PARSE_ERROR' => 'parse_error',
  'PROVIDER_NOT_READY' => 'provider_not_ready',
  'TARGETING_KEY_MISSING' => 'targeting_key_missing',
  'INVALID_CONTEXT' => 'invalid_context',
  'GENERAL' => 'general',
  'PROVIDER_FATAL' => 'general',
  'UNKNOWN_TYPE' => 'general',
}.freeze

REASON_MAP = {
  'TARGETING_MATCH' => 'targeting_match',
  'ERROR' => 'error',
  'DEFAULT' => 'default',
  'DISABLED' => 'disabled',
  'SPLIT' => 'split',
  'STATIC' => 'static',
  'UNKNOWN' => 'unknown',
}.freeze

EXCLUDE_ALLOCATION_KEY_REASONS = %w[ERROR DEFAULT DISABLED].freeze
Contributor

I think these could go in an ext.rb file

Contributor Author

I considered this but decided against because

  1. The existing ext.rb only contains provider state constants (ERROR, GENERAL, PROVIDER_FATAL, etc.), and these are a separate concern.
  2. These constants are internal to FlagEvalMetrics and not shared elsewhere, so keeping them colocated with their usage seemed reasonable.

I am reading this feedback as a suggestion, but I am amenable to moving them if you feel strongly about it.
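To make the colocation argument concrete, here is a hypothetical sketch (not the PR's code; the method name is invented) of how such constants are typically consumed when building the metric attribute hash, including the conditional allocation-key attribute:

```ruby
ATTR_FLAG_KEY = 'feature_flag.key'
ATTR_REASON = 'feature_flag.result.reason'
ATTR_ALLOCATION_KEY = 'feature_flag.result.allocation_key'
EXCLUDE_ALLOCATION_KEY_REASONS = %w[ERROR DEFAULT DISABLED].freeze

def build_attrs(flag_key:, reason:, allocation_key: nil)
  attrs = { ATTR_FLAG_KEY => flag_key, ATTR_REASON => reason.downcase }
  # allocation_key is only meaningful when an allocation was actually chosen
  unless allocation_key.nil? || EXCLUDE_ALLOCATION_KEY_REASONS.include?(reason)
    attrs[ATTR_ALLOCATION_KEY] = allocation_key
  end
  attrs
end

build_attrs(flag_key: 'checkout', reason: 'TARGETING_MATCH', allocation_key: 'alloc-1')
```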

@metrics = metrics
end

def finally(hook_context:, evaluation_details:, hints:)
Contributor

hints seems to be unused here. I suppose that the goal is to make it compatible with ::OpenFeature::SDK::Hooks::Hook, in that case I'd suggest to replace it with:

Suggested change
def finally(hook_context:, evaluation_details:, hints:)
def finally(hook_context:, evaluation_details:, **opts)

or something similarly named (rest, kwargs...) and also apply these changes in the .rbs and _spec.rb files.

Contributor Author

Yes, I was aiming for compatibility but your suggestion makes sense! Updated to use **_opts 35403da


@telemetry: Core::Telemetry::Component
@logger: Core::Logger
@counter: untyped
Contributor

Please type everything if possible. If it accepts any type, we have the type any defined in datadog.rbs. In that case, it seems to accept ::OpenTelemetry::Metrics::Instrument::Counter. You can define it in vendor/rbs/opentelemetry. We want to avoid untyped as much as possible

Contributor Author

Added vendor RBS types for OpenTelemetry metrics here: b4091cf. @counter is now typed as ::OpenTelemetry::Metrics::Instrument::Counter?


def ensure_meter_provider_initialized!: () -> void

def meter_provider_available?: (untyped meter_provider) -> bool
Contributor

Same here, it should probably be ::OpenTelemetry::Metrics::MeterProvider instead of untyped

Contributor Author

Also addressed in b4091cf

Comment thread sig/datadog/open_feature/provider.rbs Outdated

def build_flag_metadata: (
untyped result
) -> ::Hash[::String, untyped]
Contributor

Same here, result should be (ResolutionDetails | Core::FeatureFlags::ResolutionDetails) and the return type ::Hash[::String, ::String] as it is result.flag_metadata type

Contributor Author

Updated the type signature. Used Core::FeatureFlags::ResolutionDetails since that's what EvaluationEngine#fetch_value returns internally 8a8a378

Comment on lines +5 to +13
# Stub the OpenFeature SDK Hook module before loading the hook
module OpenFeature
module SDK
module Hooks
module Hook
end
end
end
end
Contributor

I believe it would be better to test it against the real OpenFeature::SDK::Hooks::Hook module. There is an openfeature appraisal available

Contributor Author

Moved these specs to run under the openfeature appraisal with the real SDK. Removed the stub module: bf7acd8

Comment on lines +12 to +17
# Create a fake OpenTelemetry module for stubbing
let(:otel_module) do
mod = Module.new
mod.define_singleton_method(:meter_provider) { nil }
mod
end
Contributor

Again, I think it would be better to test it against the real OpenTelemetry module. If you need both opentelemetry and openfeature, you should be able to add opentelemetry to the openfeature appraisal.

Contributor Author

@sameerank sameerank Apr 24, 2026

Removed the fake module for stubbing, and added opentelemetry-metrics-sdk to the openfeature appraisal bf7acd8

The `hints` parameter in the `finally` hook method is part of the
OpenFeature SDK Hook interface but is not used by our implementation.
Replace with `**_opts` to indicate it's intentionally unused while
maintaining interface compatibility.

Add vendor RBS types for OpenTelemetry metrics SDK:
- MeterProvider#meter method
- Meter class with create_counter
- Instrument::Counter with add method

Update flag_eval_metrics.rbs to use proper types instead of untyped:
- @counter: ::OpenTelemetry::SDK::Metrics::Instrument::Counter?
- get_or_create_counter return type
- meter_provider_available? parameter type

Type the result parameter as the union type returned by
EvaluationEngine#fetch_value:
  (ResolutionDetails | Core::FeatureFlags::ResolutionDetails)

Update return type to ::Hash[::String, ::String] to match
the flag_metadata type from the result.
@dd-octo-sts dd-octo-sts Bot requested a review from a team as a code owner April 24, 2026 06:04
- Add opentelemetry-sdk and opentelemetry-metrics-sdk to openfeature
  appraisal for all Ruby versions
- Update hook specs to require real OpenFeature SDK
- Update metrics specs to require real OTel SDK
- Make Hook module inclusion conditional to support SDK versions
  before 0.5.0 when Hooks::Hook was introduced
- Fix type checker issues with local variable assignments
@sameerank sameerank force-pushed the sameerank/FFL-1945/add-flag-eval-metrics branch from ca7429d to bf7acd8 Compare April 24, 2026 06:13
Comment on lines +21 to +22
metrics = @metrics
return unless metrics
Member

I think this local metrics variable is not needed here?

return false if meter_provider.nil?
return false unless defined?(::OpenTelemetry::SDK::Metrics::MeterProvider)

meter_provider.is_a?(::OpenTelemetry::SDK::Metrics::MeterProvider)
Member

I don't understand this - we are getting the meter provider from OTel configuration, but for some reason checking that it is an instance of some particular class? Why do we do this?

require 'opentelemetry-metrics-sdk'
require 'datadog/opentelemetry/metrics'
Datadog::OpenTelemetry::Metrics.initialize!(Datadog.send(:components))
rescue LoadError, NameError => e
Member

what could raise a NameError here?


attrs = {
ATTR_FLAG_KEY => flag_key,
ATTR_VARIANT => variant || '',
Member

why not variant.to_s?
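A quick check of the reviewer's point (illustrative, not from the PR): for a nil variant both spellings produce an empty string, but #to_s additionally normalizes non-string variants.

```ruby
# For nil the two forms agree:
(nil || '') == nil.to_s   # both yield ""
# But #to_s also covers non-string variants such as symbols:
:control.to_s             # "control" (a String)
(:control || '')          # :control (still a Symbol)
```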

attrs
end

def normalize_reason_with_upcase(reason)
Member

this method name is confusing - from the name it is not clear that it returns a tuple

Comment on lines +31 to +39
REASON_MAP = {
  'TARGETING_MATCH' => 'targeting_match',
  'ERROR' => 'error',
  'DEFAULT' => 'default',
  'DISABLED' => 'disabled',
  'SPLIT' => 'split',
  'STATIC' => 'static',
  'UNKNOWN' => 'unknown',
}.freeze
Member

this looks like a map of upcase strings to their downcased equivalents. Why do we even need that, and why can't we just downcase?
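The reviewer's observation can be verified directly: every value in the map is exactly its key downcased, so a plain String#downcase would produce the same result for all mapped reasons.

```ruby
REASON_MAP = {
  'TARGETING_MATCH' => 'targeting_match',
  'ERROR' => 'error',
  'DEFAULT' => 'default',
  'DISABLED' => 'disabled',
  'SPLIT' => 'split',
  'STATIC' => 'static',
  'UNKNOWN' => 'unknown',
}.freeze

# Every value is exactly its key downcased, so the table adds nothing
# over reason.to_s.downcase for these inputs:
REASON_MAP.all? { |upcased, downcased| upcased.downcase == downcased }
```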

Comment on lines +141 to +150
def normalize_reason_with_upcase(reason)
  return ['unknown', nil] if reason.nil?

  reason_str = reason.to_s
  return ['unknown', nil] if reason_str.empty?

  reason_upcase = reason_str.upcase
  normalized = REASON_MAP[reason_upcase] || reason_str.downcase
  [normalized, reason_upcase]
end
Member

I think this whole method could be removed if we try to understand what we really want to get from it. Do we need a tuple of strings, one downcased and one upcased, with fallback to 'unknown' or 'UNKNOWN'?
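Under that reading, a hypothetical simplification (sketch only, and note it differs from the original in one corner case: for nil input it returns 'UNKNOWN' as the second element rather than nil) could derive both cases from a single normalized string:

```ruby
# Sketch of the reviewer's suggested simplification: one upcase, one
# fallback, both tuple elements derived from the same string.
def normalize_reason(reason)
  upcased = reason.to_s.upcase
  upcased = 'UNKNOWN' if upcased.empty?
  [upcased.downcase, upcased]
end

normalize_reason(nil)                # ["unknown", "UNKNOWN"]
normalize_reason('Targeting_Match')  # ["targeting_match", "TARGETING_MATCH"]
```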

Comment on lines +153 to +158
return 'general' if error_code.nil?

error_str = error_code.to_s
return 'general' if error_str.empty?

ERROR_TYPE_MAP[error_str] || 'general'
Member

this whole method could be replaced with one #fetch:

ERROR_TYPE_MAP.fetch(error_code, 'general')
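To illustrate why the single #fetch covers the existing branches (sketch with a truncated map, not the PR's code): a default value handles both nil and unmapped codes, though symbol inputs would need a #to_s first since the map keys are strings.

```ruby
ERROR_TYPE_MAP = {
  'FLAG_NOT_FOUND' => 'flag_not_found',
  'TYPE_MISMATCH' => 'type_mismatch',
}.freeze

ERROR_TYPE_MAP.fetch(nil, 'general')               # "general" (nil is not a key)
ERROR_TYPE_MAP.fetch('SOMETHING_ELSE', 'general')  # "general"
ERROR_TYPE_MAP.fetch('FLAG_NOT_FOUND', 'general')  # "flag_not_found"
# If callers may pass symbols, normalize first:
ERROR_TYPE_MAP.fetch(:FLAG_NOT_FOUND.to_s, 'general')  # "flag_not_found"
```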

Comment thread lib/datadog/open_feature/component.rb Outdated
require_relative 'hooks/flag_eval_hook'
metrics = Metrics::FlagEvalMetrics.new(telemetry: @telemetry, logger: @logger)
Hooks::FlagEvalHook.new(metrics)
rescue LoadError, NameError
Member

why is there a NameError here?

Comment thread lib/datadog/open_feature/provider.rb Outdated

def hooks
  hook = Datadog.send(:components).open_feature&.flag_eval_hook
  hook ? [hook] : []
Member

[hook].compact

def get_or_create_counter
  @mutex.synchronize do
    return @counter if @counter
    return nil unless otel_metrics_enabled?
Member

Does it make sense to move that condition out of the synchronize? When could it be that we have a counter, but OTel metrics are not enabled? What will happen if someone disables OTel metrics, but the counter is already there?
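One hedged sketch of the shape the reviewer seems to suggest (hypothetical class and method names, not the PR's code): check the enablement flag before taking the lock, and keep only the memoization re-check inside it, so a disabled configuration always short-circuits even when a counter was previously built.

```ruby
# Illustrative memoization-under-mutex sketch.
class CounterCache
  attr_reader :builds

  def initialize(enabled:)
    @enabled = enabled
    @mutex = Mutex.new
    @counter = nil
    @builds = 0
  end

  def get_or_create_counter
    return nil unless @enabled      # cheap flag check outside the lock
    @mutex.synchronize do
      @counter ||= build_counter    # memo re-checked inside the lock
    end
  end

  private

  def build_counter
    @builds += 1
    Object.new                      # stands in for an OTel counter instrument
  end
end

cache = CounterCache.new(enabled: true)
cache.get_or_create_counter
cache.get_or_create_counter
cache.builds  # the counter is built exactly once
```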


ensure_meter_provider_initialized!
meter_provider = ::OpenTelemetry.meter_provider
return nil unless meter_provider && meter_provider_available?(meter_provider)
Member

Similar condition is checked in ensure call above, what is the case here? Do we expect it as a possibility that it gets unavailable between calls?

private

def create_flag_eval_hook
return nil unless defined?(::OpenFeature::SDK::Hooks::Hook)
Member

This method leaks knowledge about hooks, which is also duplicated in the hooks/flag_eval_hook file. This is a wrong split of responsibilities.


Labels

  • AI Generated: Largely based on code generated by an AI or LLM. This label is the same across all dd-trace-* repos
  • core: Involves Datadog core libraries
  • openfeature: A new component that provides the ability to configure feature flags
  • otel: OpenTelemetry-related changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants