Add v24 + v25 VCR test coverage for Meta extractors (+ async/pagination resilience fixes)#57
Open
matyas-jirat-keboola wants to merge 9 commits into
Open
Add v24 + v25 VCR test coverage for Meta extractors (+ async/pagination resilience fixes)#57matyas-jirat-keboola wants to merge 9 commits into
matyas-jirat-keboola wants to merge 9 commits into
Conversation
Full [Dont touch] config recorded via debug job on cf-dev, replayed locally. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…cassettes Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
26 queries using filtering()/time_range()/time_ranges() DSL (a '{' inside a
modifier) or .fields() removed: the deployed parser splits the field DSL on
commas and leaks the filter/date JSON into the 'fields' param (FB code 2500).
196 of 222 queries retained. Version-independent parser limitation; not fixed
here per decision to avoid the reverted CFTL-630 DSL change.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
poll_async_job raised UserException the instant Facebook returned async_status 'Job Failed'/'Job Skipped' (or on poll timeout), failing the whole extraction. Facebook returns these transiently under load. Treat them as retryable: raise a dedicated AsyncInsightsJobTransientError and re-submit the report up to 3 times with exponential backoff (mirrors _get_with_transient_retry for synchronous GETs). Surfaces as UserException only after retries are exhausted. Adds unit tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The full 196-query ads run dies non-deterministically with a bare container 'Internal Server Error' (no Python traceback) — a cumulative resource limit, not a v25 API issue (every query records fine individually; v25 only changed async error_* fields). Split into two halves so each run stays under the limit; this is queries 1-100 (98 after dropping the DSL-breaking ones). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Nested-insights pagination (load_page_from_url) was consumed via 'yield from output_parser.iter_parsed_data(...)' OUTSIDE the per-object try/except, so a Facebook code=2 on a deep insights paging.next that outlived the retry budget escaped as an opaque container 'Internal Server Error' and killed the whole run with no context. Wrap it: log the failing query + object id + exception, then continue to the next object. Also widen the transient-retry budget 3->5 (backoff 5/10/20/40/80s) to ride out v25's longer insights throttle windows. Surfaces real errors instead of a bare Internal Server Error, and makes large ads extractions resilient to single-object throttling. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… config) Supersedes the part-1 split: with the pagination-resilience fix deployed (image api-version-23-24-25-427), the full 196-query run completes at v25. 160 expected output tables. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…settes
With the pagination-resilience fix deployed, the full original ads config now
records and replays cleanly at v24 and v25 — the 26 DSL-breaking queries
(filtering()/time_range() '{' leaking into fields, FB code 2500) are logged and
skipped per-object instead of crashing the run. Replaces the reduced 196-query
cassettes. v24: 168 tables, v25: 169 tables (vs 160 reduced).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds VCR functional test coverage for Facebook Graph / Marketing API v24 and v25 across all three Meta extractors (Ads, Pages, Instagram), recorded from real debug-job runs on the CF test project against the full production-style configs. Existing v23 coverage on
mainis retained. Also fixes two real resilience bugs surfaced while recording the large Ads config at v25.New VCR test cases
keboola.ex-instagram-v2instagram_keboola_account_v24instagram_keboola_account_v25keboola.ex-facebook-pagesfacebook_pages_keboola_accounts_v24facebook_pages_keboola_accounts_v25keboola.ex-facebook-ads-v2facebook_ads_keboola_account_v24(full 222-query)facebook_ads_keboola_account_v25(full 222-query)All recorded from the full configs and replay green locally and in CI.
Bug fixes (component resilience)
Recording the large Ads config at v25 exposed two real bugs, both fixed here:
Async-insights transient failures killed the job.
poll_async_jobraisedUserExceptionthe instant Facebook returnedasync_statusJob Failed/Job Skipped(or on poll timeout). Facebook returns these transiently under load. Now raised as a dedicatedAsyncInsightsJobTransientErrorand the report is re-submitted with exponential backoff before giving up.A single object's pagination failure crashed the whole run with an opaque "Internal Server Error". Nested-insights pagination (
load_page_from_url) was consumed viayield from ...outside the per-objecttry/except, so a Facebook error on a deepinsightspaging.nextescaped every handler and killed the extraction with no context. Now wrapped: the failing query + object id + exception are logged, and processing continues with the next object.Also widened the transient-retry budget (3 → 5; backoff 5/10/20/40/80s) to ride out v25's longer insights throttle windows. Unit tests in
tests/test_page_loader_async_retry.py.On the full Ads config (222 queries)
The Ads cassettes use the full 222-query config. 26 of those queries hit a pre-existing DSL-parser limitation (a
{inside afiltering()/time_range()modifier leaks into thefieldsparam → FBcode 2500) — that's the reverted CFTL-630 fix and is intentionally not addressed here. Thanks to resilience fix #2, those queries now log a clear error and are skipped per-object instead of crashing the run, so the full config records and replays cleanly (v24: 168 tables, v25: 169). v25 itself did not break any query — the only live v25 insights change is new asyncerror_*fields.🤖 Generated with Claude Code