
feat: Make list methods of CollectionClients iterable#760

Closed
Pijukatel wants to merge 21 commits into master from iterable-list-methods-2

Conversation


@Pijukatel Pijukatel commented Apr 23, 2026

Description

  • The list method of every collection client now returns an object that can also be iterated.
  • The list method of every async collection client now returns an object that can also be iterated asynchronously.
  • List of modified clients (same for async clients):
    • ActorCollectionClient
    • BuildCollectionClient
    • RunCollectionClient
    • ScheduleCollectionClient
    • TaskCollectionClient
    • WebhookCollectionClient
    • WebhookDispatchCollectionClient
    • DatasetCollectionClient
    • KeyValueStoreCollectionClient
    • RequestQueueCollectionClient
    • StoreCollectionClient
    • ActorEnvVarCollectionClient
    • ActorVersionCollectionClient
  • Additionally, the following storage-related list methods were modified to support iteration as well:
    • DatasetClient.list_items (DatasetClient.iterate_items is marked as deprecated)
    • KeyValueStoreClient.list_keys (KeyValueStoreClient.iterate_keys is marked as deprecated)
    • RequestQueueClient.list_requests

Example usage

...
# Sync
datasets_client = ApifyClient(token='...').datasets()

# Same as before
list_page = datasets_client.list(...)

# New functionality
individual_items = [item for item in datasets_client.list(...)]

...
# Async
datasets_client = ApifyClientAsync(token='...').datasets()

# Same as before
list_page = await datasets_client.list(...)

# New functionality
individual_items = [item async for item in datasets_client.list(...)]
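
One way a single return value can support both `await` and `async for` is to implement `__await__` and `__aiter__` on the same object. The block below is only a minimal sketch of that idea, not this PR's implementation: `Page`, `AwaitableAsyncPages`, `fake_fetch`, and `demo` are hypothetical names, and the API call is replaced by slicing an in-memory list.

```python
import asyncio
from collections.abc import AsyncIterator, Awaitable, Callable, Generator
from dataclasses import dataclass
from typing import Any


@dataclass
class Page:
    """Hypothetical page model: items plus offset-pagination metadata."""

    items: list[int]
    offset: int
    total: int


class AwaitableAsyncPages:
    """Hypothetical wrapper: `await obj` gives the first page, `async for` gives every item."""

    def __init__(self, fetch_page: Callable[[int], Awaitable[Page]]) -> None:
        self._fetch_page = fetch_page

    def __await__(self) -> Generator[Any, None, Page]:
        # Awaiting the wrapper fetches only the first page, as before.
        return self._fetch_page(0).__await__()

    async def __aiter__(self) -> AsyncIterator[int]:
        # Iterating follows the offsets until every item has been yielded.
        offset = 0
        while True:
            page = await self._fetch_page(offset)
            for item in page.items:
                yield item
            offset += len(page.items)
            if not page.items or offset >= page.total:
                break


DATA = list(range(5))  # fake server-side collection


async def fake_fetch(offset: int, page_size: int = 2) -> Page:
    # Stand-in for an API call; serves slices of DATA.
    return Page(items=DATA[offset:offset + page_size], offset=offset, total=len(DATA))


async def demo() -> tuple[Page, list[int]]:
    first_page = await AwaitableAsyncPages(fake_fetch)
    all_items = [item async for item in AwaitableAsyncPages(fake_fetch)]
    return first_page, all_items
```

Running `asyncio.run(demo())` returns the first page (two items plus metadata) and the full list of five items collected across three fake page fetches.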

Issues

Testing

  • Unit tests
  • Manual API tests

Checklist

  • CI passed

Working tests and implementation.
TODO:
- Check KVS and RQ special cases
- Figure out model mocking in some elegant way
@github-actions github-actions Bot added this to the 139th sprint - Tooling team milestone Apr 24, 2026
@github-actions github-actions Bot added t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics. labels Apr 24, 2026
@Pijukatel Pijukatel requested a review from Copilot April 24, 2026 06:02

Copilot AI left a comment

Pull request overview

This PR adds first-class pagination iteration to the Python Apify client by making list()-style methods return objects that preserve first-page metadata while also supporting (async) iteration across subsequent API pages.

Changes:

  • Introduces IterableListPage / IterableListPageAsync and helper builders for offset- and cursor-based pagination.
  • Updates multiple sync/async resource-client list() methods (and storage list methods like dataset items / KVS keys / RQ requests) to return iterable pages.
  • Adds unit tests covering pagination behavior across many clients and option combinations.
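
The idea of a page that keeps its first-page metadata while also iterating across later pages can be sketched for the offset-based case as follows. This is an illustration under assumptions, not the PR's `IterableListPage`: `IterablePageSketch`, `list_actors`, and `ACTORS` are hypothetical, and the "API" is an in-memory list.

```python
from collections.abc import Callable, Iterator


class IterablePageSketch:
    """Hypothetical iterable page; not the PR's actual IterableListPage."""

    def __init__(
        self,
        items: list[str],
        offset: int,
        total: int,
        fetch_page: Callable[[int], "IterablePageSketch"],
    ) -> None:
        # First-page data and metadata stay available as plain attributes.
        self.items = items
        self.offset = offset
        self.total = total
        self._fetch_page = fetch_page

    def __iter__(self) -> Iterator[str]:
        # Start from this page, then keep fetching until `total` is reached.
        page = self
        while page.items:
            yield from page.items
            next_offset = page.offset + len(page.items)
            if next_offset >= page.total:
                break
            page = self._fetch_page(next_offset)


ACTORS = ['a', 'b', 'c', 'd', 'e']  # fake server-side collection


def list_actors(offset: int = 0, limit: int = 2) -> IterablePageSketch:
    """Fake `list()` call that serves one page of ACTORS."""
    return IterablePageSketch(
        items=ACTORS[offset:offset + limit],
        offset=offset,
        total=len(ACTORS),
        fetch_page=lambda next_offset: list_actors(next_offset, limit),
    )


page = list_actors()
```

Here `page.items` still holds only the first page, while `list(page)` walks all pages and yields every item.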

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 22 comments.

| File | Description |
| --- | --- |
| tests/unit/test_client_pagination.py | Adds end-to-end pagination tests (offset + cursor) for sync/async clients using an HTTP test server. |
| src/apify_client/_iterable_list_page.py | New pagination wrappers + builders enabling iteration/awaiting behavior. |
| src/apify_client/_resource_clients/actor_collection.py | Makes Actors collection list() iterable (sync + async). |
| src/apify_client/_resource_clients/build_collection.py | Makes Builds collection list() iterable (sync + async). |
| src/apify_client/_resource_clients/run_collection.py | Makes Runs collection list() iterable (sync + async). |
| src/apify_client/_resource_clients/schedule_collection.py | Makes Schedules collection list() iterable (sync + async). |
| src/apify_client/_resource_clients/task_collection.py | Makes Tasks collection list() iterable (sync + async). |
| src/apify_client/_resource_clients/webhook_collection.py | Makes Webhooks collection list() iterable (sync + async). |
| src/apify_client/_resource_clients/webhook_dispatch_collection.py | Makes Webhook dispatches list() iterable (sync + async), including empty-list handling. |
| src/apify_client/_resource_clients/store_collection.py | Makes Store actors list() iterable (sync + async). |
| src/apify_client/_resource_clients/dataset_collection.py | Makes Datasets collection list() iterable (sync + async). |
| src/apify_client/_resource_clients/key_value_store_collection.py | Makes KVS collection list() iterable (sync + async). |
| src/apify_client/_resource_clients/request_queue_collection.py | Makes Request queues collection list() iterable (sync + async). |
| src/apify_client/_resource_clients/dataset.py | Makes list_items() iterable; deprecates iterate_items() by delegating to list_items(). |
| src/apify_client/_resource_clients/key_value_store.py | Makes list_keys() iterable; deprecates iterate_keys() by delegating to list_keys(). |
| src/apify_client/_resource_clients/request_queue.py | Makes list_requests() iterable (cursor-based); adds chunk_size and keeps mutual-exclusion validation. |
| src/apify_client/_resource_clients/actor_env_var_collection.py | Makes env var collection list() iterable (sync + async). |
| src/apify_client/_resource_clients/actor_version_collection.py | Makes version collection list() iterable (sync + async). |
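
Deprecating `iterate_items()` by delegating to `list_items()` could look roughly like the sketch below. It assumes a standard `DeprecationWarning` is used; `DatasetClientSketch` is a hypothetical class, its signatures are illustrative only, and the warning text is invented for the example.

```python
import warnings
from collections.abc import Iterator
from typing import Optional


class DatasetClientSketch:
    """Hypothetical client showing deprecation by delegation."""

    def __init__(self, items: list[dict]) -> None:
        self._items = items  # stands in for items stored on the platform

    def list_items(self, *, offset: int = 0, limit: Optional[int] = None) -> list[dict]:
        # The preferred method; its result can be iterated directly.
        end = None if limit is None else offset + limit
        return self._items[offset:end]

    def iterate_items(self, *, offset: int = 0, limit: Optional[int] = None) -> Iterator[dict]:
        # Deprecated entry point: warn at call time, then delegate.
        warnings.warn(
            '`iterate_items` is deprecated, iterate over `list_items` instead.',
            DeprecationWarning,
            stacklevel=2,
        )
        return iter(self.list_items(offset=offset, limit=limit))
```

Keeping the old method as a thin wrapper means existing callers continue to work and only see a warning, while all pagination logic lives in one place.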
Comments suppressed due to low confidence (1)

src/apify_client/_resource_clients/request_queue.py:533

  • The Args: section lists cursor / exclusive_start_id twice, which is confusing and makes the docstring contradictory. Please remove the duplicated lines and keep a single description (including the deprecation note).
        Args:
            limit: How many requests to retrieve.
            filter: List of request states to use as a filter. Multiple values mean union of the given filters.
            cursor: A token returned in a previous API response, to continue listing the next page of requests.
            exclusive_start_id: (deprecated) All requests up to this one (including) are skipped from the result.
                Only applied to the first page fetched; subsequent pages during iteration use `cursor`.
            chunk_size: Maximum number of requests requested per API call when iterating. Only
                relevant when iterating across pages.
            timeout: Timeout for the API HTTP request.
            cursor: A token returned in previous API response, to continue listing next page of requests
            exclusive_start_id: (deprecated) All requests up to this one (including) are skipped from the result.
        """
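
The parameter semantics in this docstring can be illustrated with a small cursor-based sketch. It is not the client's code: `fake_api_list`, `iterate_requests`, and `REQUESTS` are hypothetical, the cursor is simply a stringified index, and it assumes the mutually exclusive pair is `cursor` and `exclusive_start_id`.

```python
from collections.abc import Iterator
from typing import Optional

REQUESTS = [f'req-{i}' for i in range(7)]  # fake queue contents


def fake_api_list(
    limit: int,
    cursor: Optional[str] = None,
    exclusive_start_id: Optional[str] = None,
) -> tuple[list[str], Optional[str]]:
    """Fake endpoint: returns one page and the cursor of the next page (None when done)."""
    if cursor is not None:
        start = int(cursor)
    elif exclusive_start_id is not None:
        start = REQUESTS.index(exclusive_start_id) + 1
    else:
        start = 0
    end = min(start + limit, len(REQUESTS))
    next_cursor = str(end) if end < len(REQUESTS) else None
    return REQUESTS[start:end], next_cursor


def iterate_requests(
    *,
    limit: Optional[int] = None,
    chunk_size: int = 3,
    cursor: Optional[str] = None,
    exclusive_start_id: Optional[str] = None,
) -> Iterator[str]:
    # Assumed validation; as a generator, it runs when iteration starts.
    if cursor is not None and exclusive_start_id is not None:
        raise ValueError('Use either `cursor` or `exclusive_start_id`, not both.')
    remaining = limit
    while remaining is None or remaining > 0:
        # `chunk_size` caps each call; `limit` caps the overall total.
        page_limit = chunk_size if remaining is None else min(chunk_size, remaining)
        items, cursor = fake_api_list(page_limit, cursor, exclusive_start_id)
        exclusive_start_id = None  # only applied to the first page fetched
        yield from items
        if remaining is not None:
            remaining -= len(items)
        if cursor is None or not items:
            break
```

With these fake values, `iterate_requests(exclusive_start_id='req-1', limit=4)` skips the first two requests on the first call and then continues by cursor until four requests have been yielded.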

Comment thread src/apify_client/_resource_clients/request_queue.py Outdated
Comment thread src/apify_client/_resource_clients/task_collection.py Outdated
Comment thread src/apify_client/_iterable_list_page.py Outdated
Comment thread src/apify_client/_iterable_list_page.py Outdated
Comment thread src/apify_client/_resource_clients/key_value_store.py Outdated
Comment thread src/apify_client/_iterable_list_page.py Outdated
Comment thread src/apify_client/_resource_clients/webhook_dispatch_collection.py Outdated
Comment thread src/apify_client/_resource_clients/webhook_dispatch_collection.py Outdated
Comment thread src/apify_client/_resource_clients/store_collection.py Outdated
Comment thread src/apify_client/_resource_clients/key_value_store_collection.py Outdated
Pijukatel and others added 4 commits April 24, 2026 13:06
…eturn

Align every integration test that calls a `list()` / `list_keys()` / `list_requests()` method
with the pattern introduced on iterable-list-methods-2 (commit 82ee01f):

    xs_page = await maybe_await(client.xs().list(limit=10))
    assert isinstance(xs_page, ListPage)
    assert isinstance(xs_page.items, list)
    assert isinstance(xs_page.items[0], XShort)

Covers: actors, actor-env-vars, actor-versions, builds (user + per-actor), datasets,
key-value-stores, key-value-store keys (incl. signature variant), request-queues,
request-queue requests (list + batch-add + batch-delete polls), runs (multi-status +
user runs + task runs), schedules, store, tasks, webhooks, webhook dispatches, log
(build listing).

Where a listing may legitimately be empty (user's own actors, user's own datasets/
KVSs/RQs/runs/builds, new task's webhooks, webhook dispatches), the element-type
assertion is guarded with `if xs_page.items:` rather than asserting `items[0]`.

All `ListOf*` imports from `_models_generated` in integration tests replaced with the
item-type import (e.g. `ActorShort`, `BuildShort`, `KeyValueStoreKey`, `Request`) plus
`ListPage` from `_iterable_list_page`.

No source changes. 258 integration tests collect cleanly; 521 unit tests pass.
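
The guarded assertion pattern described in this commit message can be sketched as a small helper. This is only an illustration: `ListPageSketch` is a hypothetical stand-in for `ListPage`, `XShort` mirrors the placeholder name used above, and `assert_page` is not a helper from the test suite.

```python
from dataclasses import dataclass, field


@dataclass
class ListPageSketch:
    """Hypothetical stand-in for `ListPage`; only the `items` attribute is modeled."""

    items: list = field(default_factory=list)


@dataclass
class XShort:
    """Hypothetical item model, mirroring the `XShort` placeholder."""

    id: str


def assert_page(page: object, item_type: type, *, may_be_empty: bool = False) -> None:
    """Check the page type, the items container, and the first item's type."""
    assert isinstance(page, ListPageSketch)
    assert isinstance(page.items, list)
    if may_be_empty and not page.items:
        return  # nothing to type-check in a legitimately empty listing
    assert isinstance(page.items[0], item_type)
```

For listings that must contain data, the unguarded form fails loudly on an empty page; for listings that may be empty, `may_be_empty=True` skips only the element-type check.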

https://claude.ai/code/session_011VSSFo89Z9LfyFqZGsJKfz

Copilot AI left a comment

Pull request overview

Copilot reviewed 39 out of 39 changed files in this pull request and generated 13 comments.

Comments suppressed due to low confidence (1)

src/apify_client/_resource_clients/request_queue.py:533

  • The Args section documents cursor and exclusive_start_id twice, which is confusing for users and will be duplicated in generated docs. Remove the duplicate lines so each parameter is described once (including how exclusive_start_id only applies to the first page during iteration).
        Args:
            limit: How many requests to retrieve.
            filter: List of request states to use as a filter. Multiple values mean union of the given filters.
            cursor: A token returned in a previous API response, to continue listing the next page of requests.
            exclusive_start_id: (deprecated) All requests up to this one (including) are skipped from the result.
                Only applied to the first page fetched; subsequent pages during iteration use `cursor`.
            chunk_size: Maximum number of requests requested per API call when iterating. Only
                relevant when iterating across pages.
            timeout: Timeout for the API HTTP request.
            cursor: A token returned in previous API response, to continue listing next page of requests
            exclusive_start_id: (deprecated) All requests up to this one (including) are skipped from the result.
        """

Comment thread src/apify_client/_resource_clients/key_value_store.py
Comment thread src/apify_client/_iterable_list_page.py Outdated
Comment thread tests/integration/test_request_queue.py
Comment thread tests/integration/test_request_queue.py
Comment thread tests/integration/test_request_queue.py
Comment thread docs/02_concepts/08_pagination.mdx
Comment thread docs/02_concepts/code/08_pagination_sync.py
Comment thread docs/02_concepts/code/08_pagination_async.py
Comment thread tests/integration/test_webhook.py
Comment thread tests/integration/test_request_queue.py

Copilot AI left a comment

Pull request overview

Copilot reviewed 40 out of 40 changed files in this pull request and generated 9 comments.


Comment thread src/apify_client/_pagination_classes.py Outdated
Comment thread src/apify_client/_pagination.py
Comment thread tests/integration/test_key_value_store.py
Comment thread tests/integration/test_webhook.py
Comment thread tests/integration/test_request_queue.py
Comment thread tests/integration/test_key_value_store.py
Comment thread src/apify_client/_resource_clients/request_queue.py Outdated
Comment thread tests/integration/test_request_queue.py
Comment thread docs/02_concepts/08_pagination.mdx Outdated
@Pijukatel Pijukatel force-pushed the iterable-list-methods-2 branch from 0ec418d to dcaa4b7 Compare April 28, 2026 15:43
@codecov

codecov Bot commented Apr 28, 2026

Codecov Report

❌ Patch coverage is 97.02602% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.00%. Comparing base (ff9817c) to head (451db5d).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/apify_client/_pagination.py | 81.81% | 16 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #760      +/-   ##
==========================================
+ Coverage   95.92%   96.00%   +0.07%     
==========================================
  Files          48       50       +2     
  Lines        5226     5600     +374     
==========================================
+ Hits         5013     5376     +363     
- Misses        213      224      +11     
| Flag | Coverage Δ |
| --- | --- |
| integration | 96.00% <97.02%> (+0.07%) ⬆️ |

Pijukatel and others added 2 commits April 29, 2026 09:31
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 40 out of 40 changed files in this pull request and generated 7 comments.

Comments suppressed due to low confidence (1)

src/apify_client/_resource_clients/dataset.py:44

  • DatasetItemsPage docstring states it is returned by list_items, but list_items() now returns IterablePageOfDatasetItems / IterablePageOfDatasetItemsAsync. This makes the class documentation misleading; either update the docstring to reflect current usage (e.g., legacy/compat only) or remove/replace the type if it’s no longer part of the public surface.
@docs_group('Other')
class DatasetItemsPage(BaseModel):
    """A page of dataset items returned by the `list_items` method.

    Dataset items are arbitrary JSON objects stored in the dataset, so they cannot be
    represented by a specific Pydantic model. This class provides pagination metadata
    along with the raw items.

Comment thread src/apify_client/_resource_clients/dataset.py Outdated
Comment thread src/apify_client/_resource_clients/dataset.py Outdated
Comment thread src/apify_client/_resource_clients/request_queue.py
Comment thread src/apify_client/_pagination_classes.py Outdated
Comment thread docs/02_concepts/code/08_pagination_sync.py
Comment thread src/apify_client/_resource_clients/key_value_store.py Outdated
Comment thread src/apify_client/_models_generated.py Outdated
@Pijukatel

This will be done in a new PR for human review, to avoid the AI review clutter and the numerous commits.

@Pijukatel Pijukatel closed this Apr 29, 2026
Labels

t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics.

4 participants