feat: Make list methods of CollectionClients iterable#760
Pull request overview
This PR adds first-class pagination iteration to the Python Apify client by making list()-style methods return objects that preserve first-page metadata while also supporting (async) iteration across subsequent API pages.
Changes:
- Introduces `IterableListPage`/`IterableListPageAsync` and helper builders for offset- and cursor-based pagination.
- Updates multiple sync/async resource-client `list()` methods (and storage list methods like dataset items / KVS keys / RQ requests) to return iterable pages.
- Adds unit tests covering pagination behavior across many clients and option combinations.
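The idea of a page object that keeps first-page metadata while also iterating across later pages can be sketched roughly as follows. This is a minimal illustration of offset-based pagination, not the PR's implementation: `Page`, `IterablePage`, and `fake_api` are hypothetical names, and the metadata fields (`total`, `offset`, `limit`) are assumed.

```python
from collections.abc import Callable, Iterator
from dataclasses import dataclass
from typing import Any


@dataclass
class Page:
    """One API page: items plus pagination metadata (assumed shape)."""

    items: list[Any]
    total: int
    offset: int
    limit: int


class IterablePage:
    """Hypothetical stand-in for an iterable list page, not the PR's class.

    Attribute access reflects the first page only; iteration lazily fetches
    the following pages by advancing the offset.
    """

    def __init__(self, fetch: Callable[[int, int], Page], offset: int = 0, limit: int = 2) -> None:
        self._fetch = fetch
        self._first = fetch(offset, limit)
        # First-page metadata stays available as plain attributes.
        self.items = self._first.items
        self.total = self._first.total
        self.offset = self._first.offset
        self.limit = self._first.limit

    def __iter__(self) -> Iterator[Any]:
        page = self._first
        while page.items:
            yield from page.items
            next_offset = page.offset + len(page.items)
            if next_offset >= page.total:
                break
            page = self._fetch(next_offset, page.limit)


def fake_api(offset: int, limit: int) -> Page:
    """Hypothetical API call serving slices of a five-item collection."""
    data = list(range(5))
    return Page(items=data[offset:offset + limit], total=len(data), offset=offset, limit=limit)


first = IterablePage(fake_api, offset=0, limit=2)
```

Reading `first.items` touches only the page that was already fetched, while `list(first)` or a `for` loop triggers further calls to the fetch function until the reported total is reached.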
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 22 comments.
| File | Description |
|---|---|
| tests/unit/test_client_pagination.py | Adds end-to-end pagination tests (offset + cursor) for sync/async clients using an HTTP test server. |
| src/apify_client/_iterable_list_page.py | New pagination wrappers + builders enabling iteration/awaiting behavior. |
| src/apify_client/_resource_clients/actor_collection.py | Makes Actors collection list() iterable (sync + async). |
| src/apify_client/_resource_clients/build_collection.py | Makes Builds collection list() iterable (sync + async). |
| src/apify_client/_resource_clients/run_collection.py | Makes Runs collection list() iterable (sync + async). |
| src/apify_client/_resource_clients/schedule_collection.py | Makes Schedules collection list() iterable (sync + async). |
| src/apify_client/_resource_clients/task_collection.py | Makes Tasks collection list() iterable (sync + async). |
| src/apify_client/_resource_clients/webhook_collection.py | Makes Webhooks collection list() iterable (sync + async). |
| src/apify_client/_resource_clients/webhook_dispatch_collection.py | Makes Webhook dispatches list() iterable (sync + async), including empty-list handling. |
| src/apify_client/_resource_clients/store_collection.py | Makes Store actors list() iterable (sync + async). |
| src/apify_client/_resource_clients/dataset_collection.py | Makes Datasets collection list() iterable (sync + async). |
| src/apify_client/_resource_clients/key_value_store_collection.py | Makes KVS collection list() iterable (sync + async). |
| src/apify_client/_resource_clients/request_queue_collection.py | Makes Request queues collection list() iterable (sync + async). |
| src/apify_client/_resource_clients/dataset.py | Makes list_items() iterable; deprecates iterate_items() by delegating to list_items(). |
| src/apify_client/_resource_clients/key_value_store.py | Makes list_keys() iterable; deprecates iterate_keys() by delegating to list_keys(). |
| src/apify_client/_resource_clients/request_queue.py | Makes list_requests() iterable (cursor-based); adds chunk_size and keeps mutual-exclusion validation. |
| src/apify_client/_resource_clients/actor_env_var_collection.py | Makes env var collection list() iterable (sync + async). |
| src/apify_client/_resource_clients/actor_version_collection.py | Makes version collection list() iterable (sync + async). |
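The table mentions that `iterate_items()` and `iterate_keys()` are deprecated by delegating to the new list methods. One common way to do that in Python is sketched below; `DatasetClientSketch` is a hypothetical class, and the warning text and the simplified `list_items()` signature are assumptions rather than the PR's code.

```python
import warnings
from collections.abc import Iterator
from typing import Any, Optional


class DatasetClientSketch:
    """Hypothetical client holding items in memory, for illustration only."""

    def __init__(self, items: list[Any]) -> None:
        self._items = list(items)

    def list_items(self, *, offset: int = 0, limit: Optional[int] = None) -> Iterator[Any]:
        # Stand-in for the new iterable list method.
        end = None if limit is None else offset + limit
        return iter(self._items[offset:end])

    def iterate_items(self, *, offset: int = 0, limit: Optional[int] = None) -> Iterator[Any]:
        # Deprecated entry point: warn at call time, then delegate.
        warnings.warn(
            '`iterate_items()` is deprecated; iterate over `list_items()` instead.',
            DeprecationWarning,
            stacklevel=2,
        )
        return self.list_items(offset=offset, limit=limit)
```

Returning the delegate's iterator, instead of using `yield from` inside the deprecated method, makes the warning fire when the method is called and not only when the first item is consumed.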
Comments suppressed due to low confidence (1)
src/apify_client/_resource_clients/request_queue.py:533
- The `Args:` section lists `cursor`/`exclusive_start_id` twice, which is confusing and makes the docstring contradictory. Please remove the duplicated lines and keep a single description (including the deprecation note).
Args:
limit: How many requests to retrieve.
filter: List of request states to use as a filter. Multiple values mean union of the given filters.
cursor: A token returned in a previous API response, to continue listing the next page of requests.
exclusive_start_id: (deprecated) All requests up to this one (including) are skipped from the result.
Only applied to the first page fetched; subsequent pages during iteration use `cursor`.
chunk_size: Maximum number of requests requested per API call when iterating. Only
relevant when iterating across pages.
timeout: Timeout for the API HTTP request.
cursor: A token returned in previous API response, to continue listing next page of requests
exclusive_start_id: (deprecated) All requests up to this one (including) are skipped from the result.
"""
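The interplay the docstring describes (`exclusive_start_id` applies to the first page only, later pages follow `cursor`, and `chunk_size` caps each API call) could look roughly like this. It is a sketch under assumptions: `fake_list_requests`, `iterate_requests`, the `next_cursor` field, and the default `chunk_size` are all hypothetical and do not come from the client.

```python
from collections.abc import Iterator
from typing import Any, Optional

REQUESTS = [{"id": f"r{i}"} for i in range(5)]


def fake_list_requests(
    *, limit: int, cursor: Optional[str] = None, exclusive_start_id: Optional[str] = None
) -> dict[str, Any]:
    """Hypothetical single-page call returning items and a `next_cursor` token."""
    if cursor is not None:
        start = int(cursor)
    elif exclusive_start_id is not None:
        start = [r["id"] for r in REQUESTS].index(exclusive_start_id) + 1
    else:
        start = 0
    end = start + limit
    return {"items": REQUESTS[start:end], "next_cursor": str(end) if end < len(REQUESTS) else None}


def iterate_requests(
    *,
    limit: Optional[int] = None,
    cursor: Optional[str] = None,
    exclusive_start_id: Optional[str] = None,
    chunk_size: int = 2,
) -> Iterator[dict[str, Any]]:
    # Mutual exclusion: only one way of choosing the starting point.
    if cursor is not None and exclusive_start_id is not None:
        raise ValueError("Pass either `cursor` or `exclusive_start_id`, not both.")
    remaining = limit
    first_page = True
    while True:
        page_limit = chunk_size if remaining is None else min(chunk_size, remaining)
        if page_limit <= 0:
            return
        page = fake_list_requests(
            limit=page_limit,
            cursor=cursor,
            # Only the first page honours `exclusive_start_id`.
            exclusive_start_id=exclusive_start_id if first_page else None,
        )
        first_page = False
        yield from page["items"]
        if remaining is not None:
            remaining -= len(page["items"])
        cursor = page["next_cursor"]
        if cursor is None or not page["items"]:
            return
```

Here `limit` bounds the total number of yielded requests, while `chunk_size` only bounds how many are asked for per call.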
…eturn

Align every integration test that calls a `list()` / `list_keys()` / `list_requests()` method with the pattern introduced on iterable-list-methods-2 (commit 82ee01f):

    xs_page = await maybe_await(client.xs().list(limit=10))
    assert isinstance(xs_page, ListPage)
    assert isinstance(xs_page.items, list)
    assert isinstance(xs_page.items[0], XShort)

Covers: actors, actor-env-vars, actor-versions, builds (user + per-actor), datasets, key-value-stores, key-value-store keys (incl. signature variant), request-queues, request-queue requests (list + batch-add + batch-delete polls), runs (multi-status + user runs + task runs), schedules, store, tasks, webhooks, webhook dispatches, log (build listing).

Where a listing may legitimately be empty (user's own actors, user's own datasets/KVSs/RQs/runs/builds, new task's webhooks, webhook dispatches), the element-type assertion is guarded with `if xs_page.items:` rather than asserting `items[0]`.

All `ListOf*` imports from `_models_generated` in integration tests replaced with the item-type import (e.g. `ActorShort`, `BuildShort`, `KeyValueStoreKey`, `Request`) plus `ListPage` from `_iterable_list_page`.

No source changes. 258 integration tests collect cleanly; 521 unit tests pass.

https://claude.ai/code/session_011VSSFo89Z9LfyFqZGsJKfz
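The guarded assertion pattern from the commit message can be tried out in isolation with a sketch like the one below. The `maybe_await` helper shown here is an assumed implementation (the test suite's own helper may differ), and `ListPage`, `check_page`, `list_sync`, and `list_async` are hypothetical stand-ins for the real types and client calls.

```python
import asyncio
import inspect
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ListPage:
    """Hypothetical stand-in for the `ListPage` type named in the commit message."""

    items: list[Any] = field(default_factory=list)


async def maybe_await(value: Any) -> Any:
    """Await `value` if it is awaitable, otherwise return it unchanged (assumed behavior)."""
    if inspect.isawaitable(value):
        return await value
    return value


def check_page(page: Any, item_type: type) -> None:
    assert isinstance(page, ListPage)
    assert isinstance(page.items, list)
    # Guard the element check: the listing may legitimately be empty.
    if page.items:
        assert isinstance(page.items[0], item_type)


def list_sync() -> ListPage:
    """Hypothetical sync client call."""
    return ListPage(items=["a"])


async def list_async() -> ListPage:
    """Hypothetical async client call returning an empty listing."""
    return ListPage(items=[])


async def main() -> tuple[ListPage, ListPage]:
    sync_page = await maybe_await(list_sync())
    async_page = await maybe_await(list_async())
    check_page(sync_page, str)
    check_page(async_page, str)
    return sync_page, async_page


sync_page, async_page = asyncio.run(main())
```

Routing both call styles through one helper lets a single test body serve the sync and async clients, and the `if page.items:` guard keeps the test from failing on an empty account.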
Force-pushed from a7580b3 to 8d5fafa.
Pull request overview
Copilot reviewed 39 out of 39 changed files in this pull request and generated 13 comments.
Comments suppressed due to low confidence (1)
src/apify_client/_resource_clients/request_queue.py:533
- The Args section documents `cursor` and `exclusive_start_id` twice, which is confusing for users and will be duplicated in generated docs. Remove the duplicate lines so each parameter is described once (including how `exclusive_start_id` only applies to the first page during iteration).
Args:
limit: How many requests to retrieve.
filter: List of request states to use as a filter. Multiple values mean union of the given filters.
cursor: A token returned in a previous API response, to continue listing the next page of requests.
exclusive_start_id: (deprecated) All requests up to this one (including) are skipped from the result.
Only applied to the first page fetched; subsequent pages during iteration use `cursor`.
chunk_size: Maximum number of requests requested per API call when iterating. Only
relevant when iterating across pages.
timeout: Timeout for the API HTTP request.
cursor: A token returned in previous API response, to continue listing next page of requests
exclusive_start_id: (deprecated) All requests up to this one (including) are skipped from the result.
"""
Pull request overview
Copilot reviewed 40 out of 40 changed files in this pull request and generated 9 comments.
Force-pushed from 0ec418d to dcaa4b7.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #760 +/- ##
==========================================
+ Coverage 95.92% 96.00% +0.07%
==========================================
Files 48 50 +2
Lines 5226 5600 +374
==========================================
+ Hits 5013 5376 +363
- Misses 213 224 +11
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 40 out of 40 changed files in this pull request and generated 7 comments.
Comments suppressed due to low confidence (1)
src/apify_client/_resource_clients/dataset.py:44
- `DatasetItemsPage` docstring states it is returned by `list_items`, but `list_items()` now returns `IterablePageOfDatasetItems`/`IterablePageOfDatasetItemsAsync`. This makes the class documentation misleading; either update the docstring to reflect current usage (e.g., legacy/compat only) or remove/replace the type if it's no longer part of the public surface.
@docs_group('Other')
class DatasetItemsPage(BaseModel):
"""A page of dataset items returned by the `list_items` method.
Dataset items are arbitrary JSON objects stored in the dataset, so they cannot be
represented by a specific Pydantic model. This class provides pagination metadata
along with the raw items.
Will be done in a new PR for human review to avoid the AI review clutter and numerous commits.