
feat: modernize langchain integration core tools #28

Draft
daveomri wants to merge 31 commits into feat/modernize-langchain-integration from feat/modernize-langchain-integration-core-tools

Conversation

@daveomri
Collaborator

@daveomri daveomri commented Apr 21, 2026

Summary

First PR into feat/modernize-langchain-integration; adds the foundational tools layer and modernizes auth conventions across the package. Upcoming PRs will add search & crawling tools, social media tools, LangChain-native components, and docs to feat/modernize-langchain-integration before merging it all to main.

New code: ~880 lines - Tests: ~1000 lines

Note on scope: While building the new tools layer, I spotted several pre-existing issues in the legacy code (plain-string token handling, outdated get_from_dict_or_env + mode='before' validator pattern, tokens leaking into model_dump() / repr()). Because the new tools reuse the same SecretStr-based auth, keeping two parallel conventions in the package would have been confusing and short-lived, so I folded the fixes into this PR. Remaining, more independent improvements I noticed along the way (e.g., docs & examples refresh, LangChain-native component additions, actor-specific tool classes) will be split out as follow-up tasks on the integration branch rather than bundled here.
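A minimal sketch of the token leak mentioned above and of the SecretStr-based fix, assuming Pydantic v2. `LegacyConfig`, `ExampleApifyConfig`, `actor_id`, and `_token_from_env` are hypothetical names for illustration, not classes from this PR; `_token_from_env` is a stand-in for `secret_from_env('APIFY_API_TOKEN', default=None)` so the example depends only on Pydantic.

```python
import os
from typing import Optional

from pydantic import BaseModel, Field, SecretStr


class LegacyConfig(BaseModel):
    """Hypothetical legacy-style model: the token is a plain string."""

    apify_api_token: str
    actor_id: str = "example/actor"


def _token_from_env() -> Optional[SecretStr]:
    """Stand-in for secret_from_env('APIFY_API_TOKEN', default=None)."""
    value = os.environ.get("APIFY_API_TOKEN")
    return SecretStr(value) if value else None


class ExampleApifyConfig(BaseModel):
    """Hypothetical model using the SecretStr convention described in this PR."""

    apify_api_token: Optional[SecretStr] = Field(
        default_factory=_token_from_env,  # used only when no token is passed
        exclude=True,  # keep the field out of model_dump()
        repr=False,  # keep the field out of repr()
    )
    actor_id: str = "example/actor"


# Plain-string token: visible in both the dump and the repr.
legacy = LegacyConfig(apify_api_token="my-secret-token")
legacy_dump = legacy.model_dump()
legacy_repr = repr(legacy)

# SecretStr token: absent from the dump and the repr, masked when printed.
config = ExampleApifyConfig(apify_api_token="my-secret-token")
dumped = config.model_dump()
shown = repr(config)
masked = str(config.apify_api_token)

# The raw value is still available where a client actually needs it.
raw = config.apify_api_token.get_secret_value()
```

The explicit `get_secret_value()` call is the only place the raw token surfaces, which is what makes accidental logging harder.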


  • ApifyToolsClient (_client.py)
    • Internal helper wrapping ApifyClient, one method per tool operation. Accepts both SecretStr and raw str tokens and falls back to the APIFY_API_TOKEN env var. Shared _list_items_or_raise helper wraps dataset-fetch errors into RuntimeError.
  • 6 new BaseTool subclasses
    • ApifyRunActorTool, ApifyGetDatasetItemsTool, ApifyRunActorAndGetItemsTool, ApifyScrapeUrlTool, ApifyRunTaskTool, ApifyRunTaskAndGetItemsTool. Exported via the APIFY_CORE_TOOLS: list[type[BaseTool]] convenience list for selective agent binding.
  • _ApifyGenericTool base class
    • Common client handling, handle_tool_error=True, developer-controlled safety clamping (_clamp_timeout, _clamp_memory, _clamp_items) with configurable ceilings (max_timeout_secs, max_memory_mbytes, max_items) and hardcoded floor of 1 to enforce API protocol minimums.
  • Auth pattern modernized (document_loaders.py, wrappers.py, tools.py)
    • Replaced legacy get_from_dict_or_env + @model_validator(mode='before') with SecretStr field type and secret_from_env('APIFY_API_TOKEN', default=None) default factory, matching langchain-openai / langchain-anthropic conventions. Tokens are automatically redacted in logs/traces and additionally excluded from model_dump() / repr() via exclude=True, repr=False. Client construction moved to @model_validator(mode='after') / model_post_init. Added populate_by_name=True to ConfigDict on loader and wrapper. The new tools reuse this same auth pattern; fixing it here avoids shipping two parallel conventions across the package.
  • Backward compatible
    • ApifyActorsTool, ApifyDatasetLoader, ApifyWrapper retain their public API; auth changes are internal.
  • Tests
    • Unit tests for all tools & client (~1000 lines across test_tools.py, test_client.py, test_document_loaders.py), integration smoke tests under tests/integration_tests/, and error-scenario coverage (missing token, run failure, network error, clamp floor/ceiling, token excluded from model_dump, APIFY_TOKEN env-var fallback on the loader).
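
The safety clamping described for `_ApifyGenericTool` can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the diff: `ExampleTool` and `clamp` are hypothetical, and the default ceilings are placeholder values, since the description does not state the real ones. Only the method names, the ceiling field names, and the floor of 1 come from the summary above.

```python
from dataclasses import dataclass

FLOOR = 1  # hardcoded floor named in the PR description


def clamp(value: int, ceiling: int, floor: int = FLOOR) -> int:
    """Limit value to the range [floor, ceiling]; the floor wins if they conflict."""
    return max(floor, min(value, ceiling))


@dataclass
class ExampleTool:
    """Hypothetical stand-in for the base class; ceilings are developer-controlled."""

    max_timeout_secs: int = 300  # placeholder default, not from the PR
    max_memory_mbytes: int = 4096  # placeholder default, not from the PR
    max_items: int = 100  # placeholder default, not from the PR

    def _clamp_timeout(self, requested: int) -> int:
        return clamp(requested, self.max_timeout_secs)

    def _clamp_memory(self, requested: int) -> int:
        return clamp(requested, self.max_memory_mbytes)

    def _clamp_items(self, requested: int) -> int:
        return clamp(requested, self.max_items)


# The developer sets a ceiling; values requested by the agent are bounded by it.
tool = ExampleTool(max_timeout_secs=60)
too_long = tool._clamp_timeout(600)  # above the ceiling
too_short = tool._clamp_timeout(0)  # below the floor
in_range = tool._clamp_memory(512)  # already within bounds
negative_items = tool._clamp_items(-5)  # below the floor
```

Keeping the ceilings on the tool instance rather than in the tool's input schema means the model calling the tool cannot raise them.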

Review strategy

The diff is larger than a typical PR (~1.9k lines, half of which is tests). Suggested reading order to make it tractable:

  1. _client.py: the new ApifyToolsClient abstraction
  2. _ApifyGenericTool base class in tools.py, then the 6 tool classes (they are homogeneous; once one clicks, the rest read fast)
  3. Auth diff in document_loaders.py, wrappers.py, and the ApifyActorsTool.__init__ change in tools.py
  4. Tests last: mostly linear, grouped by the module they cover

Merge strategy

This PR targets feat/modernize-langchain-integration, not main. The plan is to accumulate all reviewed modernization work (core tools -> actor-specific tools -> LangChain-native components -> docs) on that integration branch, then open a single PR from feat/modernize-langchain-integration -> main once everything is complete and reviewed. Any smaller, pre-existing issues I find along the way will be split out as separate follow-up tasks on the integration branch rather than bundled into the larger PRs.

@daveomri daveomri self-assigned this Apr 21, 2026
@daveomri daveomri changed the title Feat: modernize langchain integration core tools feat: modernize langchain integration core tools Apr 23, 2026


2 participants