feat: modernize langchain integration native components by daveomri · Pull Request #29 · apify/langchain-apify

daveomri · 2026-04-24T09:50:27Z

Summary

Second PR on top of feat/modernize-langchain-integration. It builds on the core tools PR and adds the LangChain-native components layer: two Actor-specific tools, a search retriever, and a crawl loader. Upcoming PRs will add social media tools and documentation to feat/modernize-langchain-integration before the whole branch is merged to main.

New code: ~690 lines - Tests: ~545 lines

Note on scope: This PR is intentionally scoped to the four native LangChain components (BaseTool for search/crawl, BaseRetriever, BaseLoader). Social-media and scraping Actor tools, docs, and example notebooks will follow as separate PRs.

ApifyGoogleSearchTool (_actor_tools.py)
- Wraps apify/google-search-scraper behind a simplified, LLM-friendly interface. Returns a JSON array of {title, url, description} objects. Inherits _ApifyGenericTool's safety clamping and handle_tool_error=True.
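To illustrate the simplified output described above, here is a minimal sketch of flattening raw search pages into a JSON array of {title, url, description} objects. The function name `simplify_search_items`, the `max_results` parameter, and the assumed raw shape (one dataset item per results page with an `organicResults` list) are illustrative assumptions, not code taken from this PR.

```python
import json
from typing import Any


def simplify_search_items(raw_items: list[dict[str, Any]], max_results: int = 5) -> str:
    """Flatten raw search pages into a JSON array of {title, url, description}."""
    results: list[dict[str, str]] = []
    for page in raw_items:
        # Assumption: each dataset item is one results page holding an
        # "organicResults" list; pages without it contribute nothing.
        for hit in page.get("organicResults") or []:
            results.append(
                {
                    "title": hit.get("title") or "",
                    "url": hit.get("url") or "",
                    "description": hit.get("description") or "",
                }
            )
    # Extra fields such as "position" are dropped to keep the payload small.
    return json.dumps(results[:max_results], ensure_ascii=False)


raw = [
    {
        "organicResults": [
            {"title": "Apify", "url": "https://apify.com", "description": "Web scraping platform", "position": 1},
            {"title": "Docs", "url": "https://docs.apify.com"},
        ]
    },
    {"searchQuery": {"term": "apify"}},  # a page with no organic results
]
print(simplify_search_items(raw))
```

Returning a string rather than a list keeps the tool output directly usable as an LLM tool message; missing fields become empty strings so every object has the same three keys.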
ApifyWebCrawlerTool (_actor_tools.py)
- Wraps apify/website-content-crawler. Returns a JSON array of {url, title, content (markdown)} objects with configurable max_crawl_pages, max_crawl_depth, and crawler_type. Reuses _clamp_timeout / _clamp_items for safety ceilings.
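The safety ceilings mentioned above can be pictured as a plain range clamp. The sketch below is an assumption about the general shape only: the limit constants and the signatures of `clamp_timeout` and `clamp_items` are hypothetical and do not reproduce the PR's `_clamp_timeout` / `_clamp_items` helpers.

```python
# Hypothetical limits; the constants actually used in the PR are not shown here.
MIN_TIMEOUT_SECS = 1
MAX_TIMEOUT_SECS = 600
MIN_ITEMS = 1
MAX_ITEMS = 1000


def clamp(value: int, low: int, high: int) -> int:
    """Pin value into the inclusive range [low, high]."""
    return max(low, min(value, high))


def clamp_timeout(timeout_secs: int) -> int:
    """Keep an LLM-supplied timeout inside the allowed window."""
    return clamp(timeout_secs, MIN_TIMEOUT_SECS, MAX_TIMEOUT_SECS)


def clamp_items(max_items: int) -> int:
    """Keep an LLM-supplied item limit inside the allowed window."""
    return clamp(max_items, MIN_ITEMS, MAX_ITEMS)


print(clamp_timeout(86_400))  # above the ceiling, so it is capped
print(clamp_items(0))         # below the floor, so it is raised
print(clamp_items(50))        # already in range, so it is unchanged
```

Clamping silently instead of raising means a model that asks for an oversized crawl still gets a bounded run rather than a tool error.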
ApifySearchRetriever (retrievers.py)
- New BaseRetriever backed by apify/rag-web-browser. Provides both _get_relevant_documents (sync) and _aget_relevant_documents (async) via ApifyClient / ApifyClientAsync. Yields Document objects with source and title metadata, ready to drop into any LangChain RAG pipeline. Actor-run logs are suppressed via logger=None.
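A rough sketch of the sync/async pairing described above, written against stand-ins so it runs without network access. `Document` here is a local dataclass with only the two fields used, `FakeSearchBackend` is a hypothetical replacement for the Apify clients, and the item shape (`markdown` or `text` plus a nested `metadata` dict) is an assumption.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Document:
    """Local stand-in with only the two Document fields used here."""
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def item_to_document(item: dict[str, Any]) -> Document:
    # Assumed item shape: page text under "markdown" (with "text" as a
    # fallback) and url/title under a nested "metadata" dict.
    meta = item.get("metadata") or {}
    content = item.get("markdown") or item.get("text") or ""
    return Document(
        page_content=content,
        metadata={"source": meta.get("url", ""), "title": meta.get("title", "")},
    )


class FakeSearchBackend:
    """Hypothetical stand-in for the sync and async Apify clients."""

    def __init__(self, items: list[dict[str, Any]]) -> None:
        self._items = items

    def search(self, query: str) -> list[dict[str, Any]]:
        return self._items

    async def asearch(self, query: str) -> list[dict[str, Any]]:
        await asyncio.sleep(0)  # yield control, as a real network call would
        return self._items


def get_relevant_documents(backend: FakeSearchBackend, query: str) -> list[Document]:
    return [item_to_document(i) for i in backend.search(query)]


async def aget_relevant_documents(backend: FakeSearchBackend, query: str) -> list[Document]:
    return [item_to_document(i) for i in await backend.asearch(query)]


backend = FakeSearchBackend(
    [
        {"markdown": "# Apify\nDocs", "metadata": {"url": "https://docs.apify.com", "title": "Apify Docs"}},
        {"text": "plain text only"},  # no markdown and no metadata
    ]
)
sync_docs = get_relevant_documents(backend, "apify docs")
async_docs = asyncio.run(aget_relevant_documents(backend, "apify docs"))
for doc in sync_docs:
    print(doc.metadata, len(doc.page_content))
```

Both paths share `item_to_document`, so the sync and async results cannot drift apart in how they map items to metadata.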
ApifyCrawlLoader (document_loaders.py)
- New BaseLoader that wraps ApifyToolsClient.crawl_website and maps each crawled page to a Document with source, title, and crawl_depth metadata. Supports both load() and lazy_load().
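The load() / lazy_load() split described above can be sketched with a generator. Everything below is a stand-in under stated assumptions: `CrawlLoaderSketch`, the injected `crawl` callable, and the page keys (`url`, `title`, `content`, `crawl_depth`) are hypothetical, and `Document` is a local dataclass rather than the LangChain class.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterator


@dataclass
class Document:
    """Local stand-in with only the two Document fields used here."""
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


class CrawlLoaderSketch:
    """Maps crawled pages to Documents; the crawl function is injected."""

    def __init__(self, crawl: Callable[[str], list[dict[str, Any]]], url: str) -> None:
        self._crawl = crawl
        self._url = url

    def lazy_load(self) -> Iterator[Document]:
        # Generator body: the crawl only starts once iteration begins.
        for page in self._crawl(self._url):
            yield Document(
                page_content=page.get("content", ""),
                metadata={
                    "source": page.get("url", ""),
                    "title": page.get("title", ""),
                    "crawl_depth": page.get("crawl_depth", 0),
                },
            )

    def load(self) -> list[Document]:
        # Eager variant built on top of the lazy one.
        return list(self.lazy_load())


calls: list[str] = []


def fake_crawl(url: str) -> list[dict[str, Any]]:
    calls.append(url)
    return [
        {"url": url, "title": "Home", "content": "# Home", "crawl_depth": 0},
        {"url": url + "/about", "title": "About", "content": "# About", "crawl_depth": 1},
    ]


loader = CrawlLoaderSketch(fake_crawl, "https://example.com")
pages = loader.lazy_load()  # nothing has been crawled yet
first = next(pages)         # the crawl runs on first iteration
docs = loader.load()        # eager variant triggers a second, full crawl
print(first.metadata, len(docs))
```

Defining load() in terms of lazy_load() keeps a single mapping path, so metadata stays identical whichever entry point a pipeline uses.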
APIFY_ACTOR_TOOLS convenience list
- New list[type] exported alongside APIFY_CORE_TOOLS for selective agent binding: [ApifyGoogleSearchTool, ApifyWebCrawlerTool].
ApifyToolsClient additions (_client.py)
- Three new methods powering the native components: google_search, crawl_website, and rag_web_search. All reuse the existing run_actor_and_get_items + _list_items_or_raise plumbing, so timeout/memory/dataset error handling is consistent with core tools.
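One way to picture the shared plumbing is three thin methods that each build an Actor input and delegate to a single run-and-fetch helper. In the sketch below the method names follow the PR text, but the signatures, the injected `run_actor` callable, the error type, and the Actor input field names are all assumptions made for illustration.

```python
from typing import Any, Callable, Optional

Items = list[dict[str, Any]]
RunActor = Callable[[str, dict[str, Any]], Optional[Items]]


class ToolsClientSketch:
    """Hypothetical client; run_actor is injected so the sketch runs offline."""

    def __init__(self, run_actor: RunActor) -> None:
        self._run_actor = run_actor

    def _list_items_or_raise(self, actor_id: str, items: Optional[Items]) -> Items:
        # Single place where a missing dataset turns into an error.
        if items is None:
            raise RuntimeError(f"Actor {actor_id} produced no dataset")
        return items

    def run_actor_and_get_items(self, actor_id: str, run_input: dict[str, Any]) -> Items:
        return self._list_items_or_raise(actor_id, self._run_actor(actor_id, run_input))

    def google_search(self, query: str, max_results: int = 5) -> Items:
        run_input = {"queries": query, "resultsPerPage": max_results}  # assumed fields
        return self.run_actor_and_get_items("apify/google-search-scraper", run_input)

    def crawl_website(self, url: str, max_crawl_pages: int = 10) -> Items:
        run_input = {"startUrls": [{"url": url}], "maxCrawlPages": max_crawl_pages}  # assumed fields
        return self.run_actor_and_get_items("apify/website-content-crawler", run_input)

    def rag_web_search(self, query: str, max_results: int = 3) -> Items:
        run_input = {"query": query, "maxResults": max_results}  # assumed fields
        return self.run_actor_and_get_items("apify/rag-web-browser", run_input)


seen: list[str] = []


def fake_run_actor(actor_id: str, run_input: dict[str, Any]) -> Optional[Items]:
    seen.append(actor_id)
    # Simulate one Actor whose run yields no dataset.
    return None if actor_id == "apify/rag-web-browser" else [{"ok": True}]


client = ToolsClientSketch(fake_run_actor)
print(client.google_search("apify"))
print(client.crawl_website("https://example.com"))
try:
    client.rag_web_search("apify")
except RuntimeError as exc:
    print(exc)
```

Because every method funnels through the same helper, a change to error handling lands in one place and applies to search, crawl, and RAG calls alike.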
Backward compatible
- No changes to the public API of any pre-existing class.
Tests
- New unit tests for every new component: test_actor_tools.py (~184 lines), test_retrievers.py (~224 lines, sync + async), expanded test_document_loaders.py (+139 lines covering ApifyCrawlLoader), and test_client.py (+151 lines for the three new client methods). Error scenarios covered: missing token, Actor run failure, network error, empty / missing-metadata results, markdown vs. text fallback, async path.

Review strategy

Merge strategy

This PR targets feat/modernize-langchain-integration, not main. It depends on the core tools PR being merged first; _actor_tools.py subclasses _ApifyGenericTool and the loader relies on ApifyToolsClient. Once core tools is merged into the integration branch, this PR will be rebased and opened for review. Social-media tools and docs will follow as separate PRs on the same integration branch before the final merge to main.

daveomri added 30 commits April 20, 2026 16:12

feat: implement apifyclient wrapper

8cad430

feat: removed redundant const file

2404b9c

feat: add few more input schemas, helpers and tool classes

b1a89a4

feat: export new tools from __init__

0aa9175

feat: add unit tests

4e46d36

feat: implement tests and introduce tools list

fc6ef12

fix: lint fix

cc5be9e

feat: enhance error handling and documentation for apify tools

c2b9cb6

fix: iso format fix

3edf126

feat: add apify run task and apify run task and get items tools with input schemas

8c36edc

feat: introduce _ApifyGenericTool base class for Apify tools to streamline client handling and error management

026175a

feat: add _actor_tools.py file to define upcomming search and social media tools for apify integration

110c971

fix: add try/except to match others

a08f63e

fix: update timeout constants and improve input schema descripiton in Apify tools

d028531

fix: enhance error handling for missing dataset id in run_actor and run_task methods

429a3ed

fix: update apifygetdatasetitemstool to return a json object with items and message for empty dataset

b914e47

feat: add integration smoke tests for generic Apify tools to validate api interaction

0f71181

feat: implement clamping for timeout, memory, and item limits in apify tools to enforce safety constraints

50c52f2

feat: clean up _actor_tools.py and tools.py for improved readibility and maintability; update test cases for better formatting and error handling

ba179a6

feat: add three new tools to _client.py

da900ce

feat: implement apifygooglesearchtool and apifywebcrawlertool

ff6ffeb

feat: implement a apify search retrievel

6e8888c

feat: add apify crawl loader to document_loaders.py

b124ce1

feat: update __init__

029b9e1

feat: add unit tests

c7ee287

feat: add actor tools unit tests

ec60765

feat: add retrievers unit tests

c077186

feat: simplify apify crawl loader init and enhance unit tests

0b4ecbb

ref: align private scope conventions with langchain partner package standards

005294b

ref: migrate auth to SecretStr + secret_from_env pattern

2f74c29

daveomri added 12 commits April 23, 2026 13:38

fix: backward-compat fix

6258b2b

fix: update stale doc string

2905b67

chore: removed redundant file

3238c02

fix: extracted repeated code, fixed secretstr compatibility to apifytoolsclient

92df406

fix: set min value to timeout, memory and items, add exlude and repr to apify_api_token

3a0f666

feat: added repr and exclude to apify api token

8614cfd

feat: add type checking to apify core tools list

2bf130a

feat: add tests for clamped values and apify api token

98293d4

fix: lint fix

863ed8d

ref: update apify_api_token type to support SecretStr in document loaders

70527e0

Merge branch 'feat/modernize-langchain-integration-core-tools' into feat/modernize-langchain-integration-native-components

797b7f9

fix: turn off logger for ApifySearchRetrieval

f005bc5

daveomri self-assigned this Apr 24, 2026

daveomri changed the title ~~Feat: modernize langchain integration native components~~ feat: modernize langchain integration native components Apr 24, 2026

daveomri added 2 commits April 24, 2026 12:03

fix: fix lint errors

dd08098

fix: tests fix

2804a5c

daveomri wants to merge 44 commits into feat/modernize-langchain-integration from feat/modernize-langchain-integration-native-components

2 participants
