Feat/modernize langchain integration crawl tools by daveomri · Pull Request #31 · apify/langchain-apify

daveomri · 2026-04-29T08:12:33Z

Summary

Third PR on top of feat/modernize-langchain-integration. It builds on the [native components PR](https://github.com/apify/langchain-apify/tree/feat/modernize-langchain-integration-native-components) and adds the Search & Crawling Actor tools layer: four new BaseTool subclasses wrapping search, maps, video, and e-commerce Actors. An upcoming PR will fold this together with social-media tools and documentation onto [feat/modernize-langchain-integration](https://github.com/apify/langchain-apify/tree/feat/modernize-langchain-integration) before merging to main.

New code: ~426 lines — Tests: ~337 lines

Note on scope: This PR is intentionally scoped to the four Search & Crawling Actor tools called out in US-4 (RAG Web Browser, Google Maps, YouTube, E-commerce). Social-media Actor tools and the integration documentation will land as follow-up PRs.

ApifyRAGWebBrowserTool ([_actor_tools.py](langchain_apify/_actor_tools.py))
- Wraps apify/rag-web-browser. Returns JSON with run metadata (run_id / status / dataset_id / timestamps) and items (crawled-page dicts). Distinct from ApifySearchRetriever (which returns LangChain Document objects); this tool returns JSON for agent tool-calling.
ApifyGoogleMapsTool ([_actor_tools.py](langchain_apify/_actor_tools.py))
- Wraps compass/crawler-google-places. Required query, optional max_results (default 10) and language (ISO code). Returns JSON with run metadata and items (place dicts).
ApifyYouTubeScraperTool ([_actor_tools.py](langchain_apify/_actor_tools.py))
- Wraps streamers/youtube-scraper. Required search_query, optional search_type: Literal['search', 'video', 'channel'] (default search), max_results (default 10). The tool's input schema uses a tight Literal at the LLM boundary; the _client.py helper accepts a plain str and raises ValueError at runtime, so direct callers get the same protection.
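
A minimal sketch of the two-layer validation described for search_type, using only the standard library. The names `SearchType` and `youtube_scrape_input` and the returned dict keys are hypothetical and chosen for illustration; they are not taken from the PR's code or from the Actor's input schema.

```python
from typing import Literal, get_args

# Hypothetical alias: the tight type an LLM-facing input schema could declare.
SearchType = Literal["search", "video", "channel"]
_VALID_SEARCH_TYPES = get_args(SearchType)  # ('search', 'video', 'channel')


def youtube_scrape_input(
    search_query: str,
    search_type: str = "search",  # loose str at the client boundary
    max_results: int = 10,
) -> dict:
    """Validate search_type at runtime and return the arguments as a dict.

    The dict keys mirror the tool parameters only; the real Actor input
    field names are not assumed here.
    """
    if search_type not in _VALID_SEARCH_TYPES:
        raise ValueError(
            f"search_type must be one of {_VALID_SEARCH_TYPES}, got {search_type!r}"
        )
    return {
        "search_query": search_query,
        "search_type": search_type,
        "max_results": max_results,
    }
```

With this split, a schema validator can reject bad values before the tool runs, while code that calls the helper directly still hits the ValueError.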
ApifyEcommerceScraperTool ([_actor_tools.py](langchain_apify/_actor_tools.py))
- Wraps apify/e-commerce-scraping-tool. Required url, optional max_results (default 20). Bare-URL design intentionally keeps the LLM-facing surface minimal; selector hints can be added later if real users hit empty-result issues.
APIFY_SEARCH_TOOLS convenience list
- New list[type[BaseTool]] exported alongside APIFY_CORE_TOOLS and APIFY_ACTOR_TOOLS for selective agent binding: [ApifyRAGWebBrowserTool, ApifyGoogleMapsTool, ApifyYouTubeScraperTool, ApifyEcommerceScraperTool].
ApifyToolsClient additions ([_client.py](langchain_apify/_client.py))
- Three new methods (google_maps_search, youtube_scrape, ecommerce_scrape) and one rename + signature change: rag_web_search → rag_web_browser_search, now returning (run, items) like the other helpers so the tool layer can build _run_meta(run). All four reuse the existing run_actor_and_get_items plumbing — transport-error wrapping and _check_run_status come for free.
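
To illustrate why a (run, items) return is convenient for the tool layer, here is a rough sketch of turning that tuple into a JSON string. `run_meta` and `build_tool_output` are hypothetical stand-ins, not the PR's `_run_meta`; the run keys (`id`, `status`, `defaultDatasetId`, `startedAt`, `finishedAt`) follow the Apify run object, and the output nesting is an assumption.

```python
import json
from typing import Any


def run_meta(run: dict[str, Any]) -> dict[str, Any]:
    """Pick the run fields an agent is likely to need (hypothetical helper)."""
    return {
        "run_id": run.get("id"),
        "status": run.get("status"),
        "dataset_id": run.get("defaultDatasetId"),
        "started_at": run.get("startedAt"),
        "finished_at": run.get("finishedAt"),
    }


def build_tool_output(run: dict[str, Any], items: list[dict[str, Any]]) -> str:
    """Serialize run metadata plus dataset items as one JSON string.

    default=str covers datetime values, which a client may return for the
    timestamp fields.
    """
    return json.dumps({"run": run_meta(run), "items": items}, default=str)
```

A retriever that only needs the items can discard the first element of the tuple, while a tool can pass both elements to a builder like the one above.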
ApifySearchRetriever ([retrievers.py](langchain_apify/retrievers.py))
- Single call site updated to consume the new tuple return (_, items = self._client.rag_web_browser_search(...)); behaviour and Document shape are unchanged.
Backward compatible
- No changes to public API of any pre-existing class. ApifyActorsTool / ApifyDatasetLoader / ApifyWrapper untouched. The rag_web_search rename is internal — only the retriever consumed it, and that's updated in-tree.
Tests
- 35 new unit tests covering: input-mapping per helper (asserts Actor ID + run_input keys), youtube_scrape enum validation, happy-path JSON shape per tool, parametrized _TOOL_INVOCATIONS battery covering RuntimeError → ToolException, empty-dataset, handle_tool_error=True swallow, missing-token, plus inheritance / metadata / APIFY_SEARCH_TOOLS membership. Existing test_retrievers.py tests rewired for the new tuple-return helper.
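
The RuntimeError → ToolException part of the battery could look roughly like the sketch below. It runs without LangChain or pytest: `ToolException` here is a local stand-in for the LangChain class of the same name, and `FailingClient`, `call_tool`, and the contents of `_TOOL_INVOCATIONS` are hypothetical. The PR's actual tests are parametrized and may be structured differently.

```python
class ToolException(Exception):
    """Local stand-in for LangChain's ToolException (assumption for this sketch)."""


class FailingClient:
    """Fake client whose every helper raises RuntimeError, as a failed run might."""

    def __getattr__(self, name):
        def _raise(**kwargs):
            raise RuntimeError(f"{name} failed")

        return _raise


# Hypothetical invocation table: (client helper name, keyword arguments).
_TOOL_INVOCATIONS = [
    ("rag_web_browser_search", {"query": "apify"}),
    ("google_maps_search", {"query": "coffee"}),
    ("youtube_scrape", {"search_query": "langchain"}),
    ("ecommerce_scrape", {"url": "https://example.com/product/1"}),
]


def call_tool(client, method, kwargs):
    """Translate client-level RuntimeError into a tool-level ToolException."""
    try:
        return getattr(client, method)(**kwargs)
    except RuntimeError as exc:
        raise ToolException(str(exc)) from exc


def helpers_not_translated():
    """Return the helper names whose RuntimeError did NOT surface as ToolException."""
    failures = []
    for method, kwargs in _TOOL_INVOCATIONS:
        try:
            call_tool(FailingClient(), method, kwargs)
        except ToolException:
            continue
        failures.append(method)
    return failures
```

Driving every tool from one table means the shared error-handling checks are written once and each new tool only needs one more row.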

Review strategy

Merge strategy

This PR targets feat/modernize-langchain-integration, not main. It depends on the [native components PR](https://github.com/apify/langchain-apify/tree/feat/modernize-langchain-integration-native-components) being merged first — _actor_tools.py extends the file introduced there and consumes _run_meta / _ApifyGenericTool from tools.py. Once native components is merged into the integration branch, this PR will be rebased and opened for review. Social-media tools and docs will follow as separate PRs on the same integration branch before the final merge to main.

daveomri added 30 commits April 20, 2026 16:12

feat: implement apifyclient wrapper

8cad430

feat: removed redundant const file

2404b9c

feat: add few more input schemas, helpers and tool classes

b1a89a4

feat: export new tools from __init__

0aa9175

feat: add unit tests

4e46d36

feat: implement tests and introduce tools list

fc6ef12

fix: lint fix

cc5be9e

feat: enhance error handling and documentation for apify tools

c2b9cb6

fix: iso format fix

3edf126

feat: add apify run task and apify run task and get items tools with input schemas

8c36edc

feat: introduce _ApifyGenericTool base class for Apify tools to streamline client handling and error management

026175a

feat: add _actor_tools.py file to define upcoming search and social media tools for apify integration

110c971

fix: add try/except to match others

a08f63e

fix: update timeout constants and improve input schema description in Apify tools

d028531

fix: enhance error handling for missing dataset id in run_actor and run_task methods

429a3ed

fix: update apifygetdatasetitemstool to return a json object with items and message for empty dataset

b914e47

feat: add integration smoke tests for generic Apify tools to validate api interaction

0f71181

feat: implement clamping for timeout, memory, and item limits in apify tools to enforce safety constraints

50c52f2

feat: clean up _actor_tools.py and tools.py for improved readability and maintainability; update test cases for better formatting and error handling

ba179a6

feat: add three new tools to _client.py

da900ce

feat: implement apifygooglesearchtool and apifywebcrawlertool

ff6ffeb

feat: implement an apify search retriever

6e8888c

feat: add apify crawl loader to document_loaders.py

b124ce1

feat: update __init__

029b9e1

feat: add unit tests

c7ee287

feat: add actor tools unit tests

ec60765

feat: add retrievers unit tests

c077186

feat: simplify apify crawl loader init and enhance unit tests

0b4ecbb

ref: align private scope conventions with langchain partner package standards

005294b

ref: migrate auth to SecretStr + secret_from_env pattern

2f74c29

daveomri added 30 commits April 28, 2026 10:40

ref: simplify ApifyToolsClient.__init__ to require explicit token

1360e92

docs: add module-level docstring to tools.py

09b6c6e

ref: rename model_post_init parameter to

a5bd7cc

Merge branch 'feat/modernize-langchain-integration-core-tools' into feat/modernize-langchain-integration-native-components

e0f15e8

revert: restore env-fallback

23242c1

Merge branch 'feat/modernize-langchain-integration-core-tools' into feat/modernize-langchain-integration-native-components

8f9afe6

chore: drop placeholder section in _actor_tools.py

7ea3e8c

chore: align APIFY_ACTOR_TOOLS type hint with APIFY_CORE_TOOLS

700e5ab

feat: constrain crawler_type to a Literal of valid Apify values

c0dd11e

feat: clamp max_crawl_depth in ApifyWebCrawlerTool

0189943

feat: expose timeout_secs in ApifyGoogleSearchInput

6d2422d

ref: accept SecretStr token in ApifyCrawlLoader

2dfecd7

docs: clarify ApifyCrawlLoader.lazy_load is not truly lazy

9c81785

ref: rewrite ApifySearchRetriever to use ApifyToolsClient

49dd4f0

fix: normalise locale codes to lowercase to match Apify Actor schema

a060c14

fix: extract source URL from metadata.url for apify/rag-web-browser

a908467

feat: add Search & Crawling helpers to ApifyToolsClient (rag_web_browser_search, google_maps_search, youtube_scrape, ecommerce_scrape)

b2290a7

feat: cover input mapping and enum validation for new ApifyToolsClient helpers

c4d133b

feat: add ApifyRAGWebBrowserTool, ApifyGoogleMapsTool, ApifyYouTubeScraperTool, ApifyEcommerceScraperTool

f5dd607

feat: expose search tools and APIFY_SEARCH_TOOLS from langchain_apify

5368645

test: cover search tools (happy path + parametrized error / empty / handle_tool_error)

1392e0b

fix: lint fix

45a62d1

fix: send correct detailsUrls/maxProductResults to apify/e-commerce-scraping-tool

3db07fb

fix: return flat [{url,title,content}] array per spec

c973123

feat: support category URLs via url_type parameter

6b825af

fix: use listingUrls (not categoryUrls) for category-mode

ddb4373

fix: use canonical searchQueries (array) field, not searchKeywords

c5607d8

fix: rename actor search group

250e1ac

fix: test fix

f4cf20e

fix: merge tools

1c7aa14

daveomri wants to merge 78 commits into feat/modernize-langchain-integration from feat/modernize-langchain-integration-crawl-tools
