feat: modernize langchain integration native components #29
Draft
daveomri wants to merge 44 commits into feat/modernize-langchain-integration from
…mline client handling and error management
…media tools for apify integration
…ms and message for empty dataset
…y tools to enforce safety constraints
…and maintability; update test cases for better formatting and error handling
…to apify_api_token
…eat/modernize-langchain-integration-native-components
Summary
Second PR on top of feat/modernize-langchain-integration; builds on the core tools PR and adds the LangChain-native components layer: two Actor-specific tools, a search retriever, and a crawl loader. Upcoming PRs will add social media tools and documentation to feat/modernize-langchain-integration before it is all merged to main.

New code: ~690 lines - Tests: ~545 lines

- ApifyGoogleSearchTool (_actor_tools.py): Wraps apify/google-search-scraper behind a simplified, LLM-friendly interface. Returns a JSON array of {title, url, description} objects. Inherits _ApifyGenericTool's safety clamping and handle_tool_error=True.
- ApifyWebCrawlerTool (_actor_tools.py): Wraps apify/website-content-crawler. Returns a JSON array of {url, title, content (markdown)} objects with configurable max_crawl_pages, max_crawl_depth, and crawler_type. Reuses _clamp_timeout / _clamp_items for safety ceilings.
- ApifySearchRetriever (retrievers.py): BaseRetriever backed by apify/rag-web-browser. Provides both _get_relevant_documents (sync) and _aget_relevant_documents (async) via ApifyClient / ApifyClientAsync. Yields Document objects with source and title metadata, ready to drop into any LangChain RAG pipeline. Actor-run logs are suppressed via logger=None.
- ApifyCrawlLoader (document_loaders.py): BaseLoader that wraps ApifyToolsClient.crawl_website and maps each crawled page to a Document with source, title, and crawl_depth metadata. Supports both load() and lazy_load().
- APIFY_ACTOR_TOOLS convenience list: list[type] exported alongside APIFY_CORE_TOOLS for selective agent binding: [ApifyGoogleSearchTool, ApifyWebCrawlerTool].
- ApifyToolsClient additions (_client.py): google_search, crawl_website, and rag_web_search. All reuse the existing run_actor_and_get_items + _list_items_or_raise plumbing, so timeout/memory/dataset error handling is consistent with core tools.
- Tests: test_actor_tools.py (~184 lines), test_retrievers.py (~224 lines, sync + async), expanded test_document_loaders.py (+139 lines covering ApifyCrawlLoader), and test_client.py (+151 lines for the three new client methods). Error scenarios covered: missing token, Actor run failure, network error, empty / missing-metadata results, markdown vs. text fallback, async path.

Review strategy
Suggested reading order:
1. _client.py: the three new ApifyToolsClient methods (google_search, crawl_website, rag_web_search); each follows the same pattern as the core-tools methods
2. _actor_tools.py: the two _ApifyGenericTool subclasses (homogeneous; once one clicks, the other reads fast)
3. retrievers.py: ApifySearchRetriever with its sync/async pair and the shared _items_to_documents helper
4. document_loaders.py: the new ApifyCrawlLoader alongside the existing ApifyDatasetLoader

Merge strategy
This PR targets feat/modernize-langchain-integration, not main. It depends on the core tools PR being merged first: _actor_tools.py subclasses _ApifyGenericTool, and the loader relies on ApifyToolsClient. Once core tools is merged into the integration branch, this PR will be rebased and opened for review. Social-media tools and docs will follow as separate PRs on the same integration branch before the final merge to main.
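
For reviewers who want a mental model before opening retrievers.py, the item-to-Document mapping described in the summary (source and title metadata, markdown vs. text fallback, empty results) might look roughly like the sketch below. It illustrates the idea and is not the code in this PR: the function name items_to_documents, the item keys (markdown, text, metadata.url, metadata.title), and the choice to skip items with no content are all assumptions, and the small Document dataclass is a stand-in for langchain_core.documents.Document so the snippet runs without LangChain installed.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Document:
    """Stand-in for langchain_core.documents.Document (assumption, keeps the sketch dependency-free)."""

    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def items_to_documents(items: list[dict[str, Any]]) -> list[Document]:
    """Map Actor dataset items to Documents.

    Hypothetical item shape: {"markdown": str, "text": str,
    "metadata": {"url": str, "title": str}}. Every key is optional.
    """
    documents: list[Document] = []
    for item in items:
        # Prefer markdown; fall back to plain text when markdown is missing or empty.
        content = item.get("markdown") or item.get("text") or ""
        if not content:
            # Assumption: items without any usable content are skipped.
            continue
        meta = item.get("metadata") or {}
        documents.append(
            Document(
                page_content=content,
                metadata={
                    # Missing metadata degrades to empty strings instead of raising.
                    "source": meta.get("url", ""),
                    "title": meta.get("title", ""),
                },
            )
        )
    return documents


if __name__ == "__main__":
    sample_items = [
        {"markdown": "# Hello", "metadata": {"url": "https://example.com", "title": "Hello"}},
        {"text": "plain text only"},  # no markdown, no metadata
        {"metadata": {"url": "https://example.com/empty"}},  # no content at all
    ]
    for doc in items_to_documents(sample_items):
        print(doc.metadata["source"], "|", doc.page_content)
```

Keeping the mapping in one pure function is what would let a sync and an async retriever path share it, since neither the Apify client nor the event loop is involved at this step.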