diff --git a/python/playwright/download-financial-statements/README.md b/python/playwright/download-financial-statements/README.md new file mode 100644 index 00000000..3349c064 --- /dev/null +++ b/python/playwright/download-financial-statements/README.md @@ -0,0 +1,155 @@ +# Playwright + Browserbase: Download Financial Statements (Python) + +## AT A GLANCE + +- Goal: Automatically download Apple's quarterly financial statements (PDFs) from their investor relations page. +- Uses pure Playwright with Browserbase SDK (no AI/Stagehand required). +- Demonstrates file downloads, page navigation, and the Browserbase downloads API. +- Docs → https://docs.browserbase.com/introduction/playwright + +## GLOSSARY + +- Browserbase SDK: Cloud browser infrastructure that provides managed browser sessions with built-in download handling + Docs → https://docs.browserbase.com/sdk +- CDP (Chrome DevTools Protocol): Low-level protocol for communicating with Chrome/Chromium browsers + Docs → https://chromedevtools.github.io/devtools-protocol/ +- Downloads API: Browserbase feature that captures and retrieves files downloaded during a session + Docs → https://docs.browserbase.com/features/file-downloads + +## QUICKSTART + +1. Create and activate a virtual environment: + ```bash + python -m venv venv + source venv/bin/activate # On Windows: venv\Scripts\activate + ``` + +2. Install dependencies: + ```bash + pip install -e . + playwright install chromium + ``` + +3. Set up environment variables: + ```bash + cp .env.example .env + # Add BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID to .env + ``` + +4. Run the script: + ```bash + python main.py + ``` + +## EXPECTED OUTPUT + +- Console logs showing navigation through Apple's investor relations pages +- Live view URL to watch the automation in real-time +- Downloaded `downloaded_files.zip` containing quarterly financial statement PDFs +- Session replay URL for debugging + +## HOW IT WORKS + +**Navigation Flow:** + +1. Navigate to apple.com +2. 
Scroll to footer and click "Investors" link +3. Navigate to investor relations page +4. Scroll to "Quarterly Earnings Reports" section +5. Click year tab (2025) +6. Click "Financial Statements" links for Q1-Q4 + +**Download Handling:** + +1. Configure CDP download behavior to allow downloads +2. Click PDF links to trigger downloads +3. Poll Browserbase downloads API until files are ready +4. Save downloaded files as a zip archive + +## STAGEHAND VS PLAYWRIGHT + +This template uses **pure Playwright** for browser automation. The Stagehand v3 Python SDK uses a session-based API with **observe** (find actions) and **act** (execute an action), so you describe intent in natural language instead of writing selectors. Here's how they compare: + +| Task | Stagehand v3 — natural language (you describe intent) | Playwright — specific selectors (you target exact elements) | +| ------------ | ------------------------------------------------------- | ------------------------------------------------------------- | +| Click link | *"Click the Investors link"* | `page.get_by_role("link", name="Investors").click()` | +| Scroll | *"Scroll to the Financial Data section"* | `page.evaluate("window.scrollTo(...)")` | +| Find element | *"Find the Financial Statements link under Q4"* | `page.locator("text=Q4").locator("..").get_by_role("link", ...)` | + +**Example - Clicking a link:** + +```python +# Stagehand v3: natural language; observe finds the action, act runs it +session = await client.sessions.create(model_name="openai/gpt-5-nano") +await session.navigate(url="https://apple.com/investor") +observe_resp = await session.observe(instruction="Click the Investors link at the bottom of the page") +action = observe_resp.data.result[0].to_dict(exclude_none=True) +await session.act(input=action) + +# Playwright: you specify the exact element +await page.get_by_role("link", name="Investors").click() +``` + +**Example - Downloading quarterly statements:** + +```python +# Stagehand v3: 
describe what you want in plain language +observe_resp = await session.observe( + instruction="Click the Financial Statements link under Q4" +) +await session.act(input=observe_resp.data.result[0].to_dict(exclude_none=True)) + +# Playwright: build selector logic to find the right link +link = (page.locator("text=Q4").locator("..").locator("..") + .get_by_role("link", name="Financial Statements").first) +await link.click() +``` + +## COMMON PITFALLS + +- Missing credentials: verify .env contains BROWSERBASE_PROJECT_ID and BROWSERBASE_API_KEY +- Playwright not installed: run `playwright install chromium` after pip install +- Download timeout: increase retry_for_seconds if downloads are large or network is slow +- Page structure changes: Apple may update their investor relations page layout +- Find more information on your Browserbase dashboard → https://www.browserbase.com/sign-in + +## USE CASES + +- Financial data collection: Automate downloading quarterly/annual reports from investor relations pages. +- Document archival: Build automated pipelines to archive public financial documents. +- Compliance monitoring: Track and download regulatory filings as they're published. +- Research automation: Collect financial statements across multiple companies for analysis. + +## CUSTOMIZATION + +**Change target company:** +Modify the navigation flow in `main()` to target a different company's investor relations page. + +**Adjust download timeout:** + +```python +await save_downloads_with_retry(bb, session.id, 60) # 60 seconds timeout +``` + +**Download specific quarters:** + +```python +# Only download Q4 and Q3 +await click_financial_statements_link(page, "Q4") +await click_financial_statements_link(page, "Q3") +``` + +## NEXT STEPS + +- Add error recovery: Implement retry logic for failed navigation steps. +- Extract metadata: Parse downloaded PDFs to extract key financial metrics. +- Schedule automation: Run on a schedule to capture new filings as they're published. 
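The "Add error recovery" item above could be approached with a small retry wrapper. The sketch below is a minimal, hypothetical helper: `retry_async`, `attempts`, and `base_delay` are names introduced here for illustration and are not part of `main.py`, Playwright, or the Browserbase SDK. It reruns an async step with exponential backoff and re-raises the last error once the attempts are exhausted.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_async(
    step: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Hypothetical helper: run an async step, retrying on any exception."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    for attempt in range(1, attempts + 1):
        try:
            # Call the factory each time so every attempt gets a fresh coroutine
            return await step()
        except Exception as error:
            # Out of attempts: surface the original error to the caller
            if attempt == attempts:
                raise
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({error}); retrying in {delay:.1f}s...")
            await asyncio.sleep(delay)

    # Not reachable: the loop either returns or re-raises on the final attempt
    raise RuntimeError("retry loop exited unexpectedly")
```

Under those assumptions, a navigation step from `main()` might be wrapped as `await retry_async(lambda: page.goto("https://www.apple.com/", wait_until="domcontentloaded", timeout=60000))`. Retrying the link-click steps needs more care, since a repeated click can trigger a duplicate download.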
+ +## HELPFUL RESOURCES + +📚 Stagehand Docs: https://docs.stagehand.dev/v2/first-steps/introduction +🎮 Browserbase: https://www.browserbase.com +💡 Try it out: https://www.browserbase.com/playground +🔧 Templates: https://www.browserbase.com/templates +📧 Need help? support@browserbase.com +💬 Discord: http://stagehand.dev/discord diff --git a/python/playwright/download-financial-statements/main.py b/python/playwright/download-financial-statements/main.py new file mode 100644 index 00000000..0af7b100 --- /dev/null +++ b/python/playwright/download-financial-statements/main.py @@ -0,0 +1,284 @@ +# Playwright + Browserbase: Download Apple's Quarterly Financial Statements +# See README.md for full documentation + +import asyncio +import os + +from browserbase import Browserbase +from dotenv import load_dotenv +from playwright.async_api import BrowserContext, Page, async_playwright + +# Load environment variables from .env file +# Required: BROWSERBASE_API_KEY, BROWSERBASE_PROJECT_ID +load_dotenv() + + +async def save_downloads_with_retry( + bb: Browserbase, session_id: str, retry_for_seconds: int = 30 +) -> int: + """ + Polls Browserbase API for downloads with timeout handling. + + Browserbase stores downloaded files during a session and makes them available + via API. Files may take a few seconds to process, so this function implements + retry logic to wait for downloads to be ready before retrieving them. 
+ + Args: + bb: Browserbase client instance for API calls + session_id: The Browserbase session ID to retrieve downloads from + retry_for_seconds: Maximum time to wait for downloads (default: 30 seconds) + + Returns: + int: The size of the downloaded ZIP file in bytes + + Raises: + TimeoutError: If downloads aren't ready within the specified timeout + """ + print(f"Waiting up to {retry_for_seconds} seconds for downloads to complete...") + + # Track elapsed time to implement timeout without using threading timers + start_time = asyncio.get_event_loop().time() + timeout = retry_for_seconds + + while True: + elapsed = asyncio.get_event_loop().time() - start_time + + # Check if we've exceeded the timeout period + if elapsed >= timeout: + raise TimeoutError("Download timeout exceeded") + + try: + print("Checking for downloads...") + # Use asyncio.to_thread for synchronous Browserbase SDK calls + # This prevents blocking the event loop while waiting for API responses + response = await asyncio.to_thread(bb.sessions.downloads.list, session_id) + download_buffer = await asyncio.to_thread(response.read) + + # Check if downloads are ready (non-empty buffer indicates files are available) + if len(download_buffer) > 0: + print(f"Downloads ready! File size: {len(download_buffer)} bytes") + + # Save the ZIP file containing all downloaded PDFs to disk + with open("downloaded_files.zip", "wb") as f: + f.write(download_buffer) + print("Files saved as: downloaded_files.zip") + return len(download_buffer) + else: + print("Downloads not ready yet, retrying...") + except Exception as e: + print(f"Error fetching downloads: {e}") + raise + + # Poll every 2 seconds to check if downloads are ready + # This interval balances responsiveness with API rate limits + await asyncio.sleep(2) + + +async def scroll_to_text(page: Page, text: str) -> None: + """ + Scrolls to an element on the page by text content. 
+ + Uses JavaScript evaluation to find elements containing the specified text + and smoothly scrolls them into view. + + Args: + page: Playwright page instance + text: The text content to search for and scroll to + """ + await page.evaluate( + """(searchText) => { + const elements = document.querySelectorAll('*'); + for (const el of elements) { + if (el.textContent?.includes(searchText)) { + el.scrollIntoView({ behavior: 'smooth', block: 'center' }); + break; + } + } + }""", + text, + ) + await asyncio.sleep(0.5) + + +async def click_financial_statements_link(page: Page, quarter: str) -> None: + """ + Clicks a Financial Statements link for a specific quarter. + + Uses context-aware selection to find the right link in the quarterly table. + Falls back to positional selection if the context-based approach fails. + + Args: + page: Playwright page instance + quarter: The quarter identifier (e.g., "Q1", "Q2", "Q3", "Q4") + + Raises: + Exception: If the Financial Statements link cannot be found + """ + print(f"Clicking Financial Statements link for {quarter}...") + + # Try to find the link by traversing from the quarter label to sibling links + link = ( + page.locator(f"text={quarter}") + .locator("..") + .locator("..") + .get_by_role("link", name="Financial Statements") + .first + ) + + link_exists = await link.count() > 0 + + if link_exists: + await link.click() + else: + # Fallback: find all Financial Statements links and click by position + # Q4 is first (index 0), Q3 is second (index 1), etc. 
+ all_links = page.get_by_role("link", name="Financial Statements") + count = await all_links.count() + + quarter_positions = { + "Q4": 0, + "Q3": 1, + "Q2": 2, + "Q1": 3, + } + + position = quarter_positions.get(quarter) + if position is not None and position < count: + await all_links.nth(position).click() + else: + raise Exception(f"Could not find Financial Statements link for {quarter}") + + # Wait for download to initiate before clicking next link + await asyncio.sleep(2) + + +async def main(): + """ + Main application entry point. + + Orchestrates the entire PDF download automation process: + 1. Initializes Browserbase client and creates a session + 2. Connects Playwright to Browserbase via CDP + 3. Navigates to Apple's investor relations site + 4. Locates and clicks quarterly financial statement links + 5. Waits for downloads to process and saves them as a ZIP file + """ + print("Starting Apple Financial Statements Download Automation (Playwright)...") + + # Initialize Browserbase SDK client for cloud browser management + print("Initializing Browserbase client...") + bb = Browserbase(api_key=os.environ.get("BROWSERBASE_API_KEY")) + + # Create a new browser session in Browserbase cloud + print("Creating Browserbase session...") + session = bb.sessions.create(project_id=os.environ.get("BROWSERBASE_PROJECT_ID")) + print(f"Session created: https://browserbase.com/sessions/{session.id}") + + # Display live view URL for debugging and monitoring + live_view_links = bb.sessions.debug(session.id) + print(f"Live View: {live_view_links.debugger_fullscreen_url}") + + async with async_playwright() as playwright: + # Connect Playwright to Browserbase via Chrome DevTools Protocol (CDP) + # This gives direct control over the cloud-hosted browser + browser = await playwright.chromium.connect_over_cdp(session.connect_url) + + # Check for an existing context before indexing; contexts[0] on an empty list raises IndexError + if not browser.contexts: + raise Exception("No browser context found") + context: BrowserContext = browser.contexts[0] + + # Take the first page if one exists; None falls through to the check below + page = next(iter(context.pages), None) + if not 
page: + raise Exception("No page found in browser context") + + # Configure CDP to allow file downloads during the session + # eventsEnabled: true allows tracking download progress + client = await context.new_cdp_session(page) + await client.send( + "Browser.setDownloadBehavior", + { + "behavior": "allow", + "downloadPath": "downloads", + "eventsEnabled": True, + }, + ) + print("Download behavior configured") + + try: + # Navigate to Apple homepage with extended timeout for slow-loading sites + print("Navigating to Apple.com...") + await page.goto( + "https://www.apple.com/", + wait_until="domcontentloaded", + timeout=60000, + ) + + # Scroll to footer where investor links are located + print("Scrolling to footer...") + await page.evaluate("window.scrollTo(0, document.body.scrollHeight)") + await asyncio.sleep(1) + + # Navigate to investor relations section + print("Clicking Investors link...") + await page.get_by_role("link", name="Investors").click() + await page.wait_for_load_state("domcontentloaded") + print(f"Navigated to: {page.url}") + + # Scroll to the Financial Data section of the investor relations page + print("Scrolling to Financial Data section...") + await scroll_to_text(page, "Financial Data") + await asyncio.sleep(1) + + # Locate the Quarterly Earnings Reports table + print("Locating Quarterly Earnings Reports...") + await scroll_to_text(page, "Quarterly Earnings Reports") + await asyncio.sleep(1) + + # Click on the 2025 year tab to show current year's reports + year_tab = page.locator("text=2025").first + if await year_tab.is_visible(): + print("Clicking 2025 year tab...") + await year_tab.click() + await asyncio.sleep(1) + + # Download all quarterly financial statements + # When a PDF link is clicked, Browserbase automatically captures and stores the file + # See https://docs.browserbase.com/features/screenshots#pdfs for more info + print("\nDownloading quarterly financial statements...") + + await click_financial_statements_link(page, "Q4") + 
await click_financial_statements_link(page, "Q3") + await click_financial_statements_link(page, "Q2") + await click_financial_statements_link(page, "Q1") + + print("\nAll PDF links clicked. Waiting for downloads to sync...") + + # Retrieve all downloads triggered during this session from Browserbase API + print("Retrieving downloads from Browserbase...") + await save_downloads_with_retry(bb, session.id, 45) + print("\nAll downloads completed successfully!") + + except Exception as error: + print(f"Error during automation: {error}") + raise + finally: + # Always close browser to release resources and end session + await browser.close() + print("Browser closed, session ended") + print(f"\nView session replay: https://browserbase.com/sessions/{session.id}") + + +if __name__ == "__main__": + # Entry point for script execution + # asyncio.run() creates event loop and runs main() coroutine until completion + try: + asyncio.run(main()) + except Exception as err: + # Handle any uncaught exceptions and provide helpful debugging information + print(f"Application error: {err}") + print("Common issues:") + print(" - Check .env file has BROWSERBASE_PROJECT_ID and BROWSERBASE_API_KEY") + print(" - Verify internet connection and Apple website accessibility") + print(" - Ensure sufficient timeout for slow-loading pages") + print("Docs: https://docs.browserbase.com/introduction/playwright") + exit(1) diff --git a/python/playwright/download-financial-statements/pyproject.toml b/python/playwright/download-financial-statements/pyproject.toml new file mode 100644 index 00000000..842e2dc7 --- /dev/null +++ b/python/playwright/download-financial-statements/pyproject.toml @@ -0,0 +1,33 @@ +[project] +name = "download-financial-statements" +version = "0.1.0" +description = "Download Apple's Quarterly Financial Statements using Playwright and Browserbase" +readme = "README.md" +requires-python = ">=3.9" +dependencies = [ + "browserbase", + "playwright", + "python-dotenv", +] + 
+[project.optional-dependencies] +dev = [ + "pytest>=7.0.0", + "black>=23.0.0", + "ruff>=0.1.0", +] + +[build-system] +requires = ["setuptools>=61.0", "wheel"] +build-backend = "setuptools.build_meta" + +[tool.black] +line-length = 100 +target-version = ['py39', 'py310', 'py311'] + +[tool.ruff] +line-length = 100 +target-version = "py39" + +[tool.ruff.lint] +select = ["E", "F", "I", "N", "W"] diff --git a/typescript/playwright/download-financial-statements/README.md b/typescript/playwright/download-financial-statements/README.md new file mode 100644 index 00000000..544ec795 --- /dev/null +++ b/typescript/playwright/download-financial-statements/README.md @@ -0,0 +1,129 @@ +# Playwright + Browserbase: Download Financial Statements + +## AT A GLANCE + +- Goal: Automatically download Apple's quarterly financial statements (PDFs) from their investor relations page. +- Uses pure Playwright with Browserbase SDK (no AI/Stagehand required). +- Demonstrates file downloads, page navigation, and the Browserbase downloads API. +- Docs → https://docs.browserbase.com/introduction/playwright + +## GLOSSARY + +- Browserbase SDK: Cloud browser infrastructure that provides managed browser sessions with built-in download handling + Docs → https://docs.browserbase.com/sdk +- CDP (Chrome DevTools Protocol): Low-level protocol for communicating with Chrome/Chromium browsers + Docs → https://chromedevtools.github.io/devtools-protocol/ +- Downloads API: Browserbase feature that captures and retrieves files downloaded during a session + Docs → https://docs.browserbase.com/features/file-downloads + +## QUICKSTART + +1. pnpm install +2. cp .env.example .env +3. Add required API keys/IDs to .env +4. 
pnpm start + +## EXPECTED OUTPUT + +- Console logs showing navigation through Apple's investor relations pages +- Live view URL to watch the automation in real-time +- Downloaded `downloaded_files.zip` containing quarterly financial statement PDFs +- Session replay URL for debugging + +## HOW IT WORKS + +**Navigation Flow:** + +1. Navigate to apple.com +2. Scroll to footer and click "Investors" link +3. Navigate to investor relations page +4. Scroll to "Quarterly Earnings Reports" section +5. Click year tab (2025) +6. Click "Financial Statements" links for Q1-Q4 + +**Download Handling:** + +1. Configure CDP download behavior to allow downloads +2. Click PDF links to trigger downloads +3. Poll Browserbase downloads API until files are ready +4. Save downloaded files as a zip archive + +## STAGEHAND VS PLAYWRIGHT + +This template uses **pure Playwright** for browser automation. The Stagehand version of this template uses AI-powered natural language commands instead. Here's how they compare: + +| Task | Stagehand (AI) | Playwright (Selectors) | +|------|----------------|------------------------| +| Click link | `await stagehand.act("Click 'Investors'")` | `await page.getByRole("link", { name: "Investors" }).click()` | +| Scroll | `await stagehand.act("Scroll to Financial Data")` | `await page.evaluate(() => window.scrollTo(...))` | +| Find element | AI interprets intent | Explicit selectors required | + +**Example - Clicking a link:** + +```typescript +// Stagehand: Natural language, AI finds the element +await stagehand.act("Click the 'Investors' button at the bottom of the page"); + +// Playwright: Explicit selector, you specify how to find it +await page.getByRole("link", { name: "Investors" }).click(); +``` + +**Example - Downloading quarterly statements:** + +```typescript +// Stagehand: AI understands context +await stagehand.act("Click the 'Financial Statements' link under Q4"); + +// Playwright: Must build selector logic to find correct link +const link = 
page.locator("text=Q4").locator("..").locator("..") + .getByRole("link", { name: /Financial Statements/i }).first(); +await link.click(); +``` + +## COMMON PITFALLS + +- Missing credentials: verify .env contains BROWSERBASE_PROJECT_ID and BROWSERBASE_API_KEY +- Download timeout: increase retryForSeconds if downloads are large or network is slow +- Page structure changes: Apple may update their investor relations page layout +- Find more information on your Browserbase dashboard → https://www.browserbase.com/sign-in + +## USE CASES + +- Financial data collection: Automate downloading quarterly/annual reports from investor relations pages. +- Document archival: Build automated pipelines to archive public financial documents. +- Compliance monitoring: Track and download regulatory filings as they're published. +- Research automation: Collect financial statements across multiple companies for analysis. + +## CUSTOMIZATION + +**Change target company:** +Modify the navigation flow in `main()` to target a different company's investor relations page. + +**Adjust download timeout:** + +```typescript +await saveDownloadsWithRetry(bb, session.id, 60); // 60 seconds timeout +``` + +**Download specific quarters:** + +```typescript +// Only download Q4 and Q3 +await clickFinancialStatementsLink(page, "Q4"); +await clickFinancialStatementsLink(page, "Q3"); +``` + +## NEXT STEPS + +- Add error recovery: Implement retry logic for failed navigation steps. +- Extract metadata: Parse downloaded PDFs to extract key financial metrics. +- Schedule automation: Run on a schedule to capture new filings as they're published. + +## HELPFUL RESOURCES + +📚 Stagehand Docs: https://docs.stagehand.dev/v2/first-steps/introduction +🎮 Browserbase: https://www.browserbase.com +💡 Try it out: https://www.browserbase.com/playground +🔧 Templates: https://www.browserbase.com/templates +📧 Need help? 
support@browserbase.com +💬 Discord: http://stagehand.dev/discord diff --git a/typescript/playwright/download-financial-statements/index.ts b/typescript/playwright/download-financial-statements/index.ts new file mode 100644 index 00000000..10e9ba4e --- /dev/null +++ b/typescript/playwright/download-financial-statements/index.ts @@ -0,0 +1,238 @@ +// Playwright + Browserbase: Download Apple's Quarterly Financial Statements - See README.md for full documentation + +import { chromium, Page, BrowserContext } from "playwright-core"; +import { Browserbase } from "@browserbasehq/sdk"; +import fs from "fs"; +import "dotenv/config"; + +/** + * Polls Browserbase API for downloads with timeout handling. + * Retries every 2 seconds until downloads are ready or timeout is reached. + */ +async function saveDownloadsWithRetry( + bb: Browserbase, + sessionId: string, + retryForSeconds: number = 30, +): Promise<number> { + return new Promise((resolve, reject) => { + console.log(`Waiting up to ${retryForSeconds} seconds for downloads to complete...`); + + const intervals = { + poller: undefined as NodeJS.Timeout | undefined, + timeout: undefined as NodeJS.Timeout | undefined, + }; + + async function fetchDownloads(): Promise<void> { + try { + console.log("Checking for downloads..."); + const response = await bb.sessions.downloads.list(sessionId); + const downloadBuffer: ArrayBuffer = await response.arrayBuffer(); + + if (downloadBuffer.byteLength > 0) { + console.log(`Downloads ready! 
File size: ${downloadBuffer.byteLength} bytes`); + fs.writeFileSync("downloaded_files.zip", Buffer.from(downloadBuffer)); + console.log("Files saved as: downloaded_files.zip"); + + if (intervals.poller) clearInterval(intervals.poller); + if (intervals.timeout) clearTimeout(intervals.timeout); + resolve(downloadBuffer.byteLength); + } else { + console.log("Downloads not ready yet, retrying..."); + } + } catch (e: unknown) { + console.error("Error fetching downloads:", e); + if (intervals.poller) clearInterval(intervals.poller); + if (intervals.timeout) clearTimeout(intervals.timeout); + reject(e); + } + } + + // Set timeout to prevent infinite polling if downloads never complete + intervals.timeout = setTimeout(() => { + if (intervals.poller) { + clearInterval(intervals.poller); + } + reject(new Error("Download timeout exceeded")); + }, retryForSeconds * 1000); + + // Poll every 2 seconds to check if downloads are ready + intervals.poller = setInterval(fetchDownloads, 2000); + }); +} + +/** + * Scrolls to an element on the page by text content. + * Uses evaluate to find and scroll to matching elements. + */ +async function scrollToText(page: Page, text: string): Promise<void> { + await page.evaluate((searchText) => { + const elements = document.querySelectorAll("*"); + for (const el of elements) { + if (el.textContent?.includes(searchText)) { + el.scrollIntoView({ behavior: "smooth", block: "center" }); + break; + } + } + }, text); + await new Promise((resolve) => setTimeout(resolve, 500)); +} + +/** + * Clicks a Financial Statements link for a specific quarter. + * Uses context-aware selection to find the right link in the quarterly table. 
+ */ +async function clickFinancialStatementsLink(page: Page, quarter: string): Promise<void> { + console.log(`Clicking Financial Statements link for ${quarter}...`); + + // Try to find the link by traversing from the quarter label to sibling links + const link = page + .locator(`text=${quarter}`) + .locator("..") + .locator("..") + .getByRole("link", { name: /Financial Statements/i }) + .first(); + + const linkExists = (await link.count()) > 0; + + if (linkExists) { + await link.click(); + } else { + // Fallback: find all Financial Statements links and click by position + // Q4 is first (index 0), Q3 is second (index 1), etc. + const allLinks = page.getByRole("link", { name: /Financial Statements/i }); + const count = await allLinks.count(); + + const quarterPositions: { [key: string]: number } = { + Q4: 0, + Q3: 1, + Q2: 2, + Q1: 3, + }; + + const position = quarterPositions[quarter]; + if (position !== undefined && position < count) { + await allLinks.nth(position).click(); + } else { + throw new Error(`Could not find Financial Statements link for ${quarter}`); + } + } + + // Wait for download to initiate before clicking next link + await new Promise((resolve) => setTimeout(resolve, 2000)); +} + +async function main(): Promise<void> { + console.log("Starting Apple Financial Statements Download Automation (Playwright)..."); + + // Initialize Browserbase SDK client for cloud browser management + console.log("Initializing Browserbase client..."); + const bb = new Browserbase({ + apiKey: process.env.BROWSERBASE_API_KEY!, + }); + + // Create a new browser session in Browserbase cloud + console.log("Creating Browserbase session..."); + const session = await bb.sessions.create({ + projectId: process.env.BROWSERBASE_PROJECT_ID!, + }); + console.log(`Session created: https://browserbase.com/sessions/${session.id}`); + + // Display live view URL for debugging and monitoring + const liveViewLinks = await bb.sessions.debug(session.id); + console.log(`Live View: 
${liveViewLinks.debuggerFullscreenUrl}`); + + // Connect Playwright to Browserbase via Chrome DevTools Protocol (CDP) + // This gives direct control over the cloud-hosted browser + const browser = await chromium.connectOverCDP(session.connectUrl); + const context: BrowserContext = browser.contexts()[0]; + if (!context) { + throw new Error("No browser context found"); + } + const page: Page = context.pages()[0]; + if (!page) { + throw new Error("No page found in browser context"); + } + + // Configure CDP to allow file downloads during the session + // eventsEnabled: true allows tracking download progress + const client = await context.newCDPSession(page); + await client.send("Browser.setDownloadBehavior", { + behavior: "allow", + downloadPath: "downloads", + eventsEnabled: true, + }); + console.log("Download behavior configured"); + + try { + // Navigate to Apple homepage with extended timeout for slow-loading sites + console.log("Navigating to Apple.com..."); + await page.goto("https://www.apple.com/", { + waitUntil: "domcontentloaded", + timeout: 60000, + }); + + // Scroll to footer where investor links are located + console.log("Scrolling to footer..."); + await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight)); + await new Promise((resolve) => setTimeout(resolve, 1000)); + + // Navigate to investor relations section + console.log("Clicking Investors link..."); + await page.getByRole("link", { name: "Investors" }).click(); + await page.waitForLoadState("domcontentloaded"); + console.log(`Navigated to: ${page.url()}`); + + // Scroll to the Financial Data section of the investor relations page + console.log("Scrolling to Financial Data section..."); + await scrollToText(page, "Financial Data"); + await new Promise((resolve) => setTimeout(resolve, 1000)); + + // Locate the Quarterly Earnings Reports table + console.log("Locating Quarterly Earnings Reports..."); + await scrollToText(page, "Quarterly Earnings Reports"); + await new Promise((resolve) 
=> setTimeout(resolve, 1000)); + + // Click on the 2025 year tab to show current year's reports + const yearTab = page.locator("text=2025").first(); + if (await yearTab.isVisible()) { + console.log("Clicking 2025 year tab..."); + await yearTab.click(); + await new Promise((resolve) => setTimeout(resolve, 1000)); + } + + // Download all quarterly financial statements + // When a PDF link is clicked, Browserbase automatically captures and stores the file + // See https://docs.browserbase.com/features/screenshots#pdfs for more info + console.log("\nDownloading quarterly financial statements..."); + + await clickFinancialStatementsLink(page, "Q4"); + await clickFinancialStatementsLink(page, "Q3"); + await clickFinancialStatementsLink(page, "Q2"); + await clickFinancialStatementsLink(page, "Q1"); + + console.log("\nAll PDF links clicked. Waiting for downloads to sync..."); + + // Retrieve all downloads triggered during this session from Browserbase API + console.log("Retrieving downloads from Browserbase..."); + await saveDownloadsWithRetry(bb, session.id, 45); + console.log("\nAll downloads completed successfully!"); + } catch (error) { + console.error("Error during automation:", error); + throw error; + } finally { + // Always close browser to release resources and end session + await browser.close(); + console.log("Browser closed, session ended"); + console.log(`\nView session replay: https://browserbase.com/sessions/${session.id}`); + } +} + +main().catch((err) => { + console.error("Application error:", err); + console.error("Common issues:"); + console.error(" - Check .env file has BROWSERBASE_PROJECT_ID and BROWSERBASE_API_KEY"); + console.error(" - Verify internet connection and Apple website accessibility"); + console.error(" - Ensure sufficient timeout for slow-loading pages"); + console.error("Docs: https://docs.browserbase.com/introduction/playwright"); + process.exit(1); +}); diff --git a/typescript/playwright/download-financial-statements/package.json 
b/typescript/playwright/download-financial-statements/package.json new file mode 100644 index 00000000..7276d53a --- /dev/null +++ b/typescript/playwright/download-financial-statements/package.json @@ -0,0 +1,21 @@ +{ + "name": "download-financial-statements", + "version": "1.0.0", + "description": "Playwright + Browserbase: Download Apple's Quarterly Financial Statements", + "type": "module", + "main": "index.ts", + "scripts": { + "start": "tsx index.ts" + }, + "dependencies": { + "@browserbasehq/sdk": "latest", + "dotenv": "latest", + "playwright-core": "latest" + }, + "devDependencies": { + "@types/node": "latest", + "tsx": "latest", + "typescript": "latest" + }, + "packageManager": "pnpm@9.0.0" +}