
feat: new course kick off: Scraping with Apify and AI #2275

Open
honzajavorek wants to merge 47 commits into master from honzajavorek/ai-course

honzajavorek (Collaborator) commented Feb 24, 2026

Puts #2174 into action.

Course structure

  • The aim is to provide basic course intro and structure which we can further develop. Most of the stuff is TBD, this is really just a scaffolding.
  • All the pages are unlisted, so we can do continuous deployment, but it's all hidden from the visitors.
  • I re-numbered the sections in academy/platform so that the numbers make some sense after @TC-MO has previously nuked or moved most of the other content.
  • I welcome feedback on where the course is in the academy, the name of the course, the names of the lessons…

First lesson

  • I added the first lesson about chatting with AI and setting up an Actor. I welcome feedback on the tone of the lesson, handling of the level of experience of the audience, and the content of the lesson in general.

Roasting time! 🍖 @tomnosek @patrikbraborec I guess I won't ask you for a review under each and every PR of the new course, but I think we should do this for the kick off to sync expectations. We had some initial discussions, then I've made a few educated guesses and decisions on how to approach this, and in the end I thought it's best if I just create the first lesson right away and let's see what you think about it.

The lesson is also a result of a few dead ends I had the "pleasure" to already explore 😅 I tried to design the lesson for people who are not familiar with coding, but can do basic work on the computer including running commands in the terminal. We could go lower, but then I'd be explaining the terminal itself, or how to copy and paste things, and I don't know if that's useful for this kind of content.


Note

Low Risk
Low risk: documentation-only changes (new unlisted course pages plus sidebar reordering) with no product code or runtime behavior impact.

Overview
Introduces a new unlisted academy course, Scraping with Apify and AI, including a course landing page plus six lesson pages; lesson 1 is fully drafted (walkthrough using ChatGPT + Apify CLI to create a simple Shopify price scraper), while lessons 2–6 are placeholders marked under construction.

Reorders academy/platform navigation by adjusting multiple sidebar_position values, and updates the docs vocabulary allowlist to include crawlee.dev for linting/spellcheck.

Written by Cursor Bugbot for commit 0780c18.

honzajavorek added the t-academy label (Issues related to Web Scraping and Apify academies.) on Feb 24, 2026
apify-service-account commented

Preview for this PR was built for commit 7401333 and is ready at https://pr-2275.preview.docs.apify.com!

honzajavorek changed the title from "New course kick off: Building Actors with AI" to "feat: new course kick off: Building Actors with AI" on Feb 24, 2026
honzajavorek force-pushed the honzajavorek/ai-course branch from 7b24b4d to 983c801 on February 25, 2026 11:36
honzajavorek changed the title from "feat: new course kick off: Building Actors with AI" to "feat: new course kick off: Scraping with Apify and AI" on Feb 25, 2026
honzajavorek marked this pull request as ready for review on February 26, 2026 13:42
honzajavorek requested a review from TC-MO as a code owner on February 26, 2026 13:42
honzajavorek force-pushed the honzajavorek/ai-course branch from 0780c18 to f6d6567 on February 26, 2026 14:26

tomnosek (Contributor) commented Mar 5, 2026

I think that the first article should be about just asking AI to create an Actor and then copy/paste it into the built-in Web IDE. That's the quickest way to get somewhere. If I open the course, the first heading is "Install Node.js", the second heading is "Install Apify CLI" - I'm already like "too complicated, I'm outta here". The current first article could be the next step.

danpoletaev force-pushed the honzajavorek/ai-course branch from f6d6567 to b71a606 on March 6, 2026 21:57
metalwarrior665 force-pushed the honzajavorek/ai-course branch from b71a606 to f6d6567 on March 7, 2026 00:02

honzajavorek (Collaborator, Author) commented

@tomnosek I agree starting with "install Node.js" isn't ideal and I'm aware of it. I didn't start with what you suggest, because:

  • I didn't realize "copy/paste it into the built-in Web IDE" could be a way. I know you mentioned it during our calls, but it slipped through my notes and it didn't pop up in my head on its own – I don't use the Web IDE myself and I basically forgot it exists.
  • I already explored several options on how to start with "just asking AI to create an Actor" and it's actually harder than it seems, because without additional context the basic AI chat, especially the free one, struggles to create Actors properly. It started working only with the template code as an input. The prompts in docs don't help much, as they are tailored towards agentic IDEs with the project set up locally.

I think it's worth finding the easiest way to start, so I'll spend some time exploring whether we could really start with the Web IDE and plain ChatGPT, and what the workflow could be.

The capabilities of ChatGPT are limited and it's a bit of a struggle, but I think it's a really important limitation as most people know only ChatGPT. It really should be the starting point. We pick them up where they already are and bring them to more advanced patterns. Hopefully I can find a way.

tomnosek (Contributor) commented Mar 30, 2026

@honzajavorek I agree with starting with ChatGPT - at least for now, it's the intro to AI/LLM for a broader audience. Maybe it'll be Gemini in the future, but I'm happy with your choice.

What worked well for me in the past was to literally say in the message that I'm going to be using Apify Console to run it with Web IDE for the code and that this is the link to the template to use and this is the link to the docs to use.

honzajavorek force-pushed the honzajavorek/ai-course branch from f6d6567 to b503490 on March 30, 2026 12:53
honzajavorek marked this pull request as draft on March 30, 2026 12:55
apify-service-account commented

A PR to update the Python client models has been created: apify/apify-client-python#713

This was automatically triggered by OpenAPI specification changes in this PR.

honzajavorek (Collaborator, Author) commented

What OpenAPI specification changes?

image

honzajavorek marked this pull request as ready for review on April 10, 2026 08:38
honzajavorek requested a review from tomnosek on April 10, 2026 08:39
honzajavorek (Collaborator, Author) commented

@tomnosek @TC-MO This now contains the first two lessons of the course. Ready for a review!

szaganek (Contributor) left a comment

I left some suggestions plus asked two questions about the order of presenting information, but I think it's shaping up to be a fantastic course. I'll also follow the whole thing as a student and come back with feedback if my results differ :)

There's one more thing on my mind: have you considered using the imperative from time to time (Navigate to X vs. We'll navigate to X or Click X vs. We'll click X) to make the content flow a little better, make it sound more instructional, and avoid future tense when it's not really necessary?


import DocCardList from '@theme/DocCardList';

**Learn how to use AI to extract information from websites in this practical course, starting from the absolute basics.**

Suggested change
**Learn how to use AI to extract information from websites in this practical course, starting from the absolute basics.**
**Learn how to use AI to extract information from websites, starting from the absolute basics.**


---

In this course we'll use AI assistants to create an application for watching prices. It'll be able to scrape product pages of an e-commerce website and record prices. Data from several runs of such program would be useful for seeing trends in price changes, detecting discounts, etc.

Suggested change
In this course we'll use AI assistants to create an application for watching prices. It'll be able to scrape product pages of an e-commerce website and record prices. Data from several runs of such program would be useful for seeing trends in price changes, detecting discounts, etc.
In this practical course, we'll use AI assistants to create an application for watching prices. It'll be able to scrape product pages of an e-commerce website and record prices. Data gathered from several runs of such program would be useful for seeing trends in price changes or detecting discounts.

## What we'll do

- Use ChatGPT (AI chat) to create a program which extracts data from a web page.
- Save extracted data in various formats, e.g. CSV which MS Excel or Google Sheets can open.

Suggested change
- Save extracted data in various formats, e.g. CSV which MS Excel or Google Sheets can open.
- Save extracted data in various formats, for example CSV which MS Excel or Google Sheets can open.

- Use Cursor (AI agent) to improve the program so that it is robust and maintainable.
- Save time and effort with Apify's scraping platform.

Suggested change
- Save time and effort with Apify's scraping platform.
- Save time and effort with the Apify platform.

## Who this course is for

Suggested change
## Who this course is for
## Who is this course for


Try it! The generated code will most likely work out of the box, but the resulting program will still have a few caveats. Some are usability issues:

- _User-operated:_ We have to run the scraper ourselves. If we're tracking price trends, we'd need to remember to run it daily. If we want, for example, alerts for big discounts, manually running the program isn't much better than just checking the site in a browser every day.

Suggested change
- _User-operated:_ We have to run the scraper ourselves. If we're tracking price trends, we'd need to remember to run it daily. If we want, for example, alerts for big discounts, manually running the program isn't much better than just checking the site in a browser every day.
- _User-operated:_ We have to run the scraper ourselves. If we're tracking price trends, we need to remember to run it daily. If we want, for example, alerts for big discounts, manually running the program isn't much better than just checking the site in a browser every day.

- _Manual data management:_ Tracking prices over time means figuring out how to organize the exported data ourselves. Processing the data could also be tricky since different analysis tools often require different formats.

Suggested change
- _Manual data management:_ Tracking prices over time means figuring out how to organize the exported data ourselves. Processing the data could also be tricky since different analysis tools often require different formats.
- _Manual data management:_ Tracking prices over time means figuring out how to organize the exported data ourselves. Processing the data could also be tricky, since different analysis tools often require different formats.


The Actor's detail page has plenty of tabs and settings, but for now we'll stay at **Source** → **Code**. That's where the **Web IDE** is.

IDE stands for _integrated development environment_. Fear not, it's just jargon for ‘an app for editing code, somewhat comfortably’. In the Web IDE, we can browse the files the Actor is made of, and change their contents.

Suggested change
IDE stands for _integrated development environment_. Fear not, it's just jargon for an app for editing code, somewhat comfortably. In the Web IDE, we can browse the files the Actor is made of, and change their contents.
IDE stands for _integrated development environment_. Fear not, it's just jargon for an app for editing code, somewhat comfortably. In the Web IDE, we can browse the files the Actor is made of, and change their contents.


## Creating a new Actor

Your phone runs apps, Apify runs Actors. If we want Apify to run something for us, it must be wrapped in the Actor structure. Conveniently, the platform provides ready-made templates we can use.

This first paragraph feels a little out of place, wouldn't it make more sense before line 57?


:::

First, let's navigate through the tabs to **Source** → **Input**, where we can change what the Actor takes as input. The sample scraper walks through whatever website we give it in the **Start URLs** field. We'll change it to this URL:

Shouldn't this part move to the Scraping products section? I'm missing how it's relevant to ChatGPT.

