

@nurtoltor commented Jan 26, 2026

Solution

I built a small carousel scraper using Nokogiri and regular expressions. It finds the best Knowledge Graph carousel container by looking for data-attrid sections and for links whose URL includes stick=, then extracts the fields for each item. When locating fields, I prioritised semantic HTML attributes (role, aria-label, alt, title) over class names. The scraper also skips "show more" items and only includes images that are already present in the HTML (data:image URIs, encrypted-tbn URLs, or knowledgecard icons).
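The item-level rules above can be sketched as small predicates. This is a minimal illustration, not the code in lib/: links are plain attribute hashes instead of Nokogiri nodes, the helper names (carousel_link?, show_more?, item_name, inline_image) are hypothetical, and the aria-label → alt → title lookup order is an assumption.

```ruby
# Hypothetical helpers; attribute names mirror the HTML attributes described
# above, but the helper names and the name lookup order are assumptions.
NAME_ATTRIBUTES = %w[aria-label alt title].freeze

# Carousel item links point to a search URL carrying a stick= parameter.
def carousel_link?(attrs)
  attrs["href"].to_s.include?("stick=")
end

# Skip the trailing "Show more" tile (case-insensitive, whitespace-trimmed).
def show_more?(attrs)
  attrs["aria-label"].to_s.strip.casecmp?("show more")
end

# Prefer semantic attributes: first non-blank value wins.
def item_name(attrs)
  NAME_ATTRIBUTES.map { |a| attrs[a].to_s.strip }.find { |v| !v.empty? }
end

# Accept only data:image, encrypted-tbn and knowledgecard sources.
def inline_image(src)
  return nil if src.nil?

  allowed = src.start_with?("data:image") ||
            src.include?("encrypted-tbn") ||
            src.include?("knowledgecard")
  allowed ? src : nil
end

links = [
  { "href" => "/search?q=starry+night&stick=abc", "aria-label" => "The Starry Night",
    "img" => "data:image/jpeg;base64,AAAA" },
  { "href" => "/search?q=irises&stick=def", "title" => "Irises",
    "img" => "https://example.com/irises.jpg" },
  { "href" => "/search?q=more&stick=ghi", "aria-label" => "Show more" },
  { "href" => "/preferences" }
]

# Keep stick= links, drop "show more", then build compact item hashes
# (Hash#compact removes the "image" key when the source is not allowed).
items = links.select { |l| carousel_link?(l) }.reject { |l| show_more?(l) }.map do |l|
  { "name" => item_name(l), "image" => inline_image(l["img"]) }.compact
end

p items.map { |i| i["name"] }      # prints ["The Starry Night", "Irises"]
p items.map { |i| i.key?("image") } # prints [true, false]
```

Keeping each rule as a separate predicate makes it easy to cover them individually in specs.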

The output is a hash whose key matches the selected tab of the search results (e.g., artworks, cast, albums). If no tab is selected, the key defaults to results.
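A sketch of how the output key could be derived, assuming the selected tab's label text is already available. The helper names output_key and build_output are hypothetical, and the normalisation (lowercase, non-alphanumerics collapsed to underscores) is an assumption; the PR only states that the key matches the tab and falls back to results.

```ruby
require "json"

# Hypothetical: turn the selected tab's label into the output key,
# falling back to "results" when no tab is selected.
def output_key(selected_tab_label)
  label = selected_tab_label.to_s.strip
  return "results" if label.empty?

  label.downcase.gsub(/[^a-z0-9]+/, "_").gsub(/\A_|_\z/, "")
end

# Wrap the extracted items under that key.
def build_output(selected_tab_label, items)
  { output_key(selected_tab_label) => items }
end

puts JSON.generate(build_output("Artworks", [{ "name" => "The Starry Night" }]))
# prints {"artworks":[{"name":"The Starry Night"}]}
puts JSON.generate(build_output(nil, []))
# prints {"results":[]}
```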

Structure

  • lib/carousel_scraper.rb: Orchestrates the extraction and chooses the correct carousel scope.
  • lib/carousel_item_extractor.rb: Extracts name, extensions, link, and image from a single item link.
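The split between the two files can be illustrated with a small skeleton. This is a hedged sketch: the class names are inferred from the file names, containers and links are plain hashes rather than Nokogiri elements, and the scoring rule (the data-attrid container with the most stick= links wins) is one plausible reading of "best carousel container", not necessarily the exact rule in lib/carousel_scraper.rb.

```ruby
# Hypothetical skeleton; class names inferred from the file names.
class CarouselItemExtractor
  def initialize(link)
    @link = link
  end

  # Simplified: only name and link; extensions and image are omitted here.
  def call
    { "name" => @link["aria-label"] || @link["title"], "link" => @link["href"] }
  end
end

class CarouselScraper
  def initialize(containers, selected_tab: nil)
    @containers = containers
    @selected_tab = selected_tab
  end

  # Choose the scope, then delegate each carousel link to the extractor.
  def call
    scope = best_scope
    links = scope ? scope["links"].select { |l| stick_link?(l) } : []
    { (@selected_tab || "results") => links.map { |l| CarouselItemExtractor.new(l).call } }
  end

  private

  def stick_link?(link)
    link["href"].to_s.include?("stick=")
  end

  # Among containers with data-attrid, pick the one with the most stick=
  # links; return nil when none has any.
  def best_scope
    scored = @containers.select { |c| c["data-attrid"] }.map do |c|
      [c, c["links"].count { |l| stick_link?(l) }]
    end
    best, score = scored.max_by { |_, s| s }
    score.to_i.positive? ? best : nil
  end
end

containers = [
  { "data-attrid" => nil, "links" => [{ "href" => "/search?q=a&stick=1", "aria-label" => "Noise" }] },
  { "data-attrid" => "example:works", "links" => [
    { "href" => "/search?q=starry+night&stick=2", "aria-label" => "The Starry Night" },
    { "href" => "/search?q=irises&stick=3", "title" => "Irises" },
    { "href" => "/imghp" }
  ] }
]

result = CarouselScraper.new(containers, selected_tab: "artworks").call
p result.keys                              # prints ["artworks"]
p result["artworks"].map { |i| i["name"] } # prints ["The Starry Night", "Irises"]
```

Keeping scope selection in the scraper and per-item parsing in the extractor means a new results page usually only needs changes in one of the two files.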

To find common patterns, I also tested against three other results pages:

  • "David Bowie albums" search: files/david-bowie-albums.html
  • "George Orwell books" search: files/george-orwell-books.html
  • "Lord of the Rings cast" search: files/lord-of-the-rings-cast.html

How to run

Install dependencies:

bundle install

Run with the default Van Gogh paintings HTML (outputs to files/van-gogh-paintings-expected-array.json):

ruby main.rb

Run with a specific HTML file (the JSON output is written to the same directory as the input):

ruby main.rb files/david-bowie-albums.html
ruby main.rb files/george-orwell-books.html
ruby main.rb files/lord-of-the-rings-cast.html
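A possible shape for the "same directory" output logic, as a hedged sketch. The helper names are hypothetical, and the exact output file name for non-default inputs is not stated, so the ".json" suffix is an assumption; the suffix parameter shows how a name like the default's -expected-array.json could be produced instead.

```ruby
require "json"

# Hypothetical: place the JSON next to the input HTML file.
# The default ".json" suffix is an assumption.
def output_path_for(input_path, suffix: ".json")
  dir = File.dirname(input_path)
  base = File.basename(input_path, ".html")
  File.join(dir, "#{base}#{suffix}")
end

# Serialize the scraper's hash and write it; returns the path written.
def write_results(input_path, results)
  path = output_path_for(input_path)
  File.write(path, JSON.pretty_generate(results))
  path
end

puts output_path_for("files/david-bowie-albums.html")
# prints files/david-bowie-albums.json
```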

Run the tests:

bundle exec rspec

@nurtoltor changed the title from "Initial setup: add basic gems and html carrousel examples" to "Code challenge solution" on Jan 26, 2026
@nurtoltor marked this pull request as ready for review on January 26, 2026 at 23:31