20 years of AUS course data, one SQLite file.
> [!WARNING]
> Do not run the crawler unless you know what you are doing. The crawler makes tens of thousands of requests to AUS Banner and can easily overwhelm the server if misconfigured, which can result in service disruption and get you in trouble with the university. A pre-built database (`aus_courses.db`) is already included in this repository with a complete snapshot of all course data since 2005 — use that instead.
AUSCrawl is a fast, async web crawler that scrapes AUS Banner for course data across every semester since 2005 and stores it in an SQLite database. But more importantly, this repo ships a ready-to-use database so you never have to run the crawler yourself.
Written in Python. Single file. ~15 minutes for a full crawl of 73,000+ course sections, catalog descriptions, prerequisites, and more.
This repository includes aus_courses.db, a complete SQLite database containing every course, instructor, prerequisite, and catalog description from AUS Banner since Fall 2005. Just download it and start building.
| Table | Records | Description |
|---|---|---|
| `courses` | 73,418 | Every course section ever offered |
| `course_dependencies` | 152,968 | Prerequisite/corequisite links with minimum grades |
| `section_details` | 71,754 | Prerequisites, corequisites, restrictions, waitlist, fees |
| `catalog` | 3,007 | Course descriptions, credit/lecture/lab hours |
| `instructors` | 1,649 | All instructors with emails and first appearance |
| `semesters` | 98 | Every term from Fall 2005 to the present |
| `subjects` | 98 | All subject codes (COE, ENG, MTH, etc.) |
| `attributes` | 225 | Course attributes |
| `levels` | 9 | Academic levels (Undergraduate, Graduate, etc.) |
This dataset is a goldmine for AUS students. Use it to help your fellow students or sharpen your own skills:
- Prerequisite visualizer — build an interactive graph of course dependencies for your major
- Schedule planner — help students find open sections that fit their timetable
- Instructor tracker — see which professors teach what, and how their assignments changed over the years
- Course trend analysis — which courses are offered less frequently? Which departments are growing?
- Grade requirement explorer — find every course that requires a minimum grade of C- or higher
- Data science projects — 20 years of course data across 98 subjects is a great dataset for learning SQL, pandas, or building dashboards (a starter snippet follows this list)
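As a taste of that last idea, here is a minimal pandas sketch (assuming you have pandas installed; the query reuses the sections-per-semester example shown further below):

```python
import sqlite3

import pandas as pd

# Load sections-per-semester into a DataFrame for plotting or analysis
conn = sqlite3.connect("aus_courses.db")
df = pd.read_sql_query(
    """
    SELECT s.term_name, COUNT(*) AS sections
    FROM courses c JOIN semesters s ON c.term_id = s.term_id
    GROUP BY c.term_id ORDER BY c.term_id
    """,
    conn,
)
print(df.tail())
```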
If you build something with this data, open an issue and let us know — we'd love to see it.
```bash
# Clone the repo — the database is included
git clone https://github.com/DeadPackets/AUSCrawl
cd AUSCrawl

# Open it directly with sqlite3
sqlite3 aus_courses.db

# Or use Python
python3 -c "
import sqlite3
conn = sqlite3.connect('aus_courses.db')
for row in conn.execute('SELECT term_name, COUNT(*) FROM courses JOIN semesters ON courses.term_id = semesters.term_id GROUP BY courses.term_id ORDER BY courses.term_id DESC LIMIT 5'):
    print(row)
"
```

```sql
-- All courses taught by a specific instructor
SELECT term_id, subject, course_number, title, days, start_time, end_time
FROM courses WHERE instructor_name LIKE '%Smith%'
ORDER BY term_id DESC;
-- Courses with prerequisites and minimum grades
SELECT d.subject, d.course_number, d.dep_type, d.minimum_grade,
sd.prerequisites
FROM course_dependencies d
JOIN section_details sd ON sd.crn = d.crn AND sd.term_id = d.term_id
WHERE d.dep_type = 'prerequisite'
GROUP BY d.subject, d.course_number;
-- How many sections per semester
SELECT s.term_name, COUNT(*) as sections
FROM courses c JOIN semesters s ON c.term_id = s.term_id
GROUP BY c.term_id ORDER BY c.term_id;
-- Course catalog with hours breakdown
SELECT subject, course_number, description, credit_hours, lecture_hours, lab_hours
FROM catalog WHERE subject = 'COE';
-- Find all prerequisites for a specific course
SELECT d.subject, d.course_number, d.minimum_grade
FROM course_dependencies d
JOIN courses c ON c.crn = d.crn AND c.term_id = d.term_id
WHERE c.subject = 'COE' AND c.course_number = '390'
GROUP BY d.subject, d.course_number;
```

The SQLite database contains 10 normalized tables with proper indexes:
Core tables:
- `semesters` — term ID and name (e.g. `202620`, `Fall 2025`)
- `subjects` — subject codes and full names (e.g. `COE`, `Computer Engineering`)
- `courses` — every course section with schedule, instructor, classroom, etc.
- `instructors` — deduplicated instructor names and emails with `first_seen`
- `levels` — academic levels (Undergraduate, Graduate, etc.)
- `attributes` — course attributes with `first_seen`
Extended tables:
- `catalog` — course descriptions, credit/lecture/lab hours, department
- `section_details` — prerequisites, corequisites, restrictions, waitlist, fees per section
- `course_dependencies` — structured prerequisite/corequisite links with minimum grade requirements
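To see the exact schema and indexes behind these tables, you can read SQLite's built-in `sqlite_master` catalog; a quick sketch:

```python
import sqlite3

conn = sqlite3.connect("aus_courses.db")

# Print the CREATE statement for every table and index in the database
for name, kind, sql in conn.execute(
    "SELECT name, type, sql FROM sqlite_master "
    "WHERE sql IS NOT NULL AND type IN ('table', 'index') "
    "ORDER BY type DESC, name"
):
    print(f"-- {kind}: {name}\n{sql}\n")
```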
AUS uses Ellucian Banner, a student information system widely deployed across universities. The public-facing schedule search is served at banner.aus.edu behind Cloudflare, exposing several OWA (Oracle Web Agent) endpoints:
| Endpoint | Method | Purpose |
|---|---|---|
| `/axp3b21h/owa/bwckschd.p_disp_dyn_sched` | GET | Semester dropdown — returns all available term IDs |
| `/axp3b21h/owa/bwckgens.p_proc_term_date` | POST | Subject listing — returns available subjects for a given term |
| `/axp3b21h/owa/bwckschd.p_get_crse_unsec` | POST | Course search — returns HTML tables of all matching sections |
| `/axp3b21h/owa/bwckctlg.p_display_courses` | GET | Course catalog — returns descriptions, credit hours, department |
| `/axp3b21h/owa/bwckschd.p_disp_detail_sched` | GET | Section detail — returns prerequisites, corequisites, restrictions, waitlist, fees |
The course search endpoint accepts all subject codes in a single POST body (up to ~4,500 bytes before the WAF rejects it), returning a large HTML page with <table class="datadisplaytable"> rows. Instructor emails are obfuscated using Cloudflare's email protection (XOR encoding with the first byte as key). The server enforces HTTP/2 stream limits (~10,000 streams per connection) and rate limits on the GET endpoints (~100 req/s before 429 responses begin).
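The email de-obfuscation is easy to reproduce: Cloudflare emits a `data-cfemail` attribute containing a hex string whose first byte is the XOR key for the remaining bytes. A minimal decoder sketch (the example string below is made up, not a real address from the data):

```python
def decode_cfemail(cfemail: str) -> str:
    """Decode a Cloudflare-obfuscated email from its data-cfemail hex string."""
    data = bytes.fromhex(cfemail)
    key = data[0]  # first byte is the XOR key
    return bytes(b ^ key for b in data[1:]).decode("utf-8")

print(decode_cfemail("422302206c2327"))  # -> a@b.ae
```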
> [!CAUTION]
> Only run the crawler if you need fresher data than what's in the included database. Be aware that aggressive crawling can take down AUS Banner and result in your IP being banned. The default settings are tuned to be safe, but modifying worker counts or running multiple instances simultaneously can cause problems.
<details>
<summary>Click to expand crawler docs</summary>
Requires Python 3.13+ and uv.
```bash
uv run python crawl.py [options]
```
| Flag | Description |
|---|---|
| `-o, --output` | SQLite output path (default: `aus_data.db`) |
| `-t, --terms` | Only crawl specific term IDs (e.g. `202620 202510`) |
| `-w, --workers` | Max concurrent requests (default: 50) |
| `--delay` | Seconds between requests (default: 0) |
| `--latest` | Only crawl the most recent semester |
| `--resume` | Skip semesters already in the database |
| `--force` | Drop and recreate all tables |
| `--no-catalog` | Skip catalog description scraping |
| `--no-details` | Skip section detail scraping |
| `-v, --verbose` | Debug-level logging |
The crawler runs in 5 phases (a sketch of the shared request pattern follows the list):

1. Semester discovery — fetches the list of all available terms from Banner's dropdown
2. Subject catalog — fetches subject codes from every semester and deduplicates (the dropdown varies per term)
3. Course scraping — POSTs to the schedule search endpoint for every semester with all subjects in a single batch, then parses the HTML response with lxml (50 concurrent workers)
4. Catalog scraping — GETs course catalog pages for a sample of 6 evenly-spaced terms to collect descriptions, hours, and departments (10 concurrent workers)
5. Detail scraping — GETs the section detail page for every unique CRN/term pair to extract prerequisites, corequisites, restrictions, waitlist info, and fees (10 concurrent workers)
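Each phase fans out over a bounded pool of async workers. A minimal sketch of that pattern with `httpx` (illustrative names, not the crawler's actual functions):

```python
import asyncio

import httpx

async def fetch_all(urls: list[str], max_workers: int = 50) -> list[str]:
    """Fetch many pages concurrently, bounding parallelism with a semaphore."""
    sem = asyncio.Semaphore(max_workers)

    async def fetch_one(client: httpx.AsyncClient, url: str) -> str:
        async with sem:  # at most max_workers requests in flight
            resp = await client.get(url)
            resp.raise_for_status()
            return resp.text

    # http2=True requires the h2 package (pip install 'httpx[http2]')
    async with httpx.AsyncClient(http2=True, timeout=30) as client:
        return await asyncio.gather(*(fetch_one(client, u) for u in urls))
```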
- Async HTTP/2 via `httpx` with connection pooling and automatic retry with exponential backoff
- lxml for HTML parsing (12x faster than BeautifulSoup)
- `ThreadPoolExecutor` offloads CPU-bound parsing from the async event loop
- Catalog sampling reduces catalog requests by ~80% while maintaining full course coverage
- Cloudflare email protection decoding (XOR-obfuscated instructor emails)
- Crash resilience — each phase saves to the DB immediately; the detail phase does periodic batch saves every 5,000 entries; `--resume` skips completed work
- Rate-limit aware — respects server 429 responses with exponential backoff (see the sketch below); GET endpoints capped at 10 workers to avoid triggering bans
</details>

Built for AUS students, by an AUS student.
MIT License