Rails API-only backend that scrapes OSCN data into a Postgres database. It uses the oscn_scraper gem to scrape data and return it as JSON.
TODO - Create sample .env file, Instructions for booting up
You can configure the following ENV variables:
Comma separated string of the county names.
Example
COUNTIES=Tulsa,Oklahoma
Comma separated string of Case type abbreviations:
Example
CASE_TYPES_ABBREVIATION=CF,CM,TR,TRI,AM,CPC,DTR # default
Number of requests to send to OSCN per minute
OSCN_THROTTLE=120
Number of threads to run concurrently
OSCN_CONCURRENCY=120 # default
MAX_REQUESTS, MEDIUM_PRIORITY, LOW_PRIORITY, DAYS_AGO, DAYS_AHEAD
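As a rough illustration of how the comma-separated variables above could be read, here is a minimal Ruby sketch. The `env_list` helper and the plain hash standing in for `ENV` are assumptions made for this example and are not taken from the application code; only the variable names and the defaults marked `# default` come from this README.

```ruby
# Hypothetical helper: split a comma-separated value into a clean list.
# `env` can be the real ENV or any Hash-like object that responds to fetch.
def env_list(env, key, default)
  env.fetch(key, default).split(",").map(&:strip).reject(&:empty?)
end

# Default copied from the CASE_TYPES_ABBREVIATION example in this README.
DEFAULT_CASE_TYPES = "CF,CM,TR,TRI,AM,CPC,DTR"

# Stand-in for ENV so the sketch runs without touching the real environment.
env = { "COUNTIES" => "Tulsa, Oklahoma" }

counties    = env_list(env, "COUNTIES", "")
case_types  = env_list(env, "CASE_TYPES_ABBREVIATION", DEFAULT_CASE_TYPES)
concurrency = Integer(env.fetch("OSCN_CONCURRENCY", "120"))
```

Passing `ENV` instead of the hash gives the same behavior against the real environment.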
High Priority Cases - Any case that has appeared on the docket in the past 7 days. These are scraped nightly.
Medium Priority Cases - Any open case (closed_on = nil). Scrapes the oldest first.
Low Priority Cases - Closed cases that likely will not be updated as often.
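The three tiers above can be sketched as a small classifier. This is an illustration only: the `scrape_priority` method and its `last_docket_on` argument are hypothetical names, and the application's real queries may differ. Only `closed_on` and the 7-day window come from the descriptions above.

```ruby
require "date"

# Hypothetical classifier mirroring the three tiers described above.
def scrape_priority(last_docket_on:, closed_on:, today: Date.today, days_ago: 7)
  # High: the case appeared on the docket within the last `days_ago` days.
  recent = (today - days_ago)..today
  return :high if last_docket_on && recent.cover?(last_docket_on)

  # Medium: the case is still open (closed_on is nil).
  return :medium if closed_on.nil?

  # Low: closed cases that are unlikely to change often.
  :low
end
```

For example, a case last docketed two days ago is `:high` regardless of whether it is closed, while an open case with no recent docket entry is `:medium`.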
- Find the date to use for the run by downloading the file to your local machine (see the quarterly_data.rb importer for the location) and looking at the bottom of the sentence extract file for the maximum sentencing date. It will be in the format 20250402 and may run into the case number (e.g., 20250402CF-2021-4596). Use that year and month for the folder name.
- Run rake "doc:scrape['2025-04']" (replace 2025-04 with the year and month from the last step for the folder name). If there are any failures in validation, update the code to address them.
- If there are no failures, run the import command. For best results, run this in detached mode on a scaled Heroku dyno, e.g., heroku run:detached -a oscn --size=performance-l rake "doc:import['2025-04']" (replacing 2025-04 again). Use the code provided to tail the logs for monitoring.
- Run rake "doc:link" to link the imported data to other counties.
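The first step, turning the maximum sentencing date into a folder name, can be sketched as follows. The `folder_for` helper is hypothetical and not part of the project; it only shows how a value like 20250402CF-2021-4596, where the date runs into the case number, maps to 2025-04.

```ruby
require "date"

# Hypothetical helper: take the leading YYYYMMDD from a token that may run
# into a case number and return the YYYY-MM folder name for the rake tasks.
def folder_for(token)
  match = token.match(/\A(\d{8})/)
  raise ArgumentError, "no leading date in #{token.inspect}" unless match

  # strptime also rejects impossible dates such as 20251340.
  Date.strptime(match[1], "%Y%m%d").strftime("%Y-%m")
end

folder_for("20250402CF-2021-4596") # => "2025-04"
```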
Entity resolution is accomplished via the Roster tables. These are all generated via Postgres materialized views and then connected to Rails models. Do not use any tmp ids from these, as they can change. The views are stacked for legibility. Other ELT is done using similar methods. ELT is done outside Scenic (which we use for similar views) to reduce the complexity around stacking views.
To add a new "stack" of ELT views (and/or functions):
- For analysis and optimization you can deploy to the test schema. If this starts to interfere with production database operations, we may need nightly or weekly db forks.
- When ready to deploy, first add files for each view to the project in the /elt directory.
- Create a new service in the services/elt directory (see elt/Roster.rb). This handles creation and refreshing and gives us something to test against.
- Add it to handle_views in rails_helper.rb to ensure creation for tests.
- Add it to the correct spot in refresh_views in update.rake to ensure nightly refresh.
- Make tests for the views. Be sure to refresh the views after changing underlying data.
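To make the idea of a view stack concrete, here is a minimal sketch of what such a service might look like. It is not the project's actual service: the `ViewStack` class and the view names are hypothetical, and it only builds SQL strings so that it runs without a database. The point it illustrates is ordering: stacked materialized views are refreshed bottom-up and dropped top-down.

```ruby
# Hypothetical service for a stack of materialized views.
# View names are listed bottom to top: later views depend on earlier ones.
class ViewStack
  def initialize(view_names)
    @view_names = view_names
  end

  # Refresh from the bottom up so each view reads fresh data from the
  # views it depends on.
  def refresh_statements
    @view_names.map { |name| %(REFRESH MATERIALIZED VIEW "#{name}";) }
  end

  # Drop from the top down so no view is removed while another depends on it.
  def drop_statements
    @view_names.reverse.map { |name| %(DROP MATERIALIZED VIEW IF EXISTS "#{name}";) }
  end
end

# Illustrative names only; they are not the project's real views.
stack = ViewStack.new(%w[example_base example_links])
stack.refresh_statements.each { |sql| puts sql }
```

In a Rails app, each statement could be run with `ActiveRecord::Base.connection.execute(sql)`; a test would refresh the stack after changing the underlying rows and before asserting on the view contents.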
See the section "Running rails console" then run:
emails = ["developer@9bcorp.com"] # update this
emails.each do |email|
  # Generate a random initial password for the new user.
  pass = SecureRandom.urlsafe_base64
  user = User.new({email: email, password: pass, password_confirmation: pass})
  # Require multi-factor auth from the first login.
  user.otp_required_for_login = true
  user.otp_secret = User.generate_otp_secret # provide this to them
  puts "email: #{email}"
  puts "pass: #{pass}"
  puts "one time code: #{user.otp_secret}"
  user.save!
end
The one-time code is the code they use to link up a multi-factor auth app.
This application has partial support for Elastic Beanstalk.
To connect, use eb ssh
To run the Rails console, first switch to root: sudo su -
Change to the Rails directory: cd /var/app/current
Then run the usual: bundle exec rails c