README

A Rails API-only backend that scrapes OSCN data into a Postgres database. It uses the oscn_scraper gem to scrape data and return it as JSON.

Getting started

TODO: create a sample .env file and add instructions for booting up.

Configurations

You can configure the following ENV variables:

COUNTIES

Comma-separated string of county names.

Example

COUNTIES=Tulsa,Oklahoma

CASE_TYPES_ABBREVIATION

Comma-separated string of case type abbreviations.

Example

CASE_TYPES_ABBREVIATION=CF,CM,TR,TRI,AM,CPC,DTR # default
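
As an illustration of how comma-separated values like these could be turned into arrays, here is a minimal Ruby sketch. The helper name env_list and its whitespace handling are assumptions made for this example and are not taken from the application code; the default list is copied from the example above.

```ruby
# Hypothetical helper (not from the application): split a comma-separated
# ENV value into a clean array, falling back to a default when unset.
DEFAULT_CASE_TYPES = %w[CF CM TR TRI AM CPC DTR].freeze

def env_list(value, default: [])
  return default if value.nil? || value.strip.empty?

  # Tolerate spaces around commas and stray empty entries.
  value.split(",").map(&:strip).reject(&:empty?)
end

counties   = env_list(ENV["COUNTIES"])
case_types = env_list(ENV["CASE_TYPES_ABBREVIATION"], default: DEFAULT_CASE_TYPES)
```

With COUNTIES=Tulsa,Oklahoma set, counties would be ["Tulsa", "Oklahoma"].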

OSCN_THROTTLE

Number of requests to send to OSCN per minute

OSCN_THROTTLE=120

OSCN_CONCURRENCY=10 # default

Number of threads to run concurrently

OSCN_CONCURRENCY=120 # default
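
To show how a per-minute request limit relates to concurrent threads, the following is a rough sketch of a throttle shared by several workers. The Throttle class, its injectable clock and sleeper, and the fallback value passed to ENV.fetch are all assumptions for illustration; the application may implement throttling differently.

```ruby
# Hypothetical sketch: space requests evenly so that all threads together
# stay under a requests-per-minute limit.
class Throttle
  attr_reader :interval

  # clock and sleeper are injectable so the class can be exercised
  # without real waiting.
  def initialize(per_minute:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 sleeper: ->(seconds) { sleep(seconds) })
    @interval = 60.0 / per_minute # e.g. 120 per minute => 0.5 seconds apart
    @clock = clock
    @sleeper = sleeper
    @next_slot = nil
    @mutex = Mutex.new
  end

  # Reserves the next free request slot, waits until it arrives, then yields.
  def call
    wait = @mutex.synchronize do
      now = @clock.call
      slot = [@next_slot || now, now].max
      @next_slot = slot + @interval
      slot - now
    end
    @sleeper.call(wait) if wait.positive?
    yield
  end
end

# The "120" fallback here is an assumption taken from the example value above.
throttle = Throttle.new(per_minute: Integer(ENV.fetch("OSCN_THROTTLE", "120")))
```

Each worker thread would wrap its request in throttle.call { ... }, so raising the thread count adds parallelism without exceeding the per-minute limit.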

TODO ENVs

MAX_REQUESTS, MEDIUM_PRIORITY, LOW_PRIORITY, DAYS_AGO, DAYS_AHEAD

Scraping Methodology

High Priority Cases - Any case that has appeared on the docket in the past 7 days. These are scraped nightly.

Medium Priority Cases - Any open case (closed_on = nil). The oldest are scraped first.

Low Priority Cases - Closed cases that likely will not be updated as often.
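
The three tiers above can be sketched as a small classification function. This is an illustration only: the CourtCase struct, the last_docket_on field, and the priority_for method are hypothetical names; only closed_on and the 7-day window come from the description above.

```ruby
require "date"

# Hypothetical record shape; only closed_on is named in this README.
CourtCase = Struct.new(:case_number, :last_docket_on, :closed_on, keyword_init: true)

HIGH_PRIORITY_WINDOW_DAYS = 7

def priority_for(court_case, today: Date.today)
  docket = court_case.last_docket_on
  # High: appeared on the docket within the past 7 days.
  recent = docket && docket >= today - HIGH_PRIORITY_WINDOW_DAYS && docket <= today
  return :high if recent
  # Medium: still open.
  return :medium if court_case.closed_on.nil?

  # Low: closed and not recently on the docket.
  :low
end
```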

Manual Scraping/Imports

DOC

Find the date to use for the run: download the file locally (see the quarterly_data.rb importer for its location) and look at the bottom of the sentence extract file for the maximum sentencing date. It will be in the format 20250402 and may run into the case number (e.g., 20250402CF-2021-4596). Use that year and month for the folder name.
Run rake "doc:scrape['2025-04']" (replace 2025-04 with the year and month from the previous step).
If there are any validation failures, update the code to address them.
If there are no failures, run the import command. For best results, run it in detached mode on a scaled Heroku dyno, e.g., heroku run:detached -a oscn --size=performance-l rake "doc:import['2025-04']" (again replacing 2025-04). Use the logs command printed by heroku run:detached to tail the logs for monitoring.
Run rake "doc:link" to link the imported data to other counties.
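
Because the sentencing date can run into the case number, deriving the folder name by hand is easy to get wrong. The sketch below shows one way to extract it; doc_folder_for is a hypothetical helper written for this README and is not part of the importer.

```ruby
require "date"

# Hypothetical helper: derive the "YYYY-MM" folder name from a value that
# starts with a YYYYMMDD sentencing date, even when a case number follows.
def doc_folder_for(raw)
  match = raw.match(/\A(\d{4})(\d{2})(\d{2})/)
  raise ArgumentError, "no leading YYYYMMDD date in #{raw.inspect}" unless match

  # Date.new raises on impossible dates such as month 13.
  date = Date.new(match[1].to_i, match[2].to_i, match[3].to_i)
  date.strftime("%Y-%m")
end

doc_folder_for("20250402CF-2021-4596") # => "2025-04"
```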

Roster Tables and ELT

Entity resolution is accomplished via the Roster tables. These are all generated via Postgres materialized views and then connected to Rails models. Do not use any tmp ids from these, as they can change. The views are stacked for legibility. Other ELT is done using similar methods. ELT is done outside Scenic (which we use for similar views) to reduce the complexity around stacking views.

ELT View Generation Workflow

To add a new "stack" of ELT views (and/or functions):

For analysis and optimization, you can deploy to the test schema. If this starts to interfere with production database operations, we may need nightly or weekly db forks.
When ready to deploy, first add a file for each view to the project in the /elt directory.
Create a new service in the services/elt directory (see elt/Roster.rb). This handles creation and refreshing and gives us something to test against.
Add it to handle_views in rails_helper.rb to ensure the views are created for tests.
Add it to the correct spot in refresh_views in update.rake to ensure a nightly refresh.
Write tests for the views. Be sure to refresh the views after changing the underlying data.
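
As a rough picture of what such a service does, the sketch below creates and refreshes a stack of materialized views in dependency order. It is not the code in elt/Roster.rb: the ViewStack class and the view names are hypothetical, and the connection is assumed to be any object that responds to execute(sql), such as ActiveRecord::Base.connection.

```ruby
# Hypothetical sketch of a service for a stack of materialized views.
# `views` maps view name => SELECT body, ordered base views first.
class ViewStack
  def initialize(connection:, views:)
    @connection = connection
    @views = views
  end

  # Create base views before the views that select from them.
  def create
    @views.each do |name, sql|
      @connection.execute("CREATE MATERIALIZED VIEW IF NOT EXISTS #{name} AS #{sql}")
    end
  end

  # Refresh in the same order so dependents see fresh base data.
  def refresh
    @views.each_key { |name| @connection.execute("REFRESH MATERIALIZED VIEW #{name}") }
  end

  # Drop dependents first, then the views they rely on.
  def drop
    @views.keys.reverse_each do |name|
      @connection.execute("DROP MATERIALIZED VIEW IF EXISTS #{name}")
    end
  end
end

# Hypothetical view names, for illustration only.
EXAMPLE_VIEWS = {
  "roster_base" => "SELECT 1 AS id",
  "roster_final" => "SELECT id FROM roster_base"
}.freeze
```

In a test, calling refresh after changing the underlying rows mirrors the last step above.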

Users

User Creation

See the section "Running rails console" then run:

emails = ["developer@9bcorp.com"] # update this
emails.each do |email|
  pass = SecureRandom.urlsafe_base64
  user = User.new({email: email, password: pass, password_confirmation: pass})
  user.otp_required_for_login = true
  user.otp_secret = User.generate_otp_secret # provide this to them
  # Save first so credentials are only printed for users that were created.
  user.save!
  puts "email: #{email}"
  puts "pass: #{pass}"
  puts "one time code: #{user.otp_secret}"
end

The one-time code is the secret they use to link a multi-factor auth app.

Elastic Beanstalk

This application has partial support for Elastic Beanstalk.

Running rails console on EB

To connect, use eb ssh.
To run the Rails console, first switch to root: sudo su -
Change to the Rails directory: cd /var/app/current
Then run the usual: bundle exec rails c
