Using https://www.sec.gov/edgar/searchedgar/companysearch.html
you can search for all companies with a given SIC code. The search results live at a URL like the following (here SIC=5200):
https://www.sec.gov/cgi-bin/browse-edgar?company=&match=&filenum=&State=&Country=&SIC=5200&myowner=exclude&action=getcompany
Could write an R web scraper that takes the SIC code and returns a table of CIKs, company names, and states.
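A minimal sketch of that scraper, in Python rather than R so it runs with only the standard library. The function and class names are hypothetical, and the column order (CIK, company, state) is assumed from the description above rather than checked against the live page. Fetching the page is left out; the parser just takes the HTML as a string.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

BROWSE_URL = "https://www.sec.gov/cgi-bin/browse-edgar"


def company_search_url(sic):
    """Build the company search URL for one SIC code."""
    # Same parameters, in the same order, as the example URL in the notes.
    params = [
        ("company", ""), ("match", ""), ("filenum", ""),
        ("State", ""), ("Country", ""), ("SIC", str(sic)),
        ("myowner", "exclude"), ("action", "getcompany"),
    ]
    return BROWSE_URL + "?" + urlencode(params)


class RowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows (<th> only) end up empty and are skipped
                self.rows.append(self._row)
            self._row = None
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_company_rows(html):
    """Return a list of dicts; ASSUMES the columns are CIK, company, state."""
    parser = RowParser()
    parser.feed(html)
    return [
        {"cik": r[0], "company": r[1], "state": r[2]}
        for r in parser.rows
        if len(r) >= 3
    ]
```

Calling company_search_url(5200) rebuilds the example URL above. The real results page may paginate or carry extra columns, so the parser would need checking against an actual response.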
Another approach:
Find all annual filings from the last XXX days with https://www.sec.gov/edgar/searchedgar/currentevents.htm
The results are shown on the website, but the underlying URL pattern is: https://www.sec.gov/cgi-bin/current?q1=3&q2=0&q3=
Gives:
- Company
- CIK
- URL to the form
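The URL pattern above can be rebuilt from its three query parameters. A small Python sketch: the function name is hypothetical, and reading q1 as the number of days back and q2=0 as the annual-filing selection is an assumption based on the example URL, not something confirmed here.

```python
from urllib.parse import urlencode

CURRENT_URL = "https://www.sec.gov/cgi-bin/current"


def current_filings_url(q1, q2, q3=""):
    """Build the current-events URL from its raw query parameters.

    ASSUMPTION: q1 = days back, q2 = form-type selection (0 in the
    example), q3 = left empty in the example.
    """
    return CURRENT_URL + "?" + urlencode([("q1", q1), ("q2", q2), ("q3", q3)])


# Rebuilds the example URL from the notes.
print(current_filings_url(3, 0))
```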
The URL to the "Form" can be converted to the full-text URL by a simple rule:
https://www.sec.gov/Archives/edgar/data/1144215/0001144215-18-000110-index.html
becomes
https://www.sec.gov/Archives/edgar/data/1144215/000114421518000110/0001144215-18-000110.txt
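Or something like that: the accession number with its dashes removed becomes a subfolder, and "-index.html" is replaced by ".txt".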
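The conversion rule as a Python sketch. It assumes index URLs always end in the dashed accession number followed by -index.html; the function name is hypothetical and the rule has only been checked against the single example above.

```python
import re

# Matches .../edgar/data/<cik>/<10 digits>-<2 digits>-<6 digits>-index.htm(l)
INDEX_URL = re.compile(
    r"^(?P<base>https://www\.sec\.gov/Archives/edgar/data/\d+)/"
    r"(?P<acc>\d{10}-\d{2}-\d{6})-index\.html?$"
)


def index_to_full_text_url(index_url):
    """Convert a filing index URL to the assumed full-text .txt URL."""
    m = INDEX_URL.match(index_url)
    if m is None:
        raise ValueError("not a recognised filing index URL: " + index_url)
    acc = m.group("acc")
    # The folder is the accession number without dashes;
    # the file name keeps the dashes and ends in .txt.
    return "{}/{}/{}.txt".format(m.group("base"), acc.replace("-", ""), acc)


print(index_to_full_text_url(
    "https://www.sec.gov/Archives/edgar/data/1144215/0001144215-18-000110-index.html"
))
```

URLs that do not fit the pattern raise ValueError rather than being converted silently, which helps if the "or something like that" turns out to have exceptions.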