This repository contains code for a solution that parses and extracts relevant information from Defense Contracts, as part of the Best AI Powered Solution for Defense Contracts challenge.
The data source used for this project is the daily digest of Defense Contracts accessible at https://www.defense.gov/News/Contracts/.
- Web-scraped the contracts present on the https://www.defense.gov/News/Contracts website using Python's BeautifulSoup library for HTML parsing.
- Extracted these contracts into a JSON format using the Gemini API.
- Tabulated the JSON string into a CSV file.
- Classified the type of work in the contract.
- Visualized and highlighted insights from the data.
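The tabulation step above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the `tabulate` helper, the `{agency: [contract, ...]}` input shape, and the "EXAMPLE AGENCY" key are assumptions made for the example. The column names come from the sample CSV header in this README, and the contract values from the sample Gemini response.

```python
import csv
import io
import json

# Column order taken from the sample CSV header in this README.
FIELDS = [
    "agency_name", "contractor", "location", "cost", "purpose",
    "completion_date", "work_location", "contract_number",
    "contracting_activity",
]


def tabulate(json_text: str) -> str:
    """Flatten a JSON string shaped like {agency: [contract, ...]} into CSV text.

    Hypothetical helper; the input shape is an assumption for this sketch.
    """
    records_by_agency = json.loads(json_text)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=FIELDS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for agency, contracts in records_by_agency.items():
        for contract in contracts:
            # Keys missing from a contract become empty cells (restval="").
            writer.writerow({**contract, "agency_name": agency})
    return buffer.getvalue()


# "EXAMPLE AGENCY" is a placeholder; the contract is the README's sample response.
sample_json = json.dumps({
    "EXAMPLE AGENCY": [{
        "contractor": "Lockheed Martin Rotary and Mission Systems",
        "location": "Owego, New York",
        "cost": "$88,380,255",
        "purpose": "overhaul of B-2 digital receiver and legacy defense message system",
        "completion_date": "April 16, 2034",
        "work_location": "Owego, New York",
        "contract_number": "FA8119-24-D-0008",
        "contracting_activity": "Air Force Sustainment Center, Tinker Air Force Base, Oklahoma",
    }]
})

csv_text = tabulate(sample_json)
print(csv_text)
```

Using `csv.DictWriter` rather than joining strings by hand keeps values that contain commas, such as "Owego, New York" or "$88,380,255", correctly quoted in the output.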
You need a Gemini API key to run this project.
- Clone the repository:
https://github.com/jayeshpamnani99/BitCamp-Hackathon.git
- Change directory to the cloned repository.
- Install the required Python packages:
pip install -r requirements.txt
- Configuration: Make the required changes in the .env file, such as adding your Gemini API key.
A sample '.env' would look like:
API_KEY=<your-API-KEY>
- Run:
python main/web_data_extraction.py
- Run:
python main/contract_data_extraction.py
- (Optional) If you want the classification of each contract, run:
python main/purpose_classification.py
- Extracted sample JSON string (from #2 of Setup and Usage):
[{ "agency_name": [ "contract1", "contract2"] }]
- Sample response generated by the Gemini API:
{
"contractor": "Lockheed Martin Rotary and Mission Systems",
"location": "Owego, New York",
"cost": "$88,380,255",
"purpose": "overhaul of B-2 digital receiver and legacy defense message system",
"completion_date": "April 16, 2034",
"work_location": "Owego, New York",
"contract_number": "FA8119-24-D-0008",
"contracting_activity": "Air Force Sustainment Center, Tinker Air Force Base, Oklahoma"
}
- Sample CSV file generated by tabulating the JSON string (from #3 of Setup and Usage):
agency_name,contractor,location,cost,purpose,completion_date,work_location,contract_number,contracting_activity
- Test cases were written to validate the model's accuracy in identifying the contractor, completion date, and cost. The results of the test cases were promising.
- The final tabulated result file generated by running the code is in the main/resources folder (Final_CSV.xlsx).
- The Tableau visualizations are in the Tableau-Dashboard folder.
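A check of how well the contractor, completion date, and cost were identified could be computed along the following lines. This is a sketch under stated assumptions, not the project's actual test suite: `normalize`, `field_accuracy`, and the normalization rules are hypothetical, and the "predicted" record is a deliberately altered copy of the README's sample response, made up to show a mismatch.

```python
import re
from datetime import datetime
from decimal import Decimal, InvalidOperation

CHECKED_FIELDS = ("contractor", "completion_date", "cost")


def normalize(field, value):
    """Bring a raw string into a comparable form (hypothetical rules)."""
    value = (value or "").strip()
    if field == "cost":
        # "$88,380,255" -> Decimal("88380255"); unparseable values become None.
        try:
            return Decimal(re.sub(r"[^\d.]", "", value))
        except InvalidOperation:
            return None
    if field == "completion_date":
        # "April 16, 2034" -> date(2034, 4, 16); other formats fall back to text.
        try:
            return datetime.strptime(value, "%B %d, %Y").date()
        except ValueError:
            return value.casefold()
    return value.casefold()


def field_accuracy(predicted, expected):
    """Return the fraction of records matching per field, after normalization."""
    if not expected:
        raise ValueError("expected must contain at least one record")
    hits = dict.fromkeys(CHECKED_FIELDS, 0)
    for pred, gold in zip(predicted, expected):
        for field in CHECKED_FIELDS:
            if normalize(field, pred.get(field)) == normalize(field, gold.get(field)):
                hits[field] += 1
    return {field: hits[field] / len(expected) for field in CHECKED_FIELDS}


# Reference values from the README's sample response.
expected = [{
    "contractor": "Lockheed Martin Rotary and Mission Systems",
    "completion_date": "April 16, 2034",
    "cost": "$88,380,255",
}]
# Made-up model output: case differs (still a match), cost is wrong on purpose.
predicted = [{
    "contractor": "lockheed martin rotary and mission systems",
    "completion_date": "April 16, 2034",
    "cost": "$88,380,250",
}]

scores = field_accuracy(predicted, expected)
print(scores)
```

Normalizing before comparing means that formatting differences such as letter case or a missing dollar sign do not count as extraction errors, while a wrong amount still does.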
- Jayesh Pamnani
- Anjaneya Ketkar
- Janesh Hasija
