Data Finder is a full-stack AI-powered platform that fetches, verifies, and manages company information at scale — search by city & industry, get real-time verified results, and export everything to Excel in one click.
- Overview
- Features
- Architecture
- Tech Stack
- Prerequisites
- Installation
- Configuration
- Usage
- API Documentation
- Project Structure
- Contributing
- Security
- Future Enhancements
- License
Data Finder solves a real-world problem: gathering verified, up-to-date company data at scale is slow, expensive, and error-prone. This platform combines OpenAI's GPT models with intelligent filtering and a clean React UI to deliver high-quality company intelligence — including operational status, location, website, and industry — in seconds.
Whether you're a researcher, sales team, or business analyst, Data Finder turns hours of manual research into an automated, reliable pipeline.
| Feature | Description |
|---|---|
| 🤖 AI-Powered Extraction | Uses OpenAI GPT models to extract and verify company data |
| 🌍 City + Industry Search | Filter companies by city and industry through a clean, intuitive form |
| ✅ Smart Verification | Every result is tagged — Active, Acquired, or Closed |
| 🚫 Duplicate Prevention | Built-in exclusion list management keeps your dataset clean |
| 📊 One-Click Excel Export | Export formatted .xlsx files using Apache POI — CRM-ready |
| 📁 Drag-and-Drop Upload | Batch-process companies via drag-and-drop file input |
| ⚡ Real-Time Table View | Live-updating interactive data table with filtering & sorting |
| 🔒 Secure by Design | All secrets stored in environment variables — never hardcoded |
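
The duplicate-prevention row above can be illustrated with a small sketch. This is not the project's actual implementation: the function names `normalizeName` and `filterExcluded`, the normalisation rule, and the placeholder company "Example Pay" are all assumptions made for illustration.

```javascript
// Hypothetical helper: normalise a company name so that trivial variations
// ("Razorpay ", "razorpay") compare equal.
function normalizeName(name) {
  return name.trim().toLowerCase().replace(/\s+/g, " ");
}

// Drop candidates whose normalised name is on the exclusion list,
// and collapse duplicates within the candidate list itself.
function filterExcluded(candidates, exclusionList) {
  const seen = new Set(exclusionList.map(normalizeName));
  const kept = [];
  for (const company of candidates) {
    const key = normalizeName(company.name);
    if (seen.has(key)) continue; // already excluded or already kept
    seen.add(key);
    kept.push(company);
  }
  return kept;
}

const fresh = filterExcluded(
  [{ name: "Razorpay" }, { name: "  razorpay " }, { name: "Example Pay" }],
  ["Example  Pay"]
);
console.log(fresh.map((c) => c.name)); // [ 'Razorpay' ]
```

Using a `Set` keeps each lookup constant-time, which matters once the exclusion list grows beyond a few hundred entries.
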
┌──────────────────────────────────────────────────────────┐
│ CLIENT LAYER │
│ │
│ ┌─────────────────────────────────────┐ │
│ │ React 18 Frontend (Port 3000) │ │
│ │ Search Form · Data Table · Export │ │
│ └──────────────────┬──────────────────┘ │
└─────────────────────────── │ ────────────────────────────┘
│ HTTP REST (JSON)
▼
┌──────────────────────────────────────────────────────────┐
│ SERVER LAYER │
│ │
│ ┌───────────────────────────────────────────────┐ │
│ │ Spring Boot 3.5.3 Backend (Port 8081) │ │
│ │ Controllers · Services · JPA Repositories │ │
│ └─────────────────┬─────────────┬──────────────┘ │
└───────────────────────│─────────────│────────────────────┘
│ │
┌──────────▼───┐ ┌─────▼────────────┐
│ MySQL 8.0 │ │ OpenAI GPT API │
│ (Persist) │ │ (Extraction) │
└──────────────┘ └──────────────────┘
| Layer | Technology | Version |
|---|---|---|
| Frontend | React | 18.x |
| Frontend Lang | JavaScript ES6+ | — |
| Backend | Spring Boot | 3.5.3 |
| Backend Lang | Java | 17 |
| Build Tool | Maven | 3.8+ |
| Database | MySQL | 8.0 |
| ORM | Spring Data JPA / Hibernate | — |
| AI Integration | OpenAI API (GPT) | Latest |
| File Processing | Apache POI | — |
| Security | Spring Security + Env Vars | — |
| Requirement | Version | Link |
|---|---|---|
| JDK | 17+ | Download |
| Node.js | 16+ | Download |
| MySQL | 8.0+ | Download |
| Maven | 3.8+ | Download |
| OpenAI API Key | — | Get Key |

Clone the repository:

    git clone https://github.com/Tushargg1/Data-Finder.git
    cd Data-Finder

Create the MySQL database and a dedicated user:

    CREATE DATABASE company_extractor;
    CREATE USER 'df_user'@'localhost' IDENTIFIED BY 'your_secure_password';
    GRANT ALL PRIVILEGES ON company_extractor.* TO 'df_user'@'localhost';
    FLUSH PRIVILEGES;

Switch to the backend directory:

    cd company-backend

Set environment variables (Linux/macOS):

    export DB_USERNAME=df_user
    export DB_PASSWORD=your_secure_password
    export OPENAI_API_KEY=sk-...your_key...

Set environment variables (Windows PowerShell):

    $env:DB_USERNAME="df_user"
    $env:DB_PASSWORD="your_secure_password"
    $env:OPENAI_API_KEY="sk-...your_key..."

Build & Run:

    mvn clean install
    mvn spring-boot:run

✅ Backend running at: http://localhost:8081

Set up and start the frontend:

    cd ../frontend
    npm install
    npm start

✅ Frontend running at: http://localhost:3000

| Variable | Description | Required |
|---|---|---|
| `DB_USERNAME` | MySQL username | ✅ |
| `DB_PASSWORD` | MySQL password | ✅ |
| `OPENAI_API_KEY` | OpenAI secret key | ✅ |
| `DB_URL` | JDBC connection URL | Optional |
| `SERVER_PORT` | Backend port (default: 8081) | Optional |

⚠️ Never commit credentials to version control. Add `.env` to `.gitignore`.
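
A quick preflight check can catch a missing variable before the backend is started. The sketch below is an assumption, not part of the project: the variable names come from the table above, while `missingVars` and the sample environment are hypothetical.

```javascript
// Required variable names are taken from the configuration table.
const REQUIRED_VARS = ["DB_USERNAME", "DB_PASSWORD", "OPENAI_API_KEY"];

// Return the names of required variables that are unset or blank.
function missingVars(env) {
  return REQUIRED_VARS.filter((name) => !env[name] || env[name].trim() === "");
}

// Example with a sample environment; a real script would pass process.env.
const sampleEnv = { DB_USERNAME: "df_user", DB_PASSWORD: "", SERVER_PORT: "8081" };
console.log(missingVars(sampleEnv)); // [ 'DB_PASSWORD', 'OPENAI_API_KEY' ]
```

Passing the environment in as an argument, rather than reading `process.env` inside the function, keeps the check easy to test.
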
- Open http://localhost:3000
- Enter a City (e.g., `Mumbai`) and an Industry (e.g., `FinTech`)
- Click Extract Data. Live results populate immediately.
- Click Export to Excel. A formatted `.xlsx` file downloads automatically.

Batch upload:

- Drag and drop a company list file into the upload area
- The backend enriches each entry and displays the results in the live table
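
Filtering and sorting the rows shown in the live table could work roughly as follows. This is a sketch under assumptions: the status values come from the Features table, but `filterAndSort`, its options, and the placeholder rows are hypothetical and do not describe the actual React component.

```javascript
// Status values listed in the Features table.
const STATUSES = ["Active", "Acquired", "Closed"];

// Return a new array of rows, optionally filtered by status and sorted by one field.
function filterAndSort(rows, { status = null, sortBy = "name", descending = false } = {}) {
  if (status !== null && !STATUSES.includes(status)) {
    throw new Error(`Unknown status: ${status}`);
  }
  // Copy or filter first so the caller's array is never mutated by sort().
  const visible = status === null ? [...rows] : rows.filter((r) => r.status === status);
  visible.sort((a, b) => String(a[sortBy]).localeCompare(String(b[sortBy])));
  return descending ? visible.reverse() : visible;
}

const rows = [
  { name: "Company B", city: "Mumbai", status: "Closed" },
  { name: "Company C", city: "Mumbai", status: "Active" },
  { name: "Company A", city: "Mumbai", status: "Active" },
];
console.log(filterAndSort(rows, { status: "Active" }).map((r) => r.name));
// [ 'Company A', 'Company C' ]
```
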
Base URL: http://localhost:8081/api
| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/companies/extract` | Extract companies by city & industry |
| `GET` | `/companies` | Retrieve all stored companies |
| `GET` | `/companies/{id}` | Get a specific company by ID |
| `DELETE` | `/companies/{id}` | Delete a company record |
| `GET` | `/companies/export` | Download all data as Excel |
| `POST` | `/companies/upload` | Batch upload via file |
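
A frontend service function for the extract endpoint might look like the sketch below. It is an illustration only: the project's service layer is described as Axios-based, whereas this example uses `fetch` to stay dependency-free, and the names `buildExtractRequest` and `extractCompanies` as well as the default `limit` of 20 are assumptions.

```javascript
const BASE_URL = "http://localhost:8081/api";

// Build the URL and fetch options for POST /companies/extract.
function buildExtractRequest(city, industry, limit = 20) {
  if (!city || !industry) {
    throw new Error("city and industry are required");
  }
  return {
    url: `${BASE_URL}/companies/extract`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ city, industry, limit }),
    },
  };
}

// Send the request. fetchImpl is injectable so the function can be
// exercised without a running backend.
async function extractCompanies(city, industry, limit, fetchImpl = fetch) {
  const { url, options } = buildExtractRequest(city, industry, limit);
  const response = await fetchImpl(url, options);
  if (!response.ok) {
    throw new Error(`Extract failed with HTTP ${response.status}`);
  }
  return response.json();
}

const request = buildExtractRequest("Bangalore", "FinTech");
console.log(request.url); // http://localhost:8081/api/companies/extract
console.log(request.options.body); // {"city":"Bangalore","industry":"FinTech","limit":20}
```

Separating request construction from the network call keeps the payload logic testable on its own.
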

Example request:

    curl -X POST http://localhost:8081/api/companies/extract \
      -H "Content-Type: application/json" \
      -d '{
        "city": "Bangalore",
        "industry": "FinTech",
        "limit": 20
      }'

Example response:

    {
      "status": "success",
      "count": 20,
      "data": [
        {
          "id": 1,
          "name": "Razorpay",
          "city": "Bangalore",
          "industry": "FinTech",
          "website": "https://razorpay.com",
          "status": "Active",
          "extractedAt": "2025-07-16T10:30:00Z"
        }
      ]
    }

Project structure:

    Data-Finder/
    ├── 📂 company-backend/          # Spring Boot backend
    │   └── src/main/java/com/company/
    │       ├── controller/          # REST Controllers
    │       ├── service/             # Business logic + AI calls
    │       ├── repository/          # JPA Repositories
    │       ├── model/               # Entity classes
    │       └── config/              # App & security config
    │
    ├── 📂 frontend/                 # React frontend
    │   └── src/
    │       ├── components/          # Reusable UI components
    │       ├── pages/               # Page views
    │       ├── services/            # Axios API service layer
    │       └── App.js
    │
    └── 📄 README.md

Contributions are welcome and appreciated! 🎉
- Fork the repository
- Create your feature branch: `git checkout -b feature/my-feature`
- Commit your changes: `git commit -m "feat: add my feature"`
- Push to the branch: `git push origin feature/my-feature`
- Open a Pull Request
| Prefix | Purpose |
|---|---|
| `feat:` | New feature |
| `fix:` | Bug fix |
| `docs:` | Documentation |
| `refactor:` | Code cleanup |
| `test:` | Tests |
| `chore:` | Maintenance |
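
The prefix convention above can be checked mechanically. The sketch below assumes a simple format (prefix, colon, space, then a non-empty summary). The function name `isValidCommitMessage` is hypothetical, and the repository is not known to enforce this automatically.

```javascript
// Allowed prefixes, taken from the table above.
const PREFIXES = ["feat", "fix", "docs", "refactor", "test", "chore"];

// True when the message starts with an allowed prefix, a colon, a space,
// and at least one non-whitespace character of summary.
function isValidCommitMessage(message) {
  const pattern = new RegExp(`^(${PREFIXES.join("|")}): \\S.*`);
  return pattern.test(message);
}

console.log(isValidCommitMessage("feat: add my feature")); // true
console.log(isValidCommitMessage("added my feature")); // false
```
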
- API keys and DB credentials are never hardcoded — environment variables only
- Database access uses a dedicated user whose privileges are limited to the `company_extractor` database
- CORS is restricted to the frontend origin
- Found a vulnerability? Please report it privately to the maintainer instead of opening a public issue
- 🔐 JWT-based user authentication & role management
- 📈 Analytics dashboard for extraction trends
- 🐳 Docker + Docker Compose for one-command deployment
- 📧 Scheduled email digest reports
- 🔗 CRM integrations (Salesforce, HubSpot)
- 🧪 Full test coverage (JUnit + React Testing Library)
- 📱 Progressive Web App (PWA) support
- 🌐 Multi-language UI support
This project is licensed under the MIT License — see the LICENSE file for details.