Apache Spark / PySpark Data Pipeline

A production-style PySpark batch processing pipeline with Jenkins-based CI/CD (Jenkinsfile), a pytest test suite, and structured configuration management.

Key Features

Batch data processing pipeline built with PySpark
Pytest test suite for unit testing Spark transformations
Jenkinsfile for automated CI/CD integration
Pipenv-managed dependencies
log4j logging configuration

Stack

PySpark / Apache Spark
Python (Pipenv)
pytest
Jenkins (CI/CD)

Project Structure

apache-spark/
├── conf/               # Spark and app configuration
├── lib/                # Core pipeline modules
├── test_data/          # Sample datasets for testing
├── sbdl_main.py        # Main pipeline entry point
├── sbdl_submit.sh      # Spark submit script
├── test_pytest_sbdl.py # Pytest test suite
├── Jenkinsfile         # CI/CD pipeline definition
└── Pipfile             # Dependencies
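
The `conf/` directory holds Spark and application configuration. As a rough illustration of how per-environment settings could be read from an INI-style file, here is a minimal sketch using only the standard library. The section names (`LOCAL`, `PROD`), the keys, and the `load_env_conf` helper are all assumptions made for this example; they are not taken from the repository, whose actual config format may differ.

```python
import configparser

# Hypothetical contents of a file under conf/ (section and key names are assumptions).
SAMPLE_CONF = """
[LOCAL]
spark.app.name = pipeline-local
spark.master = local[2]

[PROD]
spark.app.name = pipeline-prod
spark.sql.shuffle.partitions = 200
"""


def load_env_conf(text: str, env: str) -> dict:
    """Return the key/value pairs of one environment section as a plain dict."""
    # interpolation=None keeps any '%' characters in values literal.
    parser = configparser.ConfigParser(interpolation=None)
    # Preserve key case; configparser lowercases keys by default.
    parser.optionxform = str
    parser.read_string(text)
    if not parser.has_section(env):
        raise KeyError(f"no configuration section for environment {env!r}")
    return dict(parser.items(env))


if __name__ == "__main__":
    for key, value in load_env_conf(SAMPLE_CONF, "LOCAL").items():
        print(f"{key}={value}")
```

A dict like this could then be passed into a `SparkConf` or a session builder at startup, so the same entry point runs unchanged across environments. All values are returned as strings, so numeric settings need converting by the caller.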

Running Tests

pipenv install
pipenv run pytest test_pytest_sbdl.py
