Original file line number Diff line number Diff line change
@@ -0,0 +1,143 @@
# Exclude files from Docker build context. This prevents unnecessary files from
# being sent to Docker daemon, reducing build time and image size.

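# Python artifacts (the **/ prefix matches at any depth; without it a pattern
# applies only at the root of the build context)
**/__pycache__/
**/*.pyc
**/*.pyo
**/*.pyd
**/*.egg-info/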

# Virtual environments
venv/
.venv/
env/
.env
.envrc
client_venv.helpers/
ENV/

# Jupyter
.ipynb_checkpoints/
.jupyter/

# Build artifacts
build/
dist/
*.eggs/
.eggs/

# Cache and temporary files
*.log
*.tmp
*.cache
.pytest_cache/
.mypy_cache/
.coverage
htmlcov/

# Git and version control
.git/
.gitignore
.gitattributes
.github/

# Docker build scripts (not needed at runtime)
docker_build.sh
docker_push.sh
docker_clean.sh
docker_exec.sh
docker_cmd.sh
docker_bash.sh
docker_jupyter.sh
docker_name.sh
run_jupyter.sh
Dockerfile.*
.dockerignore

# Documentation
README.md
README.admin.md
docs/
*.md
CHANGELOG.md
LICENSE

# Configuration and secrets
.env.*
.env.local
.env.development
.env.production
.DS_Store
Thumbs.db

# Shell configuration
.bashrc
.bash_history
.zshrc

# Large data files (mount via volume instead)
data/
*.csv
*.pkl
*.h5
*.parquet
*.feather
*.arrow
*.npy
*.npz

# Generated images
*.png
*.jpg
*.jpeg
*.gif
*.svg
*.pdf

# Test files and examples
tests/
test_*
*_test.py
tutorials/
examples/

# IDE and editor files
.vscode/
.idea/
*.swp
*.swo
*~
.project
.pydevproject
.settings/
*.iml
.sublime-project
.sublime-workspace

# Node and frontend (if applicable)
node_modules/
npm-debug.log
yarn-error.log
.npm

# Requirements management
requirements.in
Pipfile
Pipfile.lock
poetry.lock
setup.py
setup.cfg

# CI/CD configuration
.gitlab-ci.yml
.travis.yml
Jenkinsfile
.circleci/

# Miscellaneous
*.bak
.venv.bak/
*.whl
*.tar.gz
*.zip
@@ -0,0 +1,32 @@
# YData-profiling Project

## Project Title
YData Profiling for Exploratory Data Analysis and Regression Modeling

## Project Description
This project explores the Python library YData-profiling for automated exploratory data analysis (EDA). The goal is to generate comprehensive data profile reports, identify data quality issues, understand variable distributions, and prepare the dataset for predictive modeling.

## Objectives
- Load and inspect a dataset using Pandas
- Generate an automated profiling report with YData-profiling
- Identify missing values, outliers, and data quality issues
- Perform data cleaning and feature engineering
- Build a regression model for prediction
- Evaluate model performance using appropriate metrics

## Tools
- Python
- Pandas
- YData-profiling
- Scikit-learn
- Jupyter Notebook

## Dataset
A public dataset will be selected for profiling and regression analysis.

## Expected Output
- Automated profiling report
- Cleaned dataset
- Regression model
- Evaluation results
- Documentation showing how YData-profiling supports the workflow
@@ -0,0 +1 @@
set -o vi

Large diffs are not rendered by default.

@@ -0,0 +1,39 @@
#!/bin/bash
# """
# This script launches a Docker container with an interactive bash shell for
# development.
# """

# Exit immediately if any command exits with a non-zero status.
set -e

# Print each command to stderr before executing it.
set -x

# Import the utility functions from the project template.
GIT_ROOT=$(git rev-parse --show-toplevel)
source $GIT_ROOT/class_project/project_template/utils.sh

# Load Docker configuration variables for this script.
get_docker_vars_script ${BASH_SOURCE[0]}
source $DOCKER_NAME
print_docker_vars

# List the available Docker images matching the expected image name.
run "docker image ls $FULL_IMAGE_NAME"

# Configure and run the Docker container with interactive bash shell.
# - Container is removed automatically on exit (--rm)
# - Interactive mode with TTY allocation (-ti)
# - Port forwarding for Jupyter or other services
# - Current directory mounted to /data inside container
CONTAINER_NAME=${IMAGE_NAME}_bash
PORT=8889
cmd="docker run --rm -ti \
--name $CONTAINER_NAME \
-p $PORT:$PORT \
-v $(pwd):/data \
-v $GIT_ROOT:/git_root \
-e PYTHONPATH=/git_root:/git_root/helpers_root \
$FULL_IMAGE_NAME"
run $cmd
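The `run` helper called above is defined in `utils.sh`, which this diff does not show. A minimal sketch of what such a helper might look like, assuming it echoes the command line and then evaluates it (the body below is a hypothetical stand-in, not the project's implementation):

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the `run` helper sourced from utils.sh.
# Assumption: it prints the command line to stderr, then evaluates it.
run() {
    echo "+ $*" >&2
    eval "$*"
}

# Both calling styles seen in these scripts work with this sketch:
run "echo quoted-string"     # the command passed as one string
cmd="echo unquoted words"
run $cmd                     # the command word-split into several arguments
```

Under this assumption the two styles behave the same, because `$*` rejoins the arguments with spaces before `eval` re-parses them.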
@@ -0,0 +1,39 @@
#!/bin/bash
# """
# Build a Docker container image for the project.
#
# This script sets up the build environment with error handling and command
# tracing, loads Docker configuration from docker_name.sh, and builds the
# Docker image using the build_container_image utility function. It supports
# both single-architecture and multi-architecture builds via the
# DOCKER_BUILD_MULTI_ARCH environment variable.
# """

# Exit immediately if any command exits with a non-zero status.
set -e

# Print each command to stderr before executing it.
set -x

# Import the utility functions.
GIT_ROOT=$(git rev-parse --show-toplevel)
source $GIT_ROOT/class_project/project_template/utils.sh

# Load Docker configuration variables (REPO_NAME, IMAGE_NAME, FULL_IMAGE_NAME).
get_docker_vars_script ${BASH_SOURCE[0]}
source $DOCKER_NAME
print_docker_vars

# Configure Docker build settings.
# Enable BuildKit for improved build performance and features.
export DOCKER_BUILDKIT=1
#export DOCKER_BUILDKIT=0

# Configure single-architecture build (set to 1 for multi-arch build).
#export DOCKER_BUILD_MULTI_ARCH=1
export DOCKER_BUILD_MULTI_ARCH=0

# Build the container image.
# Uncomment the line below to build without using Docker cache.
#build_container_image --no-cache
build_container_image
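The effect of the `set -e` and `set -x` options used at the top of this script can be observed in isolation. The following is a small sketch, not part of the project; the `demo` string is an invented example that runs in a child shell so the failure does not stop the caller:

```shell
#!/usr/bin/env bash
# Run a tiny script in a child shell with the same two options and
# capture its stdout, its trace output, and its exit status separately.
demo='set -e; set -x; echo before; false; echo after'

status=0
stdout=$(bash -c "$demo" 2>/dev/null) || status=$?
trace=$(bash -c "$demo" 2>&1 >/dev/null) || true

echo "stdout: $stdout"    # only the output produced before the failure
echo "status: $status"    # non-zero, because `false` aborted the script
echo "trace:"
echo "$trace"             # the command trace, which the shell writes to stderr
```

The `echo after` command never runs, which is the behavior the build script relies on to stop at the first failing step.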
@@ -0,0 +1,26 @@
#!/bin/bash
# """
# Remove Docker container image for the project.
#
# This script cleans up Docker images by removing the container image
# matching the project configuration. Useful for freeing disk space or
# ensuring a fresh build.
# """

# Exit immediately if any command exits with a non-zero status.
set -e

# Print each command to stderr before executing it.
set -x

# Import the utility functions.
GIT_ROOT=$(git rev-parse --show-toplevel)
source $GIT_ROOT/class_project/project_template/utils.sh

# Load Docker configuration variables for this script.
get_docker_vars_script ${BASH_SOURCE[0]}
source $DOCKER_NAME
print_docker_vars

# Remove the container image.
remove_container_image
@@ -0,0 +1,42 @@
#!/bin/bash -e
# """
# Execute a command in a Docker container.
#
# This script runs a specified command inside a new Docker container instance.
# The container is removed automatically after the command completes. The
# current directory is mounted to /data inside the container.
# """

# Exit immediately if any command exits with a non-zero status.
set -e
#set -x

# Capture the command to execute from command-line arguments.
# "$*" joins all arguments into one string; "$@" is meant for array contexts.
CMD="$*"
echo "Executing: '$CMD'"

# Import the utility functions.
GIT_ROOT=$(git rev-parse --show-toplevel)
source $GIT_ROOT/class_project/project_template/utils.sh

# Load Docker configuration variables for this script.
get_docker_vars_script ${BASH_SOURCE[0]}
source $DOCKER_NAME
print_docker_vars

# List available Docker images matching the expected image name.
run "docker image ls $FULL_IMAGE_NAME"
#(docker manifest inspect $FULL_IMAGE_NAME | grep arch) || true

# Configure and run the Docker container with the specified command.
DOCKER_RUN_OPTS=""
CONTAINER_NAME=$IMAGE_NAME
run "docker run \
--rm -ti \
--name $CONTAINER_NAME \
$DOCKER_RUN_OPTS \
-v $(pwd):/data \
-v $GIT_ROOT:/git_root \
-e PYTHONPATH=/git_root:/git_root/helpers_root \
$FULL_IMAGE_NAME \
bash -c '$CMD'"
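Because the command is pasted into `bash -c '$CMD'` inside a double-quoted string, a command that itself contains single quotes may be split differently from what the caller intended. Below is a minimal sketch of that effect and of one possible way around it. It assumes `run` re-parses its argument with `eval`, which this diff does not confirm. `naive_exec` and `direct_exec` are hypothetical names, and plain `bash -c` stands in for `docker run ... bash -c` so the example runs without Docker:

```shell
#!/usr/bin/env bash
# naive_exec mirrors the string-building pattern: the command is pasted
# between single quotes and the whole line is re-parsed by eval.
naive_exec() {
    CMD="$*"
    eval "bash -c '$CMD'"
}

# direct_exec hands the command to bash -c as one argument, so it is
# parsed only once, by the inner shell.
direct_exec() {
    bash -c "$*"
}

naive_exec "echo 'hello   world'"    # inner quotes are lost during re-parsing
direct_exec "echo 'hello   world'"   # inner quotes survive
```

Whether the direct form fits here depends on how `run` is implemented; it is shown only to illustrate the quoting behavior.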
@@ -0,0 +1,25 @@
#!/bin/bash
# """
# Execute a bash shell in a running Docker container.
#
# This script connects to an already running Docker container and opens an
# interactive bash session for debugging or inspection purposes.
# """

# Exit immediately if any command exits with a non-zero status.
set -e

# Print each command to stderr before executing it.
set -x

# Import the utility functions.
GIT_ROOT=$(git rev-parse --show-toplevel)
source $GIT_ROOT/class_project/project_template/utils.sh

# Load Docker configuration variables for this script.
get_docker_vars_script ${BASH_SOURCE[0]}
source $DOCKER_NAME
print_docker_vars

# Execute bash shell in the running container.
exec_container