21 commits
2e27385
Updated files
ShockaHolmes Apr 22, 2026
7966d27
check
niciahrymer-hillian Apr 22, 2026
b869abe
added files to git ignore
niciahrymer-hillian Apr 22, 2026
d942bb6
updated sample test
ShockaHolmes Apr 22, 2026
32a6034
The cleaning and validation runs im currently getting another java pa…
niciahrymer-hillian Apr 22, 2026
e79b284
"full sample package valid, invalid and edge caases, all data clean n…
niciahrymer-hillian Apr 23, 2026
fa913c9
Merge pull request #1 from niciahrymer-hillian/Nicky
niciahrymer-hillian Apr 23, 2026
17258f4
Merge branch 'Landing-Branch' into Shocka-feature
ShockaHolmes Apr 23, 2026
cd825c3
Merge pull request #2 from niciahrymer-hillian/Shocka-feature
ShockaHolmes Apr 23, 2026
01e2dfd
Ran program to pull all into into 4 classes
niciahrymer-hillian Apr 23, 2026
4afa544
Merge pull request #3 from niciahrymer-hillian/Nicky
niciahrymer-hillian Apr 23, 2026
a5a06fe
added tests, and generated information to fill empty fields after API…
niciahrymer-hillian Apr 24, 2026
c62020b
Merge branch 'Landing-Branch' into Nicky
niciahrymer-hillian Apr 24, 2026
6cdcb1f
Merge pull request #4 from niciahrymer-hillian/Nicky
niciahrymer-hillian Apr 24, 2026
a42eedf
Edited data
niciahrymer-hillian Apr 24, 2026
d5b6bf1
Merge remote-tracking branch 'origin/Landing-Branch' into Nicky
niciahrymer-hillian Apr 24, 2026
62dbddb
Merge pull request #5 from niciahrymer-hillian/Nicky
niciahrymer-hillian Apr 24, 2026
394334c
altered the data: removing the id from title column and removing birt…
niciahrymer-hillian Apr 27, 2026
d3e980a
Update generated java handoff package zip
niciahrymer-hillian Apr 27, 2026
23c9c24
Update generated Java handoff package zip and title sanitization logic
niciahrymer-hillian Apr 27, 2026
8d6e06d
Merge pull request #6 from niciahrymer-hillian/Nicky
niciahrymer-hillian Apr 27, 2026
14 changes: 13 additions & 1 deletion .gitignore
@@ -1,3 +1,10 @@
#Data files
*.csv
*.json
*.ndjson
*.db


# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
@@ -191,4 +198,9 @@ cython_debug/
# exclude from AI features like autocomplete and code analysis. Recommended for sensitive data
# refer to https://docs.cursor.com/context/ignore-files
.cursorignore
.cursorindexingignore
.cursorindexingignore

# Dataset files
*.csv
*.json
*.ndjson
Binary file added Generated java handoff package.zip
Binary file not shown.
36 changes: 36 additions & 0 deletions Library.ipynb
@@ -0,0 +1,36 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "d3799054-696c-48ee-b1db-d95976e59a30",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"dfs = {name: pd.read_csv(name) for name in (\"credits.csv\", \"keywords.csv\", \"links_small.csv\", \"links.csv\")}  # read_csv takes one path per call"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.14.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
46 changes: 0 additions & 46 deletions README.md
@@ -1,48 +1,2 @@
# Shhhh! CentralLibraryData

the place CentralLibrary gets its content from.

As a team, fork this repository to an Organization and submit the URL of your fork via the Student Portal. Each teammate will submit the **SAME URL.**

There are things you _need to know_ and ask Instructors about. What are they? When can we schedule some dialog sessions so that you can get some ideas about what to study?

Group Work requires that y'all think about plans, and priorities, and schedule some time to get to experts and senior devs to ask intelligent questions, and learn important topics.

_What are they?_

## Project Overview

Build a series of pipelines that feed the content needed by a comprehensive Library Management System built by the Java group.

This is a Data project matched to the Java project [Central Library](https://github.com/ZCW-Summer25/CentralLibrary.git).

The Java group builds that project. Data group, you get to wrangle the data in the datasets below into a form that you agree on with Java. They load it, and we get cool software.

## Learning Objectives

- pipelines - for real
- data formats (csv, json)
- wrangling data into useful formats
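
As a rough illustration of "wrangling data into useful formats", here is a minimal sketch that turns CSV text into newline-delimited JSON. It uses only the standard library so it runs anywhere; the `csv_to_ndjson` helper, the sample columns (`id`, `title`), and the choice to map empty fields to `null` are all assumptions for illustration, not the format agreed with the Java group.

```python
import csv
import io
import json


def csv_to_ndjson(csv_text: str) -> str:
    """Convert CSV text to newline-delimited JSON, one object per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        # Assumption: empty strings mean "missing", so emit null instead.
        cleaned = {key: (value if value != "" else None) for key, value in row.items()}
        lines.append(json.dumps(cleaned))
    return "\n".join(lines)


# Hypothetical sample data; the real dataset columns may differ.
sample = "id,title\n1,Dune\n2,\n"
ndjson = csv_to_ndjson(sample)
print(ndjson)
```

With pandas, `df.to_json(path, orient="records", lines=True)` produces a similar line-per-record file from a DataFrame.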

## Need to Know

- pandas
- python data structures

### Content Datasets

and this is NOT public: https://zcw-students-projects.s3.us-east-1.amazonaws.com/LMSDataForWeek3Project/LMS-DataStuff.zip

It contains the collected data for this project.
You need to get it from an instructor, we're not gonna store it in this github repo, it's too Big.

## The Pipeline Exercises


These are the conceptual sub-projects.
They will be useful in understanding how and what you need to do to provide Java with the data y'all need to make the project complete.

https://github.com/ZCW-Summer25/PipelineOne
https://github.com/ZCW-Summer25/PipelineTwo
https://github.com/ZCW-Summer25/PipelineThree
https://github.com/ZCW-Summer25/PipelineFour
78 changes: 78 additions & 0 deletions Untitled.ipynb
@@ -0,0 +1,78 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 2,
"id": "980e2938-bd3c-4095-b123-59fcf1c8a572",
"metadata": {},
"outputs": [
{
"ename": "FileNotFoundError",
"evalue": "[Errno 2] No such file or directory: 'data/raw/keywords.csv'",
"output_type": "error",
"traceback": [
"---------------------------------------------------------------------------",
"FileNotFoundError Traceback (most recent call last)",
"Cell In[2], line 3\n 1 import pandas as pd\n 2 import numpy as np\n----> 3 df = pd.read_csv('data/raw/keywords.csv')\n 4 #---SHAPE---\n 5 print(f'Rows: {df.shape[0]:,} Columns: {df.shape[1]}')\n 6 print()\n",
"File /opt/homebrew/Cellar/jupyterlab/4.5.6_2/libexec/lib/python3.14/site-packages/pandas/io/parsers/readers.py:873, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, skip_blank_lines, parse_dates, date_format, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, on_bad_lines, low_memory, memory_map, float_precision, storage_options, dtype_backend)\n 861 kwds_defaults = _refine_defaults_read(\n 862 dialect,\n 863 delimiter,\n (...) 869 dtype_backend=dtype_backend,\n 870 )\n 871 kwds.update(kwds_defaults)\n--> 873 return _read(filepath_or_buffer, kwds)\n",
"File /opt/homebrew/Cellar/jupyterlab/4.5.6_2/libexec/lib/python3.14/site-packages/pandas/io/parsers/readers.py:300, in _read(filepath_or_buffer, kwds)\n 297 _validate_names(kwds.get(\"names\", None))\n 299 # Create the parser.\n--> 300 parser = TextFileReader(filepath_or_buffer, **kwds)\n 302 if chunksize or iterator:\n 303 return parser\n",
"File /opt/homebrew/Cellar/jupyterlab/4.5.6_2/libexec/lib/python3.14/site-packages/pandas/io/parsers/readers.py:1645, in TextFileReader.__init__(self, f, engine, **kwds)\n 1642 self.options[\"has_index_names\"] = kwds[\"has_index_names\"]\n 1644 self.handles: IOHandles | None = None\n-> 1645 self._engine = self._make_engine(f, self.engine)\n",
"File /opt/homebrew/Cellar/jupyterlab/4.5.6_2/libexec/lib/python3.14/site-packages/pandas/io/parsers/readers.py:1904, in TextFileReader._make_engine(self, f, engine)\n 1902 if \"b\" not in mode:\n 1903 mode += \"b\"\n-> 1904 self.handles = get_handle(\n 1905 f,\n 1906 mode,\n 1907 encoding=self.options.get(\"encoding\", None),\n 1908 compression=self.options.get(\"compression\", None),\n 1909 memory_map=self.options.get(\"memory_map\", False),\n 1910 is_text=is_text,\n 1911 errors=self.options.get(\"encoding_errors\", \"strict\"),\n 1912 storage_options=self.options.get(\"storage_options\", None),\n 1913 )\n 1914 assert self.handles is not None\n 1915 f = self.handles.handle\n",
"File /opt/homebrew/Cellar/jupyterlab/4.5.6_2/libexec/lib/python3.14/site-packages/pandas/io/common.py:926, in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)\n 921 elif isinstance(handle, str):\n 922 # Check whether the filename is to be opened in binary mode.\n 923 # Binary mode does not support 'encoding' and 'newline'.\n 924 if ioargs.encoding and \"b\" not in ioargs.mode:\n 925 # Encoding\n--> 926 handle = open(\n 927 handle,\n 928 ioargs.mode,\n 929 encoding=ioargs.encoding,\n 930 errors=errors,\n 931 newline=\"\",\n 932 )\n 933 else:\n 934 # Binary mode\n 935 handle = open(handle, ioargs.mode)\n",
"FileNotFoundError: [Errno 2] No such file or directory: 'data/raw/keywords.csv'"
]
}
],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"df = pd.read_csv('data/raw/keywords.csv')\n",
"#---SHAPE---\n",
"print(f'Rows: {df.shape[0]:,} Columns: {df.shape[1]}')\n",
"print()\n",
"#---Column Names and Types---\n",
"print(df.dtypes)\n",
"print()\n",
"#---First 5 Rows (visual check)---\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "36eacfc6-1a42-44f4-8669-01fecc069fde",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"id": "387da07a-6cd8-407e-a53e-ccfdadf070c5",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.14.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
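
The committed output in Untitled.ipynb is a `FileNotFoundError` for the relative path `data/raw/keywords.csv`, which usually means the notebook's working directory is not the folder that contains `data/`, or the dataset has not been downloaded (the `.gitignore` changes keep `*.csv` out of the repo). A minimal sketch of one way to make the lookup less fragile, assuming a hypothetical helper named `find_data_file` that walks up from a starting directory:

```python
from pathlib import Path


def find_data_file(relative_path: str, start: Path) -> Path:
    """Search `start` and each of its parent directories for `relative_path`."""
    start = start.resolve()
    for base in (start, *start.parents):
        candidate = base / relative_path
        if candidate.is_file():
            return candidate
    # Fail with a message that points at the likely cause.
    raise FileNotFoundError(
        f"{relative_path!r} not found in {start} or any parent directory; "
        "CSV files are git-ignored, so the dataset may need to be downloaded first"
    )
```

In the notebook this could be used as `pd.read_csv(find_data_file("data/raw/keywords.csv", Path.cwd()))`, so the cell works whether Jupyter was started from the repo root or from a subfolder.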