GitHub - gjstirling/csv_consumer_tech_test: A Node API which consumes CSV files, checks for outliers and saves data if none are present.

Project Description

A node API that can be used to consume CSV data, perform outlier detection using the 1.5 times interquartile range method (1.5*IRQ). If numerical data contains no outlier data this is then stored in a new table named "csv_import_${uuid}". The new tables are tracked using the data_uploads table which saves each table name and assigns it a new id.

Full Technical test outline can be found in TASK.md inside the root directory.

Two csv files are provided inside /test_csv directory:

success.csv - which contains no outliers and
rejected.csv which contains outliers

Endpoints and base url:

POST /upload - attached .csv file
URL: http://localhost:5175/upload

Running the application

Locally

Install dependencies: npm i
Start the development database docker compose -f docker-compose.db.yaml up -d
Generate DB schema: npm run db:generate
Perform migration:npm run db:migrate
Locally run app: npm run dev

Production - run via docker compose

create a .env.production file and add the following variables:

PORT=5175
HOST=localhost
DB_URL=postgres://postgres:password@postgres_db:5432/my_database

POSTGRES_PASSWORD=password
POSTGRES_USER=postgres
POSTGRES_DB=my_database

docker compose --env-file .env.production up

View Uploaded Data:

To verify the data is being stored you can run a bash shell inside the docker container to query the database:
docker exec -it "{docker-db-reference}" bash

Connect to db:psql -U postgres -d my_database
List tables: \dt
View uploads: SELECT * FROM data_uploads;

Personal Learning Outcomes/Challenges:

Creating the project: I researched how to create a Node/Express API and connect it to a PostgreSQL database. I followed some basic example projects for guidance, starting by exposing a basic "/" endpoint. Once completed, I created a simple data_uploads table to store text alongside a random number, verifying that I could store data in PostgreSQL database.

Configuring the fs module to convert the CSV into a JS Record: Initially, I incorrectly implemented .pipe() and csvParser, which resulted in storing the column names as the first line of data. I fixed this by reading the documentation and finding examples on converting streams into an array of record objects.

Converting strings to numbers: I needed to extract key values from each row to set the appropriate type (number or text). Initially, I used a hardcoded column name to convert the type to numeric. To remove the hardcoding, I iterated through the row keys, checking each key's type during processing and converting number strings into numbers.

Outlier detection: I researched methods for identifying outliers in numerical datasets. Initially, I calculated the mean of the values and set a tolerance for how far a value could be from the mean. However, I realized this method might be too restrictive with larger datasets. Instead, I focused on using the lower and upper quartiles to create an interquartile range (IQR) and used this to determine if a value was too small or too large.

Return break error: I used a return statement inside a for loop, assuming it would break out of the loop and return from the nested function. However, unit tests showed it always defaulted to false regardless of the data. To fix this, I used the some method to iterate through the records, returning true when an outlier matched the boundary condition. This eliminated the need for two return statements.

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
db/drizzle		db/drizzle
images		images
src		src
test		test
test_csv		test_csv
.DS_Store		.DS_Store
.dockerignore		.dockerignore
.gitignore		.gitignore
Dockerfile		Dockerfile
README.md		README.md
TASK.md		TASK.md
babel.config.js		babel.config.js
docker-compose.db.yaml		docker-compose.db.yaml
docker-compose.yaml		docker-compose.yaml
drizzle.config.ts		drizzle.config.ts
package-lock.json		package-lock.json
package.json		package.json
tsconfig.json		tsconfig.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Project Description

Endpoints and base url:

Running the application

Locally

Production - run via docker compose

View Uploaded Data:

Personal Learning Outcomes/Challenges:

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Project Description

Endpoints and base url:

Running the application

Locally

Production - run via docker compose

View Uploaded Data:

Personal Learning Outcomes/Challenges:

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages