A completely serverless Flask, Docker and AWS based web application, with deployment scripts, architecture diagrams, and complete installation and run steps.
- This application is based on the Serverless Framework and deploys the whole application through CloudFormation templates.
- Once the deployment completes and the necessary infrastructure is ready on AWS, you get a deployed API URL to access the application.
- It takes a valid `tsv` file as input from the user through a Flask-based web UI.
- Once the user provides the file and clicks the upload button, the application uploads the file to an S3 bucket. It also partitions the data by year, month, day and hour and converts it to Parquet format. It also creates a Glue catalog for the raw input data, which can be accessed in AWS Athena.
- It then redirects to the "file ready to process" page and asks for user input through a button.
- Once the user clicks the button, the Upload Files API is called. On the back end this triggers a Spark job in Glue, which runs the Python script in the `Scripts` directory.
- This API request keeps refreshing itself until the Glue job finishes processing.
- This Spark ETL Glue job runs the business logic. After processing, it copies the processed file to the specified S3 output directory and calls back to the web UI to signal completion.
- The API takes the output file name and sends a message to the web users with the location of the processed file in S3. It also creates a Glue catalog for the processed data, which can be accessed in Athena.
- Users also have the option to download the processed file to their local system.
- A QuickSight dashboard is created to analyse the raw input data and the processed output data. The link is provided below.
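
The partitioning step described above can be pictured with a minimal, stdlib-only sketch. It is not the repository's implementation: the column name `date_time`, the timestamp format, and the Hive-style `year=/month=/day=/hour=` layout are all assumptions made for illustration, and the real application writes Parquet files to S3 rather than grouping rows in memory.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Hypothetical column name and timestamp format; adjust to the real TSV schema.
TIMESTAMP_COLUMN = "date_time"
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def partition_prefix(ts: datetime) -> str:
    """Build a Hive-style partition path (year/month/day/hour) for a timestamp."""
    return f"year={ts.year}/month={ts.month:02d}/day={ts.day:02d}/hour={ts.hour:02d}"


def group_rows_by_partition(tsv_text: str) -> dict:
    """Parse TSV text and bucket its rows by their partition prefix."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    buckets = defaultdict(list)
    for row in reader:
        ts = datetime.strptime(row[TIMESTAMP_COLUMN], TIMESTAMP_FORMAT)
        buckets[partition_prefix(ts)].append(row)
    return dict(buckets)


if __name__ == "__main__":
    # Made-up sample rows, only to show the grouping.
    sample = (
        "date_time\tip\n"
        "2021-08-15 13:05:00\t10.0.0.1\n"
        "2021-08-15 13:40:00\t10.0.0.2\n"
        "2021-08-15 14:01:00\t10.0.0.3\n"
    )
    for prefix, rows in group_rows_by_partition(sample).items():
        print(prefix, len(rows))
```

Each prefix would then become a folder under the S3 output location, so that Athena can prune partitions when a query filters on year, month, day or hour.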
- AWS must be configured on your machine. Run `aws configure` to check.
- Git should be installed and configured to clone the repo.
- Since the Serverless Framework dockerizes everything, the Docker daemon must be running.
- Node.js must be installed to package node modules and install the Serverless Framework.
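
As a rough aid, the prerequisites above can be checked with a short script. This is a sketch, not part of the repository: the executable names in `REQUIRED_TOOLS` are assumptions about how the tools are installed on your `PATH`. It only confirms that the executables exist, not that AWS credentials are configured or that the Docker daemon is actually running.

```python
import shutil

# Executables the prerequisites rely on (assumed names as found on PATH).
REQUIRED_TOOLS = ("aws", "git", "docker", "node", "npm")


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All prerequisite executables were found on PATH.")
```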
- Clone the branch: `git clone https://github.com/ayushsood2/my-adobe-application.git`, then `cd my-adobe-application`
- Install serverless: `npm install -g serverless`
- Install the wsgi plugin and the plugin for `requirements.txt`: `npm install --save-dev serverless-wsgi serverless-python-requirements`
- Download the awswrangler Lambda layer from https://github.com/awslabs/aws-data-wrangler/releases/download/2.11.0/awswrangler-layer-2.11.0-py3.6.zip and rename it to `awswrangler.zip`, then copy it into the `Scripts` folder.
- Run `sls deploy` to deploy it.
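
The rename-and-copy step for the awswrangler layer can also be scripted. The helper below is a hypothetical convenience, not something shipped with the repository. It assumes the archive has already been downloaded and that the `Scripts` folder exists.

```python
import shutil
from pathlib import Path

LAYER_NAME = "awswrangler.zip"  # file name the installation steps ask for


def place_layer(downloaded_zip: Path, scripts_dir: Path) -> Path:
    """Copy the downloaded layer archive into scripts_dir as awswrangler.zip."""
    if not downloaded_zip.is_file():
        raise FileNotFoundError(f"Layer archive not found: {downloaded_zip}")
    if not scripts_dir.is_dir():
        raise NotADirectoryError(f"Scripts folder not found: {scripts_dir}")
    target = scripts_dir / LAYER_NAME
    shutil.copyfile(downloaded_zip, target)
    return target
```

For example, from the repository root you could call `place_layer(Path("awswrangler-layer-2.11.0-py3.6.zip"), Path("Scripts"))` before running the deploy step.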
https://us-west-2.quicksight.aws.amazon.com/sn/dashboards/e4da48b9-b333-4022-aa1c-1e867abd7a75
