Best Practices Cloud Platforms

This section collects best practices for combining the tools into a platform.
It is meant especially to help you start your own cloud project on AWS, Azure or GCP.

Like the advanced skills section, this section follows my Data Science Platform Blueprint. In the blueprint I divide the platform into five sections: Connect, Buffer, Processing, Store and Visualize.

This order will help you learn how to connect the right tools together. Take your time and research the tools and learn how they work.
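To make the order of the blueprint concrete, here is a minimal in-memory sketch of the five sections chained together. Only the section names come from the blueprint. The function names, the event format (a dict with hypothetical `sensor` and `value` keys) and the averaging step are assumptions made for illustration, and each function stands in for a managed cloud service rather than showing how any of them is actually called.

```python
from collections import deque

# Section names come from the blueprint; everything else is illustrative.
STAGES = ("Connect", "Buffer", "Processing", "Store", "Visualize")


def connect(raw_events):
    """Connect: accept incoming events (here, a plain iterable of dicts)."""
    return list(raw_events)


def buffer(events):
    """Buffer: queue events so processing can consume them at its own pace."""
    return deque(events)


def process(queue):
    """Processing: drain the queue and compute the average value per sensor."""
    totals = {}
    while queue:
        event = queue.popleft()
        count, total = totals.get(event["sensor"], (0, 0.0))
        totals[event["sensor"]] = (count + 1, total + event["value"])
    return {sensor: total / count for sensor, (count, total) in totals.items()}


def store(results, database):
    """Store: persist the processed results (here, a dict acts as the database)."""
    database.update(results)
    return database


def visualize(database):
    """Visualize: turn the stored results into lines a dashboard could show."""
    return [f"{sensor}: {value:.1f}" for sensor, value in sorted(database.items())]


def run_pipeline(raw_events):
    """Run the five sections in blueprint order."""
    database = {}
    store(process(buffer(connect(raw_events))), database)
    return visualize(database)
```

In a real platform each function would be replaced by one of the tools listed under the matching heading, but the data would still flow through the sections in the same order.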

Right now the Azure section has the most links to platform examples. They are also useful for AWS and GCP: keep the architecture and swap in the tools that the other cloud offers for the same section.
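One way to approach swapping tools is to look up which blueprint section a tool sits in and then check what the other cloud lists for that same section. The sketch below does this with a partial copy of the Buffer and Processing lists from this page. The dictionary layout and the `candidates` helper are hypothetical, and a shared section only means the tools are worth comparing, not that they are drop-in replacements.

```python
# Partial copy of the tool lists on this page (Buffer and Processing only).
TOOLS_BY_SECTION = {
    "AWS": {
        "Buffer": [
            "Kinesis",
            "Kinesis Data Firehose",
            "Managed Streaming for Kafka (MSK)",
            "MQ",
            "Simple Queue Service (SQS)",
            "Simple Notification Service (SNS)",
        ],
        "Processing": [
            "Athena",
            "EMR",
            "Elasticsearch",
            "Kinesis Data Analytics",
            "Glue",
            "Step Functions",
            "Fargate",
            "Lambda",
            "SageMaker",
        ],
    },
    "Azure": {
        "Buffer": ["Data Factory", "Event Hub", "RedisCache"],
        "Processing": [
            "Stream Analytics Service",
            "Azure Databricks",
            "Machine Learning",
            "Azure Functions",
        ],
    },
}


def candidates(source_cloud, tool, target_cloud):
    """Return the target cloud's tools for every section that lists `tool`."""
    result = {}
    for section, tools in TOOLS_BY_SECTION[source_cloud].items():
        if tool in tools:
            result[section] = TOOLS_BY_SECTION[target_cloud].get(section, [])
    return result
```

For example, `candidates("Azure", "Azure Functions", "AWS")` returns the AWS Processing list, which is the place to start researching a replacement when you rebuild one of the Azure architectures on AWS.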

As always, I am going to add more stuff to this over time.

Have fun!

Contents

AWS

Connect

  • Elastic Beanstalk
  • Simple Email Service (SES)
  • API Gateway

Buffer

  • Kinesis
  • Kinesis Data Firehose
  • Managed Streaming for Kafka (MSK)
  • MQ
  • Simple Queue Service (SQS)
  • Simple Notification Service (SNS)

Processing

  • Athena
  • EMR
  • Elasticsearch
  • Kinesis Data Analytics
  • Glue
  • Step Functions
  • Fargate
  • Lambda
  • SageMaker

Store

  • Simple Storage Service (S3)
  • Redshift
  • Aurora
  • RDS
  • DynamoDB
  • ElastiCache
  • Neptune Graph DB
  • Timestream
  • DocumentDB (MongoDB compatible)

Visualize

  • QuickSight

Containerization

  • Elastic Container Service (ECS)
  • Elastic Container Registry (ECR)
  • Elastic Kubernetes Service (EKS)

Best Practices

Deploying a Spring Boot Application on AWS Using AWS Elastic Beanstalk:

https://aws.amazon.com/de/blogs/devops/deploying-a-spring-boot-application-on-aws-using-aws-elastic-beanstalk/

How to deploy a Docker Container on AWS:

https://towardsdatascience.com/how-to-deploy-a-docker-container-python-on-amazon-ecs-using-amazon-ecr-9c52922b738f

More Details

AWS Whitepapers:

https://d1.awsstatic.com/whitepapers/aws-overview.pdf

Azure

Connect

  • Event Hub
  • IoT Hub

Buffer

  • Data Factory
  • Event Hub
  • RedisCache (also Store)

Processing

  • Stream Analytics Service
  • Azure Databricks
  • Machine Learning
  • Azure Functions

Store

  • Blob
  • CosmosDB
  • MariaDB
  • MySQL
  • PostgreSQL
  • SQL
  • Azure Data Lake
  • Azure Storage (SQL Table?)

Visualize

  • Power BI

Containerization

Best Practices

Advanced Analytics Architecture:

https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/advanced-analytics-on-big-data

Anomaly Detection in Real-time Data Streams:

https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/anomaly-detection-in-real-time-data-streams

Modern Data Warehouse Architecture:

https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/modern-data-warehouse

CI/CD for Containers:

https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/cicd-for-containers

Real Time Analytics on Big Data Architecture:

https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/real-time-analytics

IoT Architecture – Azure IoT Subsystems:

https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/azure-iot-subsystems

Tier Applications & Data for Analytics:

https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/tiered-data-for-analytics

Extract, transform, and load (ETL) using HDInsight:

https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/extract-transform-and-load-using-hdinsight

IoT using Cosmos DB:

https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/iot-using-cosmos-db

Streaming using HDInsight:

https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/streaming-using-hdinsight

GCP

Connect

Buffer

Processing

Store

Visualize

Containerization

Best Practices