Skip to content

21Vijeth/DataEngineering-Workshop1

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

51 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Data Engineering Workshop

One Day workshop on understanding Docker, Web Scrapping, Regular Expressions, PostgreSQL and Git.

Prerequisites

Use Ubuntu 20.04 LTS with following packages installed

  • Python 3.9 or above
  • docker
  • docker-compose
  • pip3
  • git (any recent version)

GitHub account

  • Create an account on GitHub (Only if you do not have an account)
  • Fork DataEngineering-Workshop1 repository. Refer this guide to understand how to fork a repository
  • Clone forked repo to your machine using SSH Key.
    • Make sure you have set up SSH key as per the documentation to create a new SSH Key if you don't have a Key.
    • Open your forked repo link in your browser.
    • Click on Code (Green color button).
    • Select SSH option and copy the link.
    • Clone the repo (replace YOUR-GIT-ID with your GitHub id)
         git clone git@github.com:<YOUR-GIT-ID>/DataEngineering-Workshop1.git
      

Docker

  • To install docker go to your cloned repository and run the following command
  • sudo prerequisites/install_docker.sh

Workshop environment setup

  • Check if Git, Docker, and Docker Compose are installed in on the system.
  • Open the terminal and run the following command to check the version of the prerequisites
    • Check Git version
       git --version
      
      git version 2.25.1
    • Check Docker version
       docker --version
      
      Docker version 20.10.17, build 100c701
    • Check Docker Compose version
       docker-compose --version
      
      docker-compose version 1.25.0, build 0a186604

What will you learn by the end of this workshop?

  • By the end of this workshop you will learn how to build docker image and it's usage.
  • You will learn how to scrape a website using urllib/requests and Beautifulsoup.
  • You will learn Regular Expressions and how to work with it.
  • You will learn key features of PostgreSQL.
  • You will learn how to dockerize your project.

Schedule

Time Topics
09:00 - 11:00 Introduction to Docker
11:00 - 01:00 Introduction to Webscrapping.
01:00 - 02:00 Break
02:00 - 03:00 Dockerizing a project
03:00 - 04:00 Introduction to PostgreSQL
04:00 - 04:30 Introduction to Github
04:30 - 04:45 Q & A
04:45 - 05:00 Wrapping Up

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • Shell 85.7%
  • Python 12.7%
  • Dockerfile 1.6%