PriyankaBhatta/60DaysofLearning2025

🚀 60 Days of Learning (2025 Edition)

Welcome to my #60DaysOfLearning2025 challenge!
This repository is a personal learning log where I document my daily progress as I explore and upskill in various areas of technology.

📅 Start Date:

June 1, 2025

🎯 Goal:

To stay consistent with learning and build hands-on skills in areas like:

  • Big Data (Hadoop, HDFS, Hive, PySpark etc.)
  • Data Engineering
  • Linux & Shell Scripting
  • SQL
  • Cloud Fundamentals (AWS/GCP)
  • Tools & Frameworks I encounter during my journey

📚 Daily Logs

Each day I will:

  • Practice or learn something new
  • Document the learnings in a markdown file (e.g., Day01.md)
  • Push relevant screenshots, notes, or code snippets
  • Share highlights on Twitter using hashtags like #60DaysOfLearning2025

🔖 Progress

I will update this section weekly with my learning summary.

| Day | Topic | Summary |
|-----|-------|---------|
| 01 | Set up Hadoop on VM | Installed Hadoop & Java 11, configured environment variables |
| 02 | HDFS Basics | Ran basic HDFS commands, understood roles of NameNode and DataNode |
| 03 | Hadoop Architecture | Studied YARN, HDFS structure, and node responsibilities |
| 04 | Practicing HDFS | Created/uploaded/viewed files in HDFS, practiced put, cat, ls |
| 05 | Practicing HDFS | Created & viewed sample.txt with HDFS commands: put, ls, mkdir, get, cat, tail |
| 06 | Read an article | A Comparative Evaluation of Apache Hadoop and Apache Spark |
| 07 | Read multiple articles | On Big Data concepts and analytics |
| 08 | Watched HDFS tutorial on YouTube | To strengthen my knowledge of HDFS before jumping into MapReduce |
| 09 | Learnt about MapReduce | Learned about MapReduce from YouTube tutorials and implemented its various use cases in my VM terminal |
| 10 | Read about Hive | Read about Hive, its architecture, and how it works |
| 11 | Installed Hive | Installed Apache Hive on my VM today as I dive deeper into the Big Data ecosystem |
| 12 | NameNode not working | Tried to fix the NameNode error, but couldn't solve it today |
| 13 | Read article | Read an article on a case study of Hive using Hadoop |
| 14 | Read SQL | Basic terms and table and column naming rules |
| 15 | Read about SQL keys | Read about foreign, unique, and primary keys in detail and made notes in Notepad |
| 16 | Read about SQL CHECK and DEFAULT constraints | Learned about DEFAULT and CHECK constraints in SQL and made notes in Notepad |
| 17 | Practiced SQL queries | Revised and used some SQL functions I hadn't used in a while |
| 18 | Practiced SQL queries | Practiced easy to medium SQL queries |
| 19 | Read a research paper | Read a paper on expanding SQL learning beyond RDBMS to Big Data systems like MapReduce, NoSQL & NewSQL. Essential shift for modern data insights. |
| 20 | SQL quizzes | Practiced SQL quizzes through the Learn SQL app |
| 21 | Read about SQL functions | Revised aggregate, string, numeric, and date functions today |
| 22 | Fixed HDFS issue | Spent the day fixing an HDFS issue caused by inconsistent NameNode/DataNode directories. Solved it by reformatting the NameNode and clearing stale PID files |
| 23 | Practiced queries in Hive | Created a Hive database lspp23 and a sales table, inserted rows, and wrote SQL queries to view sales, calculate revenue, analyze orders by region, and filter data by conditions like product type and quantity. |
| 24 | Installed Spark and learned queries | Installed Apache Spark 3.5.6 and ran my first RDD queries using sc.parallelize() and groupBy. Learned how Spark handles distributed data and transformations across nodes. |
| 25 | Practiced SQL queries | Practiced medium-hard SQL queries and revised my notes from previous days |
| 26 | Learned about PySpark | Learned about PySpark in detail from the DataCamp course 'Introduction to PySpark DataFrames' |
| 27 | Learned SQL window functions | Learned about window functions; a bit hard compared to other SQL topics, but an informative read |
| 28 | Hard-level SQL queries | Did some hard-level SQL questions today, focusing on advanced CASE WHEN, subqueries, and grouping logic. Feeling more confident with daily practice. |
| 29 | Data Warehouse | Studied Data Warehousing today: its need, components, types, characteristics, advantages, and disadvantages. Learned how it helps store huge data centrally for analysis and better decision-making. |
| 30 | Hard-level SQL queries | Practiced hard SQL today with CASE WHEN, MOD for even-odd, JOINs, GROUP BY, HAVING, LAG window functions, ROUND, and ORDER BY custom sorting. These are levelling up my SQL skills. |
| 31 | Data Warehousing Concepts | Studied ETL, dimensional modelling, star & snowflake schemas, OLAP cubes, data marts, data integration, and governance to build strong fundamentals for data engineering. |
| 32 | Big Data Analytics | Downloaded Nepal daily climate dataset from Kaggle. Connected Spark to HDFS and YARN, explored dataset schema and columns like Temp_2m, MaxTemp_2m, MinTemp_2m, and WindSpeed_10m. Planned to upload to HDFS and perform analytics using Spark DataFrames tomorrow. |
| 33 | Big Data Project Setup | Set up PySpark environment for daily climate data analytics. Connected HDFS and YARN, created project directories in HDFS, uploaded the dataset, and practiced reading data with Spark for upcoming analysis. Faced pandas installation issues; will continue with pandas-based visualisation tomorrow. |
| 34 | Pandas Basics & Dataset Exploration | Faced pandas installation issues in PySpark, so switched to Jupyter Notebook for visualization. Loaded Nepal climate dataset, renamed columns, and explored it using pandas functions for structure, info, and initial analysis. |
| 35 | Visualised climate dataset using pandas + matplotlib + seaborn | Created histograms for temperature and pressure, boxplots by district, and correlation heatmaps to analyse relationships between features. Strengthening my data analysis and EDA skills. |
| 36 | Climate Data Visualization | Performed advanced visualizations on Nepal climate data today. Converted date to datetime, extracted year, plotted average temperature trends over years, analysed district-wise temperatures post-1990, and explored wind speed distribution. Strengthening my data analysis and visualization skills with pandas, matplotlib, and seaborn. |
| 37 | Nepal Climate Data Analysis Project | Performed max temperature distribution analysis and calculated average humidity per district, identifying top and bottom humid regions for insights. |
| 38 | Nepal Climate Data Analysis Project | Implemented linear regression to predict precipitation, evaluated model (MSE: 25.6, R²: 0.29), visualized actual vs predicted values, and analyzed feature coefficients for insights. |
| 39 | Superstore Sales Data Project Analysis | Completed EDA, box plots, and styled tables on Superstore Sales dataset (2nd project) using pandas and Jupyter Notebook for insightful data visualization and analysis. |
| 40 | Superstore Sales Data Project Visualizations | Created 5 visualizations (Monthly Trend, Category Distribution, Region-Segment Sales, Day vs. Sales, Heatmap) (2nd project) using pandas and seaborn in Jupyter Notebook, analyzing sales patterns and trends. |
| 41 | Superstore Sales Data Project Visualizations | Added a bar chart of average sales by ship mode and a correlation heatmap (2nd project) using pandas and seaborn in Jupyter Notebook, exploring sales patterns and relationships. |
| 42 | Superstore Sales Data Project Visualizations | Added a horizontal bar chart of sales by sub-category, sales distribution by state, and a line plot of sales over time by category (2nd project) using pandas and matplotlib in Jupyter Notebook. |
| 43 | Superstore Sales Data Project Prediction using Linear Regression | Enhanced Superstore Sales prediction model using Order Month, Order Year, and encoded Category, Sub-Category, Region, Ship Mode, Segment. Analyzed scatter plot visualization to assess model fit. |
| 44 | Superstore Sales Data Project Prediction using Linear Regression and XGBoost | Enhanced sales prediction model with Linear Regression (MSE: 1873.5) and XGBoost (MSE: 14918.1) using new features like discount, competitor_price, and price_elasticity. Fixed randomness for stable results, visualized with a y=mx+c line, and identified two outliers. |
| 45 | Studied 'XGBoost vs Linear Regression' blog | Studied the https://medium.com/@heyamit10/xgboost-vs-linear-regression-a-practical-guide-aa09a68af12b blog to deepen my understanding. Learning how complexity vs. simplicity impacts predictions! |
| 46 | Medicines Information Dataset (MID) Exploration in Jupyter (Third Project) | Started third project with Kaggle's MID dataset. Spent significant time selecting the dataset, cleaned Therapeutic_class_counts.xlsx by dropping empty columns (Unnamed: 3, Unnamed: 4). Plan to explore MID dataset and perform deeper analysis from tomorrow. |
| 47 | Medicines Information Dataset (MID) Exploration in Jupyter (Third Project) | Continued third project with Kaggle's MID dataset. Cleaned MID by filling missing values (ProductIntroduction, HowToUse, HowWorks, Chemical_Class, Action_Class) with 'Not Specified'. Performed EDA: counted unique values in Therapeutic_Class, Chemical_Class, Habit_Forming, Action_Class. Visualized HowToUse with WordCloud and text length histogram, and top 10 therapeutic classes by Size with bar plot. Plan to explore more visualizations tomorrow. |
| 48 | Medicines Information Dataset (MID) Exploration in Jupyter (Third Project) | Continued third project with Kaggle's MID dataset. Created SideEffect WordCloud to identify common side effects, SideEffect text length histogram to analyze description complexity, and top 10 Chemical_Class bar plot to explore chemical compositions. Skipped ProductUses WordCloud due to messy text (e.g., null, HTML artifacts) and Habit_Forming pie chart (binary Yes/No). |
| 49 | Medicines Information Dataset (MID) Exploration in Jupyter (Third Project) | Continued third project with Kaggle's MID dataset. Completed a scatter plot for Size vs Ratio from the counts dataset to analyze relationships and a bar plot for top 10 Action_Class values (excluding Not Specified) to explore drug mechanisms. |
| 50 | Medicines Information Dataset (MID) Exploration in Jupyter (Third Project) | Continued third project with Kaggle's MID dataset. Completed a line plot for cumulative Therapeutic_Class sizes to track medicine accumulation and a violin plot for SideEffect text lengths to analyze distribution density. |
| 51 | Medicines Information Dataset (MID) Exploration in Jupyter (Third Project) | Continued third project with Kaggle's MID dataset. Completed a boxen plot for HowToUse text lengths to analyze distribution, finalizing all visualizations and analysis. Transitioning to model building starting tomorrow. |
| 52 | Medicines Information Dataset (MID) Prediction in Jupyter (Third Project) | Continued third project with Kaggle's MID dataset. Conducted model performance analysis for Logistic Regression, achieving ~98.2% accuracy. Noted strong performance on major classes (e.g., ANTI INFECTIVES, CARDIAC) and weaker results on minor classes (e.g., ANTI NEOPLASTIC, OTHERS) due to limited data. Misclassifications occurred between similar classes (e.g., GASTRO INTESTINA vs GASTRO INTESTINAL). |
| 53 | Medicines Information Dataset (MID) | Created GitHub repository for the MID Therapeutic Class Prediction project. Uploaded images to Loading and analysis of data/, Visualization Images/, and Multi-Class Text Classification using Logistic Regression/ folders, documenting data exploration, charts (e.g., confusion matrix), and model training steps for the Logistic Regression model (~98.2% accuracy). |
| 54 | Created a GitHub repo for Sales Prediction Analysis with Linear Regression and XGBoost | Uploaded the notebook and images of data analysis, visualizations, and model training to the repository. |
| 55 | Updated GitHub repository for Nepal Climate Analysis Project | This GitHub repo is for the Nepal Climate Data Analytics Project, which was my first project for this challenge. I worked with HDFS, PySpark, and Jupyter Notebook for further analysis, visualization, and modeling. |
| 56 | SQL Practice | Solved a few basic SQL questions on HackerRank, learned some theory, and did some quizzes on SQLZoo. |
| 57 | Learning SQL theory | Learned about keys, normalization, and denormalization, and practiced some quizzes |
| 58 | SQL Practice with Northwind Database | Completed all easy, medium, and hard-level SQL questions using the Northwind dataset from sql-practice.com |
| 59 | Created a new GitHub repo: Data-Engineer-Interview-Preparation | To document everything I'm learning and practicing in SQL, ETL, and data modeling. |
| 60 | SQL Practice | Did medium- and hard-level SQL questions from sql-practice.com |
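
The Day 38 and Day 44 entries report MSE and R² for the regression models. As a reminder of what those two numbers measure, here is a minimal sketch using only the Python standard library. The log does not say which library produced the reported values, so this is not the code from the notebooks. The helper names (`fit_line`, `mse`, `r_squared`) are hypothetical and the data is synthetic, not the Nepal climate or Superstore dataset.

```python
# Minimal sketch: one-feature linear regression plus MSE and R^2.
# Assumptions: helper names are hypothetical; the data below is synthetic
# and only illustrates the calculation, not any result from the log.

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mse(actual, predicted):
    """Mean squared error: average of squared residuals."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Synthetic example: humidity (%) as the single feature, precipitation (mm) as the target.
humidity = [40.0, 55.0, 60.0, 72.0, 80.0, 91.0]
precipitation = [1.0, 3.5, 2.0, 6.0, 5.5, 9.0]

slope, intercept = fit_line(humidity, precipitation)
predicted = [slope * x + intercept for x in humidity]

print(f"slope={slope:.3f} intercept={intercept:.3f}")
print(f"MSE={mse(precipitation, predicted):.3f}")
print(f"R^2={r_squared(precipitation, predicted):.3f}")
```

MSE is in squared units of the target, so it is only comparable between models trained on the same target, while R² is unitless. An R² of 0.29, as on Day 38, means the model explains roughly 29% of the variance in the target.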

🛠️ Tools Used

  • Ubuntu 24.04 (VMware)
  • Hadoop 3.3.6
  • Git & GitHub
  • Jupyter Notebook
  • YouTube (for video references)

🌟 Connect With Me

Feel free to reach out or follow along:


Let’s grow and stay consistent 🚀
#60DaysOfLearning2025 ✨
