Welcome to my #60DaysOfLearning2025 challenge!
This repository is a personal learning log where I document my daily progress as I explore and upskill in various areas of technology.
June 1, 2025
To stay consistent with learning and build hands-on skills in areas like:
- Big Data (Hadoop, HDFS, Hive, PySpark etc.)
- Data Engineering
- Linux & Shell Scripting
- SQL
- Cloud Fundamentals (AWS/GCP)
- Tools & Frameworks I encounter during my journey
Each day I will:
- Practice or learn something new
- Document the learnings in a markdown file (e.g., Day01.md)
- Push relevant screenshots, notes, or code snippets
- Share highlights on Twitter using hashtags like #60DaysOfLearning2025
I will update this section weekly with my learning summary.
| Day | Topic | Summary |
|---|---|---|
| 01 | Setup Hadoop on VM | Installed Hadoop & Java 11, configured environment variables |
| 02 | HDFS Basics | Ran basic HDFS commands, understood roles of NameNode and DataNode |
| 03 | Hadoop Architecture | Studied YARN, HDFS structure and node responsibilities |
| 04 | Practicing HDFS | Created/uploaded/viewed files in HDFS, practiced put, cat, ls |
| 05 | Practicing HDFS | Created & viewed sample.txt with HDFS cmds: put, ls, mkdir, get, cat, tail |
| 06 | Read an article | A Comparative Evaluation of Apache Hadoop and Apache Spark |
| 07 | Read multiple articles | On Big Data concepts and analytics |
| 08 | Watched an HDFS tutorial on YouTube | To strengthen my knowledge of HDFS before jumping into MapReduce |
| 09 | Learned about MapReduce | Read about MapReduce from YouTube tutorials and implemented several of its use cases in my VM terminal |
| 10 | Read about Hive | Read about Hive, its architecture, and how it works |
| 11 | Installed Hive | Installed Apache Hive on my VM today as I dive deeper into the Big Data ecosystem |
| 12 | NameNode not working | Tried to fix a NameNode error but couldn't solve it today |
| 13 | Read an article | Read an article on a case study of Hive using Hadoop |
| 14 | Read SQL basics | Basic terms, plus table and column naming rules |
| 15 | Read about SQL keys | Read about foreign, unique, and primary keys in detail and made notes in Notepad |
| 16 | Read about SQL CHECK and DEFAULT constraints | Learned about the DEFAULT and CHECK constraints in SQL and made notes in Notepad |
| 17 | Practiced SQL queries | Revised and used some SQL functions I hadn't used in a while |
| 18 | Practiced SQL queries | Practiced easy-to-medium SQL queries |
| 19 | Read a research paper | Read a paper on expanding SQL learning beyond RDBMS to Big Data systems like MapReduce, NoSQL & NewSQL. Essential shift for modern data insights. |
| 20 | SQL quizzes | Practiced SQL quizzes through the Learn SQL app |
| 21 | Revised SQL functions | Revised aggregate, string, numeric, and date functions today |
| 22 | Fixed HDFS issue | Spent the day fixing an HDFS issue caused by inconsistent NameNode/DataNode directories. Solved it by reformatting NameNode and clearing stale PID files |
| 23 | Practiced queries in Hive | Created a Hive database lspp23, a sales table, inserted rows, and wrote SQL queries to view sales, calculate revenue, analyze orders by region, and filter data by conditions like product type and quantity. |
| 24 | Installed Spark and ran first queries | Installed Apache Spark 3.5.6 and ran my first RDD queries using sc.parallelize() and groupBy. Learned how Spark handles distributed data and transformations across nodes. |
| 25 | Practiced SQL queries | Practiced medium-to-hard SQL queries and revised my notes from previous days |
| 26 | Learned about PySpark | Learned about PySpark in detail from the DataCamp course 'Introduction to PySpark DataFrames' |
| 27 | Learned SQL window functions | Learned about window functions; a bit harder than other SQL topics, but an informative read |
| 28 | Hard-level SQL queries | Did some hard-level SQL questions today, focusing on advanced CASE WHEN, subqueries, and grouping logic. Feeling more confident with daily practice. |
| 29 | Data Warehouse | Studied Data Warehousing today: its need, components, types, characteristics, advantages, and disadvantages. Learned how it helps store huge data centrally for analysis and better decision-making. |
| 30 | Hard level SQL queries | Practiced hard SQL today with CASE WHEN, MOD for even-odd, JOINs, GROUP BY, HAVING, LAG window functions, ROUND, and ORDER BY custom sorting. These are levelling up my SQL skills. |
| 31 | Data Warehousing Concepts | Studied ETL, dimensional modelling, star & snowflake schemas, OLAP cubes, data marts, data integration, and governance to build strong fundamentals for data engineering. |
| 32 | Big Data Analytics | Downloaded Nepal daily climate dataset from Kaggle. Connected Spark to HDFS and YARN, explored dataset schema and columns like Temp_2m, MaxTemp_2m, MinTemp_2m, and WindSpeed_10m. Planned to upload to HDFS and perform analytics using Spark DataFrames tomorrow. |
| 33 | Big Data Project Setup | Set up PySpark environment for daily climate data analytics. Connected HDFS and YARN, created project directories in HDFS, uploaded the dataset, and practiced reading data with Spark for upcoming analysis. Faced pandas installation issues, will continue with pandas-based visualisation tomorrow. |
| 34 | Pandas Basics & Dataset Exploration | Faced pandas installation issues in PySpark, so switched to Jupyter Notebook for visualization. Loaded Nepal climate dataset, renamed columns, and explored it using pandas functions for structure, info, and initial analysis. |
| 35 | Visualised climate dataset using pandas + matplotlib + seaborn | Created histograms for temp and pressure, boxplots by district, and correlation heatmaps to analyse relationships between features. Strengthening my data analysis and EDA skills. |
| 36 | Climate Data Visualization | Performed advanced visualizations on Nepal climate data today. Converted date to datetime, extracted year, plotted average temperature trends over years, analysed district-wise temperatures post-1990, and explored wind speed distribution. Strengthening my data analysis and visualization skills with pandas, matplotlib, and seaborn. |
| 37 | Nepal Climate Data Analysis Project | Performed max temperature distribution analysis and calculated average humidity per district, identifying top and bottom humid regions for insights. |
| 38 | Nepal Climate Data Analysis Project | Implemented linear regression to predict precipitation, evaluated model (MSE: 25.6, R²: 0.29), visualized actual vs predicted values, and analyzed feature coefficients for insights. |
| 39 | Superstore Sales Data Project Analysis | Completed EDA, box plots, and styled tables on Superstore Sales dataset (2nd project) using pandas and Jupyter Notebook for insightful data visualization and analysis. |
| 40 | Superstore Sales Data Project Visualizations | Created 5 visualizations (Monthly Trend, Category Distribution, Region-Segment Sales, Day vs. Sales, Heatmap) (2nd project) using pandas and seaborn in Jupyter Notebook, analyzing sales patterns and trends. |
| 41 | Superstore Sales Data Project Visualizations | Added a bar chart of average sales by ship mode and a correlation heatmap (2nd project) using pandas and seaborn in Jupyter Notebook, exploring sales patterns and relationships. |
| 42 | Superstore Sales Data Project Visualizations | Added a horizontal bar chart of sales by sub-category, sales distribution by state, and a line plot of sales over time by category (2nd project) using pandas and matplotlib in Jupyter Notebook. |
| 43 | Superstore Sales Data Project Prediction using Linear Regression | Enhanced Superstore Sales prediction model using Order Month, Order Year, and encoded Category, Sub-Category, Region, Ship Mode, Segment. Analyzed scatter plot visualization to assess model fit. |
| 44 | Superstore Sales Data Project Prediction using Linear Regression and XGBoost | Enhanced sales prediction model with Linear Regression (MSE: 1873.5) and XGBoost (MSE: 14918.1) using new features like discount, competitor_price, and price_elasticity. Fixed randomness for stable results, visualized with a y=mx+c line, and identified two outliers. |
| 45 | Studied the 'XGBoost vs Linear Regression' blog | Studied the blog at https://medium.com/@heyamit10/xgboost-vs-linear-regression-a-practical-guide-aa09a68af12b to deepen my understanding. Learning how complexity vs. simplicity impacts predictions! |
| 46 | Medicines Information Dataset (MID) Exploration in Jupyter (Third Project) | Started third project with Kaggle's MID dataset. Spent significant time selecting dataset, cleaned Therapeutic_class_counts.xlsx by dropping empty columns (Unnamed: 3, Unnamed: 4). Plan to explore MID dataset and perform deeper analysis from tomorrow. |
| 47 | Medicines Information Dataset (MID) Exploration in Jupyter (Third Project) | Continued third project with Kaggle's MID dataset. Cleaned MID by filling missing values (ProductIntroduction, HowToUse, HowWorks, Chemical_Class, Action_Class) with 'Not Specified'. Performed EDA: counted unique values in Therapeutic_Class, Chemical_Class, Habit_Forming, Action_Class. Visualized HowToUse with WordCloud and text length histogram, and top 10 therapeutic classes by Size with bar plot. Plan to explore more visualizations tomorrow. |
| 48 | Medicines Information Dataset (MID) Exploration in Jupyter (Third Project) | Continued third project with Kaggle's MID dataset. Created SideEffect WordCloud to identify common side effects, SideEffect text length histogram to analyze description complexity, and top 10 Chemical_Class bar plot to explore chemical compositions. Skipped ProductUses WordCloud due to messy text (e.g., null, HTML artifacts) and Habit_Forming pie chart (binary Yes/No). |
| 49 | Medicines Information Dataset (MID) Exploration in Jupyter (Third Project) | Continued third project with Kaggle's MID dataset. Completed a scatter plot for Size vs Ratio from the counts dataset to analyze relationships and a bar plot for top 10 Action_Class values (excluding Not Specified) to explore drug mechanisms. |
| 50 | Medicines Information Dataset (MID) Exploration in Jupyter (Third Project) | Continued third project with Kaggle's MID dataset. Completed a line plot for cumulative Therapeutic_Class sizes to track medicine accumulation and a violin plot for SideEffect text lengths to analyze distribution density. |
| 51 | Medicines Information Dataset (MID) Exploration in Jupyter (Third Project) | Continued third project with Kaggle's MID dataset. Completed a boxen plot for HowToUse text lengths to analyze distribution, finalizing all visualizations and analysis. Transitioning to model building starting tomorrow. |
| 52 | Medicines Information Dataset (MID) Prediction in Jupyter (Third Project) | Continued third project with Kaggle's MID dataset. Conducted model performance analysis for Logistic Regression, achieving ~98.2% accuracy. Noted strong performance on major classes (e.g., ANTI INFECTIVES, CARDIAC) and weaker results on minor classes (e.g., ANTI NEOPLASTIC, OTHERS) due to limited data. Misclassifications occurred between similar classes (e.g., GASTRO INTESTINA vs GASTRO INTESTINAL) |
| 53 | Medicines Information Dataset (MID) | Created GitHub repository for the MID Therapeutic Class Prediction project. Uploaded images to Loading and analysis of data/, Visualization Images/, and Multi-Class Text Classification using Logistic Regression/ folders, documenting data exploration, charts (e.g., confusion matrix), and model training steps for the Logistic Regression model (~98.2% accuracy). |
| 54 | Created a GitHub repo for Sales Prediction Analysis | Created a GitHub repo for Sales Prediction Analysis with Linear Regression and XGBoost, and uploaded the notebook and images of data analysis, visualizations, and model training. |
| 55 | Updated GitHub repository for Nepal Climate Analysis Project | This GitHub repo is for the Nepal Climate Data Analytics Project, my first project of this challenge. I worked with HDFS, PySpark, and Jupyter Notebook for further analysis, visualization, and modeling. |
| 56 | SQL Practice | Solved a few basic SQL questions on HackerRank, learned some theory, and did some quizzes on SQLZoo. |
| 57 | Learning SQL theory | Learned about keys, normalization, and denormalization, and practiced some quizzes |
| 58 | SQL Practice with Northwind Database | Completed all easy, medium, and hard-level SQL questions using the Northwind dataset from sql-practice.com |
| 59 | Created a new GitHub repo — Data-Engineer-Interview-Preparation | To document everything I’m learning and practicing in SQL, ETL, and data modeling. |
| 60 | SQL Practice | Did medium- and hard-level SQL questions from sql-practice.com |
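As a quick illustration of the MapReduce model from Day 09, here is a minimal pure-Python sketch of a word count: map emits (word, 1) pairs, shuffle groups them by key, and reduce sums the counts. This is a conceptual sketch of the programming model only (the sample lines are made up), not Hadoop's Java or streaming API.

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every line
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    # Shuffle: group all values by key, as the framework does between map and reduce
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: sum the counts for each word
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["hello hadoop", "hello spark"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts)  # {'hello': 2, 'hadoop': 1, 'spark': 1}
```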
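The LAG window function from Days 27 and 30 can be tried without any server using Python's built-in sqlite3 module (SQLite 3.25+ supports window functions). The sales table and its values here are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month INTEGER, revenue INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 100), (2, 150), (3, 120)])

# LAG looks back one row in the given ordering, so month-over-month
# revenue change comes out of a single query
rows = conn.execute("""
    SELECT month,
           revenue,
           revenue - LAG(revenue) OVER (ORDER BY month) AS change
    FROM sales
    ORDER BY month
""").fetchall()
print(rows)  # [(1, 100, None), (2, 150, 50), (3, 120, -30)]
```

The first row's change is NULL (None in Python) because LAG has no previous row to look back to.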
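The linear-regression work on Days 38, 43, and 44 boils down to fitting y ≈ m·x + c and scoring it with R². A minimal sketch of the closed-form fit in plain Python (the four data points are made up, not the climate or Superstore data):

```python
# Ordinary least squares for y ≈ m*x + c, plus the R² score
# used to evaluate the precipitation and sales models
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 4.0, 6.2, 7.9]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Closed-form slope and intercept
m = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) \
    / sum((xi - x_mean) ** 2 for xi in x)
c = y_mean - m * x_mean

# R² = 1 - (residual sum of squares / total sum of squares)
pred = [m * xi + c for xi in x]
ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
ss_tot = sum((yi - y_mean) ** 2 for yi in y)
r2 = 1 - ss_res / ss_tot
print(f"slope={m:.2f}, intercept={c:.2f}, R2={r2:.3f}")  # slope=1.96, intercept=0.15, R2=0.998
```

An R² near 1 means the line explains most of the variance; the Day 38 precipitation model's R² of 0.29 shows how much harder real data is than a toy example.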
- Ubuntu 24.04 (VMware)
- Hadoop 3.3.6
- Git & GitHub
- Jupyter Notebook
- YouTube (for video references)
Feel free to reach out or follow along:
- 🐦 Twitter: https://x.com/itspriibhatta
- 💼 LinkedIn: https://www.linkedin.com/in/priyanka-bhatta/
Let’s grow and stay consistent 🚀
#60DaysOfLearning2025 ✨