This assignment uses the Titanic dataset (available in the Seaborn library) to extract meaningful insights, with an emphasis on histograms and kernel density estimate (KDE) plots. Rather than just plotting graphs, the focus is on interpreting the results, asking thoughtful questions, and connecting findings to real-world scenarios.
Perform a detailed analysis of the Titanic dataset using histograms and KDE plots. Use visualizations to support your claims, but concentrate primarily on interpreting the data.
The Titanic dataset can be accessed from the Seaborn library. Use the following code to load the dataset:

    import seaborn as sns
    import pandas as pd

    # Load the Titanic dataset
    titanic = sns.load_dataset('titanic')

    # View the first few rows of the dataset
    titanic.head()

- Task 1: Dataset Overview
- Briefly describe the Titanic dataset. What kind of information is contained in each column?
Deliverable: Write a paragraph summarizing the dataset based on your exploration.
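As a starting point for the overview, the sketch below builds a one-row-per-column summary. The helper name `summarize_columns` is hypothetical (it is not part of pandas or Seaborn), and the summary is only an aid for writing the paragraph, not a substitute for it.

```python
import pandas as pd


def summarize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: its dtype, distinct-value count, and an example value."""
    # First non-missing value of each column, or None if the column is entirely empty
    examples = {
        col: df[col].dropna().iloc[0] if df[col].notna().any() else None
        for col in df.columns
    }
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_unique": df.nunique(),          # missing values are not counted as a category
        "example": pd.Series(examples),
    })
```

With the dataset loaded as `titanic`, calling `summarize_columns(titanic)` in a notebook cell displays the table. Columns with only a handful of distinct values are usually categories, while columns with many distinct values are usually measurements.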
- Task 2: Fare Distribution
- Plot a histogram and KDE for the distribution of fares paid by passengers.
Question: What trends or outliers do you observe in the fare distribution? How might fare impact survival rates?
- Task 3: Fare and Survival Rates
- Create a KDE plot to compare the fare distributions for survivors and non-survivors.
Question: Does fare seem to have influenced survival? What do the distributions suggest about socioeconomic status and survival?
- Task 4: Passenger Class, Fare, and Survival
- Create histograms and KDE plots to compare fare distributions across different passenger classes (the pclass column) and analyze survival rates for each class.
Question: What patterns do you observe in fare distribution and survival rates across different classes? How does this reflect the socioeconomic divisions on the Titanic?
- Task 5: Age of Survivors and Non-Survivors
- Create KDE plots to compare the age distributions of survivors and non-survivors.
Question: What trends do you notice in the age distribution of survivors compared to those who did not survive? How might age have impacted survival rates?
- Task 6: Pyramid Histogram for Age Survival by Sex
- Create a pyramid-style histogram (also known as a population pyramid) of age, split by sex, that shows survival. Look for an existing example to adapt, or try building one yourself.
Question: What can you infer from the age and sex distribution in relation to survival rates?
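One way to build such a pyramid by hand is sketched below with NumPy and Matplotlib. It assumes the columns `age`, `sex` (with values "male" and "female"), and `survived` from the Seaborn copy of the dataset. The function names, the 10-year bins, the 0 to 80 age range, and the colours are all illustrative choices; ages above `max_age` would fall outside the bins and not be counted.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import FuncFormatter


def age_sex_counts(df, bin_width=10, max_age=80):
    """Count passengers per age bin for each (sex, survived) combination."""
    edges = np.arange(0, max_age + bin_width, bin_width)
    known = df.dropna(subset=["age"])  # passengers with unknown age are left out
    counts = {}
    for sex in ("male", "female"):
        for survived in (0, 1):
            mask = (known["sex"] == sex) & (known["survived"] == survived)
            counts[(sex, survived)] = np.histogram(known.loc[mask, "age"], bins=edges)[0]
    return edges, counts


def plot_survival_pyramid(df, bin_width=10, max_age=80):
    """Males extend left, females right; survivors sit next to the centre line."""
    edges, counts = age_sex_counts(df, bin_width, max_age)
    centers = edges[:-1] + bin_width / 2
    h = bin_width * 0.9

    fig, ax = plt.subplots(figsize=(8, 5))
    # Negative widths push the male bars to the left of zero
    ax.barh(centers, -counts[("male", 1)], height=h,
            color="tab:blue", label="Male, survived")
    ax.barh(centers, -counts[("male", 0)], left=-counts[("male", 1)], height=h,
            color="lightsteelblue", label="Male, did not survive")
    ax.barh(centers, counts[("female", 1)], height=h,
            color="tab:red", label="Female, survived")
    ax.barh(centers, counts[("female", 0)], left=counts[("female", 1)], height=h,
            color="lightcoral", label="Female, did not survive")

    ax.axvline(0, color="black", linewidth=0.8)
    # Show passenger counts as positive numbers on both sides
    ax.xaxis.set_major_formatter(FuncFormatter(lambda x, _: f"{abs(x):.0f}"))
    ax.set(xlabel="Number of passengers", ylabel="Age (years)",
           title="Age pyramid by sex and survival")
    ax.legend()
    return ax
```

Stacking survivors closest to the centre line makes it easy to compare, for each age band, how much of the bar on each side is taken up by survivors.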
- Task 7 (Optional): Bar Graph, Line Plot, and Scatter Plot
- Create a bar graph, line plot, and scatter plot using the Titanic dataset.
Suggestions: You can explore variables like survival rates, passenger class, age, or fare in these plots.
Question: What additional insights do these visualizations provide that weren't evident from the histograms and KDE plots?
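For the optional task, the sketch below puts one example of each plot type side by side. The variable pairings (survival rate by class, survival rate by age decade, age against fare) are only suggestions, and the function name is hypothetical. It assumes the columns `pclass`, `age`, `fare`, and `survived` from the Seaborn copy of the dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_bar_line_scatter(df: pd.DataFrame):
    """Bar, line, and scatter plots on one row of axes."""
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))

    # Bar graph: survival rate per passenger class
    rate_by_class = df.groupby("pclass")["survived"].mean()
    axes[0].bar(rate_by_class.index.astype(str), rate_by_class.values)
    axes[0].set(xlabel="Passenger class", ylabel="Survival rate",
                title="Survival rate by class")

    # Line plot: survival rate per age decade (0 = ages 0-9, 10 = ages 10-19, ...)
    decade = (df["age"] // 10) * 10           # unknown ages stay NaN and are skipped
    rate_by_decade = df.groupby(decade)["survived"].mean()
    axes[1].plot(rate_by_decade.index, rate_by_decade.values, marker="o")
    axes[1].set(xlabel="Age decade", ylabel="Survival rate",
                title="Survival rate by age decade")

    # Scatter plot: age against fare, coloured by survival
    sns.scatterplot(data=df, x="age", y="fare", hue="survived", alpha=0.6, ax=axes[2])
    axes[2].set(xlabel="Age (years)", ylabel="Fare", title="Age vs. fare")

    fig.tight_layout()
    return axes
```

The oldest age decades contain few passengers, so their survival rates in the line plot rest on very small samples and should be interpreted cautiously.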
- Emphasis on histograms and KDE: Ensure all analyses use histograms and KDE where appropriate.
- Critical thinking: Think critically about the dataset’s limitations, potential biases, and how these factors might influence conclusions.
- Historical context: Tie findings to the historical context of the Titanic, particularly socioeconomic factors and family structures.
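One concrete limitation worth checking is missing data, since any plot of a column silently describes only the passengers for whom that column was recorded. The sketch below, with the hypothetical helper name `missing_share`, reports the fraction of missing values per column.

```python
import pandas as pd


def missing_share(df: pd.DataFrame) -> pd.Series:
    """Fraction of missing values per column, largest first; complete columns are omitted."""
    share = df.isna().mean()
    return share[share > 0].sort_values(ascending=False)
```

If a column used in a plot appears in this list, it is worth asking whether the passengers with missing values might differ systematically from the rest, for example by class or by survival.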
- Work in Google Colab for this assignment.
- After completing your work, save your file and link it to GitHub.
- Post the GitHub link of your work into MS Teams as your final submission.
- Ensure visualizations include brief descriptions and answers to each question.
- Your code must be well-commented and clean.