-
Notifications
You must be signed in to change notification settings - Fork 0
Database Build Plan
The earthquake data is currently consolidated in a single CSV file with 19 columns. Based on the proposed ERD, we will decompose this singular table into seven distinct tables, namely: City, Location, Country, Continent, EarthquakeDetails, MagType, and Network.
Using the CREATE TABLE command in SQL, I'll initiate the creation of each table.
To respect the foreign key constraints, the table creation will follow this sequence: Continent, Country, City, Location, MagnitudeType, Network, and finally EarthquakeDetails.
Example: Creation of the Location table
CREATE TABLE Location (
location_id INT PRIMARY KEY,
latitude FLOAT NOT NULL,
longitude FLOAT NOT NULL,
city_id INT,
FOREIGN KEY (city_id) REFERENCES City(city_id)
);The other tables will be created in a similar manner following the ERD illustrated in the previous page. After creating the schemas, I will import and clean up the data as described below in the Data Input Methods section. I then plan to populate the tables using INSERT INTO commands. Below are some examples of how I plan to populate the tables. I have not shown all the code, but a general structure I will follow.
Example: Populating the Continents Table
INSERT INTO Continent (continent_name)
SELECT DISTINCT continent_name
FROM cleaned_table;Example: Populating the Countries Table
INSERT INTO Countries (country_name, continent_id)
SELECT t.country_name, c.continent_id
FROM cleaned_table as t
JOIN Continent as c ON t.continent_name = c.continent_name;In the above query, I am assuming that after cleaning the database, my original table will have a column with the country_name and the continent_name. Thus, I can compute a JOIN and select the continent_id which will be populated in my Countries table. A similar approach will be used in populating the City table.
Example: Populating the Location Table
INSERT INTO Location (latitutde, longitude, city_id)
SELECT t.latitutde, t.longitude, c.city_id
FROM cleaned_table as t
JOIN City as c ON t.city_name = c.city_name;Populating the EarthquakeDetails table will mostly follow from inserting the columns already present in the original table.
Before the import process, I'll ensure all characters in the dataset are ASCII compatible. Special characters like á or é will be normalized to ensure a smooth data import. I noticed these special characters appear in the dataset in reference to location information.
The original dataset consists of one CSV file. The CSV dataset will be imported into MySQL Workbench utilizing the Table Data Import Wizard tool found in Workbench. I will then proceed to clean this table and then split the data across the 7 distinct tables outlined in my plan.
I'll introduce an event-id column as a primary key. The title attribute from the original dataset contains distance and direction information. This will be parsed to retrieve specific data. The date-time attribute will be split to extract specific day, month, and year of each earthquake event. In SQL, there seems to be specific functions e.g. DAY() that help with this task.
Example: Creating Day, Month, Year columns
ALTER TABLE cleaned_table
ADD COLUMN day INT,
ADD COLUMN month INT,
ADD COLUMN year INT;UPDATE cleaned_table
SET day = DAY(date_time),
month = MONTH(date_time),
year = YEAR(date_time);To fill in missing country and continent information, I'll parse the location column to determine event countries. Entries that have NULL values for location will be removed from the dataset. This is because important information like city impacted will be missing. From an initial review of the dataset, most rows have non-NULL values for this attribute.
To associate each country with its respective continent, I'll utilize a publicly available dataset that maps countries to continents. One such dataset can be found here.
By extracting all of this data, I will then be able to populate my newly created tables with SELECT, JOIN and INSERT INTO SQL commands as described in the section 1. Creation of Schema/Tables.