This project was done in collaboration with Ocean Science Analytics. For more information, please check this post on Waveform Analytics' website.
This repository contains the code used to prepare the data and to build the dashboard. The data preparation was done using Python, and the dashboard was built using R Shiny.
The project uses a dataset that combines fish annotations, acoustic indices, and environmental data. The data is stored in a DuckDB database and CSV files, and is loaded into an R Shiny dashboard for analysis and visualization.
- DuckDB Database:
  - The main data source is a DuckDB database (mbon11.duckdb) containing several tables related to fish annotations, acoustic indices, and Seascaper data.
- CSV Files:
  - Additional data is sourced from CSV files, including:
    - Index Categories: Updated index categories for analysis (Updated_Index_Categories_v2.csv).
    - Site Information: Information about different sites where data was collected (BioSound_Datasets.csv).
- Fish Annotations:
  - Data related to fish presence and annotations from different locations (e.g., Key West, May River).
  - Tables: t_fish_keywest, t_fish_mayriver.
- Acoustic Indices:
  - Acoustic index data, consisting of various metrics computed from the sound recordings.
  - Tables: t_aco2, t_aco_norm2.
- Seascaper Data:
  - Environmental data and water classes from the Seascaper tool.
  - Table: t_seascaper.
- R Data Frames:
  - Data is manipulated and analyzed using R data frames created from the DuckDB tables and CSV files.
- Pandas DataFrames:
  - In Python, data is handled using Pandas DataFrames, especially in the data_wrangler.py and tidy_biosound_data.ipynb files.
- Jupyter Notebook:
  - The tidy_biosound_data.ipynb file is a Jupyter notebook that contains code for data preparation and analysis, including merging and cleaning data.
- R Shiny Dashboard:
  - The R Shiny dashboard includes multiple tabs for visualizing data, including:
    - Time series plots with annotations.
    - Boxplots comparing index values by species.
    - Heatmaps showing relationships between acoustic indices and water classes.
    - Download options for generated plots and data.
- Plotting Libraries:
  - Libraries such as ggplot2, dygraphs, and plotly are used for creating visualizations in the R Shiny application.
- Data Wrangling:
  - Functions in data_wrangler.py are used to prepare and normalize data, handle annotations, and combine datasets.
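To make the idea of combining annotations with acoustic indices concrete, here is a minimal sketch that flags each index row whose timestamp falls inside an annotated interval. Everything in it is an assumption for illustration: the function name label_presence and the column names time, start, end, and fish_present are hypothetical, and the actual logic in data_wrangler.py may differ.

```python
import pandas as pd


def label_presence(indices: pd.DataFrame, annotations: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `indices` with a boolean `fish_present` column.

    Assumed (hypothetical) schema:
      indices     has a datetime column `time`
      annotations has datetime columns `start` and `end`
    """
    out = indices.copy()
    present = pd.Series(False, index=out.index)
    # Mark every index row whose time lies inside any annotation interval
    # (both interval endpoints are treated as inclusive).
    for start, end in zip(annotations["start"], annotations["end"]):
        present |= out["time"].between(start, end)
    out["fish_present"] = present
    return out
```

The loop is linear in the number of annotations, which is fine for a sketch. For very large annotation tables an interval index or a sorted merge would likely scale better.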
- Normalization:
  - The normalize_df function normalizes acoustic indices to a range between -1 and 1.
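One common way to map values into the range -1 to 1 is per-column min-max scaling, sketched below. This is not the repository's normalize_df: the function name scale_to_unit_range and the per-column approach are assumptions, and the real function may use a different formula or grouping (for example, per site).

```python
import pandas as pd


def scale_to_unit_range(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Min-max scale each listed column to [-1, 1]; other columns are untouched."""
    out = df.copy()
    for col in columns:
        lo, hi = out[col].min(), out[col].max()
        span = hi - lo
        if pd.isna(span) or span == 0:
            # A constant (or all-missing) column has no spread to scale,
            # so its non-missing values are mapped to 0.
            out[col] = out[col] * 0.0
        else:
            # Column minimum maps to -1, column maximum maps to 1.
            out[col] = 2.0 * (out[col] - lo) / span - 1.0
    return out
```

With this scaling, the smallest value in each column maps to -1 and the largest to 1, which puts indices measured on very different scales on a shared axis.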
- Annotation Preparation:
  - Functions like annotation_prep_kw_style and annotation_prep_mr_style are used to prepare annotations for Key West and May River datasets, respectively.
