Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions R/download_geospatial.R
Original file line number Diff line number Diff line change
Expand Up @@ -54,6 +54,9 @@ download_geospatial <- function(spatial_level, simplified = TRUE,
if (simplified) {
dataset_path <- sub("\\.gpkg$", "_SIM.gpkg", dataset_path)
}

message("This might take a while")

geospatial_data <- sf::st_read(dataset_path, quiet = TRUE)
geospatial_vars <- c("area", "latitud", "longitud")
shape_vars <- c("shape_length", "shape_area")
Expand Down
4 changes: 4 additions & 0 deletions _pkgdown.yml
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,10 @@ articles:
navbar: Merge data
contents:
- merge_geo_demographic
- title: Design Principles
navbar: Design Principles
contents:
- design_principles

reference:
- title: Demographic Data
Expand Down
2 changes: 2 additions & 0 deletions tests/testthat/_snaps/download_geospatial.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@
download_geospatial(spatial_level = "department", simplified = TRUE,
include_geom = TRUE, include_cnpv = FALSE)
Message
This might take a while
ColOpenData provides open data derived from Departamento Administrativo
Nacional de Estadística (DANE), and Instituto de Hidrología,
Meteorología y Estudios Ambientales (IDEAM) but with modifications for
Expand Down Expand Up @@ -51,6 +52,7 @@
download_geospatial(spatial_level = "dpto", simplified = TRUE, include_geom = FALSE,
include_cnpv = TRUE)
Message
This might take a while
ColOpenData provides open data derived from Departamento Administrativo
Nacional de Estadística (DANE), and Instituto de Hidrología,
Meteorología y Estudios Ambientales (IDEAM) but with modifications for
Expand Down
73 changes: 73 additions & 0 deletions vignettes/design_principles.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,73 @@
---
title: "Design Principles for {ColOpenData}"
output:
rmarkdown::html_vignette:
df_print: tibble
vignette: >
%\VignetteIndexEntry{Population Projections with ColOpenData}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
eval = identical(tolower(Sys.getenv("NOT_CRAN")), "true")
)
```

::: {style="text-align: justify;"}
This vignette outlines the design decisions that have been taken during the development of the ColOpenData R package, and provides some of the reasoning, and possible pros and cons of each decision.

This document is primarily intended to be read by those interested in understanding the code within the package and for potential package contributors.
:::

## Scope
::: {style="text-align: justify;"}
ColOpenData provides an easy access to Colombian Open Government Data from two main data sources: Departamento Administrativo Nacional de Estadística (DANE), and Instituto de Hidrología, Meteorología y Estudios Ambientales (IDEAM). These include Demographic, Geospatial, Climate, and Population Projections' datasets. The package centralizes information an provides harmonized and cleaned datasets for easier data analysis.

In addition, the package includes a series of documentation and dictionaries to help the user navigate the open datasets available. Further information regarding included documentation, naming conventions, and uses can be found here: [Documentation and Datasets](https://epiverse-trace.github.io/ColOpenData/articles/documentation_and_dictionaries.html)
:::

## Output

::: {style="text-align: justify;"}
The output for each function is a `data.frame` with the requested data. When downloading geospatial data, which includes maps, an `sf` object is created.
:::

## Design Decisions

::: {style="text-align: justify;"}
A manual cleaning process was necessary to address issues like inconsistent headers in the original datasets, while harmonization ensured alignment with tidy data standards and national naming conventions. To avoid repeating these steps at each download, and ensure stable and independent access to open data, the already processed datasets were stored on a private server for later redistribution. The datasets list and dictionaries are stored inside the package data, to avoid unnecessary downloads of frequently consulted datasets.

Additional particularities include:

- The datasets are only available in the original language (Spanish). However, dictionaries and dataset lists are available in English and Spanish.
- Geospatial datasets include very detailed and segregated structures (down to block level). To avoid extremely long download times, a simplified version was included for all the maps, allowing the users to select either of them according to their needs.
- Climate datasets are stored as they were acquired from the source, so filtering and aggregation occurs during each user's request.
- Dictionaries are only provided for geospatial data, since they were the only ones originally provided from the source.
:::

## Dependencies

::: {style="text-align: justify;"}
The package dependencies are:

- `{checkmate}`
- `{config}`
- `{dplyr}`
- `{magrittr}`
- `{rlang}`
- `{sf}`
- `{stringdist}`
- `{tidyr}`
- `{utils}`

:::

## Additional Considerations

::: {style="text-align: justify;"}
The datasets included in the package contain changes which may alter the structure, format, or content, meaning the data does not reflect the official dataset. The package is developed independently, with no endorsement or involvement from these institutions or any Colombian government body. The authors of ColOpenData are not liable for how users utilize the data, and users are responsible for any outcomes from their use or analysis of the data.
:::
Loading