---
title: "Plotnine Contest"
format:
  html:
    toc: true
    embed-resources: true
    other-links:
      - text: The Himalayan Database
        href: https://www.himalayandatabase.com/
      - text: Tidy Tuesday Project
        href: https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-09-22/readme.md
---
# Eight-thousanders in Nepal
For my submission, I use data from [The Himalayan Database](https://www.himalayandatabase.com/). I took the CSV files from the [Tidy Tuesday Project](https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-09-22/readme.md).

# Data wrangling
```{python}
from plotnine import *
import geopandas
import geocoder
import pandas as pd
from siuba import *
```
## Filter the relevant mountains
There are [14 eight-thousanders](https://en.wikipedia.org/wiki/Eight-thousander) in the world. Eight of them lie wholly or partly in Nepal, and their ascents are well documented in The Himalayan Database. These eight mountains are the focus of my analysis and data visualization.
```{python}
# Import the CSV files from the Tidy Tuesday project
members = pd.read_csv(
    "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-22/members.csv"
)
peaks = pd.read_csv(
    "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-22/peaks.csv"
)
```
```{python}
# Filter for the eight-thousanders
df01 = peaks >> filter(_.height_metres > 8000)
# I want the ascents at the individual level (not at the expedition level)
df02 = inner_join(df01, members, by=["peak_id", "peak_name"])
```
```{python}
# I only want the main peaks of the eight-thousanders (not the subsidiary peaks),
# so I keep the peaks with more than 1,000 member records
main_peaks = (
    df02
    >> count(_.peak_name)
    >> filter(_.n > 1000)
)
```
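The `n > 1000` cut-off relies on the main peaks having many more member records than the subsidiary peaks. Here is a minimal sketch of the same count-then-threshold idea in plain pandas, on made-up rows. `toy` and `frequent_peaks` are illustrative names that are not part of the analysis, and the threshold of 2 is chosen only to fit the toy data.

```python
import pandas as pd

# Made-up member records: one row per climber per peak (not real data).
toy = pd.DataFrame(
    {"peak_name": ["Everest"] * 5 + ["Lhotse"] * 3 + ["Lhotse Shar"] * 1}
)


def frequent_peaks(df, min_rows):
    """Return the names of peaks with more than `min_rows` member rows."""
    counts = df["peak_name"].value_counts()
    return sorted(counts[counts > min_rows].index)


# "Lhotse Shar" has a single row, so it falls below the threshold.
print(frequent_peaks(toy, min_rows=2))
```

A fixed threshold is simple, but it depends on the data: it is worth checking which peaks it keeps before using them downstream.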
```{python}
# I want to focus on recent years, when large commercial expeditions became more and more common
df03 = (
    df02
    >> filter(_.peak_name.isin(main_peaks["peak_name"]))
    >> filter(_.year >= 2009)
)
```
```{python}
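# I am interested in success, broken down by whether oxygen was used
oxygen_use_sucess_01 = (
    df03
    >> count(_.peak_name, _.oxygen_used, _.success)
    # Drop the trailing Roman numeral, e.g. "Annapurna I" becomes "Annapurna"
    # (a plain replace("I", "") would leave a trailing space behind)
    >> mutate(peak_name=_.peak_name.str.replace(r"\s+I$", "", regex=True))
)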
```
## Geocoding
```{python}
# I need the latitude and longitude of each mountain
lat = []
lng = []
for peak in oxygen_use_sucess_01["peak_name"]:
    # Look up the peak name with the ArcGIS geocoder; latlng is [lat, lng]
    latlng = geocoder.arcgis(peak).latlng
    lat.append(latlng[0])
    lng.append(latlng[1])
oxygen_use_sucess_01["lat"] = lat
oxygen_use_sucess_01["lng"] = lng
```
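The loop above sends one geocoding request per row, and the count table has several rows per peak (one for each combination of oxygen use and success). Here is a minimal sketch of looking each distinct name up only once and reusing the result. `geocode_unique` and `fake_lookup` are hypothetical helpers and the coordinates are placeholders. In the real notebook, `lookup` could be something like `lambda name: geocoder.arcgis(name).latlng`.

```python
def geocode_unique(names, lookup):
    """Call `lookup` once per distinct name and return {name: (lat, lng)}."""
    coords = {}
    for name in names:
        if name not in coords:
            coords[name] = lookup(name)
    return coords


# Hypothetical stand-in for a real geocoder; it records every call it receives
# and returns placeholder coordinates, not real peak locations.
calls = []


def fake_lookup(name):
    calls.append(name)
    return (float(len(name)), 0.0)


names = ["Everest", "Everest", "Lhotse", "Everest"]
coords = geocode_unique(names, fake_lookup)

# Expand the cached results back to one value per row.
lat = [coords[name][0] for name in names]
lng = [coords[name][1] for name in names]
print(calls)
```

This keeps the number of network requests equal to the number of mountains, not the number of rows.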
```{python}
# Shift some mountains slightly east or west on the map so that their points overlap less
oxygen_use_sucess_02 = (
    oxygen_use_sucess_01
    >> mutate(lng=case_when({
        _.peak_name == "Makalu": _.lng + 0.15,
        _.peak_name == "Lhotse": _.lng + 0.15,
        _.peak_name == "Cho Oyu": _.lng - 0.15,
        True: _.lng,
    }))
    >> rename(Oxygen=_.oxygen_used, Success=_.success)
)
```
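The same longitude nudge can also be written as a lookup table, which may be easier to extend if more peaks need moving. This is a sketch in plain pandas, not the siuba code used above. The offsets are copied from the `case_when` call, while `LNG_OFFSETS`, `toy` and the coordinates in it are illustrative placeholders.

```python
import pandas as pd

# Longitude offsets in degrees, copied from the case_when call above.
LNG_OFFSETS = {"Makalu": 0.15, "Lhotse": 0.15, "Cho Oyu": -0.15}

# Placeholder coordinates, not real peak locations.
toy = pd.DataFrame(
    {
        "peak_name": ["Everest", "Lhotse", "Cho Oyu"],
        "lng": [10.0, 10.0, 10.0],
    }
)

# Peaks without an entry in the table get an offset of 0 and stay in place.
toy["lng_shifted"] = toy["lng"] + toy["peak_name"].map(LNG_OFFSETS).fillna(0.0)
print(toy)
```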
## Geopandas
```{python}
# Load the world shapefile data
world = geopandas.read_file(
    "https://github.com/geopandas/geopandas/raw/v0.9.0/geopandas/datasets/naturalearth_lowres/naturalearth_lowres.shp"
)
# Filter for Nepal
nepal = world >> filter(_.name == "Nepal")
```
# Creating the plot
```{python}
p = (
    ggplot(oxygen_use_sucess_02)
    + geom_map(nepal, fill="#d0d0d0", size=0)
    + geom_point(aes(x="lng", y="lat", size="n", color="peak_name"))
    + facet_wrap(["Oxygen", "Success"], labeller="label_both")
    + coord_fixed()
    + theme_void()
    + labs(
        size="Number of climbers",
        color="Mountain",
        title="Success on > 8,000 m peaks in Nepal without bottled oxygen is rare",
        subtitle="Everest has only been summited 57 times without bottled oxygen (and 5,903 times using bottled oxygen)",
        caption="Data: The Himalayan Database\nYears: 2009 - 2019",
    )
    + theme(
        plot_background=element_rect(fill="#EBF2FF"),
        legend_position="bottom",
        plot_title=element_text(ha="center", size=19),
        plot_subtitle=element_text(ha="center"),
    )
    + scale_size_continuous(
        range=[1, 20],
        breaks=[50, 100, 1000, 3000, 5000],
        labels=["50", "100", "1,000", "3,000", "5,000"],
    )
)
```
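The subtitle hard-codes two Everest counts. One way to keep such numbers in sync with the data is to build the string from the count table. Here is a minimal sketch, assuming a table shaped like `oxygen_use_sucess_02` (columns `peak_name`, `Oxygen`, `Success`, `n`). The rows in `toy_counts` are made up, and `summit_count` is a hypothetical helper.

```python
import pandas as pd

# Made-up counts in the shape of oxygen_use_sucess_02 (not the real numbers).
toy_counts = pd.DataFrame(
    {
        "peak_name": ["Everest", "Everest", "Everest", "Makalu"],
        "Oxygen": [True, False, True, False],
        "Success": [True, True, False, True],
        "n": [900, 10, 400, 5],
    }
)


def summit_count(counts, peak, oxygen):
    """Sum the successful climbs on `peak` with or without bottled oxygen."""
    mask = (
        (counts["peak_name"] == peak)
        & (counts["Oxygen"] == oxygen)
        & counts["Success"]
    )
    return int(counts.loc[mask, "n"].sum())


without_o2 = summit_count(toy_counts, "Everest", oxygen=False)
with_o2 = summit_count(toy_counts, "Everest", oxygen=True)

# The ":," format spec adds thousands separators, as in the subtitle above.
subtitle = (
    f"Everest has only been summited {without_o2:,} times without bottled oxygen "
    f"(and {with_o2:,} times using bottled oxygen)"
)
print(subtitle)
```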
```{python}
# p.save("mathis_plotnine_contest.png", height=8, width=10, dpi=600)
```
# My submission
The plot is shown in @fig-submission. **For the best viewing experience, open it in a new tab.**
{#fig-submission}