# About {.unnumbered}

This book was inspired by my students—particularly graduate students at Brooklyn College who wanted to ask meaningful research questions but felt held back by tools that were opaque, brittle, or difficult to reproduce. Over time, it became clear that learning *R* was not just about writing code; it was about learning how to think clearly, document decisions, and produce work that others (and your future self) can understand and reuse.

**Reproducible Research Using R** is an open educational resource (OER) designed to help students, researchers, and practitioners build confidence using R as a complete research tool, from data import and visualization to statistical analysis and polished reporting. The goal of this book is not just to teach *what* buttons to press or *which* functions to call, but to teach a **workflow** that is transparent, explainable, and most importantly, reproducible.

This book was created in conjunction with the Open Educational Resources initiative at Brooklyn College and is freely available for learning, teaching, and adaptation.

## What You’ll Learn {#sec-what-learn}

By working through this book, you will learn how to:

-   Use R and RStudio as an integrated research environment\
-   Import, clean, and explore data using modern R tools\
-   Visualize data clearly and intentionally\
-   Conduct common statistical analyses used in applied research\
-   Interpret results in context—not just report numbers\
-   Create fully reproducible reports using R Markdown and Bookdown

Throughout the book, reproducibility is treated not as an “extra” or an advanced topic, but as a **default practice**.

## What You Should Know First {#sec-what-to-know}

You do *not* need prior experience with:

-   R
-   Programming
-   Command-line tools

Basic familiarity with research methods and statistics is helpful, but the focus of this book is on **implementation and workflow**, not statistical theory.

## What This Book Does Not Cover {#sec-what-it-covers}

This book is not intended to be:

-   A comprehensive statistics theory textbook\
-   A software engineering or computer science text\
-   An advanced machine learning or big-data resource

Instead, it focuses on the tools and practices most commonly needed to conduct and present reproducible research in applied settings.

# How to Use This Book {#sec-how-use}

This book is designed to be flexible. You can read it cover-to-cover, jump directly to specific chapters, or use it as a reference alongside your own projects.

## Chapter Anatomy {#sec-chapter-anatomy}

The book is organized into four parts:

-   Part I: Foundations
    -   Getting Started with R
    -   Working with Data Using the tidyverse
    -   Data Visualization with ggplot2
-   Part II: Making Comparisons
    -   Comparing Two Groups: Data Wrangling, Visualization, and t-Tests
    -   Comparing Multiple Means
    -   Analyzing Categorical Data
-   Part III: Relationships and Modeling
    -   Correlation
    -   Linear Regression
    -   Logistic Regression
-   Part IV: Reproducible Communication
    -   Reproducible Reporting with R Markdown

Most chapters follow a consistent structure:

-   Conceptual explanation of *why* a tool or method is useful\
-   Step-by-step code examples\
-   Visualizations and outputs\
-   Interpretation and best practices\
-   A checklist to reinforce reproducible habits

This repetition is intentional. Consistency helps build intuition.

## Code, Data, and Reproducibility {#sec-code-etc}

All code in this book is meant to be **run**, **modified**, and occasionally **broken**. Learning happens when you experiment.

::: callout-tip
## As my father always says:

That's why they put erasers on pencils
:::
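
As a small illustration of the run, modify, and break cycle, here is a sketch that uses only base R. The vector `values` is made up for this example and does not come from the book's datasets.

```r
# A made-up vector with one missing value (illustration only).
values <- c(4, 8, NA, 15)

# Run: a single missing value makes the mean missing too.
mean(values)
#> [1] NA

# Modify: ask mean() to drop missing values before averaging.
mean(values, na.rm = TRUE)
#> [1] 9

# Break (on purpose): text cannot be averaged, so mean() raises a warning.
# tryCatch() captures the warning and returns its message as a string,
# so the script keeps running and the message can be read calmly.
tryCatch(
  mean(c("4", "8")),
  warning = function(w) conditionMessage(w)
)
```

Reading the warning message is part of the exercise: it usually points to what needs to change.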

### The reproresearchR Package {#sec-reproresearchR-package}

The datasets used throughout this book are provided in the companion R package `reproresearchR`, allowing readers to load data directly into R without manually downloading files. This ensures that all examples run the same way for everyone. All figures, tables, and analyses in this book are generated directly from code, never copied and pasted from external software, so that every result is fully reproducible. Chapters @sec-intro-cat-analysis, @sec-intro-cor, and @sec-linear-reg all use data from the package.

The `reproresearchR` package also includes two versions of each chapter’s R script:

-   Full Script: the complete code used to generate all analyses and figures in the chapter.
-   Helper Script: a partially completed script with key sections removed, allowing you to work along with the textbook by filling in the missing code.

Once R and the `reproresearchR` package are installed (with RStudio recommended), readers have everything they need to follow along and successfully complete the analyses in this textbook.
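
One way to confirm that the package is installed and to see which datasets it ships is sketched below. It relies only on base R functions (`requireNamespace()` and `data()`). The helper `list_package_datasets()` is a hypothetical name introduced for this illustration and is not part of `reproresearchR`. How the package itself is installed is not covered here, so follow the package's own instructions for that step.

```r
# Hypothetical helper (not part of reproresearchR): list the datasets
# that an installed package provides.
list_package_datasets <- function(pkg) {
  # requireNamespace() returns FALSE instead of stopping when the
  # package is missing, so the script does not fail.
  if (!requireNamespace(pkg, quietly = TRUE)) {
    message("Package '", pkg, "' is not installed yet.")
    return(character(0))
  }
  # data(package = ) returns an object whose `results` matrix has an
  # "Item" column holding the dataset names.
  unname(data(package = pkg)$results[, "Item"])
}

# The package name comes from the text above.
list_package_datasets("reproresearchR")
```

If the call returns an empty result together with the "not installed" message, install the package first and run it again.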

### R and RStudio

This textbook is about the programming language R. To work with R on your own computer, install both of the following:

1.  [R](https://www.r-project.org/)
2.  [RStudio](https://posit.co/downloads/)

If you cannot install both on your computer but still want to learn or use R, you can use the online version, [Posit Cloud](https://posit.cloud/content/yours?sort=name_asc).
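
Once R is available, locally or online, a quick check like the following sketch shows which version is running. Both calls are base R. Saving the `sessionInfo()` output alongside an analysis is one common way to document the software environment, offered here as a suggestion rather than a requirement of this book.

```r
# Print the version of R that is currently running.
cat(R.version.string, "\n")

# Show the full session details: R version, platform, and attached packages.
print(sessionInfo())
```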

## The NYC Open Data Student Gallery {#sec-nyc-gallery}

This textbook is part of a broader reproducible research initiative built around real civic data. Students at Brooklyn College used the workflow outlined in this book to conduct original research projects using datasets from [New York City’s Open Data portal](https://opendata.cityofnewyork.us/).

Each student developed a fully reproducible analysis in R, compiled their work into a structured report, and produced a final research project grounded in real public data. These projects were then assembled into a collective publication: the NYC Open Data Student Gallery.

The Gallery showcases student research on topics ranging from public health and environmental conditions to social infrastructure and urban policy—demonstrating that reproducibility is not just a technical skill, but a tool for meaningful civic inquiry.

You can explore the full collection of student projects here: [**NYC Open Data Student Gallery**](https://martinezc1-nyc-open-data-student-gallery.share.connect.posit.cloud/)

The Gallery reflects what is possible when reproducibility, open data, and structured workflows are integrated into the classroom from day one.

## Acknowledgments {#sec-acknowledgments}

This book would not exist without the curiosity, questions, and persistence of students at Brooklyn College. Their willingness to wrestle with messy data and imperfect code shaped both the content and the tone of this text.

Additional thanks go to the Open Educational Resources team at Brooklyn College and to the broader R community, whose commitment to open tools and shared knowledge makes projects like this possible.

## License

This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).

You are free to:

-   Share — copy and redistribute the material
-   Adapt — remix, transform, and build upon the material

Under the following terms:

-   Attribution — You must give appropriate credit.

[Full license](https://creativecommons.org/licenses/by/4.0/)

## How to Cite

If you use this textbook in your teaching, research, or projects, please cite it as:

Martinez, C. (2026). *Reproducible Research Using R* (Version 1.0.0). Zenodo. https://doi.org/10.5281/zenodo.19136755

### Version History

#### v1.0.0 (2026)

-   Initial public release
-   Full textbook curriculum covering data analysis, visualization, and modeling in R
-   Used in Fall 2025 coursework at Brooklyn College (CUNY)