So you’ve finally assembled or gotten your hands on a dataset or spreadsheet. Nice work! Not sure what to do next? These materials form a workshop on data organization, manipulation, and analysis for beginners, using the dplyr package in R.
This workshop was originally offered during the Fall 2018 semester and again in Fall 2019 at Rutgers University-New Brunswick through the New Brunswick Libraries Graduate Specialists program and the Rutgers DH Initiative.
Data101.Rmd: Master .Rmd file for user participation in the workshop, to run or edit code as desired; used to generate the .html file.
Data101.html: This .html file is best for viewing the workshop or following along outside of an R or RStudio environment; it contains all code as well as sample outputs and figures. Click here to view in-browser without downloading.
transaction.csv and book.csv: Default sample data from the What Middletown Read Project used in the workshop, though users are welcome to use their own data saved in .csv format. The files can be downloaded from the What Middletown Read Project website by running a blank advanced search.
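As a rough sketch of how the sample data might be loaded, the snippet below reads both files with base R's `read.csv()`. It assumes `transaction.csv` and `book.csv` sit in your current working directory. The helper `load_csv_if_present()` is a hypothetical name introduced here for illustration, not something defined in the workshop materials.

```r
# Hypothetical helper (not part of the workshop files): read a .csv file
# if it exists, otherwise print a message and return NULL.
load_csv_if_present <- function(path) {
  if (!file.exists(path)) {
    message("File not found: ", path)
    return(NULL)
  }
  read.csv(path, stringsAsFactors = FALSE)
}

# Assumption: both sample files are in the current working directory.
# Swap in the paths to your own .csv files if you are using other data.
transactions <- load_csv_if_present("transaction.csv")
books <- load_csv_if_present("book.csv")

# Print a compact overview of each data frame that was loaded.
if (!is.null(transactions)) str(transactions)
if (!is.null(books)) str(books)
```

If either file is missing, the helper reports it rather than stopping with an error, so you can fix the path and rerun.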
If you’re new to R, you’ll first need to set up access in one of two ways. The standard method is to download R itself for whichever operating system you’re using and then download RStudio, an Integrated Development Environment (IDE) that makes working in R clearer by adding a text editor for writing or loading R code and a workspace for viewing data in memory.
If you have a stable internet connection and want a fast, easy way to give R a try, you can use RStudio Cloud. This browser version of RStudio looks and functions just like the desktop version, and it saves data between sessions too. All it takes is a Google account to log on. You can even clone this workshop repository directly into your RStudio Cloud workspace by clicking the arrow next to the “New Project” button, selecting “New Project from Git Repo,” and then pasting the URL of this page.
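Whichever setup you choose, a quick check in the R console can confirm that things are ready. The lines below are a minimal sketch: they print the R version and test whether the dplyr package is installed, without installing anything. The variable name `dplyr_ready` is just an illustrative choice.

```r
# Show which version of R is running.
print(R.version.string)

# TRUE if the dplyr package is installed and can be loaded, FALSE otherwise.
dplyr_ready <- requireNamespace("dplyr", quietly = TRUE)

if (dplyr_ready) {
  message("dplyr is installed and ready to use.")
} else {
  # Nothing is installed automatically; this only points to the next step.
  message('dplyr is not installed yet. Run install.packages("dplyr") to add it.')
}
```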