rstudio::conf 2020 recap

rstudio::conf 2020

rstudio::conf took place January 27th through January 30th, 2020 in San Francisco, CA. I was fortunate to attend a conference workshop titled “What They Forgot to Teach You About R,” taught by Jenny Bryan, Kara Woo, and Jim Hester. Read on for a recap of what I learned at the workshop and of interesting talks from the conference.

Resources

All slides and practice sets from the workshop (abbreviated WTF) are available online here

Helpful paper about achievable best practices in scientific computing: Good Enough Practices in Scientific Computing

More about getting started with Git: Happy Git with R

purrr tutorial

Workshop recap

The two-day WTF workshop was incredibly informative. Topics included organizational tips, debugging, Git/GitHub, package versions, reproducible environments, and purrr. Below I highlight some of the best practices covered in the workshop.

Organizational tips (file paths, dates, file organization)

Tips are taken from the slides for the first session of the workshop: project-oriented workflows

  • File paths in your scripts should be relative to a stable base and should use file system functions (not paste() or strsplit()).
  • Packages that can help with defining paths are here and fs. Below is an example of how to use here() and why it is useful:

When you load here, the package notes what directory your project is in, and reports that directory to your console:

library(here)
## here() starts at /Users/saraharcos/Desktop/r_fridays

To build a path to a file with here(), pass each directory and subdirectory below the project directory as a separate string, ending with the file name. Of the three calls to read_tsv() below, the absolute-path call and the here() call read exactly the same file, while the call using a relative path fails:

library(readr)

#Using an absolute path
read_tsv("/Users/saraharcos/Desktop/r_fridays/data/andrew_081619_WT_noise.tsv")
## # A tibble: 22 x 2
##       NP   ani
##    <dbl> <dbl>
##  1 3995.   256
##  2 3000.   254
##  3 2251.   250
##  4 1690.   249
##  5 1265.   243
##  6  952.   224
##  7  710.   225
##  8  535.   212
##  9  396.   172
## 10  304.   157
## # … with 12 more rows
#Using a relative path
read_tsv("data/andrew_081619_WT_noise.tsv")
## Error: 'data/andrew_081619_WT_noise.tsv' does not exist in current working directory ('/Users/saraharcos/Desktop/r_fridays/content/post').
#Using here
read_tsv(here("data", "andrew_081619_WT_noise.tsv"))
## # A tibble: 22 x 2
##       NP   ani
##    <dbl> <dbl>
##  1 3995.   256
##  2 3000.   254
##  3 2251.   250
##  4 1690.   249
##  5 1265.   243
##  6  952.   224
##  7  710.   225
##  8  535.   212
##  9  396.   172
## 10  304.   157
## # … with 12 more rows

Why did the relative path break? This R Markdown document is stored in a subfolder of the R project (content/post), and a relative path is resolved against the current working directory, which is that subfolder, as the error message shows. From there, the relative path to the “data” folder containing Andrew’s data is actually ../../data/andrew_081619_WT_noise.tsv. The here() function works because it builds every path from the main project directory. As long as you stay within the same project, the here() path to a file is always the same, even if your script is located in a subdirectory.
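To make the working-directory issue concrete, here is a minimal sketch in base R. It does not call here() itself. Instead it builds a throwaway project in a temporary folder, and the names demo_project, example.tsv, and project_root are all made up for illustration, with project_root standing in for the directory that here() would find for you.

```r
# Hypothetical throwaway project, created in a temporary directory
project_root <- file.path(tempdir(), "demo_project")
dir.create(file.path(project_root, "data"),
           recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project_root, "content", "post"),
           recursive = TRUE, showWarnings = FALSE)
writeLines("NP\tani", file.path(project_root, "data", "example.tsv"))

# Pretend we are knitting a document stored in content/post
old_wd <- setwd(file.path(project_root, "content", "post"))

# Resolved against content/post, so the file is not found
relative_found <- file.exists("data/example.tsv")

# Found, but only from a script that sits exactly two levels deep
climbing_found <- file.exists("../../data/example.tsv")

# Built from the project root, so the working directory does not matter
rooted_found <- file.exists(file.path(project_root, "data", "example.tsv"))

setwd(old_wd)  # restore the original working directory

print(c(relative = relative_found,
        climbing = climbing_found,
        rooted = rooted_found))
```

Only the first check fails. The "../.." version works, but it would break as soon as the document moved to a different folder depth, which is the fragility that a root-based path avoids.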

A second reason to use here() is that absolute paths differ between computers and operating systems. For example, if you construct your absolute paths using the ~ home-directory shortcut shown below, your script may work on your computer, but it won’t necessarily work on a Windows machine.

read_tsv("~/Desktop/r_fridays/data/andrew_081619_WT_noise.tsv")
## Parsed with column specification:
## cols(
##   NP = col_double(),
##   ani = col_double()
## )
## # A tibble: 22 x 2
##       NP   ani
##    <dbl> <dbl>
##  1 3995.   256
##  2 3000.   254
##  3 2251.   250
##  4 1690.   249
##  5 1265.   243
##  6  952.   224
##  7  710.   225
##  8  535.   212
##  9  396.   172
## 10  304.   157
## # … with 12 more rows
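A small sketch of why the ~ shortcut is machine-specific, using only base R. The printed home directory will differ from computer to computer, so I have not shown any output. The variable names are my own.

```r
# "~" is expanded by R at run time; the result depends on the
# operating system and on who is logged in
home_dir <- path.expand("~")
print(home_dir)

# Machine-specific: only valid where the project sits in Desktop/r_fridays
# under that particular home directory
machine_specific <- file.path(home_dir, "Desktop", "r_fridays",
                              "data", "andrew_081619_WT_noise.tsv")

# Project-relative: the same components on every machine.
# These are the pieces you would pass to here().
project_relative <- file.path("data", "andrew_081619_WT_noise.tsv")
print(project_relative)
```

The first path changes whenever the user name, the operating system, or the location of the project folder changes. The second one only depends on the layout inside the project.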
  • Dates: ISO 8601 is the international standard for writing dates, and its calendar-date format is YYYY-MM-DD. Use this format for dates in file names: it is unambiguous for readers in any country, and the files fall into chronological order when sorted by name.
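Here is a short sketch of what ISO 8601 dates buy you in file names. The file names below are hypothetical, made up for this example.

```r
# Hypothetical file names that start with an ISO 8601 date
iso_files <- c("2020-01-30_notes.tsv",
               "2019-08-16_noise.tsv",
               "2019-12-02_counts.tsv")

# The same dates written as MM-DD-YYYY
us_files <- c("01-30-2020_notes.tsv",
              "08-16-2019_noise.tsv",
              "12-02-2019_counts.tsv")

# Sorting by name puts the ISO files in chronological order
iso_sorted <- sort(iso_files)
print(iso_sorted)

# The MM-DD-YYYY files sort by month, so January 2020 lands before August 2019
us_sorted <- sort(us_files)
print(us_sorted)

# Build a dated file name from a Date object
dated_name <- paste0(format(as.Date("2019-08-16"), "%Y-%m-%d"), "_noise.tsv")

# Recover the dates from the first 10 characters of each ISO file name
file_dates <- as.Date(substr(iso_files, 1, 10))
print(file_dates)
```

Because the year comes first and every field is zero-padded, a plain alphabetical sort is also a chronological sort, and as.Date() can parse the prefix without a format string.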