R fridays: Introduction

R Basics

What is R?

  • R is a programming language that is commonly used for analysis of biological data. R can help you analyze, visualize, and perform statistics with your data.

Why not excel/prism/etc?

Much of the basic data analysis that we do in R can also be done in excel or prism. Yet many scientists prefer R. Here are a few reasons I like R:

  • R facilitates reproducible data analysis. When the steps you take to manipulate and analyze your data are recorded in an R script, these steps can be more easily reproduced on a different day, by a different person, and with a different computer.

  • R saves time. If you set up your Rscript well, you can analyze future data in just a few keystrokes. With excel/prism, data analysis is harder to automate- which also makes it more prone to errors in copy/pasting or picking the wrong settings.

  • More complex data analysis and visualization (high-throughput sequencing, proteomics, etc.) cannot be done efficiently with excel or prism. Becoming familiar with using R to analyze simpler data will prepare you for handling more complex datasets.

The Basics: Hello World

Let’s start by opening an R session and printing “Hello World”:

  1. Open a command line (terminal) window on your computer
  2. To start an R session, type R and hit enter.

Did you get an error? You may need to install R.

Last login: Fri Jul  5 10:45:29 on ttys000
Hello, Sarah!
>> R

R version 3.5.3 (2019-03-11) -- "Great Truth"
Copyright (C) 2019 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin15.6.0 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
  1. Then type: print("Hello World")
> print("Hello World")
[1] "Hello World"
>
  1. To quit the R session, type quit() and then n
> quit()
Save workspace image? [y/n/c]: n
>>

Running an Rscript

You can run R commands by starting an R session like before, but to save your work so you can reproduce it later, you can put the R commands into an Rscript.

  1. Before we start making scripts, let’s create a directory (folder) to store them.
    • On your command line, type mkdir r-fridays. This will make a directory called r-fridays in your home directory.
    • Then type cd r-fridays to enter that directory. Now when we make an Rscript, it will be saved to your r-fridays directory.

New to the command line? Not to worry! There are plenty of great online tutorials about command line basics.

>> mkdir r-fridays
>> cd r-fridays
  1. From the command line, type vim helloworld.r. This will open the vim text editor in your command line window and create a file (script) named helloworld.r.

vim is a powerful text editor- that we won’t be covering in detail in R Fridays. Today we are just using it to make a simple Rscript

>> vim helloworld.r
  1. From here, hit the i key on your keyboard to insert text. Then type print("Hello World").
  2. Hit the esc key on your keyboard to escape the insert mode. Then type :wq to quit vim and save changes to the helloworld.r file.
  3. After typing :wq, your window should return to the normal command line.
  4. Let’s check that vim saved your changes. To see the contents of helloworld.r, type cat helloworld.r
>> cat helloworld.r
print("Hello World")
>>
  1. To run the helloworld.r Rscript that we just made, type Rscript helloworld.r
>> Rscript helloworld.R
[1] "Hello World"
>>

Vim is nice.. but RStudio is awesome

RStudio is an integrated development environment (IDE) that simplifies writing and running code in R, like the workflow we went through above: writing an RScript and running the commands inside the script.

Let’s open RStudio and run some basic arithmetic commands.

If you don’t have RStudio, you can download the desktop version here.

  1. Open RStudio
  2. In the console window (bottom left) type: 2+2
2+2
## [1] 4
  1. Play around with other basic arithmetic in the console. Try +, -, /, and *.
  2. In R, you can “assign” a value to a variable using the <- command. To print the value of a variable, just type the variable name into the console and press enter
x <- 2
x
## [1] 2
y <- 4
y
## [1] 4
x+y
## [1] 6
z <- x*y
z
## [1] 8

The console in RStudio is basically the same as running an R session from the command line.

RStudio makes editing an Rscript and viewing the console output easy by showing them in the same window. This helps you to make iterative changes to your code.

An RStudio “Project” is a good way to organize your Rscripts. When a Project is created, the directory of the Project becomes the working directory for the scripts inside the Project. This solves potential issues with finding the right paths to data files you want to analyze.

Next we will make a Project based on the r-fridays directory you created earlier.

  1. Click on the Project button in the very top right of the RStudio window and select “New Project”.
  2. Select “Existing Directory”, and browse to the r-fridays directory we made, located in your home directory.
  3. Click “Create Project”
  4. In the “Files” pane on the lower right of your screen, click on helloworld.r to open the file.
  5. Highlight the print statement in your code, and then hit “cmd + enter” on your keyboard. View the results in the console!
  6. Add some other print or arithmetic statements to the helloworld.r script, and practice running individual statements or groups of statements.

Wrap up

Now you know a bit about running R commands and using R scripts and RStudio to organize your work! Next, we will tackle real data generously donated by Sam Lisy of the Ascano Lab. If you weren’t able to attend this R Friday in person, you can see a summary of our work in the next post: “Day 1: A First Look at qPCR Data”.

If some of the information covered in this intro was confusing to you, don’t fret! Learning to program in R is a lot to take in at once. There are many helpful tools online that go into more detail than we covered here (see the Welcome to R Fridays post). If you have specific questions, don’t hesitate to email me.