Type values and mathematical formulas into R’s command prompt
1 + 1
## [1] 2
Assign values to symbols (variables)
x = 1
x + x
## [1] 2
Invoke functions such as c(), which takes any number of values and returns a single vector
x = c(1, 2, 3)
x
## [1] 1 2 3
R functions, such as sqrt(), often operate efficiently on vectors
y = sqrt(x)
y
## [1] 1.000000 1.414214 1.732051
There are often several ways to accomplish a task in R
x = c(1, 2, 3)
x
## [1] 1 2 3
x <- c(4, 5, 6)
x
## [1] 4 5 6
x <- 7:9
x
## [1] 7 8 9
10:12 -> x
x
## [1] 10 11 12
Sometimes R does ‘surprising’ things that can be fun to figure out
x <- c(1, 2, 3) -> y
x
## [1] 1 2 3
y
## [1] 1 2 3
‘Atomic’ vectors
Types include integer, numeric (floating-point; real), complex, logical, character, raw (bytes)
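A minimal sketch of these types, using arbitrary illustrative values that are not part of the original material. class() reports the type names listed above, while typeof() reports the underlying storage mode (numeric vectors are stored as "double"). Because an atomic vector holds a single type, mixed values are coerced to a common one.

```r
class(1:3)            # integer
class(c(1.5, 2))      # numeric
typeof(c(1.5, 2))     # double: the storage mode behind 'numeric'
class(1+2i)           # complex
class(c(TRUE, NA))    # logical
class("a")            # character
class(as.raw(255))    # raw

# An atomic vector holds one type, so mixed values are coerced (here to character)
mixed <- c(1, "a", TRUE)
class(mixed)          # character
```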
people <- c("Brian", "Jim", "Herve", "Dan", "Val", "Martin")
people
## [1] "Brian"  "Jim"    "Herve"  "Dan"    "Val"    "Martin"
Atomic vectors can be named
population <- c(Buffalo=259000, Rochester=210000, `New York`=8400000)
population
##   Buffalo Rochester  New York 
##    259000    210000   8400000
log10(population)
##   Buffalo Rochester  New York 
##  5.413300  5.322219  6.924279
Statistical concepts like NA (not available)
truthiness <- c(TRUE, FALSE, NA)
truthiness
## [1]  TRUE FALSE    NA
Logical concepts like ‘and’ (&), ‘or’ (|), and ‘not’ (!)
!truthiness
## [1] FALSE  TRUE    NA
truthiness | !truthiness
## [1] TRUE TRUE   NA
truthiness & !truthiness
## [1] FALSE FALSE    NA
Numerical concepts like infinity (Inf) or not-a-number (NaN, e.g., 0 / 0)
undefined_numeric_values <- c(NA, 0/0, NaN, Inf, -Inf)
undefined_numeric_values
## [1]   NA  NaN  NaN  Inf -Inf
sqrt(undefined_numeric_values)
## Warning in sqrt(undefined_numeric_values): NaNs produced
## [1]  NA NaN NaN Inf NaN
Common string manipulations
toupper(people)
## [1] "BRIAN"  "JIM"    "HERVE"  "DAN"    "VAL"    "MARTIN"
substr(people, 1, 3)
## [1] "Bri" "Jim" "Her" "Dan" "Val" "Mar"
R is a green consumer: it recycles short vectors to align with long vectors
x <- 1:3
x * 2            # '2' (vector of length 1) recycled to c(2, 2, 2)
## [1] 2 4 6
truthiness | NA
## [1] TRUE   NA   NA
truthiness & NA
## [1]    NA FALSE    NA
It’s very common to nest operations, which can be simultaneously compact, confusing, and expressive ([: subset; <: less than)
substr(tolower(people), 1, 3)
## [1] "bri" "jim" "her" "dan" "val" "mar"
population[population < 1000000]
##   Buffalo Rochester 
##    259000    210000
Lists
The list type can contain other vectors, including other lists
frenemies = list(
    friends=c("Larry", "Richard", "Vivian"),
    enemies=c("Dick", "Mik")
)
frenemies
## $friends
## [1] "Larry"   "Richard" "Vivian" 
## 
## $enemies
## [1] "Dick" "Mik"
[ subsets one list to create another list, [[ extracts a list element
frenemies[1]
## $friends
## [1] "Larry"   "Richard" "Vivian"
frenemies[c("enemies", "friends")]
## $enemies
## [1] "Dick" "Mik" 
## 
## $friends
## [1] "Larry"   "Richard" "Vivian"
frenemies[["enemies"]]
## [1] "Dick" "Mik"
Factors
Character-like vectors, but with values restricted to specific levels
sex = factor(c("Male", "Male", "Female"),
             levels=c("Female", "Male", "Hermaphrodite"))
sex
## [1] Male   Male   Female
## Levels: Female Male Hermaphrodite
sex == "Female"
## [1] FALSE FALSE  TRUE
table(sex)
## sex
##        Female          Male Hermaphrodite 
##             1             2             0
sex[sex == "Female"]
## [1] Female
## Levels: Female Male Hermaphrodite
Variables are often related to one another in a highly structured way, e.g., two ‘columns’ of data in a spreadsheet
x = rnorm(1000)       # 1000 random normal deviates
y = x + rnorm(1000)   # another 1000 deviates, as a function of x
plot(y ~ x)           # relationship between x and y
Convenient to manipulate them together
data.frame(): like columns in a spreadsheet
df = data.frame(X=x, Y=y)
head(df)           # first 6 rows
##            X           Y
## 1 -1.7569371 -0.70884344
## 2 -1.6527157 -1.97487316
## 3 -0.5161684 -1.36055768
## 4  0.2218860  0.09724608
## 5 -0.6661832 -1.82587026
## 6 -0.5512824  0.71819197
plot(Y ~ X, df)    # same as above
See all data with View(df). Summarize data with summary(df)
summary(df)
##        X                  Y           
##  Min.   :-3.27963   Min.   :-5.20065  
##  1st Qu.:-0.71917   1st Qu.:-1.02837  
##  Median :-0.06830   Median :-0.08605  
##  Mean   :-0.06072   Mean   :-0.09962  
##  3rd Qu.: 0.64606   3rd Qu.: 0.90735  
##  Max.   : 2.77080   Max.   : 4.37988
Easy to manipulate data in a coordinated way, e.g., access column X with $ and subset for just those rows where X is greater than 0
positiveX = df[df$X > 0,]
head(positiveX)
##            X           Y
## 4  0.2218860  0.09724608
## 9  0.6701959  0.82361589
## 10 1.1216619  1.49955242
## 14 0.6156470  0.11297448
## 15 0.2805778 -1.84736727
## 16 0.7633320 -1.63962235
plot(Y ~ X, positiveX)
R is introspective: ask it about itself
class(df)
## [1] "data.frame"
dim(df)
## [1] 1000    2
colnames(df)
## [1] "X" "Y"
matrix() is a related class, where all elements have the same type (a data.frame() requires elements within a column to be the same type, but elements between columns can be different types).
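A minimal sketch of the difference, using hypothetical objects m and df2 that are not part of the original material. Mixed values placed in a matrix are coerced to a single type, whereas each column of a data.frame keeps its own type. stringsAsFactors = FALSE is passed explicitly so the character column stays character regardless of R version.

```r
# A matrix holds one type: mixing a number, a string, and a logical
# coerces every element to character
m <- matrix(c(1, "a", TRUE), nrow = 1)
typeof(m)

# A data.frame allows a different type per column
df2 <- data.frame(
    num = c(1, 2),
    chr = c("a", "b"),
    lgl = c(TRUE, FALSE),
    stringsAsFactors = FALSE
)
sapply(df2, class)    # one class per column: numeric, character, logical
```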
A scatterplot makes one want to fit a linear model (do a regression analysis)
Variables in the formula are found in the second argument (here, the data.frame df)
fit <- lm(Y ~ X, df)
Visualize the points, and add the regression line
plot(Y ~ X, df)
abline(fit, col="red", lwd=3)
Summarize the fit as an ANOVA table
anova(fit)
## Analysis of Variance Table
## 
## Response: Y
##            Df Sum Sq Mean Sq F value    Pr(>F)    
## X           1 1040.0 1039.96  1022.2 < 2.2e-16 ***
## Residuals 998 1015.4    1.02                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Introspection: what class is fit? What methods can I apply to an object of that class?
class(fit)
## [1] "lm"
methods(class=class(fit))
##  [1] add1           alias          anova          case.names     coerce         confint       
##  [7] cooks.distance deviance       dfbeta         dfbetas        drop1          dummy.coef    
## [13] effects        extractAIC     family         formula        hatvalues      influence     
## [19] initialize     kappa          labels         logLik         model.frame    model.matrix  
## [25] nobs           plot           predict        print          proj           qr            
## [31] residuals      rstandard      rstudent       show           simulate       slotsFromS3   
## [37] summary        variable.names vcov          
## see '?methods' for accessing help and source code
Help available in RStudio or interactively
Check out the help page for rnorm()
?rnorm
‘Usage’ section describes how the function can be used
rnorm(n, mean = 0, sd = 1)
Arguments, some with default values. Arguments matched first by name, then position
‘Arguments’ section describes what the arguments are supposed to be
‘Value’ section describes return value
‘Examples’ section illustrates use
Often include citations to relevant technical documentation, reference to related functions, obscure details
Can be intimidating, but in the end actually very useful
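A minimal sketch of how argument matching plays out for rnorm(n, mean = 0, sd = 1). The variable names and the seed are illustrative choices, not part of the original material. Resetting the seed before each call makes the random draws comparable, so the four calls can be checked for identical results.

```r
set.seed(1)
by_name <- rnorm(n = 3, mean = 10, sd = 2)    # every argument named

set.seed(1)
by_position <- rnorm(3, 10, 2)                # matched purely by position

set.seed(1)
reordered <- rnorm(sd = 2, n = 3, mean = 10)  # named arguments can come in any order

set.seed(1)
mixed <- rnorm(sd = 2, 3, 10)                 # 'sd' matched by name first;
                                              # 3 and 10 then fill n and mean by position

identical(by_name, by_position)
identical(by_name, reordered)
identical(by_name, mixed)
```

Naming the arguments that differ from their defaults tends to keep calls readable, since the reader does not have to recall the order on the help page.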