====== R Cheat Sheet ======

[[http://www.r-project.org/ | R]] is a free software environment for statistical computing and graphics. These notes summarize the [[http://tryr.codeschool.com/ | free R CodeSchool tutorial]].

===== Basics =====

  * ''R'' starts the command-line interpreter.
  * ''install.packages("ggplot2")'' installs an additional package; load it with ''library(ggplot2)''.
  * Expressions are evaluated and their results displayed, e.g. ''1'', ''1+1'', ''"Hello World"''.
  * Booleans are ''TRUE'' (or ''T'') and ''FALSE'' (or ''F''); comparisons such as ''1==1'' and ''3>4'' return them.
  * Assign a variable with ''x=1'' or ''%%x<-1%%''.
  * For help on a function use ''help(sum)'', for a package ''help(package='ggplot2')'', and to run worked examples ''example(sqrt)''.
  * Arithmetic operators are ''+ - * /''; ''='' and ''%%<-%%'' are assignment operators.
  * ''NA'' represents a missing or unknown value. Most expressions involving ''NA'' return ''NA'', e.g. ''sum(c(1,NA))''; many functions accept ''na.rm=TRUE'' to ignore missing values.

===== Vectors =====

  * Create a vector with the combine function ''c(4,7,9)''.
  * All elements of a vector must have the same type; mixed values are coerced to a common type, e.g. ''c(1,"a")'' becomes a character vector.
  * ''a:b'' creates a vector of integers from a to b.
  * ''seq(a,b,s)'' creates a vector of numbers from a to b in increments of s.
  * ''myseq[3]'' accesses the third element; vectors are indexed starting at 1.
  * Use a vector as an index to access multiple elements, e.g. ''myseq[c(1,3)]''.
  * The ''names'' function assigns names to vector elements. Once names are assigned, they can be used as indices, e.g. ''names(myseq)=c('one','two','three')'' then ''myseq['two']''.
  * ''myseq + 1'' adds one to every element of ''myseq''.
  * Arithmetic, comparisons and most math functions operate element-wise and return vectors, e.g. ''myseq - 1'', ''myseq == 7'', ''sin(myseq)''.
  * ''head(myvec)'' and ''tail(myvec)'' show the start or end of a vector.

===== Plotting =====

  * ''barplot(myseq)'' creates a bar plot of the ''myseq'' vector; ''abline(h=y)'' then adds a horizontal line at height y.
  * ''plot(x,y)'' plots x against y, e.g. ''x=seq(0,20,0.1)'', ''y=sin(x)'', ''plot(x,y)''.
  * ''contour(mymat)'' draws a contour map of a matrix.
  * ''persp(mymat)'' draws the matrix as a 3D surface in perspective.
  * ''image(volcano)'' draws a heat map of a matrix (''volcano'' is a built-in example data set).
  * ''qplot(weights, prices, color=types)'' produces more attractive plots using the ggplot2 package (load it first with ''library(ggplot2)'').

===== Matrices =====

  * ''matrix(0,3,4)'' creates a 3x4 matrix with all elements 0.
  * ''matrix(1:12,3,4)'' creates a 3x4 matrix holding the numbers 1-12, filled column by column.
  * ''dim(myvec)=c(3,4)'' reshapes a 12-element vector into a 3x4 matrix; ''dim(mymatrix)'' returns the current dimensions.
  * ''mymatrix[3,4]'' returns a single element of the matrix (row, column).
  * ''mymatrix[,2]'' returns the entire second column.

===== Data Sets =====

  * ''factor'' is a collection type for categorical values, e.g. ''myfac=factor(myvec)''.
  * A factor groups the unique values into ''levels''; ''levels(myfac)'' lists them.
  * ''as.integer(myfac)'' returns each value's level as an integer, which can be used to pick a plotting symbol, e.g. ''plot(weights, prices, pch=as.integer(types))''.
  * ''legend("topright", levels(types), pch=1:length(levels(types)))'' adds a matching legend.
  * A data frame collects related vectors as columns whose values are in the same order, e.g. ''mydf=data.frame(weights,prices,types)''.
  * To extract a column, use double square brackets with the column index or name, e.g. ''mydf%%[['weights']]%%'', or a dollar sign, e.g. ''mydf$prices''.
  * ''merge'' merges two data frames by joining on their shared column names.

===== Statistics =====

  * ''mean(myvec)'', ''median(myvec)'' and ''sd(myvec)'' compute the mean, median and standard deviation.
  * ''cor.test'' tests for correlation (Pearson's product-moment correlation by default).
  * ''line = lm(cola ~ colb)'' fits a linear model of cola as a function of colb; plot the fitted line with ''abline(line)''.

===== File Handling =====

  * ''list.files()'' lists the files in the current directory.
  * ''source("file.R")'' loads and runs a file of code.
  * ''read.csv('mydat.csv')'' loads a CSV file into a data frame.
  * ''read.table'' reads text data with other separators (set with ''sep'').
  * ''%%con<-url("http://google.com","r")%%'' opens a connection for reading a web page.
  * ''%%x<-readLines(con)%%'' reads it into a vector of lines; call ''close(con)'' when done.
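The file functions in the File Handling section can be tried without network access. This is a minimal sketch, assuming a small hypothetical CSV written to a temporary file. The column names echo the Data Sets examples and the values are made up. ''file()'' stands in for ''url()'' because both return a connection that ''readLines'' can read.

```r
# Write a hypothetical CSV file to a temporary location.
path <- tempfile(fileext = ".csv")
writeLines(c("weights,prices,types",
             "300,9000,gold",
             "200,5000,silver",
             "100,12000,gems"), path)

# read.csv() assumes a header row and comma separators.
treasure <- read.csv(path)

# read.table() needs the separator and header spelled out.
same <- read.table(path, sep = ",", header = TRUE)

# readLines() works on any connection: file() here, url() for web pages.
con <- file(path, "r")
lines <- readLines(con)
close(con)

print(dim(treasure))   # 3 rows, 3 columns
print(length(lines))   # 4: the header plus three data lines
unlink(path)           # delete the temporary file
```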
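The rules listed under Basics and Vectors can be checked directly at the prompt. Here is a short sketch: the values are arbitrary, and each comment gives the result the line should print.

```r
myseq <- c(4, 7, 9)
print(myseq[3])            # 9: indexing starts at 1
print(myseq[c(1, 3)])      # 4 9
print(2:5)                 # 2 3 4 5

names(myseq) <- c("one", "two", "three")
print(myseq["two"])        # 7, labelled "two"
print(myseq + 1)           # 5 8 10, element-wise (names are kept)
print(myseq > 5)           # FALSE TRUE TRUE

print(1 == 1)              # TRUE
print(c(1, "a"))           # "1" "a": mixed types are coerced to character
print(seq(0, 1, 0.25))     # 0.00 0.25 0.50 0.75 1.00

print(sum(c(1, NA, 3)))                # NA: missing values propagate
print(sum(c(1, NA, 3), na.rm = TRUE))  # 4
```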
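The Matrices and Data Sets notes fit together in one small session. This sketch uses hypothetical treasure data: the ''weights'', ''prices'' and ''types'' values, and the ''rarity'' lookup table used to show ''merge'', are invented for illustration.

```r
m <- matrix(1:12, 3, 4)    # filled column by column
print(m[3, 4])             # 12
print(m[, 2])              # 4 5 6

v <- 1:6
dim(v) <- c(2, 3)          # reshape the vector into a 2x3 matrix
print(dim(v))              # 2 3

# Hypothetical data.
weights <- c(300, 200, 100, 250)
prices  <- c(9000, 5000, 12000, 7500)
types   <- factor(c("gold", "silver", "gems", "gold"))
print(levels(types))       # "gems" "gold" "silver" (sorted)
print(as.integer(types))   # 2 3 1 2

treasure <- data.frame(weights, prices, types)
print(treasure[["weights"]])   # same as treasure$weights

# Join on the shared column name "types" (hypothetical lookup table).
rarity <- data.frame(types = c("gems", "gold", "silver"),
                     rarity = c("high", "medium", "low"))
merged <- merge(treasure, rarity)
print(merged)
```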
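The Statistics functions can be tried on a small hypothetical data set. The values below are invented so the least-squares fit is easy to check by hand:
  * mean(colb) = 3 and mean(cola) = 7.2.
  * The slope is 20/10 = 2.
  * The intercept is 7.2 - 2*3 = 1.2.
The plotting call is left as a comment so the sketch runs without a graphics device.

```r
# Hypothetical data with a roughly linear trend.
colb <- c(1, 2, 3, 4, 5)
cola <- c(3, 5, 8, 9, 11)

print(mean(cola))     # 7.2
print(median(cola))   # 8
print(sd(colb))       # sqrt(2.5), about 1.58

line <- lm(cola ~ colb)   # model cola as a linear function of colb
print(coef(line))         # intercept 1.2, slope 2

ct <- cor.test(cola, colb)  # Pearson's product-moment correlation
print(ct$estimate)          # about 0.99
print(ct$p.value)

# To draw the points with the fitted line:
# plot(colb, cola); abline(line)
```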