UC Davis Plants



R Notes 2012 LAB 1Topics covered: Intro to R Shapiro test, t testOne-way ANOVAProbability and critical valuesIntro to R# Download the program: # Set working directorysetwd("G:/PLS205/R/Lab1")# The function getwd can be used to verify where your current working directory is. getwd()[1] "G:/PLS205/R/Lab1"# The most useful command in R are ? or alternatively help and ?? (or alternatively help.search). Use ?# (or help(yourcommand)) if you know the exact command name and use ??yourcommand (or # help.search("yourcommand")) if you don’t know the command name, R will return a list of commands# matching (somehow) your search. ?setwd# the same as help(setwd)??shapiro# the same as help.search("shapiro")# R as a calculatorsqrt(4)[1] 22^3[1] 8# Assign values to a symbolic variablex<- 2# Assign a vector of values to a vector nameweight <- c(3,5,7,10)weight [1] 3 5 7 10# List variables ls()“x” “weight”# Operating with vectors of equal lengthheight <- c(2,2,2,4)bmi <- weight/height^2bmi[1] 0.750 1.250 1.750 0.625# Correlationscor(weight, height)[1] 0.8372184cor.test(weight, height) Pearson's product-moment correlationdata: weight and height t = 2.1651, df = 2, p-value = 0.1628alternative hypothesis: true correlation is not equal to 0 95 percent confidence interval: -0.6340529 0.9964900 sample estimates: cor 0.8372184# Operations using the numbers in a vector#The length function returns the number of elements in a vectorbirds<-c(210, 221, 218, 228, 220, 227, 223, 224,192)birds_n<-length(birds)birds_n[1] 9 # Adding the elements in a vector, calculating the mean and the standard deviation (functions: sum, mean and sd)birds_sum <-sum(birds)birds_sum [1] 1963 birds_mean <- mean(birds) # or median(birds)birds_mean [1] 218.1111 birds_sd <- sd(birds) # or var(birds)birds_sd [1] 11.1517 birds_se<-sd(birds)/sqrt(birds_n)birds_se [1] 3.7172 # We can create a function se using the function commandse <- function(x) {sd(x)/sqrt(length(x))}se(birds)# Coefficient of variationbirds_CV<-sd(birds)/mean(birds)*100 birds_CV [1] 5.1128 # alternatively, you can create a function that calculates the cv: cv <- function(x) { 100 * sd(x)/mean(x)) }cv(birds)Testing normality# Shapiro test of normalityshapiro.test(birds) Shapiro-Wilk normality testdata: birds W = 0.7981, p-value = 0.01945# Q-Q plots: # Q-Q plots are expected to produce a straight line if the data come from a ~N distribution with any mean and sdqqnorm(birds, pch=19, col="red", main="Q-Q plot for Birds data")qqline(birds)t-test# One sample T test # (test if the mean is different from a specified population mean mu=?).extract<-c(77.7, 76.0, 76.9, 74.6, 74.7, 76.5, 74.2, 75.4, 76.0, 76.0, 73.9, 77.4, 76.6, 77.3)?t.test # check the parameters that can be set # one sample ttest t.test(extract, mu=78) One Sample t-testdata: extract t = -6.2727, df = 13, p-value = 2.866e-05alternative hypothesis: true mean is not equal to 78 95 percent confidence interval: 75.23436 76.65135 sample estimates: mean of x 75.94286# Two independent samples without assumption of equal variances (default of t.test function).height <- c(2,2,2,4)height2 <- c(2,2,3,5)t.test (height, height2) Welch Two Sample t-testdata: height and height2 t = -0.5774, df = 5.4, p-value = 0.587alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -2.677493 1.677493 sample estimates:mean of x mean of y 2.5 3.0# Two independent samples assuming equal variancest.test (height, height2, var.equal = T) Two Sample t-testdata: height and height2 t = -0.5774, df = 6, p-value = 0.5847alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -2.619088 1.619088 sample estimates:mean of x mean of y 2.5 3.0# Two paired samples ttestt.test (height, height2, paired=T) Paired t-testdata: height and height2 t = -1.7321, df = 3, p-value = 0.1817alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -1.4186931 0.4186931 sample estimates:mean of the differences -0.5 One-way ANOVA# There is a file lab1.txt in your work directory that contains a table (that’s one reason while is important to setup# your work directory with setwd). The table was created in excel and saved as tab delimited text file. To open it, we# use the function read.table and we specify that the table has a header row with the names# of variables (header=T). T1<-read.table("Lab1.txt", header=T)T1 Genotype Score1 A 12 A 23 A 24 B 55 B 76 B 97 C 18 C 39 C 2#Alternatively, you can create tables directly in R, you could check the usage of the commands rep and data.frame :g<-rep(c('A','B','C'), each=3)s<-c(1,2,2,5,7,9,1,3,2)lab1<- data.frame(Genotype=g, Score=s)lab1# To access specific columns of the table, we use the $ symbol followed by the header name: lab1$Genotype[1] A A A B B B C C CLevels: A B Clab1$Score[1] 1 2 2 5 7 9 1 3 2# Always check the structure of your data set (str function)str(lab1)'data.frame': 9 obs. of 2 variables: $ Genotype: Factor w/ 3 levels "A","B","C": 1 1 1 2 2 2 3 3 3 $ Score : int 1 2 2 5 7 9 1 3 2# By default R recognizes the numbers in a data frame as continuous numerical values. Sometimes numbers in your # experiments are discrete categorical factors (e.g. quantities of fertilization used) and we will see later how to# change the type of data using the as.factor command. You can check the data type by “asking” using the# is.factor or is.numeric functions: is.factor(lab1$Genotype)[1] TRUEis.factor(lab1$Score)[1] FALSEis.numeric(lab1$Genotype)[1] TRUEis.numeric(lab1$Score)[1] FALSE# To define the one way ANOVA model (result_variable~classification_variable) you can use lm or aov functionsmodel<-lm(Score~Genotype, data=lab1)# To see the ANOVA table (for aov you need to use summary instead of anova) anova(model)Analysis of Variance TableResponse: Score Df Sum Sq Mean Sq F value Pr(>F) Genotype 2 53.556 26.778 15.062 0.004582 **Residuals 6 10.667 1.778 ---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1Probability: calculate critical values# Quantile functions: qnorm, qt, qchisq, qf# NORMAL DISTRIBUTION# Usage:# qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)# default distribution is “Standard normal distribution” with mean 0 and sd 1. Nvalue<-qnorm(0.975)Nvalue [1] 1.959964# T DISTRIBUTION# Usage: # qt(p, df, ncp, lower.tail = TRUE, log.p = FALSE)Tvalue<-qt(0.975, 20)Tvalue[1] 2.085963# CHI DISTRIBUTION# Usage: # qchisq(p, df, ncp=0, lower.tail = TRUE, log.p = FALSE)Chivalue<-qchisq(0.975, 20)Chivalue[1] 34.16961# F DISTRIBUTION# Usage:# qf(p, df1, df2, ncp, lower.tail = TRUE, log.p = FALSE)Fvalue<-qf(0.975, 20, 4)Fvalue[1] 8.559943# Cumulative distribution functions: pnorm, pt, pchisq, pf# NORMAL DISTRIBUTION# Usage:# pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)Nprob<-pnorm(1.96)Nprob[1] 0.9750021# T DISTRIBUTION# Usage: # pt(q, df, ncp, lower.tail = TRUE, log.p = FALSE)Tprob<-pt(2.086, 20)Tprob[1] 0.9750018# CHI DISTRIBUTION# Usage: # qchisq(q, df, ncp=0, lower.tail = TRUE, log.p = FALSE)Chiprob<-pchisq(34.2, 20)Chiprob[1] 0.9751968# F DISTRIBUTION# Usage:# qf(q, df1, df2, ncp, lower.tail = TRUE, log.p = FALSE)Fprob<-pf(0.975, 20, 4)Fprob[1] 0.9750003 ................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download