t-test and One-Way ANOVA

1 Purpose:

In this lab, we will compute an Independent t-test and a One-Way Analysis of Variance (ANOVA).

2 Background:

Among the most commonly employed biostatistical procedures is the comparison of two samples to infer whether differences exist between the two populations sampled. The t-test is the standard procedure for this comparison.

1 Independent sample design

The samples are independent of each other: the observations from one sample are not related to the observations from the other. Before running the test, check for skewness and check for equal variances using the F-test. If the variances are unequal, use the Mann-Whitney test instead.

The null hypothesis for the Independent t-test is $H_0: \mu_1 = \mu_2$

$H_A: \mu_1 \neq \mu_2$.

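tcal = observed difference between sample means / pooled standard error, or

$t = \dfrac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$, where $s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}$.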
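The ratio described above (difference between sample means divided by the pooled standard error) can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the SYSTAT procedure used in this lab; the function name `pooled_t` and the sample values are made up for the example.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(sample1, sample2):
    """Return (t, degrees of freedom) for the pooled-variance two-sample t-test."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    sp2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / df
    # Standard error of the difference between the two sample means.
    se = sqrt(sp2 / n1 + sp2 / n2)
    return (mean(sample1) - mean(sample2)) / se, df

# Made-up illustrative values, not lab data.
t_cal, df = pooled_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(f"t = {t_cal:.4f}, df = {df}")
```

The calculated t is then compared with the critical value from a t table at the chosen alpha and these degrees of freedom.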

2 Paired sample design

The samples are not independent of each other. Sample 1 is in some way correlated with sample 2, so that the data may be said to occur in pairs. In one type of pairing, both measurements are taken on a single subject; a common example is a “before and after” experiment. A second type of pairing occurs when an investigator matches the subjects in one group with those in a second group so that the members of a pair are as much alike as possible with respect to important characteristics, such as age and gender. Pairing minimizes environmental effects but also narrows the inference limits.

The null hypothesis for the Paired t-test is $H_0: \mu_d = 0$

$H_A: \mu_d \neq 0$.

The test statistic for the null hypothesis is

$t = \dfrac{\bar{d}}{s_{\bar{d}}}$, where $\bar{d}$ = mean difference between pairs, $\nu$ = (# of pairs – 1), and $s_{\bar{d}}$ = standard error of the mean difference.

This test does not have the normality and equality of variances assumptions of the independent t-test, but assumes instead that the differences, $d_j$, come from a normally distributed population of differences. As a check, compute $s_d$ (standard deviation of differences) / $\bar{d}$ (mean difference); if the calculated value is < 2, use the paired t-test; if it is > 2, use the Wilcoxon paired-sample test.
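As a rough sketch of the paired calculation, the Python below computes the differences, the paired t statistic, and the ratio of the standard deviation of the differences to the mean difference used in the check above. The function name `paired_t` and the before/after values are hypothetical, and the absolute value of the ratio is taken here as an assumption so that a negative mean difference does not change the decision.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(first, second):
    """Return (t, degrees of freedom, |s_d / d_bar|) for paired measurements."""
    d = [a - b for a, b in zip(first, second)]  # one difference per pair
    n = len(d)
    d_bar = mean(d)        # mean difference between pairs (must not be zero for the ratio)
    s_d = stdev(d)         # standard deviation of the differences
    se = s_d / sqrt(n)     # standard error of the mean difference
    return d_bar / se, n - 1, abs(s_d / d_bar)

# Hypothetical "before and after" measurements on four subjects.
before = [10.0, 12.0, 14.0, 16.0]
after = [9.0, 10.0, 13.0, 13.0]
t_cal, df, ratio = paired_t(before, after)
print(f"t = {t_cal:.4f}, df = {df}, s_d / d_bar = {ratio:.4f}")
```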

3 One-way ANOVA

The 1-way ANOVA is used to determine if there are differences among means. The analysis is used in situations where one variable (the Independent variable) has been manipulated (e.g. treatment and control) and one variable (the Dependent variable) has been measured. If there are only two levels, a 2-sample Independent t-test is sometimes used instead; the test statistics for the two procedures are mathematically equivalent (F = t²).
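The equivalence of the two procedures for two levels can be checked numerically. The sketch below computes the one-way ANOVA F ratio (mean square among levels divided by mean square within levels) and the pooled-variance t for the same two made-up samples; the function names and data are illustrative assumptions only.

```python
from math import sqrt
from statistics import mean, variance

def one_way_f(groups):
    """F ratio for a one-way ANOVA: MS among levels / MS within levels."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    ss_among = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)
    df_among = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_among / df_among) / (ss_within / df_within)

def pooled_t(a, b):
    """Pooled-variance two-sample t statistic."""
    df = len(a) + len(b) - 2
    sp2 = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / df
    return (mean(a) - mean(b)) / sqrt(sp2 / len(a) + sp2 / len(b))

# Made-up values for two levels (e.g. treatment and control).
treatment = [5.1, 6.3, 5.8, 7.0]
control = [4.2, 5.0, 4.9, 5.5]
f_value = one_way_f([treatment, control])
t_value = pooled_t(treatment, control)
print(f"F = {f_value:.6f}, t squared = {t_value ** 2:.6f}")
```

With more than two levels only `one_way_f` applies, since the t-test compares exactly two means.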

The validity of ANOVA depends on whether or not certain conditions (assumptions) are true; if the assumptions are not true, the results are not valid. The assumptions are: 1) data in each level are normally distributed, 2) variances of measurements within each level are equal for all levels, 3) observations/individuals are randomly selected for the experiment and randomly assigned to the levels, and 4) the observations/individuals are independent among levels (i.e., different individuals/observations in each level).

3 General procedure for conducting a One-Way ANOVA

1) Determine the appropriate alpha for the experiment.

2) Develop planned (a priori) comparisons. The ANOVA will tell you if there are differences among the level means but not which means are different and which are not. The planned comparisons are used for determining where those differences are.

3) Obtain random observations for each level of the independent variable. Make sure that there are different observations/individuals for each level.

4) Before you do an ANOVA, you MUST check the assumption of equality of variance. Use a statistical test (e.g., Burr Foster Q-test) to determine if the variances of individual measurements are equal between levels.

If the assumption is TRUE, proceed to computing the ANOVA.

If the assumption is NOT TRUE, check for skewness, apply the appropriate transformations and retest the assumption.

If the variances ARE EQUAL AFTER TRANSFORMATIONS, use the transformed variable instead of the measured variable as the dependent variable in the ANOVA.

If the variances are NOT EQUAL AFTER TRANSFORMATIONS, you must use a different type of test, a non-parametric test. For a 1-Way ANOVA, the appropriate non-parametric test is the Kruskal-Wallis test.

5) Compute the ANOVA to test for differences among means. The null hypothesis is:

$H_0: \mu_1 = \mu_2 = \dots = \mu_a$

If you ACCEPT Ho, you conclude that there are NO DIFFERENCES among level means.

If you REJECT Ho, you conclude that there ARE SIGNIFICANT DIFFERENCES among level means.

Compute the planned comparisons to find out where those differences lie.

If you reject an Ho for a planned comparison, produce a bar chart showing the values of the two means that were compared.

6) Conduct an unplanned multiple comparisons test (a posteriori) to see if there are differences among means that were not examined during the planned comparisons. This procedure is meant only to help you identify potentially interesting results for further experimentation. DO NOT use this instead of the planned comparisons.

If you reject an Ho for an unplanned multiple comparisons test, you should design another experiment to actually test that result.

4 Example of a 1-Way ANOVA

PROBLEM: You wish to determine if beetle density on leaves is related to plant height. Your original idea was that beetles would probably select shorter plants because shorter plants would be easier for them to reach. For this experiment, you decided to use a specific species of plant on which the beetles are normally found. You developed racks to hold the plants at ground level, shrub level, and tree level.

Question: You decide to guard against Type I error and set alpha to 0.025 (WHY?).

DESIGN: You will conduct the experiment at a local county park. The independent variable (factor) is Height with three levels: Ground (< 0.2m), Shrubs (0.2 TO 1.0m) and Trees (>2.0m). The dependent variable is Beetle Density (number of beetles per 0.25 kg of plant biomass) measured after five days. You will use plants of the same species that are approximately the same size. The plants will be placed on racks at the three different heights. After five days, you will count the number of beetles on each plant and weigh the plant.

Question: Figure 1 illustrates two different rack designs: Which would be appropriate for a completely random design and why?

Develop planned comparisons - These are comparisons between levels that are planned in advance. Each comparison will be between TWO GROUPS, but a group may contain more than one level. The comparisons must be orthogonal (independent). Comparisons made in this manner are referred to as CONTRASTS.

A) Determine Maximum Number of Comparisons (Contrasts)

Max Contrasts = a-1, where a = number of levels (treatment groups)

For this example, the maximum number of contrasts would be 2.

B) Specify Contrasts.

After you choose your first contrast, your choices for further contrasts will be limited so make sure that this contrast addresses your main question.

The original idea is that beetles would not prefer tree height because of the difficulty in getting to the leaves.

The two groups for this comparison would be:

Mean tree height vs. the mean of lower heights combined.

The levels would be:

Tree height (T) vs. Shrub height (S) & Ground level (G)

The second contrast must be INDEPENDENT of the first. This means that once a treatment is used by itself on one side of a comparison, it can’t be used again. For example, if T vs. G & S is the first comparison, the tree height treatment (T) cannot be used in any other comparisons. Therefore, in this example, there is only one possibility left, G vs. S.
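One way to check that a set of contrasts is independent is to write each contrast as a list of coefficients, one per level, and confirm that the coefficients of each contrast sum to zero and that the products of corresponding coefficients of two contrasts also sum to zero. The sketch below assumes equal sample sizes in every level and uses a hypothetical level order of Ground, Shrub, Tree; the coefficient values are one possible coding of the two contrasts above.

```python
def is_contrast(coefficients):
    """A valid contrast has coefficients that sum to zero."""
    return sum(coefficients) == 0

def are_orthogonal(c1, c2):
    """With equal sample sizes, two contrasts are orthogonal when the
    sum of the products of their coefficients is zero."""
    return sum(a * b for a, b in zip(c1, c2)) == 0

# Level order assumed here: (Ground, Shrub, Tree).
t_vs_gs = (1, 1, -2)   # Tree vs. Ground & Shrub combined
g_vs_s = (1, -1, 0)    # Ground vs. Shrub
t_vs_g = (1, 0, -1)    # Tree vs. Ground: reuses T, so it should fail the check

print(are_orthogonal(t_vs_gs, g_vs_s))  # the planned pair
print(are_orthogonal(t_vs_gs, t_vs_g))  # a pair that reuses the tree level
```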

Question: Figure 2 illustrates the park in which you plan to conduct your experiment. How will you randomly locate the sites for the racks? Do so on the figure.

[pic]
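One possible approach to locating the racks at random, offered only as a sketch: overlay a numbered grid on the map, draw grid cells at random without replacement, and assign the heights to the chosen cells in random order. The grid size, the number of racks per height, and the function name `assign_rack_sites` are all assumptions made for illustration; the handout does not specify them.

```python
import random

def assign_rack_sites(n_cells, heights, racks_per_height, seed=None):
    """Pick distinct grid cells at random and assign a height to each one."""
    rng = random.Random(seed)
    n_racks = len(heights) * racks_per_height
    # Sample cells without replacement so no two racks share a site.
    cells = rng.sample(range(1, n_cells + 1), n_racks)
    # Build the list of height labels, then shuffle it so that the
    # assignment of heights to sites is random as well.
    labels = [h for h in heights for _ in range(racks_per_height)]
    rng.shuffle(labels)
    return list(zip(cells, labels))

# Hypothetical setup: a 100-cell grid and 5 racks at each height.
plan = assign_rack_sites(100, ["Ground", "Shrub", "Tree"], 5, seed=1)
for cell, height in sorted(plan):
    print(f"grid cell {cell}: {height}")
```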

In the first part of this lab, we will compute an Independent 2 sample t-test. For the second part of the lab, we will conduct a one-way Analysis of Variance (ANOVA).

COPY the file, SUGAR DATA, to your diskette from the BIO 156 data folder in the BIOL 156 folder on the file server or enter the data from below.

PART 1: Independent t-test

You are interested in comparing cholesterol levels between males and females aged 20 to 29, 5’ 4” to 5’ 9” in height, and weighing between 120 and 150 lbs., to determine if one sex may have a higher risk of arteriosclerotic heart disease. You selected 10 males and 10 females at random from patients with the appropriate characteristics at Kaiser Permanente. Your results were:

Male Female

164.76 185.61

139.67 107.92

158.38 165.43

189.20 187.34

168.29 222.57

222.12 170.56

207.81 205.69

150.56 174.81

194.09 150.39

153.93 186.55

Use SYSTAT to compute the mean and variance for each group.

Enter the data in two columns, SEX$ and CHOLEST

Set BY GROUPS in the DATA menu to SEX$

Select STATISTICS from the STATS submenu from the STATS pull down menu and compute the summary statistics on CHOLEST
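For readers without SYSTAT, the same summary statistics can be sketched in Python. The values are the cholesterol data listed above; the dictionary layout and variable names are assumptions made for this example and mirror the BY GROUPS idea of summarizing CHOLEST separately for each value of SEX$.

```python
from statistics import mean, variance

# Cholesterol values from the table above, keyed by sex.
cholest = {
    "Male": [164.76, 139.67, 158.38, 189.20, 168.29,
             222.12, 207.81, 150.56, 194.09, 153.93],
    "Female": [185.61, 107.92, 165.43, 187.34, 222.57,
               170.56, 205.69, 174.81, 150.39, 186.55],
}

# Sample size, mean, and sample variance (n - 1 denominator) per group.
summary = {
    sex: {"n": len(values), "mean": mean(values), "variance": variance(values)}
    for sex, values in cholest.items()
}

for sex, stats in summary.items():
    print(f"{sex}: n = {stats['n']}, mean = {stats['mean']:.3f}, "
          f"variance = {stats['variance']:.3f}")
```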

Check assumption of equality of variance.

What is Ho?

What is the critical F-value in the table?

Compute the F-test.

What were your results?

What did you conclude about the validity of using a t-test?
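The variance-ratio F-test can be sketched as follows, assuming the usual two-tailed form in which the larger sample variance is placed in the numerator. The code only computes the ratio and its degrees of freedom from the cholesterol data above; the critical value still has to come from the F table, and the variable names are illustrative.

```python
from statistics import variance

male = [164.76, 139.67, 158.38, 189.20, 168.29,
        222.12, 207.81, 150.56, 194.09, 153.93]
female = [185.61, 107.92, 165.43, 187.34, 222.57,
          170.56, 205.69, 174.81, 150.39, 186.55]

var_male = variance(male)      # sample variances (n - 1 denominator)
var_female = variance(female)

# Put the larger variance in the numerator so that F >= 1.
if var_male >= var_female:
    f_ratio = var_male / var_female
    df = (len(male) - 1, len(female) - 1)
else:
    f_ratio = var_female / var_male
    df = (len(female) - 1, len(male) - 1)

print(f"F = {f_ratio:.3f} with {df[0]} and {df[1]} degrees of freedom")
```

Compare the printed F with the critical F-value from the table at the same degrees of freedom to decide whether the equal-variance assumption holds.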

Use SYSTAT to compute the t-test.

Turn the BY GROUPS OFF!!!!!!!

Select “Two Groups” from the t-test fly-out menu from the STATISTICS pull-down menu.

Select CHOLEST as the measured variable and SEX$ as the grouping variable.

Click on OK.

Did you accept or reject Ho?

Interpret your results.

PART 2: One-Way Analysis of Variance

You are interested in determining the effect of auxin on pea section length when sugars are present. You have set up 30 cultures of pea sections with auxin in each culture. Ten of the cultures received a 0% sugar solution, 10 received a 2% fructose solution, and 10 received a 2% glucose solution. You then measured the length of the pea sections after three weeks.

Biological Questions:

1. Do sugars cause changes in pea growth in the presence of auxin?

Ho: [pic]Control vs Fructose+Glucose [pic][pic]error

2. Does Fructose cause different changes in pea growth in the presence of auxin than Glucose?

Ho: [pic]Fructose vs Glucose [pic][pic]error

Measurement variable:

length of pea sections grown in tissue culture.

Preliminary experiments showed that sample size (n) should be 10.

Test for equality of variance

A Burr Foster Q-test shows that the variances are considered to be equal.

Inference limits: There are three different seed sources in California. We want to infer over all peas in California for the current crop year.

The Data: Download the SUGAR DATA file onto your floppy diskette.

CONDUCT THE ANOVA

Use SYSTAT to compute the analysis of variance.

From the STATISTICS pull-down menu, select ANOVA and “Estimate Model” from the fly-out menu.

In the next screen, you need to specify the dependent variable (GROWTH) and the Factor (SUGAR$).

Click on OK

The program will then compute the ANOVA for the main Ho.

Results of Hypotheses tests:

Main Ho: [pic]SUGAR [pic][pic]ERROR Decision - REJECT Ho

Conclusion: Average growth is not the same for all treatment levels.

DO THE A priori comparisons:

USE SYSTAT to compute the a priori comparisons. Do the following for EACH comparison:

From the STATISTICS pull-down menu, select ANOVA and then “Hypothesis Test” from the fly-out menu.

Enter SUGAR$ in the Effects box and select Contrast

The next thing we have to do is specify the contrast formula. Select Custom.

The contrast tells the computer which treatments you are going to compare. For the first comparison, we want to compare the control to the two sugars combined. The contrast formula consists of coefficients, with one coefficient per treatment level. In this case we have 3 levels, so we will need 3 coefficients.

For the first comparison, we are going to compare the Control to two other treatment levels. So the coefficient will be 2 for treatment level 1 (Control). Give each of the sugars a coefficient of -1. To do this, enter 2 -1 -1 in the box.
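To see what the contrast coefficients do, the sketch below computes an F ratio for a single contrast by hand: the contrast estimate is the sum of each coefficient times its level mean, its sum of squares has one degree of freedom, and it is tested against the error mean square from the ANOVA. The SUGAR DATA file is not reproduced here, so the lengths below are made-up placeholder values, and the function name `contrast_f` is hypothetical. SYSTAT's own output may be laid out differently.

```python
from statistics import mean, variance

def contrast_f(groups, coefficients):
    """Return (F, error df) for one contrast among the level means."""
    if sum(coefficients) != 0:
        raise ValueError("contrast coefficients must sum to zero")
    # Contrast estimate: sum of coefficient * level mean.
    estimate = sum(c * mean(g) for c, g in zip(coefficients, groups))
    # Sum of squares for the contrast (1 degree of freedom).
    ss_contrast = estimate ** 2 / sum(
        c ** 2 / len(g) for c, g in zip(coefficients, groups)
    )
    # Error mean square pooled over all levels.
    df_error = sum(len(g) for g in groups) - len(groups)
    ms_error = sum((len(g) - 1) * variance(g) for g in groups) / df_error
    return ss_contrast / ms_error, df_error

# Placeholder lengths (NOT the SUGAR DATA values), in the order
# Control, Fructose, Glucose.
control = [4.0, 5.0, 6.0]
fructose = [2.0, 3.0, 4.0]
glucose = [1.0, 2.0, 3.0]

f_first, df_error = contrast_f([control, fructose, glucose], (2, -1, -1))
f_second, _ = contrast_f([control, fructose, glucose], (0, 1, -1))
print(f"Control vs sugars: F = {f_first:.3f}; "
      f"Fructose vs Glucose: F = {f_second:.3f}; error df = {df_error}")
```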

When you click on OK, the computer calculates the a priori comparison. For the first a priori:

Ho ________________

Accept or reject Ho?

Compute the average length for the combined sugars (the mean of the two means)

_________ __

How did it compare to the mean for the control?

Conclusion?

For the second a priori:

Ho _________________

Accept or reject Ho?

How did the mean for Fructose and Glucose compare?

Conclusion?

Overall conclusion?

-----------------------

Figure 1: Rack Designs
