Degrees of Freedom

Excerpt from Mass Media Research: An Introduction/9th Edition

Roger D. Wimmer & Joseph R. Dominick

Before learning any statistical method, it is important to understand a statistical term known as degrees of freedom (df). The concept involves subtracting a number, usually 1 or 2, from a sample size (N or n), a group (K), a row (R), a column (C), or another subset designation: N-1, N-2, K-1, R-1, and so on. For purposes of discussion, we will use N-1 to represent all letter and number variations of degrees of freedom.

Sample size, or subset size (hereinafter referred to only as sample size), is a fundamental component of virtually all statistical formulas: it is added to something, multiplied by something, or divided by something to arrive at a result. An understanding of the crucial importance of sample size led to the development of degrees of freedom.

Since degrees of freedom is relevant to virtually all statistical methods, it is important to understand what the concept actually means. However, the problem is that most definitions of degrees of freedom are confusing. For example, in most definitions, degrees of freedom is defined something like, "The number of elements free to vary," and is usually followed by an example, such as: "Assume we have five ratings on a 1-10 scale from five different respondents, and the sum of the five ratings is 32. If we know that one person's rating is 2, the four remaining ratings are free to vary (5-1 = 4 degrees of freedom) to account for the remaining 30 points." This means that since we know one of the five ratings is 2, the four remaining scores can be any combination of ratings from 1 to 10 that accounts for the remaining 30 points, such as 6, 9, 8, 7; or 3, 9, 9, 9; or 6, 5, 9, 10. Four ratings, or four degrees of freedom, are free to vary.
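To make the constraint concrete, here is a minimal Python sketch (our own illustration, not part of the original example) that checks the combinations mentioned above; each set of four "free to vary" ratings must account for the remaining 30 points:

```python
# The fixed total of the five ratings and the one known rating,
# taken from the example above.
TOTAL = 32
KNOWN = 2

# The three "free to vary" combinations mentioned in the example.
combinations = [
    [6, 9, 8, 7],
    [3, 9, 9, 9],
    [6, 5, 9, 10],
]

for combo in combinations:
    # Each combination must account for the remaining 32 - 2 = 30 points.
    assert sum(combo) == TOTAL - KNOWN
    print(combo, "accounts for", sum(combo), "points; total =", KNOWN + sum(combo))
```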

While the use of the words "free to vary" in defining degrees of freedom may sound intriguing, and has been used in the definition for nearly 100 years, the definition elicits this question: "So what? What effect does 'free to vary and N-1' have on the data or the interpretation of the data?" This is an excellent question, and we will try to clarify the mystery. But first, we need a little history of the development of degrees of freedom.

Most statistics historians would probably agree that the concept of degrees of freedom was first developed by British mathematician Karl Pearson (1857-1936) around 1900, but was refined, or, more appropriately, corrected, by R. A. Fisher in an article published in 1922. Both Pearson and Fisher shared a hatred of being wrong (and reportedly also a hatred of each other), and in their quest for correctness they realized that something was needed in statistical calculations to compensate for possible errors made in data collection, analysis, or interpretation. The fear of making an error is actually the foundation for the development of degrees of freedom.

In their work, Pearson and Fisher concentrated on creating statistics to use in analyzing data from a sample selected from a population so that the results could be generalized to the population. They knew early on the importance of sample size in statistics, and they also knew, as we have discussed, that no matter how many controls are established in a research study, there will always be some error involved in projecting the results from the sample to the population.

With the reality of ever-present error, Pearson developed, and Fisher refined, a method to account for the problem that was directly associated with sample size. However, as indicated, the main problem with the term "degrees of freedom" is similar to the problem with other statistics terms—the term itself is confusing. For example, students often have difficulty understanding the concept "standard deviation," but when we tell them the term basically means "average difference from the mean," the usual comment is something like, "Now I understand. Why didn't they say that in the first place?" "Degrees of freedom" falls into the same category—it's an ambiguous term. It's possible that if a different term were used, there wouldn't be as much confusion with the concept.

Recall that the philosophy behind degrees of freedom is very simple: fear of being wrong. Since most research studies analyze data from a sample whose results are projected to the population, there is a need to make a slight adjustment to the sample size to compensate for errors made in data collection and/or interpretation. This is because population parameters (the "real" data) are rarely, if ever, known to researchers. When calculating statistics for a sample, there is a need for results that are somewhat conservative (corrected, adjusted) in nature to compensate for any errors that may be present. A brief example should help.

Assume we have a sample of 10 respondents selected from a population. (The small sample is used only to demonstrate the concept of degrees of freedom and would never be used in a legitimate research study.) The sample is asked to rate a new TV show on a 1-10 scale, where the higher the number, the more the respondent likes the show. The following table summarizes the computation of the standard deviation for the hypothetical sample. As a reminder, the standard deviation formula, in its biased (N) and unbiased (N-1) forms, is:

$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{N}} \quad \text{(biased)} \qquad S = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}} \quad \text{(unbiased)}$$

Table 1: Degrees of Freedom Example

Table 1 shows that the mean rating for the 10 respondents is 5.8. The second column shows the deviation scores (X - X̄), and the third column shows the squared deviation scores, which are summed to produce the Sum of Squares used in the numerator of the standard deviation formula. Line A at the bottom of Table 1 shows the standard deviation for the 10 respondents using "N" in the denominator (referred to as biased): 2.56. Line B shows the standard deviation with "N-1" in the denominator (referred to as unbiased): 2.70.

The difference of .14 (2.70 - 2.56) between the two standard deviations, while small, demonstrates that using N-1 (degrees of freedom) in the denominator produces a slightly larger standard deviation than N alone. The larger standard deviation (with N-1), a slightly more conservative estimate, compensates for the fact that the biased formula (N in the denominator) may be too restrictive when the results are generalized to the population. The same is true for every nonparametric and parametric statistic (discussed next) that uses N-1 (or another variation) in its formula: the resulting computation will always be slightly larger (more conservative) because the sample size is reduced by at least one, although the difference between the two calculations (N or N-x) shrinks as the sample size increases.
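As a minimal sketch of the two denominators, the Python snippet below computes both standard deviations for a hypothetical set of ten ratings. The ratings are our own illustration, chosen only so that they reproduce the summary figures reported above (a mean of 5.8 and standard deviations of 2.56 and 2.70); they are not claimed to be the actual Table 1 values.

```python
import math

# Hypothetical ratings for 10 respondents on a 1-10 scale. These are an
# assumed illustration chosen to reproduce the summary statistics in the
# text (mean 5.8, biased SD 2.56, unbiased SD 2.70), not the actual
# Table 1 values, which are not reproduced in this excerpt.
ratings = [10, 9, 8, 8, 5, 5, 4, 3, 3, 3]

n = len(ratings)
mean = sum(ratings) / n  # 5.8

# Sum of Squares: the squared deviation scores (X - mean), summed.
sum_of_squares = sum((x - mean) ** 2 for x in ratings)  # 65.6

sd_biased = math.sqrt(sum_of_squares / n)          # Line A: N in the denominator
sd_unbiased = math.sqrt(sum_of_squares / (n - 1))  # Line B: N - 1 in the denominator

print(f"Mean:                {mean:.2f}")         # 5.80
print(f"Biased SD (N):       {sd_biased:.2f}")    # 2.56
print(f"Unbiased SD (N - 1): {sd_unbiased:.2f}")  # 2.70
```

Whatever ratings are used, the N-1 version is always slightly larger than the N version, and the gap between the two narrows as the sample grows.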

As mentioned, the confusion surrounding degrees of freedom might be reduced if the concept had another name or definition. From our previous discussion, we can say that, in essence, the "key" to degrees of freedom is not that data are free to vary, but rather that the concept involves an adjustment made to the data to provide a slightly more conservative estimate and to compensate for the possibility of errors in data collection, analysis, or interpretation. Therefore, our formally stated definition of degrees of freedom is:

An intentional and predetermined reduction in sample size to provide a conservative data adjustment to compensate for research error.

©2011 Roger D. Wimmer & Joseph R. Dominick. This article, in whole or in part, may not be used in any form for any reason without prior written permission from the authors and Wadsworth/Cengage Learning. Contact Roger Wimmer at Roger@.
