Module III Inference

Lecture 1

The Logic of Statistical Decision Making

Assume that a manufacturer of computer devices has a process which coats a computer part with a material that is supposed to be 100 microns (one micron = 1/1000 of a millimeter) thick. If the coating is too thin, then proper insulation of the computer device will not occur and it will not function reliably. Similarly if the coating is too thick, the device will not fit properly with other computer components.

The manufacturer has calibrated the machine that applies the coating so that it has an average coating depth of 100 microns with a standard deviation of 10 microns. When calibrated this way the process is said to be "in control".

Any physical process, however, will have a tendency to drift. Mechanical parts wear out, sprayers clog, etc. Accordingly the process must be monitored to make sure that it is "in control". How can statistics be applied to this problem?

To understand the way statistics might be applied to this problem, let us shift to a different non-statistical problem.

Consider a person on trial for a criminal offense in the United States. Under the US system a jury (or sometimes just the judge) must decide whether the person is innocent or guilty, while in fact the person actually is one or the other. This results in a table as shown below:

                           Person is actually innocent    Person is actually guilty
Jury decides "innocent"    Correct decision               Error
Jury decides "guilty"      Error                          Correct decision

Are both of these errors equally important? That is, is it as bad to decide that a guilty person is innocent and let them go free as it is to decide an innocent person is guilty and punish them for the crime?

Is a jury supposed to be totally objective, not assuming that the person is either innocent or guilty, and to make its decision based on the weight of the evidence one way or the other?

Actually in a criminal trial, there is a favored assumption. An initial bias if you will. The jury is instructed to assume the person is innocent, and only decide that the person is guilty if the evidence convinces them of such.

When there is a "favored assumption", as the presumed innocence of the person is in this case, and the assumption is true, but the jury decides it is false and decides the person is guilty, this is called a Type I error.

Conversely, if the "favored assumption" is false, i.e. the person is really guilty, but the jury decides it is true, that is that the person is innocent, then this is called a Type II error.

This is shown in the table below:

                           Person is innocent (favored)   Person is guilty
Jury decides "innocent"    Correct decision               Type II error
Jury decides "guilty"      Type I error                   Correct decision

In some countries the "favored assumption" is that the person is guilty. In this case the roles of the Type I and Type II errors would reverse to give:

                           Person is guilty (favored)     Person is innocent
Jury decides "guilty"      Correct decision               Type II error
Jury decides "innocent"    Type I error                   Correct decision

Further in order to decide the person is guilty, the jury must find that the evidence convinces them beyond a reasonable doubt.

In other words, if the balance of evidence toward guilt is just a bit more than the balance toward innocence, the jury should decide the person is innocent.

Let us assume that the favored assumption is that the person is innocent, and let

α = P(Type I Error) = P(Jury Decides Guilty | Person is Innocent).

Clearly we would like to keep α small.

The alternative error is:

β = P(Type II Error) = P(Jury Decides Innocent | Person is Guilty).

Clearly we would also like to keep β small.

In fact one can make α = 0 by simply not convicting anyone. However, every guilty person would then be released, so that β would then equal 1.

Alternatively, one can make β = 0 by convicting everyone. However, every innocent person would then be convicted, so that α would be equal to 1.

Although the relationship between α and β is not simple, it is clear that they work in opposite directions.

One can think of α as related to the concept of "reasonable doubt". If α is small, say .001, then more evidence will be required before a conviction. On the other hand, if α is large, say .1, then less evidence will be required. If mathematics could be precisely applied to the court-room (which it can't), then by choosing an acceptable level of α, we could determine how convincing the evidence would have to be to be "beyond a reasonable doubt".

Suppose we pick an α, say .05, and then the jury hears the evidence in the case. The defense attorney will ask the jury to decide innocence if they can think of any way the person on trial could be considered innocent. The prosecuting attorney will ask the jury to use their common sense and decide if something is beyond a reasonable doubt.

The jury then makes their determination. It should be emphasized that just because the jury decides the person is innocent, it does not mean that the person is really innocent. Similarly, if the jury decides the person is guilty it does not mean that the person really is guilty.

What does all this have to do with the manufacturing problem?

In the logic of statistical inference, one also has a favored assumption. This favored assumption is called the Null Hypothesis. In the case of the jury trial the favored assumption was that the person is innocent. In the manufacturing problem the natural favored assumption is that the process is in control.

The alternative to the Null Hypothesis is called the Alternative Hypothesis. In the case of the jury trial it was that the person is guilty. In the manufacturing problem the alternative is that the process is out of control.

Our decision table would now look like the following in the manufacturing problem:

                                  Process is in control    Process is out of control
Decide process is in control      Correct decision         Type II error
Decide process is out of control  Type I error             Correct decision

In fact, the Null Hypothesis is still not sufficiently precise for we need to define what it means to say that the process is in or out of control.

We shall say that the process is in control if the mean is 100 and out of control if the mean is not equal to 100. One usually abbreviates this with a shorthand notation. We let H0 stand for the Null Hypothesis and HA stand for the alternative hypothesis. We would then write:

H0: μ = 100   versus   HA: μ ≠ 100

as the formal specification of the decision. Our table would now look like:

                              H0 is true (μ = 100)    HA is true (μ ≠ 100)
Decide μ = 100 (accept H0)    Correct decision        Type II error
Decide μ ≠ 100 (reject H0)    Type I error            Correct decision

The next step is to define reasonable doubt or, equivalently, α. This is an individual choice but the two most commonly used values are .05 (or about a one in twenty chance) and .01 (or about a one in one hundred chance).

As in the trial we must now collect evidence. In a statistical situation, the evidence consists of the data. We will have the data x1, x2, . . . , xn. In order to use this data as evidence for or against the null hypothesis we need to know how to summarize the information. Since our hypothesis is a statement about the population mean, it is natural to use the sample mean as our evidence.

Let us assume that the thickness of the computer coating follows a normal distribution with a mean of 100 and a standard deviation of 10. Suppose we took a random sample of 4 chips and computed the sample mean.

Statistical theory would say that:

the sample mean x̄ has mean μx̄ = 100 and standard error σx̄ = σ/√n = 10/√4 = 5.

Statistical theory would also indicate that the sample mean is normally distributed. This distribution is called the sampling distribution of the sample mean.

To illustrate this, I simulated 200 values of the normal distribution with a mean of 100 and a standard deviation of 10 using the command:

=norminv(rand(), 100, 10).

A histogram of the data is shown below:

[Histogram of the 200 simulated thickness values]

I then rearranged the values into 50 rows of four columns, and averaged the values in each row which is equivalent to simulating a random sample of 50 averages of 4 normal observations. The results are shown below:

[Histogram of the 50 sample means of 4 observations each]

As can be seen, the data is still bell shaped, but the standard deviation is much smaller, in fact it is now 5.
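If you prefer to experiment outside of EXCEL, the same experiment can be sketched in Python (this assumes the NumPy library is available; the seed and variable names are illustrative only):

import numpy as np

rng = np.random.default_rng(1)  # fixed seed only for reproducibility

# 200 coating thicknesses from a normal distribution with mean 100 and SD 10,
# the Python analogue of =norminv(rand(), 100, 10)
thicknesses = rng.normal(loc=100, scale=10, size=200)

# Rearrange into 50 samples of size 4 and average each one,
# mimicking the 50 rows of four columns in the spreadsheet
sample_means = thicknesses.reshape(50, 4).mean(axis=1)

print("SD of the raw values:  ", thicknesses.std(ddof=1))   # close to 10
print("SD of the sample means:", sample_means.std(ddof=1))  # close to 10/sqrt(4) = 5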

Notice that almost no values of the mean are below 90 or above 110. This follows from the fact that the sampling distribution of the sample mean based on samples of size 4 is normal with a mean of 100 and a standard deviation (the standard error) of 5. Therefore we know that approximately 95% of the sample means will fall within +/- 1.96 standard errors of 100. In this case the limits are:

100 – 1.96*5 = 90.2

and

100 + 1.96*5 = 109.8.

We now have a procedure for determining if the observed data is beyond a reasonable doubt. The procedure goes like this:

a) Take a random sample of four chips, compute the thickness for each and average the results;

b) If the sample mean falls between 90.2 and 109.8, then this is what would happen approximately 95% of the time if the manufacturing process was in control, therefore decide it is in control;

c) If the sample mean falls outside the range of 90.2 to 109.8, then one of two things has happened: I) the process is in control and the mean just fell outside by chance (which will happen about 5% of the time); or II) the process is out of control.

Notice that under b) just because you decide the process is in control does not mean that it in fact is in control.

Similarly under c) if you decide the process is out of control then it might just be a random mean outside the interval which happens about 5% of the time.
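As a rough sketch of the decision rule itself in Python (the limits 90.2 and 109.8 come from the calculation above; the function name and sample values are only illustrative):

def in_control(sample_mean, lower=90.2, upper=109.8):
    """Decide 'in control' if the mean of four thicknesses lies inside the 95% limits."""
    return lower <= sample_mean <= upper

# illustrative sample of four measured thicknesses
sample = [103.1, 96.4, 108.2, 99.9]
mean = sum(sample) / len(sample)
print(mean, "->", "in control" if in_control(mean) else "out of control")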

The procedure described above is an example of Quality Control. In this area, one monitors processes as we are doing here, using sample data to decide if the process is performing as it should be. (The case we have been discussing is probably the simplest situation. If you are heavily involved in Quality Control in your job, you would need a more specialized course in statistics than the one you are now taking.)

If you load the EXCEL file qc.xls located in the Module III datasets, you will see a chart like the one below:

[Control chart of simulated sample means with the +/- 1.96 standard error limits]

Notice that most of the averages are within the +/- 1.96 limits, with a few points outside. If you press F9 when you are in EXCEL, you will get another set of simulated values. Look at several of them. Remember, this process is in control so all the values that fall outside the limits are false indicators of problems, that is, they are Type I errors!
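A small simulation, in the same spirit as pressing F9 in qc.xls, shows how often an in-control process triggers a false alarm; the proportion it recovers is approximately the chosen α of .05 (Python with NumPy assumed; the number of samples is illustrative):

import numpy as np

rng = np.random.default_rng(2)

# Simulate many sample means from an in-control process:
# samples of size 4 from a normal population with mean 100 and SD 10
n_samples = 100_000
means = rng.normal(100, 10, size=(n_samples, 4)).mean(axis=1)

# How often does an in-control mean fall outside the control limits?
false_alarm_rate = np.mean((means < 90.2) | (means > 109.8))
print("Proportion of Type I errors:", false_alarm_rate)   # close to 0.05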

Here is another simulated sequence. Notice that in this case several values have fallen far outside the limits, and yet the process is in control.

[Control chart: another simulated in-control sequence, with several points far outside the limits]

Often people expect the results to look like this sequence of simulated values:

[Control chart: a simulated in-control sequence with all points inside the limits]

Here all the values stayed within the quality control limits.

I hope the above charts have indicated to you the great deal of variability that can occur even when a process is in control. A statement that is often heard is "Statistics has proved such and such". And of course there is the counter argument that "Statistics can prove anything". In fact, now that you see the structure of statistical logic, it should be clear that one doesn't "prove" anything. Rather we are looking at consistency or inconsistency of the data with the hypothesis. If data is "consistent" with the hypothesis (in our example the sample mean falls between 90.2 and 109.8), then instead of proving the hypothesis true, a better interpretation is that there is no reason to doubt that the null hypothesis is true.

Similarly, if the data is "inconsistent" with the hypothesis (in our example the sample mean falls outside the region 90.2 to 109.8), then either a rare event has occurred and the null hypothesis is true (how rare?? less than .05 or .01), or in fact the null hypothesis is not true.

The above quality control procedure was based on the fact that the sample mean of four observations taken from a normal distribution also follows a normal distribution. However, we know that most business data does not follow a normal distribution and in fact is usually right skewed! What do we do if we cannot assume normality?

The answer is based on one of the most fundamental results in statistical theory, the Central Limit Theorem. To understand the idea of the central limit theorem, I will use simulation results.

First I simulate 500 pseudo random numbers using the =rand() command from EXCEL. Theoretically, this should give me a frequency distribution that is flat since each value should be equally likely. In the EXCEL file clt.xls, I did this with the resulting histogram of the 500 simulated values looking like:

[Histogram of the 500 simulated uniform values]

As you can see the sample values came out relatively uniform.

Now I arrange the 500 random numbers into 50 rows, each containing 10 numbers, and average the ten numbers in each row. This corresponds to taking 50 random samples from the uniform distribution, each of size 10, and computing the mean of each. I now put the 50 means into a frequency distribution. This is called the sampling distribution of the sample mean based on a sample size of 10. The resulting frequency distribution is shown below:

[Histogram of the 50 sample means (n = 10) from the uniform distribution]

Notice that the distribution of the mean values is no longer flat! In fact the results are beginning to look bell shaped!
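The same uniform experiment can be sketched in Python (NumPy assumed); the counts 500, 50 and 10 match the spreadsheet layout described above:

import numpy as np

rng = np.random.default_rng(3)

# 500 pseudo-random numbers from the flat uniform(0, 1) distribution,
# the Python analogue of =rand()
u = rng.uniform(0, 1, size=500)

# 50 rows of 10 values; averaging each row gives 50 sample means
means = u.reshape(50, 10).mean(axis=1)

# The raw values are roughly flat between 0 and 1, while the means
# pile up around 0.5 in a bell shape
print(np.histogram(u, bins=10, range=(0, 1))[0])
print(np.histogram(means, bins=10, range=(0, 1))[0])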

I repeated the above procedure, except this time, using some theoretical results, I simulated 500 values from a distribution which was highly right skewed. The histogram of the simulated data is shown below:

[Histogram of 500 values simulated from a highly right-skewed distribution]

I then rearranged the data exactly as before into 50 rows of ten numbers and computed the average of the ten numbers in each row. The histogram of the 50 sample means is shown below:

[Histogram of the 50 sample means (n = 10) from the right-skewed distribution]

As can be seen the right skew has almost disappeared and the histogram again looks bell shaped!

Finally, I generated 500 values randomly from a highly left skewed distribution with the resulting histogram given below:

[Histogram of 500 values simulated from a highly left-skewed distribution]

Again, I rearranged the 500 values into 50 rows of ten numbers and computed the average for each of the rows. The resulting histogram looked like:

[Histogram of the 50 sample means (n = 10) from the left-skewed distribution]

The left skew is much diminished and the resulting curve again is looking bell shaped!
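Here is a sketch of the skewed cases in Python (NumPy assumed). Since the exact distributions simulated above are not specified, an exponential distribution stands in for a right-skewed population and its mirror image for a left-skewed one:

import numpy as np

rng = np.random.default_rng(4)

def skewness(x):
    """Simple moment-based skewness: near 0 for symmetric data."""
    x = np.asarray(x)
    return np.mean((x - x.mean()) ** 3) / x.std() ** 3

# Stand-ins for the skewed populations: an exponential is right skewed,
# and a constant minus an exponential is left skewed
right_skewed = rng.exponential(scale=1.0, size=500)
left_skewed = 10.0 - rng.exponential(scale=1.0, size=500)

for name, data in [("right skewed", right_skewed), ("left skewed", left_skewed)]:
    means = data.reshape(50, 10).mean(axis=1)
    print(name, "raw skewness:", round(skewness(data), 2),
          "skewness of the 50 means:", round(skewness(means), 2))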

The above examples are all special cases of the central limit theorem. This remarkable result holds under very general conditions and allows us to make inferences about means with only one sample of data.

To motivate the Central Limit Theorem (CLT), assume we have a population of arbitrary shape (right skewed, left skewed, flat, bimodal, etc.) with mean μ and standard deviation σ. Take 1000 random samples each of size n from the population and for each compute the sample mean, the sample median and the sample variance. You would wind up with a table that looked something like:

Sample    Sample Values                       Sample Statistics
1         x11, x12, . . . , x1n               mean 1, median 1, variance 1
2         x21, x22, . . . , x2n               mean 2, median 2, variance 2
.         .                                   .
.         .                                   .
1000      x1000,1, x1000,2, . . . , x1000,n   mean 1000, median 1000, variance 1000

Now if we took any of the samples of size n, say the 2nd, and put the values into a frequency distribution, the resulting frequency distribution would tend to look like the population from which we took the random sample. This would be true for any of the 1000 random samples.

Instead of doing that, take the 1000 mean values, each based on a random sample of size n, and put these into a frequency distribution. This is called the Sampling Distribution of the sample mean based on samples of size n.

The Central Limit Theorem states three things that hold regardless of the shape of the population distribution:

1) The sampling distribution of the sample mean tends to the normal distribution;

2) the mean of this normal distribution is given by:

μx̄ = μ;

3) the standard deviation of the sampling distribution of the sample mean (called simply the standard error of the sample mean) is given by:

σx̄ = σ/√n.

The key feature here is that this tends to occur irrespective of the shape of the distribution in the population from which the sample was taken.

Note that the Central Limit Theorem says that the distribution tends to the normal distribution. This tendency is essentially governed by the size of n, the sample size. If the population distribution is close to normal, then samples of size as small as 5 will result in sampling distributions for the mean which are close to normal. If the population distributions are highly skewed, then large sample sizes will be necessary.

In most cases, however, samples of size 30 or more, even from the most skewed populations, will give a sampling distribution of the mean that is close to the normal curve.

The Central Limit Theorem also applies to proportions. Suppose we take a random sample of n people and count the number who are female (say x). Define:

p̂ = x/n

which is the proportion of females in the sample. Now define a random variable εi which takes on the value 1 if the ith person is female and 0 if not. Then it is easy to show that:

p̂ = (ε1 + ε2 + . . . + εn)/n.

In other words, p̂, the sample proportion of females, is an average. Thus by the Central Limit Theorem, the sampling distribution of p̂ will be approximately normal with mean

μp̂ = p (the population proportion of females)

and standard error given by:

σp̂ = √( p(1 − p)/n ).
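A quick simulation illustrates the result for proportions; the population proportion p = 0.5, the sample size n = 100 and the number of repetitions are illustrative choices (Python with NumPy assumed):

import numpy as np

rng = np.random.default_rng(5)

p, n, reps = 0.5, 100, 10_000   # illustrative proportion, sample size, repetitions

# Each sample is n independent 0/1 indicators; p-hat is their average
p_hat = rng.binomial(1, p, size=(reps, n)).mean(axis=1)

print("mean of p-hat: ", p_hat.mean())               # close to p
print("SE of p-hat:   ", p_hat.std(ddof=1))          # close to sqrt(p(1-p)/n)
print("theoretical SE:", np.sqrt(p * (1 - p) / n))   # 0.05 here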

It turns out that regression coefficients (remember our discussion of multiple regression in Module I) can also be represented as averages, so the Central Limit Theorem applies to them as well.

The major advantage of the Central Limit Theorem is that, since we know the sampling distribution of the mean will be approximately normal, we do not have to take thousands of samples. Instead we can take just one sample and invoke the theoretical result.

The idea of knowing the sampling distribution of a statistic is so critical that theoretical statisticians devote a great deal of research to this topic. It turns out that other statistics have sampling distributions which approach the normal distribution, even though they are not averages.

For example, if I took the sample medians from the 1000 samples previously discussed, and put them into a frequency distribution, I would get the sampling distribution of the sample median of a sample of size n.

Surprisingly, this also approaches the normal distribution. However, instead of being centered around the population mean μ, the sampling distribution of the sample median is centered around the population median, and it has standard error (the standard deviation of the sampling distribution) approximately equal to:

1.2533 σ/√n.

Thus for the same sample size the sample mean is a more precise estimate of the population mean than the sample median is of the population median.
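A simulation along the following lines compares the two standard errors for samples from a normal population, where the approximate 1.2533 factor applies (Python with NumPy assumed; the values of μ, σ and n are illustrative):

import numpy as np

rng = np.random.default_rng(6)

mu, sigma, n, reps = 100, 10, 25, 10_000    # illustrative values

samples = rng.normal(mu, sigma, size=(reps, n))
means = samples.mean(axis=1)
medians = np.median(samples, axis=1)

print("SE of the sample mean:  ", means.std(ddof=1))    # near sigma/sqrt(n) = 2.0
print("SE of the sample median:", medians.std(ddof=1))  # near 1.2533 * 2.0 = 2.5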

At this point you might expect that all sampling distributions are approximately normal. This is not the case. For example, if you took the 1000 values of the sample variances (s²) and put them into a frequency distribution, you would not get a bell shaped curve. Instead you would get a right skewed curve called a chi-square (χ²) distribution.

However, the Central Limit Theorem applies to the most useful measures used in business situations such as means, proportions, and regression coefficients.

We have now established one way of testing statistical hypotheses of the form:

H0: μ = μ0   versus   HA: μ ≠ μ0

based on a sample of size n. We shall call this the Quality Control method. In summary we set up the acceptance region:

μ0 − zα/2·σ/√n ≤ x̄ ≤ μ0 + zα/2·σ/√n

If our sample mean falls in this interval, we "accept" the null hypothesis, otherwise we reject. Of course we must specify our α level. If we use α = .05, then we take zα/2 = 1.96. If we use α = .01, then we take zα/2 = 2.576.

In our case with a mean of 100 and a standard deviation of 10, based on a sample of size 4 and with α = .05, our acceptance region is 90.2 to 109.8.

Consider the three cases:

a) x̄ = 110, in which case we would reject the null hypothesis;

b) x̄ = 105, in which case we accept the null hypothesis;

c) x̄ = 91.5, in which case we accept the null hypothesis.

A second method is commonly found in research articles. It is simply a re-arrangement of the basic quality control results. The following steps show the re-arrangement:

μ0 − zα/2·σ/√n ≤ x̄ ≤ μ0 + zα/2·σ/√n

if and only if,

−zα/2·σ/√n ≤ x̄ − μ0 ≤ zα/2·σ/√n

if and only if,

−zα/2 ≤ (x̄ − μ0)/(σ/√n) ≤ zα/2

if and only if,

|zobs| ≤ zα/2

where

zobs = (x̄ − μ0)/(σ/√n).

This is called the z-score method and it is very easy to apply.

The steps are:

1) Determine α and look up zα/2 (in our case α = .05 so z.025 = 1.96);

2) Compute zobs = (x̄ − μ0)/(σ/√n);

3) If |zobs| ≤ zα/2 then accept H0;

4) Otherwise reject H0.

For α = .05, we use the cutoffs +/- 1.96.

So in Case a) zobs = (110 − 100)/5 = 2, so we reject the hypothesis that μ = 100 since it is outside the range +/- 1.96.

In Case b) zobs = (105 − 100)/5 = 1, so we accept the hypothesis that μ = 100 since it is inside the range +/- 1.96.

In Case c) zobs = (91.5 − 100)/5 = −1.7, so we accept the hypothesis that μ = 100.

Notice that our conclusions are exactly as before since they are based on the same relationship.
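For concreteness, here is a sketch of the z-score method applied to the three cases in Python (σ = 10 and n = 4 as in the running example, so the standard error is 5; the function name is illustrative):

import math

def z_test(sample_mean, mu0=100, sigma=10, n=4, z_crit=1.96):
    """z-score method: accept H0 if |z_obs| <= z_crit."""
    z_obs = (sample_mean - mu0) / (sigma / math.sqrt(n))
    decision = "accept H0" if abs(z_obs) <= z_crit else "reject H0"
    return z_obs, decision

for xbar in (110, 105, 91.5):          # Cases a), b) and c)
    z_obs, decision = z_test(xbar)
    print(f"sample mean {xbar}: z_obs = {z_obs:+.2f} -> {decision}")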

The third method for testing hypotheses is used in most computer programs (in fact we used it when we did step-wise regression in Module I). One of the steps in testing a hypothesis is to pick the Type I error rate α. The computer program has no idea what value you will pick, so it cannot, for example, perform the z-score test discussed above. All it could do is compute the zobs statistic, and the manager would still have to look up the appropriate zα/2. One way around this problem is to compute the probability of the observed statistic or anything more extreme. This is called the p-value. Since we have defined "rare" as anything that has a lower probability than α of occurring, the p-value procedure is very simple:

Step 1: Compute the probability of the observed result or anything more extreme occurring; call it the p-value.

Step 2: If p-value < α, reject the hypothesis;

Step 3: Otherwise (i.e. if p-value ≥ α), accept the hypothesis.

Let us apply this to our three sample cases.

Case a)

Step 1 -- In this case the sample mean value is 110. What is more extreme? Certainly any value greater than 110 makes us even more skeptical that the mean of the process is at 100. But this is only the upper tail of the distribution. Remember our rejection region consisted of values that were too large and also values that were too small. If we only report the probability of exceeding 110, it can never exceed one half of α. To get around this problem, when we are testing hypotheses where the alternative is non-equality, we simply double the computed value. This is called a two-sided p-value.

Step 2 -- In many EXCEL functions, the p-value is computed for you. Unfortunately, EXCEL does not have a direct procedure for the specific example we are studying. The two-sided p-value for this particular case can be computed using an EXCEL formula of the form:

=2*(1-NORMSDIST(ABS(110-100)/5))

This gives a two-sided p-value of .0455, which is less than .05, so we reject the hypothesis.

Case b)

In this case the mean is 105. Using the formula given above yields the two-sided p-value = .3173 which is greater than .05. Therefore we accept the null hypothesis.

Case c)

In this case the mean is 91.5. Using the formula given above, the two-sided p-value = .0891 which is greater than .05. Therefore we accept the null hypothesis.

Notice that once again the conclusions are exactly the same. This is because we are just doing the same thing in different ways.
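The two-sided p-values above can be reproduced from the standard normal curve; here is a sketch in Python using only the standard library, with the normal CDF written in terms of the error function (function names are illustrative):

import math

def normal_cdf(z):
    """Standard normal cumulative distribution function (via the error function)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def two_sided_p(sample_mean, mu0=100, sigma=10, n=4):
    z_obs = (sample_mean - mu0) / (sigma / math.sqrt(n))
    return 2 * (1 - normal_cdf(abs(z_obs)))

alpha = 0.05
for xbar in (110, 105, 91.5):          # Cases a), b) and c)
    p = two_sided_p(xbar)
    decision = "reject H0" if p < alpha else "accept H0"
    print(f"sample mean {xbar}: two-sided p-value = {p:.4f} -> {decision}")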

Sometimes in reading journals in business you may see both methods two and three used simultaneously as in the following phrase:

"We tested the hypothesis that the mean agreed with our model's prediction of 100 and found the result in conformance with our theory ( z = 1.6, p= .11)."

What the authors are saying is that they tested the hypothesis that the mean was equal to 100, obtained a zobs = 1.6 and, since a common default value for α is .05, accepted the hypothesis since the two-sided p-value was .1096 and thus greater than .05.

Notice that all three of the above methods placed emphasis on the null hypothesis value for the mean of 100. However, what would happen if we took exactly the same cases and tested the hypothesis that the mean was equal to 99 instead of 100?

Method 1

Acceptance Region becomes 99 +/- 9.8 = 89.2 to 108.8.

Case a) sample mean = 110 → reject

Case b) sample mean = 105 → accept

Case c) sample mean = 91.5 → accept

Method 2

Case a) zobs = 2.2 → reject

Case b) zobs = 1.2 → accept

Case c) zobs = -1.5 → accept

Method 3

Case a) two-sided p-value = .0278 → reject

Case b) two-sided p-value = .2301 → accept

Case c) two-sided p-value = .1336 → accept

The conclusions are exactly the same! In Case a) we declare that the mean is not 100 and the mean is not 99. The natural question becomes "Based on this data, what values of the mean are consistent with the data?"

In Cases b) and c), the same data indicates that we could say the mean is 100 and also that the mean is 99!

There is really no conflict since Cases b) and c) are saying that both the values of 100 and 99 are "consistent" with the data. There is only a problem if you make the incorrect inference that statistics has "proved" the mean is 100 or 99.

The natural question to ask is "What is the set of all possible values of μ which would be accepted if we did a formal hypothesis test at level α?" The answer to this question leads us to the fourth method, called the method of confidence intervals.

To apply this method to our particular problem, begin with the quality control (method 1) acceptance region for a test at level .05 of:

μ − 1.96·(10/√4) ≤ x̄ ≤ μ + 1.96·(10/√4), that is, μ − 9.8 ≤ x̄ ≤ μ + 9.8.

We could rearrange the left inequality as:

μ ≤ x̄ + 9.8

and also rearrange the right inequality as:

x̄ − 9.8 ≤ μ.

This leads to the statement:

x̄ − 9.8 ≤ μ ≤ x̄ + 9.8.

This is called a confidence interval on μ. It is the set of all possible values of μ for which, if method one, two or three were applied to test the corresponding null hypothesis, the resulting conclusion would be "accept the null hypothesis".

In Case a), with sample mean 110, the interval is:

110 − 9.8 ≤ μ ≤ 110 + 9.8, that is, 100.2 to 119.8.

This implies that if we hypothesized any value of μ between 100.2 and 119.8, the observed mean of 110 would lead us to accept the hypothesis. Notice that 100 is not in the interval, so that value would not lead to accepting the hypothesis, which means it would be rejected. This leads us to the rule:

Accept the hypothesis that μ = μ0 if μ0 is in the confidence interval;

otherwise reject the null hypothesis.

In Case b), with sample mean equal to 105, the confidence interval becomes:

105 − 9.8 ≤ μ ≤ 105 + 9.8, that is, 95.2 to 114.8.

Since 100 is in the interval, we would accept the null hypothesis.

In Case c), with sample mean equal to 91.5, the confidence interval would be:

91.5 − 9.8 ≤ μ ≤ 91.5 + 9.8, that is, 81.7 to 101.3.

Again 100 is in the interval, so we would accept the null hypothesis.

Notice again that all four methods lead to exactly the same conclusions.

When I derived the confidence interval from the quality control limit, I did not include the probability statement. This is because the quality control limit is a statement made before data is collected about what values of the sample mean are expected if the hypothesized population mean value is 100.

The confidence interval, in contrast, can only be constructed after the data has been collected (since you need the sample mean). It cannot therefore be a probabilistic statement in the same way.

Taking α = .05, we know that 95% of the time the sample mean will fall within the quality control limit. This implies that 95% of the time the confidence interval will bracket the true population mean, μ. Therefore there is a 95% chance that the confidence interval constructed will contain the true mean. We would say that we have 95% confidence that a particular confidence interval contains the true mean value.

The general formula for a (1 − α)*100% confidence interval, in this situation, is:

x̄ − zα/2·σ/√n ≤ μ ≤ x̄ + zα/2·σ/√n.
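As a sketch of this formula in code (Python; σ = 10, n = 4 and the 1.96 cutoff match the running example, and the function name is illustrative):

import math

def confidence_interval(sample_mean, sigma=10, n=4, z_crit=1.96):
    """95% confidence interval for the mean when sigma is known."""
    margin = z_crit * sigma / math.sqrt(n)      # 1.96 * 10 / 2 = 9.8
    return sample_mean - margin, sample_mean + margin

for xbar in (110, 105, 91.5):          # Cases a), b) and c)
    lo, hi = confidence_interval(xbar)
    verdict = "accept H0: mu = 100" if lo <= 100 <= hi else "reject H0: mu = 100"
    print(f"sample mean {xbar}: interval ({lo:.1f}, {hi:.1f}) -> {verdict}")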

Since all four methods will always yield the same conclusions, which, if any, is preferred?

Clearly if one is trying to regularly control a process, Method I, the quality control approach, is preferred.

Research articles in business tend to use Method II, the z-score method.

Most computer programs report the p-value of Method III.

However, by far the most practical for most managers are confidence intervals. The most important reason is that if you reject a hypothesis about the mean, say that the mean is equal to 100, the natural question is what value it does have. In Case a) we rejected the null hypothesis mean value of 100. The confidence interval indicated that the mean had shifted and was somewhere between 100.2 and 119.8. Secondly, even if we accept the hypothesis, the confidence interval indicates that there is a range of values that are consistent with the data, so we are not tempted to claim that one particular value has been "proven". For example in Case b), we accepted the hypothesis that the mean equaled 100; however, we could just as easily have accepted any value between 95.2 and 114.8.

The only drawback is that in some cases, it is complicated to compute the confidence interval (i.e. the mathematics involved is quite elaborate). However, for most of the cases used by working managers and executives involving means and proportions, relatively simple confidence intervals can be used.

Notice also, that one can construct a confidence interval even if one is not formally testing a hypothesis. This is called interval estimation. Given the observed data, the confidence interval gives a range of estimates of the population value.

Given the practical implications, I will tend to use confidence intervals whenever possible.

Before finishing this section, I should point out there is another kind of hypothesis test, called a one-sided hypothesis test, that you will encounter in most statistics books. This is usually invoked in a situation like the following:

A supplier requires that delivered goods have a defective rate no higher than 1%. Accordingly, our company does quality control to test the hypothesis that our manufacturing process does not have a higher defective rate.

The null hypothesis would usually be taken to be H0: p ≤ .01 versus the alternative HA: p > .01, where p is the true defective rate.

This implies that we are only concerned if we exceed the limit, and are unconcerned if we have a much lower defective rate. Now any practical executive would realize that if his product had a significantly lower defective rate than is required, and he could compete on price, he would want to advertise this fact. Using the above one-sided structure a significantly lower defective rate could never be detected.

The only time we will discuss one-sided hypothesis test is in situations that naturally involve the square of a statistic. When a value deviates too much in a positive direction, the square will be large. Similarly, when a value deviates too much in a negative direction, the square will also be large.

Since most business problems wish to determine whether or not a value diverges from a target value in either a positive or negative direction, we shall do almost all of our inferences using two-sided alternatives involving confidence intervals.
