Lisa Sypek
November 4, 2003
Kathy Harty

WHAT IS REGRESSION ANALYSIS?

Regression analysis calculates an equation that provides values of y for given values of x. The goal of regression analysis is to determine the values of the constants in a function so that the function best fits a set of data. In linear regression, the function is a linear (straight-line) equation (y = b0 + b1x). Other equations may better describe the relationship between the two variables, such as quadratic (y = ax^2 + bx + c), exponential (y = ab^x), logarithmic (y = a log_b x), or higher-degree polynomial functions. The purpose of obtaining these equations is to use them to make predictions.

Because the resulting line or curve is only one of best fit, the actual value of the dependent variable for a particular observation generally differs from its predicted value. This difference is the error of the estimate, known as the "deviation" or "residual". The goal of regression analysis is to determine the values of the parameters that minimize the sum of the squared residuals for the set of observations. This is known as a "least squares" regression fit. (Source: http://www.nlreg.com/intro.htm)

MATHEMATICAL FOUNDATIONS OF REGRESSION ANALYSIS

The result of regression analysis is a mathematical equation that describes the line or curve that best fits the data. For each data point there is a difference between the observed value of y and the value of y predicted by the equation. This vertical offset is called a residual, and it measures the error at that point. The goal of regression analysis is to find the relationship while minimizing the error. The sum of the squares of the residuals is used, rather than the sum of their absolute values, because it is a continuous, differentiable function of the constants.
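As a small illustration of the quantities just described, the following Python sketch computes the residuals and their sum of squares for two candidate lines. The data points, the candidate constants m and b, and the function names are all made up for illustration; none of them come from the original handout.

```python
def residuals(xs, ys, m, b):
    """Observed y minus the y predicted by the line y = m*x + b."""
    return [y - (m * x + b) for x, y in zip(xs, ys)]


def sum_of_squares(res):
    """Sum of squared residuals: the quantity least squares minimizes."""
    return sum(r * r for r in res)


# Hypothetical data and two hypothetical candidate lines.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

res_a = residuals(xs, ys, m=2.0, b=0.0)  # candidate line y = 2x
res_b = residuals(xs, ys, m=1.0, b=1.0)  # candidate line y = x + 1

sse_a = sum_of_squares(res_a)
sse_b = sum_of_squares(res_b)
print("y = 2x:    ", res_a, sse_a)
print("y = x + 1: ", res_b, sse_b)
```

Of the two candidates, the one with the smaller sum of squares fits these points better. A least squares fit goes one step further and finds the constants for which this sum is as small as it can be.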
The method of least squares is used to find the constants of the equation for which the sum of the squares of the differences between the observed and predicted y values is as small as possible. For a linear equation (y = mx + b), the values of two constants, m and b, must be obtained; for a quadratic (y = ax^2 + bx + c), the values of three constants must be found. Writing R^2 for the sum of the squared residuals (not to be confused with the squared correlation coefficient r^2 discussed later), the condition for R^2 to be a minimum is that its partial derivatives with respect to each constant must equal zero. What results is a set of equations (the number of which depends on the number of unknowns: 2 for linear, 3 for quadratic, and so on) that can be solved by a variety of methods.

LINEAR VS. QUADRATIC VS. EXPONENTIAL

Linear regression analysis is used to find the best-fit straight line for a set of data. Since the equation y = mx + b has two constants, m and b, that need to be determined, it is necessary to take the partial derivatives of the sum-of-squares expression with respect to m and b individually. After calculating all the necessary sums from the data, the result is a system of two linear equations in two unknowns that can be solved quite simply to determine m and b.

Quadratic regression analysis is used to find the best-fit parabola for a set of data. Since the equation is y = ax^2 + bx + c, the values of three constants, a, b, and c, must be found. It is necessary to take the partial derivatives of the sum-of-squares expression with respect to a, b, and c individually. After calculating all the necessary sums from the data, the result is a system of three linear equations in three unknowns that can be solved quite simply to determine a, b, and c.

Below is the actual Maple code one could use to determine the quadratic equation that best fits some roughly parabolic data.
> with(plots): with(linalg):   # needed for display, matrix and gaussjord
> x_val:=[0,3,2,5,5,6];
> y_val:=[6,0,1,1,4,6];
> n:=6;
> for i from 1 to n do C[i]:=[x_val[i],y_val[i]];od;
> # Plot the data points, a hand-picked trial parabola, and both together.
> our_data_plot:=plot([seq(C[i],i=1..n)],style=point):
> display(our_data_plot);
> parab_graph:=plot(0.5*x^2-x*4+6,x=0..8,color=green,thickness=2):
> display(parab_graph);
> display({parab_graph,our_data_plot});
> # Build the 3x4 augmented matrix of the normal equations for y = a*x^2 + b*x + c.
> # Columns 1 to 3 hold sums of powers of x; column 4 holds the right-hand sides.
> A:=matrix(3,4,[0,0,0,0,0,0,0,0,0,0,0,0]);
> for i from 1 to n do A[1,1]:=A[1,1] + x_val[i]^4;od;
> for i from 1 to n do A[1,2]:=A[1,2] + x_val[i]^3;od;
> for i from 1 to n do A[1,3]:=A[1,3] + x_val[i]^2;od;
> for i from 1 to n do A[1,4]:=A[1,4] + x_val[i]^2*y_val[i];od;
> for i from 1 to n do A[2,1]:=A[2,1] + x_val[i]^3;od;
> for i from 1 to n do A[2,2]:=A[2,2] + x_val[i]^2;od;
> for i from 1 to n do A[2,3]:=A[2,3] + x_val[i];od;
> for i from 1 to n do A[2,4]:=A[2,4] + x_val[i]*y_val[i];od;
> for i from 1 to n do A[3,1]:=A[3,1] + x_val[i]^2;od;
> for i from 1 to n do A[3,2]:=A[3,2] + x_val[i];od;
> A[3,3]:=n;
> for i from 1 to n do A[3,4]:=A[3,4] +y_val[i];od;
> evalm(A);
> # Reduce the matrix using the Gauss-Jordan algorithm.
> # The last column of the reduced matrix then holds a, b and c.
> Lisa:=gaussjord(A);
> Lisa[1,4];
> evalf(%);
> Lisa[2,4];
> evalf(%);
> Lisa[3,4];
> evalf(%);
> # Plot the best-fit parabola with the trial parabola and the data.
> best_parab:=plot(Lisa[1,4]*x^2+ Lisa[2,4]*x+Lisa[3,4], x=-5..10):
> display(best_parab,parab_graph, our_data_plot);

Exponential regression analysis is used to find the best-fit exponential curve for a set of data. Because data that follow a higher-order polynomial can appear to be exponential, a simple check is to graph ln y against x: if that graph appears linear, the original data are exponential. The equation y = Ae^(bx) can be made linear by taking the natural log of both sides, which gives ln y = ln A + bx. This can be handled in the same way as the linear case, with ln y in place of y, but as a last step one must calculate e^(ln A) to get A.

HOW GOOD OF A FIT? The use of r/r^2

The linear correlation coefficient, r, measures the strength of the linear relationship between the two variables. It always has a value between -1 and 1.
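The exponential fit and the correlation coefficient described above can be tried together. The Python sketch below fits y = Ae^(bx) by fitting a straight line to (x, ln y), and then computes r for x against ln y as a check on how linear that graph is. The data are hypothetical, generated from y = 3e^(0.5x) so that the expected constants are known in advance; the function names are illustrative only.

```python
import math


def fit_line(xs, ys):
    """Least squares slope and intercept for y = slope*x + intercept."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


def correlation(xs, ys):
    """Linear correlation coefficient r of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data generated from y = 3 * e^(0.5 x).
xs = [0, 1, 2, 3, 4]
ys = [3 * math.exp(0.5 * x) for x in xs]

# Fit a line to (x, ln y): the slope is b and the intercept is ln A.
ln_ys = [math.log(y) for y in ys]
b, ln_A = fit_line(xs, ln_ys)
A = math.exp(ln_A)  # undo the log to recover A

# r for x against ln y; a value near 1 suggests the data are exponential.
r = correlation(xs, ln_ys)
print("A =", A, " b =", b, " r =", r)
```

Because these data are exactly exponential, the fit recovers A and b up to rounding and r comes out at essentially 1. With real measurements r would fall short of 1, and how far short indicates how well the exponential model suits the data.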
A value of positive 1 means that there is a perfect positive correlation; negative 1 means that a perfect negative correlation, or inverse relationship, exists. When r is zero or close to zero, we assume there is little linear correlation between the two variables.

The squared correlation describes the proportion of variance in common between the two variables. If we multiply this by 100, we get the percent of variance in common between the two variables.

r      r^2
0.1    0.01 = 1%
0.2    0.04 = 4%
0.3    0.09 = 9%
0.4    0.16 = 16%
0.5    0.25 = 25%
0.6    0.36 = 36%
0.7    0.49 = 49%
0.8    0.64 = 64%
0.9    0.81 = 81%
1.0    1.00 = 100%

For example, we found that the correlation between a nation's power and its defense budget was .66. This correlation squared is .45, which means that across the fourteen nations constituting the sample 45 percent of their variance on the two variables is in common (or 55 percent is not in common). In thus squaring correlations and transforming covariance to percentage terms we have an easy-to-understand meaning of correlation, and we are then in a position to evaluate a particular correlation. As a matter of routine it is the squared correlations that should be interpreted. This is because the correlation coefficient is misleading in suggesting the existence of more covariation than exists, and this problem gets worse as the correlation approaches zero. (SOURCE: http://www.mega.nu:8080/ampp/rummel/uc.htm#C8)

TECHNOLOGY

NLREG is a very powerful regression analysis program. Using it you can perform multivariate, linear, polynomial, exponential, logistic, and general nonlinear regression. What this means is that you specify the form of the function to be fitted to the data, and the function may include nonlinear terms such as variables raised to powers and library functions such as log, exponential, sine, etc. For complex analyses, NLREG allows you to specify function models using conditional statements (if, else), looping (for, do, while), work variables, and arrays.
NLREG uses a state-of-the-art regression algorithm that works as well as, or better than, any you are likely to find in other, more expensive, commercial statistical packages. (SOURCE: http://www.nlreg.com/intro.htm)

Technology
Pros and cons of various pieces of technology

Graphing Calculator

The LinReg function on a TI-89 calculator can be used for linear regression analysis. A number of programs that perform linear regression can also be downloaded to a TI-89 graphing calculator. These programs calculate the best-fit line for a set of data without using the built-in LinReg function and, unlike LinReg, graph the fitted line together with the points. The Regression Package for TI calculators allows the user to fit a set of points to a linear, logarithmic, sinusoidal, exponential, or power regression model, and then visually compare the fitted curve and the original points on the graph. A quadratic regression program for TI calculators works in the same manner as the built-in linear regression.

Excel

Doing a Linear Regression Analysis Using Excel

There are actually two ways to do a linear regression analysis using Excel. The first is done using the Tools menu and results in a tabular output that contains the relevant information. The second is used if the data have been graphed and you wish to plot the regression line on the graph. In this version you have the choice of also including the equation of the line and/or the value of R squared on the graph.

Maple

The stats package in Maple provides a number of sub-packages and functions for data visualization, sorting, tabulating interval frequencies, computing measures of location and dispersion, computing distributions, and linear regression. Many of these functions are illustrated in the tutorial.

Pros and Cons

For ease of use, Excel outweighs the other methods, at least in the context of secondary students.
Students often find the graphing calculator confusing, and to many of them Maple would seem like an outdated programming language. (Remember, none of these students have ever seen Fortran!)

The Secondary Curriculum

The National Council of Teachers of Mathematics (NCTM) recommends that instructional programs in secondary schools enable students to formulate questions that can be addressed through a multitude of mathematical procedures. These procedures most certainly include the selection and use of appropriate statistical methods. Statistics provides students with a rich opportunity to practice developing and evaluating inferences and predictions that are based on data collection and analysis. The increased emphasis on data analysis and evaluation is supported by some of the more common themes in the standard algebra curriculum, yet the development of mathematical models based upon statistical procedures remains an infrequent experience in traditional algebra classes.

In studying data analysis and statistics, students often learn that solutions to some problems depend upon assumptions and involve a certain degree of uncertainty. Mathematical models that simulate linear relationships, for instance, are popular but, as taught in a typical algebra class, not always realistic. The simplest type of model relating a response variable y to a single quantitative independent variable x is given by the equation of a straight line, y = mx + b. Since this is a deterministic model with no allowance for error in y (that is to say, a value of y is predicted exactly by the equation y = mx + b), it is fairly limited in its practical interpretation.
Because a response variable often cannot be represented by a simple deterministic equation in one or more quantitative independent variables, it is valuable for students to participate in classroom activities that make them investigate deterministic linear equations in a more realistic setting. This discussion ends up becoming their first introduction to statistics via regression analysis.

Appendix: Terminology for the Novice

Regression
A method for fitting a curve (not necessarily a straight line) through a set of points using some goodness-of-fit criterion. The most common type of regression is linear regression.

Least Squares
A mathematical procedure for finding the best-fitting curve for a given set of points by minimizing the sum of the squares of the offsets ("the residuals") of the points from the curve. The sum of the squares of the offsets is used instead of the absolute values of the offsets because this allows the residuals to be treated as a continuous differentiable quantity.

Interpolation
The computation of points or values between ones that are known or tabulated, using the surrounding points or values.

Extrapolation
An estimate of future conditions based on the assumption that current trends will continue.

Example

State            Per capita cigarette consumption (x)    Lung cancer deaths per 1 million (y)
Rhode Island     270                                      97
Massachusetts    300                                      115
Vermont          350                                      165
Maine            485                                      170
New Hampshire    505                                      190
Connecticut      535                                      210

[Embedded Excel chart]
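
As one way to work through this example, the following Python sketch fits the least squares line y = mx + b to the six data points in the table and computes r and r^2. It is an illustration only, not the method used in the original handout; the variable names and the choice of x = 400 for a sample prediction are assumptions made here.

```python
import math

# Data from the table: per capita cigarette consumption (x) and
# lung cancer deaths per 1 million (y) for six New England states.
x = [270, 300, 350, 485, 505, 535]
y = [97, 115, 165, 170, 190, 210]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross products about the means.
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Least squares slope and intercept of y = m*x + b.
m = sxy / sxx
b = mean_y - m * mean_x

# Linear correlation coefficient and its square.
r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2

# Sample prediction at x = 400, which lies inside the observed range of x
# (interpolation); the value 400 is chosen here purely for illustration.
predicted_400 = m * 400 + b

print("slope m =", round(m, 4))
print("intercept b =", round(b, 2))
print("r =", round(r, 3), " r^2 =", round(r_squared, 3))
print("predicted y at x = 400:", round(predicted_400, 1))
```

For these six points the slope comes out to roughly 0.36 and r to roughly 0.93, so about 87 percent of the variance is in common between the two variables. Predictions for consumption levels well outside the range 270 to 535 would be extrapolation and should be treated with more caution.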