Linear Regression Problems




1. As Earth’s population continues to grow, the solid waste generated by the population grows with it. Governments must plan for disposal and recycling of ever growing amounts of solid waste. Planners can use data from the past to predict future waste generation and plan for enough facilities for disposing of and recycling the waste.

Given the following data on the waste generated in Florida from 1990 to 1994, how can we construct a function to predict the waste that was generated in the years 1995-1999? The scatter plot is shown in Figure 1.85.

|Year |Tons of Solid Waste Generated (in thousands) |

|1990 |19,358 |

|1991 |19,484 |

|1992 |20,293 |

|1993 |21,499 |

|1994 |23,561 |

a) Make a scatterplot of the data, letting x represent the number of years since 1990.

b) Use a graphing calculator to fit linear, quadratic, cubic, and power functions to the data. By comparing the values of $R^2$ (the coefficient of determination), determine the function that best fits the data.

c) Graph the function of best fit with the scatterplot of the data.

d) With each function found in part (b), predict the average tons of waste in 2000 and 2005, and determine which function gives the most realistic predictions.
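As a cross-check on the calculator work in problem 1, the linear fit and its $R^2$ can be computed by hand. The following is a minimal sketch in Python, which is an assumption of this example (the worksheet itself only calls for a graphing calculator); the function name `linear_fit` is illustrative.

```python
# Florida solid waste (thousands of tons); x = years since 1990.
xs = [0, 1, 2, 3, 4]
ys = [19358, 19484, 20293, 21499, 23561]

def linear_fit(xs, ys):
    """Return (slope, intercept, r_squared) of the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

slope, intercept, r2 = linear_fit(xs, ys)
print(f"y = {slope:.1f}x + {intercept:.1f}, R^2 = {r2:.4f}")

# Extrapolate to the years asked about in part (d).
for year in (2000, 2005):
    print(year, round(slope * (year - 1990) + intercept))
```

For this data the line works out to y = 1042.1x + 18754.8 with $R^2$ of about 0.89, so a straight line leaves a noticeable share of the variation unexplained, which is a hint that one of the curved models may fit better.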

2. The numbers of insured commercial banks y (in thousands) in the United States for the years 1987 to 1996 are shown in the table. (Source: Federal Deposit Insurance Corporation).

|Year |1987 |

|1910 |139 |

|1920 |149 |

|1930 |157 |

|1940 |175 |

|1950 |216 |

|1959 |303 |

|1969 |390 |

|1978 |449 |

|1987 |462 |

|1997 |487 |

a) Make a scatterplot of the data, letting x represent the number of years since 1900.

b) Use a graphing calculator to fit linear, quadratic, cubic, and power functions to the data. By comparing the values of $R^2$, determine the function that best fits the data.

c) Graph the function of best fit with the scatterplot of the data.

d) With each function found in part (b), predict the average acreage in 2000 and 2010 and determine which function gives the most realistic predictions.

3. Sports The winning times (in minutes) in the women’s 400-meter freestyle swimming event in the Olympics from 1936 to 1996 are given by the following ordered pairs.

[pic]

[pic]

a) Make a scatterplot of the data, letting x represent the number of years since 1972.

b) Use a graphing calculator to fit linear, quadratic, cubic, and power functions to the data. By comparing the values of $R^2$, determine the function that best fits the data.

c) Graph the function of best fit with the scatterplot of the data.

d) Plot the actual data and the model you selected on the same graph. How closely does the model represent the data?

Quadratic Regression Problems

1. The following data was obtained by throwing a rubber ball at a CBR.

|Time (sec) |Height (m) |

|0.0000 |1.03754 |

|0.1080 |1.40205 |

|0.2150 |1.63806 |

|0.3225 |1.77412 |

|0.4300 |1.80392 |

|0.5375 |1.71522 |

|0.6450 |1.50942 |

|0.7525 |1.21410 |

|0.8600 |0.83173 |

a) Use the data above to make a scatterplot, letting x represent the number of seconds elapsed.

b) Next, use a graphing calculator to find the model that best expresses the height and vertical velocity of the rubber ball. We can also use this model to predict the maximum height of the ball and its vertical velocity when it hits the face of the CBR.

c) Fit linear, quadratic, cubic, and power functions to the data. By comparing the values of $R^2$, determine the function that best fits the data.

d) Graph the function of best fit with the scatterplot of the data.

e) Determine the maximum height of the ball (in meters).

f) With the model you selected in part (b), predict when the height of the ball is at least 1.5 meters.
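Parts (b) and (e) can be sketched in code: fit a quadratic to the height data, then read the maximum height off the vertex and the vertical velocity off the derivative. This is a minimal sketch in Python under the assumption that a quadratic is the chosen model; the helper names `solve`, `poly_fit`, and `velocity` are hypothetical and not part of the worksheet.

```python
def solve(a, b):
    """Solve a*z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - tail) / m[i][i]
    return z

def poly_fit(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ...], lowest power first."""
    k = degree + 1
    # Normal equations: sum(x^(i+j)) * c_j = sum(y * x^i)
    normal = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    rhs = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve(normal, rhs)

t = [0.0000, 0.1080, 0.2150, 0.3225, 0.4300, 0.5375, 0.6450, 0.7525, 0.8600]
h = [1.03754, 1.40205, 1.63806, 1.77412, 1.80392, 1.71522, 1.50942, 1.21410, 0.83173]

c0, c1, c2 = poly_fit(t, h, 2)          # h(t) ~ c0 + c1*t + c2*t^2
t_peak = -c1 / (2 * c2)                 # vertex of the parabola
h_max = c0 + c1 * t_peak + c2 * t_peak ** 2

def velocity(time):
    """Vertical velocity (m/s) as the derivative of the fitted height model."""
    return c1 + 2 * c2 * time

print(f"h(t) = {c2:.3f}t^2 + {c1:.3f}t + {c0:.3f}")
print(f"maximum height {h_max:.3f} m at t = {t_peak:.3f} s")
print(f"velocity at t = 0.86 s: {velocity(0.86):.2f} m/s")
```

The leading coefficient should come out negative and roughly half the acceleration due to gravity in magnitude, which is a useful sanity check on the fit.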

2. Stopping Distance A state highway patrol safety division collected the data on stopping distances in Table 2.16.

a) Draw a scatter plot of the data.

b) Fit linear, quadratic, cubic, and power functions to the data. By comparing the values of $R^2$, determine the function that best fits the data.

c) Superimpose the regression curve on the scatter plot.

d) Use the regression model to predict the stopping distance for a vehicle traveling at 25 mph.

e) Use the regression model to predict the speed of a car if the stopping distance is 300 ft.

Table 2.16 Highway Safety Division

|Speed (mph) |Stopping Distance (ft) |

|10 |15.1 |

|20 |39.9 |

|30 |75.2 |

|40 |120.5 |

|50 |175.9 |

3. Home Schooling Growth The estimated numbers of U.S. children who were home-schooled in the years from 1992 to 1997 are shown in Table 1.13.

Table 1.13 Home Schooling

|Year |Number |

|1992 |703,000 |

|1993 |808,000 |

|1994 |929,000 |

|1995 |1,060,000 |

|1996 |1,220,000 |

|1997 |1,347,000 |

a) Produce a scatter plot of the number of children home-schooled in thousands (y) as a function of years since 1990 (x).

b) Find the linear regression equation. (Round the coefficients to the nearest 0.01.)

c) Does the value of $R^2$ suggest that the linear model is appropriate?

d) Find the quadratic regression equation. (Round the coefficients to the nearest 0.01.)

e) Does the value of $R^2$ suggest that a quadratic model is appropriate?

f) Use both curves to predict the number of U.S. children that are home-schooled in the year 2005. How different are the estimates?

g) Writing to Learn Use the results of this exploration to explain why it is risky to use regression equations to predict y-values for x-values that are not very close to the data points, even when the curves fit the data points very well.

4. Leisure Time The following table shows the median number of hours of leisure time that Americans had each week in various years.

|Year |Median Number of Leisure Hours Per Week |

|1973, 0 |26.2 |

|1980, 7 |19.2 |

|1987, 14 |16.6 |

|1993, 20 |18.8 |

|1997, 24 |19.5 |

Source: Louis Harris and Associates

a) Make a scatterplot of the data, letting x represent the number of years since 1973, and determine which model best fits the data.

b) Use a graphing calculator to fit the type of function determined in part (a) to the data.

c) Graph the equation with the scatterplot. Then, use the function found in part (b) to estimate the number of leisure hours per week in 1978, 1990, and 2005.

5. On-line Travel Revenue With the explosion of Internet use, more and more travelers are booking their travel reservations on-line. The following table lists the total on-line revenue for recent years. Most of the revenue is from airline tickets.

|Year |On-Line Travel Revenue (In Millions) |

|1996 |$ 276 |

|1997 |827 |

|1998 |1900 |

|1999 |3200 |

|2000 |4700 |

|2001 |6500 |

|2002 |8900 |

Source: Travel and Interactive Technology 1999

a) Create a scatterplot of the data. Let x = the number of years since 1996.

b) Use a graphing calculator to fit the data with linear, quadratic, and exponential functions. Determine which function has the best fit.

c) Graph all three functions found in part (b) with the scatterplot in part (a).

d) Use the functions found in part (b) to estimate the on-line travel revenue in 2010. Which function provides the most realistic prediction?

Quartic Regression Problems

1. Consumer Debt Nonmortgage consumer debt is mounting in the United States, as shown in the table below.

|Year |Non-mortgage Debt (In Billions) |

|1989 |$ 762 |

|1990 |789 |

|1991 |783 |

|1992 |775 |

|1993 |804 |

|1994 |902 |

|1995 |1038 |

|1996 |1161 |

|1997 |1216 |

|1998 |1266 |

a) Draw a scatter plot of the data.

b) Fit linear, exponential, power, cubic, and quartic functions to the data. By comparing the values of $R^2$, determine the function that best fits the data.

c) Superimpose the regression curve on the scatter plot.

d) Use the regression model to predict when consumer debt will reach 1400 billion dollars.

2. Declining Number of Farms in the United States Today U.S. farm acreage is about the same as it was in the early part of the twentieth century, but the number of farms has shrunk.

|Year |Number of Farms (in millions) |

|1910 |6.4 |

|1920 |6.5 |

|1930 |6.3 |

|1940 |6.1 |

|1950 |5.4 |

|1959 |3.7 |

|1969 |2.7 |

|1978 |2.3 |

|1987 |2.1 |

|1997 |1.9 |

Looking at the table above, we note that the data could be modeled with a cubic or a quartic function.

a) Model the data with both cubic and quartic functions. Let the first coordinate of each data point be the number of years after 1900. That is, enter the data as (10, 6.4), (20, 6.5), and so on. Then, using $R^2$, the coefficient of determination, decide which function is the better fit. The $R^2$-value gives an indication of how well the function fits the data. The closer $R^2$ is to 1, the better the fit.

b) Graph the function with the scatterplot of the data.

c) Use the answer to part (a) to estimate the number of farms in 1900, 1975, and 2003.
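The cubic-versus-quartic comparison in part (a) can be sketched as follows. This is an illustrative Python sketch, not the worksheet's prescribed method: the helper names are hypothetical, and the x-values are rescaled to u = (x - 50)/50 purely as a numerical convenience (an assumption of this example), so the printed coefficients are in terms of u rather than x.

```python
def solve(a, b):
    """Solve a*z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - tail) / m[i][i]
    return z

def poly_fit(us, ys, degree):
    """Least-squares polynomial coefficients, lowest power first."""
    k = degree + 1
    normal = [[sum(u ** (i + j) for u in us) for j in range(k)] for i in range(k)]
    rhs = [sum(y * u ** i for u, y in zip(us, ys)) for i in range(k)]
    return solve(normal, rhs)

def evaluate(coeffs, u):
    """Value of the fitted polynomial at the rescaled point u."""
    return sum(c * u ** i for i, c in enumerate(coeffs))

def r_squared(us, ys, coeffs):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - evaluate(coeffs, u)) ** 2 for u, y in zip(us, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

years_after_1900 = [10, 20, 30, 40, 50, 59, 69, 78, 87, 97]
farms = [6.4, 6.5, 6.3, 6.1, 5.4, 3.7, 2.7, 2.3, 2.1, 1.9]  # millions

us = [(x - 50) / 50 for x in years_after_1900]  # rescaled x, roughly in [-1, 1]
cubic = poly_fit(us, farms, 3)
quartic = poly_fit(us, farms, 4)
r2_cubic = r_squared(us, farms, cubic)
r2_quartic = r_squared(us, farms, quartic)
print(f"cubic R^2 = {r2_cubic:.4f}, quartic R^2 = {r2_quartic:.4f}")

# Part (c): estimates for 1900, 1975, and 2003 from each model.
for year in (1900, 1975, 2003):
    u = (year - 1900 - 50) / 50
    print(year, round(evaluate(cubic, u), 2), round(evaluate(quartic, u), 2))
```

Because every cubic is also a quartic with a zero leading coefficient, the quartic's $R^2$ can never be lower than the cubic's. A slightly higher $R^2$ alone is therefore weak evidence; the estimates outside the data range (1900 and 2003) are worth comparing for realism as well.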

Exponential Regression Problems

1. In the years before the Civil War, the population of the United States grew rapidly, as shown in the following table from the U.S. Bureau of the Census.

|Year |Population in Millions |

|1790 |3.93 |

|1800 |5.31 |

|1810 |7.24 |

|1820 |9.64 |

|1830 |12.86 |

|1840 |17.07 |

|1850 |23.19 |

|1860 |31.44 |

a) Draw a scatter plot of the data.

b) Fit linear, quadratic, exponential, power, logarithmic, and logistic functions to the data. By comparing the values of $R^2$, determine the function that best fits the data.

c) Superimpose the regression curve on the scatter plot.

d) Use the regression model to predict the population in 1870.

e) Use the regression model to predict the population in 1930. Explain why you do or do not feel this prediction is valid. (Hint: you may want to complete this problem after you finish the problem dealing with Census records after the Civil War.)
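An exponential model y = a·b^x can be fitted by taking logarithms, since ln y = ln a + x·ln b is a straight line in x. The sketch below applies that idea to the census table, measuring x in decades since 1790 (a choice made for this example). It is written in Python as an assumption, and fitting on the log scale is not identical to minimizing squared error in the original units, so a calculator may report slightly different coefficients.

```python
import math

years = [1790, 1800, 1810, 1820, 1830, 1840, 1850, 1860]
pop = [3.93, 5.31, 7.24, 9.64, 12.86, 17.07, 23.19, 31.44]  # millions

xs = [(yr - 1790) / 10 for yr in years]   # decades since 1790
logs = [math.log(p) for p in pop]         # ln(y) is linear in x for y = a*b^x

n = len(xs)
x_bar = sum(xs) / n
l_bar = sum(logs) / n
slope = sum((x - x_bar) * (l - l_bar) for x, l in zip(xs, logs)) / sum(
    (x - x_bar) ** 2 for x in xs
)
intercept = l_bar - slope * x_bar

a = math.exp(intercept)   # modeled population in 1790
b = math.exp(slope)       # growth factor per decade

def predict(year):
    """Modeled population (millions) for the given calendar year."""
    return a * b ** ((year - 1790) / 10)

print(f"y = {a:.3f} * {b:.4f}^x  (x in decades since 1790)")
print("1870:", round(predict(1870), 1))
print("1930:", round(predict(1930), 1))
```

Comparing the 1930 output with the post-Civil War table later in this worksheet shows how far an exponential model can drift once it is pushed well beyond the data it was fitted to.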

2. Projected Number of Alzheimer’s Patients: German psychiatrist Alois Alzheimer first described the disease, later called Alzheimer’s disease, in 1906. Since life expectancy has significantly increased in the last century, the number of Alzheimer’s patients has increased dramatically. The number of patients in the United States reached 4 million in 2000. The following table lists projected data regarding the number of Alzheimer’s patients in years beyond 2000.

|Year, x |Projected Number of Alzheimer’s Patients in the United States (In millions) |

|2000 |4.0 |

|2010 |5.8 |

|2020 |6.8 |

|2030 |8.7 |

|2040 |11.8 |

|2050 |14.3 |

a) Draw a scatter plot of the data.

b) Fit linear, exponential, power, logistic and logarithmic functions to the data. By comparing the values of $R^2$, determine the function that best fits the data.

c) Superimpose the regression curve on the scatter plot.

d) Use the regression model to estimate the number of Alzheimer’s patients in 2005, 2025, and 2100.

3. Number of physicians: The following table contains data regarding the number of physicians in the United States in selected years.

|Year |Total Number of Physicians |

|1950 |219,997 |

|1955 |241,711 |

|1960 |260,484 |

|1965 |292,088 |

|1970 |334,028 |

|1975 |393,742 |

|1980 |467,679 |

|1985 |552,716 |

|1990 |615,421 |

|1994 |684,414 |

|1995 |720,325 |

|1996 |737,764 |

a) Draw a scatter plot of the data.

b) Fit linear, quadratic, cubic, exponential, quartic, and power functions to the data. By comparing the values of $R^2$, determine the function that best fits the data.

c) Superimpose the regression curve on the scatter plot.

d) Use the regression model to predict the number of physicians in 1975.

e) Use the regression model to estimate the number of physicians in 2000 and 2025.

4. Credit Card Volume: The total credit card volume for Visa, MasterCard, American Express, and Discover has increased dramatically in recent years, as shown in the table below. (Source: CardWeb Inc.’s CardData)

|Year, x |Credit Card Volume, y (In Billions) |

|1988 |261.0 |

|1989 |296.3 |

|1990 |338.4 |

|1991 |361.0 |

|1992 |403.1 |

|1993 |476.7 |

|1994 |584.8 |

|1995 |701.2 |

|1996 |798.3 |

|1997 |885.2 |

a) Draw a scatter plot of the data.

b) Fit linear, quadratic, cubic, exponential, quartic, and power functions to the data. By comparing the values of $R^2$, determine the function that best fits the data.

c) Superimpose the regression curve on the scatter plot.

d) Use the regression model to predict the credit card volume in 2003 and in 2010.

Logarithmic Regression Problems

1. Forgetting In an art class, students were tested at the end of the course on a final exam. Then they were retested with an equivalent test at subsequent time intervals. Their scores after time t, in months, are given in the table.

|Time, t (in months) |Score, y |

|1 |84.9% |

|2 |84.6% |

|3 |84.4% |

|4 |84.2% |

|5 |84.1% |

|6 |83.9% |

a) Draw a scatter plot of the data.

b) Fit linear, quadratic, logarithmic, and power functions to the data. By comparing the values of $R^2$, determine the function that best fits the data.

c) Superimpose the regression curve on the scatter plot.

d) Use the regression model to predict test scores after 8, 10, 24, and 36 months.

e) After how long will the test scores fall below 82%?
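A logarithmic model y = a + b·ln t is linear in ln t, so it can be fitted with an ordinary least-squares line after transforming t. The Python sketch below (Python being an assumption of this example) fits that model to the forgetting data and then inverts it for part (e), where solving a + b·ln t = 82 gives t = exp((82 - a)/b).

```python
import math

t = [1, 2, 3, 4, 5, 6]                       # months after the final exam
score = [84.9, 84.6, 84.4, 84.2, 84.1, 83.9]  # percent

ln_t = [math.log(v) for v in t]
n = len(t)
x_bar = sum(ln_t) / n
y_bar = sum(score) / n

# Least-squares line of score against ln(t): score ~ a + b*ln(t)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(ln_t, score)) / sum(
    (x - x_bar) ** 2 for x in ln_t
)
a = y_bar - b * x_bar

def predict(months):
    """Modeled score (percent) after the given number of months."""
    return a + b * math.log(months)

# Part (d): predictions beyond the observed range.
for months in (8, 10, 24, 36):
    print(months, round(predict(months), 2))

# Part (e): solve a + b*ln(t) = 82 for t.
t_82 = math.exp((82 - a) / b)
print(f"score reaches 82% after about {t_82:.0f} months")
```

The coefficient b comes out negative (scores decline) and small in magnitude, so the model reaches 82% only far outside the six months of data, and that answer should be read with the usual caution about extrapolation.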

2. Jamie, a meteorologist, is interested in finding a function that explains the relation between the height of a weather balloon (in kilometers) and the atmospheric pressure (measured in millimeters of mercury) on the balloon. She collects the data shown in Table 10.

a) Using a graphing utility, draw a scatter diagram of the data with atmospheric pressure as the independent variable.

b) Fit linear, quadratic, logarithmic, and power functions to the data. By comparing the values of $R^2$, determine the function that best fits the data.

c) Superimpose the regression curve on the scatter plot.

d) Use the function in part (b) to predict the height of the weather balloon if the atmospheric pressure is 560 millimeters of mercury.

3. Economics and Marketing The following data represent the price and quantity supplied in 2005 for IBM personal computers.

|Price ($/Computer) |Quantity Supplied |

|2300 |180 |

|2000 |173 |

|1700 |160 |

|1500 |150 |

|1300 |137 |

|1200 |130 |

|1000 |113 |

a) Using a graphing utility, draw a scatter diagram of the data with price as the dependent variable.

b) Using a graphing utility, try a variety of function families. Compare the values of $R^2$ to find the function that best fits the data.

c) Using a graphing utility, draw the function found in part (b) on the scatter diagram.

d) Use the function found in part (b) to predict the number of IBM personal computers that will be supplied if the price is $1650.

Power Regression Problems

1. Use the data in the table below to obtain a model for speed p versus distance traveled d. Consider linear, quadratic, exponential, power, and quartic models. Then use the model you selected as the best fit to predict the speed of the ball at impact, given that impact occurs when [pic]m.

Table 2.12 Rubber Ball Data from CBR Experiment

|Distance (m) |Speed (m/s) |

|0.00000 |0.00000 |

|0.04298 |0.82372 |

|0.16119 |1.71163 |

|0.35148 |2.45860 |

|0.59394 |3.05209 |

|0.89187 |3.74200 |

|1.25557 |4.49558 |

2. The length of time that a planet takes to make one complete revolution around the sun is its year. The table shows the length (in Earth years) of each planet’s year and the distance of that planet from the sun (in millions of miles). Find a model for this data in which x is the length of the year and y is the distance from the sun.

|Planet |Year |Distance |

|Mercury |.24 |36.0 |

|Venus |.62 |67.2 |

|Earth |1 |92.9 |

|Mars |1.88 |141.6 |

|Jupiter |11.86 |483.6 |

|Saturn |29.46 |886.7 |

|Uranus |84.01 |1783.0 |

|Neptune |164.79 |2794.0 |

|Pluto |247.69 |3674.5 |
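A power model y = a·x^b becomes a straight line on log-log axes, because ln y = ln a + b·ln x. The following Python sketch (Python is an assumption here; the worksheet expects a calculator's power regression) fits the planetary table this way. Kepler's third law suggests the exponent should be close to 2/3, which gives an independent check on the result.

```python
import math

year_len = [0.24, 0.62, 1, 1.88, 11.86, 29.46, 84.01, 164.79, 247.69]     # Earth years
dist = [36.0, 67.2, 92.9, 141.6, 483.6, 886.7, 1783.0, 2794.0, 3674.5]     # millions of miles

log_x = [math.log(x) for x in year_len]
log_y = [math.log(y) for y in dist]

n = len(log_x)
x_bar = sum(log_x) / n
y_bar = sum(log_y) / n

# Least-squares line on the log-log scale: ln(y) ~ ln(a) + b*ln(x)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(log_x, log_y)) / sum(
    (x - x_bar) ** 2 for x in log_x
)
a = math.exp(y_bar - b * x_bar)

print(f"distance ~ {a:.2f} * year^{b:.4f}")

# Compare the model with the table, planet by planet.
for x, y in zip(year_len, dist):
    print(f"{x:>7} {y:>8} {a * x ** b:>9.1f}")
```

Since Earth's year is 1, the coefficient a is simply the model's estimate of Earth's distance from the sun, so it should land near the tabulated 92.9.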

3. Cholesterol Level and the Risk of Heart Attack. The data in the following table show the relationship of cholesterol level in men to the risk of a heart attack.

|Cholesterol Level, x |Men, Per 10,000, Who Suffer A Heart Attack, y |

|100 |30 |

|200 |65 |

|250 |100 |

|275 |130 |

|300 |180 |

a) Use a graphing calculator to fit a model function to the data. Consider linear, exponential, power, and cubic functions.

b) Graph the function with the scatterplot of the data.

c) Use the answer to part (a) to estimate the heart attack rate for men with cholesterol levels of 150, 350, and 400.

Logistic Regression Problems

1. After the Civil War, the U.S. population increased, as shown below.

|Year |Population in Millions |

|1870 |38.56 |

|1880 |50.19 |

|1890 |62.98 |

|1900 |76.21 |

|1910 |92.23 |

|1920 |106.02 |

|1930 |123.20 |

|1940 |132.16 |

|1950 |151.33 |

|1960 |179.32 |

|1970 |202.30 |

|1980 |226.54 |

|1990 |248.72 |

|2000 |281.42 |

a) Draw a scatter plot of the data.

b) Fit linear, quadratic, exponential, power, logarithmic, and logistic functions to the data. By comparing the values of $R^2$, determine the function that best fits the data.

c) Use the regression model to predict the population in 1975 and in 2010. Explain why you do or do not feel these predictions are valid.

2. Effect of Advertising A company introduces a new software product on a trial run in a city. They advertised the product on television and found the following data relating the percent P of people who bought after x ads were run.

|Number of Ads, x |% Who Bought, P |

|0 |0.2 |

|10 |0.7 |

|20 |2.7 |

|30 |9.2 |

|40 |27 |

|50 |57.6 |

|60 |83.3 |

|70 |94.8 |

|80 |98.5 |

|90 |99.6 |

Draw a scatter plot of the data. Then, fit linear, exponential, power, logistic and logarithmic functions to the data. By comparing the values of $R^2$, determine the function that best fits the data. Then use the regression model to predict the percent P of people who will buy the software after 100 ads are run.

* Relate what you have discovered in this exercise to what you have observed in television ads. What could the company do to change this pattern?
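A logistic model P = c/(1 + a·e^(-bx)) can be linearized when the ceiling c is known, because ln(c/P - 1) = ln a - b·x. The Python sketch below assumes c = 100, since P is a percentage; that ceiling is an assumption of this example, whereas a calculator's logistic regression estimates c from the data, so its coefficients may differ somewhat.

```python
import math

ads = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
pct = [0.2, 0.7, 2.7, 9.2, 27, 57.6, 83.3, 94.8, 98.5, 99.6]

C = 100.0  # assumed ceiling: at most 100% of people can buy

# Transform so the logistic curve becomes a line: z = ln(C/P - 1) = ln(a) - b*x
z = [math.log(C / p - 1) for p in pct]

n = len(ads)
x_bar = sum(ads) / n
z_bar = sum(z) / n
slope = sum((x - x_bar) * (v - z_bar) for x, v in zip(ads, z)) / sum(
    (x - x_bar) ** 2 for x in ads
)
intercept = z_bar - slope * x_bar

a = math.exp(intercept)
b = -slope  # the fitted line has slope -b

def predict(x):
    """Modeled percent of people who bought after x ads."""
    return C / (1 + a * math.exp(-b * x))

print(f"P = {C:.0f} / (1 + {a:.1f} * e^(-{b:.4f}x))")
print("after 100 ads:", round(predict(100), 2), "%")
```

The prediction for 100 ads sits just below the ceiling, which illustrates the diminishing return the follow-up question asks about: once the curve flattens, additional ads change the percentage very little.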

-----------------------

Table 10

|Atmospheric Pressure, p (mm Hg) |Height, h (km) |

|760 |0 |

|740 |0.184 |

|725 |0.328 |

|700 |0.565 |

|650 |1.079 |

|630 |1.291 |

|600 |1.634 |

|580 |1.862 |

|550 |2.235 |
