Statistics 112: Final Project - Statistics Department



For the final project, you may work alone or with a partner. The purpose of the project is for you to gain experience applying the methods taught in class to a real data set that interests you.

Due Dates

Now – November 16th: Set up an appointment with me or talk with me during office hours about your ideas for the final project.

Thursday, November 16th (beginning of class): Hand in to me a paragraph describing the data set you plan to analyze and the questions of interest.

Tuesday, December 12th (5 p.m.): Hand in to me the annotated JMP output on which your report will be based, along with a few paragraphs describing your results. If you have any questions about what you should do in your data analysis, write them down and I will discuss them with you. I will look this over and have my comments available for you by Wednesday afternoon. If you give me your draft earlier, I will return it to you earlier.

Tuesday, December 19th (5 p.m.): Hand in to me your final report. Note that the final homework assignment will also be due at this time.

I will be available throughout the reading and exam period to discuss your projects with you.

Project Description

The standard project is to use multiple regression analysis to analyze a data set that is of interest to you. If you have a strong interest in analysis of variance (the topic we will cover after multiple regression), your project can consist of using analysis of variance to analyze a data set.

The final report for the project should be a 5-10 page paper (not including additional JMP output) that describes the questions of interest, how you used your data set to address them (with details on the steps in your analysis), your findings, and the limitations of your study. Specifically, your report should contain the following:

1. Abstract: A one paragraph summary of what you set out to learn, and what you ended up finding. It should summarize the entire report.

2. Introduction: A discussion of what questions you are interested in.

3. Data Set: Describe details about how the data set was collected and the variables in the data set.

4. Analysis: Describe how you used multiple regression to analyze the data set. Specifically, you should discuss how you carried out the steps of analysis discussed in class, i.e., exploring the data to find a reasonable initial model, checking the model, and revising the model based on those checks.

5. Results: Provide inferences about the questions of interest and discuss them.

6. Limitations of study and conclusion: Describe any limitations of your study and how they might be overcome in future research and provide brief conclusions about the results of your study.
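
As a rough illustration of the fit-then-check cycle described in item 4, here is a minimal sketch in Python. It is not the course's method (the class uses JMP), the data are simulated, and the helper names fit_ols, predict, and r_squared are hypothetical names chosen only for this example.

```python
import random

def fit_ols(X, y):
    """Least-squares coefficients for y ~ intercept + columns of X,
    obtained by solving the normal equations (X'X) b = X'y."""
    rows = [[1.0] + list(r) for r in X]          # prepend an intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def predict(beta, X):
    """Fitted values for each row of X given coefficients beta."""
    return [beta[0] + sum(b * x for b, x in zip(beta[1:], r)) for r in X]

def r_squared(y, fitted):
    """Proportion of variation in y explained by the fitted values."""
    ybar = sum(y) / len(y)
    ss_res = sum((a - f) ** 2 for a, f in zip(y, fitted))
    ss_tot = sum((a - ybar) ** 2 for a in y)
    return 1 - ss_res / ss_tot

# Simulated data (an assumption for illustration): 40 observations,
# two predictors, true model y = 3 + 2*x1 - 1.5*x2 + noise.
rng = random.Random(0)
X = [[rng.uniform(0, 10), rng.uniform(0, 5)] for _ in range(40)]
y = [3.0 + 2.0 * x1 - 1.5 * x2 + rng.gauss(0, 1) for x1, x2 in X]

# Step 1: fit an initial model.
beta = fit_ols(X, y)
fitted = predict(beta, X)

# Step 2: check the model through its residuals.
residuals = [a - f for a, f in zip(y, fitted)]
r2 = r_squared(y, fitted)
largest = max(range(len(y)), key=lambda i: abs(residuals[i]))

print("coefficients:", [round(b, 3) for b in beta])
print("R-squared:", round(r2, 3))
print("observation with largest residual:", largest)
```

In a real project, the checking step would also include residual plots and a look at influential observations, and the model would be revised if those checks suggested problems; the sketch only shows where those steps fit.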

Data Sets

The project will be most rewarding if you choose questions and a data set that genuinely interest you.

Examples of questions of interest are as follows:

What properties of a baseball team best predict its success over the course of a season?

What properties of a college are related to its rank in the U.S. News and World Report rankings?

Is the unemployment rate related to economic measures such as interest rates, stock returns, and the inflation rate?

What properties of a state predict the proportion of the vote that George Bush (John Kerry) received in it?

You will need a data set to explore your question of interest. I will be happy to help you with suggestions. The data set should ideally contain at least 30-50 observations (e.g., companies, people, or countries) and at least 4 variables (pieces of information about the observations, e.g., stock price, revenues, profits, salaries, or gender); if that is not possible, exceptions will be allowed, subject to my approval. One of the variables should be a numerical variable that would be of interest to model or forecast (e.g., team winning percentage, stock price change, U.S. News and World Report rank, gas mileage, unemployment rate, or proportion of vote received).

I will be happy to discuss ideas with you. Here are a few potential sources of ideas and data:

The Data and Story Library (DASL) has many interesting data sets:



The following web site from a course at Duke has several interesting data sets:



Samples

A good sample of what I’m expecting from the projects and reports is contained at the web site . Note that these reports are for a class taught at New York University by Jeffrey Simonoff, so some of the methods used in the regression analyses may be unfamiliar to you.
