SD Module- Python

Assignment No. 4

Title:

Write Python code that loads any data set (for example, game_medal.csv) and performs some basic data cleaning. Add a component on the data set.

Objectives:

Understand the basics of data preprocessing; learn the pandas basic plot function, matplotlib, seaborn, etc.

Problem Definition:

Develop Python code that loads any data set (for example, game_medal.csv) and performs some basic data cleaning. Add a component on the data set.

Outcomes:

1. Students will be able to demonstrate Python data preprocessing.

2. Students will be able to plot graphs in Python using the pandas plot function.

3. Students will be able to demonstrate the matplotlib and seaborn packages.

Hardware Requirement: Any CPU with a Pentium processor or similar, 256 MB RAM or more, 1 GB hard disk or more

Software Requirements: 32/64-bit Linux/Windows operating system, R Studio

Theory:

Preprocessing

Data preprocessing is a data mining technique that involves transforming raw data into an understandable format. Real-world data is often incomplete, inconsistent, and/or lacking in certain behaviors or trends, and is likely to contain many errors. Data preprocessing is a proven method of resolving such issues. Data preprocessing prepares raw data for further processing.

Why preprocessing?

Real-world data are generally:

Incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data

Noisy: containing errors or outliers

Inconsistent: containing discrepancies in codes or names

Tasks in data preprocessing:

• Data cleaning: fill in missing values, smooth noisy data, identify or remove outliers, and resolve inconsistencies.

• Data integration: using multiple databases, data cubes, or files.

• Data transformation: normalization and aggregation.

• Data reduction: reducing the volume but producing the same or similar analytical results.

• Data discretization: part of data reduction, replacing numerical attributes with nominal ones.
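The cleaning, transformation, and discretization tasks listed above can be sketched with pandas on a small made-up table. The column names `country` and `score` and all values are hypothetical; they are not taken from the assignment's dataset.

```python
import pandas as pd

# Hypothetical raw data with typical problems:
# inconsistent codes, a missing value, a duplicate row, a missing key.
raw = pd.DataFrame({
    "country": ["USA", "usa ", "GBR", "GBR", None],
    "score": [10.0, None, 30.0, 30.0, 50.0],
})

clean = raw.copy()

# Resolve inconsistencies: trim whitespace and use upper-case codes.
clean["country"] = clean["country"].str.strip().str.upper()

# Drop rows whose key attribute is missing.
clean = clean.dropna(subset=["country"])

# Fill the remaining missing scores with the column mean.
clean["score"] = clean["score"].fillna(clean["score"].mean())

# Remove exact duplicate rows.
clean = clean.drop_duplicates().reset_index(drop=True)

# Transformation: min-max normalization to the range [0, 1].
lo, hi = clean["score"].min(), clean["score"].max()
clean["score_norm"] = (clean["score"] - lo) / (hi - lo)

# Discretization: replace the numeric score with two nominal levels.
clean["score_level"] = pd.cut(clean["score"], bins=2, labels=["low", "high"])

print(clean)
```

Filling with the mean is only one possible choice; depending on the data, the median or simply dropping the row may be more appropriate.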

Mini Project: In this assignment we use the pandas package to perform the following operations on the given dataset:

grouping

filtering

visualizing
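As a rough illustration of how these three operations fit together, the sketch below filters, groups, and plots a tiny hand-made table. The column names `Edition`, `NOC`, and `Medal` and every row are assumptions made for the example; the real dataset may be organized differently.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import pandas as pd

# Hypothetical sample rows, one row per medal awarded.
medals = pd.DataFrame({
    "Edition": [1996, 1996, 2000, 2000, 2000, 2004],
    "NOC": ["USA", "GBR", "USA", "USA", "GBR", "USA"],
    "Medal": ["Gold", "Silver", "Gold", "Bronze", "Gold", "Silver"],
})

# Filtering: keep only the rows for one country.
usa = medals[medals["NOC"] == "USA"]

# Grouping: count medals per edition and medal type, then unstack
# the medal type into columns (missing combinations become 0).
usa_by_year = usa.groupby(["Edition", "Medal"]).size().unstack(fill_value=0)

# Visualizing: stacked area plot using the pandas plot interface.
ax = usa_by_year.plot.area(title="USA medals by edition (sample data)")
ax.set_ylabel("Medals")

print(usa_by_year)
```

In an interactive session, the `matplotlib.use("Agg")` line can be dropped and `matplotlib.pyplot.show()` called to display the figure.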



Dataset: Here we use the same dataset, Summer.CSV, that was used in Assignment 3.




Using .pivot_table() to count medals by type

• Rather than ranking countries by total medals won and showing that list, you may want to see a bit more detail.

• You can use a pivot table to compute how many separate bronze, silver and gold medals each country won.

• That pivot table can then be used to repeat the previous computation to rank by total medals won.
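The steps above might look roughly like the following. This is a sketch on made-up rows, and the column names `NOC`, `Athlete`, and `Medal` are assumptions about the dataset's layout.

```python
import pandas as pd

# Hypothetical sample rows, one row per medal awarded.
medals = pd.DataFrame({
    "NOC": ["USA", "USA", "USA", "URS", "URS", "GBR"],
    "Athlete": ["A", "B", "C", "D", "E", "F"],
    "Medal": ["Gold", "Silver", "Gold", "Bronze", "Gold", "Silver"],
})

# Count medals per country (rows) and medal type (columns).
counted = medals.pivot_table(
    index="NOC", columns="Medal", values="Athlete", aggfunc="count"
)

# Combinations that never occur come back as NaN; treat them as 0.
counted = counted.fillna(0).astype(int)

# Add a total per country and rank by it.
counted["totals"] = counted.sum(axis="columns")
counted = counted.sort_values("totals", ascending=False)

print(counted)
```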


Applying .drop_duplicates()

• What could be the difference between the 'Event_gender' and 'Gender' columns?

• You should be able to evaluate your guess by looking at the unique values of the pairs (Event_gender, Gender).

• The duplicates can be dropped using the .drop_duplicates() method, leaving behind the unique observations.
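A minimal sketch of this step, using made-up rows. The values shown for 'Event_gender' and 'Gender' are assumptions and may not match the codes used in the actual file.

```python
import pandas as pd

# Hypothetical sample rows; only the two columns of interest are shown.
medals = pd.DataFrame({
    "Event_gender": ["M", "M", "W", "W", "X", "X", "W"],
    "Gender": ["Men", "Men", "Women", "Women", "Men", "Women", "Men"],
})

# Select the two columns, then keep one row per distinct pair.
ev_gen = medals[["Event_gender", "Gender"]]
ev_gen_uniques = ev_gen.drop_duplicates()

print(ev_gen_uniques)
```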


Finding possible errors with .groupby()

• You will now use .groupby() to continue your exploration. Your job is to group by 'Event_gender' and 'Gender' and count the rows.

• You will see that there is only one suspicious row, which is likely a data error.
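The grouping described above can be sketched as follows. The rows are invented for illustration, and which pair looks suspicious in the real data is not stated here; the sample simply plants one odd combination.

```python
import pandas as pd

# Hypothetical sample rows with one deliberately odd pair ("W", "Men").
medals = pd.DataFrame({
    "Event_gender": ["M", "M", "W", "W", "X", "X", "W"],
    "Gender": ["Men", "Men", "Women", "Women", "Men", "Women", "Men"],
})

# Group by both columns and count the rows in each group.
counts = medals.groupby(["Event_gender", "Gender"]).size()

print(counts)
```

A pair that occurs far less often than the others is a candidate for closer inspection.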

Locating suspicious data
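One way to locate such a row is a Boolean filter, sketched below on made-up data. The `Athlete` column and the choice of ('W', 'Men') as the suspicious pair are assumptions made for the example only.

```python
import pandas as pd

# Hypothetical sample rows; row "C" carries the inconsistent pair.
medals = pd.DataFrame({
    "Athlete": ["A", "B", "C", "D"],
    "Event_gender": ["M", "W", "W", "X"],
    "Gender": ["Men", "Women", "Men", "Women"],
})

# Build a Boolean mask that is True only where both conditions hold.
suspect_mask = (medals["Event_gender"] == "W") & (medals["Gender"] == "Men")

# Use the mask to select the matching rows.
suspect = medals[suspect_mask]

print(suspect)
```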


Constructing alternative country rankings

Counting distinct events
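A possible way to rank countries by the number of distinct events in which they won medals is `.groupby()` followed by `.nunique()`. The sketch below uses invented rows, and the column names `NOC` and `Event` are assumptions.

```python
import pandas as pd

# Hypothetical sample rows, one row per medal awarded.
medals = pd.DataFrame({
    "NOC": ["USA", "USA", "USA", "URS", "URS", "GBR"],
    "Event": ["100m", "Long Jump", "Marathon", "100m", "Vault", "Eights"],
})

# Count the distinct events per country and sort in descending order.
n_events = (
    medals.groupby("NOC")["Event"]
    .nunique()
    .sort_values(ascending=False)
)

print(n_events)
```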


Conclusion/Analysis: Hence, we are able to draw various plots using the seaborn, matplotlib, and pandas packages on a suitable dataset.

Assignment Questions:

1. Write the command to create a pivot table.

2. What is the command to view information about a table?

3. How do you apply the .drop_duplicates() method?

4. How do you use the .groupby() method?

5. What is meant by unstacking?

6. Write the command to create an area plot.

7. List the commands used for visualization.

8. List the commands used for grouping.

9. List the commands used for filtering.

Oral Questions:

1. What is a histogram?

2. What is a scatter plot?

3. What is a pie chart?

4. What is a bar chart?

5. What is a heatmap?


-----------------------

| W (4) | C (4) | D (4) | V (4) | T (4) | Total Marks with Sign |

|       |       |       |       |       |                       |

-----------------------

SNJB’S K.B.J. COLLEGE OF ENGINEERING, CHANDWAD
