Getting your data into SAS: entering data with viewtable



An overview of SAS system and how to run it.

• What is SAS?

SAS (Statistical analysis system) is many things. Among other things, it can be used for data management and report generation, as a statistical package including descriptive and inferential statistics (for a wide variety of problems) and as a general programming environment, including a portion called IML which allows for easy handling of matrices. The complete package is extremely comprehensive, and continues to expand at a rapid rate This course provides an introduction to SAS with the primary focus on data management and statistical computing, including how to program to handle customized analyses.

Important links through OIT:

- Info on OIT Classrooms (which have SAS): http: ://oit.umass.edu/classrooms/index.html

- Info on SAS site licenses if you want to buy your own copy.:

- Basic instructions for using SAS, including online documentation for SAS



• Where do I run SAS?

a. On a PC with Windows: You can run SAS in any of the OIT computer classrooms or you can buy a copy (renewable each year) for your own machine.

b. The Math-stat 1537 has one PC with windows that has SAS installed. The computer lab LGRT 110 has SAS as well, but it is mainly for math-stat undergraduates.

c. Using Unix. For graduate students in Math-Stat, you can run SAS interactively from the workstations in 1537. Or you can sign into the dept. machine remotely and use SAS but you need X-windows emulation.

• How do I run SAS?

a. On a PC, SAS is run interactively in the windows environment. When SAS is invoked there are various sub-windows, the three main ones being the program window, the output window and the log window. When you are actually using programming statements (as opposed to click and go as in Analyst, etc.) the programming statements are put in the program window. The program is then submitted (by clicking on the running figure), with output going to the output window and the log window containing information (including error messages) about the job. To run only portions of the program in the program window, you highlight that portion and then submit.You can open and save files from the various windows. Various other aspects of running in this environment will be demonstrated in class and are described in the on-line help.

b. On Unix, if you want to run SAS interactively with the various sub-windows, type the command “sas” in the terminal. It can be easier to run SAS on a unix machine in batch mode, without an Xterminal emulator (but this is tedious when you are first writing programs that could need a lot of debugging). Some editor (e.g., emacs or VI) is used to create control files, examine and edit output files, etc. To do this first, the control file is created using the editor and saved as xxx.sas where xxx is a name of your choosing. To submit the control file in batch mode one simply types sas xxx or sas xxx & where the version with the & runs the job in background. When the job is done you get a return to the system prompt (or a done message if you submitted it to background). Two files will have been created xxx.log which is the log file and xxx.lst which is the output file (but xxx.lst will be empty if there were errors that prevented the execution). You can now read and edit these as usual using an editor. It is always a good idea to look over the .log file first as one

can often get output in the .lst file even though some errors might have occurred.

• The general structure of running SAS using programming statements versus “click and go”.

a. Using programming statements.

The data step creates a SAS data file. The data step might involve reading data from a file in a particular way or consist of programming statements which generate a data file (or a combination of the two). SAS data sets can also be created by importing data. Once a SAS data set is in place, there is a large number of pre-programmed procedures (procs) that can be run on that data. These are invoked with a proc statement. For the most part, each proc has a fixed set of rules and options that must be followed.

b. Running SAS without “programming”.

There are various options now available in SAS for running SAS without programming, but instead using drop down menus and “click and go”. We will not cover this type of analysis. The analyst, the main tool for the “click and go” analysis, is easy to use (although if you are proficient with SAS language, it is sometimes faster and easier to program directly), creates the accompanying SAS code, produces graphs and has some statistical tools not available in other parts of SAS (such as power and sample size calculations). At the same time, it does not have all of the options available through the procedures.

Getting your data into SAS: entering data with viewtable

• Menu: tools ( table editor

• A blank viewtable is displayed. The rows (observations) are labeled with numbers and the columns (variables) are labeled with letters. You can enter the data and SAS will automatically figure out if your columns are numeric or character, according to the data values

• The column names can be changed by double clicking the default names. The column attributes such as the variable type, can be changed by right clicking the column names and open the Column Attributes Window.

• To save the data, click the save icon above the viewtable. Select a library, which corresponds to a directory, and then specify the member name of your table. Note if you choose the library Work, the data will only be available for the current SAS session.

• Open and edit existing tables by clicking the open icon above the blank viewtable. Or one can open an existing table through SAS explorer or windows explorer.

• To reference the table you created in viewtable, use libraryname.membername. For example,

Proc print data=sasuser.test;

Run;

Getting started using SAS software

The SAS program

• A SAS program is a sequence of statements executed in order

• Every SAS statement ends with a semicolon

• Layout of SAS programs is flexible

1) SAS statements can be in upper or lower case

2) Statements can continue on the next line (don’t split words)

3) Statements can be on the same line as other statements, but use semicolon to separate

4) Statements can start in any column

• Comments can be inserted into a program. Comments can be enclosed in asterisk (*) and semicolon (;), or a slash asterisk (/*) and an asterisk slash (*/)

*Add a new variable to the existing data sasuser.test;

data sasuser.test;

set sasuser.test;

height_cm=2.5*height;

proc print data=sasuser.test; /* print the results*/

run;

• SAS programs are constructed from two basic building blocks: DATA steps and PROC steps. Most statements work in only one type of step. The basic differences between them:

DATA steps PROC steps

|begin with DATA statements |begin with PROC statements |

|read and modify data |perform specific analysis or function |

|many statements can be used: input data, do loops, |only a handful of possible statements for each SAS procedure |

|if-then/else logic and many functions | |

|create a SAS data set |produce results or report |

• A step ends when SAS encounters a new step, a RUN statement, or the end of the program. Run statements tell SAS to run all the proceeding lines of the step and are among those rare global statements that are not part of a DATA or PROC step.

SAS data sets

• SAS data sets are also called tables, observations are also called rows, and variables are also called columns.

• There are just two types of variables: numeric and character. A variable containing letters or special characters must be character data. Sometimes data that consist only of numerals make more sense as character data than as numeric, ZIP codes, for example.

• Missing character data are represented by blanks, and missing numeric data are represented by a single period

• Rules for SAS names:

1) Names must be 32 characters or fewer in length

2) Names must start with a letter or an underscore ( _ )

3) Names can contain only letters, numerals, or underscores. No other special characters

4) Names can contain upper and lower case letters. SAS is insensitive to case.

• Viewing data sets with SAS explorer: menu: view ( explorer, where you can

1) Create new library

2) View existing data sets

3) List the properties of a SAS data set

Submitting SAS programs

• SAS programs are entered into the editor window. Highlight the part of the program that you want to run, then click on the submit icon on top

• After submission, any notes, errors, or warnings associated with your program as well as the program statements themselves will appear in the log window. It is very important to check SAS log for errors before proceeding to the results.

• If your program generates any printable results, then they will appear in the Output window

• The Results window is like a table of contents for your output window

Using SAS system options

• System options control SAS –how it works, what the output looks like, how much memory is used, error handing and a host of other things.

• In SAS windowing environment, two common ways of specifying system options:

1) Change selected options in the SAS System Options window. Menu: Tools ( options

2) Use the OPTIONS global statement as a part of your SAS program:

OPTIONS LINESIZE=80 NODATE;

• Common options:

CENTER|NOCENTER Controls whether output are centered or left-aligned. Default: CENTER

DATE|NODATE Controls whether or not today’s date will appear at the top of each page of output. Default: DATE

LINESIZE = n Controls the maximum length of output lines. Possible values for n are 64 to 256. Default varies according to system.

NUMBER|NONUMBER Controls whether or not page number appear on each page of SAS output. Default: NUMBER

ORIENTATION=PORTRAIT Specifies the orientation for printing output. Default: PORTRAINT

ORIENTATION=LANDSCAPE

PAGENO = n Starts numbers output pages with n. Default is 1.

PAGESIZE = n Controls the maximum number of lines per page of output. Possible values for n are 15 to 32767. Default varies.

RIGHTMARGIN = n Specifies size of margin (such as 0.75in or 2cm) to be used for printing output. Default: 0in

LEFTMARGIN = n

TOPMARGIN = n

BOTTOMMARGIN = n

YEARCUTOFF = yyyy Specifies the cutoff year for DATE informats/functions

Getting your data into SAS: continued

Methods of getting data into SAS:

1) Enter data directly with viewtable

2) Creating SAS data sets from raw data files

3) Converting other software’s data files into SAS data sets

4) Reading other software’s data files directly

Reading files with the Import Wizard

Menu: File ( Import Data. Show example: bands.csv

The Import Wizard can read all types of delimited files including comma-separated values (CSV) files which are a common file type for moving data between applications. If you have SAS/ACCESS for PC File Formats software, then you can also read a number of popular PC file types (Excel, Access, Lotus, etc).

Creating SAS data sets from raw data files

• Telling SAS where to find your raw data

If your data are in raw data files (also referred to as text, ASCII, sequential, or flat files), using the DATA step to read the data give you the most flexibility.

1) Internal raw data: use the CARDS or DATALINES statement to indicate internal data. The CARDS or DATALINES statement must be the last statement in the DATA step. All lines in the SAS program following the DATALINES statement are considered data until SAS encounters a semicolon. The semicolon must be on a line by itself.

DATA uspresidents;

INPUT President $ Party $ Number;

DATALINES;

Adams F 2

Lincoln R 16

Grant R 18

Kennedy D 35

;

RUN;

2) External raw data: use the INFILE statement to tell SAS the filename and location

DATA uspresidents;

INFILE 'C:\Documents and Settings\anna\Desktop\MyDesktop\597.F10\president.txt';

INPUT President $ Party $ Number;

RUN;

➢ SAS log shows the number of records read from the file, which should be checked to ensure the file is read correctly.

➢ In some operating systems, SAS assumes external files have a record length of 256 or less. The default can be changed with the LRECL= option in the INFILE statement:

INFILE 'C:\Documents and Settings\anna\Desktop\MyDesktop\597.F10\president.txt' LRECL=2000;

• Reading Raw Data Separated by Spaces

The simplest INPUT statement simply list the variable names after the INPUT keyword in the order they appear in the data file. This type of input is called list input (or free formatted input). It is appropriate for reading data separated by spaces. It is easy to use but have a few limitations:

a. You must read all the data in a record – no skipping

b. Any missing data must be indicated with a period

c. Character data cannot have embedded spaces and cannot be greater than eight characters in length

d. Special data such as dates needs formatted input, therefore cannot be read by list input

DATA HTWT;

INFILE 'C:\Documents and Settings\anna\Desktop\MyDesktop\597.F10\htwt.txt';

INPUT ID gender$ age height weight;

RUN;

• Reading Raw Data Arranged in Columns

Column input can be used if each of the variable’s value is always found in the same place of the data line. It has the following advantages over list input:

i. Spaces are not required between values

ii. Missing values can be left blank

iii. Character data can have embedded spaces or longer than 8 characters

iv. You can skip unwanted variables

DATA sales;

INPUT visitingteam $ 1-20 ConcessionSales 21-24 BleacherSales 25-28

ourhits 29-31 theirhits 32-34 ourRuns 35-37 theirRuns 38-40;

*---+----1----+----2----+----3----+----4;

DATALINES;

Columbia Peaches 35 67 1 10 2 1

Plains Peanuts 210 2 5 0 2

Gilroy Garlics 151035 12 11 7 6

Sacramento Tomatoes 124 85 15 4 9 1

;

• Reading Raw Data Not in Standard Format

Numbers with embedded commas or dollar signs, and dates like 10-31-2003 or 31OCT03 are examples of non-standard data. These data can be read with formatted input.

DATA contest;

INPUT name $ 1-16 age type $ date mmddyy10. sccore1 amount comma9.1;

*format date mmddyy10.;

*format amount comma9.2;

*---+----1----+----2----+----3----+----4----+;

DATALINES;

Alica Grossman 13 c 10-28-2003 7.8 $1,000.8

Matthew Lee 9 D 10-30-2003 6.5 $1,023.4

;

PROC PRINT DATA=contest;

RUN;

General format for informat (input format):

Character Numeric Date

$informatw. $informatw.d $informatw.

Commonly used informats:

a. For character: $w.

b. For numeric: w.d, commaw.d

c. For date: mmddyyw.

Summary on types of input

1. list input: easy to use

2. column input: allow space in each variable field; do not require spaces between variables

3. formatted input: read special data

/*Exercise1*/

Variable names: Name Age Type Date Score1-Score5

Alicia Grossman 13 c 10-28-2003 7.8 6.5 7.2 8.0 7.9

Matthew Lee 9 d 10-30-2003 6.5 5.9 6.8 6.0 8.1

Elizabeth Garcia 10 C 10-29-2003 8.9 7.9 8.5 9.0 8.8

/*Exercise2*/

Variable names: ID Name Score;

1 stevenson 89

2 cody 100

3 smith 55

4 gettlefinger 92

/*Exercise3*/

Variable names: Park Name State Year Acreage

Yellowstone ID/MT/WY 1872 4,065,493

Everglades FL 1934 1,398,800

Yosemite CA 1864 760,917

Great Smoky Mountains NC/TN 1926 520,269

Wolf Trap Farm VA 1966 130

• Mixing input styles

One needs to be careful when mixing input styles. Note when SAS reads a line of raw data, it uses a pointer to mark its place. Each style of input uses the pointer a little differently. After reading a variable, the pointer stays at the end of the variable field, if the next variable is a

1. list input, the pointer automatically moves to the next non-blank field and starts reading

2. column input, the pointer follows the instruction of the column specifier

3. format input, SAS starts reading immediately

There are three commonly used column pointers and two line pointers.

Three column pointers:

1. @n, put pointer at column n

2. +n, move pointer to the right by n columns

3. @character, put pointer next to the end of the specified character

data contest;

input name $16. age 3. +1 type $1. +1 date mmddyy10. sccore1 5.1 +2 dollar comma10.1;

format dollar comma9.2;

*---+----1----+----2----+----3----+----4----+;

datalines;

Alica Grossman 13 c 10-28-2003 7.8 $1,000.8

Matthew Lee 9 D 10-30-2003 6.5 $1,023.4

;

proc print data=contest;

run;

or equivalently,

data contest;

input name $16. age 3. @21 type $1. @23 date mmddyy10. sccore1 5.1 @38 dollar comma8.1;

*---+----1----+----2----+----3----+----4----+;

datalines;

Alica Grossman 13 c 10-28-2003 7.8 $1,000.8

Matthew Lee 9 D 10-30-2003 6.5 $1,023.4

;

/* @'character' pointer */

data weblogs;

input @'[' accessdate date11. @'/' file :$20.;

datalines;

130.192.70.235 -- [08/Jun/2001:23:51:32 - 0700] "GET /rover.jpg http/1.1" 200 66820

128.32.236.8 -- [08/Jun/2001:23:51:40 - 0700] "GET /grooming.html http/1.0" 200 8471

;

proc print data=weblogs;

run;

/* Note that pointer can also be used to change the order of reading variables*/

• Multiple lines of data per observation

data highlow;

input city $ state $ normalHigh normalLow RecordHigh RecordLow;

datalines;

Nome AK

55 44

88 29

Miami FL

90 75

97 65

Raleigh NC

88 68

105 50

;

proc print;

run;

/*SAS secret: the pointer moves to the next line if there are more variables than data on the current line*/

data highlow;

input city $ state $;

input normalHigh normalLow;

input RecordHigh RecordLow;

datalines;

Nome AK

55 44

88 29

Miami FL

90 75

97 65

Raleigh NC

88 68

105 50

;

proc print;

run;

/*SAS secret: the pointer moves to the next line after an input statement;*/

Two line pointers: moves pointer to the next line (/) or the specified line (#n). Using line pointers, one can skip certain lines and read selective lines.

data highlow;

input city $ state $

/ normalHigh normalLow

#3 RecordHigh RecordLow;

datalines;

Nome AK

55 44

88 29

Miami FL

90 75

97 65

Raleigh NC

88 68

105 50

;

proc print;

run;

SAS secret: Data steps execute line by line and observation by observation

[pic]

data highlow;

input city $ state $

/ normalHigh normalLow #3;

datalines;

Nome AK

55 44

88 29

Miami FL

90 75

97 65

Raleigh NC

88 68

105 50

;

proc print;

run;

• Multiple observations per line and reading part of raw data

Use line holders: double trailing @@ or single trailing @. Double trailing @@: a double trailing at the end of the sas input statement instructs sas to hold that line of data, continuing to read observations until it either runs out of data or reaches an input statement that does not end with a double trailing. Single trailing releases the pointer to the next line until it either runs out of data, or reaches an input statement that does not end with trailing, or SAS reaches the end of the data step for an observation. Double trailing holds the line more strongly. The double trailing holds a line even when SAS starts building a new observation.

@@ is mainly used to read data with multiple observations per line

data rainfall;

input city $ state $ normalRain meanDaysRain @@;

datalines;

Nome AK 2.5 15 Miami FL 6.75

18 Raleigh NC . 12

;

proc print;

run;

@ is mainly used to read part of raw data

/* The following data step outputs "freeway" data only*/

data traffic;

input type $ name $ 9-38 AMtraffic PMtraffic;

if type="freeway";

Mtraffic=mean(of AMtraffic PMtraffic);

*---+----1----+----2----+----3----+----4;

datalines;

freeway 408 3684 3459

surface Martin Luther King Jr. Blvd. 1590 1234

surface Broadway 1259 1290

surface Rodeo Dr. 1890 2067

freeway 608 4583 3860

freeway 808 2386 2518

surface Lake Shore Dr. 1590 1234

surface Pennsylvania Ave. 1259 1290

;

proc print;

run;

data traffic;

input type $ @;

if type="surface" then delete;

if type="freeway" then input name $ 9-38 AMtraffic PMtraffic;

datalines;

freeway 408 3684 3459

surface Martin Luther King Jr. Blvd. 1590 1234

surface Broadway 1259 1290

surface Rodeo Dr. 1890 2067

freeway 608 4583 3860

freeway 808 2386 2518

surface Lake Shore Dr. 1590 1234

surface Pennsylvania Ave. 1259 1290

;

proc print;

run;

/*double trailing and single trailing do not make a difference in the following example*/

data traffic;

input type $ @@;

if type="surface" then

input name_c $ 9-38 AMtraffic PMtraffic;

if type="freeway" then

input name_n $ 9-38 AMtraffic PMtraffic;

datalines;

freeway 408 3684 3459

surface Martin Luther King Jr. Blvd. 1590 1234

surface Broadway 1259 1290

surface Rodeo Dr. 1890 2067

freeway 608 4583 3860

freeway 808 2386 2518

surface Lake Shore Dr. 1590 1234

surface Pennsylvania Ave. 1259 1290

;

proc print;

run;

Exercise:

Each line contains a three-digit number, a two-digit number, and a four-digit number, with four sets of numbers for each subject;

datalines;

123121234217874444123872345873235432

192837465748392919283747372818182838

;

• Reading delimited files with the data step

To read files delimited by comma, or tab, use the dlm option with the infile statement.

*Reading comma delimited files;

data HTWT;

infile datalines dlm=',';

input ID name$ age height weight;

datalines;

1, Ali G, 23, 68, 155

2, Mat L, , 61, 102

3, Liz G, 55, 70, 202

;

proc print;run;

*Use dlm='09'X to read tab delimited files;

data HTWT;

infile "C:\Documents and Settings\anna\Desktop\MyDesktop\597.F10\htwt_tab.txt" dlm='09'X;

input ID name$ age height weight;

proc print;run;

data HTWT;

infile datalines dlm=',' dsd;

input ID name $ age height weight;

datalines;

1, "G, Ali", 23, 68, 155

2, "L, Mat", , 61, 102

3, "G, Liz", 55, 70, 202

;

proc print data=htwt;

run;

More options with the infile statement

1. DLM option. Two delimiters in a row is treated as one.

2. DSD option (DSD represents delimiter-sensitive data). It does:

1) Ignores delimiter enclosed in quotation marks

2) Strip off the quotation marks

3) Two consecutive delimiters indicate a missing value

3. firstobs=

4. obs=

5. lrecl=256

6. missover: missing values anticipated at the end of a data line

7. truncover: use truncover if you are using column or formatted input for the last variable and expect some of its data values are shorter than others.

8. pad: pad each line of data with blank spaces

data HTWT;

infile 'C:\Documents and Settings\anna\Desktop\MyDesktop\597.F10\htwt_tab.txt' dlm='09'X lrecl=5;

input ID gender$ age height weight;

proc print data=htwt;

run;

Note  The DATALINES automatically PADs the data whereas the external file does not.

/*Exercise*/

Variable names: Name, Number, Address

John Garcia 114 Maple Ave.

Sylvia Chung 1302 Washington Drive

Martha Newton 45 S.E. 14th St.

;

/*Exercise*/

Variable names: BandName, GigDate, EightPM, NinePM, TenPM, ElevenPM

Lupine Lights,12/3/2003,45,63,70,

Awesome Octaves,12/15/2003,17,28,44,12

"Stop,Drop, and Rock-N-Roll",1/5/2004,34,62,77,91

The Silveyville Jazz Quartet,1/18/2004,38,30,42,43

Catalina Converts,1/13/2004,56,,65,34

;

• Reading delimited files or PC files with the import procedure

Proc import datafile='filename' out=data-set DBMS=identifier REPLACE;

If filename is with extension .csv, .txt, .xls, dbf, .wk1, .wk3, .wk4, DBMS option can be

skiped. Otherwise, use DBMS=DLM, and specify delimiter with DELIMITER='delimiter-character' statement;

Proc import datafile='C:\Documents and Settings\anna\Desktop\MyDesktop\597.F10\Bands.csv'

out=bandsData REPLACE;

getnames=no;

run;

proc print;run;

• Permanent SAS data sets

When a SAS data set is created and given a two level name: libname.memberName, it is a permanent unless the libname is WORK or unspecified. A library can be defined through point-and-click the menus. It can also be defined through a SAS statement:

libname mysas "C:\Documents and Settings\anna\Desktop\MyDesktop\597.F10";

Proc import datafile='C:\Documents and Settings\anna\Desktop\MyDesktop\597.F10\Bands.csv'

out=mysas.bandsData REPLACE;

getnames=no;

run;

You can see a SAS data set called bandsData.sas7bdat created under the directory to which the library mysas corresponds.

Another way to create permanent SAS data set is through direct referencing:

data "C:\Documents and Settings\anna\Desktop\MyDesktop\597.F10\bands";

infile datalines dlm="," dsd;

input BandName :$50. GigDate :mmddyy10. EightPM NinePM TenPM ElevenPM;

datalines;

Lupine Lights,12/3/2003,45,63,70,

Awesome Octaves,12/15/2003,17,28,44,12

"Stop,Drop, and Rock-N-Roll",1/5/2004,34,62,77,91

The Silveyville Jazz Quartet,1/18/2004,38,30,42,43

Catalina Converts,1/13/2004,56,,65,34

;

run;

You can see a SAS data set called bandsData.sas7bdat created under the specified directory

• The contents procedure

proc contents data=datasetName;

-----------------------

Observation 1

Observation 2

Observation 3

Observation 4

Line 1

Line 2

Line 3

Input data

SAS program

Output data

Observation 1

Observation 2

Observation 3

Observation 4

................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download