Original poster | Posted on 2007-4-2 10:14:48
1.2 SAS Data Sets
Before you run an analysis, before you write a report, before you do anything with your data, SAS must be able to read your data. Before SAS can analyze your data, the data must be in a special form called a SAS data set.[1] Getting your data into a SAS data set is usually quite simple as SAS is very flexible and can read almost any data. Once your data have been read into a SAS data set, SAS keeps track of what is where and in what form. All you have to do is specify the name and location of the data set you want, and SAS figures out what is in it.
Variables and observations
Data, of course, are the primary constituent of any data set. In traditional SAS terminology the data consist of variables and observations. Adopting the terminology of relational databases, SAS data sets are also called tables, observations are also called rows, and variables are also called columns. Below you see a rectangular table containing a small data set. Each line represents one observation, while Id, Name, Height, and Weight are variables. The data point Charlie is one of the values of the variable Name and is also part of the second observation.
                                       Variables (Also Called Columns)
                                       Id    Name      Height   Weight
Observations (Also Called Rows)   1    53    Susie     42       41
                                  2    54    Charlie   46       55
                                  3    55    Calvin    40       35
                                  4    56    Lucy      46       52
                                  5    57    Dennis    44       .
                                  6    58              43       50
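As a rough sketch, the table above could be typed straight into a program and read into a SAS data set with a DATA step. The data set name heightweight is simply made up for illustration, and the example assumes simple list input, where a lone period stands in for a missing value (numeric or character).

```sas
/* Read the six observations from the table into a data set.    */
/* The $ after Name tells SAS that Name is a character variable. */
DATA heightweight;
   INPUT Id Name $ Height Weight;
   DATALINES;
53 Susie 42 41
54 Charlie 46 55
55 Calvin 40 35
56 Lucy 46 52
57 Dennis 44 .
58 . 43 50
;
RUN;

/* Print the data set: one row per observation, one column per variable */
PROC PRINT DATA = heightweight;
RUN;
```

With list input, values are separated by spaces, so the blank Name in the sixth observation has to be entered as a period to hold its place. SAS stores it as a blank.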
Data types
Raw data come in many different forms, but SAS simplifies this. In SAS there are just two data types: numeric and character. Numeric fields are, well, numbers. They can be added and subtracted, can have any number of decimal places, and can be positive or negative. In addition to numerals, numeric fields can contain plus signs (+), minus signs (−), decimal points (.), or E for scientific notation. Character data are everything else. They may contain numerals, letters, or special characters (such as $ or !) and can be up to 32,767 characters long.
If a variable contains letters or special characters, it must be character data. However, if it contains only numbers, then it may be numeric or character. You should base your decision on how you will use the variable.[2] Sometimes data that consist solely of numerals make more sense as character data than as numeric. ZIP codes, for example, are made up of numerals, but it just doesn't make sense to add, subtract, multiply, or divide ZIP codes. Such numbers make more sense as character data. In the previous data set, Name is obviously a character variable, and Height and Weight are numeric. Id, however, could be either numeric or character. It's your choice.
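One practical way to see the difference is to read the same ZIP code both ways. This is only a small illustration, and the names ziptest, ZipNum, and ZipChar are invented for it. Read as a number, a leading zero is not kept; read as character, the value is stored exactly as typed.

```sas
/* Read one ZIP code twice: once as numeric, once as character ($) */
DATA ziptest;
   INPUT ZipNum ZipChar $;
   DATALINES;
02134 02134
;
RUN;

/* ZipNum prints as 2134, while ZipChar prints as 02134 */
PROC PRINT DATA = ziptest;
RUN;
```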
Missing data
Sometimes despite your best efforts, your data may be incomplete. The value of a particular variable may be missing for some observations. In those cases, missing character data are represented by blanks, and missing numeric data are represented by a single period (.). In the preceding data set, the value of Weight for observation 5 is missing, and its place is marked by a period. The value of Name for observation 6 is missing and is just left blank.
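The two kinds of missing value can be tested for directly in a DATA step. The sketch below rereads only observations 5 and 6 so that it stands on its own. The names misscheck, NameMissing, and WeightMissing are hypothetical, chosen just for this example.

```sas
/* Flag missing values: a comparison yields 1 when true, 0 when false */
DATA misscheck;
   INPUT Id Name $ Height Weight;
   NameMissing   = (Name = ' ');   /* missing character data is a blank  */
   WeightMissing = (Weight = .);   /* missing numeric data is a period   */
   DATALINES;
57 Dennis 44 .
58 . 43 50
;
RUN;

PROC PRINT DATA = misscheck;
RUN;
```

For Id 57 the flags come out as NameMissing = 0 and WeightMissing = 1. For Id 58 they are reversed.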
Size of SAS data sets
Prior to SAS 9.1, SAS data sets could contain up to 32,767 variables. Beginning with SAS 9.1, the maximum number of variables in a SAS data set is limited by the resources available on your computer, but SAS data sets with more than 32,767 variables cannot be used with earlier versions of SAS. The number of observations, no matter which version of SAS you are using, is limited only by your computer's capacity to handle and store them.
Rules for SAS names
You make up names for the variables in your data and for the data sets themselves. It is helpful to make up names that identify what the data represent, especially for variables. While the variable names A, B, and C might seem like perfectly fine, easy-to-type names when you write your program, the names Sex, Height, and Weight will probably be more helpful when you go back to look at the program six months later. Follow these simple rules when making up names for variables and data set members:
• Names must be 32 characters or fewer in length.[3]
• Names must start with a letter or an underscore ( _ ).
• Names can contain only letters, numerals, or underscores ( _ ). No %$!*&#@, please.[4]
• Names can contain upper- and lowercase letters.
This last point is an important one. SAS is insensitive to case so you can use uppercase, lowercase or mixed case, whichever looks best to you. SAS doesn't care. The data set name heightweight is the same as HEIGHTWEIGHT or HeightWeight. Likewise, the variable name BirthDate is the same as BIRTHDATE and birThDaTe. However, there is one difference for variable names. SAS remembers the case of the first occurrence of each variable name and uses that case when printing results. That is why, in this book, we use mixed case for variable names but lowercase for other SAS names.
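A tiny program can make the case rule concrete. This is an illustrative sketch rather than anything from the book. It writes the same variable and data set names in different cases and relies on SAS treating them as identical.

```sas
/* All three spellings below refer to ONE variable.                */
/* The first spelling, BirthDate, is the one SAS uses in output.   */
DATA HeightWeight;
   BirthDate = 1;
   birthdate = BIRTHDATE + 1;
RUN;

/* HEIGHTWEIGHT is the same data set as HeightWeight */
PROC PRINT DATA = HEIGHTWEIGHT;
RUN;
```

The printed result has a single column headed BirthDate, holding the value 2.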
Documentation stored in SAS data sets
In addition to your actual data, SAS data sets contain information about the data set such as its name, the date that you created it, and the version of SAS you used to create it. SAS also stores information about each variable, including its name, type (numeric or character), length (or storage size), and position within the data set. This information is sometimes called the descriptor portion of the data set, and it makes SAS data sets self-documenting.
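One way to look at the descriptor portion is PROC CONTENTS, which reports on a data set without printing its data. The short sketch below builds a one-observation data set first so that it runs by itself. The name heightweight is again only an example.

```sas
/* Create a small data set to inspect */
DATA heightweight;
   INPUT Id Name $ Height Weight;
   DATALINES;
53 Susie 42 41
;
RUN;

/* Show the descriptor portion: data set name, creation date,    */
/* and each variable's name, type, length, and position          */
PROC CONTENTS DATA = heightweight;
RUN;
```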
[1] There are exceptions. If your data are in a format written by another software product, you may be able to read your data directly without creating a SAS data set. For database management systems and spreadsheets, you may be able to use SAS/ACCESS software. See chapter 2 for more information. For SPSS you can use the SPSS data engine. See appendix D.
[2] If disk space is a problem, you may also choose to base your decision on storage size. You can use the LENGTH statement, discussed in section 10.15, to control the storage size of variables.
[3] Beginning with SAS 9, format names can also be 32 characters long, and informat names can be 31 characters (including the $ for character values). Prior to SAS 9, format names could be 8 characters while informat names could be 7 characters (also including the $). Librefs and filerefs must be 8 characters or fewer in length, and member names for versioned data sets must be 28 characters or fewer.
[4] It is possible to use special characters, including spaces, in variable names if you use the system option VALIDVARNAMES=ANY and a name literal of the form 'variable-name'N. See the SAS Help and Documentation for details.