SAS中文论坛

标题: 请问数据集的转换问题 [打印本页]

作者: shiyiming    时间: 2004-2-25 09:49
标题: 请问数据集的转换问题
想把如下数据集A
a b c  r
1 5 4 22
1 5 4 4
1 5 4 9
2 7 3 11
2 7 3 5
2 7 3 7
转换成B
a b c r1 r2 r3
1 5 4 22 4  9
2 7 3 11 5  7

大家有什么简单一点的方法?
作者: shiyiming    时间: 2004-2-25 11:16
There are too many ways to do it.  One of the simplest is shown here.

data a;
input a b c r;
datalines;
1 5 4 22
1 5 4 4
1 5 4 9
2 7 3 11
2 7 3 5
2 7 3 7
;

data b(drop=r);
retain r1 r2 r3;
set a;
by a b c;
if first.c then do;
  r1=.; r2=.; r3=.;
end;
if r1=. then r1=r;
else if r2=. then r2=r;
else if r3=. then r3=r;
if last.c then output;
run;
作者: shiyiming    时间: 2004-2-25 13:21
谢谢指点,不过请问if first.c这句话什么意思?这是属于sas哪一块语句的内容?
作者: shiyiming    时间: 2004-2-25 17:36
标题: 找到资料了,自问自答 :)
How the DATA Step Identifies BY Groups
In the DATA step, SAS identifies the beginning and end of each BY group by creating two temporary variables for each BY variable: FIRST.variable and LAST.variable. These temporary variables are available for DATA step programming but are not added to the output data set. Their values indicate whether an observation is


the first one in a BY group

the last one in a BY group

neither the first nor the last one in a BY group

both first and last, as is the case when there is only one observation in a BY group.

You can take actions conditionally, based on whether you are processing the first or the last observation of a BY group.
When an observation is the first in a BY group, SAS sets the value of the FIRST.variable to 1. For all other observations in the BY group, the value of the FIRST.variable is 0. Likewise, if an observation is the last in a BY group, SAS sets the value of LAST.variable to 1. For all other observations in the BY group, the value of LAST.variable is 0. If the observations are sorted by more than one BY variable, the FIRST.variable for each variable in the BY statement is set to 1 at the first occurrence of a new value for the variable.

This example shows how SAS uses the FIRST.variable and LAST.variable to flag the beginning and end of four BY groups. Six temporary variables are created within the program data vector. These variables can be used during the DATA step, but they do not become variables in the new data set.

In the figure that follows, observations in the SAS data set are arranged in an order that can be used with this BY statement:

by State City ZipCode;
SAS creates the following temporary variables: FIRST.State, LAST.State, FIRST.City, LAST.City, FIRST.ZipCode, and LAST.ZipCode.


FIRST. and LAST. Values for Four BY Groups







Chapter Contents
Previous
Next
Top of Page


Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.
作者: shiyiming    时间: 2004-2-26 12:07
proc transpose data=tem out=tem1(drop=_name_) prefix=r;
by a b c;
run;
作者: shiyiming    时间: 2004-3-1 14:27
标题: 有没有更灵活的方法
提问中a b c值相同的记录均为三条,若a b c值相同的记录数任意呢,如
a b c r
1 2 3 1
1 2 3 2
1 2 3 3
......
1 2 3 n
将此集合转换为
a b c r1 r2 r3 ...... rn
1 2 3 1  2  3  ......  n
有没有通用的转换方法
作者: shiyiming    时间: 2004-3-2 01:00
Just follow my previous example, I did some expansion.  It may not be the best way, but it works.

data a;
input a b c r;
datalines;
1 5 4 22
1 5 4 4
1 5 4 9
1 5 4 22
1 5 4 4
1 5 4 9
2 7 3 11
2 7 3 5
2 7 3 7
2 7 3 11
;

proc sql;
create table a as
select *, count(*) as cnt
from a
group by a, b, c;
quit;

%macro expand;

proc sql noprint;
select max(cnt) into :maxcnt
from a;
quit;

data b(drop=r);
retain
%do i=1 %to &maxcnt;
   r&i
%end;;
set a;
by a b c;
if first.c then do;
%do i=1 %to &maxcnt;
   r&i=.;
%end;
end;
%do i=1 %to &maxcnt;
     if r&i=. then do;
        r&i=r;
        goto abc;
     end;
%end;
abc:
if last.c then output;  
run;

%mend;

%expand;
作者: shiyiming    时间: 2004-3-2 10:03
多谢指点,因为工作原因,其实我才刚开始学sas。我对宏不是很熟悉,有没有介绍宏比较详细的sas书籍
作者: shiyiming    时间: 2004-3-3 10:49
我不太明白为什么你的程序里maxcnt可以取出两个值,maxcnt不是一个变量吗?也许这个问题太菜鸟级了,见笑。如果解释起来不是很困难,请不吝赐教,谢谢
作者: shiyiming    时间: 2004-3-3 23:14
In the program, maxcnt has only one value which is 6.  It is the maximum count of the records for a distinct combination of (a,b,c).
作者: shiyiming    时间: 2004-3-4 10:17
标题: 我作了一点小修改
xic,我把你的程序稍微改动了一下,但是我不太明白为什么不能达到预期效果,请指教
data a;
input a b c r;
datalines;
1 5 4 22
1 5 4 4
1 5 4 9
1 5 4 22
1 5 4 4
1 5 4 9
2 7 3 11
2 7 3 5
2 7 3 7
2 7 3 11
;

proc sql;
create table a as
select *, count(*) as cnt
from a
group by a, b, c;
quit;

%macro expand;
%global count;

proc sql noprint;
select max(cnt) into :maxcnt
from a;
quit;

data b(drop=r);
retain
%do i=1 %to &maxcnt;
r&i
%end;;
set a;
by a b c;
if first.c then do;
%do i=1 %to &maxcnt;
r&i=.;
%end;
%let count=1;
r&count=r;
end;
else do;
%let count=count+1;
r&count=r;
end;
if last.c then output;
run;

%mend;

%expand;
作者: shiyiming    时间: 2004-3-4 22:11
I think you have mixed up the macro variable with a SAS variable in data step.  Is COUNT a macro variable or variable in data step, in either way, the line %let count=count+1 will not work.
作者: shiyiming    时间: 2004-3-5 20:00
[quote="willon":5bc30]proc transpose data=tem out=tem1(drop=_name_) prefix=r;
by a b c;
run;[/quote:5bc30]

I think Willon's solution is one of the best (simple and common), providing the data set is sorted by a b c, and the the number of r(r1,r2,...,rn) is smaller than 32764 (SAS on windows platform).




欢迎光临 SAS中文论坛 (http://mysas.net/forum/) Powered by Discuz! X3.2