0 2002-01-01 2002-01-01 0
1 2002-01-01 2002-01-03 10
1 2002-01-08 2002-01-10 9
2 2002-01-01 2002-01-01 9
2 2002-01-02 2002-01-02 8
2 2002-01-03 2002-01-03 14
2 2002-01-04 2002-01-04 8
2 2002-01-08 2002-01-09 11
2 2002-01-10 2002-01-10 12作者: shiyiming 时间: 2004-6-4 03:27 标题: Every SAS programmer should try this It is definitely a very good homework assignment for every SAS programmer to work out, and I strongly recommend to put this one on the top of this list. In our daily work, we have a lot of tasks similar to this one, so surely I have worked out my way to do it. I will hold my program for a few days and let people to work it out from scratch.作者: shiyiming 时间: 2004-6-4 10:01 标题: 回复 我有个思路,但是没有完成 <!-- s:( --><img src="{SMILIES_PATH}/icon_sad.gif" alt=":(" title="Sad" /><!-- s:( -->
如果在原数据集后增加一个index字段:
data tem;
input id:$1. date:yymmdd10. vol index;
format date yymmdd10.;
cards;
0 20020101 0 1
1 20020101 10 2
1 20020102 10 2
1 20020103 10 2
1 20020108 9 3
1 20020109 9 3
1 20020110 9 3
2 20020101 9 4
2 20020102 8 5
2 20020103 14 6
2 20020104 8 7
2 20020108 11 8
2 20020109 11 8
2 20020110 12 9
;run;
用下面这段程序就可以实现:
proc means data=tem;
class id index;
var date vol;
output out=tem_result mean(vol)=vol min(date)=sdate max(date)=edate;
run;
data tem_result;
set tem_result;
where id ne '' and index ne .;
drop index _type_ _freq_;
run;作者: shiyiming 时间: 2004-6-4 10:46 标题: 我的理解 这个问题的难点在于sas软件读数据时只能对当前记录的前面记录能够访问,而对后面的记录不能访问,所以我引入了两个变量fflag和bflag,意思是前、后记录的标记,具体说如当前记录和上一条记录相同(即id和vol相同,下同)fflag为1否则为0;同理当前记录和下一条记录相同,bflag为1否则为0。结果生成如下数据集:
id date vol fflag bflag
0 20020101 0 0 0
1 20020101 10 0 1
1 20020102 10 1 1
1 20020103 10 1 0
1 20020108 9 0 1
1 20020109 9 1 1
1 20020110 9 1 0
2 20020101 9 0 0
2 20020102 8 0 0
2 20020103 14 0 0
2 20020104 8 0 0
2 20020108 11 0 1
2 20020109 11 1 0
2 20020110 12 0 0
然后对该数据集进行筛选即可得结果。
源代码
[code:ea82f]
/*生成fflag字段*/
data tem1;
set tem;
if id=lag(id) and vol=lag(vol) then fflag=1;
else fflag=0;
proc sort data=tem1;
by descending id descending date;
/*生成bflag字段*/
data tem2;
set tem1;
bflag=lag(fflag);
if _n_=1 then bflag=0;
proc sort data=tem2;
by id date;
/*生成最后结果*/
data tem3;
set tem2;
retain sdate;
if fflag=0 then sdate=date;
if bflag=0 then
do;
edate=date;
new_vol=vol;
output;
end;
format sdate yymmdd10. edate yymmdd10.;
keep id sdate edate new_vol;
run;[/code:ea82f]作者: shiyiming 时间: 2004-6-4 10:58 标题: 试试这个 [code:9e624]DATA SSS(KEEP=ID DATE VOL) TTT(KEEP=END_DATE) TT(KEEP=DATE);
SET TEM END=LAST;
END_ID=LAG(ID);
END_DATE=LAG(DATE);
END_VOL=LAG(VOL);
IF ID=END_ID AND VOL=LAG(VOL) THEN DELETE;
ELSE OUTPUT SSS TTT;
IF LAST THEN OUTPUT TT;
DATA TTT;
SET TTT(FIRSTOBS=2) TT(RENAME=(DATE=END_DATE));
DATA RESULT;
MERGE SSS TTT;
RUN;
[/code:9e624]作者: shiyiming 时间: 2004-6-4 12:40 标题: try below code proc sort data=tem; by id vol date;run;
data tem(keep=id sdate edate vol);
retain id '' sdate edate vol 0;
set tem; by id vol date;
if first.vol then sdate=date;
if last.vol then do;
edate=date;output;
end;
format sdate edate yymmdd10.;
run;
proc sort data=tem ; by id sdate;run;作者: shiyiming 时间: 2004-6-4 14:51 标题: 回复 jimmy的代码有问题
因为id 和vol值相同的记录可能不是相邻的,但是你这样做也把他们合成一条记录了。作者: shiyiming 时间: 2004-6-4 17:22 标题: hi proc sort data=tem;
by id vol;
run;
proc sql;
create table aa as select id, vol as new_vol, min(date) as sdate, max(date) as edate from tem
group by id, vol;
我不知道怎么把min(date)代表的天数转换为日期,只好用个笨办法
data final;
set aa;
format sdate:yymmdd10. edate:yymmdd10.;
run;作者: shiyiming 时间: 2004-6-4 17:55 标题: ONE MORE 刚才那个用了LAG,在给个用DIF的
[code:7a981]
DATA SSS(KEEP=ID DATE VOL) TTT(KEEP=END_DATE) TT(KEEP=DATE);
SET TEM END=LAST;
RL=DIF(ID)+DIF(VOL);
END_DATE=LAG(DATE);
IF RL=0 THEN DELETE;
ELSE OUTPUT SSS TTT;
IF LAST THEN OUTPUT TT;
DATA TTT;
SET TTT(FIRSTOBS=2) TT(RENAME=(DATE=END_DATE));
DATA RESULT;
MERGE SSS TTT;
RUN;
[/code:7a981]作者: shiyiming 时间: 2004-6-4 19:35 标题: Reply 很多帖子!每个帖子都有一种思路可以让我借鉴。
希望各位能更多的考虑运行效率,以及避免拼数据。
以下是我的程序,如果有问题或建议,请及时回复。
[code:09378]
proc sort data=tem;by id date;run;
data result;
retain id sdate edate new_vol;
format sdate yymmdd10. edate yymmdd10.;
set tem;
by id;
lag_vol=lag(vol);
lag_date=lag(date);
if first.id then call symput('sdate',date);
else do;
if lag_vol^=vol then do;
sdate=resolve('&sdate');
edate=lag_date;
new_vol=lag_vol;
output;
call symput('sdate',date);
end;
end;
if last.id then do;
sdate=resolve('&sdate');
edate=date;
new_vol=vol;
output;
end;
keep id sdate edate new_vol;
run;[/code:09378]作者: shiyiming 时间: 2004-6-4 22:43 标题: 回复 proc sort;by id vol date;run;
data t;set tem;
by id vol;
if first.vol then do;
sdate=date;output;
end;
if last.vol & not first.vol then do;
edate=date;output;
end;
run;
data t1;set t;if sdate>0;
keep id vol sdate ;
run;
data t2;set t2;if edate>0;
keep id vol edate;
run;
data m;merge t1 t2;by id vol;
if edate=. then edate=sdate;
format sdate edate yymmdd10.;
run;作者: shiyiming 时间: 2004-6-5 18:14 标题: My Solution Wow, it is good to see many SAS programmers are trying this one out, I hope you have enjoyed it.
There are definitely many ways to do it, the key is to have a clear logic to handle it. I wrote the following program to do this, and any comment is welcome.
I will try to read some of your programs and give you some of my inputs.
data result(keep=id1 from to vol1);
retain id1 from to vol1;
format from to yymmdd10.;
set tem;
by id date;
if first.id then do;
id1=id; from=date; to=date; vol1=vol;
end;
else do;
if (vol ne vol1) or (date ne to+1) then do;
output;
from=date; to=date; vol1=vol;
end;
else do;
to=date;
end;
end;
if last.id then do;
to=date;
output;
end;
run;作者: shiyiming 时间: 2004-6-5 19:22 标题: To Willon I have gone through Willon's program, it is pretty good and the logic is clear. But, there is a minor problem.
If you delete the throd record, '1 20020102 10', the program will no longer work. The reason is that you did not consider the time gap when a subject stay on same vol.
Besides, the call of the macro variable SDATE may not be necessary, since you have already retained it and its value changes over iterations.
Just my 2 cents.作者: shiyiming 时间: 2004-6-6 00:17 标题: .... Give sincere thanks to XIC for pointing out my mistakes caused by my carelessness.
......
if first.id then call symput('sdate',date);
else do;
[color=red:6567f]if lag_vol^=vol then do;[/color:6567f]
......
the RED line must be replaced with
[color=red:6567f]if lag_vol^=vol or date^=lag_date+1 then do;[/color:6567f]
The aim of "RETURN" is only to change the order of variables in output table,Originally,I thinked less of it.So thank you again for your suggestion.作者: shiyiming 时间: 2004-6-6 01:29 标题: To QSMY I have read the program of QSMY, but not into the detail.
First, the program will not run; I assume that the line
data t2; set t2;
should be data t2; set t;
Second, the logic of this program has a problem. If a subject has same volumn on different dates and the dates are not in a continuous episode, just as it happened in the data, the program will fail.
Just re-run the program and compare your output with the answer given by Willon, you can see that the output is not correct.作者: shiyiming 时间: 2004-6-6 02:06 标题: To GENECHINA I have read both of your program, and they are impressive because of the simplicity. But, I think you did not consider a key condistion. As in previous postings, if you delete the third data line, '1 20020102 10', you will see that the program will not generate the correct answer.
There is another comment which is also very important to all SAS programmers. As we know, the order of the records and the order of variables in a database or a data set does not make too much sense. Changing the order does not change the nature of the data. If a program depends upon the order, you had better to add BY statement for each data step. LAG and DIF functions are order dependent, you have to keep them in mind. Besides, when you do MERGE, you had better have a BY statement. MERGE without BY is very dangerous except you know exactly what are you doing.
As matter of fact, in my opinion, it is something SAS need to be modified in future. As we know, when you use FIRST.X or LAST.X, SAS will require you to have a BY statement. Same requirement should apply to LAG and DIF.作者: shiyiming 时间: 2004-6-6 02:40 标题: To All the Others I will not write individual comments to each of you, just a summary.
To GBT, please take a reference to the first paragraph of my comment to Genechina. If you delete the third data line, '1 20020102 10', you will see that the program will not generate the correct answer.
To Tomwalk, Yooyok and Jimmy, please take a reference to my comment to Gsmy. Your logic does not work for following data provided by Willon:
2 20020102 8
2 20020103 14
2 20020104 8
The episode for VOL=8 is not continuous here.
For yooyok, in PROC SQL, you can say
proc sql;
create table a as
select min(date) as sdate format=mmddyy10.
from tem;
quit;作者: shiyiming 时间: 2004-6-6 08:42 标题: to xic 谢谢 xic。其实昨天在sasor就知道自己做错了。本来要删除我的帖子的,想想算了就留在上面吧。作者: shiyiming 时间: 2004-6-7 08:56 标题: 更正 多谢xic的指点,我将源程序中
if id=lag(id) and vol=lag(vol) then fflag=1;
改为
if id=lag(id) and vol=lag(vol)and date=lag(date)+1 then fflag=1;
应该没问题作者: shiyiming 时间: 2004-6-7 10:43 标题: to xic 我提交的code中,有个错误。
date t2;set t2;....
个人意见,仅供参考。作者: shiyiming 时间: 2004-6-7 13:59 标题: 我也贴一个,适应性和效率有待考证! data tem;
retain ord 0;
set tem;
x=lag(vol);
if vol=x then ord=ord;else ord=ord+1;
run;
proc sort;by id ord date;run;
data first last;
set tem;
by id ord;
if first.ord then output first;
if last.ord then output last;
run;
data final;
merge first (rename=(date=date1))
last (rename=(date=date2));
by id ord;
run;
proc print;
var id date1 date2 vol;
run;作者: shiyiming 时间: 2004-7-28 02:19 标题: simplest solution I think this is the simplest solution:
data res(drop=date);
set tem;
by id vol notsorted;
retain bdate edate;
format bdate yymmdd10. edate yymmdd10.;
if first.vol then bdate=date;
if last.vol then do;
edate=date;
output;
end;
run;作者: shiyiming 时间: 2004-7-28 12:45 标题: 没有用到转置,只用到data step 和proc sort,数据结果在result? data tem;
input id:$1. date:yymmdd10. vol;
format date yymmdd10.;
cards;
0 20020101 0
1 20020101 10
1 20020102 10
1 20020103 10
1 20020108 9
1 20020109 9
1 20020110 9
2 20020101 9
2 20020102 8
2 20020103 14
2 20020104 8
2 20020108 11
2 20020109 11
2 20020110 12
;run;
data tem_1;
retain s 0;
set tem;
if _n_>1 & (id ^=lag(id) | vol ^=lag(vol)) then s+1;
run;
proc sort data=tem_1;
by s;
run;
data tem_2;
set tem_1;
by s;
if first.s then output;
if last.s then output;
run;
proc sort data=tem_2;
by s;
run;
data result;
format sdate edate yymmdd10.;
set tem_2;
by s;
sdate=lag(date);
edate=date;
if last.s=1;
run;作者: shiyiming 时间: 2004-8-24 13:05 标题: re:程序员测试 proc sort data=tem;
by id vol date;
run;
data rst(drop=date);
set tem;
by id vol;
format sdate yymmdd10.;
format edate yymmdd10.;
if first.vol then sdate=date;
if last.vol then do;
edate=date;
output;
end;
retain;
run;作者: shiyiming 时间: 2004-9-16 12:51 标题: My Solution: data tem;
input id:$1. date:yymmdd10. vol;
format date yymmdd10.;
cards;
0 20020101 0
1 20020101 10
1 20020102 10
1 20020103 10
1 20020108 9
1 20020109 9
1 20020110 9
2 20020101 9
2 20020102 8
2 20020103 14
2 20020104 8
2 20020108 11
2 20020109 11
2 20020110 12
;run;
proc sql;
select distinct id, min(date) as SDate format=yymmdd10., max(date) as EDate format=yymmdd10., Vol as New_Vol from tem group by id, vol;
quit;作者: shiyiming 时间: 2004-12-1 15:24 标题: sql data Sasuser.test;
input id:$1. date:yymmdd10. vol;
format date yymmdd10.;
cards;
0 20020101 0
1 20020101 10
1 20020102 10
1 20020103 10
1 20020108 9
1 20020109 9
1 20020110 9
2 20020101 9
2 20020102 8
2 20020103 14
2 20020104 8
2 20020108 11
2 20020109 11
2 20020110 12
;run;
proc sql;
create table sasuser.result as
select id,min(date) as starttime,max(date) as endtime,vol from Sasuser.test group by id,vol;
run;
proc print data=sasuser.result;
format starttime YYMMDD10.;
format endtime YYMMDD10.;
run;作者: shiyiming 时间: 2005-3-20 09:23 标题: 似乎这样好一些 基本同意topgun_li的作法,但从题目字面看,还是应排序
proc sort data=tem;
by id vol;
run;
data result;
set tem;
by id vol;
if first.vol then sdate=date;
if last.vol then do;
edate=date;
new_vol=vol;
output;
end;
keep id sdate edate new_vol;
format sdate yymmdd10. edate yymmdd10.;
retain sdate;
run;作者: shiyiming 时间: 2005-4-14 20:33 标题: Thanks a lot nice guy!
cool!作者: shiyiming 时间: 2005-6-18 13:11 标题: 今天再看这个远古的贴子, 突然觉得挺有意思! 好像大家都没考虑一 如果原数据中的日期可以重复, 也就是说同一天有2个VOL值(也许不可能发生,但允许数据录入错误), 很多程序就可能有问题! 我试改了一个数, 测试一下看看! 欢迎讨论!!
proc transpose data=tem out=tem(drop=_name_);
by vol id;
run;
data tem(keep=vol from id to);
set tem;
array arr _numeric_;
from=arr(2); /* Here consider variable "vol" is numeric.*/
do i=1 to dim(arr);
if arr(i)^=0 then do;
to=arr(i);
leave;
end;
format to:ddmmyy10.
format from:ddmmyy10.
output;
end;
run;
proc print data=tem; run;作者: shiyiming 时间: 2009-7-26 16:43 标题: Re: SAS程序员测试(二) 这是我的程序。 <!-- s:) --><img src="{SMILIES_PATH}/icon_smile.gif" alt=":)" title="Smile" /><!-- s:) -->
[code:1u2qw26w]proc sort data=tem;
by id vol;
run;
proc means data=tem;
by id vol;
var date;
output out=tem_result min(date)=sdate max(date)=edate;
run;[/code:1u2qw26w]作者: shiyiming 时间: 2009-7-27 11:12 标题: Re: SAS程序员测试(二) data aa(drop=garote);
set tem;
retain flag 1;
garote=lag(vol);
if garote^=vol then flag+1;
run;
proc sql;
select distinct vol,id, min(date) as startdate, max(date) as enddate
from aa
group by id,flag
order by id,flag;
quit;作者: shiyiming 时间: 2009-7-27 22:28 标题: Re: SAS程序员测试(二) Its my code below,You can do this by using the OUTPUT statement in PROC MEANS to create the data set incloud the summarized variables like min & max of date(tem2).
Then,just select the variables and observations which you need(tem3).
[code:1o9d899k]
data tem;
input id:$1. date:yymmdd10. vol;
format date yymmdd10.;
cards;
0 20020101 0
1 20020101 10
1 20020102 10
1 20020103 10
1 20020108 9
1 20020109 9
1 20020110 9
2 20020101 9
2 20020102 8
2 20020103 14
2 20020104 8
2 20020108 11
2 20020109 11
2 20020110 12
;
proc means data=tem min max noprint;
class id vol;
var date;
output out=tem2
min=sdate
max=edata;
run;
data tem3(keep=id sdate edata vol);set tem2;
where id gt '' and vol ge 0;
run;
[/code:1o9d899k]作者: shiyiming 时间: 2009-7-31 04:42 标题: Re: SAS程序员测试(二) [code:3f4z5q7i]
proc sort data=tem;
by id date vol;
run;
data new;
retain id s_date e_date vol;
set tem;
by id vol notsorted;
if first.vol then s_date=date;
if last.vol then do;
e_date=date;
output; end;
format s_date e_date yymmdd10.;
drop date;
run; [/code:3f4z5q7i]