标题: 找data集的子集 [打印本页] 作者: shiyiming 时间: 2010-9-16 19:22 标题: 找data集的子集 请教
Data A: code date a sdate flag
1 20090101 20090306 0
…
1 20090306 20090306 1
…
1 20091231 20090306 0
2 20090101
…
2 20091231
…
…
100 20091231
Data A 5列, a的数值省略,code从1-100, 每个code都有2009年全年的数据, 当flag=1是, 即date=sdate, 要找它前面的100个和后40个一共140个每行的数据。可能code1的sdate是2009年其中的几天, 对于这几天都找一遍这行数据的前100和后40条数据。 可能还有其他code 的sdate出现在2009年中某一天, 则相应的flag=1, 其他为0.作者: shiyiming 时间: 2010-9-17 00:13 标题: Re: 找data集的子集 totally losted by your expression, or sorry for my misunderstanding your request if I did.
anyway, could you give us an example on what you already have and what you really wanted and the rule to go through,
and explain each new term(e.g. code1) as well.
thanks.作者: shiyiming 时间: 2010-9-17 14:29 标题: Re: 找data集的子集 sorry, 可能表达的不清楚。 就是说code 是代码(1-100个), date 日期(2009年全年),sdate是我找到的符合要求的日期(2009年对不同的代码可能出现一个, 也可能几个, 例如code=1, sdate=20090306,20090508; code=2时, sdate=20090708, code=3,...),现在我把sdate都列举出来, by code, 当code=1,sdate=20090306时,我把这个数据和code=1 全年的数据加总在一个数据集里,当date=sdate=20090306时, flag=1, 其他flag=0, 如果对于code=1, sdate=20090508时, 我就继续上面的步骤.
现在的问题是我要找当flag=1是,在这行数据的前 100行和后40行数据,其他删除。
例如 code date sdate a flag
1 20090101 20090306 0.0234 0
。。。。
1 20090306 20090306 0.0216 1
.。。。
1 20091231 20090306 0.0873 0
1 20090101 20090508 0.0432 0
.。。。
1 20090508 20090508 0.478 1
.。。
1 20091231 20090508 0.093 0作者: shiyiming 时间: 2010-9-17 23:41 标题: Re: 找data集的子集 [code:wt3ihvch]data have;
keep code date sdate a flag;
array _flag{100} _temporary_; ***set a safely large dimension for the array,if you have >100 flag =1 for some code then make the number larger;
do until(last. code);
set had; by code notsorted;*** you are not care if code in order but same code should be together instead;
obs1 = sum(obs1, 1);***record the observation order for each code;
if first. code then call missing(of _flag[*]); ***reset the array for each code;
if flag = 1 then do;
d = sum(d, 1); _flag[d] = obs1; ***set observation order to array where flag = 1;
end;
end;
do until(last. code);
set had; by code notsorted;
obs2 = sum(obs2, 1); ***recorder the order again;
do i = 1 to dim(_flag);
if not _flag[i] then leave; *if no flag = 1 then go to next observation, in fact, here more efficient if it can go to next code but i dont know how to do it;
if _flag[i] -100 <= obs2 <= _flag[i] +40 then do;
output; leave; *** output the observation where meets the selection rules and then go to next observation;
end;
end;
end;
run;[/code:wt3ihvch]
我每次回帖都战战兢兢。我知道在某些个角落,类似于oloooooo, sxlionnnnnn,或者猪头头头头头头头头,之辈的,一见到我的帖子,就不自觉地提起了手里的棍子斧子一类的。。。
不过,几十年前,我的学长们就斜着眼睛对我说过,你不就是个小辈吗。。。作者: shiyiming 时间: 2010-9-18 00:49 标题: Re: 找data集的子集 学习了......作者: shiyiming 时间: 2010-9-18 02:13 标题: Re: 找data集的子集 I select 5 records before flag=1 and 2 records after flag=1, and hope this demonstration works for your real data.
proc sql;
create table ss1 as
select *,sum(date1) as d1,sum(date2) as d2
from (select *,ifn(flag=1,date-5,.) as date1,ifn(flag=1,date+2,.) as date2
from (select *,(date=sdate) as flag from ss))
group by sdate
order by sdate,date;
quit;
data ss2; set ss1;by sdate date;retain sf1;
if date=d1 then ff=1;
else if date=d2 then do;sf=1;sf1=1;end;
if sf=1 then sfg=0;
sfg+sf1;
if sfg=2 then ff=1;
ffg1+ff;
format date sdate yymmddn8.;
run;
proc sql;
create table ss3(keep=code date sdate a flag)as
select *,sum(sf=1) as sfg2
from ss2
group by ffg1
having sfg2=1
order by sdate,date;
quit;[/code:1p9fi9b7]作者: shiyiming 时间: 2010-9-18 02:35 标题: Re: 找data集的子集 add 'code' values.
[code:3mv1b1wv]data ss;
input code date yymmdd8. sdate yymmdd10. a;
cards;
1 20090101 20090306 0.0234
1 20090227 20090306 0.0216
1 20090228 20090306 0.0216
1 20090301 20090306 0.0216
1 20090302 20090306 0.0216
1 20090303 20090306 0.0216
1 20090304 20090306 0.0216
1 20090305 20090306 0.0216
1 20090306 20090306 0.0216
1 20090307 20090306 0.0216
1 20090308 20090306 0.0216
1 20091231 20090306 0.0873
1 20090501 20090508 0.478
1 20090502 20090508 0.478
1 20090503 20090508 0.478
1 20090504 20090508 0.478
1 20090505 20090508 0.0402
1 20090506 20090508 0.0437
1 20090507 20090508 0.0431
1 20090508 20090508 0.478
1 20090509 20090508 0.478
1 20090510 20090508 0.478
1 20090511 20090508 0.478
1 20091231 20090508 0.093
2 20090101 20090306 0.0234
2 20090227 20090306 0.0216
2 20090228 20090306 0.0216
2 20090301 20090306 0.0216
2 20090302 20090306 0.0216
2 20090303 20090306 0.0216
2 20090304 20090306 0.0216
2 20090305 20090306 0.0216
2 20090306 20090306 0.0216
2 20090307 20090306 0.0216
2 20090308 20090306 0.0216
2 20091231 20090306 0.0873
2 20090501 20090508 0.478
2 20090502 20090508 0.478
2 20090503 20090508 0.478
2 20090504 20090508 0.478
2 20090505 20090508 0.0402
2 20090506 20090508 0.0437
2 20090507 20090508 0.0431
2 20090508 20090508 0.478
2 20090509 20090508 0.478
2 20090510 20090508 0.478
2 20090511 20090508 0.478
2 20091231 20090508 0.093
;
run;
proc sql;
create table ss1 as
select *,sum(date1) as d1,sum(date2) as d2
from (select *,ifn(flag=1,date-5,.) as date1,ifn(flag=1,date+2,.) as date2
from (select *,(date=sdate) as flag from ss))
group by code,sdate
order by code,sdate,date;
quit;
data ss2; set ss1;by code sdate date;retain sf1;
if date=d1 then ff=1;
else if date=d2 then do;sf=1;sf1=1;end;
if sf=1 then sfg=0;
sfg+sf1;
if sfg=2 then ff=1;
ffg1+ff;
format date sdate yymmddn8.;
run;
proc sql;
create table ss3(keep=code date sdate a flag)as
select *,sum(sf=1) as sfg2
from ss2
group by ffg1
having sfg2=1
order by code,sdate,date;
quit;[/code:3mv1b1wv]作者: shiyiming 时间: 2010-9-18 22:35 标题: Re: 找data集的子集 谢谢 jingju11和 gogotiger, 都可以运行,结果也是我想要的。就是gogotiger的程序还得和我的数据相结合, 仔细研究下。 如果两位有时间的话, 能否给个思路, 或者注释 一下, 非常感谢。