I know the direct approach below is fast: read the house data once and write all 40,000 small files I need in the same pass. But typing 40,000 such lines, one per dataset, is hopelessly inefficient from a programming standpoint.
data house_1 house_2 house_3 house_4 house_5 ...... (omitted here) ;
set house ;
if house_ID = 1 then output house_1 ;
if house_ID = 2 then output house_2 ;
if house_ID = 3 then output house_3 ;
if house_ID = 4 then output house_4 ;
if house_ID = 5 then output house_5 ;
if house_ID = 6 then output house_6 ;
...... (omitted here)
run;
I hope you gurus can suggest an approach that runs reasonably fast on the computer without being so verbose to program. Many thanks, everyone.
Author: shiyiming  Time: 2008-11-23 14:50  Subject: Re: A tough problem for me. Please lend a hand
[code]%macro cut(num);
%local i;
data
%do i=1 %to &num;
house_&i
%end;;
set a;
%do i=1 %to &num;
if house_id=&i then output house_&i;
%end;
run;
%mend;
%cut(×××××);[/code]
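For concreteness, a small call such as %cut(3) would expand to something like the following sketch (with `a` standing for the big input dataset, as in the macro above):

[code]data house_1 house_2 house_3;
set a;
if house_id=1 then output house_1;
if house_id=2 then output house_2;
if house_id=3 then output house_3;
run;[/code]

So the split still happens in a single pass over the data; the 40,000 lines are simply written by the macro processor instead of by hand.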
Test this program and see how it does; I don't have data this large on hand, so I cannot test it myself.
The earlier approach should be slow because it repeatedly opens such a large file, which is bound to take a long time.
Author: shiyiming  Time: 2008-11-23 15:41  Subject: Re: A tough problem for me. Please lend a hand
To tianwild
Author: shiyiming  Time: 2008-11-23 22:35  Subject: Re: A tough problem for me. Please lend a hand
%macro cut(num);
%local i;
%do i=1 %to &num;
data house_&i;
set a (where=(house_id=&i));
run;
%end;
%mend;
%cut(×××××);
Try this one and see.
Author: shiyiming  Time: 2008-11-23 22:38  Subject: Re: A tough problem for me. Please lend a hand
I ran into a fairly similar problem: extracting a subset from a file of nearly 2 GB. A data step could not extract it; I finally got it out with SQL, but SQL's efficiency has always been low.
[code]%macro cut(num);
%local i;
proc sql noprint;
%do i=1 %to &num;
create table house_&i as
select house_id,x,y,z
from a
where house_id=&i;
%end;
quit;
%mend;
%cut(XXXX);
[/code]
Test it and see how the efficiency compares.
Author: shiyiming  Time: 2008-11-24 14:31  Subject: Re: A tough problem for me. Please lend a hand
I would try to avoid
#1, opening 40,000 datasets at the same time;
#2, 40,000 passes of all rows of the original dataset.
Post #8 is ok if there is an index on house_id.
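If the input is not yet indexed, a simple index on house_id can be built first. This is a sketch, assuming the big dataset is named `a` (in WORK) as in the earlier posts:

[code]proc datasets library=work nolist;
  modify a;
  index create house_id;  /* simple index on the split key */
quit;[/code]

With the index in place, each where=(house_id=&i) subset can be retrieved without scanning the whole table.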
If you could sort the whole dataset by house_id in a reasonable amount of time, then
[code]
options nonotes nosource;
/* One pass over the sorted data: count the rows in each BY group,
   then generate a small data step that reads just that group
   with firstobs=/obs=. */
data _null_;
do count=1 by 1 until(last.house_id);
set houses_sorted(keep=house_id);
by house_id;
end;
lastobs+count; /* running total = observation number of the group's last row */
call execute('data house_' || compress(house_id) || ';');
call execute('set houses_sorted(firstobs=' || compress(lastobs-count+1) || ' obs=' || compress(lastobs) || ');');
call execute('run;');
run;
[/code]
Otherwise, you may break the big dataset into pieces (for example, cut it into 10 pieces by the remainder of house_id/10), and sort each piece before applying the code above to each sorted piece of data.
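The piece-by-remainder idea can be sketched as follows (my own sketch: `houses` stands for the big dataset, and piece_0 through piece_9 are made-up names):

[code]%macro pieces(n);
%local r;
/* one pass: route each row to its piece by mod(house_id, n) */
data %do r=0 %to %eval(&n-1); piece_&r %end;;
set houses;
%do r=0 %to %eval(&n-1);
if mod(house_id, &n) = &r then output piece_&r;
%end;
run;
/* sort each piece so the firstobs=/obs= step above can be applied to it */
%do r=0 %to %eval(&n-1);
proc sort data=piece_&r; by house_id; run;
%end;
%mend;
%pieces(10)[/code]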
Anyway, generating 40,000 datasets is a very bold move, and it seems like too many for my little brain.
Author: shiyiming  Time: 2008-11-24 14:39  Subject: Re: A tough problem for me. Please lend a hand
To applesloves