#9 | OP | Posted 2008-11-24 14:31:21
Re: A tough problem for me. Hoping someone can lend a hand
I would try to avoid:
#1, opening 40,000 datasets at the same time;
#2, making 40,000 passes over all rows of the original dataset.
The approach in post #8 is fine if there is an index on house_id.
If you can sort the whole dataset by house_id in a reasonable amount of time, then:
[code:9hwbi856]
options nonotes nosource;
data _null_;
   /* count the rows in the current house_id BY group */
   do count=1 by 1 until(last.house_id);
      set houses_sorted(keep=house_id);
      by house_id;
   end;
   lastobs + count;   /* running observation number of the group's last row */
   /* generate one DATA step per house_id, reading only that group's rows */
   call execute('data house_' || strip(put(house_id, best.)) || ';');
   call execute('set houses_sorted(firstobs=' || strip(put(lastobs-count+1, best.)) ||
                ' obs=' || strip(put(lastobs, best.)) || ');');
   call execute('run;');
run;
[/code:9hwbi856]
Otherwise, you may break the big dataset into pieces (for example, split it into 10 pieces by the remainder of house_id/10), sort each piece, and then apply the code above to each sorted piece.
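The splitting step could be sketched like this (a rough sketch only; the dataset name HOUSES and the PIECE0-PIECE9 names are illustrative, and house_id is assumed to be a nonnegative integer):
[code:9hwbi856]
/* split the big dataset into 10 pieces by mod(house_id, 10) */
data piece0 piece1 piece2 piece3 piece4
     piece5 piece6 piece7 piece8 piece9;
   set houses;
   select (mod(house_id, 10));
      when (0) output piece0;
      when (1) output piece1;
      when (2) output piece2;
      when (3) output piece3;
      when (4) output piece4;
      when (5) output piece5;
      when (6) output piece6;
      when (7) output piece7;
      when (8) output piece8;
      when (9) output piece9;
      otherwise;   /* guard: ignore unexpected values */
   end;
run;

/* then sort one piece at a time, e.g.: */
proc sort data=piece0;
   by house_id;
run;
[/code:9hwbi856]
Each sorted piece can then be fed through the CALL EXECUTE generator step above in turn, so you never sort or hold the whole 40,000-group dataset at once.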
Anyway, generating 40,000 datasets is a very bold move, and it seems like too many for my little brain.