proc sql;
create table ids as
select distinct id,count(*) as freq
from (select * from d1 union all select * from d2 union all select * from d3)
group by id
having freq>1;
quit;[/code:2kwzil5v]
这个对于小数据还可以,但是现在有50个data,而且每个data都很大(平均obs>=10M),所以,请指教一下有何好办法?谢谢!
oloolo,能给个code吗?谢谢!作者: shiyiming 时间: 2011-10-19 06:52 标题: Re: 如何统计出重复出现的ID及其次数? for 50 data sets as large as 10M each, I think a doable solution rather than the speed-only may be more realistic. I think of a solution for this but I cannot verify that since i don't have either SAS or such tremendous data.
it is assumed that your data is sortable. usually, a data set with a single variable and 10 million records, can be sorted in a couple of minutes by the lowest efficiency.
if you want to try the following code, please use back-up data sets.
jingju
[code:b5z3fqru]proc sort data =a1(keep =id) out =aa; by id;
run;
data aa;
length name $ 124;
retain name "a1" n 1;
set aa;
run;
%macro countm;
%do i =2 %to 50;
proc sort data =a&i(keep =id); by id;
run;
data aa;
update aa a&i(in =ina);
by id;
if ina =1 then do;
n =sum(n, 1);
name =catx(',', name, "a&i");
end;
run;
%end;
%mend;