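It would be clearer if you posted some example data.
Author: shiyiming Time: 2010-8-23 15:53 Subject: Re: A SAS question about finding the maximum [Help, urgent!!!]
I'm not sure whether this actually puts tied values in random order; please test it and see.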
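The moderator's code that follows breaks ties in two steps. It rotates the ten industry columns by a random offset `flag`, then runs a descending bubble sort that swaps only on `gt`, so tied columns keep their rotated order. You can test this without SAS using the Python sketch below. Python is used only for illustration, and the function names are hypothetical. The sketch assumes `ceil(ranuni(0)*10)` makes each `flag` from 1 to 10 equally likely, and it enumerates all ten values instead of sampling.

```python
from collections import Counter

N = 10  # number of industry columns, matching the {10} arrays in the SAS step


def rotate(values, flag):
    """Move column i (1-based) to slot j, mirroring the SAS line
    j = i + ifn(i ge flag, -flag, 10-flag) + 1."""
    out = [None] * N
    for i in range(1, N + 1):
        j = i + (-flag if i >= flag else N - flag) + 1
        out[j - 1] = values[i - 1]
    return out


def ranked(counts, flag):
    """Rotate (column, count) pairs, then sort by count, descending.
    sorted() is stable, like a bubble sort that swaps only on 'gt',
    so tied columns keep their rotated order."""
    pairs = rotate(list(enumerate(counts, start=1)), flag)
    return sorted(pairs, key=lambda p: -p[1])


def first_place_frequency(counts):
    """Tally which column ends up first over all ten values of flag."""
    return Counter(ranked(counts, flag)[0][0] for flag in range(1, N + 1))


# Columns 1 and 2 tie for the maximum: column 1 is first for 9 of the 10 flags.
print(first_place_frequency([5, 5, 0, 0, 0, 0, 0, 0, 0, 0]))
# Columns 1 and 6 tie: an even 5/5 split.
print(first_place_frequency([5, 0, 0, 0, 0, 5, 0, 0, 0, 0]))
```

Under that assumption, ties are not broken uniformly. Two tied adjacent columns come out 9 to 1 in favor of the left one, and only columns five positions apart get an even split. A full random permutation of the columns, or a random sort key per row, would remove this bias.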
[code:2x1rsbjo]data raw;
length COMPANY $1 YEAR $4 INDUSTRY $40;
input;
COMPANY=scan(_infile_,1,' ');
YEAR=scan(_infile_,2,' ');
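/* INDUSTRY starts in column 8 and ends just before the space that precedes the count (assumes industry names contain no digits) */
INDUSTRY=substr(_infile_,8,anydigit(_infile_,8)-9);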
COUNT=input(scan(_infile_,-1,' '),best.);
datalines;
A 1999 Communications and Media 1
A 1999 Computer Software and Services 1
A 1999 Internet Specific 5
A 2000 Communications and Media 1
A 2000 Computer Software and Services 7
A 2000 Internet Specific 1
A 2001 Communications and Media 1
A 2001 Computer Software and Services 1
A 2001 Internet Specific 1
B 1999 Biotechnology 8
B 1999 Communications and Media 7
B 1999 Computer Hardware 2
B 1999 Computer Software and Services 23
B 1999 Consumer Related 23
B 1999 Industrial/Energy 12
B 1999 Internet Specific 21
B 1999 Medical/Health 7
B 1999 Other Products 34
B 1999 Semiconductors/Other Elect. 5
B 2000 Biotechnology 13
B 2000 Communications and Media 14
B 2000 Computer Hardware 5
B 2000 Computer Software and Services 34
B 2000 Consumer Related 12
B 2000 Industrial/Energy 19
B 2000 Internet Specific 56
B 2000 Medical/Health 11
B 2000 Other Products 25
B 2000 Semiconductors/Other Elect. 3
;
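/* one row per company-year, one numeric column per industry; industry values become variable names */
proc transpose data=raw out=temp(drop=_name_);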
var count;
by company year;
id industry;
run;
data temp(drop=i);
do _n_=1 by 1 until(last.company);
set temp;
by company;
array industry_count_ {*} _numeric_;
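array temp {10} _temporary_; /* running totals; the size must equal the number of industry columns */
do i=1 to dim(industry_count_);
/* _temporary_ elements are retained across companies, so restart the totals on each company's first row */
if first.company then temp(i)=.;
/* this year's count (missing treated as 0) plus the total through the previous year */
industry_count_(i)=sum(ifn(missing(industry_count_(i)),0,industry_count_(i)),temp(i));
temp(i)=industry_count_(i);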
end;
output;
end;
run;
data temp(keep=company year industry_count_1 industry_count_2 industry_name_1 industry_name_2);
set temp;
array temp {*} _numeric_;
array industry_count_ {10};
array industry_name_ {10} $32;
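/* random offset 1-10: the columns are rotated by it before sorting, so tied counts are not always resolved in the same column order (a rotation, not a full shuffle) */
flag=ceil(ranuni(0)*10);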
do i=1 to dim(industry_count_);
j=i+ifn(i ge flag,-flag,10-flag)+1;
industry_count_(j)=temp(i);
call vname(temp(i),industry_name_(j)); /* store the industry (variable) name next to its count */
end;
/* bubble sort, descending; it swaps only on gt, so tied counts keep their rotated order */
do i=dim(industry_count_)-1 to 1 by -1;
do j=1 to i;
if industry_count_(j+1) gt industry_count_(j) then do;
temp_num=industry_count_(j);
industry_count_(j)=industry_count_(j+1);
industry_count_(j+1)=temp_num;
temp_chr=industry_name_(j);
industry_name_(j)=industry_name_(j+1);
industry_name_(j+1)=temp_chr;
end;
end;
end;
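run;[/code:2x1rsbjo]
Author: shiyiming Time: 2010-8-23 17:28 Subject: Re: A SAS question about finding the maximum [Help, urgent!!!]
Sorry, this is my first time asking a question here, and I didn't express myself well...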
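I need to compute both each company's figures for every year and the cumulative figures over all years up to that year.
The data structure from the expert above looks right; as for the code, I need to study it carefully~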
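The Python sketch below illustrates the two quantities just described; it is not SAS, and the function name and the six sample rows are only for the example. For each company, industry and year, it computes that year's count and the running total through that year. The running total restarts for each company-industry pair, and rows that repeat the same company, industry and year are summed first.

```python
from collections import defaultdict

# Six rows from the sample data: (company, year, industry, count)
ROWS = [
    ("A", 1999, "Communications and Media", 1),
    ("A", 1999, "Computer Software and Services", 1),
    ("A", 1999, "Internet Specific", 5),
    ("A", 2000, "Communications and Media", 1),
    ("A", 2000, "Computer Software and Services", 7),
    ("A", 2000, "Internet Specific", 1),
]


def yearly_and_cumulative(rows):
    """Return {(company, industry, year): (count that year, total through that year)}."""
    yearly = defaultdict(int)
    for company, year, industry, count in rows:
        yearly[(company, industry, year)] += count  # sum duplicate rows for the same year
    running = defaultdict(int)  # running total per (company, industry)
    result = {}
    for key in sorted(yearly):  # years ascending within each (company, industry)
        company, industry, _year = key
        running[(company, industry)] += yearly[key]
        result[key] = (yearly[key], running[(company, industry)])
    return result


for key, value in sorted(yearly_and_cumulative(ROWS).items()):
    print(key, value)
```

Ranking on the first number of each pair gives the per-year top industries; ranking on the second gives the cumulative ones.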
Here is a sample:
COMPANY YEAR INDUSTRY COUNT
A 1999 Communications and Media 1
A 1999 Computer Software and Services 1
A 1999 Internet Specific 5
A 2000 Communications and Media 1
A 2000 Computer Software and Services 7
A 2000 Internet Specific 1
A 2001 Communications and Media 1
A 2001 Computer Software and Services 1
A 2001 Internet Specific 1
B 1999 Biotechnology 8
B 1999 Communications and Media 7
B 1999 Computer Hardware 2
B 1999 Computer Software and Services 23
B 1999 Consumer Related 23
B 1999 Industrial/Energy 12
B 1999 Internet Specific 21
B 1999 Medical/Health 7
B 1999 Other Products 34
B 1999 Semiconductors/Other Elect. 5
B 2000 Biotechnology 13
B 2000 Communications and Media 14
B 2000 Computer Hardware 5
B 2000 Computer Software and Services 34
B 2000 Consumer Related 12
B 2000 Industrial/Energy 19
B 2000 Internet Specific 56
B 2000 Medical/Health 11
B 2000 Other Products 25
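B 2000 Semiconductors/Other Elect. 3
Author: shiyiming Time: 2010-8-24 17:24 Subject: Re: A SAS question about finding the maximum [Help, urgent!!!]
With the code from the senior member, the sorting does seem to work (although I can't figure out what the last little section does; could someone explain it to me?).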
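In the last step, the DOW loop `do _n_=1 by 1 until(last.year)` restarts at 1 for each company-year group. After the sort by descending acum_count and random_number, `_n_` is therefore each row's position within its group, and `if _n_ < 3 then output;` keeps the first two rows. The Python sketch below shows the same idea as an illustration only. `top_n_per_year` is a hypothetical name, and Python's seeded random generator stands in for `ranuni`.

```python
import random
from itertools import groupby, islice


def top_n_per_year(rows, n=2, seed=12345):
    """rows: (company, year, industry, acum_count) tuples.
    Sort by company, year, descending acum_count and a random tie-breaker,
    then keep the first n rows of each (company, year) group."""
    rng = random.Random(seed)
    keyed = sorted((c, y, -a, rng.random(), ind, a) for c, y, ind, a in rows)
    out = []
    for _, group in groupby(keyed, key=lambda r: (r[0], r[1])):
        for c, y, _neg, _rnd, ind, a in islice(group, n):  # like: if _n_ < 3 then output
            out.append((c, y, ind, a))
    return out


# Cumulative counts through 2000 for company A in the sample data
rows = [
    ("A", 2000, "Communications and Media", 2),
    ("A", 2000, "Computer Software and Services", 8),
    ("A", 2000, "Internet Specific", 6),
]
print(top_n_per_year(rows))  # Computer Software and Services (8), then Internet Specific (6)
```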
[quote="死猪头":iphbomve]The moderator and I were taught by the same master.
[code:iphbomve]
data raw;
length COMPANY $1 YEAR $4 INDUSTRY $40;
input;
COMPANY=scan(_infile_,1,' ');
YEAR=scan(_infile_,2,' ');
INDUSTRY=substr(_infile_,8,anydigit(_infile_,8)-9);
COUNT=input(scan(_infile_,-1,' '),best.);
datalines;
A 1999 Communications and Media 1
A 1999 Computer Software and Services 1
A 1999 Internet Specific 5
A 2000 Communications and Media 1
A 2000 Computer Software and Services 7
A 2000 Internet Specific 1
A 2001 Communications and Media 1
A 2001 Computer Software and Services 1
A 2001 Internet Specific 1
B 1999 Biotechnology 8
B 1999 Communications and Media 7
B 1999 Computer Hardware 2
B 1999 Computer Software and Services 23
B 1999 Consumer Related 23
B 1999 Industrial/Energy 12
B 1999 Internet Specific 21
B 1999 Medical/Health 7
B 1999 Other Products 34
B 1999 Semiconductors/Other Elect. 5
B 2000 Biotechnology 13
B 2000 Communications and Media 14
B 2000 Computer Hardware 5
B 2000 Computer Software and Services 34
B 2000 Consumer Related 12
B 2000 Industrial/Energy 19
B 2000 Internet Specific 56
B 2000 Medical/Health 11
B 2000 Other Products 25
B 2000 Semiconductors/Other Elect. 3
;
proc sort data=raw out=zhutou;
by company industry year;
run;
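/* one row per company-industry-year; acum_count is not retained, so it restarts for each industry and accumulates through each year */
data sizhutou(drop=count);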
do until (last.industry);
do until(last.year);
set zhutou;
by company industry year;
acum_count = sum(acum_count, count);
end;
random_number = ranuni(12345); /* random tie-breaker for the sort that follows */
output;
end;
run;
proc sort data=sizhutou out=zhupo;
by company year descending acum_count random_number;
run;
/* _n_ restarts at 1 for each company-year group, so this keeps the top two rows of each group */
data zhuzai;
do _n_=1 by 1 until(last.year);
set zhupo;
by company year;
if _n_ < 3 then output;
end;
run;
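[/code:iphbomve][/quote:iphbomve]
Author: shiyiming Time: 2010-8-27 01:46 Subject: A small new problem
Oh, BTW, this program assumes that each company's years are consecutive (as in my sample: 1999-2001 and 1999-2000). If there are gaps (the real data runs on and off from 1958 to 2010), acum_count goes wrong, because an industry with no record in a given year gets no row for that year and so drops out of that year's ranking.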
For example, if the data becomes:
A 1989 Communications and Media 1
A 1997 Computer Software and Services 1
A 1999 Internet Specific 5
A 2000 Communications and Media 1
A 2000 Computer Software and Services 7
A 2000 Internet Specific 1
A 2001 Communications and Media 1
A 2001 Computer Software and Services 1
A 2008 Internet Specific 1
B 1994 Biotechnology 8
B 1999 Communications and Media 7
B 1999 Computer Hardware 2
B 1999 Computer Software and Services 23
B 1999 Consumer Related 23
B 1999 Industrial/Energy 12
B 1999 Internet Specific 21
B 1999 Medical/Health 7
B 1999 Other Products 34
B 1999 Semiconductors/Other Elect. 5
B 2000 Biotechnology 13
B 2000 Communications and Media 14
B 2000 Computer Hardware 5
B 2000 Computer Software and Services 34
B 2000 Consumer Related 12
B 2000 Industrial/Energy 19
B 2000 Internet Specific 56
B 2000 Medical/Health 11
B 2008 Other Products 25
B 2008 Semiconductors/Other Elect. 3
Author: shiyiming Time: 2010-8-27 05:20 Subject: Re: A SAS question about finding the maximum [Help, urgent!!!]
Pighead fell for it again!
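mz, you're right. We can work some gap-filling magic.
You only have a few tens of thousands of rows, so there's no need to worry about redundant rows.
In the end, you still have to spot and solve the problems yourself.
Author: shiyiming Time: 2010-8-27 08:39 Subject: Re: A SAS question about finding the maximum [Help, urgent!!!]
I'm losing patience with reading questions described this vaguely. :(
Author: shiyiming Time: 2010-8-27 11:05 Subject: Re: A SAS question about finding the maximum [Help, urgent!!!]
Thank you so much :)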
Your approach is clearly better than mine...
That solves the basic problems. Lesson learned, thank you!!!
[quote="死猪头":3uagxwrw]Pighead fell for it again!
mz, you're right. We can work some gap-filling magic: add this after the first proc sort:
[code:3uagxwrw]
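/* pass 1: find each company's last year; pass 2: copy the rows, and give each industry that ends earlier an extra row at that last year with count missing */
data zhutou(drop=last_year);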
do until (last.company);
do until(last.industry);
set zhutou;
by company industry;
end;
last_year = max(year, last_year);
end;
do until (last.company);
do until(last.industry);
set zhutou;
by company industry;
output;
end;
if year<last_year then do;
year = last_year;
count=.;
output;
end;
end;
run;
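/* FROM=DAY treats the YEAR values as consecutive day numbers, so each gap year gets a row with count missing (METHOD=NONE: no interpolation); the ID variable must be numeric, so a character YEAR would need converting first */
proc expand data=zhutou out=zhutou from=day;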
id year;
by company industry;
convert count/method=none;
run;
[/code:3uagxwrw]
Alternatively, do the padding yourself by replacing the data sizhutou step with:
[code:3uagxwrw]
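/* pass 1: last year per company; pass 2: cumulative counts per industry, with extra rows for gap years and for years after the industry's last record, up to the company's last year */
data sizhutou(drop=count year prev_year last_year rename=(year1=year));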
do until (last.company);
do until(last.industry);
set zhutou;
by company industry;
end;
last_year = max(year, last_year);
end;
do until (last.company);
acum_count=0;
do until(last.industry);
set zhutou;
by company industry;
if not first.industry then do year1=prev_year+1 to year-1;
random_number = ranuni(12345);
output;
end;
acum_count = sum(acum_count, count);
year1 = year;
random_number = ranuni(12345);
output;
prev_year = year;
end;
do year1 = year+1 to last_year;
random_number = ranuni(12345);
output;
end;
end;
run;
[/code:3uagxwrw]
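You only have a few tens of thousands of rows, so there's no need to worry about redundant rows.
In the end, you still have to spot and solve the problems yourself.[/quote:3uagxwrw]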
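The hand-rolled variant in the quote pads every company-industry series from its first year through the company's last year. It also carries the running total across the gap years. The Python sketch below shows that padding idea as an illustration only, not a translation of the SAS step. `padded_cumulative` is a hypothetical name, and the six rows are taken from the gapped example data.

```python
from collections import defaultdict


def padded_cumulative(rows):
    """rows: (company, year, industry, count) tuples.
    Return {(company, industry, year): cumulative count} with one entry for
    every year from the industry's first year to the company's last year."""
    last_year = {}
    per_pair = defaultdict(lambda: defaultdict(int))
    for company, year, industry, count in rows:
        last_year[company] = max(year, last_year.get(company, year))
        per_pair[(company, industry)][year] += count
    result = {}
    for (company, industry), per_year in per_pair.items():
        running = 0
        for year in range(min(per_year), last_year[company] + 1):
            running += per_year.get(year, 0)  # gap years add nothing but keep the total
            result[(company, industry, year)] = running
    return result


ROWS = [
    ("A", 1989, "Communications and Media", 1),
    ("A", 2000, "Communications and Media", 1),
    ("A", 2001, "Communications and Media", 1),
    ("A", 1999, "Internet Specific", 5),
    ("A", 2000, "Internet Specific", 1),
    ("A", 2008, "Internet Specific", 1),
]
acum = padded_cumulative(ROWS)
print(acum[("A", "Communications and Media", 1995)])  # 1: carried over from 1989
print(acum[("A", "Communications and Media", 2008)])  # 3: padded up to the company's last year
```

As the reply says, this adds redundant rows. For example, it creates rows for 2002 to 2007 even though company A has no records in those years. That is harmless at a few tens of thousands of rows.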