标题: 问一个merge问题 [打印本页] 作者: shiyiming 时间: 2010-1-12 15:42 标题: 问一个merge问题 我现在的dataset
name var1 var2
A 1 3
A 3 4
A 5 2
B 2 1
B 4 0
C 3 6
我想把每个name的var1都写成1-5的形式,即
name var1 var2
A 1 3
A 2 .
A 3 4
A 4 2
A 5 .
B 1 .
B 2 1
B 3 .
B 4 0
B 5 .
C 1 .
C 2 .
C 3 6
C 4 .
C 5 .
我大脑有点秀逗了,我试图直接用merge,但是貌似不成,用sql好像也不能做到完全merge。咋办?作者: shiyiming 时间: 2010-1-12 18:04 标题: Re: 问一个merge问题 有点复杂,应该还有更简单的步骤。
[code:lxlqs776]data raw;
input name $ var1 var2;
datalines;
A 1 3
A 3 4
A 5 2
B 2 1
B 4 0
C 3 6
;
run;
data tmp;
do name='A','B','C';
do var1=1 to 5;
output;
end;
end;
run;
proc sort data=raw;
by name var1;
run;
proc sort data=tmp;
by name var1;
run;
data result;
merge tmp raw;
by name var1;
run;
[/code:lxlqs776]作者: shiyiming 时间: 2010-1-12 18:46 标题: Re: 问一个merge问题 谢谢哈,不过我的name不止A, B, C而已,而是一大串复杂的名字。另外var也不是简单的1,2,3...这样子,我只是为了简化问题随手写的,实际上是一个时间序列,不过这个倒是可以construct一个dataset出来。然后把使得每一个name的var1都包含这个dataset。作者: shiyiming 时间: 2010-1-12 19:09 标题: Re: 问一个merge问题 data step can do
and
means proc also can do!作者: shiyiming 时间: 2010-1-12 20:48 标题: Re: 问一个merge问题 data step写成这样成不?
byes大侠说的第2种方法我就不会写了... <!-- s:? --><img src="{SMILIES_PATH}/icon_confused.gif" alt=":?" title="Confused" /><!-- s:? -->
[code:39bh1dvx]proc sql noprint;
select max(var1)-min(var1)+1 into:n from raw;
quit;
%let n=&n;
data temp(drop=_v:);
array arr{*} _v1-_v&n;
do _n_=1 by 1 until(last.name);
set raw;
by name;
arr(var1)=var2;
end;
do var1=1 to dim(arr);
var2=arr(var1);
output;
end;
run;[/code:39bh1dvx]作者: shiyiming 时间: 2010-1-12 21:28 标题: Re: 问一个merge问题 仰视......
我都没看懂怎么实现的。
如果我有多个variable呢?比如:
[code:3eubphfq]
data raw;
input name $ week var1 var2;
datalines;
A 3 1 3
A 5 3 4
A 53 5 2
B 2 1 .
B 26 0 3
C 1 6 .
;
run;
[/code:3eubphfq]作者: shiyiming 时间: 2010-1-13 12:34 标题: Re: 问一个merge问题 What about this? <!-- s:-) --><img src="{SMILIES_PATH}/icon_smile.gif" alt=":-)" title="Smile" /><!-- s:-) -->
[code:1zh0zsok]options validvarname = any;
%let varsBesidesVar1Var2 = name week;
proc sort data = raw;
by &varsBesidesVar1Var2.;
run;
proc transpose data = raw
out = results(drop = _name_);
by &varsBesidesVar1Var2.;
var var2;
id var1;
run;
proc transpose data = results
out = results(rename = (col1 = var2
_name_ = var1
)
);
by &varsBesidesVar1Var2.;
run;
proc sort data = results;
by &varsBesidesVar1Var2.
var1;
run;
[/code:1zh0zsok]作者: shiyiming 时间: 2010-1-13 15:23 标题: Re: 问一个merge问题 To dengzi:
你第二次给的示例数据有个VAR1=0的情况,用原来hopewell的方法行不通。作者: shiyiming 时间: 2010-1-13 16:14 标题: Re: 问一个merge问题 data aa;
input name $ var1 var2;
cards;
A 1 3
A 3 4
A 5 2
B 2 1
B 4 0
C 3 6
;
run;
data bb (keep=name var1);
set aa;
do var1=1 to 5 by 1;
output;
end;
run;
proc sort data =aa;
by name var1;
run;
proc sort data =bb out=bb1 nodupkey;
by name var1;
run;
data cc;
merge aa bb1;
by name var1;
run;
这个是个小数据,要是做数据量很大的就不太可行,效率太低。作者: shiyiming 时间: 2010-1-13 18:47 标题: Re: 问一个merge问题 no,no,no
just only a data step
proc sql not nedded作者: shiyiming 时间: 2010-1-14 16:16 标题: Re: 问一个merge问题 to wl1017
这个可以得到我想要的结果,不过就是需要插入连续变化的var1。作者: shiyiming 时间: 2010-2-10 08:54 标题: Re: 问一个merge问题 终于实现只在一个data步中的序列填充。供各位参考,欢迎修改意见。
[code:25b9wtq2]data raw;
input name $ var1 var2;
datalines;
A 1 3
A 3 4
A 5 2
B 2 1
B 4 0
C 3 6
;
run;
proc sort data=raw out=raw_sorted;
by name var1;
run;
data temp(drop=i j /*var1_*/);
do i=1 to 5 by 1;
retain var1 1;
set raw_sorted(rename=(var1=var1_)) end=eof;
by name;
if first.name then do;
if last.name then do;
do j=1 to 5;
var1=j;
output;
end;
end;
else do;
do j=1 to var1_;
var1=j;
output;
end;
end;
end;
else do;
var1=var1+1;
if last.name then do;
if var1=5 then do;
do j=var1 to 5-1;
var1=j;
output;
end;
var1=var1_;
output;
end;
else do;
do j=var1 to 5;
var1=j;
output;
end;
end;
end;
else do;
do j=var1 to var1_;
var1=j;
output;
end;
end;
end;
end;
run;
data raw;
input name $ var1 var2;
datalines;
A 1 3
A 3 4
A 5 2
B 2 1
B 4 0
C 3 6
;
run;
proc freq data=raw noprint;
table name *var1/sparse out=_new(drop=COUNT PERCENT);
run;
data new;
merge raw _new;
by name var1;
run;作者: shiyiming 时间: 2010-2-13 10:29 标题: Re: 问一个merge问题 [quote="byes":3nofhcxa]no,no,no
just only a data step
proc sql not nedded[/quote:3nofhcxa]
莫非是用hash object? 不确定...
[code:3nofhcxa]data raw;
input name $ week var1 var2;
datalines;
A 3 1 3
A 5 3 4
A 53 5 2
B 2 1 .
B 26 0 3
C 1 6 .
;
data temp(drop=rc i);
length name $8;
if _n_=1 then do;
declare hash h(dataset:"raw",hashexp:16);
rc=h.defineKey('name','week');
rc=h.defineData('name','week','var1','var2');
rc=h.defineDone();
call missing(name,week,var1,var2);
end;
set raw(keep=name week);
by name;
if first.name then do;
do i=1 to 53;
rc=h.find(key:name,key:i);
week=i;
output;
call missing(of var:);
end;
end;
run;[/code:3nofhcxa]作者: shiyiming 时间: 2010-2-19 00:08 标题: Re: 问一个merge问题 why 'do i=1 to 53'? intended to cover all possible values?
all the code fragments above using so called one-step DATA STEP assumed that you've already known the max value of week, so that you can fill-the-holes. In case this is unknown, the data has to be passed twice, so that whether you enclose that two pass in one DATA STEP or not really doesn't matter.作者: shiyiming 时间: 2010-2-20 02:57 标题: Re: 问一个merge问题 我假定你只需要所有出现在数据中的Var1的值,而不必连续,比如0-7,9-11,而8没有出现,你不需要为每个名字都添加一行 var1=8。
data raw;
input name $ week var1 var2;
datalines;
A 3 1 3
A 5 3 4
A 53 5 2
B 2 1 .
B 26 0 3
C 1 6 .
;
proc sql;
create table allvar1 as
select distinct var1 from raw;
create table allnames as
select distinct name from raw;
create table temp as
select name,var1 from allname ,allvar1
order by name,var1;
quit;
proc sort data=raw;
by name var1;
run;
data result;
merge temp (in=a) raw;
by name var1;
if a;
run;作者: shiyiming 时间: 2010-2-21 14:56 标题: Re: 问一个merge问题 no no no!
hash not needed.
just one data step ,not include hash !作者: shiyiming 时间: 2010-2-22 08:47 标题: Re: 问一个merge问题 受不了了,BYES老大给点提示吧...作者: shiyiming 时间: 2011-1-16 13:34 标题: Re: 问一个merge问题 oloolo方法不错,不过需要sort。
two set in one data step can do it according to byes' idea ?