SAS中文论坛 (SAS Chinese Forum)

Title: Help implementing cluster bootstrap in SAS

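Author: shiyiming Time: 2011-2-20 17:21
Title: Help implementing cluster bootstrap in SAS
I've recently been doing some simulations in SAS and have run into a difficulty. I'd appreciate any help from the experts here!

How can I implement a cluster bootstrap in SAS? By "cluster" I mean that the data contain a cluster variable, and when resampling I want to draw all the records belonging to a given cluster together; in other words, only the clusters are sampled. I found a piece of code online but couldn't follow it. Could someone please verify whether it is correct? Thanks!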
* Create test data: 10 clusters with 1 to 5 cases each;
%let Nclusters=10;
data datain (index=(cluster) sortedby=cluster);
   do cluster = 1 to &nclusters;
      case0 = ceil(ranuni(12345)*5);   /* cluster size, kept for checking */
      do case = 1 to case0;            /* 1 to 5 cases per cluster */
         output;
      end;
   end;
run;
proc print data=datain;
run;

* Create replicate DATA step view: 10 bootstrap samples;
data bootdata / view=bootdata;
   do sample = 1 to 10;
      do _i = 1 to &nclusters;
         cluster = ceil(ranuni(123245)*&nclusters);   /* draw a cluster number with replacement */
         do until(_iorc_ ne 0);
            set datain key=cluster;                    /* indexed read of the next case in this cluster */
            * put _all_;
            if _iorc_ eq 0 then output;
         end;
      end;
   end;
   _error_ = 0;   /* failed key lookups set _ERROR_; reset it before the step ends */
   stop;
run;
proc print data=bootdata;
run;

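Author: shiyiming Time: 2011-2-21 02:10
Title: Re: Help implementing cluster bootstrap in SAS
The program seems understandable. What I don't quite understand is (1) your sampling scheme and (2) his bootstrap method. His method should be similar to the following program (if the order of records doesn't matter):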

[code:v9hvp7zb]proc surveyselect data=datain out=SampleOut reps=10 n=10 method=urs seed=555;
   samplingunit cluster;
run;
data have;
   set SampleOut;
   do _n_ = 1 to NumberHits;
      output;
   end;
run;[/code:v9hvp7zb]

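Author: shiyiming Time: 2011-2-21 19:15
Title: Re: Help implementing cluster bootstrap in SAS
to jingju11
Thanks for your reply. The sampling scheme I want is this: suppose I have student data from one school, with 10 classes of 40 students each. I want to sample these 10 classes (each taken as a whole) repeatedly with replacement (this is what I mean by a cluster bootstrap).
What I mainly don't understand is how to sample a block of observations in a data set as a single unit. Thanks!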

Author: shiyiming Time: 2011-2-21 23:20
Title: Re: Help implementing cluster bootstrap in SAS
to jingju11
Agreed.
Something like this may be more efficient:
data class;
   do class = 1 to 10;
      do studentid = 1 to 40;
         output;
      end;
   end;
run;

data sampschema;
   do class = 1 to 10;
      _NSIZE_ = 40;   /* sample size for this stratum */
      output;
   end;
run;

proc sql;
   create view classv as
   select *
   from class
   order by class
   ;
quit;

/* draws 40 students with replacement within each class (stratum) */
proc surveyselect data=classv sampsize=sampschema method=urs out=samp;
   strata class;
run;

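Author: shiyiming Time: 2011-2-22 00:14
Title: Re: Help implementing cluster bootstrap in SAS
caicaisas:
I know very little about how efficient an index is, but my feeling is that the program should be quite efficient. His program is also a standard way of retrieving records with duplicate key values.
That program can fully do the cluster bootstrap you describe. One point worth noting: if your class code (cluster) does not run consecutively from 1, you have to create such a cluster variable to fit his program (his cluster always runs from 1 to 10; this is a minor issue).
About PROC SURVEYSELECT, one doubt: why is the number of records selected by SURVEYSELECT always larger than with the DATA step (I tried 10 different seeds)? Its output is already sorted and collapsed. A possible advantage is that the NumberHits variable is a natural weight/freq variable.

OL's program is of course better. I just rarely use views, so I often miss the point of them.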
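
The renumbering mentioned above (creating a cluster variable that runs 1, 2, 3, ... when the class codes are not consecutive) could be sketched as follows. This is only a minimal illustration: the data set names students, sorted and numbered, the variable classcode, and the sample values are all made up for the example.

```sas
/* Hypothetical input: class codes that do not run 1, 2, 3, ... */
data students;
   input classcode studentid;
   datalines;
101 1
101 2
205 1
310 1
310 2
310 3
;
run;

proc sort data=students out=sorted;
   by classcode;
run;

/* Number the clusters consecutively and index the new variable */
data numbered (index=(cluster));
   set sorted;
   by classcode;
   if first.classcode then cluster + 1;   /* sum statement: retained, starts at 0 */
run;
```

With the six sample rows, cluster takes the values 1, 1, 2, 3, 3, 3, so a draw of ceil(ranuni(seed)*3) maps onto an existing class.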

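Author: shiyiming Time: 2011-2-22 18:25
Title: Re: Help implementing cluster bootstrap in SAS
oloolo, jingju11
Thank you both for your answers, but when I ran the code the result did not seem to be what I want. Perhaps I haven't expressed myself precisely.
First of all, the bootstrap is sampling with replacement.
For example, say I have 5 values: 1, 2, 3, 4, 5. Bootstrapping this data set can give many samples: 1 (1, 1, 2, 3, 3), 2 (1, 2, 2, 4, 5), 3 (2, 3, 3, 4, 5), ...
That is, each value is put back after it is drawn and can be drawn again. For a clustered data set, then, we may end up drawing the same cluster more than once, while the values inside a cluster never change. So I wonder whether oloolo and jingju11 understood this as some other kind of sampling?
Thanks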
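
The five-value example can be written as a short DATA step. This is a sketch rather than anyone's posted solution; the data set names values and boot and the seed 2011 are arbitrary choices for illustration.

```sas
data values;
   do value = 1 to 5;
      output;
   end;
run;

/* 3 bootstrap samples, each of n = 5 draws with replacement */
data boot;
   do sample = 1 to 3;
      do draw = 1 to n;                    /* n is filled in by NOBS= at compile time */
         pick = ceil(ranuni(2011) * n);    /* random row number in 1..n */
         set values point=pick nobs=n;     /* direct read of that row */
         output;
      end;
   end;
   stop;                                   /* required with POINT= to end the step */
run;
```

The output has 15 rows (3 samples of 5 draws), and a given value can appear several times within one sample. The cluster version differs only in that each draw brings along every record of the drawn cluster.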

Author: shiyiming Time: 2011-2-23 01:18
Title: Re: Help implementing cluster bootstrap in SAS
I don't understand you, even though I thought I did. In fact, I still think that is the right solution for you. JingJu

Author: shiyiming Time: 2011-2-23 02:36
Title: Re: Help implementing cluster bootstrap in SAS
to jingju11
I think he/she only wants URS with respect to cluster: once a cluster is selected, all IDs within that cluster are selected.
Then it is simple:
proc freq data=yourdata noprint;
   table cluster / out=cluster;   /* one row per cluster */
run;
proc surveyselect data=cluster method=urs sampsize=10;
run;

* then merge with the original data;

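Author: shiyiming Time: 2011-2-23 16:50
Title: Re: Help implementing cluster bootstrap in SAS
to olo;
Yes, that is basically what I mean, and your merge suggestion was a real eye-opener. In fact I only need to sample the clusters; there is no need to wrestle with the original data.
But in the end I need repeated sampling with replacement: for example, if cluster has 10 values, then after sampling there should still be 10 values. Your program seems to sample without replacement. I tried writing something myself, but the final merge has some problems, so I'd like to ask about it: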
%macro boot(nsamp);
%do sampnum = 1 %to &nsamp;

data bootsamp_&sampnum;
   do i = 1 to nobs;
      x = ceil(ranuni(0) * nobs);   /* row number in 1..nobs; round() could return 0 */
      set cluster nobs=nobs point=x;
      output;
   end;
   stop;
run;

proc sort data=bootsamp_&sampnum;
   by i class;
run;

data boot_&sampnum;
   merge bootsamp_&sampnum(in=b) class(in=c);
   by class;
   if b;
run;
%end;
%mend;

%boot(5);

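Here cluster is the data set produced by your previous program.
The resulting bootsamp_&sampnum looks like this:
i  class
1  2
2  2
3  3
...
With my merge step above, rows that share the same class but have different values of i do not each get all the studentid values merged in.
If I add by i class; the program fails, because my original data set class has no variable i.
Is there a way to solve this?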
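
One possible way around the merge problem described here is a PROC SQL join instead of a MERGE, because a join pairs every drawn row with every student of that class even when the same class is drawn several times. The following is a minimal sketch under that assumption; the data set names clusters, draws and boot and the seed 2011 are made up for illustration.

```sas
data class;
   do class = 1 to 10;
      do studentid = 1 to 40;
         output;
      end;
   end;
run;

data clusters;                              /* one row per class */
   do class = 1 to 10;
      output;
   end;
run;

data draws;                                 /* 10 classes drawn with replacement */
   do i = 1 to n;
      pick = ceil(ranuni(2011) * n);
      set clusters point=pick nobs=n;
      output;
   end;
   stop;
run;

proc sql;
   create table boot as
   select d.i, d.class, c.studentid
   from draws as d inner join class as c
     on d.class = c.class                   /* repeated classes are matched once per draw */
   order by d.i, c.studentid;
quit;
```

Each of the 10 draws contributes 40 rows, so boot has 400 rows, and i distinguishes repeated draws of the same class.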

Author: shiyiming Time: 2011-2-23 21:45
Title: Re: Help implementing cluster bootstrap in SAS
[quote:27bq9xny]to jingju11
I think he/she only wants URS with respect to Cluster, and once a cluster is selected, all ID within the Cluster will be selected...[/quote:27bq9xny]

We understand that, and that is why I used SAMPLINGUNIT in PROC SURVEYSELECT.
The original index code uses cluster as the key variable and loops to retrieve all IDs in a cluster instead of a single one.
I don't think PROC FREQ + SURVEYSELECT is necessary.

JingJu

Author: shiyiming Time: 2011-2-24 20:27
Title: Re: Help implementing cluster bootstrap in SAS
to jingju

I tried your code, but it didn't work. Is it only available in 9.2? Mine is 9.1.

Here is the log:

proc surveyselect data =class out =SampleOut reps =10 n =10 method =urs seed =555;
9    samplingunit cluster;
      ------------
      180
ERROR 180-322: Statement is not valid or it is used out of proper order.
10 run;

Thanks!

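Author: shiyiming Time: 2011-2-24 21:11
Title: Re: Help implementing cluster bootstrap in SAS
to caicaisas
You seem to mean: first do URS on CLUSTER, and then, given the drawn CLUSTER numbers, do URS again on the IDs within each CLUSTER. Is that right?
I've only just had breakfast and for now can think of only a clumsy method. Let me think about whether there is something more efficient.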

Author: shiyiming Time: 2011-2-24 21:55
Title: Re: Help implementing cluster bootstrap in SAS
to OL:

[quote:15z1w7rs](samplingunit) ....is that available for 9.2 ?? mine is 9.1[/quote:15z1w7rs]

You are right!

[quote:15z1w7rs]Beginning with SAS/STAT 9.22 in SAS 9.2 TS2M3, use the SAMPLINGUNIT or CLUSTER statement to name variables that identify the sampling units as groups of observations (clusters).[/quote:15z1w7rs]

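Author: shiyiming Time: 2011-2-24 23:10
Title: Re: Help implementing cluster bootstrap in SAS
to OLO, I don't need to do URS again on the IDs within a cluster, but I ran into a problem at the merge step.

By the way, is URS sampling without replacement?
What I actually need is sampling with replacement,
so after sampling the clusters there are duplicate values, and that is what makes the merge go wrong.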

Author: shiyiming Time: 2011-2-25 23:40
Title: Re: Help implementing cluster bootstrap in SAS
to caicaisas
URS = unrestricted random sampling = sampling with replacement.
Try the following code:

data temp;
   do cluster = 1 to 10;
      do id = 1 to 40;
         output;
      end;
   end;
run;

proc freq data=temp noprint;
   table cluster / out=cluster(drop=percent);   /* one row per cluster */
run;

proc surveyselect data=cluster method=urs sampsize=10 out=cluster_samp;
run;

data temp2;
   merge temp cluster_samp;
   by cluster;
   /* NumberHits is missing for clusters that were not drawn */
   if NumberHits >= 1 then do;
      do seq = 1 to NumberHits;   /* one copy of the record per hit */
         output;
      end;
   end;
run;

proc sort data=temp2;
   by cluster seq;
run;

Author: shiyiming Time: 2011-2-26 18:24
Title: Re: Help implementing cluster bootstrap in SAS
to OLO~

I've got it working. Thank you very much!!

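Author: shiyiming Time: 2011-3-3 22:44
Title: Re: Help implementing cluster bootstrap in SAS
to oloolo
This code does functionally implement the cluster bootstrap.
But I wonder whether it can be improved further. The program ends up producing only one sample. If I want B bootstrap samples, I have to wrap it in a macro, run it B times, and then append the resulting B data sets together. That is very inefficient, because I also need to nest this result inside another macro. So I have spent today thinking about how to do repeated cluster bootstrap without a macro, but my skills are limited and I haven't worked it out. Also, my 9.2 indeed has no SAMPLINGUNIT statement; perhaps it hasn't reached 9.22 yet. So once again I have to ask the experts here for help!
Many thanks!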

Author: shiyiming Time: 2011-3-4 11:53
Title: Re: Help implementing cluster bootstrap in SAS
to caicaisas
Very simple:
add one more loop level and use its index as a BY variable.

Author: shiyiming Time: 2011-3-5 08:31
Title: Re: Help implementing cluster bootstrap in SAS
to oloolo
But the final merge runs into a problem. Suppose sampnum is the variable that identifies the sample: by sampnum cluster fails, because the original data set temp has no sampnum variable, so the merge does not work.
Sigh, I'm really rather slow :? Could you show a bit of code? Many thanks!
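
For B samples without a macro loop, one option is to let PROC SURVEYSELECT generate all replicates at once with REPS= and then attach the cluster members with a join, so the original data set never needs a sample variable. The following is a hedged sketch, not the code oloolo had in mind: it reuses the names temp, cluster and cluster_samp from the earlier post, while boot_hits, boot_all, B = 5 and the seed 2011 are made up, and it assumes REPS= is available in your release.

```sas
data temp;
   do cluster = 1 to 10;
      do id = 1 to 40;
         output;
      end;
   end;
run;

proc sort data=temp(keep=cluster) out=cluster nodupkey;   /* one row per cluster */
   by cluster;
run;

/* B = 5 replicates, each drawing 10 clusters with replacement */
proc surveyselect data=cluster method=urs sampsize=10 reps=5
                  seed=2011 out=cluster_samp noprint;
run;

proc sql;
   create table boot_hits as
   select s.Replicate, s.cluster, s.NumberHits, t.id
   from cluster_samp as s inner join temp as t
     on s.cluster = t.cluster
   order by s.Replicate, s.cluster, t.id;
quit;

data boot_all;
   set boot_hits;
   do seq = 1 to NumberHits;   /* one copy of each member per hit */
      output;
   end;
   drop NumberHits;
run;
```

Because each replicate totals 10 hits and every cluster has 40 members, boot_all should contain 5 x 10 x 40 = 2000 rows, and later analyses can simply use by Replicate; to process all samples in one pass.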

Author: shiyiming Time: 2011-3-5 22:50
Title: Re: Help implementing cluster bootstrap in SAS
The index key approach in the original program for locating a cluster is really good. Why not use it?

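Author: shiyiming Time: 2011-3-6 22:26
Title: Re: Help implementing cluster bootstrap in SAS
to jingju

It can be used, yes, but the merge is still the problem. I'm not sure whether I've made myself clear.
I can generate 1000 bootstrap samples, but the samples contain cluster values. The problem arises when mapping each cluster value back to its n subjects, so I still don't understand how to do it.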

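Author: shiyiming Time: 2011-3-7 21:49
Title: Re: Help implementing cluster bootstrap in SAS
I don't quite understand why you need a merge at all. In that key index program, each time a cluster value is drawn, the key is used to find all IDs matching that cluster value, which is exactly sampling with the cluster as the unit. I find it simple and clear, so why are you stuck on this? Perhaps I haven't fully understood your intent.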

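Author: shiyiming Time: 2011-3-10 20:33
Title: Re: Help implementing cluster bootstrap in SAS
to jingju
Maybe I just don't know how to use key; I'm really quite a newbie. So I don't know what the program you describe actually looks like.
Could you give me a pointer?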

Welcome to SAS中文论坛 (SAS Chinese Forum) (http://mysas.net/forum/)