
Original poster |
Posted on 2004-3-6 11:56:53
If you look at the log window of SAS Enterprise Miner, you will see that SAS does it the same way. P.S. As far as I know, the code has been tested on several substantial financial projects overseas.
The algorithm is easy to verify: draw a sample, compare it with the whole population, and check that both the row count and the distribution of key variables match what you expect.
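As a sketch of that check, the code below builds a hypothetical demo population (POPULATION, SAMPLE, SEGMENT and BALANCE are placeholder names; substitute your own data), draws an approximate 10% sample, and compares the two on the row ratio, the mean and quartiles of a numeric variable, and the percentage breakdown of a class variable. Close agreement is what a correct sampler should produce; no formal statistical test is shown.

```sas
/* Demo data only (hypothetical names and values): replace POPULATION */
/* and SAMPLE with your own data sets and columns.                    */
data population;
  call streaminit(2004);
  do id = 1 to 100000;
    segment = ceil(4 * rand('uniform'));               /* class variable, 1-4 */
    balance = round(1000 * rand('exponential'), 0.01); /* skewed numeric      */
    output;
  end;
run;

data sample;
  set population;
  if ranuni(2004) < 0.1;   /* approximate 10% Bernoulli sample, fixed seed */
run;

/* 1. Row counts: the ratio should be close to the sampling rate. */
proc sql;
  select p.n_pop, s.n_smp,
         s.n_smp / p.n_pop as actual_rate format=percent8.2
  from (select count(*) as n_pop from population) as p,
       (select count(*) as n_smp from sample) as s;
quit;

/* Stack both data sets with a SOURCE flag for side-by-side summaries. */
data both;
  length source $10;
  set population(in=inpop) sample;
  source = ifc(inpop, 'population', 'sample');
run;

/* 2. Numeric variable: means and quartiles should be close. */
proc means data=both n mean std p25 median p75;
  class source;
  var balance;
run;

/* 3. Class variable: column percentages should agree. */
proc freq data=both;
  tables segment*source / nofreq norow nopercent;
run;
```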
There are quite a few ways to do sampling in SAS:
PROC SURVEYSELECT provides built-in options for choosing a specific sampling method, but in my experience it is quite inefficient;
using a distribution function such as uniform() to implement the method yourself (the approach above) suits experienced programmers;
when an approximate sample size is acceptable, you can even use rantbl(), which is very efficient on large databases.
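For example, for a roughly 10% sample (a seed of -1, like any seed <= 0, is initialized from the system clock, so each run returns a different sample):
data sample;
/* rantbl(seed, 0.1) returns 1 with probability 0.1, otherwise 2 */
set population(where=(rantbl(-1,0.1) = 1));
run;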
We tested it on a 30 GB table with 400 million rows, and it also works through SAS/ACCESS to other DBMSs.
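To see what the rantbl() filter in the example does, this illustrative sketch (not the tested production code) draws 100,000 values with a fixed positive seed, so the run is repeatable, and tabulates them. RANTBL(seed, p1, ..., pn) returns i with probability pi and n+1 with the remaining probability, so rantbl(seed, 0.1) returns 1 about 10% of the time and 2 otherwise. Keeping only the rows where it equals 1 is Bernoulli sampling, which is why the sample size is only approximately 10%. The data set name DRAWS and the seed 2004 are arbitrary choices.

```sas
/* Illustration only: 100,000 draws with a fixed positive seed (2004) */
/* so the result is repeatable; -1 would reseed from the clock.       */
data draws;
  do id = 1 to 100000;
    grp = rantbl(2004, 0.1);   /* 1 with probability 0.1, else 2 */
    output;
  end;
run;

/* Expect roughly 10% of GRP = 1 and 90% of GRP = 2. */
proc freq data=draws;
  tables grp;
run;
```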
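For comparison, a minimal sketch of the same 10% draw with PROC SURVEYSELECT: METHOD=SRS selects a simple random sample without replacement, SAMPRATE=0.1 sets the rate, and SEED= makes the result repeatable. Unlike the rantbl() filter, the sample size is fixed by the rate rather than approximate. The demo POPULATION data set and the output name SAMPLE_SRS are hypothetical; the procedure also offers other methods (for example METHOD=SYS for systematic sampling) and a STRATA statement.

```sas
/* Demo population (hypothetical): replace with your own data set. */
data population;
  do id = 1 to 100000;
    output;
  end;
run;

/* Simple random sampling without replacement: 10% of the rows. */
proc surveyselect data=population out=sample_srs
                  method=srs samprate=0.1 seed=2004;
run;
```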
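The post does not show the uniform()-based code it refers to, so the following is one common technique of that kind rather than the author's version: sequential selection (Knuth's Algorithm S). Each row is kept with probability k/n, where k is the number of rows still needed and n the number still unread; this yields exactly k rows in one pass, with every subset of size k equally likely. The target K = 10000 and the data set names are illustrative. Because NOBS= must give the true row count before the step runs, this needs a real SAS data set (for views, NOBS= is not reliable). For an approximate rate instead of an exact size, `if uniform(seed) < 0.1;` in a DATA step is enough.

```sas
/* Demo population (hypothetical): replace with your own data set. */
data population;
  do id = 1 to 100000;
    output;
  end;
run;

/* Sequential selection (Knuth's Algorithm S): exactly K rows without */
/* replacement in a single pass. K = 10000 is an illustrative target. */
data sample_exact(drop=k n);
  retain k 10000 n;
  if _n_ = 1 then n = total;        /* TOTAL is set at compile time by NOBS= */
  set population nobs=total;
  if uniform(2004) * n < k then do; /* keep with probability k/n */
    output;
    k = k - 1;
  end;
  n = n - 1;
  if k = 0 then stop;               /* quota filled: skip the remaining rows */
run;
```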