Title: An Economic Approach for a Class of Dimensionality Reduction    Author: shiyiming    Time: 2010-10-22 13:39    From oloolo's blog on SasProgramming
Just back from KDD 2010. There were several papers at the conference that interested me.<br />
<br />
On the computation side, Liang Sun et al.'s paper [1], "A Scalable Two-Stage Approach for a Class of Dimensionality Reduction Techniques", caught my eye. The paper proves that a class of dimensionality reduction techniques that rely on a generalized eigenvalue decomposition, such as CCA, OPLS, and LDA, can be computed much more cheaply by decomposing the original computation into a least-squares problem followed by a much smaller-scale eigenvalue decomposition. The equivalence of their two-stage approach and the direct eigenvalue decomposition is rigorously proved. <br />
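<br />
For context, the shared computation in these techniques is a generalized eigenvalue problem on scatter (or covariance) matrices. A minimal PROC IML sketch of that direct route for canonical discriminant analysis is shown below; the variable names are my own, and a numeric Species code (1&ndash;3) is assumed, as in the data used later in this post.<br />
<pre><code>
proc iml;
/* Direct route: generalized eigendecomposition of the scatter matrices */
use iris;
read all var {SepalLength SepalWidth PetalLength PetalWidth} into X;
read all var {Species} into g;
close iris;

n  = nrow(X);
Xc = X - repeat(X[:,], n, 1);            /* center the features            */

levels = unique(g);                      /* class labels                   */
H = j(n, ncol(levels), 0);               /* class-indicator (dummy) matrix */
do k = 1 to ncol(levels);
   H[loc(g = levels[k]), k] = 1;
end;

St = t(Xc) * Xc;                         /* total scatter                  */
M  = H * inv(t(H) * H) * t(H);           /* projection onto class means    */
Sb = t(Xc) * M * Xc;                     /* between-class scatter          */

/* discriminant directions solve Sb*w = lambda*St*w */
call geneig(lambda, W, Sb, St);
print lambda;
quit;
</code></pre>
With p features this requires factoring p-by-p matrices, which is exactly the step the two-stage approach avoids when p is very large.<br />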
<br />
This technique is of particular interest to people like me who have only limited computing resources, and I believe it would be worthwhile to implement their algorithm in SAS. For example, a canonical discriminant analysis using the above idea is demonstrated below. Note also that by specifying the RIDGE= option in PROC REG, the regularized version can be implemented as well; besides, PROC REG is multi-threaded in SAS. Of course, the computational advantage is only appreciable when the number of features is very large.<br />
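<br />
For instance, a regularized first stage might look like the sketch below; the ridge value 0.1 is only an illustrative choice.<br />
<pre><code>
proc reg data=H outest=beta_ridge ridge=0.1;
   model Col1-Col3 = SepalLength SepalWidth PetalLength PetalWidth;
run;quit;
</code></pre>
Note that the ridge estimates are written to the OUTEST= data set (observations with _TYPE_='RIDGE'), so the fitted values for the second stage would have to be computed from those coefficients, e.g. with PROC SCORE or a DATA step, since the OUTPUT statement gives the ordinary least-squares predictions.<br />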
<br />
The canonical analysis result from this reduced version of PROC CANDISC is the same as that of the full version. <br />
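<br />
For comparison, the full version here is just PROC CANDISC run directly on the original four features, e.g.:<br />
<pre><code>
proc candisc data=iris;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;
</code></pre>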
<br />
In fact, this exercise is the answer to Exercise 4.3 of The Elements of Statistical Learning [2]. <br />
<br />
[1] Liang Sun, Betul Ceran, Jieping Ye, "<a href="http://www.public.asu.edu/%7Elsun27/Publications/KDD_2010.pdf">A Scalable Two-Stage Approach for a Class of Dimensionality Reduction Techniques</a>", KDD 2010, Washington, DC. <br />
<br />
[2] Trevor Hastie, Robert Tibshirani, Jerome Friedman, "The Elements of Statistical Learning", 2nd Edition.<br />
<br />
<br />
<pre><code>
proc format;
value specname
1='Setosa '
2='Versicolor'
3='Virginica ';
run;
ods select none;
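/* Build the class-indicator (dummy) matrix H for Species from the GLMMOD design matrix */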
proc glmmod data=iris outdesign=H(keep=COL:);
class Species;
model SepalLength=Species/noint;
run;
data H;
merge H iris;
run;
/**************************
for efficiency consideration, a view can also be used:
data H/view=H;
set iris;
array _S{*} Col1-Col3 (3*0);
do j=1 to dim(_S); _S[j]=0; end;
_S[Species]=1;
drop j;
run;
****************************/
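/* First stage: least-squares regression of the class indicators on the original features */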
proc reg data=H outest=beta;
model Col1-Col3 = SepalLength SepalWidth PetalLength PetalWidth;
output out=P p=yhat1-yhat3;
run;quit;
ods select all;
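/* Second stage: canonical discriminant analysis on the low-dimensional fitted values */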
proc candisc data=P;
class Species;
var yhat1-yhat3;
run;