SAS Chinese Forum (SAS中文论坛)
Title:
A SAS MACRO FOR DECISION STUMP
Author:
shiyiming
Time:
2010-10-22 13:24
From Wensui Liu's blog
<font face="monospace"><span style="background-color:#ffffff"><font color="#008000">*************************************************************;</font></span><br /><span style="background-color:#ffffff"><font color="#008000">* A SAS MACRO FOR DECISION STUMP SIMILAR TO %SPLIT() MACRO *;</font></span><br /><span style="background-color:#ffffff"><font color="#008000">* IN "Pharmaceutical Statistics Using SAS" by Dmitrienko, *;</font></span><br /><span style="background-color:#ffffff"><font color="#008000">* Chuang-Stein, AND D'Agostino *;</font></span><br /><span style="background-color:#ffffff"><font color="#008000">* --------------------------------------------------------- *;</font></span><br /><span style="background-color:#ffffff"><font color="#008000">* GENERAL CONCEPT: *;</font></span><br /><span style="background-color:#ffffff"><font color="#008000">* 1. A DECISION STUMP IS A NAIVELY SIMPLE 1-LEVEL DECISION *;</font></span><br /><span style="background-color:#ffffff"><font color="#008000">* TREE WITH TWO TERMINAL NODES *;</font></span><br /><span style="background-color:#ffffff"><font color="#008000">* 2. COMMONLY USED AS COMPONENTS IN BAGGING / BOOSTING *;</font></span><br /><span style="background-color:#ffffff"><font color="#008000">* MACHINE LEARNING ENSEMBLE ALGORITHMS *;</font></span><br /><span style="background-color:#ffffff"><font color="#008000">* --------------------------------------------------------- *;</font></span><br /><span style="background-color:#ffffff"><font color="#008000">* PRACTICAL USES: *;</font></span><br /><span style="background-color:#ffffff"><font color="#008000">* 1. VARIABLE PRE-SCREENING BEFORE MODEL DEVELOPMENT *;</font></span><br /><span style="background-color:#ffffff"><font color="#008000">* 2. 
CUTOFF POINT SEARCH FOR SCORECARD STRATEGIES *;</font></span><br /><span style="background-color:#ffffff"><font color="#008000">* --------------------------------------------------------- *;</font></span><br /><span style="background-color:#ffffff"><font color="#008000">* AUTHOR: <a href="mailto:wensliu@paypal.com">wensliu@paypal.com</a> *;</font></span><br /><span style="background-color:#ffffff"><font color="#008000">*************************************************************;</font></span><br /><br /><span style="background-color:#ffffff"><font color="#000080"><b>data</b></font></span> example1;<br /> <span style="background-color:#ffffff"><font color="#0000ff">do</font></span> i = <span style="background-color:#ffffff"><font color="#2e8b57"><b>1</b></font></span> to <span style="background-color:#ffffff"><font color="#2e8b57"><b>1000</b></font></span>;<br /><span style="background-color:#ffffff"><font color="#008000"> * x1: key driver with a single cutoff at 5 *;</font></span><br /> x1 = <span style="background-color:#ffffff"><font color="#2e8b57"><b>10</b></font></span> * <span style="background-color:#ffffff"><font color="#0000ff">ranuni</font></span>(<span style="background-color:#ffffff"><font color="#2e8b57"><b>1</b></font></span>);<br /><span style="background-color:#ffffff"><font color="#008000"> * x2: related, with cutoff 1.5 or 7.5 depending on x1 *;</font></span><br /> x2 = <span style="background-color:#ffffff"><font color="#2e8b57"><b>10</b></font></span> * <span style="background-color:#ffffff"><font color="#0000ff">ranuni</font></span>(<span style="background-color:#ffffff"><font color="#2e8b57"><b>2</b></font></span>);<br /><span style="background-color:#ffffff"><font color="#008000"> * x3: unrelated noise *;</font></span><br /> x3 = <span style="background-color:#ffffff"><font color="#2e8b57"><b>10</b></font></span> * <span style="background-color:#ffffff"><font color="#0000ff">ranuni</font></span>(<span style="background-color:#ffffff"><font 
color="#2e8b57"><b>3</b></font></span>);<br /> <span style="background-color:#ffffff"><font color="#0000ff">if</font></span> (x1 < <span style="background-color:#ffffff"><font color="#2e8b57"><b>5</b></font></span> <span style="background-color:#ffffff"><font color="#0000ff">and</font></span> x2 < <span style="background-color:#ffffff"><font color="#2e8b57"><b>1.5</b></font></span>) <span style="background-color:#ffffff"><font color="#0000ff">then</font></span> y = <span style="background-color:#ffffff"><font color="#2e8b57"><b>0</b></font></span>;<br /> <span style="background-color:#ffffff"><font color="#0000ff">if</font></span> (x1 < <span style="background-color:#ffffff"><font color="#2e8b57"><b>5</b></font></span> <span style="background-color:#ffffff"><font color="#0000ff">and</font></span> x2 >= <span style="background-color:#ffffff"><font color="#2e8b57"><b>1.5</b></font></span>) <span style="background-color:#ffffff"><font color="#0000ff">then</font></span> y = <span style="background-color:#ffffff"><font color="#2e8b57"><b>1</b></font></span>;<br /> <span style="background-color:#ffffff"><font color="#0000ff">if</font></span> (x1 >= <span style="background-color:#ffffff"><font color="#2e8b57"><b>5</b></font></span> <span style="background-color:#ffffff"><font color="#0000ff">and</font></span> x2 < <span style="background-color:#ffffff"><font color="#2e8b57"><b>7.5</b></font></span>) <span style="background-color:#ffffff"><font color="#0000ff">then</font></span> y = <span style="background-color:#ffffff"><font color="#2e8b57"><b>0</b></font></span>;<br /> <span style="background-color:#ffffff"><font color="#0000ff">if</font></span> (x1 >= <span style="background-color:#ffffff"><font color="#2e8b57"><b>5</b></font></span> <span style="background-color:#ffffff"><font color="#0000ff">and</font></span> x2 >= <span style="background-color:#ffffff"><font color="#2e8b57"><b>7.5</b></font></span>) <span style="background-color:#ffffff"><font 
color="#0000ff">then</font></span> y = <span style="background-color:#ffffff"><font color="#2e8b57"><b>1</b></font></span>;<br /> w = <span style="background-color:#ffffff"><font color="#2e8b57"><b>1</b></font></span>;<br /> <span style="background-color:#ffffff"><font color="#0000ff">output</font></span>;<br /> <span style="background-color:#ffffff"><font color="#0000ff">end</font></span>;<br /><span style="background-color:#ffffff"><font color="#000080"><b>run</b></font></span>;<br /><br /><span style="background-color:#ffffff"><font color="#0000ff">%macro</font></span> stump(<span style="background-color:#ffffff"><font color="#000080"><b>data</b></font></span> = , w = , y = , xlist = );<br /><span style="background-color:#ffffff"><font color="#008000">/* declare the loop macro variables local first, so a global i or x is neither overwritten nor shadowed by an empty local */</font></span><br /><span style="background-color:#ffffff"><font color="#0000ff">%local</font></span> i x;<br /><span style="background-color:#ffffff"><font color="#0000ff">%let</font></span> i = <span style="background-color:#ffffff"><font color="#2e8b57"><b>1</b></font></span>;<br /><br /><span style="background-color:#ffffff"><font color="#000080"><b>proc sql</b></font></span>;<br /><span style="background-color:#ffffff"><font color="#0000ff">create</font></span> <span style="background-color:#ffffff"><font color="#0000ff">table</font></span> _out<br /> (<br /> variable char(<span style="background-color:#ffffff"><font color="#2e8b57"><b>32</b></font></span>),<br /> gt_value num,<br /> gini num<br /> );<br /><span style="background-color:#ffffff"><font color="#000080"><b>quit</b></font></span>;<br /><br /><span style="background-color:#ffffff"><font color="#0000ff">%do</font></span> <span style="background-color:#ffffff"><font color="#0000ff">%while</font></span> (<span style="background-color:#ffffff"><font color="#0000ff">%scan</font></span>(<font color="#0000ff"><b>&xlist</b></font>, <font color="#0000ff"><b>&i</b></font>) ne <span style="background-color:#ffffff"><font color="#0000ff">%str</font></span>()); <br /> <span style="background-color:#ffffff"><font 
color="#0000ff">%let</font></span> <span style="background-color:#ffffff"><font color="#0000ff">x</font></span> = <span style="background-color:#ffffff"><font color="#0000ff">%scan</font></span>(<font color="#0000ff"><b>&xlist</b></font>, <font color="#0000ff"><b>&i</b></font>);<br /> <br /> <span style="background-color:#ffffff"><font color="#000080"><b>data</b></font></span> _tmp1(<span style="background-color:#ffffff"><font color="#0000ff">keep</font></span> = <font color="#0000ff"><b>&w</b></font> <font color="#0000ff"><b>&y</b></font> <font color="#0000ff"><b>&x</b></font>);<br /> <span style="background-color:#ffffff"><font color="#0000ff">set</font></span> <font color="#0000ff"><b>&data</b></font>;<br /> <span style="background-color:#ffffff"><font color="#0000ff">where</font></span> <font color="#0000ff"><b>&y</b></font> <span style="background-color:#ffffff"><font color="#0000ff">in</font></span> (<span style="background-color:#ffffff"><font color="#2e8b57"><b>0</b></font></span>, <span style="background-color:#ffffff"><font color="#2e8b57"><b>1</b></font></span>);<br /> <span style="background-color:#ffffff"><font color="#000080"><b>run</b></font></span>;<br /><br /><span style="background-color:#ffffff"><font color="#000080"><b> proc sql</b></font></span>;<br /> <span style="background-color:#ffffff"><font color="#0000ff">create</font></span> <span style="background-color:#ffffff"><font color="#0000ff">table</font></span><br /> _tmp2 <span style="background-color:#ffffff"><font color="#0000ff">as</font></span><br /> <span style="background-color:#ffffff"><font color="#0000ff">select</font></span><br /> b.<font color="#0000ff"><b>&x</b></font> <span style="background-color:#ffffff"><font color="#0000ff">as</font></span> gt_value,<br /> <span style="background-color:#ffffff"><font color="#0000ff">sum</font></span>(case when a.<font color="#0000ff"><b>&x</b></font> <= b.<font color="#0000ff"><b>&x</b></font> <span style="background-color:#ffffff"><font 
color="#0000ff">then</font></span> <font color="#0000ff"><b>&w</b></font> * <font color="#0000ff"><b>&y</b></font> <span style="background-color:#ffffff"><font color="#0000ff">else</font></span> <span style="background-color:#ffffff"><font color="#2e8b57"><b>0</b></font></span> <span style="background-color:#ffffff"><font color="#0000ff">end</font></span>) / <br /> <span style="background-color:#ffffff"><font color="#0000ff">sum</font></span>(case when a.<font color="#0000ff"><b>&x</b></font> <= b.<font color="#0000ff"><b>&x</b></font> <span style="background-color:#ffffff"><font color="#0000ff">then</font></span> <font color="#0000ff"><b>&w</b></font> <span style="background-color:#ffffff"><font color="#0000ff">else</font></span> <span style="background-color:#ffffff"><font color="#2e8b57"><b>0</b></font></span> <span style="background-color:#ffffff"><font color="#0000ff">end</font></span>) <span style="background-color:#ffffff"><font color="#0000ff">as</font></span> p1_1,<br /> <span style="background-color:#ffffff"><font color="#0000ff">sum</font></span>(case when a.<font color="#0000ff"><b>&x</b></font> > b.<font color="#0000ff"><b>&x</b></font> <span style="background-color:#ffffff"><font color="#0000ff">then</font></span> <font color="#0000ff"><b>&w</b></font> * <font color="#0000ff"><b>&y</b></font> <span style="background-color:#ffffff"><font color="#0000ff">else</font></span> <span style="background-color:#ffffff"><font color="#2e8b57"><b>0</b></font></span> <span style="background-color:#ffffff"><font color="#0000ff">end</font></span>) / <br /> <span style="background-color:#ffffff"><font color="#0000ff">sum</font></span>(case when a.<font color="#0000ff"><b>&x</b></font> > b.<font color="#0000ff"><b>&x</b></font> <span style="background-color:#ffffff"><font color="#0000ff">then</font></span> <font color="#0000ff"><b>&w</b></font> <span style="background-color:#ffffff"><font color="#0000ff">else</font></span> <span style="background-color:#ffffff"><font 
color="#2e8b57"><b>0</b></font></span> <span style="background-color:#ffffff"><font color="#0000ff">end</font></span>) <span style="background-color:#ffffff"><font color="#0000ff">as</font></span> p1_2,<br /> <span style="background-color:#ffffff"><font color="#0000ff">sum</font></span>(case when a.<font color="#0000ff"><b>&x</b></font> <= b.<font color="#0000ff"><b>&x</b></font> <span style="background-color:#ffffff"><font color="#0000ff">then</font></span> <font color="#0000ff"><b>&w</b></font> <span style="background-color:#ffffff"><font color="#0000ff">else</font></span> <span style="background-color:#ffffff"><font color="#2e8b57"><b>0</b></font></span> <span style="background-color:#ffffff"><font color="#0000ff">end</font></span>) / <span style="background-color:#ffffff"><font color="#0000ff">sum</font></span>(<font color="#0000ff"><b>&w</b></font>) <span style="background-color:#ffffff"><font color="#0000ff">as</font></span> ppn1,<br /> <span style="background-color:#ffffff"><font color="#0000ff">sum</font></span>(case when a.<font color="#0000ff"><b>&x</b></font> > b.<font color="#0000ff"><b>&x</b></font> <span style="background-color:#ffffff"><font color="#0000ff">then</font></span> <font color="#0000ff"><b>&w</b></font> <span style="background-color:#ffffff"><font color="#0000ff">else</font></span> <span style="background-color:#ffffff"><font color="#2e8b57"><b>0</b></font></span> <span style="background-color:#ffffff"><font color="#0000ff">end</font></span>) / <span style="background-color:#ffffff"><font color="#0000ff">sum</font></span>(<font color="#0000ff"><b>&w</b></font>) <span style="background-color:#ffffff"><font color="#0000ff">as</font></span> ppn2,<br /> <span style="background-color:#ffffff"><font color="#2e8b57"><b>2</b></font></span> * calculated p1_1 * (<span style="background-color:#ffffff"><font color="#2e8b57"><b>1</b></font></span> - calculated p1_1) * calculated ppn1 + <br /> <span style="background-color:#ffffff"><font color="#2e8b57"><b>2</b></font></span> * calculated p1_2 * (<span style="background-color:#ffffff"><font color="#2e8b57"><b>1</b></font></span> - 
calculated p1_2) * calculated ppn2 <span style="background-color:#ffffff"><font color="#0000ff">as</font></span> gini<br /> <span style="background-color:#ffffff"><font color="#0000ff">from</font></span><br /> _tmp1 <span style="background-color:#ffffff"><font color="#0000ff">as</font></span> a,<br /> (<span style="background-color:#ffffff"><font color="#0000ff">select</font></span> <span style="background-color:#ffffff"><font color="#0000ff">distinct</font></span> <font color="#0000ff"><b>&x</b></font> <span style="background-color:#ffffff"><font color="#0000ff">from</font></span> _tmp1) <span style="background-color:#ffffff"><font color="#0000ff">as</font></span> b<br /> <span style="background-color:#ffffff"><font color="#0000ff">group</font></span> <span style="background-color:#ffffff"><font color="#0000ff">by</font></span><br /> b.<font color="#0000ff"><b>&x</b></font>;<br /><br /> <span style="background-color:#ffffff"><font color="#0000ff">insert</font></span> <span style="background-color:#ffffff"><font color="#0000ff">into</font></span> _out<br /> <span style="background-color:#ffffff"><font color="#0000ff">select</font></span><br /> <span style="background-color:#ffffff"><font color="#a020f0">"&x"</font></span>,<br /> gt_value,<br /> gini<br /> <span style="background-color:#ffffff"><font color="#0000ff">from</font></span><br /> _tmp2<br /> <span style="background-color:#ffffff"><font color="#0000ff">having</font></span><br /> gini = <span style="background-color:#ffffff"><font color="#0000ff">min</font></span>(gini);<br /><br /> <span style="background-color:#ffffff"><font color="#0000ff">drop</font></span> <span style="background-color:#ffffff"><font color="#0000ff">table</font></span> _tmp1;<br /> <span style="background-color:#ffffff"><font color="#000080"><b>quit</b></font></span>;<br /><br /> <span style="background-color:#ffffff"><font color="#0000ff">%let</font></span> i = <span style="background-color:#ffffff"><font 
color="#0000ff">%eval</font></span>(<font color="#0000ff"><b>&i</b></font> + <span style="background-color:#ffffff"><font color="#2e8b57"><b>1</b></font></span>); <br /><span style="background-color:#ffffff"><font color="#0000ff">%end</font></span>;<br /><br /><span style="background-color:#ffffff"><font color="#000080"><b>proc sort</b></font></span> <span style="background-color:#ffffff"><font color="#000080"><b>data</b></font></span> = _out;<br /> <span style="background-color:#ffffff"><font color="#0000ff">by</font></span> gini;<br /><span style="background-color:#ffffff"><font color="#000080"><b>run</b></font></span>;<br /><br /><span style="background-color:#ffffff"><font color="#000080"><b>proc report</b></font></span> <span style="background-color:#ffffff"><font color="#000080"><b>data</b></font></span> = _out box spacing = <span style="background-color:#ffffff"><font color="#2e8b57"><b>1</b></font></span> split = <span style="background-color:#ffffff"><font color="#a020f0">"*"</font></span> nowd;<br /> column(<span style="background-color:#ffffff"><font color="#a020f0">"DECISION STUMP SUMMARY"</font></span><br /> variable gt_value gini);<br /> define variable / <span style="background-color:#ffffff"><font color="#a020f0">"VARIABLE"</font></span> width = <span style="background-color:#ffffff"><font color="#2e8b57"><b>30</b></font></span> center;<br /> define gt_value / <span style="background-color:#ffffff"><font color="#a020f0">"CUTOFF VALUE*(GREATER THAN)"</font></span> width = <span style="background-color:#ffffff"><font color="#2e8b57"><b>15</b></font></span> center;<br /> define gini / <span style="background-color:#ffffff"><font color="#a020f0">"GINI"</font></span> width = <span style="background-color:#ffffff"><font color="#2e8b57"><b>10</b></font></span> center <span style="background-color:#ffffff"><font color="#0000ff">format</font></span> = <span style="background-color:#ffffff"><font color="#2e8b57"><b>9.4</b></font></span>;<br /><span 
style="background-color:#ffffff"><font color="#000080"><b>run</b></font></span>;<br /><br /><span style="background-color:#ffffff"><font color="#0000ff">%mend</font></span> stump;<br /><br />%stump(<span style="background-color:#ffffff"><font color="#000080"><b>data</b></font></span> = example1, w = w, y = y, xlist = x1 x2 x3);<br /><br /><span style="background-color:#ffffff"><font color="#008000">/*</font></span><br /><span style="background-color:#ffffff"><font color="#008000"> -----------------------------------------------------------</font></span><br /><span style="background-color:#ffffff"><font color="#008000"> | DECISION STUMP SUMMARY |</font></span><br /><span style="background-color:#ffffff"><font color="#008000"> | CUTOFF VALUE |</font></span><br /><span style="background-color:#ffffff"><font color="#008000"> | VARIABLE (GREATER THAN) GINI |</font></span><br /><span style="background-color:#ffffff"><font color="#008000"> |---------------------------------------------------------|</font></span><br /><span style="background-color:#ffffff"><font color="#008000"> | x1 | 4.9742638 | 0.3125 |</font></span><br /><span style="background-color:#ffffff"><font color="#008000"> |---------------------------------------------------------|</font></span><br /><span style="background-color:#ffffff"><font color="#008000"> | x2 | 7.4602286 | 0.3534 |</font></span><br /><span style="background-color:#ffffff"><font color="#008000"> |---------------------------------------------------------|</font></span><br /><span style="background-color:#ffffff"><font color="#008000"> | x3 | 0.6173268 | 0.4939 |</font></span><br /><span style="background-color:#ffffff"><font color="#008000"> -----------------------------------------------------------</font></span><br /><span style="background-color:#ffffff"><font color="#008000">*/</font></span></font>
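The GINI column in the summary is the node-size-weighted Gini impurity of the two child nodes that the PROC SQL step evaluates at every distinct value c of a candidate variable x. The block below is a sketch of that criterion; the symbols p_L and p_R (the weighted event rates on each side of c, p1_1 and p1_2 in the SQL) and pi_L and pi_R (the node shares, ppn1 and ppn2) are notation introduced here for illustration and do not appear in the macro.

```latex
\begin{aligned}
p_L(c) &= \frac{\sum_{i:\,x_i \le c} w_i\, y_i}{\sum_{i:\,x_i \le c} w_i},
\qquad
p_R(c) = \frac{\sum_{i:\,x_i > c} w_i\, y_i}{\sum_{i:\,x_i > c} w_i},\\[4pt]
\operatorname{Gini}(p) &= 1 - p^2 - (1-p)^2 = 2p(1-p),\\[4pt]
G(c) &= \pi_L(c)\, 2p_L(c)\bigl(1 - p_L(c)\bigr) + \pi_R(c)\, 2p_R(c)\bigl(1 - p_R(c)\bigr),\\[4pt]
\hat{c} &= \arg\min_{c}\, G(c).
\end{aligned}
```

Observations with x > c-hat fall in the second node, which is why the report labels the cutoff "GREATER THAN". A perfectly pure split gives G = 0, while a split that leaves both nodes at p = 0.5 gives G = 0.5, so a lower GINI marks a more informative variable. In the sample output, the cutoffs for x1 (4.9742638) and x2 (7.4602286) land just below the thresholds of 5 and 7.5 used to generate y, and the noise variable x3 has the largest GINI. At the largest distinct value of x the right node is empty, so that row's gini should come out missing, and MIN() skips missing values.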
Welcome to the SAS Chinese Forum (https://mysas.net/forum/)