SAS Chinese Forum

A SAS MACRO FOR DECISION STUMP

Posted on 2010-10-22 13:24:03

From Wensui Liu's blog

/*************************************************************
 * A SAS MACRO FOR DECISION STUMP SIMILAR TO %SPLIT() MACRO  *
 * IN "Pharmaceutical Statistics Using SAS" by Dmitrienko,   *
 * Chuang-Stein, AND D'Agostino                              *
 * --------------------------------------------------------- *
 * GENERAL CONCEPT:                                          *
 * 1. DECISION STUMP IS A NAIVELY SIMPLE 1-LEVEL DECISION    *
 *    TREE WITH TWO TERMINAL NODES                           *
 * 2. COMMONLY USED AS COMPONENTS IN BAGGING / BOOSTING      *
 *    MACHINE LEARNING ENSEMBLE ALGORITHMS                   *
 * --------------------------------------------------------- *
 * PRACTICAL USAGES:                                         *
 * 1. VARIABLE PRE-SCREENING BEFORE MODEL DEVELOPMENT        *
 * 2. CUTOFF POINT SEARCH FOR SCORECARD STRATEGIES           *
 * --------------------------------------------------------- *
 * AUTHOR: wensliu@paypal.com                                *
 *************************************************************/
/* The banner is a block comment because a "* ... ;" comment     */
/* statement would pass %SPLIT() to the macro processor, and the */
/* single quote in the author list would open an unclosed string. */

data example1;
  do i = 1 to 1000;
    * x1: key driver with a single cutoff at 5 *;
    x1 = 10 * ranuni(1);
    * x2: related to y through two cutoffs, 1.5 when x1 < 5 and 7.5 when x1 >= 5 *;
    x2 = 10 * ranuni(2);
    * x3: unrelated to y *;
    x3 = 10 * ranuni(3);
    if (x1 <  5 and x2 <  1.5) then y = 0;
    if (x1 <  5 and x2 >= 1.5) then y = 1;
    if (x1 >= 5 and x2 <  7.5) then y = 0;
    if (x1 >= 5 and x2 >= 7.5) then y = 1;
    w = 1;
    output;
  end;
run;

/* data  = input data set                                   */
/* w     = weight variable (set to 1 for unweighted data)   */
/* y     = binary target coded 0/1 (other values dropped)   */
/* xlist = space-separated list of numeric candidate inputs */
%macro stump(data = , w = , y = , xlist = );
/* declare the loop variables local BEFORE assigning them, so a  */
/* global &i or &x of the same name is never overwritten         */
%local i x;
%let i = 1;

proc sql;
create table _out
  (
  variable   char(32),
  gt_value   num,
  gini       num
  );
quit;

%do %while (%scan(&xlist, &i) ne %str());
  %let x = %scan(&xlist, &i);

  data _tmp1(keep = &w &y &x);
    set &data;
    where &y in (0, 1);
  run;

  proc sql;
    /* One row per candidate cutoff (each distinct value of &x):    */
    /*   p1_1, p1_2 = weighted share of y = 1 in the x <= cutoff    */
    /*                and x > cutoff nodes                          */
    /*   ppn1, ppn2 = share of the total weight in each node, so    */
    /*                the criterion stays consistent for non-unit   */
    /*                weights (identical to row counts when w = 1)  */
    /*   gini       = node Gini impurities weighted by node share   */
    /* At the largest value the x > cutoff node is empty, so p1_2   */
    /* and gini are missing and that cutoff is never selected.      */
    create table
      _tmp2 as
    select
      b.&x                                                          as gt_value,
      sum(case when a.&x <= b.&x then &w * &y else 0 end) /
      sum(case when a.&x <= b.&x then &w else 0 end)                as p1_1,
      sum(case when a.&x >  b.&x then &w * &y else 0 end) /
      sum(case when a.&x >  b.&x then &w else 0 end)                as p1_2,
      sum(case when a.&x <= b.&x then &w else 0 end) / sum(&w)      as ppn1,
      sum(case when a.&x >  b.&x then &w else 0 end) / sum(&w)      as ppn2,
      2 * calculated p1_1 * (1 - calculated p1_1) * calculated ppn1 +
      2 * calculated p1_2 * (1 - calculated p1_2) * calculated ppn2 as gini
    from
      _tmp1 as a,
      (select distinct &x from _tmp1) as b
    group by
      b.&x;

    /* keep the cutoff(s) with the smallest Gini for this variable */
    insert into _out
    select
      "&x",
      gt_value,
      gini
    from
      _tmp2
    having
      gini = min(gini);

    drop table _tmp1, _tmp2;
  quit;

  %let i = %eval(&i + 1);
%end;

proc sort data = _out;
  by gini;
run;

proc report data = _out box spacing = 1 split = "*" nowd;
  column("DECISION STUMP SUMMARY"
         variable gt_value gini);
  define variable / "VARIABLE"                     width = 30 center;
  define gt_value / "CUTOFF VALUE*(GREATER THAN)"  width = 15 center;
  define gini     / "GINI"                         width = 10 center format = 9.4;
run;

%mend stump;

%stump(data = example1, w = w, y = y, xlist = x1 x2 x3);

/*
 -----------------------------------------------------------
 |                 DECISION STUMP SUMMARY                  |
 |                                CUTOFF VALUE             |
 |           VARIABLE            (GREATER THAN)     GINI   |
 |---------------------------------------------------------|
 |              x1              |   4.9742638   |   0.3125 |
 |---------------------------------------------------------|
 |              x2              |   7.4602286   |   0.3534 |
 |---------------------------------------------------------|
 |              x3              |   0.6173268   |   0.4939 |
 -----------------------------------------------------------
*/
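For reference, the split criterion built in _tmp2 can be written out as follows. The symbols L, R, p and π are introduced here for exposition only; they do not appear in the original code.

For a candidate cutoff c:
- L(c) and R(c) are the observations with x ≤ c and x > c.
- p_L and p_R correspond to p1_1 and p1_2.
- π_L and π_R correspond to the node shares ppn1 and ppn2. They are written here with weights; when every w_k = 1, as in example1, they are plain fractions of observations.

The stump keeps the c with the smallest G(c) and reports it as the "greater than" cutoff. The last line plugs in the population values implied by the example1 design for x1 at c = 5:
- P(y = 1 | x1 < 5) = P(x2 ≥ 1.5) = 0.85
- P(y = 1 | x1 ≥ 5) = P(x2 ≥ 7.5) = 0.25

The result is close to the sample value of 0.3125 reported for x1.

```latex
\begin{aligned}
L(c) &= \{\,k : x_k \le c\,\}, \qquad R(c) = \{\,k : x_k > c\,\},\\[4pt]
p_L(c) &= \frac{\sum_{k \in L(c)} w_k y_k}{\sum_{k \in L(c)} w_k}, \qquad
p_R(c) = \frac{\sum_{k \in R(c)} w_k y_k}{\sum_{k \in R(c)} w_k},\\[4pt]
\pi_L(c) &= \frac{\sum_{k \in L(c)} w_k}{\sum_{k} w_k}, \qquad
\pi_R(c) = 1 - \pi_L(c),\\[4pt]
G(c) &= 2\,p_L(c)\bigl(1 - p_L(c)\bigr)\,\pi_L(c) + 2\,p_R(c)\bigl(1 - p_R(c)\bigr)\,\pi_R(c),\\[4pt]
G_{x_1}(5) &= 2(0.85)(0.15)(0.5) + 2(0.25)(0.75)(0.5) = 0.1275 + 0.1875 = 0.315.
\end{aligned}
```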
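To cross-check the logic outside SAS, here is a minimal Python sketch of the same search. It is not the author's code, and the function names best_split, stump and make_example are hypothetical.

What it does:
- For each variable, it tries every distinct value as a cutoff for the rule "x > cutoff".
- It scores each cutoff by the weighted Gini and keeps the minimum.
- It sorts the variables by that minimum.

How it differs from the macro:
- **Algorithm:** instead of the SQL self-join, which compares every row with every distinct value (quadratic in the number of rows), it sorts once and accumulates weighted totals.
- **Ties:** on tied Gini values it keeps the smallest cutoff, whereas the HAVING clause keeps every tied row.
- **Random draws:** Python's random module is assumed as a stand-in for SAS RANUNI, so the simulated draws will differ. The exact cutoffs and Gini values will therefore not match the SAS table.

```python
import random
from typing import Dict, List, Sequence, Tuple


def best_split(x: Sequence[float], y: Sequence[int],
               w: Sequence[float]) -> Tuple[float, float]:
    """Return (cutoff, gini) for the rule "x > cutoff" with the smallest Gini.

    Candidate cutoffs are the distinct values of x; the largest value is
    skipped because it leaves the right-hand node empty. Ties in Gini keep
    the smallest cutoff. Returns (nan, inf) if x has one distinct value.
    """
    n = len(x)
    order = sorted(range(n), key=lambda k: x[k])
    total_w = float(sum(w))
    total_wy = float(sum(wk * yk for wk, yk in zip(w, y)))
    best = (float("nan"), float("inf"))
    left_w = left_wy = 0.0
    for pos, k in enumerate(order):
        left_w += w[k]
        left_wy += w[k] * y[k]
        # Evaluate a cutoff only once every tied x value is on the left,
        # and never at the largest value (empty right-hand node).
        if pos + 1 == n or x[order[pos + 1]] == x[k]:
            continue
        right_w = total_w - left_w
        if left_w <= 0 or right_w <= 0:
            continue  # a zero-weight node has no defined class share
        p_left = left_wy / left_w
        p_right = (total_wy - left_wy) / right_w
        # node impurities weighted by each node's share of the total weight
        gini = (2 * p_left * (1 - p_left) * left_w
                + 2 * p_right * (1 - p_right) * right_w) / total_w
        if gini < best[1]:
            best = (x[k], gini)
    return best


def stump(xs: Dict[str, Sequence[float]], y: Sequence[int],
          w: Sequence[float]) -> List[Tuple[str, float, float]]:
    """One (variable, cutoff, gini) row per variable, sorted by Gini."""
    rows = [(name, *best_split(x, y, w)) for name, x in xs.items()]
    return sorted(rows, key=lambda row: row[2])


def make_example(n: int = 5000, seed: int = 1):
    """Simulated data following the example1 rules (not the same draws)."""
    rng = random.Random(seed)
    x1 = [10 * rng.random() for _ in range(n)]
    x2 = [10 * rng.random() for _ in range(n)]
    x3 = [10 * rng.random() for _ in range(n)]
    # y = 1 iff x2 >= 1.5 when x1 < 5, and iff x2 >= 7.5 when x1 >= 5
    y = [int(b >= 1.5) if a < 5 else int(b >= 7.5) for a, b in zip(x1, x2)]
    return {"x1": x1, "x2": x2, "x3": x3}, y, [1.0] * n


if __name__ == "__main__":
    data, y, w = make_example()
    for name, cutoff, gini in stump(data, y, w):
        print(f"{name:>4}   cutoff (greater than) = {cutoff:.4f}   gini = {gini:.4f}")
```

With 5,000 simulated rows, x1 should again rank first with a cutoff near 5, followed by x2 and then x3. The sample size is larger than the 1,000 in example1 only to make the ranking less sensitive to sampling noise.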