|
|
楼主

楼主 |
发表于 2010-11-4 21:21:02
|
只看该作者
PROC MIXED: Macro to assess fixed and random effects for sig
From LCChien's blog on blogspot
原文載點:<span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px;"><a href="https://docs.google.com/viewer?url=http://analytics.ncsu.edu/sesug/2009/SC013.Bardenheier.pdf"><!-- m --><a class="postlink" href="http://analytics.ncsu.edu/sesug/2009/SC013.Bardenheier.pdf">http://analytics.ncsu.edu/sesug/2009/SC ... nheier.pdf</a><!-- m --></a></span><br /><br />若要從兩個nested mixed model挑選較佳的模式,比較專業的人會推薦使用Likelihood Ratio Test(以下簡稱LRT)來處理這類的問題。PROC MIXED程序並沒有語法來進行這個檢定,但這個檢定需要使用到的參數,卻全部都可以從PROC MIXED程序的輸出報表內找到。懂得運作的使用者可以從報表拿擷取相關參數,然後自己算一算再查個表大概就知道結果了。我自己曾經在準備博士班資格考時寫了一個簡易的macro來做LRT,但沒想到真的有人把這個觀念發表出來。我稍微看了一下這份技術文件內的macro寫法,與我當年寫的大同小異,但作者有將輸出報表格式設計的比較好看一點,所以我就來介紹一下這個方便進行LRT的macro。<br /><br /><a name='more'></a><br />假設兩個nested mixed model分別稱為reduced model和full model,通常reduced model會被設定在虛無假設H0下,full model會被設定在對立假設H1下。而整個LRT會從這兩個模式各取出兩個數據出來,一個是他們的自由度,另一個是他們-2log likelihood function的值。這兩個數據全部都可以從PROC MIXED的報表內找到。我以PROC MIXED線上手冊所附的一個範例為例,請參照這個網頁:<br /><br /><a href="http://www.tau.ac.il/cc/pages/docs/sas8/stat/chap41/sect30.htm"><!-- m --><a class="postlink" href="http://www.tau.ac.il/cc/pages/docs/sas8/stat/chap41/sect30.htm">http://www.tau.ac.il/cc/pages/docs/sas8 ... sect30.htm</a><!-- m --></a><br /><br />這兩個數據可以在這個地方找到,圖示如下(紅色箭頭處):<br /><span class="Apple-style-span" style="color: #4c4c4c; font-family: arial, helvetica, sans-serif; font-size: 12px;"><img id="imgThumbnail" src="http://rookeryiis2.aviary.com/storage/3472500/3472607_252c_rss.jpg" style="border-bottom-style: none; border-bottom-width: 0px; border-color: initial; border-left-style: none; border-left-width: 0px; border-right-style: none; border-right-width: 0px; border-top-style: none; border-top-width: 0px; border-width: initial;" /></span><br /><span class="Apple-style-span" style="color: #4c4c4c; font-family: arial, helvetica, sans-serif; font-size: 12px;"><br /></span><br />至於公式,引用 wiki 的資料:<br /><br /><span class="Apple-style-span" style="font-family: sans-serif; font-size: 13px; line-height: 19px;"><img alt="\begin{align}D & = -2(\ln(\text{likelihood for null model}) - \ln(\text{likelihood for alternative model})) \\& = -2\ln\left( \frac{\text{likelihood for null model}}{\text{likelihood for alternative model}} \right).\end{align}" class="tex" src="http://upload.wikimedia.org/math/a/c/6/ac612f42c3987cbac42333ad2c338013.png" style="border-bottom-style: none; border-color: initial; border-left-style: none; border-right-style: none; border-top-style: none; border-width: initial; vertical-align: middle;" /></span><br /><br />白話一點,就是把 reduced model 的 -2 log likelihood function 的值減掉 full model 的 -2 log likelihood function 的值,就可以算出上面那個 D。然後兩個模式的自由度數量的差異套進卡方分配裡面,就可以算出 p-value 了。如果 p-value < 0.05,就表示代表 reduced model 的 H0 被拒絕了,也就是說 full model 比 reduced model 好。反之亦然。<br /><br />這種計算其實根本不需要使用到電腦的,但懶人永遠可以找藉口,所以原文作者還是花了將近兩頁的篇幅寫了這個 macro 來服務大眾。<br /><br /><code><span class="Apple-style-span" style="color: blue;">%macro LRMixtureTest(fullmodel=,redmodel=,DFfull=,DFred=); </span><br /><span class="Apple-style-span" style="color: #38761d;">/* </span><br /><span class="Apple-style-span" style="color: #38761d;">This macro calculates the Likelihood Ratio Test for fixed effects in a </span><br /><span class="Apple-style-span" style="color: #38761d;">mixed model </span><br /><span class="Apple-style-span" style="color: #38761d;">See Singer Applied Longitudinal Data Analysis pp 116-120. </span><br /><span class="Apple-style-span" style="color: #38761d;">and also calculates the mixture method pvalue for random effects in a </span><br /><span class="Apple-style-span" style="color: #38761d;">mixed model -see KKNM chapter 26. </span><br /><span class="Apple-style-span" style="color: #38761d;">The macro argument &fullmodel refers to the output dataset produced by </span><br /><span class="Apple-style-span" style="color: #38761d;">running the full model in proc mixed and using the statement: </span><br /><span class="Apple-style-span" style="color: #38761d;"> ods output FitStatistics=fm; </span><br /><span class="Apple-style-span" style="color: #38761d;">The macro argument &DFfull refers to the output dataset produced by </span><br /><span class="Apple-style-span" style="color: #38761d;">running the full model in proc mixed and adding to the statement: </span><br /><span class="Apple-style-span" style="color: #38761d;"> ods output FitStatistics=fm SolutionF=SFfm; The number of </span><br /><span class="Apple-style-span" style="color: #38761d;">parameters being tested (ie, fixed effects) determine the degrees of </span><br /><span class="Apple-style-span" style="color: #38761d;">freedom in the LRT and come from this dataset. </span><br /><span class="Apple-style-span" style="color: #38761d;">The argument &redmodel refers to the output dataset produced by running </span><br /><span class="Apple-style-span" style="color: #38761d;">the reduced model in proc mixed using the statement: </span><br /><span class="Apple-style-span" style="color: #38761d;"> ods output FitStatistics=rm SolutionF=SFrm ; </span><br /><span class="Apple-style-span" style="color: #38761d;">Maximum Likelihood should be used when comparing fixed effects because </span><br /><span class="Apple-style-span" style="color: #38761d;">it maximizes the likelihood of the sample data. Using the Restricted </span><br /><span class="Apple-style-span" style="color: #38761d;">Maximum likelihood should only be used when assessing a random term </span><br /><span class="Apple-style-span" style="color: #38761d;">(ie, the fixed effects should be the same in both models) </span><br /><span class="Apple-style-span" style="color: #38761d;">because in RML we maximize the likelihood of the residuals. The degrees </span><br /><span class="Apple-style-span" style="color: #38761d;">of freedom would not be correct if the number of fixed effects were </span><br /><span class="Apple-style-span" style="color: #38761d;">different in each model when using RML -which is the default in PROC </span><br /><span class="Apple-style-span" style="color: #38761d;">MIXED. </span><br /><span class="Apple-style-span" style="color: #38761d;">*/ </span><br /><span class="Apple-style-span" style="color: blue;">data &fullmodel; set &fullmodel; </span><br /><span class="Apple-style-span" style="color: blue;"> if descr='-2 Res Log Likelihood' then chisqfullRML=value; </span><span class="Apple-style-span" style="color: #38761d;">*RML; </span><br /><span class="Apple-style-span" style="color: blue;"> else if descr='-2 Log Likelihood' then chisqfullFML=value; </span><span class="Apple-style-span" style="color: #38761d;">*FML; </span><br /><span class="Apple-style-span" style="color: blue;">run; </span><br /><span class="Apple-style-span" style="color: blue;">data &redmodel; set &redmodel; </span><br /><span class="Apple-style-span" style="color: blue;"> if descr='-2 Res Log Likelihood' then chisqredRML=value; </span><span class="Apple-style-span" style="color: #38761d;">*RML; </span><br /><span class="Apple-style-span" style="color: blue;"> else if descr='-2 Log Likelihood' then chisqredFML=value; </span><span class="Apple-style-span" style="color: #38761d;">*FML; </span><br /><span class="Apple-style-span" style="color: blue;">run; </span><br /><span class="Apple-style-span" style="color: blue;">proc sql; </span><br /><span class="Apple-style-span" style="color: blue;">CREATE TABLE flDF AS SELECT effect, count(*) AS fullDF from </span><br /><span class="Apple-style-span" style="color: blue;">work.&DFfull; quit; </span><span class="Apple-style-span" style="color: #38761d;">*Count number of effects to get degrees of freedom; </span><br /><br /><span class="Apple-style-span" style="color: blue;">proc sql; </span><br /><span class="Apple-style-span" style="color: blue;">create table rdDF as select effect, count(*) as redDF from </span><br /><span class="Apple-style-span" style="color: blue;">work.&DFred; quit; </span><br /><br /><span class="Apple-style-span" style="color: blue;">data degfree (drop=effect); </span><br /><span class="Apple-style-span" style="color: blue;"> merge work.flDF work.rdDF; </span><br /><span class="Apple-style-span" style="color: blue;">if _n_=1 then LRTDF= fullDF - redDF; run; </span><br /><br /><span class="Apple-style-span" style="color: blue;">data likelihood; </span><br /><span class="Apple-style-span" style="color: blue;"> merge &fullmodel &redmodel degfree; </span><br /><span class="Apple-style-span" style="color: blue;"> testintRML=abs((chisqredRML)-(chisqfullRML)); </span><span class="Apple-style-span" style="color: #38761d;">**Models can yield </span><br /><span class="Apple-style-span" style="color: #38761d;">negative LLs, those that are smaller in absolute value -ie, closer to 0 </span><br /><span class="Apple-style-span" style="color: #38761d;">fit better (pg 116-117, Singer); </span><br /><br /><span class="Apple-style-span" style="color: blue;">7 </span><br /><span class="Apple-style-span" style="color: blue;">testintFML=abs((chisqredFML)-(chisqfullFML)); </span><span class="Apple-style-span" style="color: #38761d;">**Models can yield </span><br /><span class="Apple-style-span" style="color: #38761d;">negative LLs, those that are smaller in absolute value -ie, closer to 0 </span><br /><span class="Apple-style-span" style="color: #38761d;">fit better (pg 116-117, Singer); </span><br /><span class="Apple-style-span" style="color: blue;"> pvaluemixture=(( .5*(1-probchi(testintRML,2)) + .5*(1-</span><br /><span class="Apple-style-span" style="color: blue;">probchi(testintRML,1)) )); </span><span class="Apple-style-span" style="color: #38761d;">*for random terms; </span><br /><span class="Apple-style-span" style="color: blue;"> pvalueLRT=(1-probchi(testintFML,LRTDF)); </span><span class="Apple-style-span" style="color: #38761d;">*For fixed terms; </span><br /><span class="Apple-style-span" style="color: blue;"> </span><br /><span class="Apple-style-span" style="color: blue;">proc print data=likelihood split='*' noobs; </span><br /><span class="Apple-style-span" style="color: blue;"> var testintFML pvalueLRT; </span><br /><span class="Apple-style-span" style="color: blue;"> format pvalueLRT 6.4; </span><br /><span class="Apple-style-span" style="color: blue;"> where testintFML ne .; </span><br /><span class="Apple-style-span" style="color: blue;"> label testintFML='likelihood ratio*test statistic* </span><br /><span class="Apple-style-span" style="color: blue;"> comparing*reduced model to full model' </span><br /><span class="Apple-style-span" style="color: blue;"> pvalueLRT='p-value LRT'; </span><br /><span class="Apple-style-span" style="color: blue;"> title "Likelihood Ratio Test for fixed effects"; run; </span><br /><span class="Apple-style-span" style="color: blue;"> </span><br /><span class="Apple-style-span" style="color: blue;">proc print data=likelihood split='*' noobs; </span><br /><span class="Apple-style-span" style="color: blue;"> var testintRML pvaluemixture; </span><br /><span class="Apple-style-span" style="color: blue;"> format pvaluemixture 6.4; </span><br /><span class="Apple-style-span" style="color: blue;"> where testintRML ne . ; </span><br /><span class="Apple-style-span" style="color: blue;"> label testintRML='likelihood ratio*test statistic* </span><br /><span class="Apple-style-span" style="color: blue;"> comparing*reduced model to full model' </span><br /><span class="Apple-style-span" style="color: blue;"> pvaluemixture='mixture method p-value'; </span><br /><span class="Apple-style-span" style="color: blue;"> title "Mixture method Test for random effects"; </span><br /><span class="Apple-style-span" style="color: blue;"> run; </span><br /><span class="Apple-style-span" style="color: blue;">%mend LRMixtureTest; </span></code><br /><br />這個 macro 裡面綠色的部分完全都是作者的註釋和說明,只有藍色的部分才是真正 SAS 會去執行的程式碼。其中只有四個參數要說明:<br /><br /><ul><li>fullmodel:full model 的 -2 log likelihood function 值</li><li>redmodel:reduced model 的 -2 log likelihood function 值</li><li>DFfull:full model 的自由度</li><li>DFred:reduced model 的自由度</li></ul><br />四個參數設定就和上一段的敘述相同。全部都是必要的參數,缺一不可。因此,只要呼叫 %macro LRMixtureTest 再套上四個參數值,電腦就會幫你做 LRT。我們來看原文內的一個範例。<br /><br />範例模式:<br /><br /><img alt="doodle.png" src="http://rookeryiis1.aviary.com/storage/3472500/3472730_b795_625x625.jpg" /><br /><br />這模式只使用一個因變數叫做 PCTBLK,但他同時是固定效應也是隨機效應。假設這是個 full model,而 reduced model 就設定為拿掉 PCTBLK 固定效應的模式,也就是拿掉上面劃紅線的那一段。<br /><br />因此,先跑兩個程式把那四個要用到的參數算出來:<br /><br /><code><span class="Apple-style-span" style="color: blue;">%include '\\cdc\private\mixture method pvalue macro1.sas';<br />proc mixed data=bhb7.allUS2levge10blk method=ml covtest empirical <br />noclprint ;<br />class state_id;<br />model diffcov=pctblk/ ddfm=contain s;<br />random intercept pctblk /subject=state_id type=ar(1) s ;<br />ods output FitStatistics=fm SolutionF=SFfm ;<br />run;<br /><br />proc mixed data=bhb7.allUS2levge10blk method=ml covtest empirical<br />noclprint;<br />class state_id;<br />model diffcov=/ddfm=contain s;<br />random intercept pctblk /subject=state_id type=ar(1) s;<br />ods output FitStatistics=rm SolutionF=SFrm ;<br />run;</span> </code><br /><br />跑完後,兩個模式的 -2 log likelihood function 值和自由度會被 ods output 輸出到不同的資料裡面,接著使用那個 macro 去把從這四個資料裡面抓數值出來算:<br /><br /><code><span class="Apple-style-span" style="color: blue;">%LRMixtureTest(fullmodel=fm,redmodel=rm,DFfull=SFfm,DFred=SFrm); </span></code><br /><br />最後這個 macro 就會輸出下面這個報表給你:<br /><span class="Apple-style-span" style="color: #4c4c4c; font-family: arial, helvetica, sans-serif; font-size: 12px;"><img id="imgThumbnail" src="http://rookeryiis1.aviary.com/storage/3472500/3472768_226b_rss.jpg" style="border-bottom-style: none; border-bottom-width: 0px; border-color: initial; border-left-style: none; border-left-width: 0px; border-right-style: none; border-right-width: 0px; border-top-style: none; border-top-width: 0px; border-width: initial;" /></span><br /><br />結果顯示 p-value < 0.05,所以 full model 比 reduced model 好。<br /><br />從這個範例中,特別要強調一點是,如果兩個 nested mixed model 差異只有在固定效應時,一開始那兩個 PROC MIXED 執行時,後面的 method option 必須強制設定為 ML,因為如果沒有特別去設定的話,PROC MIXED 預設的參數估計方法是 REML,但這只有在兩個模式的差異是在隨機效應或者是 covariance structure 時才能使用。原文中的第一個範例就是用 REML 再估計參數,因為兩個模式的差別只有在一個 random intercept 而已。<br /><br /><br /><b><span class="Apple-style-span" style="font-size: x-large;">Contact Information: </span></b><br />Barbara Bardenheier, MPH, MA<br />Centers for Disease Control and Prevention<br />1600 Clifton Rd, MS E-52<br />Atlanta, GA 30333<br />Tel : (404) 639-8789<br />Fax : (404) 639-8614<br />Email : <!-- e --><a href="mailto:bfb7@cdc.gov">bfb7@cdc.gov</a><!-- e --><div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6268919072942670865-6835902750623542080?l=sugiclub.blogspot.com' alt='' /></div> |
|