标题: change point models [打印本页] 作者: shiyiming 时间: 2010-10-19 21:52 标题: change point models 各位老师:
请教一个问题,就是关于change point models 这个统计模型的,中文意思应该是“变点分析”。比如说:某中疾病D,现在有一种新的指标A,A是一种连续性变量,当A<K时,疾病D的患病率较低,且不随A值的变化而变化。但当A>=K时,疾病D的患病率随连续性变量A的升高而升高,因此“K”就是连续性变量A的一个拐点,change point model 应该就是寻找拐点的一种统计模式,下面是change point model的程序。
想请教各位的是:下面的语句中有“sgplot”,我是9.1版本的不能用sgplot语句,只有9.2才能用,不知道有没有变通的办法可以在9.1中使用change point model?
期待各位老师的帮助.
不胜感激!
The MCMC Procedure
Example 52.6 Change Point Models
Consider the data set from Bacon and Watts (1971), where is the logarithm of the height of the stagnant surface layer and the covariate is the logarithm of the flow rate of water. The following statements create the data set:
title 'Change Point Model';
data stagnant;
input y x @@;
ind = _n_;
datalines;
1.12 -1.39 1.12 -1.39 0.99 -1.08 1.03 -1.08
0.92 -0.94 0.90 -0.80 0.81 -0.63 0.83 -0.63
0.65 -0.25 0.67 -0.25 0.60 -0.12 0.59 -0.12
0.51 0.01 0.44 0.11 0.43 0.11 0.43 0.11
0.33 0.25 0.30 0.25 0.25 0.34 0.24 0.34
0.13 0.44 -0.01 0.59 -0.13 0.70 -0.14 0.70
-0.30 0.85 -0.33 0.85 -0.46 0.99 -0.43 0.99
-0.65 1.19
;
A scatter plot (Output 52.6.1) shows the presence of a nonconstant slope in the data. This suggests a change point regression model (Carlin, Gelfand, and Smith; 1992). The following statements generate the scatter plot in Output 52.6.1:
proc sgplot data=stagnant;
scatter x=x y=y;
run;
Output 52.6.1 Scatter Plot of the Stagnant Data Set
Let the change point be cp. Following formulation by Spiegelhalter et al. (1996), the regression model is as follows:
You might consider the following diffuse prior distributions:
The following statements generate Output 52.6.2:
proc mcmc data=stagnant outpost=postout seed=24860 ntu=1000
nmc=20000;
ods select PostSummaries;
ods output PostSummaries=ds;
j = 1 + (x >= cp);
mu = alpha + beta[j] * (x - cp);
model y ~ normal(mu, var=s2);
run;
The PROC MCMC statement specifies the input data set (stagnant), the output data set (postout), a random number seed, a tuning sample of 1000, and an MCMC sample of 20000. The ODS SELECT statement displays only the summary statistics table. The ODS OUTPUT statement saves the summary statistics table to the data set ds.
The ARRAY statement allocates an array of size 2 for the beta parameters. You can use beta1 and beta2 as parameter names without allocating an array, but having the array makes it easier to construct the likelihood function. The two PARMS statements put the five model parameters in two blocks. The three PRIOR statements specify the prior distributions for these parameters.
The symbol j indicates the segment component of the regression. When x is less than the change point, (x >= cp) returns 0 and j is assigned the value 1; if x is greater than or equal to the change point, (x >= cp) returns 1 and j is 2. The symbol mu is the mean for the jth segment, and beta[j] changes between the two regression coefficients depending on the segment component. The MODEL statement assigns the normal model to the response variable y.
Posterior summary statistics are shown in Output 52.6.2.
Output 52.6.2 MCMC Estimates of the Change Point Regression Model
Change Point Model
You can use PROC SGPLOT to visualize the model fit. Output 52.6.3 shows the fitted regression lines over the original data. In addition, on the bottom of the plot is the kernel density of the posterior marginal distribution of cp, the change point. The kernel density plot shows the relative variability of the posterior distribution on the data plot. You can use the following statements to create the plot:
data _null_;
set ds;
call symputx(parameter, mean);
run;
data b;
missing A;
input x1 @@;
if x1 eq .A then x1 = &cp;
if _n_ <= 2 then y1 = &alpha + &beta1 * (x1 - &cp);
else y1 = &alpha + &beta2 * (x1 - &cp);
datalines;
-1.5 A 1.2
;
proc kde data=postout;
univar cp / out=m1 (drop=count);
run;
data m1;
set m1;
density = (density / 25) - 0.653;
run;
data all;
set stagnant b m1;
run;
proc sgplot data=all noautolegend;
scatter x=x y=y;
series x=x1 y=y1 / lineattrs = graphdata2;
series x=value y=density / lineattrs = graphdata1;
run;
The macro variables &alpha, &beta1, &beta2, and &cp store the posterior mean estimates from the data set ds. The data set predicted contains three predicted values, at the minimum and maximum values of x and the estimated change point &cp. These input values give you fitted values from the regression model. Data set m1 contains the kernel density estimates of the parameter cp. The density is scaled down so the curve would fit in the plot. Finally, you use PROC SGPLOT to overlay the scatter plot, regression line and kernel density plots in the same graph.
Output 52.6.3 Estimated Fit to the Stagnant Data Set
Example 44.9: Saving and Using Parameters for MCMC
This example uses the MCMC method with multiple chains as specified in Example 44.6. It saves the parameter values used for each imputation in an output data set of type EST called miest. This output data set can then be used to impute missing values in other similar input data sets. The following statements invoke the MI procedure and specify the MCMC method with multiple chains to create three imputations.
proc mi data=FitMiss seed=21355417 nimpute=6 mu0=50 10 180 ;
mcmc chain=multiple initial=em outest=miest;
var Oxygen RunTime RunPulse;
run;
The following statements list the parameters used for the imputations in Output 44.9.1. Note that the data set includes observations with _TYPE_=`SEED' containing the seed to start the next random number generator.
proc print data=miest(obs=15);
title 'Parameters for the Imputations';
run;
Output 44.9.1: OUTEST Data Set
Parameters for the Imputations
The following statements invoke the MI procedure and use the INEST= option in the MCMC statement.
proc mi data=FitMiss;
mcmc inest=miest;
var Oxygen RunTime RunPulse;
run;
Output 44.9.2: Model Information
The MI Procedure
Model Information
Data Set WORK.FITMISS
Method MCMC
INEST Data Set WORK.MIEST
Number of Imputations 6
The "Model Information" table shown in Output 44.9.2 describes the method used in the multiple imputation process. The remaining tables for the example are identical to the tables in Output 44.6.2, Output 44.6.4, Output 44.6.5, and Output 44.6.6 in Example 44.6.
×××××××××××××××××××××××××××××××××××××××
proc gplot data=gy;
plot sbp*age;
run;
quit;作者: shiyiming 时间: 2010-10-21 21:09 标题: Re: change point models 继续求助,请相助。作者: shiyiming 时间: 2010-10-22 12:13 标题: Re: change point models 在这个例子里,绘图只是把结果图示化而已。和这个模型应没有很大的关系吧。
因为是9.1,sgplot不可用。可以考虑用gplot的overlay代替。数据可以保持不变。另外,proc kde 的univar statement很可能在9.1里(或者更早)是var statement,但是9.1.3应该是一样的。在9.1.3 sas开始Bayesian 分析:proc mcmc即是其一(虽然迄今仍为实验的过程)。
这个例子里change point model适用于一个拐点的情形。如果超过一个点,恐怕在分段定义likelihood的时候不能简单照搬硬套吧。
我一直把这种方法叫piecewise linear model但有时候混淆于nonparametric spline(一阶)的叫法。如果称为change point model就要清晰一些。不过拐点的选择和确立历来就是其难点和颇有争论之处。我以前用过loess来估计(差不多是主观来决定),看样子用此例方法很不错。作者: shiyiming 时间: 2010-10-24 10:06 标题: Re: change point models to[b:dl0g5m6y] jingju11 [/b:dl0g5m6y]:
可否进一步请教你啊作者: shiyiming 时间: 2010-10-25 02:25 标题: Re: change point models Hi, that is all what I know about the model. So what do you want to know? JingJu作者: shiyiming 时间: 2010-10-30 17:39 标题: Re: change point models 问题没有解决,继续求助各位老师帮忙指导,不胜感激。作者: shiyiming 时间: 2010-11-13 14:59 标题: Re: change point models 希望高手帮帮忙啊,快一个月还没能得到解决啊