Post #7 | Original poster | Posted on 2004-12-29 04:36:18
Handout
This is from a handout for my course. I hope it is useful to you.
************************************************************;
* Hotelling T^2-test for two samples:
*
* Assume that we are given two d-dimensional VECTOR-VALUED samples
*
* X1 X2 X3 ... Xm where Xi are independent N(muX,Sigma)
*
* Y1 Y2 Y3 ... Yn where Yj are independent N(muY,Sigma)
*
* and we want to test
*
* H0:muX=muY for VECTOR-VALUED means muX and muY
*
* If d=1, we could use the classical two-sample t-test, which uses the
* statistic
*
* t = Root(mn/(m+n)) (Ybar-Xbar)/Root(s^2), where
*
* s^2 = (1/(m+n-2))(Sum_i (Xi-Xbar)^2 + Sum_j (Yj-Ybar)^2)
*
* Here s^2 is the POOLED VARIANCE of the two samples, and Ybar and
* Xbar are the two sample means. The factor mn/(m+n) comes from the
* identity Var(Ybar-Xbar)=Var(Ybar)+Var(Xbar)=((1/n)+(1/m))sigma^2,
* where (1/n)+(1/m)=(m+n)/(mn).
*
* Given H0:muX=muY and d=1, t has a Student's t distribution with
* L=m+n-2 degrees of freedom.
*
* For vector-valued data, a possible generalization of t (or t^2) is
* the quadratic form
*
* T^2 = mn/(m+n) (Ybar-Xbar)'S^{-1}(Ybar-Xbar) (*)
*
* where
*
* S = (1/(m+n-2)) SumXY for
*
* SumXY = Sum (Xi-Xbar)(Xi-Xbar)' + Sum (Yj-Ybar)(Yj-Ybar)'
*
* Here S is the pooled covariance matrix of the two samples, the
* same matrix that came up in discriminant analysis (dferns.sas).
* Note that S is a dxd matrix, while T^2 is a number. If d=1,
* T^2=t^2: the dxd matrix S replaces s^2 in the univariate case.
*
* Exactly as in the univariate case, we can rotate Xi,Yj (more
* precisely, we rotate the rows of the mxd matrix X and the rows of
* the nxd matrix Y) so that
*
* SumXY = Sum(k=1,k=L) Zk Zk' (W)
*
* where L=m+n-2 and Z1,...,ZL are independent N(0,Sigma).
*
* A random dxd matrix with the distribution (W) is said to have a
* WISHART DISTRIBUTION with parameters Sigma and L (symbolically,
* SumXY is W(Sigma,L)).
*
* Also as in the univariate case, the random vector
*
* Q = Root(mn/(m+n)) (Ybar-Xbar) is N(Root(mn/(m+n))(muY-muX),Sigma),
*
* which is N(0,Sigma) under H0, and Q is independent of the rotated
* Z1,Z2,...,ZL. In general, if Q,Z1,Z2,...,ZL are independent
* N(0,Sigma), the random variable
*
* T2 = Q'((1/L)(Sum(k=1,k=L) ZkZk'))^{-1}Q (**)
*
* is said to have a HOTELLING T2 DISTRIBUTION (T2(d,L)). If d=1,
* this is the distribution of the square of a Student t variable
* with L degrees of freedom.
*
* Since N(0,Sigma) has the same distribution as Sigma^{1/2}N(0,I_d),
* the distribution of T2 in (**) does not depend on Sigma.
* (Exercise: Prove this.)
*
* Thus, under H0:muY=muX, the statistic T^2 defined in (*) for the
* two d-dimensional samples Xi,Yj has distribution T2(d,m+n-2).
*
* Amazingly enough, if d<=L, T2(d,L) can be expressed in terms of
* F distributions:
*
* T2(d,L) has the same distribution as ((L*d)/(L-d+1)) F(d,L-d+1)
*
* This makes P-values for the Hotelling T^2 test easy to find. Note
* that if d=1 the right-hand side above reduces to F(1,L), which is
* the distribution of the square of a Student t variable with L
* degrees of freedom. If d>L, the matrix in (W) cannot be invertible
* (Exercise: prove this), so T2(d,L) is not defined when d>L.
*
* SAS NOTES FOR THE PROGRAM BELOW:
* (1) `infile' says to read from a file rather than from a datalines
*     block. Add an explicit path to `infile' if needed.
* (2) `firstobs' tells SAS to ignore the first `firstobs'-1 lines
*     of `infile', which are in fact comments in this case.
* (3) `proc format' allows you to attach descriptive text to the
* VALUES of a variable, while `label' does the same for the
* NAMES of variables.
************************************************************;
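The distributional claims above can be checked numerically. Below is a minimal Python sketch, offered as an illustration rather than part of the SAS handout. It assumes d = 2 so that the 2x2 inverse can be written out by hand. The names `simulate_t2` and `exact_tail`, the choice L = 10, and the matrix `skewed` standing in for Sigma^{1/2} are all hypothetical.

The sketch draws T2 directly from definition (**) with Sigma = root*root'. Both runs reuse the same seed, so the second run differs from the first only through the factor root. The two sets of T2 values therefore agree up to rounding, because that factor cancels in (**). This cancellation is the content of the Sigma-invariance exercise.

The sketch also compares the simulated mean and P(T2 > 5) with values implied by the F representation. For numerator df 2, the F upper tail has the closed form P(F(2,b) > f) = (1 + 2f/b)^(-b/2). Under T2(2,L) = (2L/(L-1)) F(2,L-1), this becomes P(T2(2,L) > c) = (1 + c/L)^(-(L-1)/2). Similarly, E[F(a,b)] = b/(b-2) gives E[T2(d,L)] = L*d/(L-d-1).

```python
import random

D = 2  # dimension d; the 2x2 inverse below is written out by hand


def simulate_t2(L, root, reps, seed):
    """Draw `reps` values of T2 = Q'((1/L) Sum_k Zk Zk')^{-1} Q, formula (**).

    Q, Z1, ..., ZL are iid N(0, Sigma) with Sigma = root root'; each vector is
    generated as root times a standard normal vector (Sigma^{1/2} N(0, I_d)).
    """
    rng = random.Random(seed)

    def normal_vec():
        z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        return (root[0][0] * z0 + root[0][1] * z1,
                root[1][0] * z0 + root[1][1] * z1)

    draws = []
    for _ in range(reps):
        q = normal_vec()
        a = b = c = 0.0  # entries [[a, b], [b, c]] of (1/L) Sum Zk Zk'
        for _ in range(L):
            z = normal_vec()
            a += z[0] * z[0]
            b += z[0] * z[1]
            c += z[1] * z[1]
        a, b, c = a / L, b / L, c / L
        det = a * c - b * b
        # q' [[a, b], [b, c]]^{-1} q, using the explicit 2x2 inverse
        draws.append((c * q[0] ** 2 - 2 * b * q[0] * q[1] + a * q[1] ** 2) / det)
    return draws


def exact_tail(c, L):
    """P(T2(2, L) > c), from T2(2, L) = (2L/(L-1)) F(2, L-1)."""
    return (1.0 + c / L) ** (-(L - 1) / 2)


L = 10
identity = [[1.0, 0.0], [0.0, 1.0]]
skewed = [[2.0, 0.0], [1.5, 0.5]]  # an arbitrary Sigma^{1/2}, chosen for illustration
draws_id = simulate_t2(L, identity, 20000, seed=1)
draws_sk = simulate_t2(L, skewed, 20000, seed=1)

mean_id = sum(draws_id) / len(draws_id)
tail_id = sum(t > 5.0 for t in draws_id) / len(draws_id)
print("simulated mean:", mean_id, " theory:", L * D / (L - D - 1))
print("simulated P(T2 > 5):", tail_id, " theory:", exact_tail(5.0, L))
```

The theoretical targets here are 20/7 (about 2.857) for the mean and 1.5^(-4.5) (about 0.161) for the tail. The simulated values are Monte Carlo estimates, so expect small deviations from them.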
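proc glm;                                 /* no DATA= option: analyzes the most recently created data set */
title3 'HOTELLING T^2 TEST FOR TWO LIZARD CLASSES';
class Type;                               /* Type identifies the two lizard classes */
model lnMass lnSVL = Type / nouni;        /* two responses; NOUNI suppresses the univariate ANOVAs */
manova h = Type / printe;                 /* H0: equal mean vectors; PRINTE prints the error SSCP matrix E */
run;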
OUTPUT:

MANOVA Test Criteria and Exact F Statistics
for the Hypothesis of No Overall Type Effect
H = Type III SSCP Matrix for Type
E = Error SSCP Matrix

S=1    M=0    N=27.5

Statistic                      Value   F Value   Num DF   Den DF   Pr > F
Wilks' Lambda             0.20266269    112.13        2       57   <.0001
Pillai's Trace            0.79733731    112.13        2       57   <.0001
Hotelling-Lawley Trace    3.93430737    112.13        2       57   <.0001
Roy's Greatest Root       3.93430737    112.13        2       57   <.0001
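The output above does not print T^2 itself. As a cross-check of formula (*), here is a minimal pure-Python sketch. It is neither the handout's code nor how PROC GLM computes its results. The helpers `hotelling_t2`, `pooled_t`, and `solve` and the toy data are hypothetical.

The sketch builds the pooled covariance S = SumXY/(m+n-2) and evaluates the quadratic form with a linear solve instead of forming S^{-1}. It confirms two points from the handout. First, T^2 = t^2 when d=1. Second, S is singular when d > m+n-2, in which case the solve raises ValueError. The 1e-12 pivot threshold is an arbitrary, scale-dependent choice.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    d = len(b)
    M = [list(A[i]) + [b[i]] for i in range(d)]
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:  # arbitrary singularity threshold
            raise ValueError("pooled covariance matrix S is singular")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, d):
            f = M[r][col] / M[col][col]
            for c in range(col, d + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * d
    for i in reversed(range(d)):
        x[i] = (M[i][d] - sum(M[i][j] * x[j] for j in range(i + 1, d))) / M[i][i]
    return x


def hotelling_t2(X, Y):
    """Two-sample T^2 = mn/(m+n) (Ybar-Xbar)' S^{-1} (Ybar-Xbar), formula (*)."""
    m, n, d = len(X), len(Y), len(X[0])
    xbar = [sum(r[k] for r in X) / m for k in range(d)]
    ybar = [sum(r[k] for r in Y) / n for k in range(d)]
    # SumXY = Sum (Xi-Xbar)(Xi-Xbar)' + Sum (Yj-Ybar)(Yj-Ybar)'
    sum_xy = [[0.0] * d for _ in range(d)]
    for rows, mean in ((X, xbar), (Y, ybar)):
        for r in rows:
            dev = [r[k] - mean[k] for k in range(d)]
            for i in range(d):
                for j in range(d):
                    sum_xy[i][j] += dev[i] * dev[j]
    S = [[v / (m + n - 2) for v in row] for row in sum_xy]  # pooled covariance
    diff = [ybar[k] - xbar[k] for k in range(d)]
    quad = sum(dk * wk for dk, wk in zip(diff, solve(S, diff)))
    return m * n / (m + n) * quad


def pooled_t(x, y):
    """Classical pooled two-sample t statistic (the d = 1 case)."""
    m, n = len(x), len(y)
    xbar, ybar = sum(x) / m, sum(y) / n
    s2 = (sum((v - xbar) ** 2 for v in x) + sum((v - ybar) ** 2 for v in y)) / (m + n - 2)
    return math.sqrt(m * n / (m + n)) * (ybar - xbar) / math.sqrt(s2)


# Hypothetical toy data, for illustration only.
x1, y1 = [1.0, 2.0, 4.0], [3.0, 5.0, 6.0, 8.0]
t = pooled_t(x1, y1)
t2_1d = hotelling_t2([[v] for v in x1], [[v] for v in y1])
print("t^2 =", t ** 2, " T^2 (d=1) =", t2_1d)

X2 = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [2.5, 3.0]]
Y2 = [[4.0, 4.5], [5.0, 6.0], [6.0, 5.0]]
print("T^2 (d=2) =", hotelling_t2(X2, Y2))
```

For example, hotelling_t2([[0.0, 0.0], [1.0, 2.0]], [[5.0, 5.0]]) has d = 2 > m+n-2 = 1 and raises ValueError, which matches the (W) exercise.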
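The Hotelling-Lawley trace in this output connects directly to the handout's T^2. In the two-group case, the hypothesis SSCP matrix is H = (mn/(m+n))(Ybar-Xbar)(Ybar-Xbar)' and the error SSCP matrix is E = SumXY = L*S, so trace(E^{-1}H) = T^2/L.

The short Python sketch below is an illustration, not SAS code. It recovers L from the reported Den DF = L-d+1 and converts the trace to T^2 and then to F with the handout's formula. For d = 2 only, it also computes an exact p-value from the closed-form upper tail P(F(2,b) > f) = (1 + 2f/b)^(-b/2).

```python
# Numbers copied from the PROC GLM output above.
hl_trace = 3.93430737  # Hotelling-Lawley Trace
den_df = 57            # Den DF = L - d + 1
d = 2                  # two responses: lnMass and lnSVL

L = den_df + d - 1     # L = m + n - 2 (so m + n = 60 here)
t2 = L * hl_trace      # two groups: trace(E^{-1} H) = T^2 / L
f_value = (L - d + 1) / (L * d) * t2  # T^2(d,L) = (L*d)/(L-d+1) F(d, L-d+1)

# Exact upper tail, valid only for numerator df = 2: P(F(2,b) > f) = (1 + 2f/b)^(-b/2)
b = L - d + 1
p_value = (1 + 2 * f_value / b) ** (-b / 2)

# With a single nonzero eigenvalue (S=1 in the output), Wilks' Lambda = 1/(1 + trace)
wilks = 1 / (1 + hl_trace)

print(f"L = {L}, T^2 = {t2:.4f}, F = {f_value:.2f}, p = {p_value:.2e}, Wilks = {wilks:.8f}")
```

This reproduces the F Value of 112.13 in the table (T^2 is about 228.19), and the p-value comes out far below the printed <.0001. Because S=1 here, all four criteria are functions of the same single eigenvalue, which is why they share one exact F test.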