Posted on 2011-3-28 10:33:51
Ten ways to build a wrong scoring model (repost)
From supersasmacro's blog on Sina
<div><br /></DIV>
<div>Some ways to build a wrong scoring model are listed below. The
author offers no guarantee if your modeling team is using one of
these and still getting a correct model.</DIV>
<div><br /></DIV>
<div>1) Overfit the model to the sample. Overfitting can be
checked by drawing a fresh random sample, applying the scoring
equation to it, and comparing predicted conversion rates against actual
conversion rates. An overfit model does not rank-order: deciles
with lower average probability may show equal or more conversions
than deciles with higher probability scores.</DIV>
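<div><br /></DIV>
<div>The check in item 1 can be sketched roughly as follows. This is
only an illustration, not the original author's code: the function
names decile_table and rank_order_breaks are made up here, and it
assumes you already have predicted probabilities and 0/1 conversion
flags for a fresh sample.</DIV>

```python
def decile_table(scores, outcomes, n_bins=10):
    """Sort records by score (highest first), cut them into n_bins groups,
    and return (mean predicted probability, actual conversion rate) per group."""
    ranked = sorted(zip(scores, outcomes), key=lambda r: r[0], reverse=True)
    n = len(ranked)
    table = []
    for b in range(n_bins):
        chunk = ranked[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # fewer records than bins
        mean_pred = sum(s for s, _ in chunk) / len(chunk)
        actual = sum(y for _, y in chunk) / len(chunk)
        table.append((mean_pred, actual))
    return table


def rank_order_breaks(table):
    """Return the positions of groups whose actual conversion rate is equal to
    or higher than that of the group scored just above them."""
    return [i for i in range(1, len(table)) if table[i][1] >= table[i - 1][1]]
```

<div>An empty list from rank_order_breaks means the model rank-orders
on this sample. Ties are counted as breaks here, following the "equal or
more" wording above, so small samples with many empty deciles will be
flagged too.</DIV>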
<div><br /></DIV>
<div>2) Choose non-random samples for building and validating the
scoring equation. See overfitting above.</DIV>
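<div><br /></DIV>
<div>To avoid item 2, the build and validation samples should come from
a random split. A minimal sketch, assuming the records fit in memory;
the function name, the 30% validation share, and the seed are arbitrary
choices made for this example.</DIV>

```python
import random


def split_build_validate(records, validate_fraction=0.3, seed=42):
    """Shuffle the records with a fixed seed and split them into a build
    sample and a validation sample."""
    rng = random.Random(seed)  # private generator, so the split is reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_validate = round(len(shuffled) * validate_fraction)
    cut = len(shuffled) - n_validate
    return shuffled[:cut], shuffled[cut:]
```

<div>Fixing the seed keeps the split repeatable, which helps when the
model is refit later and the results need to be compared.</DIV>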
<div><br /></DIV>
<div>3) Use multicollinearity
(<!-- m --><a class="postlink" href="http://en.wikipedia.org/wiki/Multicollinearity">http://en.wikipedia.org/wiki/Multicollinearity</a><!-- m --> ) without business
judgment to remove variables that may make business sense. This usually
happens a few years after you studied and forgot
multicollinearity.</DIV>
<div><br /></DIV>
<div>If you don't know the difference between multicollinearity and
heteroskedasticity (<!-- m --><a class="postlink" href="http://en.wikipedia.org/wiki/Heteroskedasticity">http://en.wikipedia.org/wiki/Heteroskedasticity</a><!-- m --> ),
this could be the real deal breaker for you.</DIV>
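<div><br /></DIV>
<div>One simple way to keep business judgment in the loop for item 3 is
to flag highly correlated predictor pairs for review instead of dropping
variables automatically. The sketch below uses pairwise Pearson
correlation, which is a simpler diagnostic than a full multicollinearity
analysis and will miss collinearity that involves three or more
variables. The function names and the 0.8 threshold are assumptions made
for this example.</DIV>

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists.
    Assumes neither list is constant (otherwise the denominator is zero)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair of predictors whose absolute
    correlation reaches the threshold. Nothing is removed; the list is
    meant for an analyst to review."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged
```

<div>Which variable of a flagged pair stays in the model is then a
business decision, not something the code decides.</DIV>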
<div><br /></DIV>
<div>4) Using legacy code to run scoring, usually with stepwise
forward and backward regression. This usually happens
on Fridays and when in a hurry to build models.</DIV>
<div><br /></DIV>
<div>5) Ignoring the signs or magnitudes of parameter estimates (that is,
the output, or the weight of each variable in the equation).</DIV>
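<div><br /></DIV>
<div>A sign check for item 5 can be as small as comparing each estimate
with the direction the business expects. This is a hypothetical sketch:
the function name and the variable names in the usage note are invented
for illustration.</DIV>

```python
def sign_mismatches(estimates, expected_signs):
    """Return (variable, estimate) pairs whose sign contradicts expectations.

    estimates: dict of variable name -> fitted parameter estimate
    expected_signs: dict of variable name -> +1 or -1
    Variables missing from estimates are skipped."""
    mismatches = []
    for name, expected in expected_signs.items():
        coef = estimates.get(name)
        if coef is None:
            continue
        if coef * expected < 0:  # opposite signs multiply to a negative number
            mismatches.append((name, coef))
    return mismatches
```

<div>For example, if prior_purchases is expected to raise the conversion
probability but its estimate is negative, it is returned for a closer
look. Magnitudes still need a manual review.</DIV>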
<div><br /></DIV>
<div>6) Not knowing the difference between Type 1 and Type 2 errors,
especially when rejecting variables based on p-value. (Not knowing
what a p-value is means you may kindly stop reading and click the YouTube
video in the right margin.)</DIV>
<div><br /></DIV>
<div>7) Excessive zeal in removing variables. Why? Ask yourself
this question every time you remove a variable.</DIV>
<div><br /></DIV>
<div>8) Using the wrong causal event (like mailings
for loans) to predict the future with a scoring model (for
mailings of deposit accounts), or using the right causal event in
the wrong environment (a rapid decline or rise in sales due to factors
not present in the model, like competitor entry or going out of business,
oil prices, credit shocks, sob sob sigh).</DIV>
<div><br /></DIV>
<div>9) Overfitting</DIV>
<div><br /></DIV>
<div>10) Learning about creating models from blogs and not
reading and refreshing your old statistics
textbooks.</DIV>