Title: SAS dataset declassified by Matt Shotwell  Author: shiyiming  Time: 2011-7-28 06:09  From Dapangmao's blog on sas-analysis
<div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-N3A2XRONlzA/TjCD3WGLZLI/AAAAAAAAAtA/g6RPaP071N8/s1600/Presentation1.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="http://3.bp.blogspot.com/-N3A2XRONlzA/TjCD3WGLZLI/AAAAAAAAAtA/g6RPaP071N8/s400/Presentation1.jpg" width="400" /></a></div><br />
<a href="http://biostat.mc.vanderbilt.edu/wiki/main/MattShotwell">Matt Shotwell</a>’s new R package ‘<a href="http://biostatmatt.com/archives/1468">sas7bdat</a>’ is a great achievement in bridging SAS and R. Earlier this year Revolution R, a commercial competitor to SAS, launched an <a href="http://www.revolutionanalytics.com/news-events/news-room/2011/revolution-analytics-unlocks-sas-data-for-r.php">RxSasData() function</a> to read SAS’s proprietary ‘dataset’ file format. However, we prefer the free lunch provided by community R. <br />
<br />
Now R has free access to SAS datasets, including many of SAS’s own help datasets. With R we can do a lot of tricks with SAS datasets, in areas that SAS can’t reach or for which we haven’t paid the licenses. For example, SAS ships a <a href="http://support.sas.com/documentation/cdl/en/graphref/62458/HTML/default/viewer.htm#p164nq1v7qwmp9n19ac0wbpc4kyh.htm">SASHELP.LAKE</a> dataset to demonstrate its surface plot feature. We can read it directly in R and draw a picture that combines a contour plot and a surface plot.<br />
<br />
<pre style="background-color: #ebebeb; border: 1px dashed rgb(153, 153, 153); color: #000001; font-size: 14px; line-height: 14px; overflow: auto; padding: 5px; width: 100%;"><code>
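# library() attaches one package per call, so load the two separately
library(sas7bdat)
library(lattice)
# Read SASHELP.LAKE straight from the SAS installation directory
x = read.sas7bdat('c:/program files/sas/sasfoundation/9.2/graph/sashelp/lake.sas7bda')
# Panel function for wireframe(): draw the surface, then project
# contour lines onto the top face of the bounding box
panel.3d.contour <-
    function(x, y, z, rot.mat, distance,
             nlevels = 20, zlim.scaled, ...){
    add.line <- trellis.par.get("add.line")
    # Draw the usual wireframe surface first
    panel.3dwire(x, y, z, rot.mat, distance,
                 zlim.scaled = zlim.scaled, ...)
    # Compute contour lines from the same grid
    clines <-
        contourLines(x, y, matrix(z, nrow = length(x), byrow = TRUE),
                     nlevels = nlevels)
    # Transform each contour to the 3-D view at the upper z limit and draw it
    for (ll in clines) {
        m <- ltransform3dto3d(rbind(ll$x, ll$y, zlim.scaled[2]),
                              rot.mat, distance)
        panel.lines(m[1,], m[2,], col = add.line$col,
                    lty = add.line$lty, lwd = add.line$lwd)
    }
}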
</code></pre><br />
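The listing above only defines the panel function; the call that actually draws the figure is not shown. As a minimal sketch, the block below repeats the panel function so that it runs on its own and applies it to R's built-in volcano matrix, which stands in for the lake data here (an assumption made so the example needs no SAS file). The wireframe() arguments are illustrative choices, not necessarily the settings used for the picture at the top. For the data frame returned by read.sas7bdat(), a formula such as Depth ~ Width * Length would probably be the starting point, assuming those are the column names in SASHELP.LAKE.

```r
library(lattice)

# Same panel function as in the listing above, repeated so this block is
# self-contained: wireframe surface plus contours on top of the box.
panel.3d.contour <- function(x, y, z, rot.mat, distance,
                             nlevels = 20, zlim.scaled, ...) {
  add.line <- trellis.par.get("add.line")
  panel.3dwire(x, y, z, rot.mat, distance,
               zlim.scaled = zlim.scaled, ...)
  clines <- contourLines(x, y,
                         matrix(z, nrow = length(x), byrow = TRUE),
                         nlevels = nlevels)
  for (ll in clines) {
    m <- ltransform3dto3d(rbind(ll$x, ll$y, zlim.scaled[2]),
                          rot.mat, distance)
    panel.lines(m[1, ], m[2, ], col = add.line$col,
                lty = add.line$lty, lwd = add.line$lwd)
  }
}

# Stand-in data: the built-in volcano elevation matrix, NOT SASHELP.LAKE.
# The custom panel function is looked up by name, so it must be defined
# in the workspace before the plot is printed.
p <- wireframe(volcano, zlim = c(90, 250), nlevels = 10,
               aspect = c(61/87, 0.3), panel.aspect = 0.6,
               panel.3d.wireframe = "panel.3d.contour", shade = TRUE,
               screen = list(z = 20, x = -60))
print(p)
```

wireframe() returns a trellis object, so nothing is drawn until it is printed; keeping it in p makes it easy to redraw or save the figure later.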
I also ran a little test to evaluate the speed of the read.sas7bdat() function. Reading the SAS dataset SASHELP.LAKE 30 times took only 1.64 seconds on my 3-year-old desktop, which is certainly much faster than converting it to a CSV file and reading that in. <br />
<pre style="background-color: #ebebeb; border: 1px dashed rgb(153, 153, 153); color: #000001; font-size: 14px; line-height: 14px; overflow: auto; padding: 5px; width: 100%;"><code>
library('sas7bdat')
# Time n consecutive reads of the same SAS dataset
test <- function(n = 30) {
    system.time(
        for(i in 1:n)
            read.sas7bdat('c:/program files/sas/sasfoundation/9.2/graph/sashelp/lake.sas7bda')
    )
}
test()
#    user  system elapsed
#    1.60    0.05    1.64
</code></pre>I hope Matt continues to improve this wonderful package:<br />
1. add support for SAS datasets generated on 64-bit systems;<br />
2. add a write.sas7bdat() function (that will be so cool!).<br />
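The comparison with a CSV round trip in the speed test above can be checked with the same system.time() pattern. The sketch below is one way to do it, under stated assumptions: time_reads() is a hypothetical helper, and the CSV file is a temporary one written from the built-in volcano matrix, not a real export of SASHELP.LAKE, so the numbers it prints say nothing about the lake data itself.

```r
# Hypothetical helper: time n consecutive calls of reader(path).
# reader can be any one-argument reading function, e.g. read.csv.
time_reads <- function(reader, path, n = 30) {
  system.time(
    for (i in seq_len(n)) reader(path)
  )
}

# Stand-in CSV file (an assumption for illustration only)
csv_path <- tempfile(fileext = ".csv")
write.csv(as.data.frame(volcano), csv_path, row.names = FALSE)

csv_timing <- time_reads(read.csv, csv_path)
print(csv_timing[["elapsed"]])  # elapsed seconds for 30 reads

# Clean up the temporary file
unlink(csv_path)
```

With the sas7bdat package loaded, passing read.sas7bdat and the path of a SAS dataset to the same helper gives a figure that can be set against the CSV timing, provided both files hold the same data.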