proc sort data=tem.ex out=tem.ex;
by num2 num1;
run;
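With 100 million observations this runs without problems; with 200 million the error below appears. According to the attached log, I need to free up more space on drive C.
The real issue is that this needs to be optimized: besides the "No disk space" error, the sort takes a very long time. Note also that tem.ex contains duplicate records, and they must be kept.
What should I do to save time and space? Would a hash work?
Log attached: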
ERROR: No disk space is available for the write operation. Filename = C:\DOCUME~1\sss\LOCALS~1\Temp\SAS Temporary
Files\SAS_util000100000F7C_sxlion\ut0F7C000004.utl.
ERROR: Failure while attempting to write page 3390 of sorted run 4.
ERROR: Failure while attempting to write page 20511 to utility file 2.
ERROR: Failure merging sorted runs from utility file 1 to utility file 2 during merge pass 1.
ERROR: Failure encountered during external sort.
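ERROR: Sort execution failure.

Author: shiyiming  Time: 2010-8-10 21:15  Title: Re: Optimizing a sort with many observations?
Two approaches:
1. Segmenting: split the data into several segments, then recombine them.
2. Avoid producing this kind of time-consuming, disk-devouring monster data set in the first place; nip it in the bud.

Author: shiyiming  Time: 2010-10-22 23:00  Title: Re: Optimizing a sort with many observations?
to sxlion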
Do you have many variables? Use the NOTAG option,
or use an INDEX.

Author: shiyiming  Time: 2010-10-23 21:24  Title: Re: Optimizing a sort with many observations?
INDEX
probably won't work if there are duplicate observations.

Author: Qiong  Time: 2010-10-26 17:08  Title: Re: Optimizing a sort with many observations?
If you are running on a local PC, try memlib:
http://support.sas.com/resources/papers/proceedings10/070-2010.pdf

Author: shiyiming  Time: 2010-10-27 10:06  Title: Re: Optimizing a sort with many observations?
to oloolo
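Could you explain in more detail how these two are used and what they do? Thanks!

Author: shiyiming  Time: 2010-10-27 15:27  Title: Re: Optimizing a sort with many observations?
Thank you all for the replies and the attention. At the time I worked around this problem with method 1 from the second post, but that is surely not the final solution; SAS must have a trick for this.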
The TAGSORT option in the PROC SORT statement is useful when there may not be enough disk space to sort a large SAS data set. When you specify TAGSORT, the sort is a single-threaded sort. Do not specify TAGSORT if you want SAS to use multiple threads to sort.
When you specify the TAGSORT option, only sort keys (that is, the variables specified in the BY statement) and the observation number for each observation are stored in the temporary files. The sort keys, together with the observation number, are referred to as tags. At the completion of the sorting process, the tags are used to retrieve the records from the input data set in sorted order. Thus, in cases where the total number of bytes of the sort keys is small compared with the length of the record, temporary disk use is reduced considerably. You should have enough disk space to hold another copy of the data (the output data set) or two copies of the tags, whichever is greater. Note that while using the TAGSORT option may reduce temporary disk use, the processing time may be much higher. However, on PCs with limited available disk space, the TAGSORT option may allow sorts to be performed in situations where they would otherwise not be possible.
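Applied to the sort from the first post, the TAGSORT option described above might look like the sketch below. It reuses the data set and BY variables from that post and has not been run against the 200-million-observation data. Whether it actually avoids the disk-space error there depends on how small num2 and num1 are relative to the full record, which the thread does not say.

```sas
/* Sketch only: the PROC SORT from the first post with TAGSORT added.   */
/* Only the BY variables (num2, num1) and the observation number are     */
/* written to the temporary utility files, so temporary disk use drops   */
/* when the keys are short compared with the full record.                */
/* No NODUPKEY or NODUPRECS option is used, so duplicates are kept.      */
proc sort data=tem.ex out=tem.ex tagsort;
    by num2 num1;
run;
```

As the quoted passage notes, this trades time for space: the sort is single-threaded and may run considerably longer, so it addresses the disk-space error rather than the run time.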