8#

Original poster
Posted on 2010-10-27 18:12:18
Re: Optimizing the sort of a large number of observations?
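I was wrong, it's the TAGSORT option. My head was fuzzy and all I remembered was the "tag" part, heh, sorry about that. On a PC, sorting a data set with 3 billion observations took me only about 60 minutes. Each record was roughly 100 bytes long, with 4 numeric variables and 3 character variables, and most of that length came from the character variables. My guess is that your records are very long, which makes them a good fit for TAGSORT.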
The TAGSORT option in the PROC SORT statement is useful in sorts when there may not be enough disk space to sort a large SAS data set. When you specify TAGSORT, the sort is a single-threaded sort. Do not specify TAGSORT if you want SAS to use multiple threads to sort.
When you specify the TAGSORT option, only sort keys (that is, the variables specified in the BY statement) and the observation number for each observation are stored in the temporary files. The sort keys, together with the observation number, are referred to as tags. At the completion of the sorting process, the tags are used to retrieve the records from the input data set in sorted order. Thus, in cases where the total number of bytes of the sort keys is small compared with the length of the record, temporary disk use is reduced considerably. You should have enough disk space to hold another copy of the data (the output data set) or two copies of the tags, whichever is greater. Note that while using the TAGSORT option may reduce temporary disk use, the processing time may be much higher. However, on PCs with limited available disk space, the TAGSORT option may allow sorts to be performed in situations where they would otherwise not be possible.
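For reference, a minimal sketch of what the call might look like. The data set names (`work.big`, `work.big_sorted`) and the BY variables (`id`, `visit_date`) are made up for illustration, so substitute your own; only the TAGSORT option itself comes from the documentation quoted above.

```sas
/* Hypothetical names: work.big is the large input data set,      */
/* id and visit_date are the sort keys. With TAGSORT, only these  */
/* keys plus the observation number go into the temporary files.  */
proc sort data=work.big out=work.big_sorted tagsort;
    by id visit_date;
run;
```

The shorter the BY variables are relative to the full record, the more temporary disk space this should save. The trade-off is that the sort runs single-threaded and may take considerably longer.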