a) There is a large file with an ID-Key field and a TYPE field, both of character type, plus a number of extra fields, all of them numeric. In each record, the TYPE field holds the name of exactly one of those numeric fields.
You are asked to do the following:
For each record, output only ID-Key, TYPE, and the numeric field whose name is the value of TYPE, writing records for different numeric fields to different files.
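Below is a minimal SAS sketch of 1a, in the macro plus PROC SQL style that the replies suggest. It makes several assumptions:

- The file is the SAS data set Q (the name used in a reply further down).
- ID-Key is stored as ID_KEY, because a hyphen is not valid in a SAS variable name.
- Every TYPE value is a valid SAS name.

The sample data, the macro name SPLIT_BY_TYPE and the choice to name each output data set after its TYPE value are all hypothetical. The input is read only once.

```sas
/* Hypothetical sample data standing in for the real file: ID_KEY and
   TYPE are character; AMOUNT, PRICE and QTY are the numeric fields. */
data q;
  input id_key $ type $ amount price qty;
  datalines;
K1 price 10 2.5 4
K2 qty 20 3.0 7
K3 amount 30 1.5 2
K4 price 40 4.0 1
;
run;

/* Split Q into one data set per TYPE value. Assumes each TYPE value is
   a valid SAS name; it is reused as the output data set name. */
%macro split_by_type(in=q);
  %local i n;
  /* Distinct TYPE values in ascending order go into T1, T2, ...
     (9999 is an arbitrary upper bound; only as many macro variables
     as there are distinct values are created) */
  proc sql noprint;
    select distinct type into :t1-:t9999
      from &in
      order by type;
  quit;
  %let n = &sqlobs;

  /* Single pass over the input: each record is written only to the
     data set named by its TYPE, keeping ID_KEY, TYPE and that field */
  data
    %do i = 1 %to &n;
      &&t&i (keep=id_key type &&t&i)
    %end;
  ;
    set &in;
    select (type);
      %do i = 1 %to &n;
        when ("&&t&i") output &&t&i;
      %end;
      otherwise;
    end;
  run;
%mend split_by_type;

%split_by_type(in=q)
```

With the sample data this creates AMOUNT, PRICE and QTY. For example, PRICE holds the K1 and K4 records with only ID_KEY, TYPE and PRICE.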
b) Suppose that the code in part a) is to be re-used. You are not given the number of numeric fields in this file. Write code to find out the number of numeric fields.
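For 1b, one possible sketch lets an array over _NUMERIC_ count the numeric variables at compile time, without reading any rows. It assumes the file is the SAS data set Q and that ID_KEY and TYPE are its only character variables, as the question states. The sample data and the macro variable name NNUM are hypothetical.

```sas
/* Hypothetical sample data; in practice Q is the real file and the
   numeric field names are not known in advance. */
data q;
  input id_key $ type $ amount price qty;
  datalines;
K1 price 10 2.5 4
K2 qty 20 3.0 7
;
run;

/* Count Q's numeric variables. Because ID_KEY and TYPE are the only
   character fields, this equals the number of extra fields. */
data _null_;
  if 0 then set q;          /* compile time only: adds Q's variables to the PDV */
  array nums{*} _numeric_;  /* every numeric variable defined so far */
  call symputx('nnum', dim(nums));
  stop;
run;

%put NOTE: Q has &nnum numeric fields.;
```

The same count could also be taken in PROC SQL from DICTIONARY.COLUMNS, counting the rows for this data set where TYPE = 'num'.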
2. Refer to the output files in (1.a). Provide code that renames the numeric field in each file to VAR1, VAR2, and so on, numbered in the order obtained by sorting the TYPE field in ascending order.
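For question 2, here is a possible sketch in two steps. First, take the distinct TYPE values in ascending order. Then use PROC DATASETS to rename the field in the i-th file to VARi in place; this changes only the data set header, so the data itself is not rewritten. It assumes the 1a output files are in WORK and named after their TYPE values. The sample data, the stand-in split step and the macro name RENAME_TO_VAR are hypothetical.

```sas
/* Hypothetical sample input plus stand-ins for the 1a output files
   (one data set per TYPE value, named after it). */
data q;
  input id_key $ type $ amount price qty;
  datalines;
K1 price 10 2.5 4
K2 qty 20 3.0 7
K3 amount 30 1.5 2
;
run;

data amount (keep=id_key type amount)
     price  (keep=id_key type price)
     qty    (keep=id_key type qty);
  set q;
  if type = 'amount' then output amount;
  else if type = 'price' then output price;
  else if type = 'qty' then output qty;
run;

/* Rename the numeric field of the i-th file (TYPE ascending) to VARi */
%macro rename_to_var(in=q);
  %local i n;
  proc sql noprint;
    select distinct type into :t1-:t9999
      from &in
      order by type;
  quit;
  %let n = &sqlobs;

  proc datasets library=work nolist;
    %do i = 1 %to &n;
      modify &&t&i;
        rename &&t&i = var&i;
    %end;
  quit;
%mend rename_to_var;

%rename_to_var(in=q)
```

With the sample data, AMOUNT, PRICE and QTY end up holding VAR1, VAR2 and VAR3 respectively.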
3. Refer to the output files in (1.a). Rename the numeric fields so that they all have the same name, then append these files into one file, appending them in ascending order of TYPE.
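For question 3, here is a sketch along the lines of the PROC APPEND suggestion in the replies. It renames each file's field to one common name and then appends the files in ascending TYPE order. The common name VALUE, the output name ALL_TYPES, the macro name STACK_BY_TYPE and the sample data are hypothetical.

```sas
/* Hypothetical sample input plus stand-ins for the 1a output files */
data q;
  input id_key $ type $ amount price qty;
  datalines;
K1 price 10 2.5 4
K2 qty 20 3.0 7
K3 amount 30 1.5 2
;
run;

data amount (keep=id_key type amount)
     price  (keep=id_key type price)
     qty    (keep=id_key type qty);
  set q;
  if type = 'amount' then output amount;
  else if type = 'price' then output price;
  else if type = 'qty' then output qty;
run;

/* OUT must not exist beforehand: the first PROC APPEND creates it and
   later calls add to it, in ascending TYPE order. */
%macro stack_by_type(in=q, out=all_types, newname=value);
  %local i n;
  proc sql noprint;
    select distinct type into :t1-:t9999
      from &in
      order by type;
  quit;
  %let n = &sqlobs;

  %do i = 1 %to &n;
    /* Rename the numeric field (named after its TYPE) to NEWNAME */
    proc datasets library=work nolist;
      modify &&t&i;
        rename &&t&i = &newname;
    quit;
    /* Append this file to OUT */
    proc append base=&out data=&&t&i;
    run;
  %end;
%mend stack_by_type;

%stack_by_type(in=q, out=all_types, newname=value)
```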
4. A large file, FILE1, with half a million records has three fields: one numeric and two character. The two character fields index the file, so any pair of their values occurs at most once. A much larger file, BIGFILE, with hundreds of millions of records, has the same two character fields. Merge the two files so that only the key pairs common to both files are extracted.
Bear in mind that sorting a file with hundreds of millions of records takes a very long time, so avoid sorting it unless necessary.
Author: shiyiming  Time: 2007-1-31 22:56  Title: RE:
If you can post a sample of your dataset, that will help us understand your question.
Author: shiyiming  Time: 2007-2-1 22:58  Title: re
1 and 2: use a macro and PROC SQL; the approach is almost the same for both.
3: use PROC APPEND or PROC SQL, together with a macro.
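4: I have never had the chance to work with files this large. PROC SQL seems like it would work, but I don't know whether it can cope with a file of that size or how it would perform. A hash object in a DATA step may be faster: build a hash table from the second file, then process the first file in the DATA step.
Author: shiyiming  Time: 2007-2-1 23:50  Title: The first one
Here q is the original dataset. With a slight change, the same code also solves question 2. Question 3 is fairly simple.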
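/* Count the distinct TYPE values in q (one output file per value) */
proc sql noprint;
select count(distinct type) into :number from q;
quit;
/* Store the distinct TYPE values, in ascending order, in X1, X2, ... */
proc sql noprint;
select distinct type into :x1-:x%cmpres(&number)
from q
order by type;
quit;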
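For question 4, the DATA step hash idea from the reply above can be sketched as follows. This version loads FILE1, the smaller file, into the hash, because a hash object is held in memory. It then reads BIGFILE once in its existing order, so neither file is sorted. Because each key pair occurs at most once in FILE1, the hash can hold one entry per key. The variable names K1, K2, X and Y and the sample data are hypothetical.

```sas
/* Hypothetical sample data: FILE1(K1 $, K2 $, X), BIGFILE(K1 $, K2 $, Y) */
data file1;
  input k1 $ k2 $ x;
  datalines;
A a 1
A b 2
B a 3
;
run;

data bigfile;
  input k1 $ k2 $ y;
  datalines;
A a 100
C c 200
B a 300
A a 400
;
run;

data common;
  /* Compile time only: puts FILE1's variables (including X) in the PDV */
  if 0 then set file1;
  if _n_ = 1 then do;
    /* Load FILE1 into memory, keyed on the unique (K1, K2) pair */
    declare hash h(dataset: 'file1');
    h.defineKey('k1', 'k2');
    h.defineData('x');
    h.defineDone();
  end;
  /* Stream BIGFILE once, in its existing order */
  set bigfile;
  /* find() returns 0 when (K1, K2) is in FILE1 and copies X into the PDV */
  if h.find() = 0 then output;
run;
```

Each BIGFILE record whose key pair is in FILE1 is written with FILE1's numeric field attached, in BIGFILE order.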