SAS中文论坛

 找回密码
 立即注册

扫一扫,访问微社区

查看: 9108|回复: 41
打印 上一主题 下一主题

SAS程序员测试(一)

[复制链接]

49

主题

76

帖子

1462

积分

管理员

Rank: 9Rank: 9Rank: 9

积分
1462
楼主
 楼主| 发表于 2004-3-4 01:52:54 | 只看该作者

SAS程序员测试(一)

Here is a small test for your SAS programming skills, which was posted by some one in SAS User Group several years ago.  I did it in my way, but I would like to share it with you all.

There is a Credit Card charging record data set, each line is either a charge or a return.  Charge is a positive number, and return is negative.  Assume that each item will be charged once, so if there is a return, it will be a full refund.  A sample data set looks like the following:

Card_Number               Charge
A123                   14.56
A123                   15.23
A123                   -14.56
A234                   11.12
A234                     3.87
A234                   11.12
A234                   -11.12
A234                     4.86

Since the charge/return pair does not generate revenue, it should be deleted.  So, the cleaned data set should look like the following:

Card_Number              Charge
A123                   15.23
A234                   11.12
A234                     3.87
A234                     4.86

Please write a SAS program to accomplish this task.

You can give yourself a grade based on the following criteria:

60 - your program will solve this example correctly;
70 - your program will not only solve this example correctly, but also  works for extended records following the same logic;
80 - besides meeting the criteria for 70, your program has a clear logical structure and other programmer is able to understand it without raising questions.
90 - clear cut, simple structure, efficient
100 - you can think about it.

If you have interests to do it and have questions about it, spell out.  Good Luck.
回复 支持 反对

使用道具 举报

49

主题

76

帖子

1462

积分

管理员

Rank: 9Rank: 9Rank: 9

积分
1462
沙发
 楼主| 发表于 2004-3-4 21:33:03 | 只看该作者
This is a good question, and I found it helpful to establish a correct way of cracking a large sum of  similar ones.
But it's a little difficult for me---a beginner in data cleaning. I found myself lack of sufficient  knowledge about this module.  <!-- s:oops: --><img src="{SMILIES_PATH}/icon_redface.gif" alt=":oops:" title="Embarassed" /><!-- s:oops: -->
I want to know how to cut in such problems and what I need  to study at the very beginning.
can you give me some advice?
回复 支持 反对

使用道具 举报

49

主题

76

帖子

1462

积分

管理员

Rank: 9Rank: 9Rank: 9

积分
1462
板凳
 楼主| 发表于 2004-3-4 22:18:02 | 只看该作者
I will give people in this group a few days to think about it, and I will have a following up posting to summarize the keys and solutions to this problem.
回复 支持 反对

使用道具 举报

49

主题

76

帖子

1462

积分

管理员

Rank: 9Rank: 9Rank: 9

积分
1462
地板
 楼主| 发表于 2004-3-5 12:30:35 | 只看该作者
我先来一段试试,只是希望每组card_number的count不要超过sas软件最大支持的变量数。很可惜sas只有行数组,没有列数组。这个问题放在别的语言上是个很简单的问题。

data tem;
input Card_Number $4. Charge;
cards;
A123 14.56
A123 15.23
A123 -14.56
A234 11.12
A234 3.87
A234 11.12
A234 -11.12
A234 11.12
A234 4.86
A234 -11.12
;run;
proc transpose data=tem out=tem;
by card_number;
run;
data tem(drop=i j _name_);
set tem;
array arr _numeric_;
do i=1 to dim(arr);
   if arr(i)=. then continue;
   do j=i+1 to dim(arr);
      if arr(j)=. then continue;
      if arr(i)=-arr(j) then do;
         arr(i)=.;
         arr(j)=.;
         leave;
      end;
   end;
end;
run;
proc sql;create table result like tem;quit;

data result(keep=card_number charge);
set tem;
array arr _numeric_;
do over arr;
   if arr^=. then do;
      card_number=card_number;
      charge=arr;
      output;
   end;
end;
run;
proc print data=result;run;
回复 支持 反对

使用道具 举报

49

主题

76

帖子

1462

积分

管理员

Rank: 9Rank: 9Rank: 9

积分
1462
5#
 楼主| 发表于 2004-3-5 19:35:13 | 只看该作者

another try

/* Step 1: prepare source data, thanks to willon for building the code */
data source;
input Card_Number $4. Charge;
cards;
A123 14.56
A123 15.23
A123 -14.56
A234 11.12
A234 3.87
A234 11.12
A234 -11.12
A234 11.12
A234 4.86
A234 -11.12
;
run;

/* Step2: basic transformations, e.g. separete charge into sign and abs value */

data v_stage / view=v_stage;
set source;
flag = sign(charge);
charge = abs(charge);
run;

/* Step3: summary of signs by card_number &amp; abs(charge) */

proc summary data=v_stage nway;
class card_number charge;
var flag;
output out=out sum=flag;
run;

/* Step4: build the final result, elminate the charge-return pair by excluding those groups having 0 for summary of signs */
/* modified for handling the return transaction exceptions */

data result(keep=card_number charge);
set out(where=(flag ne 0));
charge = charge * sign(flag);
do i = 1 to abs(flag);
        output;
end;
run;


assumption: there's no charge=0 transaction.

extendability consideration:
no software limit for either the number of cards or the frequency of charges.
no specific requirement for the sort order of the source data.

performance consideration:
step2 only built a data step view, will keep the constent cost no matter hwo large the source data is.

step3 is the most time-consuming one, but the cost will be linear with the data volumn, assuming there's no harware bottleneck.

step4 is a simple one pass full table scan. the cost is quite simliar to that of copying the dataset.

just list my code for simple performance testing as following, you can change the number of cards, freq of charges, and the freq ration of returns/total charges:

%let num_of_card = 1000;
%let avg_transaction = 1000;
%let return_ratio = 0.2;

data source;
do i = 1 to &amp;num_of_card;
        card_number = "A"||put(i,z10.);
        do j = 1 to &amp;avg_transaction;
                charge = 1000*uniform(0) + 1;
                output;
                if (rantbl(-1, &amp;return_ratio ) = 1) then do;
                        charge = 0 - charge;
                        output;
                end;
        end;
end;
drop i j;
run;
回复 支持 反对

使用道具 举报

49

主题

76

帖子

1462

积分

管理员

Rank: 9Rank: 9Rank: 9

积分
1462
6#
 楼主| 发表于 2004-3-5 21:15:44 | 只看该作者
Thank you all who were interested in working out this homework assignment.  Both Willon and SAS_Dream did excellent jobs.  I will skip most of the complimentary stuff and go to criticism directly, since it might be something getting more attention.

The logics of Willon’s program is perfect, it works with extended data.  The problem is the efficiency.  I do not want to discuss if the transpose and data array is necessary, the program has a DO loop within another DO loop.  Technically, it promoted the complexity of the problem of size N to N**2.  In case N is large, it can be time consuming.

I like the idea of SAS_Dream. In fact, it is very similar to the way I used to solve it, but it is not in line with conventional logic reasoning.  Small comments, what did you do with that View, does the Retain statement have any function.  Besides, if you add a couple of more negative value records, you will find that the program will not work, but the Willon’s program will.  It is arguable from the original assignment, people know that only purchase can get refund.  But, sometimes you can see some exceptions, do not consider it to be criticism.   

I hope more people will try this homework, and get fun with SAS programming.
回复 支持 反对

使用道具 举报

49

主题

76

帖子

1462

积分

管理员

Rank: 9Rank: 9Rank: 9

积分
1462
7#
 楼主| 发表于 2004-3-5 22:10:35 | 只看该作者
hello xic:

   welcome to visit ww.sasor.com

sincerely
qkaiwei
回复 支持 反对

使用道具 举报

49

主题

76

帖子

1462

积分

管理员

Rank: 9Rank: 9Rank: 9

积分
1462
8#
 楼主| 发表于 2004-3-5 22:41:12 | 只看该作者

modification

Thanks xic for your comment, I modified the logic for handling some exceptions.

/* Step4: build the final result, elminate the charge-return pair by excluding those groups having 0 for summary of signs */
/* modified for handling the return transaction exceptions */

data result(keep=card_number charge);
set out(where=(flag ne 0));
charge = charge * sign(flag);
do i = 1 to abs(flag);
        output;
end;
run;


in Step 2, using view in stead of sas data set is just for efficiency, it not necessary to rewrite the whole dataset only for creating two variables.
the retain statement is just personal programming habit, though without impact here.

there's also extendability issue for proc transpose, you may or may not know, SAS dataset has physical limit for number of variables, on windows, it is 32767, which means in your case, a card number with more than 32767 transactions won't work. P.S. transpose require the input dataset has been sorted by the axis variable.

it is also applicable to use proc sql to generate the summary, but again, from efficiency point of view, at least in SAS 8, proc summary is more optimized than proc sql sum
回复 支持 反对

使用道具 举报

49

主题

76

帖子

1462

积分

管理员

Rank: 9Rank: 9Rank: 9

积分
1462
9#
 楼主| 发表于 2004-3-8 12:37:20 | 只看该作者
The attached is the program I wrote for this problem.  It is very similar to the idea of SAS_Dreamer.  

It is for sure that there are many ways to solve the problem, I would like more people have a try.  Here are some hints for people in different levels.

For novice SAS users, the basic idea is that you can split the data sets into two, one with positive charge, and one with negative charge, then merge them back by card_number and absolute value of charges.  Please pay attention that the merge is not direct forward, since it may involve many to many merge.  (people may purchase multiple items with same price and return multiple items with same price.)  To handle this merge correctly is the key to solve the problem.

For experienced SAS users, you may try to solve the problem strictly withiin one data step using data manipulation skills, without any SAS procedures.  I hope you will find it enjoyable to work it out, surely it is possible and some ones have worked it out.
=========================

data charge;                 
input Card_Number $ Charge;  
abscharge=abs(charge);      
datalines;                  
A123 14.56                    
A123 15.23                  
A123 -14.56                  
A234 11.12                    
A234 3.87                    
A234 11.12                  
A234 -11.12                 
A234 4.86                    
;                           
run;
                    
proc sql;                  
create table chg as         
select card_number, abscharge, sum(charge)/abscharge as cnt
from charge                  
group by card_number, abscharge;  
quit;
                    
data charge(drop=abscharge i cnt count);         
set chg;                    
count=abs(cnt);              
do i=1 to count;            
  charge=sign(cnt)*abscharge;   
  output;                  
end;                     
run;
回复 支持 反对

使用道具 举报

49

主题

76

帖子

1462

积分

管理员

Rank: 9Rank: 9Rank: 9

积分
1462
10#
 楼主| 发表于 2004-3-25 08:35:21 | 只看该作者

我也来试试。

不好意思,我刚开始学SAS,也来试试。请大侠们多多指教~~
我的思路是在不改变原来顺序的前提下,只用data step 语句,将每一次的return对冲离这个最近的charge,最后得到一个时间序列。


data  charge;
input Card_Number $4. Charge;
cards;
A123 14.56
A123 15.23
A123 -14.56
A234 11.12
A234 3.87
A234 11.12
A234 -11.12
A234 11.12
A234 4.86
A234 -11.12
;
run;
data charge(drop=tt);
set charge;
retain time 0 tt 'xxxx';
if tt=Card_Number then do;
time=time+1;
tt=Card_Number;
end;
else do;
time=1;
tt=Card_Number;
end;
run;

data result;
set charge;
where charge&gt;0;
run;
data tem2;
set charge;
where charge&lt;0;
retain nn 0;
nn=nn+1;
run;

%macro filter;
%let dsid =%sysfunc(open(tem2));
%let nobs =%sysfunc(attrn(&amp;dsid,NOBS));
%let dsid1=%sysfunc(close(&amp;dsid));

%do i=1 %to &amp;nobs;
%let stop=0;
data _null_;
set tem2;
where nn=&amp;i;
call symput('Card_Number',Card_Number);
call symput('charge',charge);
call symput('time',time);
run;

%do %while (&amp;stop=0 and &amp;time&gt;=1);
%let time=&amp;time-1;
data _null_;
set result;
where Card_Number="&amp;Card_Number" and time=&amp;time;
sum=charge+&amp;charge;
if sum=0 then call symput('stop',1);
run;

%if &amp;stop=1 %then %do;
data result;
set result;
if Card_Number="&amp;Card_Number" and time=&amp;time then delete;
run;
%end;
%end;
%let stop=0;
%end;
data result(drop=time);
set result;
run;
%mend filter;

%filter;
回复 支持 反对

使用道具 举报

您需要登录后才可以回帖 登录 | 立即注册

本版积分规则

QQ|小黑屋|手机版|Archiver|SAS中文论坛  

GMT+8, 2026-2-5 19:38 , Processed in 0.072702 second(s), 20 queries .

Powered by Discuz! X3.2

© 2001-2013 Comsenz Inc.

快速回复 返回顶部 返回列表