SAS中文论坛

标题: SAS程序员测试(一) [打印本页]

作者: shiyiming    时间: 2004-3-4 01:52
标题: SAS程序员测试(一)
Here is a small test for your SAS programming skills, which was posted by some one in SAS User Group several years ago.  I did it in my way, but I would like to share it with you all.

There is a Credit Card charging record data set, each line is either a charge or a return.  Charge is a positive number, and return is negative.  Assume that each item will be charged once, so if there is a return, it will be a full refund.  A sample data set looks like the following:

Card_Number               Charge
A123                   14.56
A123                   15.23
A123                   -14.56
A234                   11.12
A234                     3.87
A234                   11.12
A234                   -11.12
A234                     4.86

Since the charge/return pair does not generate revenue, it should be deleted.  So, the cleaned data set should look like the following:

Card_Number              Charge
A123                   15.23
A234                   11.12
A234                     3.87
A234                     4.86

Please write a SAS program to accomplish this task.

You can give yourself a grade based on the following criteria:

60 - your program will solve this example correctly;
70 - your program will not only solve this example correctly, but also  works for extended records following the same logic;
80 - besides meeting the criteria for 70, your program has a clear logical structure and other programmer is able to understand it without raising questions.
90 - clear cut, simple structure, efficient
100 - you can think about it.

If you have interests to do it and have questions about it, spell out.  Good Luck.
作者: shiyiming    时间: 2004-3-4 21:33
This is a good question, and I found it helpful to establish a correct way of cracking a large sum of  similar ones.
But it's a little difficult for me---a beginner in data cleaning. I found myself lack of sufficient  knowledge about this module.  <!-- s:oops: --><img src="{SMILIES_PATH}/icon_redface.gif" alt=":oops:" title="Embarassed" /><!-- s:oops: -->
I want to know how to cut in such problems and what I need  to study at the very beginning.
can you give me some advice?
作者: shiyiming    时间: 2004-3-4 22:18
I will give people in this group a few days to think about it, and I will have a following up posting to summarize the keys and solutions to this problem.
作者: shiyiming    时间: 2004-3-5 12:30
我先来一段试试,只是希望每组card_number的count不要超过sas软件最大支持的变量数。很可惜sas只有行数组,没有列数组。这个问题放在别的语言上是个很简单的问题。

data tem;
input Card_Number $4. Charge;
cards;
A123 14.56
A123 15.23
A123 -14.56
A234 11.12
A234 3.87
A234 11.12
A234 -11.12
A234 11.12
A234 4.86
A234 -11.12
;run;
proc transpose data=tem out=tem;
by card_number;
run;
data tem(drop=i j _name_);
set tem;
array arr _numeric_;
do i=1 to dim(arr);
   if arr(i)=. then continue;
   do j=i+1 to dim(arr);
      if arr(j)=. then continue;
      if arr(i)=-arr(j) then do;
         arr(i)=.;
         arr(j)=.;
         leave;
      end;
   end;
end;
run;
proc sql;create table result like tem;quit;

data result(keep=card_number charge);
set tem;
array arr _numeric_;
do over arr;
   if arr^=. then do;
      card_number=card_number;
      charge=arr;
      output;
   end;
end;
run;
proc print data=result;run;
作者: shiyiming    时间: 2004-3-5 19:35
标题: another try
/* Step 1: prepare source data, thanks to willon for building the code */
data source;
input Card_Number $4. Charge;
cards;
A123 14.56
A123 15.23
A123 -14.56
A234 11.12
A234 3.87
A234 11.12
A234 -11.12
A234 11.12
A234 4.86
A234 -11.12
;
run;

/* Step2: basic transformations, e.g. separete charge into sign and abs value */

data v_stage / view=v_stage;
set source;
flag = sign(charge);
charge = abs(charge);
run;

/* Step3: summary of signs by card_number &amp; abs(charge) */

proc summary data=v_stage nway;
class card_number charge;
var flag;
output out=out sum=flag;
run;

/* Step4: build the final result, elminate the charge-return pair by excluding those groups having 0 for summary of signs */
/* modified for handling the return transaction exceptions */

data result(keep=card_number charge);
set out(where=(flag ne 0));
charge = charge * sign(flag);
do i = 1 to abs(flag);
        output;
end;
run;


assumption: there's no charge=0 transaction.

extendability consideration:
no software limit for either the number of cards or the frequency of charges.
no specific requirement for the sort order of the source data.

performance consideration:
step2 only built a data step view, will keep the constent cost no matter hwo large the source data is.

step3 is the most time-consuming one, but the cost will be linear with the data volumn, assuming there's no harware bottleneck.

step4 is a simple one pass full table scan. the cost is quite simliar to that of copying the dataset.

just list my code for simple performance testing as following, you can change the number of cards, freq of charges, and the freq ration of returns/total charges:

%let num_of_card = 1000;
%let avg_transaction = 1000;
%let return_ratio = 0.2;

data source;
do i = 1 to &amp;num_of_card;
        card_number = "A"||put(i,z10.);
        do j = 1 to &amp;avg_transaction;
                charge = 1000*uniform(0) + 1;
                output;
                if (rantbl(-1, &amp;return_ratio ) = 1) then do;
                        charge = 0 - charge;
                        output;
                end;
        end;
end;
drop i j;
run;
作者: shiyiming    时间: 2004-3-5 21:15
Thank you all who were interested in working out this homework assignment.  Both Willon and SAS_Dream did excellent jobs.  I will skip most of the complimentary stuff and go to criticism directly, since it might be something getting more attention.

The logics of Willon’s program is perfect, it works with extended data.  The problem is the efficiency.  I do not want to discuss if the transpose and data array is necessary, the program has a DO loop within another DO loop.  Technically, it promoted the complexity of the problem of size N to N**2.  In case N is large, it can be time consuming.

I like the idea of SAS_Dream. In fact, it is very similar to the way I used to solve it, but it is not in line with conventional logic reasoning.  Small comments, what did you do with that View, does the Retain statement have any function.  Besides, if you add a couple of more negative value records, you will find that the program will not work, but the Willon’s program will.  It is arguable from the original assignment, people know that only purchase can get refund.  But, sometimes you can see some exceptions, do not consider it to be criticism.   

I hope more people will try this homework, and get fun with SAS programming.
作者: shiyiming    时间: 2004-3-5 22:10
hello xic:

   welcome to visit ww.sasor.com

sincerely
qkaiwei
作者: shiyiming    时间: 2004-3-5 22:41
标题: modification
Thanks xic for your comment, I modified the logic for handling some exceptions.

/* Step4: build the final result, elminate the charge-return pair by excluding those groups having 0 for summary of signs */
/* modified for handling the return transaction exceptions */

data result(keep=card_number charge);
set out(where=(flag ne 0));
charge = charge * sign(flag);
do i = 1 to abs(flag);
        output;
end;
run;


in Step 2, using view in stead of sas data set is just for efficiency, it not necessary to rewrite the whole dataset only for creating two variables.
the retain statement is just personal programming habit, though without impact here.

there's also extendability issue for proc transpose, you may or may not know, SAS dataset has physical limit for number of variables, on windows, it is 32767, which means in your case, a card number with more than 32767 transactions won't work. P.S. transpose require the input dataset has been sorted by the axis variable.

it is also applicable to use proc sql to generate the summary, but again, from efficiency point of view, at least in SAS 8, proc summary is more optimized than proc sql sum
作者: shiyiming    时间: 2004-3-8 12:37
The attached is the program I wrote for this problem.  It is very similar to the idea of SAS_Dreamer.  

It is for sure that there are many ways to solve the problem, I would like more people have a try.  Here are some hints for people in different levels.

For novice SAS users, the basic idea is that you can split the data sets into two, one with positive charge, and one with negative charge, then merge them back by card_number and absolute value of charges.  Please pay attention that the merge is not direct forward, since it may involve many to many merge.  (people may purchase multiple items with same price and return multiple items with same price.)  To handle this merge correctly is the key to solve the problem.

For experienced SAS users, you may try to solve the problem strictly withiin one data step using data manipulation skills, without any SAS procedures.  I hope you will find it enjoyable to work it out, surely it is possible and some ones have worked it out.
=========================

data charge;                 
input Card_Number $ Charge;  
abscharge=abs(charge);      
datalines;                  
A123 14.56                    
A123 15.23                  
A123 -14.56                  
A234 11.12                    
A234 3.87                    
A234 11.12                  
A234 -11.12                 
A234 4.86                    
;                           
run;
                    
proc sql;                  
create table chg as         
select card_number, abscharge, sum(charge)/abscharge as cnt
from charge                  
group by card_number, abscharge;  
quit;
                    
data charge(drop=abscharge i cnt count);         
set chg;                    
count=abs(cnt);              
do i=1 to count;            
  charge=sign(cnt)*abscharge;   
  output;                  
end;                     
run;
作者: shiyiming    时间: 2004-3-25 08:35
标题: 我也来试试。
不好意思,我刚开始学SAS,也来试试。请大侠们多多指教~~
我的思路是在不改变原来顺序的前提下,只用data step 语句,将每一次的return对冲离这个最近的charge,最后得到一个时间序列。


data  charge;
input Card_Number $4. Charge;
cards;
A123 14.56
A123 15.23
A123 -14.56
A234 11.12
A234 3.87
A234 11.12
A234 -11.12
A234 11.12
A234 4.86
A234 -11.12
;
run;
data charge(drop=tt);
set charge;
retain time 0 tt 'xxxx';
if tt=Card_Number then do;
time=time+1;
tt=Card_Number;
end;
else do;
time=1;
tt=Card_Number;
end;
run;

data result;
set charge;
where charge&gt;0;
run;
data tem2;
set charge;
where charge&lt;0;
retain nn 0;
nn=nn+1;
run;

%macro filter;
%let dsid =%sysfunc(open(tem2));
%let nobs =%sysfunc(attrn(&amp;dsid,NOBS));
%let dsid1=%sysfunc(close(&amp;dsid));

%do i=1 %to &amp;nobs;
%let stop=0;
data _null_;
set tem2;
where nn=&amp;i;
call symput('Card_Number',Card_Number);
call symput('charge',charge);
call symput('time',time);
run;

%do %while (&amp;stop=0 and &amp;time&gt;=1);
%let time=&amp;time-1;
data _null_;
set result;
where Card_Number="&amp;Card_Number" and time=&amp;time;
sum=charge+&amp;charge;
if sum=0 then call symput('stop',1);
run;

%if &amp;stop=1 %then %do;
data result;
set result;
if Card_Number="&amp;Card_Number" and time=&amp;time then delete;
run;
%end;
%end;
%let stop=0;
%end;
data result(drop=time);
set result;
run;
%mend filter;

%filter;
作者: shiyiming    时间: 2004-3-26 08:32
good
作者: shiyiming    时间: 2004-3-27 06:19
标题: Re: 我也来试试。
[quote=&quot;小猪&quot;:ea894]不好意思,我刚开始学SAS,也来试试。请大侠们多多指教~~
我的思路是在不改变原来顺序的前提下,只用data step 语句,将每一次的return对冲离这个最近的charge,最后得到一个时间序列。

data charge(drop=tt);
set charge;
retain time 0 tt 'xxxx';
if tt=Card_Number then do;
time=time+1;
tt=Card_Number;
end;
else do;
time=1;
tt=Card_Number;
end;
run;

data result;
set charge;
where charge&gt;0;
run;
data tem2;
set charge;
where charge&lt;0;
retain nn 0;
nn=nn+1;
run;
[/quote:ea894]

Thank you for your interests on this homework assignment, I think you did pretty good.  Before I go into your macro for detail, I would like to make some comments on the first part of your program.  I did a small program which will do almost exactly same thing as you did.

data charge;
set charge;
by card_number;
if first.card_number then time=0;
time+1;
run;

data result tem2;
set charge;
if charge&gt;0 then output result;
else if charge&lt;0 then output tem2;
run;

I think it is easier to understand, am I correct?  Surely, I assume that the original data was sorted by card_number, and I did not create nn in tem2.  In fact, the first assumption is also implied in your program, and I will see if the second one is necessary.  I will have a follow up later, have a good weekend.
作者: shiyiming    时间: 2004-4-8 18:43
我也来试一下,但这个小程序得到的结果没有完全按照原题要求的顺序,还望各位高手多多指教


data temp;
input Card_Number $ Charge ;
cards;
A123 14.56
A123 15.23
A123 -14.56
A234 11.12
A234 3.87
A234 11.12
A234 -11.12
A234 11.12
A234 4.86
A234 -11.12
A234 6.8
A234 -6.8

;
run;


data temp1;
  set temp;
  if charge &gt;= 0 then delete;
  return = charge;
  drop charge;
run;

data temp;
  set temp;
  if  charge &lt; 0 then delete;
run;

data test;
  merge temp temp1;
  by card_number;
run;

proc sort data= test;
  by card_number charge;
run;

data test;
  set test;
  lagcharge= lag(charge);
  if charge ne lagcharge  then
   do;
    if sum(charge,return) = 0 then delete;
   end;
  drop return lagcharge;
run;

proc sql;
  select distinct card_number,charge
  from test
  order by card_number;
run;


得到的结果

Card_Number    Charge
---------------------
A123            15.23
A234             3.87
A234             4.86
A234            11.12
作者: shiyiming    时间: 2004-4-27 12:06
Just provide a simple solution from an accounting viewpoint, say, revenue = charge + credit for each unique transaction item.  <!-- s:lol: --><img src="{SMILIES_PATH}/icon_lol.gif" alt=":lol:" title="Laughing" /><!-- s:lol: -->  <!-- s:lol: --><img src="{SMILIES_PATH}/icon_lol.gif" alt=":lol:" title="Laughing" /><!-- s:lol: -->  <!-- s:lol: --><img src="{SMILIES_PATH}/icon_lol.gif" alt=":lol:" title="Laughing" /><!-- s:lol: -->  <!-- s:lol: --><img src="{SMILIES_PATH}/icon_lol.gif" alt=":lol:" title="Laughing" /><!-- s:lol: -->

data tem;
input Card_Number $4. Charge;
cards;
A123 14.56
A123 15.23
A123 -14.56
A234 11.12
A234 3.87
A234 11.12
A234 -11.12
A234 11.12
A234 4.86
A234 -11.12
A235 10.87
A235 -10.87
;
run;

data tem;
set tem;
chg_c = put(charge, 8.2);
run;

proc summary data=tem nway;
where charge &lt; 0;
class card_number chg_c;
var charge;
output out=credit(drop = _type_  rename=(_freq_ = ncredit charge=credit)) sum=;
run;

data credit;
set credit;
chg_c = compress(compress(chg_c, '-'), ' ');
run;

proc summary data=tem nway;
where charge &gt;=0;
class card_number chg_c;
var charge;
output out=result(drop= _type_ rename=(_freq_ = ncharge))sum=;
run;

data result;
set result;
chg_c = compress(chg_c, ' ');
run;

proc sort data=credit; by card_number chg_c; run;
proc sort data=result; by card_number chg_c; run;

data result(keep=card_number charge);
merge result credit;
by card_number chg_c;
charge = sum(charge, credit);
if charge ne 0;
run;

proc print data=result noobs; run;

-------------------------------------------------

The SAS System
23:55 Monday, April 26, 2004   1

Card_
Number    Charge

A123      15.23
A234      11.12
A234       3.87
A234       4.86
作者: shiyiming    时间: 2004-6-3 16:11
标题: 试试
这是我的程序,它基于一个假设:先消费后还账。
[color=blue:eda74]data a;
  input CardN $ Charge;
  abs=abs(charge);
  if charge&lt;0 then pos=0;
  else pos=1;
cards;
A123 14.56
A123 15.23
A123 -14.56
A123 14.56
A123 15.23
A123 -14.56
A234 11.12
A234 3.87
A234 11.12
A234 -11.12
A234 4.86
A234 11.12
A234 3.87
A234 11.12
A234 11.12
A234 -11.12
A234 4.86
A234 -11.12
;
run;

proc sort;
  by cardn abs pos;
data c;
  set a;
  retain rep del 0;
  if first.cardn then
     del=0;
  if pos=0 then do;
     del+1;
     end;
  else if del&gt;0 then do;
     del+(-1);
     end;
  else  output;
  run;
proc print;
run;[/color:eda74]

提交运行结果如下:
[color=darkred:eda74]Obs  CardN    Charge
1     A123     15.23
2     A123     15.23
3     A234      3.87
4     A234      3.87
5     A234      4.86
6     A234      4.86
7     A234     11.12
8     A234     11.12 [/color:eda74]

不过,没有用其它数据测试。
作者: shiyiming    时间: 2004-6-22 22:24
标题: 用TRANSPOSE可能对这类题好一些
先把相同的ID的数据TRANSPOSE成同一行,处理起来比较容易。
[code:10dc9]proc print data=card;
run;
proc transpose data=card prefix=charge out=tpcard&#40;drop=_name_&#41;;
var charge ;
by Card_Number;
run;
data tempcard&#40;keep=card_number charge&#41;;
set tpcard;
array card&#91;*&#93; charge1-charge5;
do i= dim&#40;card&#41; to 1 by -1 ;
   if card&#91;i&#93;=&#46; then continue;
  do j= i-1 to 1 by -1;
if abs&#40;card&#91;i&#93;&#41;=abs&#40;card&#40;j&#41;&#41; and  card&#91;i&#93; ne card&#91;j&#93; then
do;
card&#91;i&#93;=&#46;;
card&#91;j&#93;=&#46;;
leave;
end;
end;
end;
do k=1 to dim&#40;card&#41; ;
if card&#91;k&#93; ne &#46; then
  do;
  charge=card&#91;k&#93;;
  output ;
  end;
end;
run;
proc print data=tempcard;
run;[/code:10dc9]
作者: shiyiming    时间: 2004-7-28 11:25
标题: 只使用data step 没有用到转置等proc
data charge;
input Card_Number $4. Charge;
cards;
A123 14.56
A123 15.23
A123 -14.56
A234 11.12
A234 3.87
A234 11.12
A234 -11.12
A234 11.12
A234 4.86
A234 -11.12
;
run;
data charge_1;
do n=obs to 1 by -1;
set charge point=n nobs=obs ;
output;
end;
stop;
run;
data charge_1;
set charge_1;
s+1;
run;
data charge_2;
retain tem 100;
set charge_1;
if charge&lt;0 then  tem=charge;
if charge&gt;0 &amp; charge=-1*tem then do;
s=0;
tem=100;
end;
run;
data charge_3;
set charge_2;
where s ^=0 &amp; charge&gt;0;
drop tem s;
run;
data result;
do n=obs to 1 by -1;
set charge_3 point=n nobs=obs ;
output;
end;
stop;
run;
作者: shiyiming    时间: 2004-8-24 12:55
标题: Re:程序员测试
data charge return(drop=charge);
input Card_Number $ Charge;
if charge&gt;=0 then do;
absCharge=charge;
output charge;
end;
else do;
absCharge=abs(charge);
return=charge;
output return;
end;
cards;
A123 14.56
A123 15.23
A123 -14.56
A234 11.12
A234 3.87
A234 11.12
A234 -11.12
A234 4.86
;
run;
proc sort data=charge;
by Card_Number absCharge;
run;
proc sort data=return;
by Card_Number absCharge;
run;
data revenue(drop=absCharge return);
merge charge return;
by Card_Number absCharge;
if charge^=. and return^=. then delete;
run;
作者: shiyiming    时间: 2004-9-9 02:43
标题: how about this?
data test;
        input card_number $ charge;
        datalines;
        A123 14.56
        A123 15.23
        A123 -14.56
        A123 -8.0
        A234 11.12
        A234 3.87
        A234 11.12
        A234 -11.12
        A234 4.86
run;
data test2 test3; set test;
        if charge &lt; 0 then output test2;
        else output test3;
run;

data test2; set test2; charge = abs(charge); run;
proc sort data=test2; by card_number charge; run;
data test2; set test2;
        by card_number charge;
        retain ct 0;
        if first.card_number or first.charge then ct = 0;
        ct+1;
        run;
proc sort data=test3; by card_number charge; run;
data test3; set test3;
        by card_number charge;
        retain ct 0;
        if first.card_number or first.charge then ct = 0;
        ct+1;
        run;
proc sort data=test2; by card_number charge ct; run;
proc sort data=test3; by card_number charge ct; run;
data test ot; merge test3(in=in1) test2(in=in2);
        by card_number charge ct;
        if in1 and not in2 then output test;
        else if not in1 and in2 then output ot;
        run;
data test; set test ot(in=a);
        drop ct;
        if a then charge = -1*charge;
        run;
proc print data=test; run;
作者: shiyiming    时间: 2004-9-16 12:41
标题: My Solution:
data source;
input Card_Number $4. Charge;
cards;
A123 14.56
A123 15.23
A123 -14.56
A135 20.00
A234 11.12
A234 3.87
A234 11.12
A234 -11.12
A234 11.12
A234 4.86
A234 -11.12
;
run;
data Charge Payment;
set source;
if Charge&gt;0 then output Charge; else if Charge&lt;0 then output Payment;
run;
proc sql;
create table Sum_Charge as select distinct Card_Number, Charge, count(*) as Cnt_Charge from Charge group by Card_Number, Charge;
create table Sum_Payment as select distinct Card_Number, Charge as Payment, count(*) as Cnt_Payment from Payment group by Card_Number, Charge;
quit;
proc print data=Sum_Charge;run;
data rst(keep=Card_Number Charge);
merge Sum_Charge Sum_Payment;
by Card_Number;
if sum(Charge,Payment) then output;
else do;
        Dif=Cnt_Charge-Cnt_Payment;
        if Dif&gt;0 then do i=1 to Dif; output; end;
end;
run;
proc print;run;
作者: shiyiming    时间: 2004-10-19 14:21
标题: 。。。。。。。
看不懂~~~~~~~~~~~~~~~~~
作者: shiyiming    时间: 2004-10-22 23:13
标题: 各位兄弟,你们做得太复杂了,其实很简单的,如果学过数据库就知
这是我的程序,供大家参考:请多多指教。
程序输出结果保存在out数据库中
data credit_card;
input card_number $ charge;
cards;
A123 14.62
A123 -14.62
A123 56.23
A124 45.2
A124 23.8
A124 -46.2
A125 56.2
A125 -28.6
;
run;
proc sql ;
create table out as
select card_number, sum(charge)
from credit_card
group by card_number;
run;
作者: shiyiming    时间: 2004-12-1 15:04
标题: 用sql写的
data Sasuser.test;
imput Card_Number $1-4 Charge 6-12;
cards;
A123 14.56
A123 15.23
A123 -14.56
A234 11.12
A234 3.87
A234 11.12
A234 -11.12
A234 4.86
run;

proc sql;

create table temp1 as
select * from Sasuser.test where Charge&gt;0 ;
create table temp2 as
select Card_Number,abs(charge) from Sasuser.test where Charge&lt;0;
select * from temp1 except all (select * from temp2);

run;
作者: shiyiming    时间: 2005-1-6 21:54
标题: hi, this is my solution
I would like to follow the exercise.  I am a neophyte although I have some programming experience in other programming language.
here is my solution, I hope I understand the issue correctly.

data test2;
input id$ amount;
cards;
A123 14.56
A123 15.23
A123 -14.56
A234 11.12
A234 3.87
A234 11.12
A234 -11.12
A234 4.86
;

data pos neg;
set test2;
select;
when (amount&gt;0) output pos;
when (amount&lt;0) do;
        amount=-amount;
        n=1;
        output neg ;
end;
run;
proc sort data=pos out=pos;
by id$ amount;

proc sort data=neg out=neg;
by id$ amount;

data merge;
merge pos neg;
by id$ amount;

data merge;
if n=1 then delete;
drop n;
run;
作者: shiyiming    时间: 2005-3-19 16:46
标题: Try this one
data charge;
input Card_Number $ Charge;
absCharge=abs(charge);
datalines;
A123 14.56
A123 15.23
A123 -14.56
A234 11.12
A234 3.87
A234 11.12
A234 -11.12
A234 4.86
run;

proc sort data=Charge;
  by Card_number absCharge;
run;

proc means data=charge noprint;
  by Card_Number absCharge;
  output out=out sum=sum;
run;

data result;
  set out;
  Charge = sign(sum)*absCharge;
  n = abs(sum/abscharge);
  do i=1 to n;
    output;
  end;
  keep Card_Number Charge absCharge;
run;
<!-- s:D --><img src="{SMILIES_PATH}/icon_biggrin.gif" alt=":D" title="Very Happy" /><!-- s:D -->
作者: shiyiming    时间: 2005-10-15 08:13
标题: Re: SAS程序员测试(一)
[quote=&quot;xic&quot;:c835f]Here is a small test for your SAS programming skills, which was posted by some one in SAS User Group several years ago.  I did it in my way, but I would like to share it with you all.

There is a Credit Card charging record data set, each line is either a charge or a return.  Charge is a positive number, and return is negative.  Assume that each item will be charged once, so if there is a return, it will be a full refund.  A sample data set looks like the following:

Card_Number               Charge <!-- s:twisted: --><img src="{SMILIES_PATH}/icon_twisted.gif" alt=":twisted:" title="Twisted Evil" /><!-- s:twisted: -->
A123                   14.56
A123                   15.23
A123                   -14.56
A234                   11.12
A234                     3.87
A234                   11.12
A234                   -11.12
A234                     4.86

Since the charge/return pair does not generate revenue, it should be deleted.  So, the cleaned data set should look like the following:

Card_Number              Charge
A123                   15.23
A234                   11.12
A234                     3.87
A234                     4.86

Please write a SAS program to accomplish this task.

You can give yourself a grade based on the following criteria:

60 - your program will solve this example correctly;
70 - your program will not only solve this example correctly, but also  works for extended records following the same logic;
80 - besides meeting the criteria for 70, your program has a clear logical structure and other programmer is able to understand it without raising questions.
90 - clear cut, simple structure, efficient
100 - you can think about it.

If you have interests to do it and have questions about it, spell out.  Good Luck.[/quote:c835f]
作者: shiyiming    时间: 2005-10-15 19:04
标题: 我的解答
我的Email:richard51521@yahoo.com.cn,欢迎交流。
data WORK.SOURCE;
input CARD_NBR $4. AMOUNT;
cards;
A123 14.56
A123 15.23
A123 -14.56
A234 11.12
A234 3.87
A234 11.12
A234 11.12
A234 -11.12
A234 -11.12
A234 4.86
A234 -9.99
A123 -14.56
;
run;
DATA WORK.CHARGE(RENAME=(AMOUNT=CHARGE_AMT)) WORK.RETURN(RENAME=(AMOUNT=RETURN_AMT));
SET WORK.SOURCE;
IF NOT MISSING(AMOUNT) THEN DO;
        IF AMOUNT&lt;0 THEN DO;
                CHARGE=-AMOUNT;
                OUTPUT WORK.RETURN;
        END;
        IF AMOUNT&gt;=0 THEN DO;
                CHARGE=AMOUNT;
                OUTPUT WORK.CHARGE;
        END;
END;
RUN;

PROC SORT DATA=WORK.CHARGE;
BY CARD_NBR CHARGE;
RUN;
PROC SORT DATA=WORK.RETURN;
BY CARD_NBR CHARGE;
RUN;
DATA WORK.RETURN;
SET WORK.RETURN;
SEQ=_N_;
RUN;
/*因为存在相同卡号与金额的情况,会导致一笔return冲掉多笔charge*/
/*
DATA WORK.CHARGE_AGAINST WORK.CHARGE_RETAIN WORK.RETURN_RETAIN;
MERGE WORK.CHARGE(IN=E) WORK.RETURN(IN=F);
BY CARD_NBR CHARGE;
IF E AND F THEN OUTPUT WORK.CHARGE_AGAINST;
IF E AND ^F THEN OUTPUT WORK.CHARGE_RETAIN;
IF ^E AND F THEN OUTPUT WORK.RETURN_RETAIN;
RUN;
*/
/*将CHARGE创建索引*/
PROC DATASETS LIB=WORK NOLIST;
MODIFY RETURN;
        INDEX DELETE CARD_CHARGE;
        INDEX CREATE CARD_CHARGE=(CARD_NBR CHARGE);
RUN;
/*在多笔CHARGE对一笔冲销,或多笔CHARGE对多笔冲销,以及一笔CHARGE对多笔冲销的情况下,以下算法均能正确*/
DATA WORK.CHARGE_RETAIN;
SET WORK.CHARGE;
SET WORK.RETURN KEY=CARD_CHARGE;
RETAIN SEQ_LST;
MATCH_IND=0;
IF _IORC_ EQ 0 AND SEQ NE SEQ_LST THEN DO;
        SEQ_LST=SEQ;
        MATCH_IND=1;
        OUTPUT;
        END;
IF _IORC_ NE 0 THEN DO;
        OUTPUT;
        _IORC_=0;       
        END;
RUN;

DATA WORK.CHARGE_RESULT(DROP=MATCH_IND);
SET WORK.CHARGE_RETAIN(KEEP=CARD_NBR CHARGE_AMT MATCH_IND);
IF MATCH_IND=1 THEN DELETE;
RUN;
作者: shiyiming    时间: 2005-10-27 10:44
标题: 试试
学SAS不久,也来试试!
我的程序如下:
data tem1 tem2;
input card_number $ charge;
abs=abs(charge);
if charge&gt;0 then output tem1;
else output tem2;
cards;
A123 14.56
A123 15.23
A123 -14.56
A123 14.56
A123 15.23
A123 -14.56
A234 11.12
A234 3.87
A234 11.12
A234 -11.12
A234 4.86
A234 11.12
A234 3.87
A234 11.12
A234 11.12
A234 -11.12
A234 4.86
A234 -11.12
run;
proc sort  data=tem1;by card_number abs;run;
proc sort  data=tem2;by card_number abs;run;
data tem(drop=abs);
merge tem1 tem2;
by card_number abs;
if charge&lt;0 then delete;
run;
作者: shiyiming    时间: 2005-10-29 22:52
标题: Re: 试试
楼上的兄弟,我不是才说过,用Merge的办法会导致一笔冲销掉多笔吗??
作者: shiyiming    时间: 2005-10-30 20:21
标题: 回复
楼上的这位,我试过几个,用merge是可以的阿。SAS语言跟别的语言不同,merge合并的时候也是一对一的,所以不用担心你说的那个问题吧。不知道我说的对不对?
我刚学SAS不久,很多地方都不懂,希望能向你请教,我的QQ:117605981
作者: shiyiming    时间: 2005-11-3 23:13
标题: 我也学不久,但从来没听说过Merge是一对一的。
Merge有三种类型:
一对一
一对多
多对多

一对一和一对多情况下,与SQL是相同的。

多对多的情况,如2*3只会有2条match的记录。会发生数据丢失,所以多对多的情况最好是慎用。
作者: shiyiming    时间: 2005-12-15 23:20
标题: ,.....................
............看不懂
作者: shiyiming    时间: 2006-2-10 09:38
标题: SAS test
data credit;
input Card_Number $ Charge;
char=abs(charge);
datalines;
A123 14.56
A123 15.23
A123 -14.56
A234 11.12
A234 3.87
A234 11.12
A234 -11.12
A234 4.86
run;
proc sort;
by card_number char charge;
run;
data Credit; set credit;
if charge&lt;0 then do;
k=_n_;
delete;
retain k; end;
if _N_=k+1 then delete;
drop char K;
run;

Note: The order of data set is the original order.
作者: shiyiming    时间: 2006-2-13 17:08
标题: my script
本人过去是写sql,转写sas时间不长,下面这个方法主要是用sql来写的,时间紧促,备注就随便写一写,大家请指教:

/*数据输入*/
data tem;
input Card_Number $4. Charge;
cards;
A123 14.56
A123 15.23
A123 -14.56
A234 11.12
A234 11.12
A234 3.87
A234 11.12
A234 -11.12
A234 -11.12
A234 4.86
;run;

/*增加一列序列号,1,2,3,4……*/

data tt;
set tem;
id=_n_;
run;

/*
为了得到这样的分组序号效果
A234 11.12  1
A234 11.12  2
A234 11.12  3
A234 -11.12 1
A234 -11.12 2
………
*/

proc sql;
create table tt1 as
select    a.*,
    (select count(*) from tt where Card_Number=a.Card_Number and Charge=a.Charge and ID&lt;=a.ID) as n
from   tt a        ;
quit;


/*把负数(还款)取出来*/

data t1;
set tt1;
money=abs(Charge);
where Charge&lt;0;
run;

/*做left join增加两列,如果这两列都为0,则就是最终所要的数据*/

proc sql;
create table t2 as
select  a.Card_Number,a.Charge,b.Charge as ref_charge1,c.Charge as ref_charge2
from tt1 a
left join t1 b on a.Charge=b.money and a.Card_Number=b.Card_Number and a.n=b.n and a.charge&gt;=0
left join t1 c on a.Charge=c.Charge and a.Card_Number=c.Card_Number and a.n=c.n and a.charge&lt;0
;
quit;

data t3 ;
set t2;
drop ref_charge1 ref_charge2;
where ref_charge1=. and ref_charge2=.;
run;
作者: shiyiming    时间: 2009-6-9 22:21
标题: Re: SAS程序员测试(一)
N年前的贴子了,这道题目还是挺经典的。不过有个朋友说会重复,其实那段是注释掉的,以下才是我写的算法。你可以试看看,帮我测试测试,呵。

/*在多笔CHARGE对一笔冲销,或多笔CHARGE对多笔冲销,以及一笔CHARGE对多笔冲销的情况下,以下算法均能正确*/
DATA WORK.CHARGE_RETAIN;
SET WORK.CHARGE;
SET WORK.RETURN KEY=CARD_CHARGE;
RETAIN SEQ_LST;
MATCH_IND=0;
IF _IORC_ EQ 0 AND SEQ NE SEQ_LST THEN DO;
SEQ_LST=SEQ;
MATCH_IND=1;
OUTPUT;
END;
IF _IORC_ NE 0 THEN DO;
OUTPUT;
_IORC_=0;
END;
RUN;

DATA WORK.CHARGE_RESULT(DROP=MATCH_IND);
SET WORK.CHARGE_RETAIN(KEEP=CARD_NBR CHARGE_AMT MATCH_IND);
IF MATCH_IND=1 THEN DELETE;
RUN;
作者: shiyiming    时间: 2009-6-9 22:49
标题: Re: SAS程序员测试(一)
I think we need more information to recognize which charge record the return record really refers. So far, it  just refers to the nearest one for the same card number  with a same charge value, which is not always true.
作者: shiyiming    时间: 2009-6-10 14:04
标题: Re: SAS程序员测试(一)
思路:对每组中相同记录分别编号,然后去交集。

[code:33ikojp9]data WORK&#46;SOURCE;
input CARD_NBR $4&#46; AMOUNT;
cards;
A123 14&#46;56
A123 15&#46;23
A123 -14&#46;56
A234 11&#46;12
A234 3&#46;87
A234 11&#46;12
A234 11&#46;12
A234 -11&#46;12
A234 -11&#46;12
A234 4&#46;86
A234 -9&#46;99
A234 -11&#46;13
A123 -14&#46;56
;
run;

data ex1 ex2;
   set source;
   if  amount&gt;=0 then output ex1;
   if  amount &lt;0 then do ;
       amount=-amount;
       output ex2;
           end;
   run;
proc sort data=ex1 out=ex1;by  CARD_NBR  AMOUNT; run;
proc sort data=ex2 out=ex2;by  CARD_NBR  AMOUNT; run;

data ex1;
    set ex1;
        by          CARD_NBR  AMOUNT;
        if (first&#46;amount=1 and last&#46;amount = 1 )  then num=1;  
    else if (first&#46;amount=1 and last&#46;amount ne 1)  then num=1;
    else if (first&#46;amount ne 1 and last&#46;amount ne 1) then num+1;
    else if (first&#46;amount ne 1 and last&#46;amount =1) then num+1;
run;

data ex2;
    set ex2;
    by  CARD_NBR  AMOUNT;
        if (first&#46;amount=1 and last&#46;amount = 1 )  then num=1;  
    else if (first&#46;amount=1 and last&#46;amount ne 1)  then num=1;
    else if (first&#46;amount ne 1 and last&#46;amount ne 1) then num+1;
    else if (first&#46;amount ne 1 and last&#46;amount =1) then num+1;
run;

proc sql;
    select * from ex1
        except
        select * from ex2;
        quit;[/code:33ikojp9]
作者: shiyiming    时间: 2009-6-10 14:43
标题: Re: SAS程序员测试(一)
如果全部用data步的 话。

[code:lwb7etar]
&#46;&#46;&#46;

data ex3;
   merge ex1(in=in1) ex2 (in=in2);
   if in1 and ^in2 then output;
   by  CARD_NBR  AMOUNT num;
   run;[/code:lwb7etar]
作者: shiyiming    时间: 2009-6-12 14:54
标题: Re: SAS程序员测试(一)
不知道是不是这个意思
data a;
input Card_Number $ Charge;
cards;
A123 14.56
A123 15.23
A123 -14.56
A234 11.12
A234 3.87
A234 11.12
A234 -11.12
A234 4.86
;
run;
proc sort data=a; by card_number; run;

data b;
        set a;
        by card_number;
        abscharge=abs(charge);
run;
proc sort data=b; by card_number abscharge charge ; run;

data c;
        set b;
        by card_number abscharge charge;
        if first.abscharge then charge0=charge;
                else charge0+charge;
        if charge0 &gt; 0 then output;
run;
作者: shiyiming    时间: 2009-6-12 23:27
标题: Re: SAS程序员测试(一)
非常经典的题目。我来一个投机取巧的,大家拿砖头拍....<!-- s:shock: --><img src="{SMILIES_PATH}/icon_eek.gif" alt=":shock:" title="Shocked" /><!-- s:shock: -->
要保证新的id=&quot;ZZZZZZZ&quot;在按id排序后排到最后一行
[code:3lvv2a4x]data a;
input id $ fee;
aa=abs(fee);
cards;
A123 14&#46;56
A123 15&#46;23
A123 -14&#46;56
A234 11&#46;12
A234 3&#46;87
A234 11&#46;12
A234 -11&#46;12
A234 4&#46;86
;
run;
proc sort;
   by id aa;
run;
*要保证生成的id=ZZZZZZZ能在排序后排到最后;
data c;
   input id $;
   cards;
ZZZZZZZ
   ;
run;
*以lag后的id和fee新的id和fee;
data b;
   retain id id_new;
   set a c;
   id_New=lag(id);
   fee_new=lag(fee);
run;
proc sort;
   by id id_new aa;
run;
*indicate1和indicate2为前后两项是否相反的标示;
data b;
   set b;
   by id_new;
   bb=lag(fee_new);if first&#46;id_new then bb=&#46;;
   indicate1=fee_new+bb;
   indicate2=fee+fee_new;
   if indicate1=0 or indicate2=0 then delete;
   if id_new^='';
   keep id_new fee_new;
run;
[/code:3lvv2a4x]
经测试,上面的程序有误...... <!-- s:( --><img src="{SMILIES_PATH}/icon_sad.gif" alt=":(" title="Sad" /><!-- s:( -->
下面的不知道是否正确:
[code:3lvv2a4x]data a;
input id $ fee;
aa=abs(fee);
sign=sign(fee);
cards;
A123 14&#46;56
A123 15&#46;23
A123 -14&#46;56
A234 11&#46;12
A234 3&#46;87
A234 11&#46;12
A234 11&#46;12
A234 11&#46;12
A234 11&#46;12
A234 -11&#46;12
A234 -11&#46;12
A234 -11&#46;12
A234 4&#46;86
A234 4&#46;86
;
run;
proc sort;
   by id aa sign;
run;
data d;
   set a;
   by id aa sign;
   retain x;
   *计数出现负的个数;
   if first&#46;aa then do;
      if sign=-1 then x=1;else x=0;
   end;
   else do;
      if sign=-1 then x+1;
   end;
   *出现正的非第一行逐行减1,可能会出现负的情况;
   *if sign=1 and ^(first&#46;sign) then x=x-1;
   if ^(first&#46;sign) then  do;
      if sign=1 then x=x-1;
   end;
   *删除标示&gt;0的行;
   *if x&gt;0 then delete;
   *keep id fee;
run;
[/code:3lvv2a4x]
作者: shiyiming    时间: 2009-6-25 18:10
标题: Re: SAS程序员测试(一)
[code:2nlx9562]data ahuige;
    input  Card  $ Charge ;
    cards;
A123 14&#46;56
A123 15&#46;23
A123  -14&#46;56
A234 11&#46;12
A234 3&#46;87
A234 11&#46;12
A234  -11&#46;12
A234 4&#46;86
;
run;

proc sql;
    create table final as
    select card,charge
    from ahuige(where=(charge&gt;0))
    except all
    select card,abs(charge) as charge
    from ahuige(where=(charge&lt;0))
    ;
[/code:2nlx9562]
突然发觉自己以前从来没有回过这个贴子。。。。。。。。。。。。。。
作者: shiyiming    时间: 2009-6-26 10:36
标题: Re: SAS程序员测试(一)
哏多方法,看得老汉眼发花!




欢迎光临 SAS中文论坛 (https://mysas.net/forum/) Powered by Discuz! X3.2