SAS Macro: How to Retrieve a Value from a Dataset
SAS is a very powerful language when one have to deal with processing huge datasets. It becomes much more complicated when one wants to play with single elements. I recently had the need to put a specific value of a dataset into a macro variable. With the help of the SAS support team of Switzerland, I have made a small macro for that. Here is the definition of the function:
- Macro name:
Get_data - Input:
myDataset(name of the dataset),myLine(line number),myColumn(name of the column),myMVar(name of the macro variable used to store the result) - Output: A new macro variable named
&myMVar.
And here is the corresponding code:
%MACRO Get_data(myDataset=,myLine=,myColumn=,myMVar=);
%GLOBAL &myMVar.;
data _null_;
set &myDataset.;
if _N_ = &myLine. then do;
call symput(symget('myMVar'),&myColumn.);
end;
run;
%MEND Get_data;
Of course, I would be very interested if someone found a simplier way of performing this action (and I think SAS support would be interested as well).
Comments
11 Comments on SAS Macro: How to Retrieve a Value from a Dataset
-
lordthias on
Thu, 24th Dec 2009 2:54 pm
-
Sandro Saitta on
Thu, 24th Dec 2009 4:05 pm
-
Chris Hemedinger on
Thu, 24th Dec 2009 4:16 pm
-
Sandro Saitta on
Sat, 26th Dec 2009 7:35 pm
-
Shane on
Wed, 30th Dec 2009 9:59 am
-
Sandro Saitta on
Sun, 3rd Jan 2010 1:23 pm
-
Manish on
Thu, 14th Jan 2010 12:01 pm
-
Sandro Saitta on
Thu, 14th Jan 2010 12:47 pm
-
Song Liu on
Thu, 18th Feb 2010 9:03 pm
-
RolandRB on
Tue, 1st Feb 2011 3:55 am
-
Sandro Saitta on
Thu, 3rd Feb 2011 8:23 pm
Hi Sandro,
Here’s an other way to do the same think (maybe a little faster on big dataset)
%MACRO Get_data(myDataset=,myLine=,myColumn=,myMVar=);
%GLOBAL &myMVar;
proc sql noprint;
select &myColumn into :&myMVar
from &myDataset
where monotonic() = &myLine;
quit;
%MEND Get_data;
%Get_data(myDataset=tablename,myLine=linenumber,myColumn=colname,myMVar=varname)
%put &varname;
I let you test the execution time
Joyeuse fête de noël !
@lordthias: Thanks for the PROC SQL example. I’m not using the get_data macro on big data sets for the moment, but I would of course compare both codes before using it on huge files
A toi aussi et bonne année!
I second the suggestion for using PROC SQL SELECT INTO. The concept of sequential records holds true for SAS datasets, but if your data is in a DBMS, you might want a different mechanism for pulling out a specific record (value in an ID column, for example).
@Chris: Thanks for sharing your opinion. Also PROC SQL is closer to mySQL and may be worth considering if planning to move from SAS to another ETL tool (Talend, etc.)
Another benefit of PROC SQL is it doesn’t require sorted datasets for a join… where as DATA …; MERGE … BY … will throw an error if it is not sorted — very annoying!!
@Shane: Good point! This is something we should keep in mind when comparing DATA step with PROC SQL execution time.
Hi
This is the simplest and the most efficient code I can think of (would be hapy to learn better way to do this).
—————————————————————-
%MACRO Get_data(myDataset=,myLine=,myColumn=);
%GLOBAL myMVar;
data _null_;
pull = &myLine; /*assigning value of macto var to datastep*/
set &myDataset (keep=&myColumn) point=Pull;
call symput(’myMVar’,&myColumn);
output; stop;
run;
%MEND Get_data;
%Get_data(myDataset=,myLine=,myColumn=);
%put &myMVar; /*to print value of macro var in log*/
—————————————————————-
It is effiencie because it will read only the line that you want SAS to read. If ‘If – Then’ is used, sas reads each obs and then does the evalauation of IF condition, more input-output. This can be optimised by telling sas to read only the required row using ‘Point=’ option. But ‘Point=’ option takes only variable name. Hence, before set statement, a variable called ‘Pull’ is defined with the value of required row-number.. Now sas just reads the required row saving i/o. DO NOTE that ‘Output;’ and ‘Stop’ must be used with ‘Point=’ option of you will execute a never-ending data-step;
I hae further optimised the program by using ‘KEEP=’ optin in set statement, which asks SAS to read only relevant columns.
The syntax of ‘Call Sumput’ is actually simpler than what you have used here. You don’t need to use Symget here.
Hope this helps.
Manish
@Manish: Thanks for your example Manish. I haven’t tested it yet, but it seems (really) optimized by only reading the correct row/column. Thanks for your input!
Use sas I/O funtions would be effient. Here is the code:
%MACRO Get_data(myDataset,myLine,myColumn);
%global myMvar;
%let dsid=%sysfunc(open(&myDataset.,i));
%let rc=%sysfunc(fetchobs(&dsid, &myLine.));
%if &rc = 0 %then %let myMvar=%sysfunc(GETVARN(&dsid,%sysfunc(varnum(&dsid,&myColumn)))); /****use getvarc for character variables*****/
%let rc=%sysfunc(close(&dsid));
%MEND Get_data;
I have a macro called %getvalue to do this that uses SAS I/O functions as above. The advantage with this method is that you can have a function-style maco to do this that allows you to call it in the middle of a data step if need be.
http://www.datasavantconsulting.com/roland/Spectre/utilmacros/getvalue.sas
More macros here:
http://www.datasavantconsulting.com/roland/Spectre/maclist2.html
@RolandRB: Thanks for the links!
Tell me what you're thinking...
















