1)What is the difference between "IF" and
"Where" conditions in SAS?
References:
SAS
programming by example By Ronald P. Cody, Ray Pass, SAS
Institute, Page 61.
ANS) Sample Program:
libname modval "\\iblt-kssrv-002\KS-User\Appala\Scorecard\NAB";
Log:
4 libname modval
"\\iblt-kssrv-002\KS-User\Appala\Scorecard\NAB";
NOTE: Libref MODVAL was
successfully assigned as follows:
Engine: V9
Physical Name:
\\iblt-kssrv-002\KS-User\Appala\Scorecard\NAB
proc contents data = modval.nabdata1;
run;
Log:
5 proc contents data=
modval.nabdata1;
6 run;
NOTE: PROCEDURE CONTENTS used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
Output:
The SAS System 19:04 Wednesday, July 22, 2009 2
The
CONTENTS Procedure
Data Set Name MODVAL.NABDATA1 Observations 331201
Member Type DATA
Variables 3
Engine V9
Indexes 0
Created Monday, March 16, 2009 10:57:52
AM Observation Length 24
Last Modified Monday, March 16, 2009 10:57:52 AM Deleted Observations 0
Protection
Compressed NO
Data Set Type
Sorted NO
Label
Data Representation WINDOWS_32
Encoding wlatin1 Western (Windows)
Engine/Host
Dependent Information
Data Set Page Size 4096
Number of Data Set Pages 1972
First Data Page 1
Max Obs per Page 168
Obs in First Data Page 114
Number of Data Set Repairs 0
Filename
\\iblt-kssrv-002\KS-User\Appala\Scorecard\NAB\nabdata1.sas7bdat
Release Created 9.0101M3
Host Created WIN_ASRV
Alphabetic List of
Variables and Attributes
# Variable
Type Len Format
Informat
1 Score
Num 8 BEST12.
BEST32.
2
Val_score Num 8
3 frdn
Num 8
data sample1;
set modval.nabdata1(obs=100);
where score gt 1000;
run;
Log:
7 data sample1;
8 set
modval.nabdata1(obs=100);
9 where score gt 1000;
10 run;
NOTE: There were 100
observations read from the data set MODVAL.NABDATA1. WHERE score>1000;
NOTE: The data set
WORK.SAMPLE1 has 100 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.14 seconds
cpu time 0.01 seconds
data sample2;
set modval.nabdata1(obs=100);
if score gt 1000;
run;
Log:
11 data sample2;
12 set
modval.nabdata1(obs=100);
13 if score gt 1000;
14 run;
NOTE: There were 100
observations read from the data set MODVAL.NABDATA1.
NOTE: The data set
WORK.SAMPLE2 has 13 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
data raj(obs=100);
set modval.nabdata1;
where score gt 1000;
run;
data raj1(obs=100);
set modval.nabdata1;
if score gt 1000;
run;
Log:
29 data raj(obs=100);
---
70
WARNING 70-63: The option OBS
is not valid in this context. Option
ignored.
30 set modval.nabdata1;
31 where score gt 1000;
32 run;
NOTE: There were 37799
observations read from the data set MODVAL.NABDATA1.WHERE score>1000;
NOTE: The data set WORK.RAJ has
37799 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 4.35 seconds
cpu time 0.14 seconds
33
34 data raj1(obs=100);
---
70
WARNING 70-63: The option OBS
is not valid in this context. Option
ignored.
35 set modval.nabdata1;
36 if score gt 1000;
37 run;
NOTE: There were 331201
observations read from the data set MODVAL.NABDATA1.
NOTE: The data set WORK.RAJ1
has 37799 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.06 seconds
cpu time 0.06 seconds
proc print data=modval.nabdata1(obs = 10);
where score>1000;
run;
Log:
38 proc print
data=modval.nabdata1(obs = 10);
39 where score>1000;
40 run;
NOTE: There were 10
observations read from the data set MODVAL.NABDATA1.WHERE score>1000;
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
Output:
The
SAS System 19:04 Wednesday, July
22, 2009 5
Val_
Obs Score score
frdn
1 1038 1038
0
5 1021 1021
0
7 1071 1071
0
18 1014 1014
0
21 1132 1132
0
36 1071 1071
0
37 1163 1163
0
48 1026 1026
0
50 1013 1013
0
56 1005
1005 0
data sample1;
set modval.nabdata1;
where score gt 1000;
run;
data sample2;
set modval.nabdata1;
if score gt 1000;
run;
Log:
56 data sample1;
57 set modval.nabdata1;
58 where score gt 1000;
59 run;
NOTE: There were 37799 observations
read from the data set MODVAL.NABDATA1.WHERE score>1000;
NOTE: The data set WORK.SAMPLE1
has 37799 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 4.36 seconds
cpu time 0.25 seconds
60 data sample2;
61 set modval.nabdata1;
62 if score gt 1000;
63 run;
NOTE: There were 331201
observations read from the data set MODVAL.NABDATA1.
NOTE: The data set WORK.SAMPLE2
has 37799 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.07 seconds
cpu time 0.07 seconds
Differences
between IF and WHERE Statements
Both of the above programs produce
identical results, but there are differences between IF and WHERE statements.
The WHERE statement may be more
efficient than the subsetting IF(especially
if you are taking a very small subset from a large file)because it checks
on the validity of the condition before the observation is brought into a
temporary holding area, whereas the sub setting IF statement brings in the entire observation and then checks the
condition to see if the observation is to be kept or not. This temporary
holding area is called the Program
Data Vector (PDV). A WHERE
statement can only be used with variables in the existing data set, whereas a
subsetting IF statement can be used
with raw data as well.
Another difference between a
subsetting IF statement and a WHERE statement may surface when you
use the FIRST. And LAST. Logical variables. When
the WHERE condition is not true, the
observation is not brought into the PDV, and therefore it does not affect the
logical values of the FIRST.
And LAST. Variables.
Another major difference between IF and WHERE statements is that you may include WHERE statement in SAS procedures. For example, if you have a data
set called ALL (containing
the variables ID, SEX, and SALARY), and you want a listing only
for MALES(M), you could code this
as:
PROC PRINT DATA=ALL;
WHERE SEX=’M’;
RUN;
This saves you the work of creating
a new data set just to obtain your listing.
2) What are the
difference and “Proc Means” and “Proc Univariate” in SAS?
ANS1)
Both procedure produce descriptive statistics. By proc uni
variate,
by default it produce all the statistics(some time
not
all required) but in proc means it is possible to
request
the statistics that we want.
ex---proc
means data=xyz mean max sd;run;
*it
would produce statistics of above which we've mentioned;
ANS2) PROC UNIVARIATE gives more descriptive statistics such as
skewness, kurtosis, Q_PLOT and so on. If you are looking for an indepth analysis of the data, like clustering, association tree,..., we start with PROC UNIVARIATE.
If we are looking for some simple results like sum mean SD and to find extreme values we use PROC MEANS as it takes less of machine time than PROC UNIVARIATE.
3)What is the
difference between “NODUP” and “NODUPKEY” options in SAS?
References:
DEFINING NODUP AND
NODUPKEY OPTIONS
The NODUP option checks for and
eliminates duplicate observations. If you specify this option, PROC SORT
compares all variable values for each observation to those for the previous
observation that was written to the output data set. If an exact match is
found, the observation is not written to the output data set.
The NODUPKEY option checks for and
eliminates observations with duplicate BY variable values. If you specify this
option, PROC SORT compares all BY variable values for each observation to those
for the previous observation written to the output data set. If an exact match
using the BY variable values is found, the observation is not written to the
output data set.
Notice that with the NODUPKEY option, PROC SORT is
comparing all BY variable values while the NODUP option compares all the variables in the data
set that is being sorted. An easy way to remember the difference between these
options is to keep in mind the word “key” in NODUPKEY. It evaluates the “key”
or BY variable values that you specify. One thing to beware of with both
options is that they both compare the previous observation written to the
output data set. So, if the observations that you want eliminated are not
adjacent in the data set after the sort, they will not be eliminated.
Joins in SAS+dealing
with a key variable having different lengths?
What is the difference and “Proc Means” and “Proc Summary”
in SAS?
What is the difference and “Proc Tabulate” and “Proc Freq”
in SAS?
What is difference between rename and lable in sas?
What is the difference between “Proc Univariate” and “Proc Freq”
in SAS?
Explain the message 'Merge has one or more datasets with
repeats of by variables'
how to write a variable having multiple formats into a SAS data
set?
how to process a variable with multiple formats in input and
output datasets?
SAS Technical questions and answers
Explain the message
‘MERGE HAS ONE OR MORE DATASETS WITH REPEATS OF BY VARIABLES”.
what are the general errors occur during reading and writing
data from text file in SAS?
what are the common errors occur during reading and writing data
from text file in SAS?
What is the difference between Informat & Format?
How to convert numeric values into character values?
How to convert character values into numeric?
What are the different character functions you have used in your
work?
What are the different options you will use while importing data
from .xls, .csv files?
What are the different automatic variables?
What is the use of _NULL_ ?
How to read every 5th observation of a dataset?
How to get all duplicate records into a separate dataset?
What is the difference between where and having clause?
No comments:
Post a Comment