1)What is the difference between "IF" and "Where" conditions in SAS?

References:

http://www2.sas.com/proceedings/sugi31/238-31.pdf

http://www2.sas.com/proceedings/forum2007/213-2007.pdf

http://books.google.co.in/books?id=wubfWLLuyI0C&pg=PA61&lpg=PA61&dq=What+is+the+difference+between+%22IF%22+and+%22Where%22+conditions+in+SAS%3F&source=bl&ots=EbLjcNNnfj&sig=G7rr5AEIbMJUPPGxfLJz1mrNY38&hl=en&ei=ivVnSruUENeekQWS8ai1DA&sa=X&oi=book_result&ct=result&resnum=5

SAS programming by example By Ronald P. Cody, Ray Pass, SAS Institute, Page 61.

ANS) Sample Program:

libname modval "\\iblt-kssrv-002\KS-User\Appala\Scorecard\NAB";

Log:

4 libname modval "\\iblt-kssrv-002\KS-User\Appala\Scorecard\NAB";

NOTE: Libref MODVAL was successfully assigned as follows:

Engine: V9

Physical Name: \\iblt-kssrv-002\KS-User\Appala\Scorecard\NAB

proc contents data = modval.nabdata1;

run;

Log:

5 proc contents data= modval.nabdata1;

6 run;

NOTE: PROCEDURE CONTENTS used (Total process time):

real time 0.01 seconds

cpu time 0.00 seconds

Output:

The SAS System 19:04 Wednesday, July 22, 2009 2

The CONTENTS Procedure

Data Set Name MODVAL.NABDATA1 Observations 331201

Member Type DATA Variables 3

Engine V9 Indexes 0

Created Monday, March 16, 2009 10:57:52 AM Observation Length 24

Last Modified Monday, March 16, 2009 10:57:52 AM Deleted Observations 0

Protection Compressed NO

Data Set Type Sorted NO

Label

Data Representation WINDOWS_32

Encoding wlatin1 Western (Windows)

Engine/Host Dependent Information

Data Set Page Size 4096

Number of Data Set Pages 1972

First Data Page 1

Max Obs per Page 168

Obs in First Data Page 114

Number of Data Set Repairs 0

Filename \\iblt-kssrv-002\KS-User\Appala\Scorecard\NAB\nabdata1.sas7bdat

Release Created 9.0101M3

Host Created WIN_ASRV

Alphabetic List of Variables and Attributes

# Variable Type Len Format Informat

1 Score Num 8 BEST12. BEST32.

2 Val_score Num 8

3 frdn Num 8

data sample1;

set modval.nabdata1(obs=100);

where score gt 1000;

run;

Log:

7 data sample1;

8 set modval.nabdata1(obs=100);

9 where score gt 1000;

10 run;

NOTE: There were 100 observations read from the data set MODVAL.NABDATA1. WHERE score>1000;

NOTE: The data set WORK.SAMPLE1 has 100 observations and 3 variables.

NOTE: DATA statement used (Total process time):

real time 0.14 seconds

cpu time 0.01 seconds

data sample2;

set modval.nabdata1(obs=100);

if score gt 1000;

run;

Log:

11 data sample2;

12 set modval.nabdata1(obs=100);

13 if score gt 1000;

14 run;

NOTE: There were 100 observations read from the data set MODVAL.NABDATA1.

NOTE: The data set WORK.SAMPLE2 has 13 observations and 3 variables.

NOTE: DATA statement used (Total process time):

real time 0.01 seconds

cpu time 0.01 seconds

data raj(obs=100);

set modval.nabdata1;

where score gt 1000;

run;

data raj1(obs=100);

set modval.nabdata1;

if score gt 1000;

run;

Log:

29 data raj(obs=100);

---

WARNING 70-63: The option OBS is not valid in this context. Option ignored.

30 set modval.nabdata1;

31 where score gt 1000;

32 run;

NOTE: There were 37799 observations read from the data set MODVAL.NABDATA1.WHERE score>1000;

NOTE: The data set WORK.RAJ has 37799 observations and 3 variables.

NOTE: DATA statement used (Total process time):

real time 4.35 seconds

cpu time 0.14 seconds

34 data raj1(obs=100);

---

WARNING 70-63: The option OBS is not valid in this context. Option ignored.

35 set modval.nabdata1;

36 if score gt 1000;

37 run;

NOTE: There were 331201 observations read from the data set MODVAL.NABDATA1.

NOTE: The data set WORK.RAJ1 has 37799 observations and 3 variables.

NOTE: DATA statement used (Total process time):

real time 0.06 seconds

cpu time 0.06 seconds

proc print data=modval.nabdata1(obs = 10);

where score>1000;

run;

Log:

38 proc print data=modval.nabdata1(obs = 10);

39 where score>1000;

40 run;

NOTE: There were 10 observations read from the data set MODVAL.NABDATA1.WHERE score>1000;

NOTE: PROCEDURE PRINT used (Total process time):

real time 0.01 seconds

cpu time 0.00 seconds

Output:

The SAS System 19:04 Wednesday, July 22, 2009 5

Val_

Obs Score score frdn

1 1038 1038 0

5 1021 1021 0

7 1071 1071 0

18 1014 1014 0

21 1132 1132 0

36 1071 1071 0

37 1163 1163 0

48 1026 1026 0

50 1013 1013 0

56 1005 1005 0

data sample1;

set modval.nabdata1;

where score gt 1000;

run;

data sample2;

set modval.nabdata1;

if score gt 1000;

run;

Log:

56 data sample1;

57 set modval.nabdata1;

58 where score gt 1000;

59 run;

NOTE: There were 37799 observations read from the data set MODVAL.NABDATA1.WHERE score>1000;

NOTE: The data set WORK.SAMPLE1 has 37799 observations and 3 variables.

NOTE: DATA statement used (Total process time):

real time 4.36 seconds

cpu time 0.25 seconds

60 data sample2;

61 set modval.nabdata1;

62 if score gt 1000;

63 run;

NOTE: There were 331201 observations read from the data set MODVAL.NABDATA1.

NOTE: The data set WORK.SAMPLE2 has 37799 observations and 3 variables.

NOTE: DATA statement used (Total process time):

real time 0.07 seconds

cpu time 0.07 seconds

Differences between IF and WHERE Statements

Both of the above programs produce identical results, but there are differences between IF and WHERE statements. The WHERE statement may be more efficient than the subsetting IF(especially if you are taking a very small subset from a large file)because it checks on the validity of the condition before the observation is brought into a temporary holding area, whereas the sub setting IF statement brings in the entire observation and then checks the condition to see if the observation is to be kept or not. This temporary holding area is called the Program Data Vector (PDV). A WHERE statement can only be used with variables in the existing data set, whereas a subsetting IF statement can be used with raw data as well.

Another difference between a subsetting IF statement and a WHERE statement may surface when you use the FIRST. And LAST. Logical variables. When the WHERE condition is not true, the observation is not brought into the PDV, and therefore it does not affect the logical values of the FIRST. And LAST. Variables.

Another major difference between IF and WHERE statements is that you may include WHERE statement in SAS procedures. For example, if you have a data set called ALL (containing the variables ID, SEX, and SALARY), and you want a listing only for MALES(M), you could code this as:

PROC PRINT DATA=ALL;

WHERE SEX=’M’;

RUN;

This saves you the work of creating a new data set just to obtain your listing.

2) What are the difference and “Proc Means” and “Proc Univariate” in SAS?

http://www.allinterview.com/showanswers/71970.html

ANS1) Both procedure produce descriptive statistics. By proc uni

variate, by default it produce all the statistics(some time

not all required) but in proc means it is possible to

request the statistics that we want.

ex---proc means data=xyz mean max sd;run;

*it would produce statistics of above which we've mentioned;

ANS2) PROC UNIVARIATE gives more descriptive statistics such as

skewness, kurtosis, Q_PLOT and so on.  If you are looking for an indepth analysis of the data, like clustering, association tree,..., we start with PROC UNIVARIATE.

If we are looking for some simple results like sum mean SD and to find extreme values we use PROC MEANS as it takes less of machine time than PROC UNIVARIATE.

3)What is the difference between “NODUP” and “NODUPKEY” options in SAS?

References:

http://www2.sas.com/proceedings/sugi30/037-30.pdf

http://www2.sas.com/proceedings/forum2007/069-2007.pdf

http://www.lexjansen.com/pharmasug/2007/tu/tu07.pdf

http://www.allinterview.com/showanswers/57869.html

DEFINING NODUP AND NODUPKEY OPTIONS

The NODUP option checks for and eliminates duplicate observations. If you specify this option, PROC SORT compares all variable values for each observation to those for the previous observation that was written to the output data set. If an exact match is found, the observation is not written to the output data set.

The NODUPKEY option checks for and eliminates observations with duplicate BY variable values. If you specify this option, PROC SORT compares all BY variable values for each observation to those for the previous observation written to the output data set. If an exact match using the BY variable values is found, the observation is not written to the output data set.

Notice that with the NODUPKEY option, PROC SORT is comparing all BY variable values while the NODUP option compares all the variables in the data set that is being sorted. An easy way to remember the difference between these options is to keep in mind the word “key” in NODUPKEY. It evaluates the “key” or BY variable values that you specify. One thing to beware of with both options is that they both compare the previous observation written to the output data set. So, if the observations that you want eliminated are not adjacent in the data set after the sort, they will not be eliminated.

Joins in SAS+dealing with a key variable having different lengths?

http://www.sfu.ca/sasdoc/sashtml/lrcon/z1107839.htm

http://www.allinterview.com/showqa/SAS/page15.html

http://www.techinterviews.com/sas-interview-questions

http://www.allinterview.com/Interview-Questions/SAS.html

http://www.ats.ucla.edu/stat/sas/topics/data_management.htm

http://www.wuss.org/proceedings08/08WUSS%20Proceedings/papers/cod/cod14.pdf

What is the difference and “Proc Means” and “Proc Summary” in SAS?

What is the difference and “Proc Tabulate” and “Proc Freq” in SAS?

What is difference between rename and lable in sas?

What is the difference between “Proc Univariate” and “Proc Freq” in SAS?

Explain the message 'Merge has one or more datasets with repeats of by variables'