Monday, 9 December 2013

SAS INTERVIEW QUESTIONS AND ANSWERS

1)What is the difference between "IF" and "Where" conditions in SAS?

References:
SAS programming by example By Ronald P. Cody, Ray Pass, SAS Institute, Page 61.

ANS) Sample Program:
libname modval "\\iblt-kssrv-002\KS-User\Appala\Scorecard\NAB";
Log:
4    libname modval "\\iblt-kssrv-002\KS-User\Appala\Scorecard\NAB";
NOTE: Libref MODVAL was successfully assigned as follows:
      Engine:        V9
      Physical Name: \\iblt-kssrv-002\KS-User\Appala\Scorecard\NAB

proc contents data = modval.nabdata1;
run;
Log:
5    proc contents data= modval.nabdata1;
6    run;
NOTE: PROCEDURE CONTENTS used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds
Output:
                                         The SAS System       19:04 Wednesday, July 22, 2009   2

                                     The CONTENTS Procedure

    Data Set Name        MODVAL.NABDATA1                       Observations          331201
    Member Type          DATA                                  Variables             3
    Engine               V9                                    Indexes               0
    Created              Monday, March 16, 2009 10:57:52 AM    Observation Length    24
    Last Modified        Monday, March 16, 2009 10:57:52 AM    Deleted Observations  0
    Protection                                                 Compressed            NO
    Data Set Type                                              Sorted                NO
    Label
    Data Representation  WINDOWS_32
    Encoding             wlatin1  Western (Windows)


                               Engine/Host Dependent Information

  Data Set Page Size          4096
  Number of Data Set Pages    1972
  First Data Page             1
  Max Obs per Page            168
  Obs in First Data Page      114
  Number of Data Set Repairs  0
  Filename                    \\iblt-kssrv-002\KS-User\Appala\Scorecard\NAB\nabdata1.sas7bdat
  Release Created             9.0101M3
  Host Created                WIN_ASRV



                           Alphabetic List of Variables and Attributes

                      #    Variable     Type    Len    Format     Informat

                      1    Score        Num       8    BEST12.    BEST32.
                      2    Val_score    Num       8
                      3    frdn         Num       8
data sample1;
set modval.nabdata1(obs=100);
where score gt 1000;
run;
Log:
7    data sample1;
8    set modval.nabdata1(obs=100);
9    where score gt 1000;
10   run;
NOTE: There were 100 observations read from the data set MODVAL.NABDATA1. WHERE score>1000;
NOTE: The data set WORK.SAMPLE1 has 100 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.14 seconds
      cpu time            0.01 seconds

data sample2;
set modval.nabdata1(obs=100);
if score gt 1000;
run;
Log:
11   data sample2;
12   set modval.nabdata1(obs=100);
13   if score gt 1000;
14   run;
NOTE: There were 100 observations read from the data set MODVAL.NABDATA1.
NOTE: The data set WORK.SAMPLE2 has 13 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

data raj(obs=100);
set modval.nabdata1;
where score gt 1000;
run;

data raj1(obs=100);
set modval.nabdata1;
if score gt 1000;
run;

Log:
29   data raj(obs=100);
              ---
              70
WARNING 70-63: The option OBS is not valid in this context.  Option ignored.
30   set modval.nabdata1;
31   where score gt 1000;
32   run;
NOTE: There were 37799 observations read from the data set MODVAL.NABDATA1.WHERE score>1000;
NOTE: The data set WORK.RAJ has 37799 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           4.35 seconds
      cpu time            0.14 seconds

33
34   data raj1(obs=100);
               ---
               70
WARNING 70-63: The option OBS is not valid in this context.  Option ignored.
35   set modval.nabdata1;
36   if score gt 1000;
37   run;
NOTE: There were 331201 observations read from the data set MODVAL.NABDATA1.
NOTE: The data set WORK.RAJ1 has 37799 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.06 seconds
      cpu time            0.06 seconds
proc print data=modval.nabdata1(obs = 10);
where score>1000;
run;
Log:
38   proc print data=modval.nabdata1(obs = 10);
39   where score>1000;
40   run;
NOTE: There were 10 observations read from the data set MODVAL.NABDATA1.WHERE score>1000;
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds
Output:
                                         The SAS System       19:04 Wednesday, July 22, 2009   5

                                                       Val_
                               Obs           Score    score    frdn

                                 1            1038     1038      0
                                 5            1021     1021      0
                                 7            1071     1071      0
                                18            1014     1014      0
                                21            1132     1132      0
                                36            1071     1071      0
                                37            1163     1163      0
                                48            1026     1026      0
                                50            1013     1013      0
                                56            1005     1005      0
data sample1;
set modval.nabdata1;
where score gt 1000;
run;
data sample2;
set modval.nabdata1;
if score gt 1000;
run;
Log:
56   data sample1;
57   set modval.nabdata1;
58   where score gt 1000;
59   run;
NOTE: There were 37799 observations read from the data set MODVAL.NABDATA1.WHERE score>1000;
NOTE: The data set WORK.SAMPLE1 has 37799 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           4.36 seconds
      cpu time            0.25 seconds

60   data sample2;
61   set modval.nabdata1;
62   if score gt 1000;
63   run;
NOTE: There were 331201 observations read from the data set MODVAL.NABDATA1.
NOTE: The data set WORK.SAMPLE2 has 37799 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.07 seconds
      cpu time            0.07 seconds


Differences between IF and WHERE Statements
Both of the above programs produce identical results, but there are differences between IF and WHERE statements. The WHERE statement may be more efficient than the subsetting IF(especially if you are taking a very small subset from a large file)because it checks on the validity of the condition before the observation is brought into a temporary holding area, whereas the sub setting IF statement brings in the entire observation and then checks the condition to see if the observation is to be kept or not. This temporary holding area is called the Program Data Vector (PDV). A WHERE statement can only be used with variables in the existing data set, whereas a subsetting IF statement can be used with raw data as well.
Another difference between a subsetting IF statement and a WHERE statement may surface when you use the FIRST. And LAST. Logical variables. When the WHERE condition is not true, the observation is not brought into the PDV, and therefore it does not affect the logical values of the FIRST. And LAST. Variables.
Another major difference between IF and WHERE statements is that you may include WHERE statement in SAS procedures. For example, if you have a data set called ALL (containing the variables ID, SEX, and SALARY), and you want a listing only for MALES(M), you could code this as:
PROC PRINT DATA=ALL;
WHERE SEX=’M’;
RUN;

This saves you the work of creating a new data set just to obtain your listing.



2) What are the difference and “Proc Means” and “Proc Univariate” in SAS?
ANS1) Both procedure produce descriptive statistics. By proc uni
variate, by default it produce all the statistics(some time
not all required) but in proc means it is possible to
request the statistics that we want.
ex---proc means data=xyz mean max sd;run;
*it would produce statistics of above which we've mentioned;
 
ANS2) PROC UNIVARIATE gives more descriptive statistics such as 
skewness, kurtosis, Q_PLOT and so on.  If you are looking for an indepth analysis of the data, like clustering, association tree,..., we start with PROC UNIVARIATE.  
 
If we are looking for some simple results like sum mean SD and to find extreme values we use PROC MEANS as it takes less of machine time than PROC UNIVARIATE.

3)What is the difference between “NODUP” and “NODUPKEY” options in SAS?
References:

DEFINING NODUP AND NODUPKEY OPTIONS

The NODUP option checks for and eliminates duplicate observations. If you specify this option, PROC SORT compares all variable values for each observation to those for the previous observation that was written to the output data set. If an exact match is found, the observation is not written to the output data set.
The NODUPKEY option checks for and eliminates observations with duplicate BY variable values. If you specify this option, PROC SORT compares all BY variable values for each observation to those for the previous observation written to the output data set. If an exact match using the BY variable values is found, the observation is not written to the output data set.
Notice that with the NODUPKEY option, PROC SORT is comparing all BY variable values while the NODUP option compares all the variables in the data set that is being sorted. An easy way to remember the difference between these options is to keep in mind the word “key” in NODUPKEY. It evaluates the “key” or BY variable values that you specify. One thing to beware of with both options is that they both compare the previous observation written to the output data set. So, if the observations that you want eliminated are not adjacent in the data set after the sort, they will not be eliminated.

Joins in SAS+dealing with a key variable having different lengths?

What is the difference and “Proc Means” and “Proc Summary” in SAS?
What is the difference and “Proc Tabulate” and “Proc Freq” in SAS?
What is difference between rename and lable in sas?
What is the difference between “Proc Univariate” and “Proc Freq” in SAS?
Explain the message 'Merge has one or more datasets with repeats of by variables'
how to write a variable having multiple formats into a SAS data set?
how to process a variable with multiple formats in input and output datasets?
SAS Technical questions and answers
Explain the message ‘MERGE HAS ONE OR MORE DATASETS WITH REPEATS OF BY VARIABLES”.
what are the general errors occur during reading and writing data from text file in SAS?
what are the common errors occur during reading and writing data from text file in SAS?
What is the difference between Informat & Format?
How to convert numeric values into character values?
How to convert character values into numeric?
What are the different character functions you have used in your work?
What are the different options you will use while importing data from .xls, .csv files?
What are the different automatic variables?
What is the use of _NULL_ ?
How to read every 5th observation of a dataset?
How to get all duplicate records into a separate dataset?
What is the difference between where and having clause?

No comments:

Post a Comment