Sunday, 8 December 2013

SAS Experienced Interview Questions And Answers

1. What SAS statements would you code to read an external raw data file in a DATA step?

Ans: The INFILE and INPUT statements. INFILE identifies the external file, and INPUT describes how to read its fields into variables.
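The example below is a minimal sketch of the two statements working together. To keep it self-contained, it first writes two hypothetical records to a temporary file (FILENAME ... TEMP). With a real file, you would point INFILE at its path instead.

```sas
/* Hypothetical raw data written to a temporary file so the example runs anywhere */
filename raw temp;
data _null_;
  file raw;
  put 'Ashok 35 52000';
  put 'Priya 29 61000';
run;

/* INFILE names the external file; INPUT maps each field to a variable */
data employees;
  infile raw;
  input name $ age salary;
run;
```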

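2. How do you read in only the variables that you need?

Ans: When reading raw data, list only the fields you need in the INPUT statement. With column or formatted input you can point directly at the columns you want and skip the rest. When reading an existing SAS data set, use the KEEP= (or DROP=) data set option instead.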

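3. Are you familiar with special input delimiters? How are they used?

Ans: Yes. The DLM= (DELIMITER=) and DSD options of the INFILE statement control how list input separates fields. DLM= names the delimiter characters. The default is a blank, and you may choose any characters you wish, such as DLM=',' for commas or DLM='09'x for tabs. If you list several characters, such as DLM=',;', each of them acts as a delimiter. To treat a multi-character string as a single delimiter, use DLMSTR= instead. The DSD option does three things: it treats two consecutive delimiters as containing a missing value, makes the comma the default delimiter, and removes quotation marks from quoted values (so a delimiter inside quotes is read as data).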
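The sketch below uses made-up in-stream data. With DSD, the empty field between two commas becomes a missing value, and the quoted value keeps its embedded comma. The second step shows DLM= with a pipe character as the delimiter.

```sas
data scores;
  infile datalines dsd;          /* DSD: comma is the default delimiter */
  input name :$10. math science;
  datalines;
Ashok,90,85
Priya,,78
"Rao, K",70,88
;
run;

data piped;
  infile datalines dlm='|';      /* any character can serve as the delimiter */
  input name $ math science;
  datalines;
Ashok|90|85
;
run;
```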

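4. If reading a variable-length file with fixed input, how would you prevent SAS from reading the next record if the last variable didn’t have a value?

Ans: Use the MISSOVER option in the INFILE statement. When the INPUT statement reaches the end of a record, MISSOVER sets the remaining variables to missing instead of moving on to the next record. With column or formatted input, TRUNCOVER is often the better choice: MISSOVER also sets a last field that is only partially present to missing, while TRUNCOVER keeps the partial value.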
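Here is a minimal sketch with made-up list-input data in which the second record is short:

```sas
data test;
  infile datalines missover;   /* short record: C is set to missing */
  input a b c;
  datalines;
1 2 3
4 5
6 7 8
;
run;
```

With MISSOVER the step creates three observations, and C is missing in the second one. Without it, SAS would take C=6 from the third line and write a note that it went to a new line, leaving only two observations.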

5. What is the difference between an informat and a format? Name three informats or formats.

Ans: An informat is an instruction that SAS uses to read data values into a variable.
A format is an instruction that SAS uses to write (display) data values.
Three common kinds, with examples:
A) Date: DATE9., MMDDYY10.
B) Character: $CHAR10., $10.
C) Numeric: COMMA10., DOLLAR10.2
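A minimal sketch showing one informat and one format for each of two variables (the hire date and salary are made-up values). The informats control how the raw text is read, and the FORMAT statement controls how PROC PRINT displays the stored values.

```sas
data hires;
  input hired :date9. salary :comma8.;       /* informats: how to READ */
  format hired mmddyy10. salary dollar10.2;  /* formats: how to WRITE */
  datalines;
15JAN2013 52,000
;
run;

proc print data=hires;   /* shows 01/15/2013 and $52,000.00 */
run;
```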
 
6. Name and describe three SAS functions that you have used, if any.

Ans:
SUM function: adds its arguments together, ignoring any missing values.
E.g.: Var=SUM(var1, var2, …, varn); Var1=SUM(1, ., 3) returns 4.

MEAN function: returns the arithmetic mean (average) of its non-missing arguments. E.g.: Var=MEAN(var1, var2, var3, …, varn);

SUBSTR function: extracts part of a character value, starting at a given position and optionally taking a given number of characters. E.g.: Var=SUBSTR(string, start <, number-of-characters>); In Var1=SUBSTR('ASHOK', 1, 3); the function starts at position 1, takes 3 characters, and stores ASH in Var1.
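The three calls above can be tried in a throwaway DATA _NULL_ step, which writes the results to the log:

```sas
data _null_;
  total = sum(1, ., 3);             /* 4: the missing value is ignored */
  avg   = mean(1, ., 3);            /* 2: mean of the non-missing values */
  part  = substr('ASHOK', 1, 3);    /* 'ASH' */
  put total= avg= part=;
run;
```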
 
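7. How would you code the criteria to restrict the output to be produced?

Ans: It depends on what you want to restrict. To limit the observations a step processes, use a WHERE statement or the WHERE=, OBS= and FIRSTOBS= data set options. To limit which output tables a procedure produces, use ODS SELECT or ODS EXCLUDE before the step, or the NOPRINT option where the procedure supports it. (ODS OUTPUT CLOSE; only closes the ODS OUTPUT destination.)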
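Below is a minimal sketch of both kinds of restriction, using the SASHELP.CLASS sample data set that ships with SAS. BasicMeasures is the ODS name of one of the tables PROC UNIVARIATE produces.

```sas
/* Restrict the observations: only females are analysed */
proc means data=sashelp.class;
  where sex = 'F';
  var height weight;
run;

/* Restrict the output tables: print only one table from PROC UNIVARIATE */
ods select BasicMeasures;
proc univariate data=sashelp.class;
  var height;
run;
```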

8. What is the purpose of the trailing @? The @@? How would you use them?

Ans: The trailing @ and @@ are line-hold specifiers, not column pointers. Ending an INPUT statement with a single trailing @ lets you read part of a raw data line, test it, and then decide how to read additional data from the same record. The single trailing @ tells SAS to “hold the line” for the next INPUT statement in the same iteration of the DATA step; the line is released when that iteration ends. The double trailing @@ tells SAS to “hold the line more strongly”.
NOTE: An INPUT statement ending with @@ releases the current raw data line only when there are no data values left to read on it, or when an INPUT statement without a line-hold specifier executes. The @@ therefore holds the input record even across multiple iterations of the DATA step, which is how several observations can be read from one line.
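A minimal sketch of both specifiers, using made-up in-stream data:

```sas
/* Single trailing @: read the record type first, then decide */
data type_a;
  input type $1. @;            /* hold the line */
  if type ne 'A' then delete;  /* line is released when the iteration ends */
  input amount;                /* same line: read the rest */
  datalines;
A 100
B 200
A 300
;
run;

/* Double trailing @@: several observations on one data line */
data pairs;
  input x y @@;
  datalines;
1 2 3 4 5 6
;
run;
```

TYPE_A ends up with two observations (amounts 100 and 300). PAIRS ends up with three: (1,2), (3,4) and (5,6).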

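9. Under what circumstances would you code a SELECT construct instead of an IF statement?

Ans: When you are testing one variable or expression against many mutually exclusive conditions, especially when recoding a variable into a large number of categories. A SELECT group is easier to read than a long chain of IF-THEN/ELSE statements, and once a WHEN condition is true, SAS skips the remaining ones. If no WHEN condition is true and there is no OTHERWISE statement, SAS reports an error.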
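A minimal sketch of a recode with SELECT on SASHELP.CLASS. The band names and age cut-offs are arbitrary illustrations.

```sas
data banded;
  set sashelp.class;
  length band $6;
  select;                        /* no select-expression: each WHEN holds a condition */
    when (age <= 12) band = 'Junior';
    when (age <= 14) band = 'Middle';
    otherwise        band = 'Senior';
  end;
run;
```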

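10. What statement do you code to tell SAS that it is to write to an external file?

Ans: The FILE statement, usually together with a FILENAME statement that assigns the fileref:
Filename fileref ‘path’;
File fileref;
Put variable-list; /* write the variables you require */
PUT _ALL_; writes every variable in the PDV in name=value form (including _N_ and _ERROR_), so it suits debugging better than producing a data file.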

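11. If reading an external file to produce an external file, what shortcut would you use to write a record without coding every single variable on the record?

Ans: PUT _INFILE_; writes the current input record to the output file exactly as it was read. (PUT _ALL_; writes all variables in name=value form, which is not the same as copying the record.)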
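Below is a minimal sketch of copying selected records unchanged with PUT _INFILE_. Both filerefs point at temporary files that stand in for real input and output files, and the records are made up.

```sas
filename src  temp;   /* stand-ins for real input and output files */
filename dest temp;

data _null_;          /* create two hypothetical input records */
  file src;
  put 'A 100';
  put 'B 200';
run;

data _null_;
  infile src;
  input type $1. @;
  file dest;
  if type = 'A' then put _infile_;   /* copy the whole record unchanged */
run;
```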


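12. If you do not want any SAS output from a DATA step, how would you code the DATA statement to prevent SAS from producing a data set?

Ans: DATA _NULL_; The step runs but creates no SAS data set. This is useful when the desired output is an external file, a report, or macro variables rather than a SAS data set.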
 
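13. What is the one statement for setting criteria (options) that can be coded in any step?

Ans: The OPTIONS statement. It is a global statement, so it can appear inside or between DATA and PROC steps, e.g. OPTIONS OBS=100 NOCENTER;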


14. Have you ever linked SAS code? If so, describe the link and any required statements used to either process the code or the step itself.

Ans: The LINK statement tells SAS to jump immediately to the statement label named in the LINK statement and to continue executing statements from that point until a RETURN statement is executed. The RETURN statement then returns control to the statement immediately following the LINK statement.
Note: The LINK statement and its destination must be in the same DATA step. The destination is identified by the statement label given in the LINK statement.
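A minimal sketch of LINK and RETURN on SASHELP.CLASS. The label ADDBONUS and the BONUS variable are hypothetical. The first RETURN ends the main flow, and the implicit OUTPUT happens there. The second RETURN sends control back to the statement after LINK.

```sas
data flagged;
  set sashelp.class;
  if sex = 'F' then link addbonus;   /* jump to the label */
  height_cm = height * 2.54;         /* execution resumes here after RETURN */
  return;                            /* end of main flow: observation is output */

addbonus:                            /* statement label named in LINK */
  bonus = 100;
  return;                            /* back to the statement after LINK */
run;
```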


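15. How would you include common or reusable code to be processed along with your statements?

Ans: By using the %INCLUDE statement. It reads SAS code from an external file (given by a path or a fileref) and submits it as if it had been typed at that point.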
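A minimal sketch of %INCLUDE. The "shared" code (a hypothetical YN format) is first written to a temporary file so the example is self-contained. In practice, the fileref or quoted path would point at your team's common code file.

```sas
filename common temp;
data _null_;                   /* store some shared code in the file */
  file common;
  put 'proc format;';
  put '  value yn 1 = "Yes" 0 = "No";';
  put 'run;';
run;

%include common;               /* submits the stored code as if typed here */

data _null_;
  answer = put(1, yn.);
  put answer=;                 /* answer=Yes */
run;
```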


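16. When looking for data contained in a character string of 150 bytes, which function is the best to locate that data: SCAN, INDEX, or INDEXC?

Ans: INDEX, if you are searching for a known string: it returns the position where the string starts (0 if it is not found). INDEXC returns the position of the first occurrence of any one of a set of characters. SCAN does not locate data. It extracts the nth word using the delimiters you supply, so it is the right choice once you know how the string is divided into words.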
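Here is a minimal sketch comparing the three functions on one made-up string:

```sas
data _null_;
  text = 'Customer: Ashok Kumar, City: Chennai';
  pos  = index(text, 'City');    /* 24: position where 'City' starts */
  pos2 = indexc(text, ',:');     /* 9: first comma or colon */
  word = scan(text, 2, ' ');     /* 'Ashok': second blank-delimited word */
  put pos= pos2= word=;
run;
```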


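17. If you have a data set that contains 100 variables, but you need only five of those, what is the code to force SAS to use only those variables?

Ans: Use the KEEP= data set option on the input data set, e.g. SET big(KEEP=var1 var2 var3 var4 var5); so that only those five variables are read into the PDV. The DROP= option works the other way round.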


18. Code a PROC SORT on a data set containing state, district and country as the primary variables, along with several numeric variables.

Ans: PROC SORT DATA=data-set-name; BY state district country; RUN;


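19. How would you delete duplicate observations?

Ans: There are three common ways to delete duplicate observations in a data set:
a. Proc sort data=SAS-data-set nodups; by var; run;
NODUPS deletes an observation only when every variable matches the previous observation, so sort by enough variables (BY _ALL_ is the safest) for duplicates to become adjacent. NODUPKEY instead keeps only the first observation for each BY value.
b. Proc sql; create table new_sas_data_set as select distinct * from old_sas_data_set; quit;
c. Proc sort data=temp; by group; run;
Data clean; Set temp; By group;
If first.group; /* keep the first observation of each group */
Run;
(IF first.group AND last.group; would instead keep only the groups that had no duplicates at all.)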


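20. How would you code a merge that will keep only the observations that have matches from both sets?

Ans: Use the IN= data set option on each data set in the MERGE statement, then keep only the observations to which both data sets contributed. Both data sets must be sorted by the BY variable.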
DATA NEW;
MERGE ONE_TEMP (IN=ONE) TWO_TEMP (IN=TWO);
BY NAME;
IF ONE=1 AND TWO=1;
RUN;


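21. What is the Program Data Vector (PDV)? What are its functions?

Ans: The PDV is the area of memory where SAS builds a data set one observation at a time. At compile time, SAS adds every variable to it, together with the automatic variables _N_ and _ERROR_, which are not written to the output. During execution, SAS reads values into the PDV, executes the DATA step statements on them, writes the observation to the output data set, and then sets the variables that are not retained back to missing for the next iteration. This is also why a WHERE statement may be more efficient than a subsetting IF, especially when you take a very small subset from a large file: WHERE tests the condition before the observation is loaded into the PDV, whereas a subsetting IF tests it only after the observation is already there.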


22. Does SAS ‘Translate’ (compile) or does it ‘Interpret’? Explain.

Ans: When you submit a DATA step for execution, SAS checks the syntax of the SAS statements and compiles them, that is, automatically translates the statements into machine code. In this phase, SAS identifies the type and length of each new variable, and determines whether a type conversion is necessary for each subsequent reference to a variable.


23. At compile time when a SAS data set is read, what items are created?

Ans: At compile time SAS creates the following:
Input buffer (only when raw data are read with an INPUT statement, not when a SAS data set is read with SET)
Program Data Vector (PDV)
Descriptor information

24. Name statements that are recognized at compile time only.

Ans: Declarative statements such as DROP, KEEP, RENAME, RETAIN, LENGTH, FORMAT, INFORMAT, LABEL, ATTRIB and ARRAY.

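25. Identify a statement whose placement in the DATA step is critical.

Ans: The INPUT statement. The LENGTH statement is another: it must come before the first reference to a variable, because the length is fixed when the variable is first encountered at compile time.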


26. Name statements that function at both compile and execution time.


27. Name statements that are execution only.


28. In the flow of the DATA step processing, what is the first action in a typical DATA step?

Ans: SAS first performs a syntax check as part of the compilation phase.


29. What is _N_?

Ans: _N_ is an automatic variable that SAS creates in every DATA step. It counts the number of times the DATA step has iterated, starting at 1. It is available only in the DATA step, not in PROCs, and it is not written to the output data set.
E.g.: to keep every third record (the 1st, 4th, 7th, ...) of a data set, use _N_ as follows:
Data new_sas_data_set;
Set old;
If mod(_n_, 3) = 1;
Run;
Note: _N_ counts DATA step iterations, not observation numbers in the input data set. If a WHERE statement subsets the data, _N_ counts only the observations that pass the WHERE condition, so the result will not be every third record of the original data set.


BASE SAS:

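30. What is the effect of the OPTIONS statement ERRORS=1?

Ans: ERRORS=n sets the maximum number of observations for which SAS prints complete data-error messages in the log. With ERRORS=1, full messages are printed only for the first observation that has a data error. Processing itself continues.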


31. What’s the difference between VAR A1-A4 and VAR A1--A4?


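32. What does the SAS log message “Numeric values have been converted to character values” mean?

Ans: SAS found a numeric value where a character value was expected, for example as the argument of a character function or in a concatenation. It converted the value automatically, using the BEST12. format, and reported the conversion in the log. To avoid the note and control the result, convert explicitly with the PUT function.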
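Below is a minimal sketch of the automatic conversion and its explicit alternative. The ID value is made up.

```sas
data _null_;
  id = 42;
  auto     = 'ID-' || id;                  /* automatic: log note, BEST12. adds leading blanks */
  explicit = 'ID-' || strip(put(id, 8.));  /* explicit: 'ID-42', no note */
  put auto= explicit=;
run;
```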

33. Why is a STOP statement needed for a POINT=option on a SET statement?

Ans: Because POINT= reads only the specified observations, SAS cannot detect an end-of-file condition as it would if the file were being read sequentially. Because detecting an end-of-file condition terminates a DATA step automatically, failure to substitute another means of terminating the DATA step when you use POINT= can cause the DATA step to go into a continuous loop.
NOTE: You cannot use the POINT= option with any of the following:
BY statement
WHERE statement
WHERE= data set option
transport format data sets
sequential data sets (on tape or disk)
a table from another vendor's relational database management system.
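A minimal sketch of direct access with POINT= on SASHELP.CLASS, reading every other observation. NOBS= supplies the loop bound. The explicit STOP ends the step, because end-of-file is never detected.

```sas
data every_other;
  do i = 1 to total by 2;
    set sashelp.class point=i nobs=total;  /* read observation number i */
    output;
  end;
  stop;   /* without STOP the step would loop forever */
run;
```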

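34. How do you control the number of observations and/or variables read or written?

Ans: Use the OBS= and FIRSTOBS= options, as data set options or system options, to control which observations are read. Use the KEEP= and DROP= data set options, or the KEEP and DROP statements, to control which variables are read or written.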


35. Approximately what date is represented by the SAS date value of 730?

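Ans: About 1 January 1962. SAS date values count days from 1 January 1960 (day 0). Because 1960 has 366 days and 1961 has 365, 731 is 1 January 1962, so 730 is exactly 31 December 1961.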
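The arithmetic can be checked by printing 730 with a date format and the date constant for 1 January 1962 as a raw number:

```sas
data _null_;
  d = 730;
  put d= date9.;            /* d=31DEC1961 */
  jan1962 = '01JAN1962'd;
  put jan1962=;             /* jan1962=731 */
run;
```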


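36. How would you remove a format that has been permanently associated with a variable?

Ans: Use PROC DATASETS with a FORMAT statement that names the variable but no format:
proc datasets library=somelibrary; modify sasdataset; format variable-name; run; quit;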


37. What does the RUN statement do?

Ans: The RUN statement is a step boundary. It tells SAS to execute the DATA or PROC step that precedes it.


38. Why is SAS considered self-documenting?

Ans: When a SAS data set is created, SAS creates both a descriptor portion and a data portion. The descriptor portion records details such as when the data set was created, the number of observations, and the number and attributes of the variables. Hence SAS is considered self-documenting.


39. Briefly describe 5 ways to do a “table lookup” in SAS.

Ans:
1) Simple table lookup with a MERGE (including IN= options) and a subsetting IF statement.
2) Simple table lookup with formats (PROC FORMAT and the PUT function).
3) Lookup on two variables with a MERGE (including IN= options) and a subsetting IF statement.
4) Lookup on two variables with formats (PROC FORMAT with the PUT and INPUT functions).
5) A two-way lookup table (a MERGE statement using two variables).
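Here is a minimal sketch of method 2, a lookup through a user-defined format. The $REGION format and its state-to-region table are hypothetical.

```sas
proc format;
  value $region 'NY' = 'East'      /* hypothetical lookup table */
                'CA' = 'West'
                other = 'Unknown';
run;

data lookup;
  input state $2.;
  region = put(state, $region.);   /* look up each state in the format */
  datalines;
NY
CA
TX
;
run;
```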


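40. What are some good SAS programming practices for processing very large data sets?

Ans: Read only what you need: subset observations early with WHERE and variables with KEEP=/DROP=. For very large data sets with many variables, arrays let you process groups of variables in a DO loop instead of repeating statements for each one.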


41. How would you create a data set with 1 observation and 30 variables from a data set with 30 observations and 1 variable?

Ans: Use PROC TRANSPOSE, or read all 30 observations into an array in a single DATA step iteration.
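A minimal sketch of both approaches on a made-up input data set, LONG, with 30 observations of one variable, VALUE:

```sas
/* Hypothetical input: 30 observations, 1 variable */
data long;
  do i = 1 to 30;
    value = i * 10;
    output;
  end;
  drop i;
run;

/* 1) PROC TRANSPOSE: observations become variables value1-value30 */
proc transpose data=long out=wide(drop=_name_) prefix=value;
  var value;
run;

/* 2) DATA step: read all 30 observations in one iteration into an array */
data wide2;
  array v{30} value1-value30;
  do i = 1 to 30;
    set long;
    v{i} = value;
  end;
  keep value1-value30;
run;
```

In the array version, the first iteration reads every observation and is output once. The second iteration hits end-of-file on SET, which stops the step.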


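44. What are _NUMERIC_ and _CHARACTER_, and what do they do?

Ans: _NUMERIC_ and _CHARACTER_ are special variable lists. They refer to all numeric or all character variables, respectively, that are defined in the DATA step at that point. They are typically used in ARRAY, KEEP/DROP and PUT statements, and in functions with OF, so that the same task can be applied to every variable of one type.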
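A minimal sketch on SASHELP.CLASS, which has three numeric variables (Age, Height, Weight) and two character variables (Name, Sex). The arrays are declared before the loop index I exists, so I is not swept into _NUMERIC_. DIM returns the number of elements in each array.

```sas
data cleaned;
  set sashelp.class;
  array nums{*}  _numeric_;         /* Age, Height, Weight */
  array chars{*} _character_;       /* Name, Sex */
  do i = 1 to dim(nums);
    nums{i} = round(nums{i}, 1);    /* same task for every numeric variable */
  end;
  do i = 1 to dim(chars);
    chars{i} = upcase(chars{i});    /* same task for every character variable */
  end;
  drop i;
run;
```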
 
46. What is the order of application for output data set options, input data set options and SAS statements?

Ans: Input data set options are applied first, then the SAS statements in the step, and finally the output data set options.

47. What is the order of evaluation of the arithmetic operators: + - * / ** ()?


Missing Value:


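56. How many missing values are available? When might you use them?

Ans: Numeric variables have 28 missing values: the ordinary missing value (.), ._, and the 26 special missing values .A through .Z. Character variables have one missing value, a blank. Special missing values are useful when you need to record why a value is missing, for example .R for “refused” and .D for “don’t know” in survey data.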
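A minimal sketch of reading special missing values with the MISSING statement. The survey answers are made up.

```sas
data survey;
  missing R D;       /* R and D in the data are read as .R and .D */
  input id answer;
  datalines;
1 5
2 R
3 D
4 .
;
run;

proc freq data=survey;
  tables answer / missing;   /* ., .D and .R are counted separately */
run;
```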


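57. How do you test for missing values?

Ans: Use the MISSING function, which works for both numeric and character variables, or compare directly. For numeric variables, IF var = . catches only the ordinary missing value, while IF var <= .Z catches every numeric missing value, including the special ones. For character variables, use IF var = ' '. The NMISS function counts missing numeric arguments.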
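Here is a minimal sketch of these tests on made-up values:

```sas
data _null_;
  x = .;  y = .A;  name = ' ';
  if missing(x)    then put 'x is missing';
  if y = .         then put 'not printed: .A is not equal to .';
  if y <= .Z       then put 'y is missing (catches special missing values too)';
  if missing(name) then put 'name is missing';
  nm = nmiss(x, y, 5);   /* 2 */
  put nm=;
run;
```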


58. How are numeric and character missing values represented internally?

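Ans: A numeric missing value is displayed as a period (.) and is stored as a special value that sorts lower than any number (._ is lowest, then ., then .A through .Z). A character missing value is stored and displayed as blanks.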


FUNCTIONS:


59. What is the significance of the ‘OF’ in X=SUM (OF a1-a4, a6, a9);?


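60. What do the PUT and INPUT functions do?
Ans: The PUT function converts a value to a character string using a format; the numeric-to-character example is shown below. The PUT statement, by contrast, writes values to the log or a file. It is commonly used to debug a DATA step: it shows which piece of code is executed and which is not, and the current value of a particular variable or of all variables (PUT _ALL_;).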


INPUT function:
The traditional use is to reread a character value with a numeric informat, that is, to perform a character-to-numeric conversion.


The character-to-numeric conversion function:
INPUT(variable, informat-name)
The INPUT function converts the character value to numeric:
Salary=input(EMP_SALARY, dollar7.);
EMP_SALARY (character) ‘$85,000’ becomes SALARY (numeric) 85000.
Assign the result to a new variable name. A variable cannot change type, so
EMP_SALARY=input(EMP_SALARY, dollar7.);
would convert the number back to character and store it in the character variable EMP_SALARY.

The numeric-to-character conversion function:

PUT(variable, format-name);
newphone=put(phone, 7.);
PHONE (numeric) 6778000 becomes NEWPHONE (character) ‘6778000’.
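Both conversions can be tried together in one DATA _NULL_ step, using the salary and phone values from the text:

```sas
data _null_;
  emp_salary = '$85,000';                   /* character */
  salary     = input(emp_salary, dollar7.); /* numeric 85000 */
  phone      = 6778000;                     /* numeric */
  newphone   = put(phone, 7.);              /* character '6778000' */
  put salary= newphone=;
run;
```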


61. Which function advances a date, time or datetime value by a given interval?

62. What do the MOD and INT functions do?

Ans: The MOD function returns the remainder from a division. It is very useful if, for example, you want to select every third observation from a SAS data set:
Example: data third; set old; if mod(_N_, 3) = 1; run;
The INT function returns the integer portion of an argument. To truncate a number (drop the fractional part), use the INT function.


63. In ARRAY processing, what does the DIM function do?

Ans: DIM is the dimension function. It returns the number of elements in an array (i.e. the number of variables in the list), and it is commonly used as the upper bound of a DO loop: do i = 1 to dim(arrayname);


64. How would you determine the number of missing or non-missing values in a computation?

Ans: Use the N function for the number of non-missing values and the NMISS function for the number of missing values among its arguments. In PROC MEANS, the N and NMISS statistics do the same down a column.


65. What is the difference between X=a+b+c+d; and X=SUM(a, b, c, d);?

Ans: SUM(a, b, c, d) ignores missing values and adds the rest, whereas the + operator returns missing if any operand is missing.
For example, SUM(1, ., 2, 3) = 6, but X = 1 + . + 2 + 3 is missing.
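Below is a minimal sketch of the difference, plus a common idiom of adding 0 as an extra argument so the result is 0 rather than missing when all values are missing:

```sas
data _null_;
  a = 1;  b = .;  c = 2;  d = 3;
  x_plus = a + b + c + d;     /* missing, with a note in the log */
  x_sum  = sum(a, b, c, d);   /* 6 */
  x_zero = sum(0, b);         /* 0 instead of missing */
  put x_plus= x_sum= x_zero=;
run;
```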


66. There is a field containing a date. It needs to be displayed in the format “ddmonyy” if it’s before 1975, “dd mon ccyy” if it’s after 1985, and as ‘disco years’ if it’s between 1975 and 1985. How would you accomplish this in DATA step code? Using only PROC FORMAT?


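67. In the following DATA step, what is needed for ‘fraction’ to print to the log?
data _null_; X=1/3; if X=.333 then put 'fraction'; run;

Ans: 1/3 is stored as 0.3333..., so it never equals .333. Round the value before comparing:
data _null_; X=1/3; if round(X, .001)=.333 then put 'fraction'; run;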


68. What is the difference between calculating the ‘mean’ using the MEAN function and PROC MEANS?

Ans: The MEAN function works across variables within one observation (row-wise), returning the mean of the non-missing values in its argument list. PROC MEANS works down observations (column-wise), computing statistics for each variable over the whole data set or over BY/CLASS groups. The way the MEAN function deals with missing values is quite important. Suppose you calculate SCORE by simply adding up all the items and dividing by 50:
SCORE=(item1+item2+item3+...+item50)/50;
You would be in big trouble if any of the items had missing values, because when a SAS statement performs an arithmetic operation on a missing value, the result is always missing. SCORE=MEAN(OF item1-item50); avoids this.
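A minimal sketch contrasting the two directions on a made-up data set with three items:

```sas
data items;
  input id item1-item3;
  row_mean = mean(of item1-item3);   /* across variables, within one observation */
  datalines;
1 4 5 .
2 3 3 3
;
run;

proc means data=items mean;          /* down observations, one mean per variable */
  var item1-item3;
run;
```

For the first observation, ROW_MEAN is 4.5, the mean of the two non-missing items.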


PROCs:


69. If you were given several SAS data sets you were unfamiliar with, how would you find out the variable names and formats of each data set?

Ans: Run PROC CONTENTS on all the data sets in the library to see the variable names and formats of each data set:
EG: PROC CONTENTS DATA=LIBREF._ALL_; RUN;


70. How would you keep SAS from overwriting the SAS data set with its sorted version?

Ans: Use the OUT= option in PROC SORT to write the sorted data to a new data set, e.g. PROC SORT DATA=old OUT=new; BY var; RUN;


71. In PROC PRINT, can you print only variables that begin with the letter “A”?

Ans: Yes. Use the colon name-prefix list in the VAR statement:
VAR A:;

A WHERE statement filters observations, not variables. So
WHERE (variable-name) LIKE 'A%'; or
WHERE (variable-name) =: 'A';
prints only the observations whose value of that character variable begins with “A”.
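A minimal sketch contrasting the two, using a hypothetical data set DEMO with a few variables whose names begin with A:

```sas
data demo;                 /* hypothetical data set */
  input acct $ amount age city $ name $;
  datalines;
A01 100 35 Pune Ashok
B02 200 29 Delhi Priya
;
run;

proc print data=demo;
  var a:;                  /* variables: acct, amount, age */
run;

proc print data=demo;
  where name =: 'A';       /* observations: only Ashok's row */
run;
```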

72. What are some differences between PROC SUMMARY and PROC MEANS?

Ans:
By default, PROC MEANS prints its results, whereas PROC SUMMARY prints nothing unless you add the PRINT option. With PROC SUMMARY you normally use the OUTPUT statement to create a new data set and then use PROC PRINT to see the computed statistics.

Without a VAR statement, PROC MEANS analyzes all numeric variables, while PROC SUMMARY produces only a count of observations (_FREQ_).

Both procedures support BY-group processing, which requires the data to be sorted by the BY variables, and the CLASS statement, which does not require sorting.
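A minimal sketch on SASHELP.CLASS showing the default printed report from PROC MEANS next to PROC SUMMARY writing its statistics to a data set. The NWAY option keeps only the rows for each SEX value.

```sas
proc means data=sashelp.class;          /* prints a report by default */
  class sex;
  var height;
run;

proc summary data=sashelp.class nway;   /* prints nothing; results go to a data set */
  class sex;
  var height;
  output out=stats mean=avg_height;
run;

proc print data=stats;
run;
```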
 
PROC FREQ:


73. Code the table statement for a single-level (most common) frequency.

Ans: A single-level (one-way) table names one variable in the TABLES statement:

DATA MAR.FREQTEST;
SET BAS.AMPERS;
PROC FREQ DATA =MAR.FREQTEST;
TABLE AGE;
RUN;


74. Code the table statement to produce a multi-level frequency.

Ans: A multi-level table (crosstabulation) joins the variables with an asterisk:
DATA MAR.FREQTEST;
SET BAS.AMPERS;
PROC FREQ DATA =MAR.FREQTEST;
TABLE AGE * gender;
RUN;

75. Name the option to produce frequency line items rather than a table.


76. Produce output from a frequency. Restrict the printing of the table.


PROC MEANS:


77. Code a PROC MEANS that shows both summed and averaged output of the data.


78. Code the option that will allow MEANS to include missing numeric data in the report.


79. Code the MEANS to produce output to be used later.


80. Do you use PROC REPORT or PROC TABULATE? Which do you prefer? Explain.

MERGING/UPDATING:


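81. What happens in a one-to-one merge? When would you use one?

Ans: In the strict SAS sense, a one-to-one merge is a MERGE without a BY statement. SAS combines the first observation of each data set, then the second with the second, and so on, so the result has as many observations as the largest data set. Use it only when the data sets are known to be in the same order, row for row. When the data sets have different variables but share one common variable whose values are unique in each, a match-merge (MERGE with BY on that variable) gives the same one-to-one pairing safely.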
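Here is a minimal sketch of a positional one-to-one merge on two made-up data sets:

```sas
data names;
  input name $;
  datalines;
Ashok
Priya
;
run;

data ages;
  input age;
  datalines;
35
29
;
run;

data one_to_one;
  merge names ages;   /* no BY: row 1 with row 1, row 2 with row 2 */
run;
```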


82. How would you combine 3 or more tables with different structures?


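83. What is the problem with merging two data sets that have variables with the same name but different data?

Ans: For a variable that is not a BY variable, the value from the data set listed later in the MERGE statement overwrites the value from the earlier one without any error, so data are silently lost. If the variable is numeric in one data set and character in the other, SAS stops with an error instead. Rename one of the variables with the RENAME= data set option to keep both.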


84. When would you choose to MERGE two data sets together and when would you SET two data sets?

Ans: Use SET when you want to stack data sets, appending the observations of one after those of another (or interleaving them when a BY statement is used). The data sets usually share the same variables.
Use MERGE when you want to combine data sets side by side, joining observations that have matching BY-variable values, and to control which data set contributes to each new observation (with IN=).
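A minimal sketch of the two on made-up monthly data sets, both already sorted by ID:

```sas
data jan;
  input id amt;
  datalines;
1 10
2 20
;
run;

data feb;
  input id amt;
  datalines;
1 15
3 30
;
run;

data stacked;                  /* SET: 4 observations, jan rows then feb rows */
  set jan feb;
run;

data side_by_side;             /* MERGE: 3 observations, one per id */
  merge jan(rename=(amt=jan_amt)) feb(rename=(amt=feb_amt));
  by id;
run;
```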


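85. Which data set is the controlling data set in the MERGE statement?
Ans: No single data set controls which observations are written. Every data set in the MERGE statement contributes unless you add IN= conditions. For variables that occur in more than one data set, however, the value from the data set listed last in the MERGE statement is the one kept.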


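86. How does the IN= variable improve the capability of a MERGE?
Ans: IN= is a data set option that creates a temporary variable. The variable equals 1 when that data set contributed to the current observation and 0 otherwise. Testing it (e.g. IF ONE AND NOT TWO;) lets you control which observations are written to the new data set.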


87. Explain the message ‘MERGE HAS ONE OR MORE DATASETS WITH REPEATS OF BY VARIABLE’.

CUSTOMIZED REPORT WRITING:



88. What is the purpose of the statement DATA _NULL_?
Ans: The keyword _NULL_ gives you the power of the DATA step without creating a data set, for example to write reports or external files.


89. What is the pound sign (#) used for in DATA _NULL_?
 
What is the purpose of using the N=PS option?
Ans: Specifying N=PS (N=PAGESIZE) in the FILE statement makes every line of the current page available to the output pointer, so a PUT statement can write on any line of the current page.
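Below is a minimal sketch of a page-oriented report with N=PS, writing to the procedure output (FILE PRINT) for two SASHELP.CLASS observations. The # pointer control moves to a line of the page and @ moves to a column. PUT _PAGE_ starts a new page so the next observation does not write over the same lines.

```sas
data _null_;
  set sashelp.class(obs=2);
  file print n=ps;                 /* every line of the page is available */
  put #2 @5 'Name: ' name
      #4 @5 'Age:  ' age;          /* # moves to a line, @ to a column */
  put _page_;                      /* start a new page for the next observation */
run;
```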

MACRO:


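91. What system option would you use to help debug a macro?
Ans: OPTIONS MPRINT MLOGIC SYMBOLGEN; MPRINT shows the SAS code the macro generates, MLOGIC traces the macro's execution logic, and SYMBOLGEN shows how each macro variable reference resolves.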
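A minimal sketch of turning the options on around a small hypothetical macro, SHOW, and switching them off afterwards:

```sas
options mprint mlogic symbolgen;   /* macro debugging output goes to the log */

%macro show(ds, n);
  proc print data=&ds(obs=&n);
  run;
%mend show;

%show(sashelp.class, 3)

options nomprint nomlogic nosymbolgen;   /* switch them off again */
```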


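92. Describe how you would create a macro variable.
Ans: With the %LET statement, e.g. %let var=value; Macro variables can also be created from a DATA step with CALL SYMPUT or CALL SYMPUTX, from a query with the INTO clause in PROC SQL, and as macro parameters.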
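A minimal sketch of three of these routes. The macro variable names are hypothetical.

```sas
%let region = East;                            /* 1) %LET */

data _null_;                                   /* 2) from a DATA step value */
  call symputx('today', put(today(), date9.));
run;

proc sql noprint;                              /* 3) from a query result */
  select count(*) into :nclass from sashelp.class;
quit;

%put region=&region today=&today nclass=&nclass;
```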


93. How do you identify a macro variable?


94. How do you define the end of a macro?

Ans: %mend


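95. How do you assign a macro variable to a SAS variable?

Ans: CALL SYMPUT (or CALL SYMPUTX) works in the opposite direction: it assigns a DATA step value to a macro variable. To assign a macro variable's value to a SAS variable, reference it in an assignment statement, e.g. x = &var; (or x = "&var"; for a character value), or use the SYMGET function, which retrieves the value while the DATA step executes.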


96. What is the difference between %LOCAL and %GLOBAL?

Ans: A %LOCAL macro variable exists only while the macro that defines it is executing, whereas a %GLOBAL macro variable is available everywhere for the rest of the SAS session.


97. How long can a macro variable be? A token?

Ans: In SAS 9, a macro variable value can be up to 65,534 characters long.


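98. If you use a SYMPUT in a DATA step, when and where can you use the macro variable?

Ans: Only after the DATA step that executes CALL SYMPUT has finished, that is, after its step boundary such as RUN. Macro variable references (&name) in a step are resolved before that step executes; within the same step, use the SYMGET function instead. Outside a macro, the variable is global. Inside a macro, it follows the macro symbol-table rules and may end up local, so use CALL SYMPUTX with the 'G' argument if it must be global.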
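A minimal sketch of the timing and scope rules. The macro variable names STATUS and RUN_FLAG and the macro DEMO are hypothetical.

```sas
data _null_;
  call symputx('status', 'done');
  /* "&status" here would not work: it is resolved before this step runs */
  x = symget('status');        /* works: looked up while the step executes */
  put x=;
run;

%put status=&status;           /* available after the step boundary */

%macro demo;
  data _null_;
    call symputx('run_flag', 'Y', 'G');   /* 'G' forces the global symbol table */
  run;
%mend demo;
%demo

%put run_flag=&run_flag;
```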


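100. How would you code a macro statement to produce information on the SAS log?

Ans: With the %PUT statement, e.g. %put Processing data set &ds; The text is written to the log as is, so quotation marks are not needed (any quotes you type are printed too).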

 
