1. What SAS statements would you code to read an external raw data file into a DATA step?
Ans: The INFILE and INPUT statements are used to read an external raw data file into a DATA step. INFILE identifies the file and INPUT describes how to read its fields.
2. How do you read in the variable that you need?
Ans: When reading raw data, name only the variables you need in the INPUT statement, using column or pointer controls to skip the rest. When reading a SAS data set, use the KEEP= (or DROP=) data set option on the SET statement.
3. Are you familiar with special input delimiters? How are they used?
Ans: Yes. DLM= (DELIMITER=) and DSD are options of the INFILE statement. DLM= names the character or characters that separate data values, such as a comma or a tab, in place of the default blank. If you list several characters, as in DLM='XX' or DLM=',;', each character is treated as a delimiter on its own; use DLMSTR= when the delimiter is a multi-character string. The DSD option treats two consecutive delimiters as a missing value, changes the default delimiter to a comma, and removes quotation marks from quoted values, so a delimiter inside a quoted string is read as data.
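A minimal sketch of the DSD option, using in-stream data so it can be run as is. The data set name WORK.PEOPLE and the variables are made up for illustration.

```sas
data work.people;
   /* DSD: comma is the default delimiter; ",," is a missing value */
   infile datalines dsd truncover;
   input name :$20. city :$20. age;
   datalines;
"Smith, John",Boston,34
Ann,,41
;
run;

proc print data=work.people;
run;
```

The comma inside the quoted name is kept as data, and CITY is missing for the second record.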
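4. If reading a variable-length file with fixed input, how would you prevent SAS from reading the next record if the last variable didn’t have a value?
Ans: Use the MISSOVER option in the INFILE statement, which sets the remaining variables to missing instead of moving to the next record. With column or formatted input, TRUNCOVER is often the safer choice because it also keeps a value that is shorter than the expected field width.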
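The sketch below writes a small variable-length file and reads it back with TRUNCOVER. The fileref RAW and the data set WORK.SCORES are hypothetical names chosen for this example.

```sas
filename raw temp;   /* temporary file, used only for this sketch */

data _null_;
   file raw;
   put 'A01 85';
   put 'A02';        /* record ends before the SCORE columns */
   put 'A03 9';      /* SCORE field is shorter than 3 columns */
run;

data work.scores;
   /* TRUNCOVER: stay on the current record when it ends early */
   infile raw truncover;
   input id $ 1-3 score 5-7;
run;
```

Replacing TRUNCOVER with MISSOVER should still keep one observation per record, but the short SCORE field in the third record would then be set to missing.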
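5. What is the difference between an informat and a format? Name three informats or formats.
Ans: An informat is an instruction that SAS uses to read data values into a variable. A format is an instruction that SAS uses to write or display the values of a variable.
Three kinds of informat / format are:
A) Date informats and formats, e.g. DATE9. and MMDDYY10.
B) Character informats and formats, e.g. $CHAR10. and $10.
C) Numeric informats and formats, e.g. COMMA8. and 8.2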
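As a small illustration (the variable name D and the date are arbitrary), the step below reads a text date with an informat and then displays the stored number with two different formats.

```sas
data _null_;
   /* informat: read the text '15MAR2020' into a numeric SAS date */
   d = input('15MAR2020', date9.);
   /* format: display the same stored value in different ways */
   put d= d=mmddyy10. d=date9.;
run;
```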
6. Name and describe three SAS functions that you have used, if any.
Ans:
SUM function: adds its arguments, ignoring any missing values.
E.G: Var=SUM(var1, var2, ..., varn); Var1=SUM(1, ., 3); gives 4.
MEAN function: returns the arithmetic mean (average) of its arguments, ignoring any missing values. E.G: Var=MEAN(var1, var2, var3, ..., varn);
SUBSTR function: extracts a portion of a character value, starting at a given position and running for a given number of characters. E.G: Var=SUBSTR(var, start <, number of characters>); Var1=SUBSTR('ASHOK', 1, 3); In this example SUBSTR takes the string ASHOK, starts at position 1, takes 3 characters, and stores ASH in Var1.
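7. How would you code the criteria to restrict the output to be produced?
Ans: To restrict the observations that are processed, use a WHERE statement or a subsetting IF. To restrict the output that a procedure produces, use ODS SELECT or ODS EXCLUDE, or close the destination, for example:
ods output close;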
8. What is the purpose of the trailing @? The @@? How would you use them?
Ans:
The trailing @ is a line-hold specifier (not to be confused with the column pointer control @n). Ending an INPUT statement with a trailing @ lets you read part of a raw data line, test it, and then decide how to read additional data from the same record. The single trailing @ tells SAS to “hold the line” for the next INPUT statement in the same iteration of the DATA step. The double trailing @@ tells SAS to “hold the line more strongly”.
NOTE:
An INPUT statement ending with @@ instructs the program to release the current raw data line only when there are no data values left to be read from that line. The @@ therefore holds the input record even across multiple iterations of the DATA step.
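The two line-hold specifiers can be sketched as follows. The record layout (a one-character type code followed by an amount) and the data set names are assumptions made for this example.

```sas
/* Single trailing @: read a code, test it, then finish reading the record */
data work.sales;
   infile datalines truncover;
   input type $1. @;
   if type = 'S' then input @3 amount 6.;
   else delete;
   datalines;
S  120.5
H header
S   99
;
run;

/* Double trailing @@: several observations per record */
data work.pairs;
   input x y @@;
   datalines;
1 2 3 4 5 6
;
run;
```

The first step keeps only the records whose type is S; the second step creates three observations from a single data line.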
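9. Under what circumstances would you code a SELECT construct instead of IF statements?
Ans: When you have a long series of mutually exclusive conditions, especially if you are recoding a variable into a large number of categories. A SELECT group is then easier to read than a chain of IF-THEN/ELSE statements.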
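A short sketch of recoding with a SELECT group; the cut-off values and variable names are invented for illustration.

```sas
data work.graded;
   input score @@;
   length grade $ 4;
   select;
      when (score >= 90) grade = 'A';
      when (score >= 80) grade = 'B';
      when (score >= 70) grade = 'C';
      otherwise          grade = 'Fail';
   end;
   datalines;
95 82 71 40
;
run;
```

The WHEN conditions are checked in order and the first true one is used, so each score falls into exactly one category.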
10. What statement do you code to tell SAS that it is to write to an external file?
Ans: The FILE statement names the external file and the PUT statement writes to it.
Filename fileref 'path';
File fileref;
Put _all_; /* writes all the variables */
Or list in the PUT statement only the variables that you require.
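11. If reading an external file to produce an external file, what is the shortcut to write the record without coding every single variable on the record?
Ans: Put _infile_; writes the current input record as it was read. Put _all_; writes every variable, including _N_ and _ERROR_, in name=value form.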
12. If you do not want any SAS output from a DATA step, how would you code the DATA statement to prevent SAS from producing a data set?
Ans:
Use DATA _NULL_. The DATA step then runs without creating a SAS data set, which is useful when the desired output is a file or a report.
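13. What is the one statement to set the criteria of data that can be coded in any step?
Ans:
The OPTIONS statement, which is a global statement and can be placed in any step. If the question is about selecting observations, the WHERE statement can be coded in both DATA and PROC steps.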
14. Have you ever linked SAS code? If so, describe the link and any required statements used to either process the code or the step itself.
Ans:
The LINK statement tells SAS to jump immediately to the statement label that is named in the LINK statement and to continue executing statements from that point until a RETURN statement is executed. The RETURN statement returns program control to the statement immediately following the LINK statement.
Note: The LINK statement and the destination must be in the same DATA step. The destination is identified by a statement label in the LINK statement.
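A minimal sketch of LINK and RETURN. The label FIX and the variables are hypothetical.

```sas
data work.checked;
   input id value;
   if value < 0 then link fix;   /* jump to the label FIX */
   output;                       /* execution resumes here after RETURN */
   return;                       /* end this iteration; do not fall into FIX */
fix:
   value = 0;
   flagged = 1;
   return;                       /* go back to the statement after LINK */
   datalines;
1 10
2 -5
;
run;
```

The RETURN before the label matters: without it, every iteration would run into the FIX section.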
15. How would you include common or reusable code to be processed along with your statements?
Ans:
By using the %INCLUDE statement.
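16. When looking for data contained in a character string of 150 bytes, which function is the best to locate that data: SCAN, INDEX or INDEXC?
Ans:
INDEX, which returns the position of a substring within the string. SCAN extracts the nth word, and INDEXC returns the position of the first occurrence of any one of a set of characters.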
17. If you have a data set that contains 100 variables, but you need only five of those, what is the code to force SAS to use only those variables?
Ans:
Use the KEEP= data set option, e.g. set old (keep=var1 var2 var3 var4 var5);
18. Code a PROC SORT on a data set containing state, district and country as the primary variables, along with several numeric variables.
Ans:
Proc sort data=data-set-name;
By state district country;
Run;
19. How would you delete duplicate observations?
Ans:
There are three ways to delete duplicate observations in a dataset:
a. Proc sort data=SAS-data-set nodups; /* nodupkey instead removes observations with duplicate BY values */
by var; run;
b. Proc sql; Create table new_sas_data_set as
select distinct * from old_sas_data_set;
quit;
c. Data clean; Set temp; By group; /* temp must be sorted by group */
If first.group; /* keep the first observation in each BY group */
Run;
20. How would you code a merge that will keep only the observations that have matches from both sets?
Ans: By using the IN= data set option in the MERGE statement.
DATA NEW;
MERGE ONE_TEMP (IN=ONE) TWO_TEMP (IN=TWO);
BY NAME;
IF ONE=1 AND TWO=1;
RUN;
21. What is the Program Data Vector (PDV)? What are its functions?
Ans: The Program Data Vector is a temporary holding area in memory where SAS builds each observation, one at a time, before writing it to the output data set. It holds every variable named in the DATA step along with the automatic variables _N_ and _ERROR_. For example, the WHERE statement may be more efficient than the subsetting IF (especially if you are taking a very small subset from a large file) because it checks the condition before the observation is read into the PDV, whereas the subsetting IF tests the observation after it is already in the PDV.
22. Does SAS ‘Translate’ (compile) or does it ‘Interpret’? Explain.
Ans:
When you submit a DATA step for execution, SAS checks
the syntax of the SAS statements and compiles them, that is, automatically
translates the statements into machine code. In this phase, SAS identifies the
type and length of each new variable, and determines whether a type conversion
is necessary for each subsequent reference to a variable.
23. At compile time when a SAS data set is read, what items are created?
Ans: At compile time SAS creates the following:
Input buffer (only when raw data is read with an INPUT statement)
Program Data Vector (PDV)
Descriptor information
Name statements that are recognized at compile time Only?
Ans: Drop Keep e.t.c
25.
Identify statement whose placement in the DATA step is critical
Ans: Input Statement.
26. Name statements that function at both compile and execution time.
27. Name statements that are execution only.
28. In the flow of DATA step processing, what is the first action in a typical DATA step?
Ans: SAS first performs a syntax check.
29. What is _n_?
Ans: _N_ is an automatic variable that SAS creates during DATA step processing. It counts the number of times the DATA step has iterated, which often matches the number of the observation being read. It is available only in the DATA step and not in procs, and it is not written to the output data set.
E.G: If we want to keep every third record in a dataset then we can use _n_ as follows:
Data new-sas-data-set;
Set old;
If mod (_n_, 3) = 1;
Run;
Note: If we use a WHERE clause to subset, _n_ will not yield the required result.
BASE SAS:
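30. What is the effect of the OPTIONS statement ERRORS=1?
Ans: SAS prints the complete data error message for only the first observation that has a data error; later data errors are not reported in full. Processing itself does not stop.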
31. What’s the difference between VAR A1 - A4 and VAR A1--A4?
32. What does the SAS log message “numeric values have been converted to character” mean?
Ans: A numeric variable was used where a character value is required, for example as the argument of a character function, so SAS automatically converted the numeric value to character (using the BEST12. format).
33. Why is a STOP statement needed for a POINT= option on a SET statement?
Ans: Because POINT= reads only the specified observations, SAS cannot
detect an end-of-file condition as it would if the file were being read
sequentially. Because detecting an end-of-file condition terminates a DATA step
automatically, failure to substitute another means of terminating the DATA step
when you use POINT= can cause the DATA step to go into a continuous loop.
NOTE: You cannot use the POINT= option with any of the following:
BY statement
WHERE statement
WHERE= data set option
transport format data sets
sequential data sets (on tape or disk)
a table from another vendor's relational database management system.
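A brief sketch of direct access with POINT=, assuming the SASHELP.CLASS sample data set is available. The observation numbers and the name OBSNUM are arbitrary.

```sas
data work.sample;
   /* read observations 2, 4 and 6 of SASHELP.CLASS by number */
   do obsnum = 2, 4, 6;
      set sashelp.class point=obsnum;
      output;
   end;
   stop;   /* without STOP the step would loop forever */
run;
```

The POINT= variable is temporary, so OBSNUM does not appear in WORK.SAMPLE.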
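34. How do you control the number of observations and/or variables read or written?
Ans: Use the FIRSTOBS= and OBS= options to control the number of observations, and the KEEP= and DROP= data set options to control the variables.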
35. Approximately what date is represented by the SAS date value of 730?
Ans: SAS dates count days from 1 January 1960, so 730 is about two years later: 31 December 1961.
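The value can be checked directly by writing it with a date format. SAS date 0 is 1 January 1960, and 1960 was a leap year, so day 730 is the last day of 1961.

```sas
data _null_;
   d = 730;
   put d date9.;   /* writes 31DEC1961 to the log */
run;
```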
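36. How would you remove a format that has been permanently associated with a variable?
Ans: By using PROC DATASETS with a FORMAT statement that names the variable but no format:
proc datasets library=somelibrary; modify sasdataset; format varname;
run; quit;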
37. What does the RUN statement do?
Ans: The RUN statement marks the end of the current DATA or PROC step and tells SAS to execute it.
38. Why is SAS considered self-documenting?
Ans: When a SAS data set is created, SAS creates the descriptor portion and the data portion of the data set. The descriptor portion contains details such as when the dataset was created, the number of observations, the number of variables, etc. Hence SAS is considered self-documenting.
39. Briefly describe 5 ways to do a “table lookup” in SAS.
Ans:
1) Simple table lookup by merging (MERGE with the IN= option and a subsetting IF statement).
2) Simple table lookup with formats (PROC FORMAT and the PUT function).
3) Lookup with two variables by merging (MERGE with the IN= option and a subsetting IF statement).
4) Lookup with two variables using formats (PROC FORMAT with the PUT and INPUT functions).
5) A two-way lookup table (MERGE statement using two variables).
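40. What are some good SAS programming practices for processing very large data sets?
Ans: For a very large data set with many variables we can make use of arrays in the SAS System. It also helps to read only the variables and observations you need (KEEP=, DROP=, WHERE) and to avoid unnecessary sorts and passes through the data.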
41. How would you create a data set with 1 observation and 30 variables from a data set with 30 observations and 1 variable?
Ans:
Using PROC TRANSPOSE, or in a DATA step with SAS arrays.
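A sketch of the PROC TRANSPOSE approach. The input data set WORK.LONG is generated here only so the example is self-contained; its variable VALUE is a made-up name.

```sas
/* 30 observations, 1 variable */
data work.long;
   do i = 1 to 30;
      value = i * 10;
      output;
   end;
   drop i;
run;

/* 1 observation, 30 variables named COL1-COL30 */
proc transpose data=work.long out=work.wide (drop=_name_) prefix=col;
   var value;
run;
```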
44. What are _numeric_ and _character_ and what do they do?
Ans: They are special variable lists. _NUMERIC_ refers to all the numeric variables and _CHARACTER_ refers to all the character variables that are currently defined, so we can use them to perform a task on every variable of that type.
46. What is the order of application for output data set options, input data set options and SAS statements?
Ans: Input data set options first, then SAS statements, and then output data set options.
47. What is the order of evaluation of the comparison operators: + - * / ** ()?
MISSING VALUES:
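56. How many missing values are available? When might you use them?
Ans: There are two kinds of missing value in SAS, numeric and character. Numeric variables also support 27 special missing values (._ and .A through .Z), which you can use to record different reasons why a value is missing.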
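57. How do you test for missing values?
Ans: Use the MISSING function, e.g. if missing(x) then ...; or compare with the missing value directly (x = . for numeric, c = ' ' for character). The NMISS function counts the missing values among its numeric arguments.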
58. How are numeric and character missing values represented internally?
Ans: Numeric missing values are represented as a period (.) and character missing values are represented as blanks.
FUNCTIONS:
59. What is the significance of the ‘OF’ in X=SUM (OF a1-a4, a6, a9);?
60. What do the PUT and INPUT functions do?
Ans: The PUT function converts a value to character by applying a format. (It should not be confused with the PUT statement, which writes to the log and helps with debugging: it shows which piece of code is executed and the current value of a particular variable or of all variables.)
INPUT function:
The traditional use is to reread a character variable with a numeric informat, that is, to perform a character-to-numeric conversion.
The character-to-numeric conversion function:
INPUT (variable, informat-name)
The INPUT function converts the character variable to numeric:
Salary=input (EMP_SALARY, dollar7.);
Character value    Numeric value
EMP_SALARY         SALARY
$85,000            85000
The variable being assigned cannot have the same name as the character variable, so the following does not work:
EMP_SALARY=input (EMP_SALARY, dollar7.);
The numeric-to-character conversion function:
PUT (variable, format-name);
newphone=put (phone, 7.);
Numeric value      Character value
PHONE              NEWPHONE
6778000            6778000
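The two conversions above can be tried in one short step. The values are the ones used in the text.

```sas
data _null_;
   emp_salary = '$85,000';
   salary = input(emp_salary, dollar7.);   /* character to numeric */
   phone = 6778000;
   newphone = put(phone, 7.);              /* numeric to character */
   put salary= newphone=;
run;
```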
61. Which date function advances a date, time or date/time value by a given interval?
62. What do the MOD and INT functions do?
Ans: The MOD function returns the remainder when the first argument is divided by the second. It is very useful if, say, you want to select every third observation from a SAS data set.
Example: data third; Set old; If mod (_N_, 3) =1; Run;
The INT function returns the integer portion of an argument. To truncate a number (drop off the fractional part), you use the INT function.
63. In ARRAY processing, what does the DIM function do?
Ans: DIM is the dimension function. It returns the number of elements in the array (i.e. the number of variables in the list).
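A typical use of DIM is as the upper bound of a loop over the array. In this sketch the variables Q1-Q5 and the convention that 9 means "no answer" are assumptions for illustration.

```sas
data work.clean;
   input q1-q5;
   array q{*} q1-q5;
   /* DIM(q) is 5 here, so the loop adapts if the list changes */
   do i = 1 to dim(q);
      if q{i} = 9 then q{i} = .;
   end;
   drop i;
   datalines;
1 9 3 9 5
;
run;
```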
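64. How would you determine the number of missing or non-missing values in computations?
Ans: We can use the N function for the number of non-missing values and the NMISS function for the number of missing values, e.g. N(of x1-x5) and NMISS(of x1-x5). In PROC MEANS, N and NMISS are the corresponding statistic keywords.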
65. What is the difference between: X=a+b+c+d; and X=SUM (a, b, c, d);?
Ans: SUM (a, b, c, d) ignores any missing values and computes the sum of the rest, whereas the + operator returns a missing result if any operand is missing.
E.G: SUM (1, ., 2, 3) = 6 but X = 1 + . + 2 + 3 gives a missing value.
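The difference is easy to confirm in the log; the variable names here are arbitrary.

```sas
data _null_;
   a = 1; b = .; c = 2; d = 3;
   x_plus = a + b + c + d;      /* missing, with a note in the log */
   x_sum  = sum(a, b, c, d);    /* 6 */
   put x_plus= x_sum=;
run;
```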
66. There is a field containing a date. It needs to be displayed in the format “ddmonyy” if it’s before 1975, “dd mon ccyy” if it’s after 1985, and as ‘disco years’ if it’s between 1975 and 1985. How would you accomplish this in DATA step code? Using only PROC FORMAT?
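67. In the following DATA step, what is needed for ‘fraction’ to print to the log?
data _null_; X=1/3; if X=.333 then put 'fraction'; run;
Ans: 1/3 is stored as 0.3333..., which is never exactly equal to .333, so the value must be rounded before the comparison: if round(X, .001)=.333 then put 'fraction';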
68. What is the difference between calculating the ‘mean’ using the MEAN function and PROC MEANS?
Ans: The MEAN function works across variables within a single observation, whereas PROC MEANS works down the observations for each variable. The MEAN function returns the mean of the non-missing values in the variable list. The way the MEAN function deals with missing values is quite important. If you calculate SCORE by simply adding up all the items and dividing by 50 as follows
SCORE=(item1+item2+item3+..+item50)/50;
you would be in big trouble if any of the items had missing values. When a SAS statement tries to do an arithmetic operation on missing values, the result is always missing.
PROCs:
69. If you were given several SAS data sets you were unfamiliar with, how would you find out the variable names and formats of each dataset?
Ans:
I can use PROC CONTENTS with _ALL_ for the library and see the variable names and formats of each data set.
EG: PROC CONTENTS DATA=LIBREF._ALL_; RUN;
70. How would you keep SAS from overlaying the SAS data set with its sorted version?
Ans: By writing the sorted data to a new dataset with the OUT= option, e.g. OUT=new-sas-dataset.
71. In PROC PRINT, can you print only variables that begin with the letter “A”?
Ans:
Yes. To print the variables whose names begin with “A”, use a name prefix list in the VAR statement:
VAR A:;
To print only the observations in which the value of a character variable begins with “A”, use a WHERE statement in the PROC PRINT step:
WHERE variable-name LIKE 'A%'; Or
WHERE variable-name =: 'A';
72. What are some differences between PROC SUMMARY and PROC MEANS?
Ans:
In current releases the two procedures are nearly identical. Both produce subgroup statistics with either a CLASS statement or a BY statement (BY requires the input data to be sorted or indexed by the BY variables). With CLASS variables, the output data set holds statistics for the combinations of the class variables in one run, which you would otherwise get by repeatedly sorting a data set by the variables that define each subgroup and running the procedure with BY.
The main difference is that PROC MEANS prints its results by default, whereas PROC SUMMARY does not produce any printed output by default, so you will need to use the OUTPUT statement to create a new data set and use PROC PRINT to see the computed statistics (or specify the PRINT option).
PROC FREQ:
73. Code the TABLE statement for a single-level (most common) frequency.
Ans: The statement for single-level:
DATA MAR.FREQTEST;
SET BAS.AMPERS;
RUN;
PROC FREQ DATA=MAR.FREQTEST;
TABLE AGE;
RUN;
74. Code the TABLE statement to produce a multi-level frequency.
Ans: The statement for multilevel:
DATA MAR.FREQTEST;
SET BAS.AMPERS;
RUN;
PROC FREQ DATA=MAR.FREQTEST;
TABLE AGE * GENDER;
RUN;
75. Name the option to produce a frequency line items rather than a table.
76. Produce output from a frequency. Restrict the printing of the table.
PROC MEANS:
77. Code a PROC MEANS that shows both summed and averaged output of the data.
78. Code the option that will allow MEANS to include missing numeric data in the report.
79. Code the MEANS to produce output to be used later.
80. Do you use PROC REPORT or PROC TABULATE? Which do you prefer? Explain.
MERGING/UPDATING:
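81. What happens in a one-on-one merge? When would you use one?
Ans: In a one-to-one merge (a MERGE statement without a BY statement), SAS combines the first observation from each data set, then the second, and so on, based only on position. You would use it when the data sets hold different variables for the same entities in exactly the same order. If the data sets share a common identifying variable, a match-merge with a BY statement on that variable is the safer choice.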
82. How would you combine 3 or more tables with different structures?
83. What is the problem with merging two data sets that have variables with the same name but different data?
Ans: The value from the second data set will overwrite the value from the first data set.
84. When would you choose to MERGE two data sets together and when would you SET two data sets?
Ans: Use the SET statement to copy a data set or to stack (concatenate) data sets that have the same variables, where every observation from each old dataset becomes an observation in the new dataset. Use the MERGE statement to join observations side by side, usually with a BY statement, and, together with IN= variables, to control which old datasets contribute to the new dataset.
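85. Which data set is the controlling data set in the MERGE statement?
Ans: The last data set listed in the MERGE statement. When the data sets share a variable that is not a BY variable, its value comes from the last data set that contributes to the observation.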
86. How do the IN= variables improve the capability of a MERGE?
Ans: IN= creates a temporary variable that is 1 when the data set contributed to the current observation and 0 otherwise, which helps in controlling which datasets need to contribute to the new dataset.
87. Explain the message ‘MERGE HAS ONE OR MORE DATASETS WITH REPEATS OF BY VARIABLE’.
CUSTOMIZED REPORT WRITING:
88. What is the purpose of the statement DATA _NULL_?
Ans: Use the keyword _NULL_, which allows the power of the DATA step without creating a data set.
89. What is the pound sign used for in the DATA _NULL_?
90. What is the purpose of using the N=PS option?
Ans: Specifying N=PS in the FILE statement makes a full page of lines available, which allows the output pointer to write on any line of the current output page.
MACRO:
91. What system options would you use to help debug a macro?
Ans: SYMBOLGEN, MLOGIC and MPRINT.
92. Describe how you would create a macro variable?
Ans: %let var=value;
93. How do you identify a macro variable?
94. How do you define the end of a macro?
Ans: %mend
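95. How do you assign a macro variable to a SAS variable?
Ans: Use the SYMGET function, or reference the macro variable in the assignment, e.g. x = "&var";. CALL SYMPUT does the reverse: it assigns the value of a SAS variable to a macro variable.
96. What is the difference between %LOCAL and %GLOBAL?
Ans: A %LOCAL macro variable exists only while the macro that defines it is executing, whereas a %GLOBAL macro variable is available until the end of the SAS session.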
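97. How long can a macro variable be? A token?
Ans: A macro variable name can be up to 32 characters long and its value can hold up to 65,534 characters.
98. If you use a SYMPUT in a DATA step, when and where can you use the macro variable?
Ans: The macro variable can be referenced with & only after the DATA step has finished executing, that is, after the step boundary. When the DATA step is not inside a macro, the variable is stored in the global symbol table and is globally available.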
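The timing described above can be sketched as follows. The macro variable name GRAND and the value 42 are made up for this example.

```sas
data _null_;
   total = 42;
   call symput('grand', strip(put(total, 8.)));   /* SAS variable to macro variable */
run;

/* after the step boundary the macro variable can be resolved */
%put The value is &grand;

data _null_;
   copy = symget('grand');      /* macro variable to SAS variable (character) */
   same = "&grand";             /* resolved before the step is compiled */
   put copy= same=;
run;
```

Referencing &grand inside the first DATA step would not work, because the reference is resolved before that step executes.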
100. How would you code a macro statement to produce information on the SAS log?
Ans: %put text; (quotation marks are not needed and would be written to the log as part of the text).