Frequently Asked Questions | Page 8 | UBC Department of Statistics

Q: Where can I use SAS?

SAS server host: unixlab.stat.ubc.ca
OS: Unix Solaris
Hardware: SUN 280R, 2x 1.4Mhz CPUs, 4 GB of memory
Location: Undergraduate Network, Room LSK 121
Access mode: Via LSK 121 lab or SSH for remote login.

Q: SAS links

SAS basics collected and maintained by Harry Joe
SAS Tutorial for Unix
Introduction to SAS Programming

SAS/STAT

http://www.id.unizh.ch/software/unix/statmath/sas/sasdoc/stat/

SAS/MACRO

http://www.id.unizh.ch/software/unix/statmath/sas/sasdoc/macro/

SAS/IML

http://www.id.unizh.ch/software/unix/statmath/sas/sasdoc/iml/

SAS AND EVERYTHING

http://www.math.wpi.edu/saspdf/common/mainpdf.htm

MASTER INDEX FOR SAS ONLINEDOC

http://www.id.unizh.ch/software/unix/statmath/sas/sasdoc/mindex/a-index.htm

Q: How to read a Microsoft Excel format (.xls) data file in R?

There seems to be no direct way to read a .xls format file (see http://maths.newcastle.edu.au/~rking/R/help/00b/2519.html).

However, there are some ways to read a .xls file indirectly. For example, you can save the .xls file in .csv (comma separated value) format and then use R's function read.csv to read it.

The reason for saving the .xls file as a .csv file is that .xls files usually have some columns of strings that contain white space. If you save the .xls file as a white space delimited or TAB delimited file instead, it is still difficult to read the file into R.

Q: How to call Fortran subroutines in R?

In R, we can call Fortran subroutines. For example, suppose we have the following toy Fortran subroutine in the file test.f.

CCCCCCCCCCCCCCCCC
C The subroutine is to calculate Hadamard product of two matrices.
C out[i][j]=x[i][j]*y[i][j].
C Both R and Fortran store matrix by column.
CCCCCCCCCCCCCCCCC
CCCCCCCCC Fortran program (f77) has to be between 7th and 72nd column.
CCCCCCCCC The 6th column is for continuation marker.
      subroutine myHadamaProduct(x, y, nrow, ncol, mo)
      integer i, j, nrow, ncol
CCCCCCC In Fortran, you don't need to specify the second dimension for matrix
      double precision x(nrow, *), y(nrow, *), mo(nrow, *)
      do i = 1, nrow
        do j = 1, ncol
          mo(i,j)=x(i,j)*y(i,j)
        enddo
      enddo
      return
      end

First, we need to compile the file test.f to create a shared library, say test.so, by using the GNU Fortran compiler:

g77 -fpic -shared -fno-gnu-linker -o test.so test.f

Next, we need to use the R function dyn.load to load the shared library test.so.

if(!is.loaded("myhadamaproduct")){
  dyn.load("./test.so")
}

The R function is.loaded checks whether the Fortran subroutine myHadamaProduct has already been loaded into R. If it has, we do not need to load it again.

Next, we use the R function .Fortran to call the Fortran subroutine myHadamaProduct. For example,

x<-matrix(1:10,nrow=5, ncol=2) # get a 5x2 matrix
y<-matrix(1:10,nrow=5, ncol=2) # get a 5x2 matrix
out<-matrix(0, nrow=5, ncol=2) # initialize output matrix
# to format matrix or array, use function storage.mode()
storage.mode(x)<-"double"
storage.mode(y)<-"double"
storage.mode(out)<-"double"
nr<-as.integer(nrow(x))
nc<-as.integer(ncol(x))
# Fortran is *NOT* case-sensitive, so it changes all characters
# to lower case. Thus, to call Fortran subroutines with .Fortran, you
# have to type the name in lower case. Otherwise, R will report an error.
res<-.Fortran("myhadamaproduct", x, y, nr, nc, out=out)
cat("Hadamard product >>\n")
print(res$out)

If you do not need the shared library test.so any more, you can use the R function dyn.unload to unload it.

if(is.loaded("myhadamaproduct")){
  dyn.unload("./test.so")
}

Note:

The Fortran programs called by R must be subroutines, not functions. In the example above, myHadamaProduct is defined as a subroutine:

subroutine myHadamaProduct(x, y, nrow, ncol, mo)

The arguments of Fortran subroutines are passed by address instead of by value. Unlike in C, there is no "pointer" concept in Fortran 77.

When you use ".Fortran" to call Fortran subroutines, the names of the Fortran subroutines must be in lower case.

Any values returned by Fortran subroutines which are called in R must be initialized and must have the format:

# if the variable is defined as double in the Fortran subroutine
variablename=as.double(initialized values) # inside .Fortran
# if the variable is defined as integer in the Fortran subroutine
variablename=as.integer(initialized values) # inside .Fortran
# if the output is double precision matrix or array
storage.mode(variablename)<-"double" # before .Fortran
variablename=variablename # inside .Fortran

The input values must also be initialized and must have the above format. However, they can be formatted before the ".Fortran" call.

If the output is not written in the variablename=variablename format (e.g. out=out in the above example), you can still get results. However, you have to use res[[5]] to refer to out in the above example. In fact, the .Fortran function returns a list containing all arguments of the Fortran subroutine myHadamaProduct. Since out is the 5th argument, you can use res[[5]] to refer to the 5th element of the list.

It is okay for the file test.f to contain the main program as well.

Sometimes, the command dyn.load("test.so") produces an error message. This is probably because the environment variable "$PATH" is not set correctly. You can either add the following line to the file ".bashrc" in your home directory:

export PATH=$PATH:.:

or use the command

dyn.load("./test.so")

Q: How to install command-line editor in Splus?

R has a command-line editor which allows us to retrieve and edit commands we entered before. We also can install a command-line editor in Splus.

STEP 1

If you use Bourne, Korn, Bash or Z-Shell, type the following lines into your .bashrc file which is in your home directory:

export EDITOR="/usr/local/bin/vim"
export S_CLEDITOR="/usr/local/bin/vim"
export VISUAL="/usr/local/bin/vim"

If you use C-Shell or TC-Shell, then type the following lines into your .cshrc file which is in your home directory:

setenv EDITOR "/usr/local/bin/vim"
setenv S_CLEDITOR "/usr/local/bin/vim"
setenv VISUAL "/usr/local/bin/vim"

If you want to use emacs instead of vi, simply replace vim with emacs.

STEP 2

In your home directory, type the command

source .bashrc

or, if you use C-Shell or TC-Shell,

source .cshrc

STEP 3

Type

Splus -e

to invoke the editor when you start an Splus session.

The most useful editing commands are summarized in the following table:

COMMAND              emacs     vi
backward character   Ctrl-B    Esc, h
forward character    Ctrl-F    Esc, l
previous line        Ctrl-P    Esc, k
next line            Ctrl-N    Esc, j
beginning of line    Ctrl-A    Esc, ^ (Shift-6)
end of line          Ctrl-E    Esc, $ (Shift-4)
forward word         Esc, f    Esc, w
backward word        Esc, b    Esc, b
kill char            Ctrl-D    Esc, x
kill line            Ctrl-K    Esc, Shift-d
delete word          Esc, d    Esc, dw
search backward      Ctrl-R    Esc, ?
yank                 Ctrl-Y    Esc, Shift-y
transpose chars      Ctrl-T    Esc, xp

You can type Splus command

?Command.edit

to get the above table.

Q: How to run R/Splus program in background so that the command can continue running in the background after you log out?

The syntax:

nohup R --no-save < input.R > output &
nohup Splus < input.s > output &

where input.R contains your R code and input.s contains your Splus code.

Q: How to call C functions or Fortran subroutines in Splus?

There are different versions of Splus installed in the department computing system. Splus is installed on Hajek, Newton, Emily, and Statlab. Splus 5 and Splus 6 are installed on all servers.

For Splus 3.4, the function that loads shared libraries is dyn.load.shared instead of dyn.load. The function dyn.load is used to load object files such as test.o, obtained by using the command "g77 -c test.f". There is also no dyn.unload function in Splus 3.4.
For Splus 5 and Splus 6, the functions dyn.load and dyn.load.shared are obsolete. Splus 5 and Splus 6 use the function dyn.open to load shared libraries and the function dyn.close to unload them.

Q: Date and time functions in R

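In library(survival) (in more recent versions of R, these functions are provided by the separate package date, loaded with library(date)), we can find the following functions: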

as.date
Converts any of the following character forms to a Julian date: 8/31/56, 8-31-1956, 31 8 56, 083156, 31Aug56, or August 31 1956.

Example:

> as.date(c("1jan1960", "2jan1960", "31mar1960", "30jul1960"))
[1] 1Jan60 2Jan60 31Mar60 30Jul60

mdy.date
Given a month, day, and year, returns the number of days since January 1, 1960.

Example:

> mdy.date(3, 10, 53)
[1] 10Mar53

date.mdy
Convert a vector of Julian dates to a list of vectors with the corresponding values of month, day and year, and optionally weekday.

Example:

> a1<-mdy.date(month = 8, day = 7, year = 1960)
> a1
[1] 7Aug60
> date.mdy(a1)
$month
[1] 8

$day
[1] 7

$year
[1] 1960

date.mmddyy
Given a vector of Julian dates, this returns them in the form "10/11/89", "28/7/54", etc.

Example:

> date.mmddyy(mdy.date(3, 10, 53))
[1] "3/10/53"

date.ddmmmyy
Given a vector of Julian dates, this returns them in the form "10Nov89", "28Jul54", etc.

Example:

> date.ddmmmyy(mdy.date(3, 10, 53))
[1] "10Mar53"

date.mmddyyyy
Given a vector of Julian dates, this returns them in the form "10/11/1989", "28/7/1854", etc.

Example:

> date.mmddyyyy(mdy.date(3, 10, 53))
[1] "3/10/1953"
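
To make "the number of days since January 1, 1960" concrete, here is a small stand-alone sketch in C (the language of the compiled examples elsewhere in this FAQ). It is not the code used by the R functions above. The names days_from_civil and days_since_1960 are hypothetical, and the sketch assumes a four-digit year in the Gregorian calendar, with January 1, 1960 counted as day 0.

```c
/* Hypothetical helpers for illustration only; they are not part of R
   or of any R package. */

/* Days from 1 January 1970 to the Gregorian date y-m-d, computed with
   the usual arithmetic on 400-year eras (146097 days each). */
static long days_from_civil(long y, int m, int d)
{
    long era, yoe, doy, doe;

    if (m <= 2) y -= 1;                   /* treat Jan/Feb as end of previous year */
    era = (y >= 0 ? y : y - 399) / 400;   /* 400-year era containing y */
    yoe = y - era * 400;                  /* year of era, 0..399 */
    doy = (153L * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  /* day of year, from 1 March */
    doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;             /* day of era */
    return era * 146097 + doe - 719468;
}

/* Days from 1 January 1960 to the given date (negative if earlier). */
long days_since_1960(int year, int month, int day)
{
    return days_from_civil(year, month, day) - days_from_civil(1960, 1, 1);
}
```

With these definitions, days_since_1960(1960, 1, 1) is 0, days_since_1960(1960, 8, 7) is 219, and days_since_1960(1953, 3, 10) is -2488, so dates before 1960 correspond to negative day counts.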

Q: How to call C functions in R?

In R, we can call C functions. For example, suppose we have the following toy C function in the file test.c.

/*****
  The function is to calculate Hadamard product of two matrices.
  out[i][j]=x[i][j]*y[i][j].
  The inputs x, y and the output out are vectors, not matrices.
  So in R, you need to transform input matrices into vectors
  and transform output vector back to matrix.
*****/
void myHadamaProduct(double *x, double *y, int *nrow, int *ncol, double *out)
{
  int i, j, r, c;

  r=*nrow;
  c=*ncol;
  for(i = 0; i < r; i ++) {
    for(j = 0; j < c; j ++) {
      out[i*c+j]=x[i*c+j]*y[i*c+j];
    }
  }
  return;
}

First, we need to compile the file test.c to create a shared library, say test.so, by using the GNU C compiler:

gcc -fpic -shared -fno-gnu-linker -o test.so test.c

Next, we need to use the R function dyn.load to load the shared library test.so.

if(!is.loaded("myHadamaProduct")){
  dyn.load("./test.so")
}

The R function is.loaded checks whether the C function myHadamaProduct has already been loaded into R. If it has, we do not need to load it again.

Next, we use the R function .C to call the C function myHadamaProduct. For example,

x<-matrix(1:10,nrow=5, ncol=2) # get a 5x2 matrix
y<-matrix(1:10,nrow=5, ncol=2) # get a 5x2 matrix
# In R, a matrix is stored by column in the memory.
# However, in C, a matrix is stored by row in the memory.
# So we need to transpose the matrix x, namely t(x), before
# transforming it to a vector.
xx<-as.double(as.vector(t(x)))
yy<-as.double(as.vector(t(y)))
nr<-as.integer(nrow(x))
nc<-as.integer(ncol(x))
n<-nr*nc # total number of elements in the output
res<-.C("myHadamaProduct", xx, yy, nr, nc, out=as.double(rep(0.0, n)))
# In C, matrix is stored by row. So when transforming back, we need to
# specify byrow=TRUE.
mat<-matrix(res$out, ncol=nc, byrow=TRUE)
cat("Hadamard product >>\n")
print(mat)

If you do not need the shared library test.so any more, you can use the R function dyn.unload to unload it.

if(is.loaded("myHadamaProduct")){
  dyn.unload("./test.so")
}

Note:

The C function called by R must be of void type. For the example above, the function myHadamaProduct has to have the form

void myHadamaProduct(double *x, double *y, int *nrow, int *ncol, double *out)

rather than

double *myHadamaProduct(double *x, double *y, int *nrow, int *ncol)

You have to let the function return values through its arguments, e.g. "double *out" in the above example. If an argument is a pointer (e.g. *out) and you change the value it refers to within the function, then that value remains changed after the function returns.

All arguments of the C function have to be passed by address instead of by value. That is, all arguments have to be pointers. For the example above, you cannot change "int *nrow" to "int nrow".

Because changes made through pointers inside the function are visible to the caller, be careful when using pointers as function arguments: input values can be overwritten as well.

Any values returned by C functions which are called in R must be initialized and must have the format:

# if the variable is defined as double in the C function
variablename=as.double(initialized values)
# if the variable is defined as integer in the C function
variablename=as.integer(initialized values)

in the ".C" call (e.g. "out=as.double(rep(0.0, n))" in the above example).

The input values must also be initialized and must have the above format. However, they can be formatted before the ".C" call (e.g. "nr<-as.integer(nrow(x))" in the above example).

If the output is not written in the variablename=variablename format (e.g. out=as.double(rep(0.0, n)) in the above example), you can still get results. However, you have to use res[[5]] to refer to out in the above example. In fact, the .C function returns a list containing all arguments of the C function myHadamaProduct. Since out is the 5th argument, you can use res[[5]] to refer to the 5th element of the list.

It is okay for the file test.c to contain the main function as well.

Sometimes, the command dyn.load("test.so") produces an error message. This is probably because the environment variable "$PATH" is not set correctly. You can either add the following line to the file ".bashrc" in your home directory:

export PATH=$PATH:.:

or use the command

dyn.load("./test.so")
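
The example above transposes the matrices in R because the C code indexes them by row. As an alternative, the C function can index the vectors in R's own column order, so that no transposition is needed. The following is a minimal sketch of that idea, not a replacement for test.c. The function name myHadamaProductColMajor is hypothetical, and the sketch assumes that element (i, j) of an nrow x ncol R matrix sits at offset i + j*nrow of the vector passed from R.

```c
#include <stddef.h>

/* Hypothetical variant of myHadamaProduct, for illustration only.
   x, y and out are the matrices exactly as R stores them: column by
   column, so element (i, j) is at offset i + j * nrow. */
void myHadamaProductColMajor(double *x, double *y, int *nrow, int *ncol,
                             double *out)
{
    int r = *nrow;   /* all arguments arrive as pointers from .C */
    int c = *ncol;
    int i, j;

    for (j = 0; j < c; j++) {        /* walk the columns ...          */
        for (i = 0; i < r; i++) {    /* ... and the rows within each  */
            size_t k = (size_t)i + (size_t)j * (size_t)r;
            out[k] = x[k] * y[k];    /* results returned through out  */
        }
    }
}
```

From R, such a function could be called with as.double(x) and as.double(y) directly, without t(x), and the result could be turned back into a matrix with matrix(res$out, nrow=nr), since matrix fills by column by default.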

Q: How to print the R graphics directly in R?

To print R graphics directly from R, use the command dev.print.

The default for `dev.print' is to produce and print a postscript copy, if `options("printcmd")' is set suitably.

`dev.print' is most useful for producing a postscript print (its default) when the following applies. Unless `file' is specified, the plot will be printed. Unless `width', `height' and `pointsize' are specified the plot dimensions will be taken from the current device, shrunk if necessary to fit on the paper. (`pointsize' is rescaled if the plot is shrunk.) If `horizontal' is not specified and the plot can be printed at full size by switching its value this is done instead of shrinking the plot region.

If `dev.print' is used with a specified device (even `postscript') it sets the width and height in the same way as `dev.copy2eps'.

For `dev.copy2eps', `width' and `height' are taken from the current device unless otherwise specified. If just one of `width' and `height' is specified, the other is adjusted to preserve the aspect ratio of the device being copied. The default file name is `Rplot.eps'.

Example:

plot(hist(rnorm(100))) # plot histogram
options("printcmd"="lpr -Poptra") # set default printer
dev.print() # print the histogram to printer optra
