RSS Matters

RSS Logo

How to export and merge tables, graphs, and raw data from R to a single Excel file; which has multiple sheets.

Link to the last RSS article here: How to import and merge many Excel files; each with multiple sheets of data…for statistical analysis. -- Ed.

By Dr. Jon Starkweather, Research and Statistical Support Consultant

Continuing last month’s theme, we again visit Excel – back by popular demand. And again, we are obligated to mention that Excel, as nice as it is, is not a statistical software package. RSS personnel do not recommend using Excel; for data storage, data display, or data analysis. An often quoted phrase1 is the following; the only thing worse than using SPSS, is using Excel. For more information on the known problems with Excel and other spread sheet based software, see Burns (2013). RSS recommends storing data in plain text (.txt) files with comma delimiters; also known as a comma separated values (.csv) file type. The reason RSS recommends text (.txt) or comma separated values (.csv) file types is because those file types can be easily opened or imported into all the statistical software packages. However, if you feel you must use Excel, then this article may help you with the common task of getting tables, graphs, and data from R into a single Excel file with one sheet for the table(s) and graph(s), and another sheet for the raw data.

Example

First, import the data. The data used in this example is available on the RSS server and can be accessed using the URL from the script below. Simply copy and paste the script below into an R console to import the data directly into R, naming it ts.df. Notice, the data has 6000 rows of 9 columns (1 time index & 8 time series).  

ts.df <- read.table(

"http://www.unt.edu/rss/class/Jon/Benchmarks/ExcelFiles/time_series_001.txt",

   header=TRUE, sep=",", na.strings="NA", dec=".", strip.white=TRUE)

nrow(ts.df)

[1] 6000

ncol(ts.df)

[1] 9

Next, set the working directory (setwd) to the location where you want the finished Excel file stored. Here, for this example, we are using the desktop.

setwd("C:/Users/jds0282/Desktop/")

Next, create an empty table in which some basic descriptive statistics will go. Keep in mind, this table is just being created for the purpose of having a table to export into Excel. The table will contain the length of each time series, the mean of each time series, the standard deviation of each time series, and the mean fractal dimension of each time series. Fractal dimension can be thought of as a complexity measure of each time series.

table.1 <- data.frame(matrix(rep(NA,36), nrow = 4))

names(table.1) <- c("stats",names(ts.df[,2:9]))

table.1[,1] <- factor(c("n","mean","sd","mean.fd"))

table.1

    stats ts.1 ts.2 ts.3 ts.4 ts.5 ts.6 ts.7 ts.8

1       n   NA   NA   NA   NA   NA   NA   NA   NA

2    mean   NA   NA   NA   NA   NA   NA   NA   NA

3      sd   NA   NA   NA   NA   NA   NA   NA   NA

4 mean.fd   NA   NA   NA   NA   NA   NA   NA   NA

Next, calculate the appropriate simple descriptive statistics and store them in the appropriate cells of the table.

table.1[1,2:9] <- rep(nrow(ts.df), 8)

table.1[2:3,2:9] <- data.frame(matrix(c(apply(ts.df[,2:9], 2, mean),

                    apply(ts.df[,2:9], 2, sd)), byrow = T, nrow = 2))

table.1

    stats        ts.1        ts.2        ts.3         ts.4        ts.5

1       n 6000.000000 6000.000000 6000.000000 6000.0000000 6000.000000

2    mean   -1.466314    1.647883   -2.080831    1.3697476   -1.226921

3      sd    2.054721    1.487989    1.170512    0.8129809    2.658949

4 mean.fd          NA          NA          NA           NA          NA

         ts.6        ts.7        ts.8

1 6000.000000 6000.000000 6000.000000

2   -2.065006    1.225880   -1.327785

3    1.016617    2.401352    2.265327

4          NA          NA          NA

Next, calculate the mean fractal dimension of each time series using a simple for-loop and place the estimates in the appropriate cells of our table. Notice below, there are two necessary packages and it is necessary to specify the window size (w.s) in order to calculate multiple fractal dimension estimates (which will be used to calculate a mean fractal dimension for each time series). Also, notice below, each of the mean fractal dimension estimates are close to 2.0; which reflects the random nature of this simulated data. For more information on fractal dimension, see Gneiting, Sevcikova, and Percival (2010).

library(abind)

library(fractaldim)

w.s <- .1*nrow(ts.df); w.s

[1] 600

for(i in 1:8){

  q <- fd.estimate(data = ts.df[,i+1], methods = "madogram",

          window.size = w.s, step.size = w.s, trim = TRUE,

          keep.data = FALSE, keep.loglog = FALSE, parallel = FALSE,

          nr.nodes = NULL, plot.loglog = FALSE)

  table.1[4,i+1] <- mean(q$fd)

  }; rm(i,q,w.s)

detach("package:fractaldim")

detach("package:abind")

table.1

    stats        ts.1        ts.2        ts.3         ts.4        ts.5

1       n 6000.000000 6000.000000 6000.000000 6000.0000000 6000.000000

2    mean   -1.466314    1.647883   -2.080831    1.3697476   -1.226921

3      sd    2.054721    1.487989    1.170512    0.8129809    2.658949

4 mean.fd    1.979112    1.976728    1.999593    1.9830176    1.978095

         ts.6        ts.7        ts.8

1 6000.000000 6000.000000 6000.000000

2   -2.065006    1.225880   -1.327785

3    1.016617    2.401352    2.265327

4    1.970747    1.966416    1.988077

Next, we create a graph; again, the graph is just for the purpose of having a graph to export to Excel. When doing this in R, the graph will not be displayed. Instead, the graph will be written as ‘graph1.png’ to the location specified as the working directory (from above). The graph file will only be written to that location when the line ‘dev.off()’ is processed. The graph is displayed after the code segment so that the reader will see what the graph looks like in R prior to seeing it in the finished Excel file.

jpeg('graph1.png')

  par(mfrow = c(4,1))

  plot(ts.df[,1],ts.df[,2], type = "l", col = "darkblue",  xlab = "Time",

    ylim = c(-5,5), ylab = "Y")

  par(new = T)

  plot(ts.df[,1],ts.df[,3], type = "l", col = "blue",  xlab = "Time",

    ylim = c(-5,5), ylab = "Y")

  plot(ts.df[,1],ts.df[,4], type = "l", col = "darkgreen",  xlab = "Time",

    ylim = c(-5,5), ylab = "Y")

  par(new = T)

  plot(ts.df[,1],ts.df[,5], type = "l", col = "green",  xlab = "Time",

    ylim = c(-5,5), ylab = "Y")

  plot(ts.df[,1],ts.df[,6], type = "l", col = "red", xlab = "Time",

    ylim = c(-5,5), ylab = "Y")

  par(new = T)

  plot(ts.df[,1],ts.df[,7], type = "l", col = "brown",  xlab = "Time",

    ylim = c(-5,5), ylab = "Y")

  plot(ts.df[,1],ts.df[,8], type = "l", col = "black",  xlab = "Time",

    ylim = c(-5,5), ylab = "Y")

  par(new = T)

  plot(ts.df[,1],ts.df[,9], type = "l", col = "grey",  xlab = "Time",

    ylim = c(-5,5), ylab = "Y")

dev.off()

R Graph 1 

Next, we work on creating the Excel file by first creating the workbook, then the individual sheets – all within R. Three packages are required to accomplish these tasks; really only one package (xlsx; Dragulescu, 2013) but, it has two dependent packages.

library(rJava)

library(xlsxjars)

library(xlsx)

First, we need to create the workbook, here simply named my.wb, by using the intuitively named createWorkbook function and supplying the Excel format we wish – here xls.

my.wb <- createWorkbook(type = "xls")

Next, we create two sheets; one for the table and graph, and one for the raw time series data. Again, the function is intuitively named: createSheet, and we supply the workbook in which to create each sheet and the sheet name we desire.

sheet.1 <- createSheet(my.wb, sheetName = "time.series.tables.and.graphs")

sheet.2 <- createSheet(my.wb, sheetName = "time.series.data")

Next, we add the objects, such as the table, graph, and data, to the two sheets we have created in our workbook. When using the addDataFrame or addPicture functions, it may require some trial and error to place the data frame or picture in the sheet where it is desired. However, creating the file, checking placement, and if necessary, altering the start column and start row arguments to re-write the file is very easy to do. One other key point to keep in mind is the scale argument of the addPicture function. The scale argument can be used to adjust the size of the picture. Scale is set to 1.00 by default (if no scale is specified, no scaling factor is applied).

addDataFrame(table.1, sheet = sheet.1, startRow = 3, startColumn = 1)

my.file <- "C:/Users/jds0282/Desktop/graph1.png"

addPicture(file = my.file, sheet = sheet.1, scale = 2,

           startRow = 10, startColumn = 2)

addDataFrame(ts.df, sheet = sheet.2, startRow = 1, startColumn = 1)

Lastly, we must save the workbook. This is the step which actually creates the Excel file or workbook and sheets within it.

saveWorkbook(my.wb, "TimeSeries.001.xls")

Then, simply navigate to the working directory set at the beginning (here, the working directory is the desktop). Then open the Excel file and inspect the placement of the table(s) and graph(s). Screen captures of sheet 1 (table and graph) and then sheet 2 (raw data) are below.

 

 Excel Sheet 1

Excel Sheet 2

 Conclusions

 Keep in mind, there are a variety of different ways of accomplishing what was accomplished in this article. All of the functions used in this article have optional arguments for more precise control over the objects and their placement in an Excel file being created. The example here was admittedly simple in order to illustrate the general use of functions which can be used to export objects (e.g., tables, graphs, & data) to Excel from R. As stated last month, that is another benefit of using R, the flexibility it affords the analyst in deciding what to do and how to do it. For more information on what R can do, please visit the Research and Statistical Support Do-It-Yourself Introduction to R course website. Lastly, for those interested in seeing how the example data was created in R; please take a look at the script which was used. An Adobe.pdf version of this article can be found here.

Until next time; Ground control to Major Tom….

Footnote1: The phrase is believed to have originated with respected statistician and prominent R user Frank Harrell of Vanderbilt University at the 5th annual Bayesian Biostatistics Conference.

 

References / Resources

Burns, P. (2013). Spreadsheet Addiction. Available at: http://www.burns-stat.com/documents/tutorials/spreadsheet-addiction/

Dragulescu, A. A. (2013). Package xlsx. Documentation available at: http://cran.r-project.org/web/packages/xlsx/index.html 

Gneiting, T., Sevcikova, H., & Percival D.B. (2010). Estimators of Fractal Dimension: Assessing the Roughness of Time Series and Spatial Data. Technical Report No. 577, Department of Statistics, University of Washington. Available at: http://www.stat.washington.edu/research/reports/2010/tr577.pdf

Sevcikova, H., Gneiting, T., & Percival, D. (2013). Package fractaldim. Documentation available at:

http://cran.r-project.org/web/packages/fractaldim/index.html

 

Originally published September 2013 -- Please note that information published in Benchmarks Online is likely to degrade over time, especially links to various Websites. To make sure you have the most current information on a specific topic, it may be best to search the UNT Website - http://www.unt.edu . You can also consult the UNT Helpdesk - http://www.unt.edu/helpdesk/. Questions and comments should be directed to benchmarks@unt.edu.