Data Analysis Perspectives, Computing in R

What The Flea Circus Can Teach Us About Computing In R

By Richard Herrington, Data Science Consultant, Data Science, and Analytics

Recently, fellow R&SS consultant Jon Starkweather brought a computing problem to my attention – originally this was posted on the Project Euler project site – Problem 213.

The problem description is as follows: Flea Circus (Problem 213):  A 30×30 grid of squares contains 900 fleas, initially one flea per square. When a bell is rung, each flea jumps to an adjacent square at random (usually four possibilities, except for fleas on the edge of the grid or at the corners).   What is the expected number of unoccupied squares after 50 rings of the bell?  Give your answer rounded to six decimal places.

Jon's approach to the following problem can be found at:  http://bayes.acs.unt.edu:8083/BayesContent/class/Jon/R_SC/Module12/Fleas_001.R.  At first glance,  many R users might approach the problem by using looping constructs (e.g. for,  while,  repeat).   However, R programmers also know that the key to efficient programming in R in using vectorized operations,  rather than using for loops to iterate over the indices of vector and matrix objects.  We explore the following:  Can vectorized approaches be used in the Flea Circus problem?  Does the structure of the problem allow other efficient programming constructs to be used? This brief note explores using the following approaches in R:    i) recursive function calls;  and  ii) parallel implementation of the for-loop across multiple CPUs;  and iii) byte-compiling of user-created functions; - in programming the Flea Circus problem in R.  According to Jon, his R code, takes approximately 45 minutes for about 250 replications of the 50-step iteration of 900 fleas jumping on a 30 by 30 grid (using for and while loops,  with no multiple CPU parallelization, or byte compiling of functions calls). I will explore how much speed-up we can achieve in a Monte Carlo implementation of the Flea Circus problem, by using these techniques in conjunction. We will find that we can perform 250 replications in approximately six minutes on a single CPU (with NO parallelization and using recursive functions that are  byte-compiled). Parallelization across 8 cores takes less than a minute for 250 iterations. This is approximately 1.6 seconds for all 50-steps of iteration for all 900 fleas.  Parallelization of 2000 iterations (250 iterations  * 8 core) takes approximately 6.4 minutes on 8 core -  roughly the same amount of time with 8 times  the number of iterations. Published solutions to the  Project Euler problems can be found here. The published solution with 6 digits of accuracy) is 330.721154.

Overview of the R Scripts and the General Approach Taken

The approach taken here is to keep track of the locations of the particles (fleas) as they move around on the 30 by 30 grid of cell locations - move.incr.  The Flea Circus example is an example of a discrete markov chain process - a 30 by 30 matrix of probabilities that are iterated 50 times to produce a final matrix of probabilities,  that is finally used to calculate the expected values of the empty cells at the end of 50 steps of iteration.  This type of iteration process (when the output from the function becomes the input to the same function) suggests the use of a function that calls itself until a criterion is met, to end the iterations – a recursive function.

We will create a recursively called function - flea.jump - such that at the end of a sequence of 50 steps of recursive calls to itself,  the resulting expected value of the empty cells is returned to the function labled flea.circus.repetition.  The function flea.circus.repitition  is recursively called until the number of pre-specified replications has occured.   Some of the run-time speed decreases of the script are a result of using an apply function to the move list using the recursively called function produce.valid.move (note: using a for-loop here would slow down the function down considerably, hence the use of the apply function). The produce.valid.move function is called recursively untill a valid move is produced for a single current flea location.   This updated move is returned to the apply function within the flea.jump function. The final output of that apply function is an updated move list for ALL 900 fleas.  Within the function flea.jump,  a  table of probabilities is generated for the current locations of the fleas - after jumping. The complement of these probabilities on step i,  are P_i = (1 - p) and are normalized  to sum to 1, and then they are multiplied by the corresponding matrix of probabilities from the last iteration (i.e. P_i-1). Lastly, this resulting multiiplication of the two probabilites (P_i-1  *  P_i ) is again normalized to sum to 1, which is used to produce the current matrix of probabilities for empty cells. From this 30 by 30 matrix of probabilities,  the sum of the probabilities that correspond to empty cells on the current step,  are multiplied by the constant 900 cells,  to produce the expected value of the empty.cell count on the current iteration. The empy.cell count and flea.mat.current.prob probability values  are then saved into the flea.mat.list structure to be used within the next iteration of flea.jump. The function flea.circus.repetition is recursively called to replicate the entire 50 steps of the 900 fleas,  a number of pre-specified times - number.of.repetitions.  The expected empty.cell count is accumulated over iterations, and the replicated average is returned once the flea.circus.repetition  is finished.

Unfortunately, R is limited to no more than a couple thousand nested recursions,  therefore,  we need to implement an outer controlling function that uses a loop to replicate the recursively called replications of the function flea.circus.repetition.  This function - monte.carlo.rep.func can be used with R packages which are designed to utilize multiple CPUs on a single machine, or across multiple machines with multiple CPUs (e.g. R packages:  foreach, parallel, doMC). This parallization of the for-loops across multiple CPUs will allow us to gain further time reductions, for a  larger numbers of iterations of the function monte.carlo.rep.func.

A Note About The Precision of Recursive Arithmetic Computations

The original problem statement requires that the results should be reported with six digits of precision. Iterated arithmetic computations are susceptable to accumulating round-off error,  at each step of the calcuation. Inacurate results will be produced, after a certain number of iterations, if the accuracy of the computer's internal numeric representations are not sufficient.  Additionally, Monte-Carlo variation across replications will produce varying estimates of the final calculations.   If the accuracy of the computer's floating-point-unit is not very accurate,  and Monte-Carlo variation is large,  then it will be difficult (maybe impossible) to achieve six digits of accuracy. While we don't explore these issues here, it might be beneficial to use R packages that are designed to produce arbitrarily accurate precision for arithmetic calculations in R (e.g. R package Rmpfr).

High Performance Computing at UNT

If the use of parallelization in your research project(s) sounds appealing to you, and you don't have access to high performance mutlicore machines (more than four CPUs), then you might take a look at University Information Technology's high performance computing cluster. As stated on their website:  "The cluster systems are intended for computationally intensive LINUX-capable software. Systems may be available for use by any  faculty or currently enrolled qualified student at UNT, with additional requirements to use some resources. Student access must be sponsored by a faculty member."  The UIT's HPC cluster includes R (and R packages) as part of UIT's HPC maintained suite of software

Function  produce.valid.move

# Load package to byte-compile functions
require(compiler)

produce.valid.move<-function(location)
{location.new<-location
axis.to.move<-sample(c("x", "y"), 1)
step.incr<-sample(c(1, -1), 1)
if(axis.to.move=="x") location.new<-location.new + step.incr
if(axis.to.move=="y") location.new<-location.new + step.incr
if(location.new>=1 & location.new<=30 & location.new>=1 & location.new<=30)
return(location.new)
produce.valid.move(location)}

# Byte-compile the function
produce.valid.move<-cmpfun(produce.valid.move)

Function flea.jump

flea.jump<-function(flea.mat.list)
{current.step<-flea.mat.list\$current.step+1
empty.cells.old<-empty.cells
flea.mat.current.prob.old<-flea.mat.list\$flea.mat
move.incr<-t(apply(flea.mat.list\$flea.mat.location, 1, produce.valid.move))
flea.mat.current.freq<-matrix(table(move.incr[,1], move.incr[,2]), nrow=30, ncol=30)
flea.mat.current.prob.temp<-flea.mat.current.freq + 1
flea.mat.current.prob.temp<-flea.mat.current.prob.temp/sum(flea.mat.current.prob.temp)
flea.mat.current.prob<- (flea.mat.current.prob.old) * (1-flea.mat.current.prob.temp)
flea.mat.current.prob<-flea.mat.current.prob/sum(flea.mat.current.prob)
empty.cells<-sum(flea.mat.current.prob[flea.mat.current.freq==0])*900
flea.mat<-flea.mat.current.prob
flea.mat.list<-list(empty.cells=empty.cells,flea.mat=flea.mat.current.prob,
flea.mat.location=move.incr, current.step=current.step)
if(current.step==50)
{
return(flea.mat.list)
}
flea.jump(flea.mat.list)}

# Byte-compile the function
flea.jump<-cmpfun(flea.jump)

Function flea.circus.repetition

flea.circus.repetition<-function(flea.mat.list, number.of.repetitions, current.iteration,
empty.cells.total)
{gc()
cat(current.iteration,"\n")
empty.cells.old <-empty.cells.total
flea.mat<-matrix(1, nrow=30, ncol=30)
flea.mat.location<-which(flea.mat==1, arr.ind=T)
flea.mat.list<-list(empty.cells=0, flea.mat=flea.mat,
flea.mat.location=flea.mat.location, current.step=0)
flea.mat.list<-flea.jump(flea.mat.list)
new.empty.cells.sum<- empty.cells.old + flea.mat.list\$empty.cells
current.iteration<-current.iteration + 1
if(current.iteration > number.of.repetitions)
{new.empty.cells<-new.empty.cells.sum/(current.iteration)
return(new.empty.cells)}
flea.circus.repetition(flea.mat.list=flea.mat.list, number.of.repetitions,
current.iteration=current.iteration, empty.cells.total=new.empty.cells.sum)}

# Byte-compile the function
flea.circus.repetition<-cmpfun(flea.circus.repetition)
# Utilizing a single CPU
gc()
flea.mat<-matrix(1, nrow=30, ncol=30)
flea.mat.location<-which(flea.mat==1, arr.ind=T)
flea.mat.list<-list(empty.cells=0,
flea.mat=flea.mat,
flea.mat.location=flea.mat.location,
current.step=0)

number.of.repetitions<-250

system.time(flea.circus.repetition.out<-flea.circus.repetition(flea.mat.list=flea.mat.list,
number.of.repetitions=number.of.repetitions,
current.iteration=0,
empty.cells.total=0))
flea.circus.repetition.out

Function monte.carlo.rep.func

# For use with parallelization within R

monte.carlo.rep.func<-function(monte.carlo.reps=1, number.of.repetitions=100)
{
number.of.repetitions<-number.of.repetitions
monte.carlo.reps<-monte.carlo.reps
replication.mat<-matrix(0, nrow=monte.carlo.reps, ncol=1)
for(i in 1:monte.carlo.reps)
{
gc()
cat(i,"\n")
flea.mat<-matrix(1, nrow=30, ncol=30)
flea.mat.location<-which(flea.mat==1, arr.ind=T)
flea.mat.list<-list(empty.cells=0, flea.mat=flea.mat, flea.mat.location=flea.mat.location, current.step=0)
system.time(replication.mat[i, ]<-flea.circus.repetition(flea.mat.list=flea.mat.list,
number.of.repetitions=number.of.repetitions,
current.iteration=0,empty.cells.total=0))
cat(replication.mat[i, ], "\n")
}
return(replication.mat)
}

# Byte-compile the function
monte.carlo.rep.func<-cmpfun(monte.carlo.rep.func)

Parallelization of  Functionmonte.carlo.rep.func

require(foreach)
require(parallel)
require(doMC)

# Register the availability of 8 core CPU
registerDoMC(8)

gc()
system.time(flea.list.out<-foreach(i=1:8, .verbose=TRUE) %dopar%
{
monte.carlo.rep.func(monte.carlo.reps=1, number.of.repetitions=200)
})

# Extract contents of list structure and take the average of replications across each list
# element
mean(rapply(flea.list.out, as.numeric))

Example Output

#  0 replications using 1 CPU
#
#
#    user  system elapsed
#   1.604   0.012   1.612       ~ 1.6 sec.
# > flea.circus.repetition.out
#  324.5132
#
#
#
#
# 250 replications using 1 CPU
#
#
#
#    user  system elapsed
# 358.460   0.080 358.034      ~ 6 min.
# > flea.circus.repetition.out
#  330.9852
#
#
#
#
# 2000 replications using 8 CPUs
#
#
#
#     user   system  elapsed
# 4194.288    1.588  381.568   ~ 6.4 min.
# >
# > mean(rapply(flea.list.out, as.numeric))
#  330.6855