Parallel Computing

Parallel computing is the practice of dividing a large computational task into smaller parts that can be executed simultaneously – that is, in parallel – rather than one after another. The goal is to increase processing speed, handle larger problems (more tasks and more data), and use resources more efficiently.

In traditional (or serial) computing, a program runs on a single processor core and executes one instruction at a time. As problems get larger or more complex (e.g., simulating electoral outcomes, training machine learning models, or otherwise processing massive datasets), this approach becomes too slow. Parallel computing solves this by spreading the work across multiple cores, processors, or even machines, allowing many operations to occur at once.


Payroll at the University of Florida (Example)

Imagine you are responsible for discharging payroll to all employees (faculty, staff, etc.) at the University of Florida. This is approximately 33,000 people – all of whom we can assume are registered for direct deposit and would very much appreciate being paid correctly and in a timely manner. For the sake of the example, let’s further assume everything with respect to payroll can be automated. You are responsible for designing a programming routine that actually discharges payroll, and each individual payment (task) requires 0.25 seconds. Using a single system, we can easily compute the completion time as:

33,000 (Tasks) × 0.25 (Seconds/Task) = 8,250 Seconds (2.29 Hours).

Clearly not very efficient – but it’s the result of practical limitations. Traditional computers are designed to execute instructions one after another in a sequence, which can make completing large volumes of tasks a burdensome and time-consuming endeavor.

But what if we have two computers? The time to complete an independent task remains fixed (0.25 seconds/each) due to the serial constraints of the computing process, but what if we divided and allocated the tasks equally to each computer? Suddenly, the completion time has been cut in half:

[33,000 (Tasks) × 0.25 (Seconds/Task)] / 2 (Computers) = 4,125 Seconds (1.15 Hours).

Adding more computers – say, 3 (about 46 minutes), 4 (about 34 minutes), or 5 (about 27.5 minutes) – continues to reduce the computation time. What was once a 2.3 hour task is suddenly about 30 minutes without changing the serial structure of the computing process – we just added more computers to handle the workload!
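The arithmetic above can be reproduced in a few lines of R. This is a minimal sketch that assumes the tasks divide evenly and that splitting them carries no overhead; the object names (n_tasks, secs_per_task, computers, total_secs) are illustrative and not part of any package.

```r
# Payroll example: completion time when the work is split evenly
n_tasks       <- 33000   # Employees to Pay
secs_per_task <- 0.25    # Seconds Required per Payment
computers     <- 1:5     # Number of Computers Sharing the Work

# Each computer still works serially, but only on its own share of the tasks
total_secs <- n_tasks * secs_per_task / computers

data.frame(
  computers = computers,
  seconds   = total_secs,
  minutes   = total_secs / 60
)
```

In practice the gains are rarely this clean, because dividing the work and collecting the results takes some time of its own.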

This concept is at the heart of parallel computing (generally) and high performance computing environments like HiPerGator. Barring quantum computing (which, please… don’t ask me about), a single processor core can only work through its tasks serially, but dividing the tasks across multiple computing units simultaneously can drastically improve computational efficiency. Even cooler, virtually every modern laptop on the consumer market can support a parallel environment – you just need to tell your computer (or R) to do it!

Below, I provide some key terms for understanding parallel computing, as well as how to set up and deploy a parallel environment in R.


Key Terms

Setting Up a Parallel Environment in R

An easy way to understand what’s happening under the hood when you’re using parallel processing in R is that for each core you initialize (register), R launches a background R session (a worker) that independently handles a portion of the larger workload.

There are a handful of different ways to set up a parallel environment in R, and I have been fairly inconsistent in preferring one suite of packages over another. However, for the sake of this course, I will show you how to do it primarily using the parallel package (alongside doParallel).

suppressPackageStartupMessages({
  library(parallel)
  library(doParallel)
}) # Load Parallel (Quietly)

Recall that the CPU included in most modern computers contains several cores, each of which allows for independent computation. Using detectCores(), we can recover the number you have available to you. However, a good rule of thumb – depending on how much computing power you still need while these processes are running – is to avoid maxing out your core allocation for parallel tasks. For instance, if you still want to be able to run R in a different project (or maybe even just scroll on your computer…), allocating more of your computing resources to the parallel job reduces the resources that can be used for other things.

Leaving at least one core available for other tasks is always a good idea – as you will see below, I generally (at minimum) indicate the number of cores to use as (numcores - 1).

numcores <- parallel::detectCores() 
message('You Have ', numcores, ' Cores Available for Use!')
## You Have 14 Cores Available for Use!
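The (numcores - 1) rule can be made a little more defensive. The sketch below assumes you want at least one worker even on a single-core machine, and it guards against detectCores() returning NA, which its documentation says can happen when the count is unknown. The name n_workers is illustrative.

```r
library(parallel)

numcores <- detectCores()             # Cores Reported by the System (May Be NA)
if (is.na(numcores)) numcores <- 1L   # Fall Back to One Core If Unknown

# Leave one core free for everything else, but never drop below one worker
n_workers <- max(1L, numcores - 1L)

message("Using ", n_workers, " of ", numcores, " Cores for Parallel Work")
```

You could then pass n_workers to makeCluster() in place of (numcores - 1).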

Next, we will use makeCluster() to group the cores we want to use into a single socket cluster. Note: makeCluster() accepts several additional arguments, virtually all of which can be left at their defaults.

cl <- makeCluster(numcores - 1) # Create a Socket Cluster
doParallel::registerDoParallel(cl) # Register the Parallel Environment (cl)
stopCluster(cl) # Shut Down the Cluster When Finished
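To see that each worker really is its own background R session, you can ask every worker for its process ID and compare those IDs with the main session's. This is a small sketch that uses two workers (an arbitrary choice for illustration); worker_pids and main_pid are illustrative names.

```r
library(parallel)

cl <- makeCluster(2) # Two Workers, Chosen Only for Illustration

# clusterEvalQ() evaluates the expression on every worker and returns one result per worker
worker_pids <- unlist(clusterEvalQ(cl, Sys.getpid()))
main_pid    <- Sys.getpid()

message("Main Session PID: ", main_pid)
message("Worker PIDs: ", paste(worker_pids, collapse = ", "))

stopCluster(cl) # Shut Down the Workers
```

The worker IDs should differ from each other and from the main session's, which is why packages and objects from your main session are not automatically available on the workers.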

Congrats – You’ve created a parallel environment with several cores! Now, we need to actually run something. Let’s create a function that does something simple but takes a moment to run: taking 10 million draws from a standard normal distribution and calculating the mean of their squared values.

We will use lapply() to apply the function serially across 10 repetitions, then do the same using parLapply() (the parallel version) and compare the completion times.

tedious_function <- function(x) {
  y <- rnorm(10000000) # Simulate 10 Million Draws
  mean(y^2) # Recover Mean of Squared Values
} 

serial <- system.time({
  serial_run <- lapply(1:10, tedious_function)
}) # Serial Run


cl <- makeCluster(numcores - 1) # Create a Socket Cluster  
doParallel::registerDoParallel(cl) # Register the Parallel Environment (cl)

parallel <- system.time({
  parallel_run <- parLapply(cl, 1:10, tedious_function)
}) # Parallel Run

stopCluster(cl) # Shut Down Parallel
## Serial Completion Time: 5.32 Seconds
## Parallel Completion Time: 0.76 Seconds
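The completion times can be read from the "elapsed" entry of the objects that system.time() returns. The sketch below is a scaled-down version of the comparison, with one million draws, four repetitions, and two workers, all chosen only so that it runs quickly. The names small_task, t_serial, t_parallel, and speedup are illustrative.

```r
library(parallel)

# Lighter Version of the Task: 1 Million Draws Instead of 10 Million
small_task <- function(x) {
  y <- rnorm(1000000)
  mean(y^2)
}

t_serial <- system.time({
  res_serial <- lapply(1:4, small_task)
}) # Serial Run

cl <- makeCluster(2) # Two Workers, Chosen Only for Illustration
t_parallel <- system.time({
  res_parallel <- parLapply(cl, 1:4, small_task)
}) # Parallel Run
stopCluster(cl) # Shut Down Parallel

# "elapsed" Is Wall-Clock Time in Seconds
message("Serial Completion Time: ", round(t_serial[["elapsed"]], 2), " Seconds")
message("Parallel Completion Time: ", round(t_parallel[["elapsed"]], 2), " Seconds")

speedup <- t_serial[["elapsed"]] / t_parallel[["elapsed"]] # Above 1 Means Parallel Was Faster
```

With a task this small, the cost of sending work to the workers can cancel out much of the gain, so the speedup may be modest or even below 1 on some machines.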

The routine executed using a parallel approach is clearly faster: roughly seven times faster (0.76 versus 5.32 seconds), though the absolute savings are only a few seconds here. It’s important to recognize that the most obvious benefits of parallel computing emerge when you need to engage with tasks of considerable scale – e.g., the payroll example above.

A Few Important Things to Remember

  1. Whenever you are done using a local parallel environment, be sure to use stopCluster() to fully and cleanly shut down the background R worker processes you’ve created. If you leave a cluster running, those processes continue to consume memory and system resources, which can slow down your computer or cause errors in later parallel tasks.

  2. It’s important to view each node you’ve created with makeCluster() as its own R session, so each one is going to need the necessary packages or libraries, as well as any objects (e.g., data, functions, etc.) from your global environment that the task relies on.

You can achieve that using a combination of clusterEvalQ() for loading packages on each node and clusterExport() for copying objects and functions from the global environment. Below are some examples from a project I have been working on to simulate turnover in the American lower federal courts – every worker I create needs to have its own copy of those items.

clusterExport(cl, c(
  "predict_survival", "new_judge", "election_simulation",
  "senate_moratorium", "senate_rejection", "single_simulation_analysis",
  "electoral_outcomes_02_24", "departure_simulation_comparison",
  "d1", "c1", "fjc_combined_survival", "allocated_seats",
  "nominate", "senate_rejection_data"
)) # Allocate Global Environment Objects & Functions to Nodes

clusterEvalQ(cl, {
  library(dplyr)
  library(survival)
  library(stringr)
}) # Allocate Necessary Packages
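Both reminders can be seen together in a small, self-contained sketch. It assumes a toy object (scale_factor) and a toy helper (rescale), both hypothetical and unrelated to the project above. It uses the stats package as a stand-in for whatever packages your own functions need. Wrapping the work in tryCatch(..., finally = ) is one way, not the only way, to make sure stopCluster() runs even if the parallel task throws an error.

```r
library(parallel)

scale_factor <- 10                       # Hypothetical Object the Workers Need
rescale <- function(x) x * scale_factor  # Hypothetical Helper the Workers Need

cl <- makeCluster(2) # Two Workers, Chosen Only for Illustration

result <- tryCatch({
  # Copy Objects & Functions to Every Worker
  # (environment() Is the Global Environment When Run at the Top Level)
  clusterExport(cl, c("scale_factor", "rescale"), envir = environment())

  # Load Packages on Every Worker (stats Is Only a Stand-In Here)
  clusterEvalQ(cl, library(stats))

  # Each Worker Calls the Exported Helper on Its Share of the Inputs
  parLapply(cl, 1:4, function(v) rescale(v))
}, finally = stopCluster(cl)) # Always Shut Down the Workers

unlist(result)
```

If you drop the clusterExport() call, the workers should fail with an error that rescale could not be found, because it exists only in your main session.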