Run massive parallel R jobs cheaply with updated doAzureParallel package

At the EARL conference in San Francisco this week, JS Tan from Microsoft gave an update (PDF slides here) on the doAzureParallel package . As we've noted here before, this package allows you to easily distribute parallel R computations to an Azure cluster. The package was recently updated to support using automatically-scaling Azure Batch clusters with low-priority nodes, which can be used at a discount of up to 80% compared to the price of regular high-availability VMs.

JS Tan using doAzureParallel #rstats package to run simulation on a cluster of 20 low-priority Azure VMs. Total cost: $0.02 #EARLConf2017 pic.twitter.com/Mpl3IUa9zY
— David Smith (@revodavid) June 7, 2017

Using the doAzureParallel package is simple. First, you need to define the cluster you're going to use as a JSON file. (You can see an example on the right.) Here, you'll specify your Azure credentials, the size of the cluster, and the type of nodes (CPUs and memory) to use in the cluster. You can also specify here R packages (from CRAN and/or Github) to be pre-loaded onto each node, and the maximum number of simultaneous tasks to run on each node (for within-node parallelism).

New to this update, the poolSize option allows you to specify the number of dedicated (standard) VM nodes to use, in addition to a number of low-priority nodes to use. Low-priority nodes can be pre-empted by the Azure system at any time, but are much cheaper to use. (Even if a node is pre-empted your parallel computation will be continue; it will just take a little longer with the reduced capacity.) You can even specify a minimum and maximum number of nodes of each class to use, in which case the cluster will automatically scale up and down according to either (your choice) the workload or the time of day (e.g. only expand the low-priority part of the cluster on weekends, when pre-emption is less likely).

Once you've defined the parameters of your cluster, all you need to do is declare the cluster as a backend for the foreach package. The body of the foreach loop runs just like a for loop, except that multiple iterations run in parallel on the remote cluster. Here are the key parts of the option price simulation example JS presented at the conference.

This same approach can be used for any "embarrassingly parallel" iteration in R, and you can use any R function or package within the body of the loop. For example, you could use a cluster to reduce the time required for parameter tuning and cross-validation with the caret package, or speed up data preparation tasks when using the dplyr package.

In addition to support for auto-scaling clusters, this update to doAzureParallel also includes a few other new features. You'll also find new utility functions for managing multiple long-running R jobs, functions to read data from and write data to Azure Blob storage, and the ability to pre-load data into the cluster by specifying resource files.

The doAzureParallel package is available for download now from Github, under the open-source MIT license. For details on how to use the package, check out the README and the doAzureParallel guide.

Github (Azure): doAzureParallel

Run massive parallel R jobs cheaply with updated doAzureParallel package

Trending Articles

RAMAYAMPET Mandal Sarpanch | Upa-Sarpanch | Ward member Mobile Numbers Medak...

लड़कियां सेक्स के दौरान क्यों करती है उह! आह!लड़कियां सेक्स के दौरान क्यों करती...

Neem Baba Extra Questions Answer Class 6 English Poorvi

Throw Back: 4×4 — Sikilitele (Ft Castro) Prod by JQ

Rajasthan Board 10th Result 2016 Roll No wise & Name Wise

Lowe faces four theft charges

Practice Sheet of Right form of verbs for HSC Students

Mafia, Murder & Mayhem In The Motor City: Detroit Mob Hit Timeline (1937-2007)

The 10 Tennessee Cities With The Largest Black Population For 2021

Materials Around Us Class 6 Worksheet Science Chapter 6

デスクトップヒープの枯渇

Best Suvichar in Hindi |बेस्ट सुविचार |शुभ विचार हिंदी में

Kanulanu Thaake Lyrics and translation | Manam (2014)

Korean Sex Porn Videos: XXX Videos & Free Porn Movies

Teen Shot In Miami Drive-By Dies From Injuries

Download: IQ Muzatasha feat Shy D & Pmj – Ulesi NiFertilizer Yamavuto

Mahakal Attitude Status

Property developer set up cannabis factory to help pay off debts...

♡

KB: How to troubleshoot issues when adding a Hyper-V host in System Center...