by Le Zhang (Data Scientist, Microsoft) and Graham Williams (Director of Data Science, Microsoft)
As an in-memory application, R is sometimes thought to be constrained in performance or scalability for enterprise-grade applications. But by deploying R in a high-performance cloud environment, and by leveraging the scale of parallel architectures and dedicated big-data technologies, you can build applications using R that provide the necessary computational efficiency, scale, and cost-effectiveness.
We identify four application areas, along with the associated tools and Azure services, that you can use to deploy R in enterprise applications. Together they cover the tasks required to prototype, build, and operationalize an enterprise-level data science and AI solution. Each area has R packages and tools designed specifically to accelerate the development of analytics.
Below is a brief introduction to each.
Cloud resource management and operation
Cloud computing instances and services can be controlled from within an R session, which enables programmatic control and operationalization of R-based analytical pipelines. R packages and tools in this category offer a simplified way to interact with the Azure cloud platform and to operate Azure resources (e.g., blob storage, Data Science Virtual Machine, Azure Batch Service) for various tasks.
- AzureSMR - R package for managing a selection of Azure resources. Targeted at data scientists who need to control Azure resources directly from R functions. APIs include Storage Blobs, HDInsight (Nodes, Hive, Spark), Resource Manager, and Virtual Machines.
- AzureDSVM - R package that offers a convenient harness for the Azure Data Science Virtual Machine (DSVM), remote execution of scalable and elastic data science work, and monitoring of on-demand resource consumption.
- doAzureParallel - R package that allows users to submit parallel workloads in Azure.
- rAzureBatch - run R code in parallel across a cluster in Azure Batch.
- AzureML - an R interface to AzureML experiments, datasets, and web services.
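As an illustration of the parallel-workload pattern that doAzureParallel builds on, here is a minimal sketch using the foreach package. It registers a local sequential backend so that it runs without an Azure subscription. The commented-out doAzureParallel calls follow that package's README as we understand it and should be treated as assumptions; `credentials.json`, `cluster.json`, and the `estimate_pi` workload are hypothetical.

```r
library(foreach)

# With doAzureParallel, the backend would instead be an Azure Batch pool.
# Assumed calls (verify against the package documentation before use):
#   doAzureParallel::setCredentials("credentials.json")      # hypothetical file
#   cluster <- doAzureParallel::makeCluster("cluster.json")  # hypothetical file
#   doAzureParallel::registerDoAzureParallel(cluster)
# A sequential local backend stands in here so the sketch runs anywhere.
registerDoSEQ()

# Hypothetical workload: one Monte Carlo estimate of pi per task.
estimate_pi <- function(n, seed) {
  set.seed(seed)
  x <- runif(n)
  y <- runif(n)
  4 * mean(x^2 + y^2 <= 1)  # fraction of points inside the quarter circle
}

# Each iteration is an independent task; .combine = c collects a numeric vector.
estimates <- foreach(seed = 1:8, .combine = c) %dopar% estimate_pi(10000, seed)

pi_hat <- mean(estimates)
print(pi_hat)
```

The loop body does not change when the backend does, which is the main appeal of this design: the same `foreach(...) %dopar%` expression can be developed locally and then pointed at a cloud pool.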
Remote interaction and access to cloud resources
Data scientists can seamlessly log in to and out of an R session in the cloud for experimentation and exploratory study. The R packages and tools in this category help data scientists and developers remotely access or interact with Azure cloud instances and services for convenient development.
- mrsdeploy - an R package that provides functions for establishing a remote session in a console application and for publishing and managing a web service that is backed by the R code block or script you provide.
- R Tools for Visual Studio - IDE with R support.
- RStudio Server - IDE for remote R session with access via Internet browser.
- JupyterHub - Jupyter notebook with multi-user access.
- IRkernel - R kernel for Jupyter notebook.
Scalable and advanced analytics
Scalable analytics and advanced machine learning (including deep learning) models can be built in R on cloud services, accelerated by application-specific hardware such as GPUs. R packages and tools in this category support large-scale R-based analytics on Azure with modern frameworks such as Spark, Hadoop, Microsoft Cognitive Toolkit, TensorFlow, and Keras. Many of these tools come pre-installed and configured for direct use on the Azure Data Science Virtual Machine.
Scalable analytics
- dplyrXdf - a dplyr backend for the XDF data format used in Microsoft ML Server.
- sparklyr - R interface for Apache Spark.
- SparkR - an R package that provides a light-weight frontend to use Apache Spark from R.
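Both dplyrXdf and sparklyr expose a dplyr interface, so a pipeline written against a local data frame can in principle be reused against larger back ends. The sketch below runs on the built-in `mtcars` data frame. The commented-out sparklyr lines show how the same verbs would be pointed at a Spark table, assuming a local Spark installation; the table name `cars` is hypothetical.

```r
library(dplyr)

# Local stand-in for a remote table. With sparklyr one would instead write
# (assuming Spark is installed and reachable):
#   sc   <- sparklyr::spark_connect(master = "local")
#   cars <- dplyr::copy_to(sc, mtcars, "cars")   # "cars" is a hypothetical name
cars <- mtcars

summary_tbl <- cars %>%
  group_by(cyl) %>%                        # one group per cylinder count
  summarise(n = n(),
            mean_mpg = mean(mpg, na.rm = TRUE)) %>%
  arrange(cyl) %>%
  collect()                                # no-op locally; pulls results from Spark

print(summary_tbl)
```

With a Spark back end, the verbs before `collect()` are translated and evaluated remotely, so only the small summary table is brought back into the R session.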
Deep learning
- CNTK-R - R bindings to the Cognitive Toolkit (CNTK) deep learning library.
- tensorflow - R interface to TensorFlow.
- mxnet - R interface to MXNet, bringing flexible and efficient GPU computing and state-of-the-art deep learning to R.
- keras - R interface to Keras.
- darch - create deep architectures in R.
- deepnet - implements several deep learning architectures and neural network algorithms, including BP, RBM, DBN, and deep autoencoders, among others.
- gpuR - R interface to use GPUs.
Integrations
- RevoScaleR - a collection of portable, scalable, and distributable R functions for importing, transforming, and analyzing data at scale, included with Microsoft ML Server.
- MicrosoftML - a package that provides state-of-the-art fast, scalable machine learning algorithms and transforms for R.
- h2o - R interface to H2O.
Application and service deployment
R-based applications can be easily deployed as services for end users or developers. The R packages and tools in this category are used to deploy R-based analytics or applications as services or interfaces that end users or developers can conveniently consume.
- mrsdeploy - an R package included with Microsoft ML Server that provides functions for deploying easily consumable services from within an R session.
- AzureML - an R package for interacting with Azure Machine Learning Studio to publish R functions as API services.
- Azure Container Instances - service to allow running containerized R analytics in Azure.
- Azure Container Service - service that simplifies deployment, management, and operation of orchestrated containers of R analytics in Azure.
- Shiny Server - develop and publish Shiny-based web applications online.
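To give a sense of what a deployable R application looks like, here is a minimal sketch of a Shiny app. The UI layout, the input and output names, and the simulated data are all hypothetical. The app object is only constructed, not launched, so the code can be run non-interactively.

```r
library(shiny)

# UI: a slider controlling sample size and a placeholder for the plot.
ui <- fluidPage(
  titlePanel("Histogram of simulated data"),
  sliderInput("n", "Number of observations", min = 10, max = 1000, value = 100),
  plotOutput("hist")
)

# Server: redraws the histogram whenever the slider value changes.
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = NULL, xlab = "Value")
  })
}

# Build the app object. Calling runApp(app) would serve it locally;
# on Shiny Server the same ui/server pair is typically saved as app.R.
app <- shinyApp(ui = ui, server = server)
```

Because the application is an ordinary R script, it can be placed on a cloud VM running Shiny Server or packaged into a container for the container services listed above.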
For more information
Companies around the world are using R to build enterprise-grade applications on Azure. In-depth examples (with code and architecture) are available as a selection of R-based solutions for real-world use cases. A more detailed list of packages and tools for deploying R in Azure is provided at the link below, and will be updated as new tools become available.
GitHub (yueguoguo): R in Azure