By Hong Ooi and Alex Kyllo
This post is to announce the availability of AzureKusto, the R interface to Azure Data Explorer (internally codenamed “Kusto”), a fast, fully managed data analytics service from Microsoft. It is available from CRAN, or you can install the development version from GitHub via devtools::install_github("cloudyr/AzureKusto").
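For reference, the two installation routes mentioned above look like this in an R session. This is a minimal sketch: it assumes you have a CRAN mirror configured and, for the GitHub route, that the devtools package is already installed.

```r
# Release version from CRAN
install.packages("AzureKusto")

# Development version from GitHub (assumes devtools is installed;
# if not, run install.packages("devtools") first)
devtools::install_github("cloudyr/AzureKusto")
```

You only need one of the two; the GitHub version may contain changes that have not yet reached CRAN.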
AzureKusto provides an interface (including DBI-compliant methods) for connecting to Kusto clusters and submitting Kusto Query Language (KQL) statements, as well as a dbplyr-style backend that translates dplyr queries into KQL statements. On the administrator side, it extends the AzureRMR framework to allow for creating clusters and managing database principals.
Connecting to a cluster
To connect to a Data Explorer cluster, call the kusto_database_endpoint() function. Once you are connected, call run_query() to execute queries and command statements.
library(AzureKusto)
## Connect to a Data Explorer cluster with (default) device code authentication
Samples <- kusto_database_endpoint(
    server="https://help.kusto.windows.net",
    database="Samples")
res <- run_query(Samples,
    "StormEvents | summarize EventCount = count() by State | order by State asc")
head(res)
## State EventCount
## 1 ALABAMA 1315
## 2 ALASKA 257
## 3 AMERICAN SAMOA 16
## 4 ARIZONA 340
## 5 ARKANSAS 1028
## 6 ATLANTIC NORTH 188
# run_query can also handle command statements, which begin with a '.' character
res <- run_query(Samples, ".show tables | count")
res[[1]]
## Count
## 1 5
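If you need to vary a value in a query, it is usually better not to paste it into the query string by hand. As far as we understand the package documentation, run_query() accepts extra named arguments and sends them to Kusto as query parameters. The sketch below assumes that behaviour; the parameter name state_name is made up for illustration.

```r
library(AzureKusto)

Samples <- kusto_database_endpoint(
    server="https://help.kusto.windows.net",
    database="Samples")

# Assumption: named arguments after the query text are passed to Kusto
# as query parameters. state_name is a hypothetical parameter name.
res <- run_query(Samples,
    "StormEvents | where State == state_name | summarize EventCount = count()",
    state_name="TEXAS")
res
```

If this does not behave as described in your installed version, check ?run_query for the arguments it actually supports.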
dplyr interface
The package also implements a dplyr-style interface: you build a query on a tbl_kusto object, run it on the remote Kusto database, and retrieve the result as a regular tibble with collect(). All the standard verbs are supported.
library(dplyr)
StormEvents <- tbl_kusto(Samples, "StormEvents")
q <- StormEvents %>%
    group_by(State) %>%
    summarize(EventCount=n()) %>%
    arrange(State)
show_query(q)
## <KQL> database('Samples').['StormEvents']
## | summarize ['EventCount'] = count() by ['State']
## | order by ['State'] asc
collect(q)
## # A tibble: 67 x 2
## State EventCount
## <chr> <dbl>
## 1 ALABAMA 1315
## 2 ALASKA 257
## 3 AMERICAN SAMOA 16
## ...
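As a further illustration of combining verbs, the sketch below adds a filter() step before the aggregation. It assumes the StormEvents table has an EventType column alongside State; adjust the names if your table differs. Nothing is sent to the cluster until collect() is called, so show_query() is a cheap way to inspect the generated KQL first.

```r
library(AzureKusto)
library(dplyr)

Samples <- kusto_database_endpoint(
    server="https://help.kusto.windows.net",
    database="Samples")
StormEvents <- tbl_kusto(Samples, "StormEvents")

# EventType is an assumed column name; "TEXAS" follows the upper-case
# State values shown in the output above
q2 <- StormEvents %>%
    filter(State == "TEXAS") %>%
    group_by(EventType) %>%
    summarize(EventCount=n()) %>%
    arrange(EventType)

show_query(q2)         # inspect the translated KQL without running it
texas <- collect(q2)   # run the query and fetch the result as a tibble
head(texas)
```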
DBI interface
AzureKusto implements a subset of the DBI specification for interfacing with databases in R.
The following methods are supported:
- Connections: dbConnect, dbDisconnect, dbCanConnect
- Table management: dbExistsTable, dbCreateTable, dbRemoveTable, dbReadTable, dbWriteTable
- Querying: dbGetQuery, dbSendQuery, dbFetch, dbSendStatement, dbExecute, dbListFields, dbColumnInfo
It should be noted, though, that Data Explorer is quite different to the SQL databases that DBI targets. This affects the behaviour of certain DBI methods and renders others moot.
library(DBI)
Samples <- dbConnect(AzureKusto(),
    server="https://help.kusto.windows.net",
    database="Samples")
dbListTables(Samples)
## [1] "StormEvents" "demo_make_series1" "demo_series2"
## [4] "demo_series3" "demo_many_series1"
dbExistsTable(Samples, "StormEvents")
## [1] TRUE
dbGetQuery(Samples, "StormEvents | summarize ct = count()")
## ct
## 1 59066
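A typical DBI session using only methods from the supported list above might look like the following sketch. The connection variable con is just an illustrative name, and the query uses the KQL take operator to limit the rows returned.

```r
library(DBI)
library(AzureKusto)

con <- dbConnect(AzureKusto(),
    server="https://help.kusto.windows.net",
    database="Samples")

# List the column names of a table
dbListFields(con, "StormEvents")

# Fetch a handful of rows as a data frame
first_rows <- dbGetQuery(con, "StormEvents | take 5")
str(first_rows)

# Close the connection when finished
dbDisconnect(con)
```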
If you have any questions, comments or other feedback, please feel free to open an issue on the GitHub repo.
And one more thing...
As of Build 2019, Data Explorer can also run R (and Python) scripts in-database. For more information on this feature, currently in public preview, see the Azure blog and the documentation article.