Quantcast
Channel: Category Name
Viewing all articles
Browse latest Browse all 5971

Introduction to AzureKusto

$
0
0

By Hong Ooi and Alex Kyllo

This post is to announce the availability of AzureKusto, the R interface to Azure Data Explorer (internally codenamed “Kusto”), a fast, fully managed data analytics service from Microsoft. It is available from CRAN, or you can install the development version from GitHub via devtools::install_github("cloudyr/AzureKusto").

AzureKusto provides an interface (including DBI compliant methods for connecting to Kusto clusters and submitting Kusto Query Language (KQL) statements, as well as a dbplyr style backend that translates dplyr queries into KQL statements. On the administrator side, it extends the AzureRMR framework to allow for creating clusters and managing database principals.

Connecting to a cluster

To connect to a Data Explorer cluster, call the kusto_database_endpoint() function. Once you are connected, call run_query() to execute queries and command statements.

library(AzureKusto)

## Connect to a Data Explorer cluster with (default) device code authentication
Samples <- kusto_database_endpoint(
server="https://help.kusto.windows.net",
database="Samples") res <- run_query(Samples,
"StormEvents | summarize EventCount = count() by State | order by State asc") head(res) ## State EventCount ## 1 ALABAMA 1315 ## 2 ALASKA 257 ## 3 AMERICAN SAMOA 16 ## 4 ARIZONA 340 ## 5 ARKANSAS 1028 ## 6 ATLANTIC NORTH 188 # run_query can also handle command statements, which begin with a '.' character res <- run_query(Samples, ".show tables | count") res[[1]] ## Count ## 1 5

dplyr Interface

The package also implements a dplyr-style interface for building a query upon a tbl_kusto object and then running it on the remote Kusto database and returning the result as a regular tibble object with collect(). All the standard verbs are supported.

library(dplyr)
StormEvents <- tbl_kusto(Samples, "StormEvents")
q <- StormEvents %>%
    group_by(State) %>%
    summarize(EventCount=n()) %>%
    arrange(State)
show_query(q) ## <KQL> database('Samples').['StormEvents'] ## | summarize ['EventCount'] = count() by ['State'] ## | order by ['State'] asc
collect
(q) ## # A tibble: 67 x 2 ## State EventCount ## <chr> <dbl> ## 1 ALABAMA 1315 ## 2 ALASKA 257 ## 3 AMERICAN SAMOA 16 ## ...

DBI interface

AzureKusto implements a subset of the DBI specification for interfacing with databases in R.

The following methods are supported:

  • Connections: dbConnect, dbDisconnect, dbCanConnect
  • Table management: dbExistsTable, dbCreateTable, dbRemoveTable, dbReadTable, dbWriteTable
  • Querying: dbGetQuery, dbSendQuery, dbFetch, dbSendStatement, dbExecute, dbListFields, dbColumnInfo

It should be noted, though, that Data Explorer is quite different to the SQL databases that DBI targets. This affects the behaviour of certain DBI methods and renders other moot.

library(DBI)

Samples <- dbConnect(AzureKusto(),
                     server="https://help.kusto.windows.net",
                     database="Samples")

dbListTables(Samples)
## [1] "StormEvents"       "demo_make_series1" "demo_series2"     
## [4] "demo_series3"      "demo_many_series1"

dbExistsTable(Samples, "StormEvents")
##[1] TRUE

dbGetQuery(Samples, "StormEvents | summarize ct = count()")
##      ct
## 1 59066

If you have any questions, comments or other feedback, please feel free to open an issue on the GitHub repo.

And one more thing...

As of Build 2019, Data Explorer can also run R (and Python) scripts in-database. For more information on this feature, currently in public preview, see the Azure blog and the documentation article.

 

 


Viewing all articles
Browse latest Browse all 5971

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>