In recent years, the Microsoft SQL and Tableau engineering teams have been working closely together to provide a superior user experience across the two platforms. Today we are sharing some advice on how to optimize the connectivity between Azure SQL DB and Tableau.
Our teams previously partnered for the SQL Server 2016 launch and the Azure SQL Data Warehouse launch. Today, this partnership builds on the fact that SQL Server is Tableau’s most common data source in combined cloud and on-premises usage, as detailed in the recent Tableau Cloud Data Brief.
Our engineering benchmarks, and several global customer engagements, led us to take a closer look at optimal connectivity and how to leverage the strengths of both platforms.
Without further ado, here are the main learnings.
Out-of-the-box experience works well
We observed that most customers fared well by simply replicating their on-premises approach. Azure SQL DB uses the same drivers as SQL Server 2016, which inherently reduces complexity. In the Tableau Desktop UI, a single SQL Server connector covers Azure SQL DB, Azure SQL Data Warehouse, and SQL Server 2016, whether running on-premises or in a public cloud like Azure.
Tableau Live Querying provides the best performance
Network bandwidth permitting, Tableau’s Live Query mode lets the heavy lifting occur in Azure SQL DB, while also providing more up-to-date information to Tableau users than extract-based connectivity. This implies doing some sizing and performance testing with different Azure SQL DB SKUs. In our experience, Azure SQL DB latency and throughput can meet the most stringent Tableau requirements.
For example, we advised a joint customer to move from S0 (10 DTUs) to P1 Premium (125 DTUs), which immediately resolved their latency issues. The cost impact is commonly offset by the improved user experience and increased customer satisfaction.
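If you manage tiers through T-SQL rather than the Azure portal, a service objective change like this is a one-line statement. A minimal sketch, where the database name is purely illustrative:
-- Hypothetical example: move a database from Standard S0 to Premium P1.
ALTER DATABASE [SalesDB] MODIFY (EDITION = 'Premium', SERVICE_OBJECTIVE = 'P1');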
Other Tableau best practices
- Isolate date calculations: As much as possible, pre-compute date logic. Tableau will compute it once, and the database may be able to use an index.
- Use Boolean fields: Don’t use 0 and 1 as indicators for true and false; just use Boolean fields. They are generally faster.
- Don’t change case: Don’t put UPPER or LOWER in comparisons when you already know the case of the values.
- Use aliases: Where possible, label text using Tableau’s alias feature rather than in a calculation. Aliases aren’t sent to the database, so they tend to be faster.
- Use formatting when possible: Don’t use string functions when you can just use formatting. Then use aliases to label the fields.
- Replace IF / ELSEIF with CASE: CASE statements are generally faster (see the sketch after this list).
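To make a few of these concrete, here is the flavor of T-SQL these choices lead to; the table and column names below are purely illustrative:
-- A CASE expression is evaluated once per row, unlike chained IF / ELSEIF logic.
SELECT [OrderID],
       CASE [Status]
           WHEN 'S' THEN 'Shipped'
           WHEN 'P' THEN 'Pending'
           ELSE 'Other'
       END AS [StatusLabel]
FROM [dbo].[Orders]
WHERE [IsPriority] = 1      -- a native bit column instead of 0/1 integer flags
  AND [Region] = 'West';    -- case of the values is known, so no UPPER()/LOWER() wrapper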
Using a query tuning methodology
We used the following methodology to analyze Tableau queries and identify and address bottlenecks:
- Enable Query Store (see the sketch after this list)
- Run the provided workloads
- Monitor DTU consumption using dynamic management views to ensure that tier limits are not being reached
- Check for index recommendations and usage
- Prioritize statements by highest total execution time
- Examine the top queries and their associated execution plans
- Apply suggestions
- … iterate
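A minimal sketch of the first and fifth steps (note that Query Store is already on by default for newer Azure SQL DB databases):
-- Enable Query Store for the current database.
ALTER DATABASE CURRENT SET QUERY_STORE = ON;

-- Rank captured statements by total execution time.
SELECT TOP (10) [qt].[query_sql_text],
       SUM([rs].[count_executions]) AS [executions],
       SUM([rs].[avg_duration] * [rs].[count_executions]) AS [total_duration_us]
FROM [sys].[query_store_query_text] AS [qt]
INNER JOIN [sys].[query_store_query] AS [q]
    ON [qt].[query_text_id] = [q].[query_text_id]
INNER JOIN [sys].[query_store_plan] AS [p]
    ON [q].[query_id] = [p].[query_id]
INNER JOIN [sys].[query_store_runtime_stats] AS [rs]
    ON [p].[plan_id] = [rs].[plan_id]
GROUP BY [qt].[query_sql_text]
ORDER BY [total_duration_us] DESC;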
Key tools for Azure SQL DB Optimization
For Azure SQL Database customers in general, we recommend the following:
Checking service level constraints
To determine if you are hitting DTU limits for a workload, take a look at the following query:
SELECT [end_time], [avg_cpu_percent], [avg_data_io_percent],
[avg_log_write_percent], [avg_memory_usage_percent]
FROM [sys].[dm_db_resource_stats];
This returns one row for every 15 seconds over the last hour. We used it while testing the provided workloads to determine whether we needed to bump up to the next tier. For a less granular view of this data, we used the sys.resource_stats catalog view in the master database.
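For reference, a minimal sketch of that less granular check; sys.resource_stats keeps roughly 14 days of history at five-minute intervals, and the database name below is illustrative:
-- Run against the master database; one row per five-minute interval.
SELECT [start_time], [end_time], [avg_cpu_percent],
       [avg_data_io_percent], [avg_log_write_percent]
FROM [sys].[resource_stats]
WHERE [database_name] = 'YourDatabaseName'
ORDER BY [start_time] DESC;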
Monitoring index recommendations and usage
Periodically check missing index recommendations
For any Tableau customer, given the diverse workload characteristics, it is a good idea to periodically check the missing index recommendations. We don’t recommend blindly adding every recommendation, but we do like to periodically assess the cost/benefit of specific recommendations over time.
SELECT [migs].[group_handle], [migs].[unique_compiles], [migs].[user_seeks],
[migs].[user_scans], [migs].[last_user_seek], [migs].[last_user_scan],
[migs].[avg_total_user_cost], [migs].[avg_user_impact],
[migs].[system_seeks], [migs].[system_scans],
[migs].[last_system_seek], [migs].[last_system_scan],
[migs].[avg_total_system_cost], [migs].[avg_system_impact],
[mig].[index_group_handle], [mig].[index_handle], [mid].[index_handle],
[mid].[database_id], [mid].[object_id], [mid].[equality_columns],
[mid].[inequality_columns], [mid].[included_columns],
[mid].[statement]
FROM [sys].[dm_db_missing_index_group_stats] AS [migs]
INNER JOIN [sys].[dm_db_missing_index_groups] AS [mig]
ON ( [migs].[group_handle] = [mig].[index_group_handle] )
INNER JOIN [sys].[dm_db_missing_index_details] AS [mid]
ON ( [mig].[index_handle] = [mid].[index_handle] );
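When a recommendation does look worthwhile, the equality_columns, inequality_columns, and included_columns values map directly onto a CREATE INDEX statement. A hypothetical example, with illustrative table, column, and index names:
-- Equality columns come first, then inequality columns; included columns go in INCLUDE.
CREATE NONCLUSTERED INDEX [IX_Orders_CustomerID_OrderDate]
ON [dbo].[Orders] ([CustomerID], [OrderDate])
INCLUDE ([Amount]);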
Validate index usage over time
Conversely, we recommend making sure that indexes are pulling their weight over the long term. An index that was useful once may stop earning its keep, so we recommend checking index usage via the following dynamic management view query.
SELECT OBJECT_NAME([s].[object_id]) AS [Table Name],
[i].[name] AS [Index Name], [s].[user_seeks], [s].[user_scans],
[s].[user_lookups], [s].[user_updates], [s].[last_user_seek],
[s].[last_user_scan], [s].[last_user_lookup], [s].[last_user_update],
[s].[system_seeks], [s].[system_scans], [s].[system_lookups],
[s].[system_updates], [s].[last_system_seek], [s].[last_system_scan],
[s].[last_system_lookup], [s].[last_system_update]
FROM [sys].[dm_db_index_usage_stats] AS [s]
INNER JOIN [sys].[indexes] AS [i]
ON [s].[object_id] = [i].[object_id]
AND [i].[index_id] = [s].[index_id]
INNER JOIN [sys].[objects] AS [o] WITH ( NOLOCK )
ON [i].[object_id] = [o].[object_id]
WHERE OBJECTPROPERTY([s].[object_id], 'IsUserTable') = 1
ORDER BY [s].[user_updates] DESC;
Best practices
- Monitor over time, and do not drop indexes until all representative workloads have been run over the testing period.
- Be cautious about dropping indexes that are used to enforce uniqueness. Such indexes may not be used for traversal, but may still be necessary for cardinality estimation and constraint enforcement.
- Monitoring index usage is most valuable when you are uncertain whether an index will be helpful and used once created. You can add the index, run the workload, and then check sys.dm_db_index_usage_stats. You can check the plans too, but for larger workloads, checking the DMV is faster (see the sketch after this list).
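Building on these bullets, here is a rough heuristic sketch for spotting drop candidates: indexes that have accumulated writes but no reads since the last restart. The is_unique filter reflects the caution above; treat the output as candidates to investigate, not a drop list.
-- Indexes with writes but no reads since the last restart.
SELECT OBJECT_NAME([i].[object_id]) AS [Table Name],
       [i].[name] AS [Index Name],
       [s].[user_updates] AS [writes]
FROM [sys].[indexes] AS [i]
INNER JOIN [sys].[dm_db_index_usage_stats] AS [s]
    ON [s].[object_id] = [i].[object_id]
   AND [s].[index_id] = [i].[index_id]
   AND [s].[database_id] = DB_ID()
WHERE [i].[is_unique] = 0  -- keep uniqueness-enforcing indexes out of the list
  AND [s].[user_seeks] + [s].[user_scans] + [s].[user_lookups] = 0
ORDER BY [s].[user_updates] DESC;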
We hope this is useful to you, and we’re curious to read your comments and feedback on how you use Tableau and Azure SQL DB. If you are new to this scenario, Tableau is available with a trial license key, and the Azure free trial lets you explore these benefits against Azure SQL DB. Tableau Server is also available as a ready-to-spin image on the Azure Marketplace.
If you’re looking at more complex deployment scenarios and want to upgrade your Tableau and Azure skills, we recommend taking a look at our Tableau and Cloudera Quickstart Azure template.
You can also follow and connect with the Azure SQL DB team on Twitter.
Acknowledgments
This article is a collaboration between several people. Special thanks to Dan Cory (Tableau), Nicolas Caudron (Microsoft) and Gil Isaacs (Microsoft).