Every row of your data is an insight waiting to be found. That is why it is critical that you can get every row loaded into your data warehouse. When the data is clean, loading it into Azure SQL Data Warehouse is easy using PolyBase. The service is elastic, globally available, and leverages Massively Parallel Processing (MPP). In reality, clean data is a luxury that is not always available. In those cases you need to know which rows failed to load and why.
In Azure SQL Data Warehouse, the Create External Table definition has been extended to include a Rejected_Row_Location parameter. This value is the location in the External Data Source where the Error File(s) and Rejected Row(s) will be written.
CREATE EXTERNAL TABLE [dbo].[Reject_Example]
(
    [Col_one]   TINYINT       NULL,
    [Col_two]   VARCHAR(100)  NULL,
    [Col_three] NUMERIC(2,2)  NULL
)
WITH
(
    DATA_SOURCE = EDS_Reject_Row,
    LOCATION = 'Read_Directory',
    FILE_FORMAT = CSV,
    REJECT_TYPE = VALUE,
    REJECT_VALUE = 100,
    REJECTED_ROW_LOCATION = 'Reject_Directory'
)
What happens when data is loaded?
When a user runs a Create Table as Select (CTAS) on the table above, PolyBase creates a directory on the External Data Source at the Rejected_Row_Location if one doesn’t exist. A child directory is created with the name “_rejectedrows”. The leading “_” character ensures that the directory is ignored by other data processing unless it is explicitly named in the location parameter. Within this directory, a folder is created based on the time of load submission, in the format YearMonthDay-HourMinuteSecond (for example, 20180330-173205). Two types of files are written to this folder: the _reason file and the data file.
The reason files and the data files both carry the queryID associated with the CTAS statement. Because the data and the reason are in separate files, corresponding files have a matching suffix.
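As an illustration, a CTAS load against the external table defined above could look roughly like the sketch below. The target table name [dbo].[Reject_Example_Loaded] and the distribution and index options are assumptions chosen for the example, not part of the original definition. Running a statement like this is what triggers the creation of the “_rejectedrows” directory when rows are rejected.

```sql
-- Minimal sketch: load the external table into an internal table with CTAS.
-- [dbo].[Reject_Example_Loaded] is a hypothetical target table name.
-- ROUND_ROBIN and CLUSTERED COLUMNSTORE INDEX are illustrative choices only.
CREATE TABLE [dbo].[Reject_Example_Loaded]
WITH
(
    DISTRIBUTION = ROUND_ROBIN,
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT
    [Col_one],
    [Col_two],
    [Col_three]
FROM [dbo].[Reject_Example];  -- external table defined earlier
```

With REJECT_TYPE = VALUE and REJECT_VALUE = 100, a load like this is expected to fail once more than 100 rows have been rejected. Rows rejected up to that point are written under Reject_Directory as described above.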
Next Steps
We are excited to offer this new capability to all SQL DW customers. For syntax, take a look at the Create External Table (Transact-SQL) documentation. To get started, download the latest version of SQL Server Management Studio (SSMS).