Customers love to use Hadoop and often rely on Oozie, a workflow and coordination scheduler for Hadoop to accelerate and ease their big data implementation. Oozie is integrated with the Hadoop stack, and it supports several types of Hadoop jobs. However, for users of Azure HDInsight with domain joined clusters, Oozie was not a supported option. To get around this limitation customers had to run Oozie on a regular cluster. This was costly with extra administrative overhead. Today we are happy to announce that customers can now use Oozie in domain-joined Hadoop clusters too.
In domain-joined clusters, authentication happens through Kerberos and fine-grained authorization is through Ranger policies. Oozie supports impersonation of users and a basic authorization model for workflow jobs.
Moreover, Hive server 2 actions submitted as part of an Oozie workflow get logged and are auditable through Ranger too. Fine-grained authorization through ranger will be enforced on the Oozie jobs, only when Ranger policies are present, otherwise coarse-grained authorization based on HDFS (only available on ADLS Gen1) is enforced.
Learn more about how to create an Oozie workflow and submit jobs in a domain joined cluster, and how to use Oozie with Hadoop to define and run a workflow on Linux-based Azure HDInsight.
Try HDInsight now
We hope you take full advantage of today’s announcements and we are excited to see what you will build with Azure HDInsight. Read this developer guide and follow the quick start guide to learn more about implementing these pipelines and architectures on Azure HDInsight. Stay up-to-date on the latest Azure HDInsight news and features by following us on Twitter #HDInsight and @AzureHDInsight. For questions and feedback, please reach out to AskHDInsight@microsoft.com.
About HDInsight
Azure HDInsight is Microsoft’s premium managed offering for running open source workloads on Azure. Today, we are excited to announce several new capabilities across a wide range of OSS frameworks.
Azure HDInsight powers some of the top customer’s mission critical applications ranging in a wide variety of sectors including, manufacturing, retail education, nonprofit, government, healthcare, media, banking, telecommunication, insurance, and many more industries ranging in use cases from ETL to Data Warehousing, from Machine Learning to IoT, and more.