Alteryx Analytics Cloud supports an insight-driven organization with unified solutions spanning automated data preparation, approachable machine learning, and AI-generated insights.
Databricks unifies data warehousing and data lakes on a single platform with its Lakehouse architecture, while Alteryx enables self-service analytics with an easy-to-use experience. Together, Alteryx and Databricks empower everyone in your organization to get more value from data by making it faster and easier to access and process.
This blog examines the integration between Alteryx Analytics Cloud and Databricks on AWS.
Databricks complements Alteryx Analytics Cloud from two perspectives:
Connectivity
Databricks offers managed tables that can be accessed like cloud data warehouse (CDW) and relational sources such as Snowflake or PostgreSQL. Following the Lakehouse architecture, it stores underlying data in open formats such as Parquet in customer-controlled cloud storage while also providing CDW-like capabilities such as ACID transactions, schema enforcement, and centralized governance. Alteryx Analytics Cloud enables users to easily connect to Unity Catalog and leverage data in Databricks for transformation and analytics.
Execution
Databricks offers a managed Apache Spark service for processing data at scale. More recently, Databricks has released Databricks SQL, which delivers performance superior to Spark for SQL workloads. Alteryx Designer Cloud enables analysts to leverage both Databricks Spark and Databricks SQL for workflow execution through a no-code experience.
Connecting to Databricks
The Alteryx Analytics Cloud Administrator adds one or more Databricks workspaces to an Analytics Cloud workspace by specifying the Databricks service URL and a Personal Access Token (PAT).
Admins and users with permission to create connections can then connect to these Databricks workspaces by using the Create Connection feature and providing the connection properties.
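To make the connection properties concrete, here is a minimal sketch in Python. It validates the same three properties an Alteryx connection needs (workspace host, SQL warehouse HTTP path, and a PAT, which Databricks issues with a "dapi" prefix). The hostname, HTTP path, and token values are placeholders, not real credentials, and the validation rules are illustrative assumptions rather than Alteryx's actual checks.

```python
def validate_connection_properties(server_hostname: str, http_path: str,
                                   access_token: str) -> dict:
    """Basic sanity checks on Databricks connection properties (illustrative)."""
    if not server_hostname or server_hostname.startswith("http"):
        raise ValueError("server_hostname should be a bare host, e.g. "
                         "dbc-xxxx.cloud.databricks.com")
    if not http_path.startswith("/sql/"):
        raise ValueError("http_path should point at a SQL warehouse, e.g. "
                         "/sql/1.0/warehouses/<id>")
    if not access_token.startswith("dapi"):
        raise ValueError("Databricks personal access tokens start with 'dapi'")
    return {"server_hostname": server_hostname,
            "http_path": http_path,
            "access_token": access_token}

props = validate_connection_properties(
    "dbc-a1b2c3d4-e5f6.cloud.databricks.com",  # placeholder workspace host
    "/sql/1.0/warehouses/abc123",              # placeholder warehouse HTTP path
    "dapiXXXXXXXXXXXXXXXX",                    # placeholder PAT
)

# With real values, the same properties work with the open-source
# databricks-sql-connector package:
# from databricks import sql
# with sql.connect(**props) as conn, conn.cursor() as cur:
#     cur.execute("SELECT current_catalog()")
```

The commented-out connection code is left inert so the sketch runs without a live workspace.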
Importing References to Databricks Tables
With a Databricks connection defined, the Alteryx Analytics Cloud Data Import capability displays and explores tables within the Databricks Unity Catalog. Selecting a table displays its metadata and a small data sample.
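Unity Catalog addresses tables with a three-level namespace (catalog.schema.table), which is what a table browser like Data Import walks. The sketch below shows how such fully qualified identifiers are composed and how a catalog's built-in information_schema can enumerate tables; the catalog and schema names are illustrative placeholders.

```python
def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified, backtick-quoted Unity Catalog identifier."""
    for part in (catalog, schema, table):
        if not part or "`" in part:
            raise ValueError(f"invalid identifier part: {part!r}")
    return ".".join(f"`{p}`" for p in (catalog, schema, table))

def list_tables_sql(catalog: str, schema: str) -> str:
    """SQL to enumerate tables in a schema via the catalog's information_schema."""
    return (f"SELECT table_name FROM `{catalog}`.information_schema.tables "
            f"WHERE table_schema = '{schema}'")

print(qualified_name("main", "sales", "orders"))  # `main`.`sales`.`orders`
```

Running `list_tables_sql` against a SQL warehouse would return the same table list the Data Import browser presents.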
Data Preparation with Designer Cloud and Databricks
Alteryx Designer Cloud provides data preparation capabilities for business users with a visual, interactive, and AI-powered platform to ensure clean, connected, and trusted data is available to support data services, modern BI / Reporting, and AI / ML initiatives.
Once a connection to a Databricks workspace is added, a Designer Cloud workflow can reference Databricks tables for input and output. Additionally, once the admin enables the Databricks runtime, Databricks Spark or SQL can be used as the execution engine for workflow processing.
Pushdown Processing into Databricks Serverless SQL
When all workflow inputs and outputs reside in Databricks, Designer Cloud generates the transformation logic as native Databricks SQL, which is pushed down and executed in Databricks. This is the most performant execution method for this scenario: no data egresses from Databricks, and Databricks SQL Serverless compute executes the workflow.
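To illustrate what pushdown means, here is a toy generator that turns two workflow steps (select columns, filter rows) into a single Databricks SQL statement, so the whole transformation runs inside Databricks. Alteryx's actual SQL generation is internal to the product; this sketch and its table names are purely illustrative.

```python
def pushdown_sql(source: str, target: str, columns: list, predicate: str) -> str:
    """Compile a select-then-filter workflow into one CTAS statement (toy example)."""
    col_list = ", ".join(columns)
    return (f"CREATE OR REPLACE TABLE {target} AS\n"
            f"SELECT {col_list}\n"
            f"FROM {source}\n"
            f"WHERE {predicate}")

sql_text = pushdown_sql(
    source="main.sales.orders",        # placeholder input table
    target="main.sales.large_orders",  # placeholder output table
    columns=["order_id", "amount"],
    predicate="amount > 1000",
)
print(sql_text)
```

Because the statement both reads from and writes to Unity Catalog tables, no rows ever leave Databricks, which is the property that makes this path the fastest option.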
Databricks Spark Processing
Any Analytics Cloud-supported input to any supported output can be processed with Databricks Spark as a non-pushdown engine option. Databricks Spark is suitable for larger data volumes that benefit from distributed processing on a compute cluster. A Spark cluster is temporarily created for each job.
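The "cluster created per job" pattern can be sketched with a Databricks Jobs API submit payload that declares a new_cluster, meaning compute is provisioned for the run and torn down afterwards. The Spark runtime version, AWS node type, and paths below are placeholder values, and this payload illustrates the general Jobs API pattern, not what Designer Cloud submits internally.

```python
import json

def ephemeral_job_payload(job_name: str, notebook_path: str,
                          num_workers: int = 2) -> dict:
    """Build a Jobs API submit payload with a per-run (ephemeral) cluster."""
    return {
        "run_name": job_name,
        "tasks": [
            {
                "task_key": "transform",
                "notebook_task": {"notebook_path": notebook_path},
                "new_cluster": {                         # created for this run only
                    "spark_version": "13.3.x-scala2.12",  # placeholder runtime
                    "node_type_id": "i3.xlarge",          # placeholder AWS node type
                    "num_workers": num_workers,
                },
            }
        ],
    }

payload = ephemeral_job_payload("designer-cloud-workflow",
                                "/Workspace/jobs/transform")
print(json.dumps(payload, indent=2))
```

Declaring new_cluster instead of referencing an existing cluster ID is what makes the compute temporary: the cluster exists only for the lifetime of the job run.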
Monitoring Designer Cloud Jobs Run in Databricks
The Execution stages section of the Alteryx Analytics Cloud job details page displays “Databricks SQL” or “Spark (Databricks)” as the Transform Environment for Designer Cloud jobs run in Databricks. Job status, runtime duration, and output data are also provided on this page.
Additional Databricks-specific job information is displayed within the Databricks UI Job Runs page. Workflow jobs executed with Databricks SQL include “SQL” within the job name. Jobs without “SQL” in the name were run with Databricks Spark. Selecting the job name link displays the job run details page that includes query information for SQL jobs and links to logs for Spark jobs.
Summary
Alteryx Analytics Cloud and Databricks work together to accelerate the path from source data to business insights in the cloud. Business analysts can leverage cloud-native processing of at-scale data through a simple no-code platform.
You can learn more about Databricks and its integration with Alteryx Analytics Cloud using the resources below:
Databricks
Alteryx Analytics Cloud Help – Databricks Connections