PrestoDB, commonly referred to as simply Presto, is an open-source distributed SQL query engine. Presto is used by companies like Meta and Netflix to efficiently query massive data volumes from disparate data sources. Alteryx offers support for Presto with both Designer and Alteryx Analytics Cloud. In this blog, we’ll focus on the integration with Alteryx Analytics Cloud specifically.
Integration with Alteryx Analytics Cloud
Presto is commonly used alongside a Hadoop cluster and can be deployed either in the cloud, for example with Amazon EMR, or on-premises. On-premises deployments are still the most common, so that is the scenario this blog focuses on. However, the integration with the Alteryx Analytics Cloud Platform (AACP) works the same way regardless of where Presto is deployed.
An important Presto concept is the “catalog,” which is essentially a data source that Presto has been configured to query through a connector. Because Presto can query many data sources from a single environment, it can in many ways be thought of as a data federation or data virtualization layer. As a result, many organizations have chosen to integrate their applications with Presto rather than build a separate integration for each data source they want to query.
This is especially relevant because Alteryx Analytics Cloud is a cloud-hosted SaaS platform, and organizations may not want to open up connectivity to all of their data sources. With Presto acting as a data virtualization layer, Alteryx Analytics Cloud can be granted access to Presto alone, while the underlying data sources stay private behind additional firewall rules. In this scenario, only the Presto environment needs to allowlist the Alteryx Analytics Cloud IP ranges, and the actual data sources are reachable only through Presto.
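To make the allowlisting idea concrete, here is a minimal sketch of the check that a firewall in front of Presto effectively performs: it accepts a source address only if the address falls inside one of the permitted ranges. The CIDR ranges below are hypothetical placeholders taken from the RFC 5737 documentation blocks. They are not Alteryx Analytics Cloud's actual ranges. In practice this rule lives in your firewall or security-group configuration, not in application code.

```python
import ipaddress

# Hypothetical placeholder ranges (RFC 5737 documentation blocks), NOT the real
# Alteryx Analytics Cloud ranges; substitute the officially published ones.
ALLOWED_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def is_allowed(source_ip: str, allowed=ALLOWED_RANGES) -> bool:
    """Return True if source_ip falls inside any allowlisted network."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in allowed)


# Only the Presto environment applies this allowlist; the underlying data
# sources accept connections from Presto alone.
print(is_allowed("203.0.113.42"))    # True
print(is_allowed("198.51.100.200"))  # False: outside the /25
print(is_allowed("192.0.2.10"))      # False
```

Keeping the allowlist on a single entry point means that only one set of rules needs review when the permitted ranges change.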
Creating a Connection and Loading Data
Alteryx Analytics Cloud enforces a centralized data governance model by providing a single place for defining and sharing data connections. On the Connections page, Admins or those with the Create Connections permission can define a new Connection to Presto.
The Create Connections panel lets you configure the connection details for the Presto environment, along with any requirements specific to your deployment. In most deployments, Presto is configured with LDAP authentication to validate the user, and that same identity determines which underlying data sources the user can access.
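For readers curious about what happens underneath, here is a rough sketch of what a client sends to an LDAP-enabled Presto coordinator. It assumes Presto's LDAP authenticator, which validates a username and password supplied through HTTP Basic authentication and must only be used over HTTPS. It also assumes Presto's `X-Presto-User`, `X-Presto-Catalog`, and `X-Presto-Schema` request headers. The URL, credentials, catalog, and schema are hypothetical, and Alteryx Analytics Cloud builds these requests for you once the Connection is defined.

```python
import base64


def presto_request_headers(user: str, password: str, catalog: str, schema: str) -> dict:
    """Build headers for a Presto /v1/statement request using HTTP Basic auth.

    With LDAP authentication enabled, Presto checks the Basic credentials
    against the directory; send them only over HTTPS.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {
        "Authorization": f"Basic {token}",
        "X-Presto-User": user,        # identity used for access control
        "X-Presto-Catalog": catalog,  # default catalog for the session
        "X-Presto-Schema": schema,    # default schema for the session
    }


# Hypothetical coordinator URL and credentials, for illustration only.
url = "https://presto.example.com:8443/v1/statement"
headers = presto_request_headers("analyst", "s3cret", "hive", "sales")
print(url)
print(headers["X-Presto-User"])
```

Because the authenticated user travels with every request, Presto can apply that user's permissions when it decides which catalogs and tables to expose.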
After the Connection is created, users can go to the Data page and browse Presto for data to work with in Alteryx Analytics Cloud. This is where the configured “catalogs” come into play: each user sees the catalogs permitted by their authorization assignments.
With Presto, the catalogs are listed first. Selecting a catalog lets you click into a schema (database) and, from there, view its list of tables. From this view, a user can preview data and start working with it in the Alteryx Analytics Cloud Platform to solve a business problem.
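The browsing path mirrors Presto's three-level naming scheme, in which every table is addressed as `catalog.schema.table`. The sketch below lists the standard Presto SQL statements that correspond to each browsing step. It is only an illustration: it does not necessarily reflect how Alteryx Analytics Cloud queries Presto internally. The names `hive`, `sales`, and `orders` are hypothetical, and the 100-row preview limit is an arbitrary choice.

```python
from typing import List


def quote_ident(name: str) -> str:
    """Quote a Presto identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def browse_queries(catalog: str, schema: str, table: str, preview_rows: int = 100) -> List[str]:
    """Return SQL for each browsing level: catalogs, schemas, tables, preview."""
    c, s, t = quote_ident(catalog), quote_ident(schema), quote_ident(table)
    return [
        "SHOW CATALOGS",                        # top level: configured catalogs
        f"SHOW SCHEMAS FROM {c}",               # schemas within one catalog
        f"SHOW TABLES FROM {c}.{s}",            # tables within one schema
        f"SELECT * FROM {c}.{s}.{t} LIMIT {int(preview_rows)}",  # data preview
    ]


for sql in browse_queries("hive", "sales", "orders"):
    print(sql)
```

Quoting each identifier keeps names containing dots, spaces, or mixed case from being misread as part of the `catalog.schema.table` path.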
Final Thoughts
This blog has provided a brief overview of how Presto can serve as a data virtualization layer for the Alteryx Analytics Cloud Platform, giving data and analytics users efficient access to a broad range of data sources without creating a separate connection to each one. You can learn more about Presto and its integration with Alteryx Analytics Cloud using the resources below: