At Alteryx (formerly Trifacta), we’re always looking for new, innovative ways for data teams to take advantage of usable data easily and seamlessly. We pride ourselves on a collaborative experience for data practitioners to achieve advanced analytics. On that note, one of our most popular capabilities is the generation of rich data profile reports from datasets, which help with data validation. These reports enable our users to troubleshoot mismatches or other issues in their datasets, then validate and correct them, so that high-quality data is always delivered through their data pipelines.
In this pursuit of enabling consumable data for data teams, we’re excited to talk about our newest integration with dbt Core, the popular open-source workflow tool that lets teams collaboratively deploy analytics code along with software engineering practices such as modularity, portability, and CI/CD. Our goal is to provide you with the best of Designer Cloud and dbt Core to uncover, troubleshoot, and address potential data quality issues. With this integration, you can profile, prepare, and pipeline your model output datasets while identifying and resolving any quality issues in your data pipelines.
You can use this integration through a simple command-line utility that crawls your local dbt Core repositories and generates data profiles from the output datasets you’ve created in Google BigQuery. As it reads your repository files, the utility creates the necessary BigQuery connections and dataset metadata in Dataprep by Trifacta on Google Cloud.
The utility then runs profiling jobs against the BigQuery objects produced by dbt Core and returns a URL for each data profile page. Each profile can subsequently be downloaded as a PDF or JSON file for easy reading.
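To make the crawling step more concrete, here is a minimal sketch of how a script might discover the models in a local dbt Core project and map them to BigQuery table names. This is not the utility's actual code. It assumes the default dbt layout (a `dbt_project.yml` at the project root and models stored as `.sql` files under `models/`) and that each model is materialized under its file name with no custom alias or schema. The function names `find_dbt_models` and `bigquery_table_id` are hypothetical.

```python
from pathlib import Path


def find_dbt_models(project_dir):
    """Return the sorted model names found in a local dbt Core project.

    Assumption: the project uses the default "models" directory and each
    model name is the stem of its .sql file.
    """
    root = Path(project_dir)
    # A dbt project is identified by dbt_project.yml at its root.
    if not (root / "dbt_project.yml").is_file():
        raise FileNotFoundError(f"No dbt_project.yml found in {root}")
    models_dir = root / "models"
    # Models may live in nested folders, so search recursively.
    return sorted(path.stem for path in models_dir.rglob("*.sql"))


def bigquery_table_id(gcp_project, dataset, model_name):
    """Build a fully qualified BigQuery table ID for a model.

    Assumption: the model is materialized as <project>.<dataset>.<model>,
    with no custom alias or schema configured in dbt.
    """
    return f"{gcp_project}.{dataset}.{model_name}"
```

A list like this is the kind of input a profiling step would need: one BigQuery table per dbt model, each of which can then be profiled.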
So, how do we visualize this? As they say, a picture is worth a thousand words, so here you go.
Once this is set up, you can collaborate and invite other members of your data teams to access your tables and interactively use Trifacta’s data connectors to build self-service pipelines for their projects.
In summary, this integration offers several benefits:
- You can quickly and easily go from text-based YAML and model files to data profile visualizations with summary statistics.
- You can validate distributions in your data at various points throughout your dbt Core pipelines.
- Collaboration becomes easier as you invite other users to view the profile results, and interactively access the data for their own self-service data pipelines.
Ready to get started? Click here to learn more and start using this tool.