Self-Service Analytics on BigQuery in 30 minutes

Technology   |   Paul Warburg   |   Jul 25, 2019 TIME TO READ: 9 MINS
TIME TO READ: 9 MINS

Learn how to leverage BigQuery, Cloud DataPrep, and Data Studio to turn your raw data into a beautiful report without the help of anyone but you. And all in less than 30 minutes thanks to Google Cloud Platform smart analytics suite.

But, before you start your timer, make sure you are familiar with the basic concepts of self-service analytics on Google Cloud Platform. The brief “Getting Started with Self-Service Analytics on BigQuery with Cloud Dataprep” will give you this knowledge. 

Then, you need to have at your disposal a GCP project. We will refer to “my-self-service-analytics” project in this blog. Please follow these instructions to create your GCP project if you do not have one yet.  

For your first self-service analytics project, your objective is to create a Data Studio dashboard reporting on your eCommerce website performance and the best web channels driving revenue. It will tell you where to invest more and which products need more focus.

You will have to clean and combine multiple data sources within Cloud Dataprep, output it to BigQuery to make it consumable within Google Data Studio and create your dashboard.

3…2…1 you can start your timer now. 30 min stopwatch in hand!

Setting up BigQuery, Cloud Dataprep, and Google Data Studio

BigQuery Setup

You have to create a dataset, the top-level containers to organize your tables and views within BigQuery.

  1. Go to the BigQuery console. If asked, accept the terms and conditions to activate the service. It can take a few minutes.
  2. Select your GCP project “my-self-service-analytics” on the left side.
  3. Select and name it “MyAnalytics”

Cloud Dataprep Setup

  1. Go to Cloud Dataprep. Accept the terms and conditions to activate the service. It can take a few minutes. 
  2. Create a “My Analytics” Flow 
  3. Add the data to your flow by importing the files Sales_Data.csv and Ad_Clicks.csv which contain respectively the e-Commerce transactions and the clickstream ads log file capturing website traffic information. Select “Add Datasets”, then “Import Datasets”, “Chose a file”, and “Import & Add to Flow”

Google Data Studio Setup

You’re now ready for preparing the data and designing your reports. Keep the 3 tabs open, you will reuse them.

Prepare the Data

We want to understand which are the domains driving the most traffic to the eCommerce website and which ones are converting best. To achieve this, we need to combine the Ad_Clicks and Sales_Data together. However, both data sets can’t be combined as is and are missing some metrics. We need to prepare the data so it can be combined and enriched for reporting. Here are the data manipulations we need to go through:

In the Ad_Clicks dataset, we will need to understand which product is advertised, so we will need to extract the product_id from the URL as such: https://cafepress.com/vivamus.json?pid=1319500800000-208 

In the Sales_Data we will need to calculate the profit by product to identify the most performing web channels.

Then we can combine the two datasets together and load it into BigQuery to further report on it.

Steps to prepare the data

  1. Go to Cloud Dataprep
  2. Select “My Analytics” flow. It should look like this.
  3. Add and edit a new recipe to Ad_click.csv
  4. In the grid interface, expand the column URL
  5. Highlight the value next to pid=
  6. Select the first suggestion to extract the product id
  7. Rename the newly created column into product_id by clicking on the column header and selecting the “rename”.
  8. You have now extracted the product_id which is necessary to combine the Ad_Clicks and Sales_Data datasets together. Switch back to the flow view by clicking “MY ANALYTICS”
  9. Add and edit a new recipe for Sales_Data.csv. In the grid interface when the dataset has opened, on the far right select Recipe
  10. Add a “new step” and type “New Formula”, then enter (or copy/past) “NUMFORMAT(totalSale – totalCost, ‘###.##’)” in the Formula field and “profit” in the new column name. You have now created a new metric representing your profit for each transaction.
  11. To combine (join) Sales_data with Ad_clicks, use the join tool and select Ad_Clicks dataset to join with, and accept. Edit/Add the join conditions to reflect these characteristics, then select Next and check these output columns: ordermonth, orderyear, productkey, customerstate, customercountry, totalSale, profit, domain, device_type
  12. Then Review and Save to Recipe. You have now created a combined dataset you can report on with DataStudio, but before this step we need to load the dataset result into BigQuery. Run the job to publish into BigQuery.
  13. In the Run Job on Dataflow screen, edit the publishing action to replace the default file publishing to a BigQuery table. Select BigQuery, MyAnalytics dataset, and select Create a New Table.
  14. Select Append to this table every run and Update.
  15. And select Run Job. The data is now processed and the result will be loaded in a BigQuery table

Visualize the Data

Now that the data has been prepared in Cloud Dataprep and loaded into a BigQuery table, you are ready to create a report with Data Studio on top of it.

  1. Go to Google Data Studio
  2. Select DATA SOURCES and select (+) to add a new source
  3. Select BigQuery and authorize Data Studio for BigQuery if requested
  4. Navigate BigQuery datasets to connect to Sales_Data; you might need to authorize Bigquery access
  5. Change customerstate to a Geo.Region type and customercountry to a Geo.country type. It will permit using specific geographical chart to represent the data.
  6. Change orderyear to a Year (YYYY) format. That will be the default format displayed in your charts.
  7. Select Create a Report when done 
  8. Add a new Geo Chart in the report
  9. Drag and drop the customerstate field to replace the customer country dimension section. Drag and drop the totalSales field to replace Record Count in the metric section. It should look like this. This is a nice representation to quickly know in which states your products sell well.
  10. Add a bar chart to the report
  11. Drag and drop the domain field to replace the productKey dimension section 
    Drag and drop the profit field to the metric section and remove Record Count
    It should look like this. This addition graph gives you an idea of the web sites that are diving the most profit (when you remove all the costs associated to advertise your products).
  12. You can now switch to the view mode to visualize and interact with the report.

Stop your timer! How much did you do? 

Conclusion

While this is a rudimentary example, you have now achieved the foundational steps to establish a scalable self-service analytics solution leveraging:

  • BigQuery for storing and retrieving the data
  • Cloud Dataprep to clean, combine, and create metrics
  • Google Data Studio to visually report on top of your data. 

By exploring each GCP service deeper and iterating on these principles, you will be able to solve any requirements for your analytics in a self-service manner. Maybe in more than 30 minutes though, but you will have the total control of the end to end process, which will empower you with the agility you need for faster, more accurate and personalized data insight to drive your business.

Tags