How a modern data stack transforms data analytics

What's New   |   Alteryx   |   Jul 23, 2024   |   TIME TO READ: 8 MINS

We often hear about organizations undergoing “data modernization” to become more data-driven. Essentially, this means that these organizations have recognized that legacy data tools aren’t very good at solving modern data problems. They’re moving data out of legacy mainframe databases and, at the same time, replacing legacy systems with an updated solution — commonly referred to as a “modern data stack.”

So, what does a modern data stack ecosystem look like, and how does it deliver on its promise of increasing and improving analytics? Read on to learn more.

What is a modern data stack?

Data is often referred to as “the new oil of the digital economy” because it is one of an organization’s most valuable yet underutilized assets. However, raw data alone holds little value. To unlock its potential, data must be compiled, organized, cleaned, and analyzed. The combination of technologies and processes that facilitate these steps makes up a data stack.

Taking a data stack from traditional to modern means incorporating cloud-based tools like data lakes and cloud warehouses to collect, store, process, analyze, and visualize data in an efficient, accessible, and scalable manner.

The key components of a modern data stack

Let’s take a closer look at the four critical components of a modern data stack; a short code sketch after the list shows how they fit together.

  1. Loading: Technologies in this category are responsible for moving data from one place to another. Fivetran is a good example of a vendor that covers this part of the stack.
  2. Warehousing: These are the technologies that allow organizations to store all their data in one place. Cloud-based data warehouses, lakehouses, or data lakes are the basis of modern data stacks; provider examples include Google BigQuery, Amazon Redshift, Snowflake, and Databricks.
  3. Transforming: This stage turns “raw” data into “refined” data — in other words, it makes data usable for analytics. Most organizations will use a “data preparation platform” for this stage.
  4. Analytics: At this point, organizations begin to derive meaningful insights from their data by funneling it into machine learning models and business intelligence tools, serving it up to stakeholders as reports or visualizations, or using it as the basis of data applications. Examples of analytics vendors abound; a few common ones include Looker, Google Data Studio, Tableau, and Amazon SageMaker (for ML models).
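
As a concrete illustration, here is a minimal Python sketch that walks all four stages end to end. It is only a sketch under stated assumptions: the CSV file, table names, and columns are hypothetical, and a local SQLite database stands in for a cloud warehouse such as BigQuery, Redshift, or Snowflake.

```python
import csv
import sqlite3

# 1. Loading: extract raw records from a source system.
#    "orders.csv" and its columns are hypothetical stand-ins for a real source.
with open("orders.csv", newline="") as f:
    raw_rows = list(csv.DictReader(f))

# 2. Warehousing: land the data, untouched, in central storage.
#    A local SQLite file stands in for a cloud warehouse here.
conn = sqlite3.connect("warehouse.db")
conn.execute("CREATE TABLE IF NOT EXISTS raw_orders (order_id TEXT, amount TEXT, region TEXT)")
conn.executemany("INSERT INTO raw_orders VALUES (:order_id, :amount, :region)", raw_rows)

# 3. Transforming: refine the raw records into an analysis-ready table.
conn.execute("DROP TABLE IF EXISTS clean_orders")
conn.execute("""
    CREATE TABLE clean_orders AS
    SELECT order_id,
           CAST(amount AS REAL) AS amount,
           UPPER(TRIM(region))  AS region
    FROM raw_orders
    WHERE amount IS NOT NULL AND amount != ''
""")

# 4. Analytics: derive an insight from the refined data.
for region, revenue in conn.execute("SELECT region, SUM(amount) FROM clean_orders GROUP BY region"):
    print(region, revenue)

conn.commit()
conn.close()
```

In a real modern data stack, each numbered step would be handled by a dedicated tool (for example, Fivetran for loading and Snowflake for warehousing), but the shape of the flow is the same.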

Keep it simple when modernizing your data stack

When organizations attempt to modernize their existing data stack, they often adopt overly complex architectures and tools that require specific skill sets.

A heavily engineered, complex data stack isn’t a sign of success. Instead, organizations should prioritize building a data stack that leverages modern cloud, automation, and AI technologies to simplify access, making data available to more Analytics Champions in an organization.

“Doing the first 80% in analytics and the last 20% in departments that are closest to the data allows us to support the needs of the entire organization.”

Armen Rostamian, VP of Marketing Intelligence and Analytics, BODi Wellness

Start with the basics instead of complicating your analytics journey from the beginning.

Here are the fundamental steps of a data lifecycle:

  • Data sources: Where is your data coming from? What does the data look like?
  • Data ingestion: How are you collecting the data?
  • Data storage: Where are you keeping the data? How much storage and compute capacity will you need?
  • Transformation: How are you processing the data into a usable format? What data platform will you use?
  • Analytics: How are you getting insights from the data? What types of data analysis will you do? How will you use analytics tools to support decision-making?
  • Reporting and data visualization: How are you sharing and using the findings? Will you be building dashboards? What BI tools will you be using?
  • Data governance: How will data teams ensure data quality and security at each stage? What will your processes be for data management?

Prioritize the essentials first and build around them. You can continually expand and add components as needed.
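
One lightweight way to “start with the basics” is to write down the answers to these questions before committing to any tools. The sketch below records a hypothetical plan as a plain Python dictionary; every value is an illustrative assumption, not a recommendation.

```python
# A hypothetical pipeline plan: answer the lifecycle questions first,
# then pick tools that match the answers. All values are illustrative.
pipeline_plan = {
    "data_sources": ["crm_export.csv", "web_analytics_api"],
    "ingestion": "daily batch load",
    "storage": {"system": "cloud data warehouse", "estimated_volume_gb": 50},
    "transformation": "SQL models run inside the warehouse",
    "analytics": ["revenue by region", "customer churn drivers"],
    "reporting": "BI dashboard refreshed each morning",
    "governance": {"pii_masked": True, "owner": "data_team"},
}

# Review the plan stage by stage before adding any components.
for stage, decision in pipeline_plan.items():
    print(f"{stage}: {decision}")
```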

The rise of ELT

Data modernization has called for a new stack of technologies and a new way of building data pipelines.

In pre-cloud data warehouses, most organizations relied on an ETL (extract, transform, load) process for data preparation: data is extracted from internal systems and external sources, transformed into a single format suitable for storage, and then loaded into databases. This process made sense when a small team of developers controlled the organization’s data. Now, far too many teams and users need data for a small group to handle the entire process of preparing data and serving it up to them. On top of that, shoehorning modern, complex data types into one format for storage isn’t efficient or conducive to data exploration.

Moving from an ETL to an ELT (extract, load, transform) process, in which organizations load data into the warehouse before it is transformed and then let business users transform it themselves, is a much more efficient approach. (ELT should not be confused with “reverse ETL,” which moves transformed data from the warehouse back into operational tools.) A short code sketch after the following list makes the ordering difference concrete.

The main advantages of ELT include:

  • Reduced time: An ETL process requires a staging area and system, which means extra time to load data; ELT does not.
  • Increased usability: Business users can own the business logic themselves instead of waiting on a small IT team to transform data in Java, Python, SQL, or Scala.
  • More cost-effective: Built on SaaS solutions, an ELT stack can scale up or down with the needs of an organization; traditional ETL was designed with only large organizations in mind.
  • Improved data analytics: Under ELT, business users can apply their unique business context to the data, often leading to better results.
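
To make the ordering difference concrete, here is a minimal sketch of the two approaches. The table and function names are hypothetical, and an in-memory SQLite database stands in for the warehouse; the point is only where the transformation happens.

```python
import sqlite3

def etl(raw_rows, conn):
    # ETL: transform in a staging step *before* loading, so only one
    # pre-shaped format ever reaches the warehouse.
    transformed = [(r["id"], float(r["amount"])) for r in raw_rows]
    conn.executemany("INSERT INTO orders VALUES (?, ?)", transformed)

def elt(raw_rows, conn):
    # ELT: load the raw data first, untouched...
    conn.executemany("INSERT INTO raw_orders VALUES (:id, :amount)", raw_rows)
    # ...then transform inside the warehouse, where business users can
    # apply their own logic in SQL and re-transform at will.
    conn.execute("""
        CREATE TABLE IF NOT EXISTS orders AS
        SELECT id, CAST(amount AS REAL) AS amount FROM raw_orders
    """)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (id TEXT, amount TEXT)")
rows = [{"id": "1", "amount": "9.99"}, {"id": "2", "amount": "12.50"}]
elt(rows, conn)
print(conn.execute("SELECT SUM(amount) FROM orders").fetchone())  # (22.49,)
```

Because the raw rows survive in raw_orders under ELT, analysts can later derive new transformed tables without re-extracting anything from the source systems.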

Why transformation is so important in a data stack

We’ve been discussing the “T” in the data stack—using data transformation tools to transform data for analytic use. Let’s take a closer look at why this stage is so important.

A good analogy for transforming data is food preparation. The work it takes to move from raw ingredients to a complete meal is critical and largely dictates the quality of your meal. While some food preparation tasks can be applied to all ingredients (washing, removing stems, etc.), by and large, each ingredient will be prepared differently when cooking different meals. Data works similarly.

There is no “one-size-fits-all” data preparation. Each analytic project will require different data preparation steps and will hold data to different quality standards. The commonality across all data preparation jobs, however, is that no matter how the data is transformed, the outcome will be the foundation of the final analysis—for better or worse. Performed correctly, data preparation can lead to deeper insights, even beyond the intended scope of analysis. Each step in the data preparation process exposes new potential ways that the data might be “re-wrangled,” all driving toward the most robust final analysis.
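
As a small, hedged illustration of what those steps can look like, the pandas sketch below first profiles a hypothetical dataset for missing values and outliers, then applies a few explicit cleaning rules. The column names and thresholds are assumptions made for this example; a real project would choose its own.

```python
import pandas as pd

# Hypothetical raw data: columns and values are illustrative only.
raw = pd.DataFrame({
    "customer": ["Ann", "Bob", None, "Dee"],
    "spend": [120.0, -5.0, 80.0, 10_000.0],
    "region": [" west", "East", "east ", "WEST"],
})

# Profile first: surface missing values and outliers before deciding anything.
print(raw.isna().sum())         # missing values per column
print(raw["spend"].describe())  # distribution hints at outliers

# Then transform, one explicit and reviewable rule at a time.
clean = (
    raw.dropna(subset=["customer"])      # drop rows missing a key field
       .query("0 <= spend <= 5000")      # keep spend within a plausible range
       .assign(region=lambda d: d["region"].str.strip().str.title())
)
print(clean)
```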

While IT often maintains responsibility for large-scale data transformation tasks to ensure a single version of the truth, business users need to own the finishing steps in cleansing and data preparation. Having the right business context allows these users to decide what’s acceptable, what needs refining, and when to move on to analysis.

“While Alteryx was critical for analysis, the integration between Snowflake, Alteryx, and Tableau made this project successful.”

Derik Sadowski, Senior Consultant, HCPA  

A modern ELT data stack with a data preparation platform like Alteryx allows business users to assume this responsibility. And it radically changes how analytics is performed across an organization. There is less friction in obtaining data, more of the right eyes on how it should be transformed, and increased room for exploration in how the analysis could be changed.

Alteryx Designer Cloud for the modern data stack

Designer Cloud is built for data preparation. It offers an easy-to-use, drag-and-drop interface, whether you’re a data scientist, a data analyst, or a business user looking for deeper insights. Using Alteryx Designer Cloud in conjunction with the other technologies that make up the modern data stack, organizations are building automated workflows and data pipelines that dramatically improve analytics efficiency.

“With Alteryx, we can significantly improve the productivity of our ETL activities, interfacing with Power BI or Tableau for visualization.”

Jifeng Qiu, Project Team Manager, Toyo Engineering  

Designer Cloud’s machine-learning-powered platform acts as an invisible hand during data preparation, guiding users and data flows toward the best possible transformation. Its visual interface automatically surfaces errors, outliers, and missing data, allowing users to edit or redo any transformation quickly. It also has an extensive collection of connectors and integrations, making it easy to retrieve raw data from whatever source you use. Finally, it integrates with modern data stack technologies to enable seamless, automated data pipelines.

Learn why organizations incorporate Designer Cloud as a critical part of their data stack strategy today. Get started with Designer Cloud on the platform of your choice.
