Exploring the data reality gap - Learn data with sqlbelle 2024.05.04 edition


Hello Reader,

I hope you had a great week. Here are this week's data tidbits.

Tableau Tip - Crosstab

Here is a simple but often overlooked Tableau tip: look at your data in a crosstab.

Benefits of the crosstab view:

  • Provides a clear view of the raw numbers, which helps when you validate your work.
  • Particularly useful for verifying complex calculations such as table calculations or LOD (level of detail) calculations.

In Tableau, right-click your worksheet's tab at the bottom of the screen and select Duplicate as Crosstab.

Alternatively, you can find the same option in the Worksheet menu at the top.

In addition to crosstabs, don’t forget you can see your underlying data in a few different ways:

  1. View Data from the sidebar shows you the underlying data from the data source.
  2. View Data from a specific mark shows you the underlying data that makes up that mark.
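To cross-check the numbers you see in a crosstab, it can help to recompute a calculation by hand on a small sample. Below is a minimal Python sketch, not a Tableau feature: it assumes a hypothetical table of region, month, and sales rows, and mimics what {FIXED [Region] : SUM([Sales])} and a RUNNING_SUM(SUM([Sales])) table calculation (assumed to compute along month and restart for each region) would return for that data. The function names and sample numbers are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows: (region, month, sales). Illustrative numbers only.
rows = [
    ("East", "2024-01", 100),
    ("East", "2024-01", 25),
    ("East", "2024-02", 150),
    ("West", "2024-01", 200),
    ("West", "2024-02", 80),
]


def fixed_lod_sum(rows):
    """Mimic {FIXED [Region] : SUM([Sales])}: one sales total per region."""
    totals = defaultdict(int)
    for region, _month, sales in rows:
        totals[region] += sales
    return dict(totals)


def running_sum_by_region(rows):
    """Mimic RUNNING_SUM(SUM([Sales])) along month, restarting per region."""
    # Aggregate to the view's level of detail first: SUM([Sales]) per (region, month).
    monthly = defaultdict(int)
    for region, month, sales in rows:
        monthly[(region, month)] += sales
    # Walk months in order within each region, accumulating a running total.
    running = defaultdict(int)
    result = []
    for region, month in sorted(monthly):
        running[region] += monthly[(region, month)]
        result.append((region, month, running[region]))
    return result


print(fixed_lod_sum(rows))  # {'East': 275, 'West': 280}
for row in running_sum_by_region(rows):
    print(row)
```

In this unfiltered example, the last running-sum value for each region (275 for East, 280 for West) matches that region's FIXED total, a quick sanity check you can also eyeball in a crosstab. Keep in mind that FIXED LODs are computed before regular dimension filters, while table calculations only see what is in the view, so the two can legitimately diverge once filters are applied.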

Data Reality Gap Pitfall

Have you ever heard of the “Data Reality Gap”? I first encountered this term when I read Ben Jones’s book “Avoiding Data Pitfalls,” and it has stuck with me ever since.

The “data reality gap” refers to the discrepancy between the data as recorded, analyzed, or interpreted and the actual, real-world conditions or behaviors the data is supposed to represent.

In other words, whatever data you’re working with is just a slice of reality. Some things will not be captured in your data, which means the data is always incomplete.

Some reasons for this gap include:

  • unavailability of data
  • misinterpretation of data
  • bias in data collection, i.e., someone decided which information was worth collecting and which data points were not
  • outdated data

No data set can ever be complete, so we need to acknowledge the limitations of our data in our analyses. Acknowledging this gap leads to more grounded analyses.

Here are some examples where you could see data reality gaps:

  1. Marketing Campaigns: Recommendations can miss the mark if the analyses rely on outdated consumer interest data.
  2. Customer Service: Service improvements can be misaligned if only digital feedback is analyzed and verbal customer feedback is ignored.
  3. Retail: Retailers risk overstocking products that don’t sell if their analyses are based solely on past sales data and ignore emerging trends.
  4. Energy Sector: Planning based only on historical consumption patterns can miss the effect of renewable energy adoption rates.
  5. Education Sector: A data reality gap can lead to outdated curricula that don’t match current job market demands.

Another example often cited on the topic of the “data reality gap” is the WWII aircraft analysis.

Here is a short rundown:

  • Initial Analysis: The U.S. military analyzed bullet holes in planes returning from missions to decide where to reinforce armor.
  • Initial Approach: They focused on adding armor to areas with the most bullet holes, like wings and fuselages, assuming these were critical hit points.
  • Critical Oversight: This method, however, ignored planes that didn’t return from missions, leading to a flawed strategy based on survivorship bias.
  • Survivorship Bias Explained: Survivorship bias means the analysis only included data from surviving planes, missing critical insights from planes that were shot down.
  • Abraham Wald’s Insight: Abraham Wald proposed reinforcing the areas that showed few bullet holes on returning planes (such as the engines and cockpit), since hits there likely meant a plane wouldn’t return.
  • Outcome: Reinforcing the less-damaged areas is widely credited with improving survival rates, and the story remains a classic demonstration of why you must account for unseen data.
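The survivorship bias trap is easy to reproduce with a toy simulation. The Python sketch below is a hypothetical illustration, not a reconstruction of Wald’s actual analysis: the plane sections, the number of hits per plane, and the per-hit chance of being shot down (LETHALITY) are all made-up assumptions. Hits land evenly across sections, yet because engine and cockpit hits are assumed to be far more lethal, those areas look less damaged on the planes that make it back.

```python
import random
from collections import Counter

SECTIONS = ["engine", "cockpit", "fuselage", "wings", "tail"]
# Hypothetical chance that a single hit to each section brings the plane down.
LETHALITY = {"engine": 0.6, "cockpit": 0.5, "fuselage": 0.05, "wings": 0.05, "tail": 0.1}


def simulate(n_planes=10_000, max_hits=5, seed=42):
    """Return hit counts per section for all planes and for returning planes only."""
    rng = random.Random(seed)
    all_hits = Counter()       # every hit, including planes that were shot down
    returned_hits = Counter()  # only the hits analysts could actually inspect
    for _ in range(n_planes):
        # Each plane takes 0..max_hits hits, each landing on a uniformly random section.
        hits = [rng.choice(SECTIONS) for _ in range(rng.randint(0, max_hits))]
        all_hits.update(hits)
        # The plane returns only if it survives every individual hit.
        survived = all(rng.random() >= LETHALITY[section] for section in hits)
        if survived:
            returned_hits.update(hits)
    return all_hits, returned_hits


def share(counter):
    """Fraction of hits that landed on each section."""
    total = sum(counter.values())
    return {section: round(counter[section] / total, 3) for section in SECTIONS}


all_hits, returned_hits = simulate()
print("all planes:     ", share(all_hits))
print("returned planes:", share(returned_hits))
```

Looking only at returned_hits, the fuselage and wings appear to take the most damage, which is exactly the trap described above: the missing engine and cockpit hits are on the planes that never came back.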

It's a wrap.

That's it for now.

Remember, the journey through data is paved with questions, not just answers. The right question can change the way we see the world. Keep asking, keep learning.

Until next time,

Donabel

Hi! I'm sqlbelle!

Join 4.4K subscribers who receive weekly bite-sized data lessons and practical SQL and Tableau tutorials. Subscribe for additional resources, and start with free tutorials at youtube.com/sqlbelle
