Working with untidy text + Visual Vocabulary - Learn data with sqlbelle 2024.05.11 edition


Hello Reader,

Tableau Tip - working with untidy data

Imagine you have to work with a lot of text: for example, extracting hashtags from posts, or reformatting phone numbers that arrive in inconsistent formats. Does Tableau have the functionality to help you?

The answer is yes - with regex.

What is Regex?

Regex, or Regular Expressions, is a powerful way to work with text.

Think of it as a search tool that goes way beyond finding simple text matches. It finds patterns within text, which can be incredibly useful for cleaning up and organizing data. Regex is used heavily in text analytics, cybersecurity (log analysis, intrusion detection), and many other fields.

Many languages and tools support regex, including Python, Perl, JavaScript, PHP, C# - and Tableau.

Why Regex Matters

Data isn't always neat and tidy. Sometimes text data arrives as a jumbled mess that needs sorting or cleaning.

Understanding regex can dramatically improve your handling of text data. It means spending less time cleaning your data and more time analyzing it.

Regex in Tableau

Tableau makes using regex simple with a few key functions:

  • REGEXP_MATCH(): Returns TRUE if any part of your text matches a pattern.
  • REGEXP_EXTRACT(): Returns the piece of the text captured by the pattern's capturing group (the part in parentheses).
  • REGEXP_EXTRACT_NTH(): Returns the piece of the text captured by the nth capturing group in the pattern, where n is the index you pass in.
  • REGEXP_REPLACE(): Replaces every part of your text that matches a pattern with a replacement string.
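Tableau calculations only run inside Tableau, so here is a minimal sketch that mirrors what each of the four functions returns, using Python's built-in re module. The helper names (regexp_match, regexp_extract, and so on) are hypothetical Python stand-ins, not Tableau APIs, and Python's regex engine is not the one Tableau uses. Simple patterns like the ones in this edition behave the same way in both.

```python
import re
from typing import Optional

# Hypothetical Python stand-ins for Tableau's regex functions (illustration only).

def regexp_match(text: str, pattern: str) -> bool:
    # True if ANY part of the text matches; the whole string need not match.
    return re.search(pattern, text) is not None

def regexp_extract(text: str, pattern: str) -> Optional[str]:
    # Returns the first capturing group of the first match (None if no match).
    match = re.search(pattern, text)
    return match.group(1) if match else None

def regexp_extract_nth(text: str, pattern: str, index: int) -> Optional[str]:
    # Returns capturing group number `index` (1-based) of the first match.
    match = re.search(pattern, text)
    return match.group(index) if match else None

def regexp_replace(text: str, pattern: str, replacement: str) -> str:
    # Replaces EVERY match of the pattern, not just the first one.
    return re.sub(pattern, replacement, text)

print(regexp_match("Order #123", r"\d+"))
print(regexp_extract("Order #123", r"#(\d+)"))
print(regexp_extract_nth("2024-05-11", r"(\d{4})-(\d{2})-(\d{2})", 2))
print(regexp_replace("555-123-4567", r"-", ""))
```

Notice that the extract functions return only what sits inside the parentheses, which is why a pattern passed to them needs at least one capturing group.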

Basic Regex Building Blocks

At the core of regex are the patterns. The basic building blocks for the patterns are:

  • Dot (.): This wildcard matches any single character (by default, anything except a line break).
  • Digits (\d): Matches any single digit from 0 to 9.
  • Word Characters (\w): Matches letters, digits, and the underscore (_).
  • Whitespace (\s): Matches spaces, tabs, and line breaks in your text.
  • Caret (^): This symbol is an anchor: it requires the pattern to appear at the start of the string. However, when placed first inside square brackets, it means "match any character that is not one of these."

Many more building blocks and rules exist, but these are enough to get you started.
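To see each building block in action, here is a small sketch using Python's re module on made-up sample strings. The samples are invented for illustration; re.findall returns every non-overlapping match, which makes it easy to see exactly what a pattern picks up.

```python
import re

# Dot (.): any single character, including a space, but not a newline by default.
dot = re.findall(r"c.t", "cat cot cut c t")
dot_newline = re.findall(r"a.b", "a\nb")

# Digits (\d): one digit at a time; \d+ grabs whole runs of digits.
digits = re.findall(r"\d", "Room 42B")
digit_runs = re.findall(r"\d+", "Rooms 42 and 107")

# Word characters (\w): letters, digits, and underscore (punctuation is excluded).
words = re.findall(r"\w+", "user_name, 2024!")

# Whitespace (\s): split on any run of spaces, tabs, or newlines.
tokens = re.split(r"\s+", "one two\tthree\nfour")

# Caret (^) as an anchor: the pattern must appear at the start of the string.
starts_hello = re.search(r"^Hello", "Hello world") is not None
starts_world = re.search(r"^world", "Hello world") is not None

# Caret first inside brackets ([^...]): any character NOT in the list.
not_vowels = re.findall(r"[^aeiou]", "tableau")

for label, result in [
    ("dot", dot), ("dot_newline", dot_newline), ("digits", digits),
    ("digit_runs", digit_runs), ("words", words), ("tokens", tokens),
    ("starts_hello", starts_hello), ("starts_world", starts_world),
    ("not_vowels", not_vowels),
]:
    print(label, result)
```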

Regex Examples

Here are a couple of use cases for regex in Tableau:

1. Extracting hashtags for trend analysis

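  • Scenario: Extract the first hashtag in each post.
  • Syntax: REGEXP_EXTRACT([Post], '#(\w+)')
  • Note: Only the part inside the parentheses is returned, so the result is the hashtag text without the # symbol.
  • Why regex: Pulls out a hashtag wherever it appears in the post, without knowing its position in advance.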
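Here is a minimal Python sketch of the same pattern, assuming a few made-up posts. The first_hashtag helper mirrors the Tableau calculation. The counting step goes a little further than a single REGEXP_EXTRACT call: Python's re.findall returns every hashtag in a post, which is handy for spotting trends. Lowercasing the tags is an assumption that "#Tableau" and "#tableau" should count as the same trend.

```python
import re
from collections import Counter
from typing import Optional

posts = [
    "Loving the new dashboard! #Tableau #DataViz",
    "Regex saved my afternoon #regex #tableau",
    "No hashtags in this one",
]

def first_hashtag(post: str) -> Optional[str]:
    # Same pattern as REGEXP_EXTRACT([Post], '#(\w+)'): return the captured
    # word after the first '#', or None when the post has no hashtag.
    match = re.search(r"#(\w+)", post)
    return match.group(1) if match else None

first_tags = [first_hashtag(p) for p in posts]

# Trend step: collect every hashtag from every post and count them,
# lowercased so differently capitalized tags are grouped together.
tag_counts = Counter(
    tag.lower() for p in posts for tag in re.findall(r"#(\w+)", p)
)

print(first_tags)
print(tag_counts.most_common())
```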

2. Cleaning and standardizing phone numbers

  • Scenario: Phone numbers often arrive in different formats, with inconsistent spacing and different characters around the area code, such as parentheses, dashes, or dots.
  • Syntax: REGEXP_REPLACE([Phone Number], '[^\d]', '')
  • Note: Placing a caret inside square brackets with \d says, "match anything that is not a digit".
  • Why regex: Cleans text efficiently by removing any non-digit characters.
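The cleaning step can be sketched in Python with re.sub and the same [^\d] pattern. The sample numbers are invented, and the format_us helper is a hypothetical extra step that assumes 10-digit North American numbers (with an optional leading country code 1) and rebuilds them as (555) 123-4567. Back-reference syntax such as \1 differs between regex engines, so treat that formatting step as Python-specific.

```python
import re
from typing import Optional

raw_numbers = ["(555) 123-4567", "555.123.4567", "555 123 4567", "+1 555-123-4567"]

def digits_only(phone: str) -> str:
    # Same idea as REGEXP_REPLACE([Phone Number], '[^\d]', ''):
    # every character that is not a digit is replaced with nothing.
    return re.sub(r"[^\d]", "", phone)

def format_us(phone: str) -> Optional[str]:
    # Hypothetical formatting step, assuming North American numbers.
    digits = digits_only(phone)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    if len(digits) != 10:
        return None  # leave anything unexpected for manual review
    # Capture area code, prefix, and line number, then reassemble them.
    return re.sub(r"(\d{3})(\d{3})(\d{4})", r"(\1) \2-\3", digits)

print([digits_only(n) for n in raw_numbers])
print([format_us(n) for n in raw_numbers])
```

Returning None for unexpected lengths keeps bad values visible instead of silently producing a wrong-looking number.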

FT Visual Vocabulary

The Financial Times (FT) Visual Vocabulary is a tool created primarily to help journalists choose the most appropriate chart for their data. It was inspired by Dr. Jon Schwabish and Severino Ribecca's Graphic Continuum.

The FT Visual Vocabulary helps present complex information to non-expert audiences. It groups charts by the kind of relationship you want to show (such as change over time, ranking, or distribution), helping you choose the right one for your story.

First introduced in 2016, the tool is part of a broader effort to make data visualization in journalism clearer and more effective. Its use has since spread well beyond journalism.

This is a great tool to leverage, especially if you're just starting your visualization journey.

Download a copy of the tool here.

Keep in mind, however, that this guide doesn't cover all situations. There might be times when the typical recommendation won't work well because of the specific data you have or the people who will be using it. Always keep an open mind, and be on the lookout for different possibilities.

Want to learn more?

Simple Techniques for bridging the graphics language gap

Visual Vocabulary Readme

Read Alan Smith's book: How Charts Work - Understand and Explain Data with Confidence

Thank you

I hope you found this edition informative.

I want to thank you for your support. It means a lot.

Until next time,

Donabel

Hi! I'm sqlbelle!

Join 4.4K subscribers who receive weekly, bite-sized data lessons, and practical SQL and Tableau tutorials | Subscribe for additional resources, and start with free tutorials at youtube.com/sqlbelle
