Working with untidy text + Visual Vocabulary - Learn data with sqlbelle 2024.05.11 edition


Hello Reader,

Tableau Tip - working with untidy data

Imagine you have to work with a lot of text: for example, you need to extract hashtags from posts or reformat phone numbers. Does Tableau have the functionality to help you?

The answer is yes - with regex.

What is Regex?

Regex, or Regular Expressions, is a powerful way to work with text.

Think of it as a search tool that goes way beyond finding simple text matches. It finds patterns within text, which can be incredibly useful for cleaning up and organizing data. Regex is used heavily in text analytics, cybersecurity (log analysis, intrusion detection), and many other fields.

Many languages and tools support regex, including Python, Perl, JavaScript, PHP, C# - and Tableau.

Why Regex Matters

Data isn't always neat and tidy. Sometimes, text data comes in a jumbled mess that needs sorting or cleaning.

Understanding regex can dramatically improve your handling of text data. It means spending less time cleaning your data and more time analyzing it.

Regex in Tableau

Tableau makes using regex simple with a few key functions:

  • REGEXP_MATCH(): Checks if part of your text matches a pattern.
  • REGEXP_EXTRACT(): Pulls a specific piece of the text based on a pattern.
  • REGEXP_EXTRACT_NTH(): Pulls the piece of text matched by the nth capturing group (the nth set of parentheses) in your pattern.
  • REGEXP_REPLACE(): Lets you swap parts of your text with something else.
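
If you want to try these ideas outside Tableau, here is a rough sketch of the same four operations using Python's built-in re module. This is Python, not Tableau syntax, and the sample text is made up for illustration. Tableau's regex functions are based on the ICU engine, so some details may behave differently from Python.

```python
import re

# Made-up sample text, for illustration only.
text = "Order A-1042 shipped to Vancouver on 2024-05-11"

# Similar in spirit to REGEXP_MATCH: is there a run of four digits anywhere?
has_four_digits = re.search(r"\d{4}", text) is not None

# Similar to REGEXP_EXTRACT: return what the parentheses captured.
order_match = re.search(r"([A-Z]-\d+)", text)
order_id = order_match.group(1) if order_match else None

# Similar to REGEXP_EXTRACT_NTH: return what the 2nd set of parentheses captured.
date_match = re.search(r"(\d{4})-(\d{2})-(\d{2})", text)
month = date_match.group(2) if date_match else None

# Similar to REGEXP_REPLACE: swap every digit for a '#' character.
masked = re.sub(r"\d", "#", text)

print(has_four_digits, order_id, month)
print(masked)
```

Notice that the extract examples rely on parentheses to mark the piece you want back.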

Basic Regex Building Blocks

At the core of regex are the patterns. The basic building blocks for the patterns are:

  • Dot (.): This wildcard character represents any single character.
  • Digits (\d): Matches any single digit from 0 to 9.
  • Word Characters (\w): Matches the characters that make up words: letters, digits, and underscores.
  • Whitespace (\s): Detects spaces, tabs, and breaks (new lines) in your text.
  • Caret (^): This symbol is your anchor, making sure the pattern appears at the start of the string. However, if placed first inside square brackets, it means "match any character that is not in this set."

Many more building blocks and rules exist, but this can get you started.
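
To see the building blocks in action, here is a small sketch in Python (not Tableau) that applies each one to a made-up string. The variable names and sample text are only for illustration.

```python
import re

# Made-up sample: contains a space, digits, an underscore, and a tab.
sample = "Room 42_b\tis ready"

# Dot: "4" followed by any single character.
any_char = re.findall(r"4.", sample)

# \d: each individual digit.
digits = re.findall(r"\d", sample)

# \w+: runs of letters, digits, and underscores.
words = re.findall(r"\w+", sample)

# \s: spaces, tabs, and line breaks.
spaces = re.findall(r"\s", sample)

# Caret as an anchor: does the string start with "Room"?
starts_with_room = bool(re.search(r"^Room", sample))

# Caret inside square brackets: anything that is NOT a word character or whitespace.
leftovers = re.findall(r"[^\w\s]", "a-b, c!")

print(any_char, digits, words, spaces, starts_with_room, leftovers)
```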

Regex Examples

Here are a couple of use cases for regex in Tableau:

1. Extracting hashtags for trend analysis

  • Scenario: Extract the first hashtag in a post
  • Syntax: REGEXP_EXTRACT([Post], '#(\w+)')
  • Why regex: Efficiently pulls out hashtags that can be dynamically located anywhere within a post.
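
The same pattern can be tried in Python as a quick sanity check. This is a sketch, not Tableau code: first_hashtag and the sample posts are hypothetical names and data made up for illustration. Because the parentheses wrap only \w+, the result is the hashtag text without the # symbol, and as far as I know Tableau's REGEXP_EXTRACT returns the captured group in the same way.

```python
import re

def first_hashtag(post):
    """Return the first hashtag in post without the leading '#', or None."""
    match = re.search(r"#(\w+)", post)
    return match.group(1) if match else None

# Made-up sample posts.
tagged = "Loving the new dashboard #dataviz #tableau"
untagged = "No tags in this one"

print(first_hashtag(tagged))
print(first_hashtag(untagged))

# If you need every hashtag rather than just the first, findall collects them all.
all_tags = re.findall(r"#(\w+)", tagged)
print(all_tags)
```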

2. Cleaning and standardizing phone numbers

  • Scenario: Phone numbers might come in different formats, with different spacing and different characters (parentheses, dashes, dots) around the area code
  • Syntax: REGEXP_REPLACE([Phone Number], '[^\d]', '')
  • Note: Placing a caret inside square brackets with \d says, "match anything that is not a digit".
  • Why regex: Cleans text efficiently by removing any non-digit characters.
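
Here is the same cleanup sketched in Python, so you can see what the pattern does to a few made-up phone numbers. The function names digits_only and format_ten_digits are hypothetical, and the second step assumes a 10-digit number, which will not hold for every dataset.

```python
import re

def digits_only(phone):
    """Remove every character that is not a digit."""
    return re.sub(r"[^\d]", "", phone)

def format_ten_digits(digits):
    """Reformat exactly 10 digits as (XXX) XXX-XXXX; return anything else unchanged."""
    if re.fullmatch(r"\d{10}", digits):
        return re.sub(r"(\d{3})(\d{3})(\d{4})", r"(\1) \2-\3", digits)
    return digits

# Made-up sample numbers in different formats.
raw_numbers = ["(604) 555-0199", "604.555.0199", "604 555 0199", "+1 604-555-0199"]

cleaned = [digits_only(n) for n in raw_numbers]
formatted = [format_ten_digits(d) for d in cleaned]

print(cleaned)
print(formatted)
```

Notice that the last number keeps its leading 1 after cleaning. Removing non-digit characters gets you most of the way, but country codes and extensions may still need their own rule.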

FT Visual Vocabulary

The Financial Times (FT) Visual Vocabulary is a tool created primarily to help journalists choose the most appropriate chart for their data. It was inspired by the Graphic Continuum by Dr. Jon Schwabish and Severino Ribecca.

The FT Visual Vocabulary helps present complex information to non-expert audiences. It organizes various charts based on what you need them for, helping you choose the right one for your story.

Initially introduced in 2016, the tool is part of a broader initiative to enhance the clarity and effectiveness of data visualization in journalism. Since then, its use has spread well beyond journalism.

This is a great tool to leverage, especially if you're just starting your visualization journey.

Download a copy of the tool here.

Keep in mind, however, that this guide doesn't cover all situations. There might be times when the typical recommendation won't work well because of the specific data you have or the people who will be using it. Always keep an open mind, and be on the lookout for different possibilities.

Want to learn more?

Simple Techniques for bridging the graphics language gap

Visual Vocabulary Readme

Read Alan Smith's book: How Charts Work - Understand and Explain Data with Confidence

Thank you

I hope you found this edition informative.

I want to thank you for your support. It means a lot.

Until next time,

Donabel

Learn Practical Data Skills

Join 5K+ subscribers who receive weekly, bite-sized, practical and actionable lessons for the data professional. | Free video tutorials at youtube.com/sqlbelle | Teaching data? Incorporate AI - tips and prompts at https://teachdatawithai.substack.com/
