Working with untidy text + Visual Vocabulary - Learn data with sqlbelle 2024.05.11 edition


Hello Reader,

Tableau Tip - working with untidy data

Imagine you have to work with a lot of text, for example, extracting hashtags from posts or reformatting phone numbers. Does Tableau have the functionality to help you?

The answer is yes - with regex.

What is Regex?

Regex, or Regular Expressions, is a powerful way to work with text.

Think of it as a search tool that goes way beyond finding simple text matches. It finds patterns within text, which can be incredibly useful for cleaning up and organizing data. Regex is used heavily in text analytics, cybersecurity (log analysis, intrusion detection), and many other fields.

Many languages and tools support regex, including Python, Perl, JavaScript, PHP, C# - and Tableau.

Why Regex Matters

Data isn't always neat and tidy. Sometimes, text data comes in a jumbled mess that needs sorting or cleaning.

Understanding regex can dramatically improve your handling of text data. It means spending less time cleaning your data and more time analyzing it.

Regex in Tableau

Tableau makes using regex simple with a few key functions:

  • REGEXP_MATCH(): Checks if part of your text matches a pattern.
  • REGEXP_EXTRACT(): Pulls a specific piece of the text based on a pattern.
  • REGEXP_EXTRACT_NTH(): Pulls the piece of the text matched by the nth capture group (the nth set of parentheses) in the pattern.
  • REGEXP_REPLACE(): Lets you swap parts of your text with something else.
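
If you want to try these ideas outside Tableau, here is a rough sketch using Python's built-in re module. The Python calls are only loose analogues of the Tableau functions, not the same thing, and the sample text and variable names are made up for illustration.

```python
import re

# Hypothetical sample text, for illustration only.
text = "Order A-1042 shipped to Seattle"

# Loosely like REGEXP_MATCH: is the pattern found anywhere in the text?
has_order_id = re.search(r"[A-Z]-\d+", text) is not None

# Loosely like REGEXP_EXTRACT: return what the capture group matched.
m = re.search(r"([A-Z]-\d+)", text)
order_id = m.group(1) if m else None

# Loosely like REGEXP_EXTRACT_NTH: return what the 2nd capture group matched.
m = re.search(r"([A-Z])-(\d+)", text)
order_number = m.group(2) if m else None

# Loosely like REGEXP_REPLACE: swap every digit for a '#'.
masked = re.sub(r"\d", "#", text)

print(has_order_id, order_id, order_number, masked)
```

The parentheses in each pattern are capture groups. They mark the piece of the match you want back.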

Basic Regex Building Blocks

At the core of regex are the patterns. The basic building blocks for the patterns are:

  • Dot (.): This wildcard character represents any single character.
  • Digits (\d): Matches any single digit from 0 to 9.
  • Word Characters (\w): Matches letters, digits, and underscores.
  • Whitespace (\s): Detects spaces, tabs, and breaks (new lines) in your text.
  • Caret (^): This symbol is your anchor, making sure the pattern appears at the start of the string. However, if placed inside square brackets, it means "match anything that is not in these characters."

Many more building blocks and rules exist, but this can get you started.
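
To see the building blocks in action, here is a small sketch in Python's re module (not Tableau). The sample string is made up, and the same patterns should behave similarly in most regex engines, though details can vary.

```python
import re

# Hypothetical sample text with a line break in the middle.
sample = "Room_7 opens at 9 AM\nBring ID"

# \d matches single digits.
digits = re.findall(r"\d", sample)

# \w+ matches runs of letters, digits, and underscores.
words = re.findall(r"\w+", sample)

# \s matches spaces, tabs, and line breaks.
whitespace = re.findall(r"\s", sample)

# The dot stands in for any single character (here, the space in "9 AM").
dot_match = re.findall(r"9.AM", sample)

# ^ anchors the pattern to the start of the string.
starts_with_room = re.search(r"^Room", sample) is not None

# Inside square brackets, ^ means "anything that is NOT one of these".
non_digits = re.findall(r"[^\d]", "a1b2")

print(digits, words, len(whitespace), dot_match, starts_with_room, non_digits)
```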

Regex Examples

Here are a couple of use cases for regex in Tableau:

1. Extracting hashtags for trend analysis

  • Scenario: Extract the first hashtag in a post
  • Syntax: REGEXP_EXTRACT([Post], '#(\w+)')
  • Why regex: Efficiently pulls out hashtags that can be dynamically located anywhere within a post.
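
Here is the same pattern sketched in Python's re module, in case you want to test it before building the Tableau calculation. The function name first_hashtag and the sample posts are hypothetical.

```python
import re

def first_hashtag(post):
    """Return the first hashtag in the post without the '#', or None."""
    # '#' matches the literal hash sign; (\w+) captures the word after it.
    match = re.search(r"#(\w+)", post)
    return match.group(1) if match else None

print(first_hashtag("Loving the new dashboard #tableau #dataviz"))
print(first_hashtag("No tags in this one"))
```

Because the parentheses wrap only \w+, the captured text leaves out the # sign itself.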

2. Cleaning and standardizing phone numbers

  • Scenario: Phone numbers might come in different formats, with inconsistent spacing and different characters around area codes
  • Syntax: REGEXP_REPLACE([Phone Number], '[^\d]', '')
  • Note: Placing a caret inside square brackets with \d says, "match anything that is not a digit".
  • Why regex: Cleans text efficiently by removing any non-digit characters.
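
The same cleanup can be sketched in Python's re module. The function name digits_only and the sample phone numbers are made up for illustration.

```python
import re

def digits_only(phone):
    """Remove every character that is not a digit."""
    # [^\d] matches any single non-digit; replace each one with nothing.
    return re.sub(r"[^\d]", "", phone)

for raw in ["(555) 123-4567", "555.123.4567", "+1 555 123 4567"]:
    print(raw, "->", digits_only(raw))
```

Notice that the third number keeps its leading country code, so a real cleanup may need one more step to handle that consistently.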

FT Visual Vocabulary

The Financial Times (FT) Visual Vocabulary is a tool created primarily to help journalists choose the most appropriate chart for their data. It was inspired by the Graphic Continuum by Dr. Jon Schwabish and Severino Ribecca.

The FT Visual Vocabulary helps present complex information to non-expert audiences. It organizes various charts based on what you need them for, helping you choose the right one for your story.

Initially introduced in 2016, the tool is part of a broader initiative to enhance the clarity and effectiveness of data visualization in journalism. Since then, it has gone beyond just journalism.

This is a great tool to leverage, especially if you're just starting your visualization journey.

Download a copy of the tool here.

Keep in mind, however, that this guide doesn't cover all situations. There might be times when the typical recommendation won't work well because of the specific data you have or the people who will be using it. Always keep an open mind, and be on the lookout for different possibilities.

Want to learn more?

Simple Techniques for bridging the graphics language gap

Visual Vocabulary Readme

Read Alan Smith's book: How Charts Work - Understand and Explain Data with Confidence

Thank you

I hope you found this edition informative.

I want to thank you for your support. It means a lot.

Until next time,

Donabel
