Filter Out Rows Using Regex


Using Regex

Using Regular Expressions allows you to analyze a value and find a match according to some logic. It's a pretty vast topic, so I won't get into it here but you should check out regex101.com to play around with your regex formula.

As a very high-level explanation, you can use regex to find partial matches within a value to help you determine if one value is similar to another.

For example, if I wanted to see which one of these values contained the term "apple" in them, I could use regex to quickly determine that:

let values = ['apple pie', 'apple tart', 'banana bread'] let regex = new RegExp(/apple/) console.log(values.map(value => regex.test(value))) // Output: [ true, true, false ]

Using the .filter() method with Regular Expressions

When you combine regex with the .filter() method, you get a powerful tool for filtering out your data using fuzzy matching. Let's take the same example from above and instead of outputting whether or not it matched, we'll return the values that matched our regex.

let values = ['apple pie', 'apple tart', 'banana bread'] let regex = new RegExp(/apple/) let filtered = values.filter((x) => regex.test(x)) console.log(filtered) // Output: [ 'apple pie', 'apple tart' ]

How to use this code snippet

Our filterOutWithRegex() code snippet takes in 4 parameters - the data set we're looking at, the target column number as well as the regex, and the regex options we're using.

We're using a data set instead of a value because you're liking doing this for a larger data set that you probably got from a CSV or a Google Sheet.

The target column number represents the Column in your data you want to run the regex against. Column A = 1, Column B = 2, and so on.

For the regex pattern and the regex options - you can use regex101.com to help you determine what those should be. I like to always use i in my regex options because it makes my regex case-insensitive.

function filterOutWithRegex(data, targetColumn, regexPattern, regexOptions) { // For targetColumn, start from position 1 - Column A = 1, Column B = 2, etc // For help with regex, try regex101.com const regex = new RegExp(regexPattern, regexOptions) const filteredData = data.filter((row) => {return regex.test(row[targetColumn-1]) == false}) return filteredData }

In this code snippet, we are specifically filtering OUT anything that matches our regex. Meaning, we'll be removing the rows where our target column matches the regex pattern we've put in.

Let's take this data set as the example:

let data = [ ['Pastry Name', 'Type'], ['Apple Tart', 'Sweet'], ['Blackberry Tart', 'Sweet'], ['Aloo Pie', 'Savory'], ['Quiche', 'Savory'], ['Tiropita', 'Savory'], ['Pirog', 'Sweet/Savory'], ['Pirozhki', 'Sweet or Savory'] ]

In this data set, we have a bunch of pastries and their types. Some are sweet, some are savory, some can be either. Let's say we wanted to filter out anything that could be either because we don't like pastries that are confused and can't commit to a type.

If we simply filtered the list by type == 'Sweet/Savory', we would only get back Pirog - but not the Pirozhki. We want both of those results back, but the way the types are entered are slightly different. This is where regex comes in handy!

Here's how we can write out regex pattern:

new RegExp(/(sweet.*savory)|(savory.*sweet)/, 'i')

This will match it up regardless of how it's entered as long as the word 'Sweet' and the word 'Savory' are both found.

You can run this using our code snippet by calling the function like this:

filterOutWithRegex(data, 2, /(sweet.*savory)|(savory.*sweet)/, 'i') // Output: [ [ 'Pastry Name', 'Type' ], // [ 'Apple Tart', 'Sweet' ], // [ 'Blackberry Tart', 'Sweet' ], // [ 'Aloo Pie', 'Savory' ], // [ 'Quiche', 'Savory' ], // [ 'Tiropita', 'Savory' ] ]

If you wanted to do this the other way around where you filter FOR the values that match your regex, you just need to update the code snippet to match on true instead of false. Like this:

function filterForWithRegex(data, targetColumn, regexPattern, regexOptions) { // For targetColumn, start from position 1 - Column A = 1, Column B = 2, etc // For help with regex, try regex101.com const regex = new RegExp(regexPattern, regexOptions) const filteredData = data.filter((row) => {return regex.test(row[targetColumn-1]) == true}) return filteredData } let data = [ ['Pastry Name', 'Type'], ['Apple Tart', 'Sweet'], ['Blackberry Tart', 'Sweet'], ['Aloo Pie', 'Savory'], ['Quiche', 'Savory'], ['Tiropita', 'Savory'], ['Pirog', 'Sweet/Savory'], ['Pirozhki', 'Sweet or Savory'] ] filterForWithRegex(data, 2, /(sweet.*savory)|(savory.*sweet)/, 'i') // Output: [ [ 'Pirog', 'Sweet/Savory' ], // [ 'Pirozhki', 'Sweet or Savory' ] ]
thumbnail

NitroGAS Chrome Extension

Want to reference this code snippet right from your Google Apps Script Project? Check out our NitroGAS Chrome Extension. This tool will help you build your scripts faster than you could imagine. The tool itself and access to the code snippets are 100% FREE - Happy Coding!

Get the Extension