Keeping code simple with regular expressions

Scenarios where RegEx fits better than if/else statements and loops

A single regular expression can replace multiple conditionals, loops and string function calls, making the code simpler. A one-line regex often looks more elegant and can be easier to read than the equivalent procedural code.

Here are some examples. The first three are PHP and JavaScript problems, most shown first with a conventional solution and then with a regex solution.

The other three show regular expressions in an SQL database, the Apache and Nginx web servers, and the Linux shell.

Table of Contents

  1. Time to read an article
  2. Gmail username validation
  3. IP address validation
  4. RegExp in SQL
  5. RegEx in Apache, Nginx webserver
  6. Linux Shell

Example 1:

Time to read an article

According to a study by M. Brysbaert in the Journal of Memory and Language, we read about 238 words per minute. This function returns the number of minutes needed to read the input text.

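PHP (without regular expressions)

function minutesToRead($text){
    // str_word_count() returns the number of words in a string
    $total_words = str_word_count($text);
    $minutes_to_read = round($total_words / 238);

    // Never report less than one minute
    return max($minutes_to_read, 1);
}

echo minutesToRead($content) . ' min read';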

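Instead of calling a word-counting function, the regex versions count the whitespace characters (\s) in the text. This is an approximation: words separated by single spaces produce one match fewer than the number of words, and runs of spaces or line breaks are counted more than once. Matching \w+ instead counts the words themselves.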

PHP (regex)

function minutesToRead($text){
   $total_words = preg_match_all('/\s/', $text, $match);
   return max(round($total_words / 238), 1);
}

JavaScript (regex)

function minutesToRead(text){
   // match() returns null when there is no whitespace at all, so fall back to []
   const word_count = (text.match(/\s/g) || []).length;
   return Math.max(Math.round(word_count / 238), 1);
}

PHP's preg_match_all returns the number of times the pattern matches. In JavaScript, the global flag g makes match() return all matches instead of only the first one.

If the text contains HTML tags, remove them first: in PHP with strip_tags, and in JavaScript with one of these regular expressions:

/<[\w\s"-.=%#;'“”!?…{}()\d:\/]+>/g
OR
/<[^<]+>/g
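Putting both steps together, here is a minimal JavaScript sketch (not from the original solutions) that strips tags with the second regex above and then counts words with \w+ rather than whitespace. The name readingTime and its wordsPerMinute parameter are hypothetical, with the default taken from the 238 words per minute cited earlier. Keep in mind that \w+ splits a contraction such as don't into two words, and that stripping HTML with a regex is only a rough approximation.

```javascript
// Hypothetical helper: estimated reading time, in minutes, of an HTML string
function readingTime(html, wordsPerMinute = 238) {
    // Replace anything that looks like a tag (e.g. <p>, </a>) with a space
    const text = html.replace(/<[^<]+>/g, ' ');
    // Count runs of word characters; match() returns null when there are none
    const wordCount = (text.match(/\w+/g) || []).length;
    // Round to whole minutes, but never report less than one minute
    return Math.max(Math.round(wordCount / wordsPerMinute), 1);
}

console.log(readingTime('<p>Hello <b>regex</b> world</p>') + ' min read');
```

Tags are replaced with a space rather than an empty string, so text from adjacent elements such as <p>one</p><p>two</p> is not glued into a single word.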

Example 2:

Gmail username validation

A username must follow these rules:

  • begins with an English letter
  • only contains English letters, digits and dot (.)
  • minimum 6, maximum 30 characters long

A non-regex solution would need separate code for each rule: converting the string to an array, calling a filter function and writing several conditionals to implement all the validation rules.

For brevity, I will go straight to the regex solution.

PHP

function isValidUsername($username)
{
    return preg_match("/^[a-z][a-z0-9.]{5,29}$/i", $username) === 1;
}

JavaScript

function usernameIsValid(username){
    return /^[a-z][a-z0-9.]{5,29}$/i.test(username);
}

  • ^[a-z] ensures the username begins with a letter from a to z.
  • [a-z0-9.] allows only letters, digits and dots in the rest of the username.
  • {5,29} requires 5 to 29 of those characters, so the whole username is 6 to 30 characters long.
  • $ anchors the match at the end of the string, so nothing else can follow.
  • The i flag makes the match case-insensitive, so A to Z are accepted too.
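A quick way to confirm that the pattern enforces all three rules is to run it over a few sample usernames. The samples and expected results below are made up for illustration; the function is the JavaScript one shown earlier.

```javascript
function usernameIsValid(username) {
    return /^[a-z][a-z0-9.]{5,29}$/i.test(username);
}

// Made-up sample usernames paired with the expected result
const samples = [
    ['john.doe42', true],           // starts with a letter, 10 characters
    ['JaneDoe', true],              // uppercase passes thanks to the i flag
    ['1johndoe', false],            // starts with a digit
    ['john_doe', false],            // underscore is not allowed
    ['jdoe', false],                // only 4 characters
    ['a' + 'b'.repeat(30), false],  // 31 characters, one too many
];

for (const [name, expected] of samples) {
    const actual = usernameIsValid(name);
    console.log(`${name}: ${actual}${actual === expected ? '' : ' (unexpected)'}`);
}
```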

Example 3:

IP address validation

An IPv4 address consists of four 8-bit unsigned integers (0 to 255, the largest 8-bit value) separated by dots (.).

Examples:

  • 192.168.0.1 is a valid IPv4 address

  • 255.255.255.255 is a valid IPv4 address

  • 257.100.92.101 is not a valid IPv4 address because 257 is too large to be an 8-bit integer

  • 255.100.81.160.172 is not a valid IPv4 address because it contains more than four integers

  • 1..0.1 is not a valid IPv4 address because one of its parts is empty

  • 17.233.00.131 and 17.233.01.131 are not valid IPv4 addresses as both contain leading zeros

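JavaScript (without regular expressions)

function isIPv4Address(inputString) {
    const ip = inputString.split('.');
    // Each part must be non-empty, digits only, free of leading zeros and at most 255
    return ip.length === 4 && ip.every(e =>
        e.length > 0 &&
        [...e].every(c => c >= '0' && c <= '9') &&
        String(Number(e)) === e &&
        Number(e) <= 255);
}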

PHP's filter_var has a built-in IP validator, so there is no need to write a regex here.

PHP

// Returns the address if valid, otherwise false; false here because "00" has a leading zero
filter_var("192.168.00.1", FILTER_VALIDATE_IP, FILTER_FLAG_IPV4);

JavaScript (regex)

function isIPv4Address(inputString) {
    const ip = inputString.split('.');
    if (ip.length !== 4) return false;
    // Every part must be a decimal number from 0 to 255 without leading zeros
    return ip.every(e => /^([1-9]?[0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])$/.test(e));
}

The IP address is split on dots into four strings, and the regular expression checks that each string is a number from 0 to 255 without leading zeros. Unlike the non-regex solution, there is no string-to-number conversion.

  • [1-9]?[0-9] matches numbers from 0 to 99 (no leading zero)

  • 1[0-9][0-9] matches numbers from 100 to 199

  • 2[0-4][0-9] matches numbers from 200 to 249

  • 25[0-5] matches numbers from 250 to 255

  • | means OR, and ^ and $ anchor the match to the beginning and end of the string
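Splitting is not strictly required: the octet alternatives can be repeated inside a single pattern that checks the whole address at once. This is a variant of my own rather than part of the solution above, and isIPv4AddressRegex is a hypothetical name; the test addresses are the examples listed earlier.

```javascript
// One octet: 0-99, 100-199, 200-249 or 250-255, with no leading zeros
const OCTET = '(?:[1-9]?[0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])';
// Four octets separated by literal dots, anchored at both ends
const IPV4 = new RegExp(`^${OCTET}(?:\\.${OCTET}){3}$`);

function isIPv4AddressRegex(inputString) {
    return IPV4.test(inputString);
}

for (const ip of ['192.168.0.1', '255.255.255.255', '257.100.92.101',
                  '255.100.81.160.172', '1..0.1', '17.233.00.131']) {
    console.log(ip, isIPv4AddressRegex(ip));
}
```

The (?:...) groups are non-capturing because only a yes or no answer is needed, and building the pattern from the OCTET string avoids writing the octet alternatives four times.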

Example 4:

RegExp in SQL

For example, to shorten the first name in the name column of a table to its initial:

MySQL query

SELECT 
    id,
    name,
    REGEXP_REPLACE(name, '(.{1})([a-z]*)(.*)$','$1\.$3') AS REGEXP_name 
FROM students;

Result

id    name                REGEXP_name
33    Lesa Barnhouse      L. Barnhouse
38    Kurtis Saulters     K. Saulters
40    Charisse Lake       C. Lake

  • (.{1}) group 1 matches the first character of the name
  • ([a-z]*) group 2 matches the letters that follow, up to the space
  • (.*) group 3 matches the rest of the name, up to the end
  • $1\.$3 replaces the match with group 1, a dot and group 3
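JavaScript's String.prototype.replace uses the same $1-style group references, so the pattern can be tried out without a database. This is only a sketch for experimenting, not how MySQL evaluates the query; abbreviateFirstName is a hypothetical name, and the sample names come from the result table.

```javascript
// Same pattern and replacement as the MySQL query: first character,
// then the rest of the first name, then everything after it
function abbreviateFirstName(name) {
    return name.replace(/(.{1})([a-z]*)(.*)$/, '$1.$3');
}

['Lesa Barnhouse', 'Kurtis Saulters', 'Charisse Lake'].forEach(name => {
    console.log(`${name} -> ${abbreviateFirstName(name)}`);
});
```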

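Note: REGEXP_REPLACE is available from MySQL 8.0. Regular expression support varies between databases and versions: MySQL versions before 8.0 support POSIX character classes such as [[:alpha:]] but not shorthand classes like \w. More details are in the MySQL regular expressions manual and O'Reilly's Regular Expressions Cookbook.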

Example 5:

RegEx in Apache, Nginx webserver

For example, suppose a blog serves articles from URIs like articles.php?article_id=123 and switches to SEO-friendly URIs like articles/category/title-of-article_123.html. Each article then has its own page whose URL contains both its id and relevant keywords.

The web server can match the new URLs with a regex, capture the id and pass it to the script, so the script's output is served under the new URL.

Apache2

RewriteEngine On
RewriteRule "_([0-9]+)\.html$" "/articles.php?article_id=$1"

Nginx

rewrite "_([0-9]+)\.html$" "/articles.php?article_id=$1";
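Before editing the server configuration, the pattern can be checked against sample URLs. Apache's mod_rewrite and Nginx both use PCRE, and this simple pattern behaves the same way in JavaScript. The dot in .html is escaped so that it only matches a literal dot. In this sketch, rewriteTarget is a hypothetical helper that mimics the rule for a single path, and the sample paths are made up.

```javascript
// Mimics the rewrite rule: capture the digits between "_" and ".html"
const ARTICLE_URL = /_([0-9]+)\.html$/;

function rewriteTarget(path) {
    const match = path.match(ARTICLE_URL);
    // No match means this rule would leave the URL alone
    return match ? `/articles.php?article_id=${match[1]}` : null;
}

console.log(rewriteTarget('/articles/regex/keeping-code-simple_123.html'));
// prints /articles.php?article_id=123
console.log(rewriteTarget('/articles/regex/keeping-code-simple_123xhtml'));
// prints null: the escaped dot only matches a literal "."
```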

Example 6:

Linux Shell

Regex can save the hassle of opening a file and searching or scrolling for a directive or setting in it. Instead, use a regular expression to match a text pattern in the file and print the matching lines straight to the terminal.

For example, to find the value of the AllowOverride directive in the Apache configuration file:

grep -C 2 'AllowOverride' /etc/apache2/apache2.conf

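The -C 2 option prints two lines of context before and after each matching line. The pattern AllowOverride matches that text anywhere in a line (add -w to match it only as a whole word). The output includes: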

<Directory /var/www/>
    Options Indexes FollowSymLinks
    AllowOverride None
    Require all granted
</Directory>

To find PHP's maximum upload file size without opening the long php.ini configuration file:

grep 'upload.*size' /usr/local/etc/php/php.ini

The output includes the line upload_max_filesize = 2M.
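To see why upload.*size finds that setting: . matches any single character except a newline, and * repeats it zero or more times, so the pattern matches any line in which upload appears somewhere before size. The sketch below imitates grep's basic line filtering in JavaScript; grepLines and the php.ini excerpt are made up for illustration, and real grep does much more.

```javascript
// Hypothetical grep-like helper: return the lines that contain a match.
// The pattern should not use the g flag: with g, test() keeps lastIndex
// between calls and can skip matches.
function grepLines(text, pattern) {
    return text.split('\n').filter(line => pattern.test(line));
}

// Made-up excerpt in the style of php.ini
const phpIni = [
    '; Maximum allowed size for uploaded files.',
    'upload_max_filesize = 2M',
    'post_max_size = 8M',
    'max_file_uploads = 20',
].join('\n');

// Logs an array containing only 'upload_max_filesize = 2M'
console.log(grepLines(phpIni, /upload.*size/));
```

The comment line does not match because size appears before uploaded there, and max_file_uploads has no size after upload.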

More information about grep is in the GNU grep manual and the grep man page.

Conclusion

Learning basic regex syntax and exploring different use cases builds a sense of what regular expressions make possible. Knowing where to use them in coding and problem-solving helps you write simpler code, and elegant, readable code is a bonus.

I will write a second article about regex basics. If you have any comments or a better regex, please share.

Header photo by M. Dziedzic