paul bennett

MySQL query speed: REGEXP vs. LIKE

Posted on: February 28, 2008

MySQL’s REGEXP function is handy when you need to search for specific text strings within a textual column, but over large record sets, it’s performance can be less than fantastic.

Recently I had a situation at work where I needed to do a first name / last name search across a single name field. Due to poor database design, the name field was in the form “lastname, first names”. Initially I used REGEXP to come up with something like this:

Select table1.some-data, table2.some-other-data
from
table1, table2
where
table1.name REGEXP "^[ a-z]{0,},[ a-z]{0,}'.$firstName.'[ a-z]{0,}$"
and
table1.name REGEXP "^[ a-z]{0,}'.$lastName.'[ a-z]{0,},[ a-z]{0,}$"
and
table1.key = table2.foreignKey

Our testing showed that this query produced a good quality result for name searches, but I had a gut feeling it would slow the database server considerably when under a bit of load.

Testing using JMeter confirmed this hunch and I was faced with the task of optimising the query without affecting the quality of the results.

What to do? What would give the pattern-matching punch of REGEXP without killing the server?

Enter the LIKE function. I’ve used LIKE before but didn’t realise how powerful (and fast) it was. After a bit of tweaking I cam up with the following:

Select table1.some-data, table2.some-other-data
from
table1, table2
where
table1.name LIKE "%,%'.$firstName.'%"
and
table1.name LIKE "%'.$lastName.'%,%"
and
table1.key = table2.foreignKey

I had (falsely) thought initially that LIKE‘s wild card characters (the “%” in the query) could only occur at the start or end of a query, but it turns out they can be used for all sorts of pattern matching goodness. The query using LIKE turned out the same results as the query using REGEXP, only it was at least 10 times faster.

The downside with LIKE‘s pattern matching (which didn’t affect my query) is that it’s wild card will match any character, whereas REGEXP can be tuned using normal regular expressions to providfe much more granular query control.

If you just need fast, ‘any character’ pattern matching though, LIKE is your friend.

13 Responses to "MySQL query speed: REGEXP vs. LIKE"

Hey Paul,

thanks for the post. We have a similar problem with our database. I ended up using somthing like

SELECT * FROM users WHERE concat(firstname,’ ‘,lastname) LIKE ‘%$name%’

REGEXP is very nice if you require more complex search queries, but the 10x slow down sucks.

Hi babak,

Thanks for the comment. Nice to know I’m not the only one finding out that REGEXP is reaaally slow in MySQL πŸ™‚

Well don’t forget to put Multiple-index to those 2 fields for further optimizations πŸ™‚

Select table1.some-data, table2.some-other-data
from
table1, table2
where
table1.name LIKE “%’.$lastName.’%,%’.$firstName.’%”
and
table1.key = table2.foreignKey

Should work too, though I am not sure wether it will be slower or faster than your query.

Hi Paul,

Second time today i’ve googled for a specific answer to a question and landed at your website.

Both times (can’t remember the first item) i have ended up here on your website to find the answer in plain and simple english.

Simple, straight to the point, and no needless ramblings..

Thank you.

@t31os,

Thanks, I try to “cut to the chase” as much as I can

πŸ™‚
Paul

I am here because I needed help with a very similar query, Thanks for the concise explanation of both query examples, you used.

Good discussion. But what about when matching a text fields and selecting only starting with digits, what would be best, regex or like?

For example, 1%, 2%, 3% vs. ^[0-9]

Any comments?

Hi Erik,

Not sure offhand. The test case I put up was matching text fields but not fields starting with numeric characters.

For this case, you could also try (untested) something like.
select * from tableName where substring(textfieldname, 1) regexp "^[0-9]"
or
select * from tableName where substring(textfieldname, 1) in (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
which would probably still run faster than a regexp full table scan.

πŸ™‚
Paul

Was looking for a speedy replacement for the LIKE operator and was hoping REGEX was the ticket. Thanks for doing the leg work!

Hello Paul,

I don’t know maybe regex could be faster if it’s like this , haven’t tested the speed thou.

“table1.name REGEXP ‘^.*(.*,”.$firstName.”).*$'”

This will be just matching one or more occurrence.
Anyways, great explanation!

chris

In a table with 11 million rows, my regexp is faster than a like query. But then again it’s a search over an INT column, I don’t really know if that should make a difference.

For reference: WHERE number REGEXP(‘^1234[5-6]{5}$’) vs. WHERE number LIKE ‘1234%’;

Leave a comment

Archives