Home > Databases, Programming > SHA1 Speed Differences [PHP vs MySQL] – Part 2

SHA1 Speed Differences [PHP vs MySQL] – Part 2

Back again for the second part of SHA1 Speed Differences. Who will win this battle between PHP vs MySQL? This post is a continuation of SHA1 Speed Differences [PHP vs MySQL] – Part 1 directly replying to the post MD5/SHA1 Encryption in PHP vs. MySQL.

As discussed in part 1, using an index on the digest field brings us to a reasonable results. This post will be based around differences in speed performance between PHP vs MySQL. It is based on a lot of testing but I won’t annoy you with a lot of numbers.

First thing I noticed was, that if I increased the number of records from 500 to 1000 and even to 2000, the increase in query time was becoming drastically large. One problem was due to the index, which I’ve explained in part 1 but the difference in PHP vs MySQL was becoming evident. The table had no index for both queries which lead  me to think that MySQL version had something wrong. To cut it short, when MySQL manages to get the result spot on with the index or few records (like 10-100), the speed difference between PHP vs MySQL is negligible, but when the amount of records increase, MySQL SHA1 was really slowing down. The results I kept getting were as follows (only select queries done):

Rows PHP MySQL
500 3.947ms 5.273ms
1000 4.884ms 9.603ms
2000 5.856ms 15.507ms

After I’ve identified that SHA1 had a problem dealing with thousands of records, I had to identify why it was slow. My first thought was that it was computing SHA1 for each record, seeing that the results above show an increase in execution time by nearly 3 times as much as PHP’s version. Considering PHP query was only doing a simple string compare for every record traversed while MySQL was doing an SHA1 function plus a string compare for each record thus slowing much more than PHP.

Deterministic or Not?

I didn’t want to accept this as SHA1 function is deterministic were if you give the same string to hash, it will always give you the same digest in return. Usually deterministic function like this, are only executed once as the result will be the same for all further checks. The results show differently and it seems it is really executing the SHA1 function for each record. The question was, how do we identify if it is really deterministic or not?

At first I thought I’d do a simple SHA1 query (ex. ‘SELECT SHA1(‘digist_string’);’) before the actual queries which would make it much more similar to the PHP version. This would be identical to PHP version but involved an extra query. A second idea had been put through one of my colleagues which was actually what I was searching for. A sub-select was the answer. In DB terms and optimizations, a sub-select is ran before and the return is used to continue the traversal bringing us to only a string comparison. So ready for another test batch with our ideal way of querying MySQL to identify if SHA1 is being used as deterministic or not was on its way.

The change in code was simple. MySQL query was changed from ”SELECT identity FROM test_table WHERE digest = sha1(?)” to “SELECT identity FROM test_table WHERE digest = (SELECT sha1(?))”. I know you may think this is babish and maybe not necessary but the results proof otherwise as the MySQL optimizer works different than our elementary reasoning. The test that follow are based on the basic table structure and a non indexed digest field. Only select queries were recorded.

Rows PHP MySQL
500 3.947ms 4.066ms
1000 4.710ms 4.917ms
2000 5.919ms 5.893ms

As you can see from the results, MySQL is actually considering SHA1 as non-deterministic when traversing records. I’ve tried to research more about this and checked MySQL source code for this to no avail. Maybe I haven’t searched enough but till now this is my conclusion on this subject.

If you are using cryptographic functions in MySQL and the field queried is not indexed, please make sure to either change the code to a sub-query or change the structure.

Conclusion – PHP or MySQL?

Coming to the original post idea, PHP vs MySQL SHA1 functions really don’t differ in speed. My suggestion is to use what you’re mostly comfortably using and least troublesome for you. Happy programming.

Any comments or advice is appreciated.

Advertisement
  1. September 26, 2009 at 1:00 am | #1

    Good job again :) The behaviour with the subquery makes sense. Interesting gotcha though, I would have expected the optimizer to handle that. Thanks for the update, it’s really interesting!

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.