SHA1 Speed Differences [PHP vs MySQL] – Part 1
Reading my favorite blog, I’ve encountered a rather interesting article discussing the speed difference between PHP and MySQL in SHA1 algorithms. The results obtained were suspicious to me. Usually in database terms, reads are faster then inserts, but in this case it was the opposite.
Being the hard headed I am, I had to find out what was wrong with the tests conducted in that article. The poster kindly provided the source which was really dandy as I could execute the same tests as the poster did. After several head bangs trying to setup PHP to work flawless on my pc, I managed to analyse the tests.
At first glance, everything looked really well done and flawless. Tests were godly written and everything was up to standard. After considerable time contemplating and a beer, I came to the conclusion that the problem lies elsewhere. The tests were simple enough that it was very easy to understand and even test. One query had a fixed string and the other one had an SHA1 function call from MySQL, exactly as the test intended.
The thought that was haunting me was that even the normal PHP method, (ie. the string comparison) reading was slower than the insert. Looking at this problem alone and excluding PHP vs MySQL performance led me to the problems. Yes that’s right, there is more than one problem. It’s not how the tests are written but it’s more how databases work internally which should be taken into consideration once working with databases.
Database Search Explained
Before explaining the problem, let’s discuss a bit on how databases normally work and how they fetch data. Let’s take the tests provided by the poster and use his table as our example reference.
CREATE TABLE test_table (
identity INTEGER AUTO_INCREMENT,
digest VARCHAR(41),
PRIMARY KEY (identity)
);
This is the table structure and as you can see, it is simple and perfectly normal. It has two fields, a numeric primary key and a string field of length 41 to store our hashed data.
When the database is asked to retrieve information which have ‘digest’ equals to ‘hello’, the database has to check each row if the text in ‘digest’ field matches ‘hello’. If there are 500 records, the database will check 500 records one by one and at the end returns the records which have digest = ‘hello’. As explained before, this is how normally work and not how they should work. This method of search can become very slow on large tables as checking each and every row isn’t a very intelligent way to search. Telephone directories have indices and each page is sorted by the surname to ease your searching. Imagine going through your telephone directory searching for the surname of ‘Zohan John” which should be located in the last few pages. That would take ages and this also applies for databases. This also applies for databases which make use of indices and sorting to retrieve that information quicker.
With an index on the ‘digest’ field one can locate information much faster instead of checking each field one by one. What happens when a database searches for information on an indexed field (the whole field is indexed and not the first letter or two) is as follows. It will ask the index if there’s anything that matches the whole word ‘hello’. The index will reply with either a ‘match’ or ‘no match’. If a match is returned, the location will be identified and the database can go and retrieve the information immediately without waisting any more valuable time asking each and every record if it matches. The result is usually instant which is what raised my interest in the article.
Index Search vs Normal Search
After all the technicalities, it is time to see if what is said above is the culprit. The test is inserting 500 records into the table and querying the database 500 times one for each record which returns with a match. Without an index, the database has to go through all records and thus the ‘select’ results are slower then the ‘insert’ commands. Second beer in hand, I placed the index and the results produced were much better. The new table structure looks like this.
CREATE TABLE test_table (
identity INTEGER AUTO_INCREMENT,
digest VARCHAR(41),
PRIMARY KEY (identity),
KEY (digest)
);
This solved the first problem which was haunting me like my worst nightmare. Is ‘select’ slower than ‘insert’?
Results – Select vs Insert
My average results vary from the original article but the difference can still be identified.
| Records | Normal Select | Index Select | Insert |
| 10 | 1.9342ms | 1.7923ms | 2.1394ms |
| 500 | 2.5648ms | 1.7885ms | 2.2872ms |
| 1000 | 3.1148ms | 1.7948ms | 2.0497ms |
| 5000 | 3.6964ms | 1.7961ms | 2.2519ms |
This shows that the indexed search really does a great job when dealing with a lot of information. They’re a necessity for a database to work soundly. I am not implying one should go overboard with indices but one should know when and how to use them.
With this said, there’s still the performance issue between PHP vs MySQL. Could it be MySQL performs a function which basically does the same that much slower, 4 times slower? Be patient and I’ll post my finding in another post as this is already too long to read.
Great article, and a good reminder of why one should use indices
Thanks for covering this – it hadn’t occurred to me that the select *should* be faster. Looking forward to reading your further findings.
Thanks for the comments about the blog too – much appreciated! :$
Thanks Karl, tried to contact you today but gchat was down. :] Love your blog, always fresh with new ideas. :] I’ll try to finish the article by tomorrow.
Interesting post. I can’t wait for part 2!