You are herePerformancing SQL query
Performancing SQL query
Last weeks we had serious performance issues on a server we're renting with a few friends. That server happens to be running mechelenblogt.be, an evolving website both on number of visitors as on codebase/features. (it runs drupal).
Originally the server wasn't scaled for such load (Intel(R) Celeron(R) CPU 2.66GHz, 512MB RAM) and we thought the performance issues were because of the many images and extra visitors. Fortunately Floris found out the cause was a not-so-performant SQL query.
(yes, I know, we could have seen it by looking at the mysql-slow-queries log, but you usually think about that when it's too late)
The bad query was:
SELECT SQL_NO_CACHE drupal_comments.comment, drupal_comments.cid as cid,
drupal_node.title, drupal_node.nid, drupal_comments.uid,
drupal_users.name as name_known, drupal_comments.name as name_anonymous
FROM drupal_comments
INNER JOIN drupal_node ON drupal_comments.nid = drupal_node.nid
INNER JOIN drupal_users ON drupal_comments.uid = drupal_users.uid
WHERE drupal_node.comment > 0
AND drupal_node.status = 1
ORDER BY drupal_comments.timestamp DESC
LIMIT 0 , 6;
Showing rows 0 - 5 (6 total, Query took 1.9932 sec)
You can see the query took almost 2 full seconds, totally unacceptable.
When analyzing the query there is only one thing that can be the cause of the slowdown, the ORDER BY drupal_comments.timestamp DESC. Executing the query without the ORDER BY confirmed it: Query took 0.0011 sec
If you want to test the performance of a query you need to use the SQL_NO_CACHE. Cache will not be used and you can trust the timings MySQL gives you. If you don't disable the cache, the second time you'll run the query you'll have a blazing fast result.
Ok, how can we improve this? Functionally I want to get the last rows from the drupal_comments table joined to a few other things. It looks like in the internal datastructure MySQL considers tables as kind of maps with no specific order, so there seems no query to say 'SELECT * FROM table TAIL 6'.
The drupal_comments table has a field called cid, so why not playing around with that?
First try, ORDER BY cid instead of the timestamp. Unfortunately this doens't change a lot. Still 1.5 to 2.0 sec.
Second try, try the ugly WHERE drupal_comments.cid > ((SELECT MAX(cid) from drupal_comments) - 6). This subquery will get the highest 'cid' number and subtract 6 from it, giving me the last 6 comments.
SELECT SQL_NO_CACHE drupal_comments.comment, drupal_comments.cid as cid,
drupal_node.title, drupal_node.nid, drupal_comments.uid,
drupal_users.name as name_known, drupal_comments.name as name_anonymous
FROM drupal_comments
INNER JOIN drupal_node ON drupal_comments.nid = drupal_node.nid
INNER JOIN drupal_users ON drupal_comments.uid = drupal_users.uid
WHERE drupal_node.comment > 0
AND drupal_node.status = 1
AND drupal_comments.cid > ( ( SELECT MAX( cid ) FROM drupal_comments ) - 7 )
ORDER BY drupal_comments.timestamp DESC
Showing rows 0 - 5 (6 total, Query took 0.0019 sec)
Woohoo, this seems to work !
This query has a few drawbacks and bugs. One I can immediately think of is missing comments when one has been deleted. To catch up this problem you could get more results with the cid search and LIMIT the number of results. Here's a new query:
SELECT SQL_NO_CACHE drupal_comments.comment, drupal_comments.cid as cid,
drupal_node.title, drupal_node.nid, drupal_comments.uid,
drupal_users.name as name_known, drupal_comments.name as name_anonymous
FROM drupal_comments
INNER JOIN drupal_node ON drupal_comments.nid = drupal_node.nid
INNER JOIN drupal_users ON drupal_comments.uid = drupal_users.uid
WHERE drupal_node.comment > 0
AND drupal_node.status = 1
AND drupal_comments.cid > ( ( SELECT MAX( cid ) FROM drupal_comments ) - 12 )
ORDER BY drupal_comments.timestamp DESC
LIMIT 0 , 6;
Showing rows 0 - 5 (6 total, Query took 0.0017 sec)
As you can see the query is still blazing fast and one possible bug has been eliminated. I still wonder if there is a better way to get those last rows... Anyone has a better idea?





I just heard in a podcast that it is not advisable to alter Drupal tables, but a quite common and harmless hack is to add indexes to columns.
So another ugly workaround could be to add an index on the drupal_comments.timestamp column.
Anyway, the root cause of the problem was that the result of the query was not being cached -- ever. So we took the heavy query hit literally for every page visit.
The result of the query was cached, but as the server doesn't have enough ram, the mysql cache was being swapped rendering it completely useless.
I did try to do a
ORDER BY cidand that didn't help.cidis indexed as it's the primary key...I could be mistaken, but I'm pretty certain that some mysql tables (not sure if it's InnoDB or MyISAM) can be made to store data in some predefined order. Can't for the life of me remember how to do it, but it should be possible...