You are hereMySQL
MySQL
Performancing SQL query
Last weeks we had serious performance issues on a server we're renting with a few friends. That server happens to be running mechelenblogt.be, an evolving website both on number of visitors as on codebase/features. (it runs drupal).
Originally the server wasn't scaled for such load (Intel(R) Celeron(R) CPU 2.66GHz, 512MB RAM) and we thought the performance issues were because of the many images and extra visitors. Fortunately Floris found out the cause was a not-so-performant SQL query.
(yes, I know, we could have seen it by looking at the mysql-slow-queries log, but you usually think about that when it's too late)
The bad query was:
SELECT SQL_NO_CACHE drupal_comments.comment, drupal_comments.cid as cid,
drupal_node.title, drupal_node.nid, drupal_comments.uid,
drupal_users.name as name_known, drupal_comments.name as name_anonymous
FROM drupal_comments
INNER JOIN drupal_node ON drupal_comments.nid = drupal_node.nid
INNER JOIN drupal_users ON drupal_comments.uid = drupal_users.uid
WHERE drupal_node.comment > 0
AND drupal_node.status = 1
ORDER BY drupal_comments.timestamp DESC
LIMIT 0 , 6;
Showing rows 0 - 5 (6 total, Query took 1.9932 sec)
You can see the query took almost 2 full seconds, totally unacceptable.
When analyzing the query there is only one thing that can be the cause of the slowdown, the ORDER BY drupal_comments.timestamp DESC. Executing the query without the ORDER BY confirmed it: Query took 0.0011 sec
If you want to test the performance of a query you need to use the SQL_NO_CACHE. Cache will not be used and you can trust the timings MySQL gives you. If you don't disable the cache, the second time you'll run the query you'll have a blazing fast result.
Ok, how can we improve this? Functionally I want to get the last rows from the drupal_comments table joined to a few other things. It looks like in the internal datastructure MySQL considers tables as kind of maps with no specific order, so there seems no query to say 'SELECT * FROM table TAIL 6'.
The drupal_comments table has a field called cid, so why not playing around with that?
First try, ORDER BY cid instead of the timestamp. Unfortunately this doens't change a lot. Still 1.5 to 2.0 sec.
Second try, try the ugly WHERE drupal_comments.cid > ((SELECT MAX(cid) from drupal_comments) - 6). This subquery will get the highest 'cid' number and subtract 6 from it, giving me the last 6 comments.
SELECT SQL_NO_CACHE drupal_comments.comment, drupal_comments.cid as cid,
drupal_node.title, drupal_node.nid, drupal_comments.uid,
drupal_users.name as name_known, drupal_comments.name as name_anonymous
FROM drupal_comments
INNER JOIN drupal_node ON drupal_comments.nid = drupal_node.nid
INNER JOIN drupal_users ON drupal_comments.uid = drupal_users.uid
WHERE drupal_node.comment > 0
AND drupal_node.status = 1
AND drupal_comments.cid > ( ( SELECT MAX( cid ) FROM drupal_comments ) - 7 )
ORDER BY drupal_comments.timestamp DESC
Showing rows 0 - 5 (6 total, Query took 0.0019 sec)
Woohoo, this seems to work !
This query has a few drawbacks and bugs. One I can immediately think of is missing comments when one has been deleted. To catch up this problem you could get more results with the cid search and LIMIT the number of results. Here's a new query:
SELECT SQL_NO_CACHE drupal_comments.comment, drupal_comments.cid as cid,
drupal_node.title, drupal_node.nid, drupal_comments.uid,
drupal_users.name as name_known, drupal_comments.name as name_anonymous
FROM drupal_comments
INNER JOIN drupal_node ON drupal_comments.nid = drupal_node.nid
INNER JOIN drupal_users ON drupal_comments.uid = drupal_users.uid
WHERE drupal_node.comment > 0
AND drupal_node.status = 1
AND drupal_comments.cid > ( ( SELECT MAX( cid ) FROM drupal_comments ) - 12 )
ORDER BY drupal_comments.timestamp DESC
LIMIT 0 , 6;
Showing rows 0 - 5 (6 total, Query took 0.0017 sec)
As you can see the query is still blazing fast and one possible bug has been eliminated. I still wonder if there is a better way to get those last rows... Anyone has a better idea?




