Covering indexes

I was recently lucky enough to attend the Percona Live conference in London on the 24th and 25th of October. I learnt a lot of great stuff over these two days through a number of tutorial sessions and also a couple of presentations from som companies who use MySQL to support software sytems much larger than the ones I currently work on. Over the next few weeks I hope to share some of the uselful stuff that came up. For this post I thought I'd start with covering indexes.

The simple idea behind a covering index is that all fields required for a query are contained within the index. In short this means there's no need to perform an additional seek to access the table row and therefore the query can execute faster. In terms of MyISAM this means you don't have to perform a row read, in InnoDB the data is stored with the primary key in a B-Tree with secondary indexes containing primary key values so it means one less B-Tree traversal to look up the row. For example consider the following database structure :

CREATE TABLE `customers` (
  `CustomerId` BIGINT(20) NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`CustomerId`)
) ENGINE=INNODB DEFAULT CHARSET=utf8

CREATE TABLE `invoices` (
  `InvoiceId` BIGINT(20) NOT NULL AUTO_INCREMENT,
  `CustomerId` BIGINT(20) DEFAULT NULL,
  `ProductId` BIGINT(20) DEFAULT NULL,
  PRIMARY KEY (`InvoiceId`)
) ENGINE=INNODB DEFAULT CHARSET=utf8

CREATE TABLE `products` (
  `ProductId` BIGINT(20) NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`ProductId`)
) ENGINE=INNODB DEFAULT CHARSET=utf8

Obviously this has been oversimplified for our purposes but now consider the following select query.

SELECT * FROM customers LEFT JOIN invoices ON customers.CustomerId = invoices.CustomerId LEFT JOIN products ON invoices.ProductId = products.ProductId WHERE customers.CustomerId = '1';

Running explain on this query tells us that the type is ALL for the invoices table meaning that the query is going to be doing a full table scan of this table. By adding a key on CustomerId we can improve this so that the CustomerId index is used.

ALTER TABLE invoices ADD KEY (CustomerId)

Running explain on the query again now gives us a type of ref meaning that = is used for comparison bu that it is not a primary key lookup. The query will now use the index to find all rows that match CustomerId and then read all these rows to perform the next part of the query, however we can improve this by adding a covering index.

ALTER TABLE invoices ADD KEY CustomerId_ProductId (CustomerId, ProductId)

This now allows the query to run by simply reading from the index. We can see this by running explain again. Notice in the Extra column there is now 'Using where; Using index'. This shows that the query is only reading from the index without having to perform an additional seek for the row. If this seek for the row was going to be a read from disk then we have drastically reduced the execution time of the query, even in memory this additional seek can now be avoided.