Tuesday, March 21, 2017

Database Indexes

Premature optimization is the root of all evil.

Database indexes are all about optimization. Using indexes prematurely is unnecessary in most cases. However, knowing what it is and how to use it will save your ass. 

So lets talk indexing.

When you do things to some set of data in a relational database, the db has to retrieve that data. And if that data is retrieved based on some value (get every row where column C equals 5), then the db has to go through every single row in a table containing that data and decide whether or not that row should be used. 

Unless you use indexes. 

If you use indexes, the db does NOT have to go through every single row. Instead, it will find the rows it needs by using a special data structure. That special data structure is the index. 

Most database indexes are B-Trees which allow for logarithmic time operations. In other words, it can change O(N) lookup to O(log n). If you have a lot of data, that can be a huge difference. Instead of looking at every row and checking if a specific column equals a certain value. If there's an index for that column, then the database can do a O(logn) time look up for the value in the B-Tree, then follow the pointer to the row! 

Using an index is not always an optimization.

There are times when using an index can actually hurt you. An index is just used to do a look up for a row, but if a lot of those rows are being returned, then all an index adds is extra overhead for the lookups because when the matching rows are found using the index lookup, they still need to be scanned!


No comments:

Post a Comment