Posts Tagged full table scan

Index of my blog

Well, my DB classes are getting to be more interesting every week. I really learnt some cool stuff about the concept of an index. Before that, a quick trivia as to how tables in Oracle DB are stored. Every row in a table is stored in one of the oracle blocks (size 8K). Each row when inserted occupies the 1st block that has enough space to accommodate it.

Gotcha: So, even when you insert 2 rows consecutively, it doesn’t really mean that both of them would reside in the same block!

Now, lets get started :)

An index is a data structure that is applied to column(s) of a table allowing faster retrieval of data. It is generally implemented as a B+ tree or a hash map.

blog

Let’s say we apply an index to a column C that has values ranging from 0 to 100. The above figure is an example of how an index structure would look like. Each block in the B+ tree represents a range of values of C. Finally, every leaf node contains a (key,value) pair where

  • key represents an element in C
  • value contains the corresponding row-id of the key.

An important property of the structure is that the leaf nodes form a doubly-linked-list (DLL) structure. More on how this structure helps faster access, below.

Ok, now to real-world problems :) . Let’s analyze a few queries and how an index behaves in each of them assuming an index has been created on column C in table T

select * from T where C = 25;

The index ptr traverses the B+tree structure ( similar to any BST traversal to find a node ) and reaches the leaf node L2 and fetches the data from the row-id corresponding to 25. The number of I/O lookups is the height of tree + 1 ( to fetch data from the row-id ). In this case, it would just be 4 I/O operations instead of 25! (full-table scan)

select * from T where C between 15 and 30;

Agreed, it can easily find the row-ids for the value 15 – 24 similar to the procedure above. But, values 25 – 30 reside in a different leaf node. Here is the beauty of the DLL structure :) . Instead of another traversal from the root-block/parent , the ptr continues to move to the next leaf node and fetches the remaining 5 rows!

select * from T where C in ( 1,25 );

In the above case, the index performs 2 traversals from the root-block. One, to fetch the row-id corresponding to the value 1; Second, to fetch the row-id corresponding to the value 25. Again it just involves 2 * height of tree + 2 lookups – much lesser than a full-table scan.

Ok, enough of ranting about the index structure. But, is an index always helpful ? Is there a possibility that a full-table scan might be more efficient than an index ?

Let’s assume that the table T we were talking about has values between 0-38 in column ‘C’ for 90 percent of the rows. The remaining values in C (67-100) occupy 10 percent of the rows.

select * from T where C between 0 and 38′;

Index behavior

Firstly, there is an extra I/O call made to access the index pointer. Added to this, the height of the tree. This is the extra I/O look-ups compared to a normal table scan. However, this is not the end of the story.

Every time, oracle tries to fetch a row, it fetches an oracle block that the row belongs to and stores it onto the buffer cache (oracle’s cache not the OS!). It might well be the case that the row with C’s value = 1 and the row with C’s value = 2 might be in two different oracle blocks. Hence, when the index pointer moves through the leaf node, it will incur an extra I/O operation to fetch the second oracle block!. Again, if the row with C’s value = 3 belongs to the same block as val(C) = 1, it needs to fetch the block again! Remember, fetching a block again means “hit the original block -> miss -> fetch the required block”. Phew, this means that there is a whole lot of extra I/O operations!

Full-table scan behaviour

It’s simple. Scan all the blocks and keep retrieving the rows. Each block is accessed only once.

For these kind of scenarios, a full-table scan is preferable. The decision is taken by Oracle optimizer – how it does will be in a different post. Cya then :) I am loving Computer Science :)

, , , , , ,

No Comments