Saturday, February 14, 2009

Giving a hint

Every major DB I know supports some sort of "hinting" mechanism. What is it and why is it needed?

When a DB compiles a SQL query in order to execute it, it also decides about the best way to execute that query: which data to fetch first, which indexes are to be used, how to perform joins, how many rows to retrieve, etc. The quality of these decisions has major impact over the performance of the query.
As part of this process, a component usually known as the optimizer is running. This component is required for the construction of the execution plan, and sometimes it takes "drastic" measures, such as rewriting the query so that WHERE conditions would be executed in a different order.
Over the years RDBMS providers had put a lot of effort in order to achieve the best possible optimizer. In the course of time it started using statistics (based on ranking mechanisms and previous queries). But before that happened, each RDBMS provided an extension to the query syntax (SQL) so the developer could provide some hint to the optimizer about how he thinks the query should be executed. As the years went by, providers started recommending to avoid those hints, as the optimizer usually did a better job.

I hadn't seen a good optimizer hint for some years now, until this week. I was trying to improve some query in MS SQL Server which lasted forever, and involved an external data source. After I was about to give up and rewrite the code, I decided to give it one last chance. Digging in the execution plans I realized MS's optimizer did awful job in executing the query, so I check their hint syntax. My addition to the code was "OPTION (hash join)", and viola, the query completed within few seconds. Impressive improvement.

Improving the query (by better building it) will always be better than adding a hint, but when all else fails, this might be the only solution.

No comments:

Post a Comment