PostgreSQL tips and tricks: performance

By definition, it is impossible to finely tune all complex ad hoc queries generated at runtime: you will have to rely more on the query optimizer to make your query fast than you do with carefully handcrafted queries.

PostgreSQL has no query optimizer hints. This is obviouly a long debated issue in the PostgreSQL community. Let’s not argue and try to move forward without query hints.

Setup

What are the options at hand when queries are too slow? The usual basic checklist is as follows:

default_statistics_target = 10000
join_collapse_limit = 1

Troubleshooting CTEs and joins

We experienced slow joins with CTEs (Common Table Expressions) based on nested loops whereas the best choice was hash joins. In this case, one might be tempted to disable nested loops with enable_nestloop = false. Indeed, the difference was pretty impressive: the query took more than 5 minutes to run with default settings, and about 2 seconds with nested loops disabled. But this was not an option as this could have adverse effects on other queries.

CTE are optimization fences and are often presented as a way to trick the query planner. In our particular case, CTEs were triggering bad execution plans. A closer look revealed that row estimates were completely wrong: the optimizer thought there was only one row in both CTE results, whereas they had both around 2,500 rows.

By rewriting the CTEs to use index scans instead of sequential scans, we were able to have their number of rows properly estimated and in the end have a correct execution plan.

comments powered by Disqus