Monday, January 6, 2025

[Tip of day 1] PostgreSQL query filter conditions changed, performance improved 100 times!

 Sometimes we need to evaluate and improve the clauses in our queries WHERE. Even a small change to the filter criteria can have a huge impact on query performance.

Avoid using functions in filter conditions

reason

Applying functions to columns during the filtering phase can degrade performance. The database needs to apply the function to the dataset before filtering. Let's look at a simple example of filtering on a timestamp field:

SELECT count(*
FROM orders
WHERE CAST(order_timestamp AS DATE> '2024-02-01';

The above query for the 100000000 row dataset runs 01 min 53 secbecause it needs to change the data type of the column from timestamp to date before applying the filter order_timestamp. However, that is not necessary! Note that the query above could be rewritten as:

SELECT count(*
FROM orders
WHERE order_timestamp > '2024-02-01 00:00:00';

The rewritten query uses the original timestamp field without casting. After making this minor change, the query now 20 secruns nearly 6 times faster than the original query.



warn

Not all functions can be avoided, as some functions may be needed to retrieve a part of a column value (consider the following substringexample) or to reconstruct it. However, every time you are going to add a function to a filter condition, consider whether you can use the original data type operators instead.

Best Practices

When you apply a filter to a column, try to format the filter instead of the column.

The above is a perfect example of changing the format of the filter from a date 2024-02-01to a timestamp 2024-02-01 00:00:00, which allows us to use the original timestamp data format and operators.

Professional advice

If you must apply a function, you can try two approaches:

  • • Create an index on the expression. This   is possible in both PostgreSQL  and  MySQL.
  • • Use database triggers to populate additional columns to be transformed
  • Improved subquery

  • reason

    Subqueries are often used in filter conditions to retrieve a set of values ​​to apply to the filter condition. A common example is when you need to retrieve a list of recently active users.

  • SELECT *
    FROM users
    WHERE id IN (
    SELECT DISTINCT user_id 
    FROM sessions 
    WHERE session_date = '2024-02-01');

  • The query above SESSIONSretrieves a list of distinct users from the table and then USERSapplies a filter to the table. However, there are several more performant ways to achieve the same result. One example is to use EXISTS:

  • SELECT * 
    FROM users
    WHERE EXISTS (
    SELECT user_id 
    FROM sessions 
    WHERE user_id = id and session_date = '2024-02-01'
    );

  • EXISTSis faster because it does not need SESSIONto retrieve a list of distinct users from the table, but only verifies if at least one row exists in the table for a particular user. The above use case only changes the subquery part, and the performance changes 02 min 08 secfrom 18 sec.

  • warn

    In extreme cases, slight changes in the subquery may provide different results.

  • Best Practices

    When you need to use subqueries, take the time to learn and understand what methods are available and what they allow you to achieve. Many times there are multiple methods and certain functions will provide better response times.


No comments:

Post a Comment