Sometimes we need to evaluate and improve the clauses in our queries WHERE
. Even a small change to the filter criteria can have a huge impact on query performance.
Avoid using functions in filter conditions
reason
Applying functions to columns during the filtering phase can degrade performance. The database needs to apply the function to the dataset before filtering. Let's look at a simple example of filtering on a timestamp field:
SELECT count(*)
FROM orders
WHERE CAST(order_timestamp AS DATE) > '2024-02-01';
The above query for the 100000000 row dataset runs 01 min 53 sec
because it needs to change the data type of the column from timestamp to date before applying the filter order_timestamp
. However, that is not necessary! Note that the query above could be rewritten as:
SELECT count(*)
FROM orders
WHERE order_timestamp > '2024-02-01 00:00:00';
The rewritten query uses the original timestamp field without casting. After making this minor change, the query now 20 sec
runs nearly 6 times faster than the original query.
warn
Not all functions can be avoided, as some functions may be needed to retrieve a part of a column value (consider the following substring
example) or to reconstruct it. However, every time you are going to add a function to a filter condition, consider whether you can use the original data type operators instead.
Best Practices
When you apply a filter to a column, try to format the filter instead of the column.
The above is a perfect example of changing the format of the filter from a date 2024-02-01
to a timestamp 2024-02-01 00:00:00
, which allows us to use the original timestamp data format and operators.
Professional advice
If you must apply a function, you can try two approaches:
• Create an index on the expression. This is possible in both PostgreSQL and MySQL. • Use database triggers to populate additional columns to be transformed Improved subquery
reason
Subqueries are often used in filter conditions to retrieve a set of values to apply to the filter condition. A common example is when you need to retrieve a list of recently active users.
SELECT *
FROM users
WHERE id IN (
SELECT DISTINCT user_id
FROM sessions
WHERE session_date = '2024-02-01');The query above
SESSIONS
retrieves a list of distinct users from the table and thenUSERS
applies a filter to the table. However, there are several more performant ways to achieve the same result. One example is to useEXISTS
:SELECT *
FROM users
WHERE EXISTS (
SELECT user_id
FROM sessions
WHERE user_id = id and session_date = '2024-02-01'
);EXISTS
is faster because it does not needSESSION
to retrieve a list of distinct users from the table, but only verifies if at least one row exists in the table for a particular user. The above use case only changes the subquery part, and the performance changes02 min 08 sec
from18 sec
.warn
In extreme cases, slight changes in the subquery may provide different results.
Best Practices
When you need to use subqueries, take the time to learn and understand what methods are available and what they allow you to achieve. Many times there are multiple methods and certain functions will provide better response times.
No comments:
Post a Comment