Friday, January 31, 2025

Database Performance Benchmarks: PostgreSQL 17 vs. PostgreSQL MySQL 9 vs. MongoDB 8

 

Database Performance Benchmarks: PostgreSQL 17 vs. PostgreSQL MySQL 9 vs. MongoDB 8

Hello readers, I'm a passionate fan of WEB full-stack development and Rust programming. If you also have a soft spot for Rust, welcome to follow my official account "Dream Beast Programming" and join our technical exchange group to discuss cutting-edge technologies.





In the database space in 2024, PostgreSQL 17, MySQL 9, and MongoDB 8 are the main contenders. Each database has its own unique features and performance optimizations, and understanding their pros and cons is essential to choosing the right tool.


Benchmark settings

For performance comparison, I used the Docker environment to build the latest versions of these three databases. The benchmark covers the following common operations:

  • • Single Insertion: Inserts data one by one.
  • • Batch Insertion: Insert multiple pieces of data at once.
  • • Query operation: Retrieve all data from the table.
  • • Delete Operation: Deletes all data in the table.

The test results are all 1,000 pieces of data to ensure fair comparisons across databases.

Test environment

Testing is conducted in the following environments to ensure consistent performance testing across databases:

  • • Operating System: Ubuntu 24.04.1 LTS (WSL)
  • • Hardware: Ryzen 5 5500U, 16GB RAM

Dockerfile and test code: Start PostgreSQL 17, MySQL 9.0, and MongoDB 8.0 with a Docker container and test the performance.


Single strip insertion



  1. 1. PostgreSQL 17:2317.83 ms
  2. 2. MySQL 9.0:5060.12 ms
  3. 3. MongoDB 8.0:1759.43 ms

Analysis: MongoDB performs best in single insert operations, probably due to its schemaless nature and optimized document model. PostgreSQL, while slightly slower, is still faster than MySQL.


Batch insertion



  1. 1. PostgreSQL 17:60.06 ms
  2. 2. MySQL 9.0:29.35 ms
  3. 3. MongoDB 8.0:28.99 ms

Analysis: MySQL and MongoDB are almost identical in terms of batch insert performance, with MySQL slightly ahead. PostgreSQL is relatively weak in this regard, but still provides decent performance.


Query operations



  1. 1. PostgreSQL 17:9.11 ms
  2. 2. MySQL 9.0:11.25 ms
  3. 3. MongoDB 8.0:52.63 ms

Analysis: PostgreSQL performs best in query operations thanks to its excellent query optimization and indexing capabilities. MySQL is also relatively fast to query, while MongoDB's document model is slower in this regard.

Delete the action



  1. 1. PostgreSQL 17:4.70 ms
  2. 2. MySQL 9.0:23.96 ms
  3. 3. MongoDB 8.0:23.52 ms

Analysis: PostgreSQL performed well in the deletion operation with a time of only 4.70 ms. MySQL and MongoDB behave similarly, indicating that PostgreSQL is more optimized for deletion tasks.

Key Summary:

  • • PostgreSQL 17: Excellent performance in query and delete operations, suitable for applications that require high data retrieval and cleanup speeds.
  • • MySQL 9.0: Excellent performance in batch insertion, suitable for scenarios that need to process large amounts of data quickly.
  • • MongoDB 8.0: Leads in single-insert performance, but lags behind MySQL in queries and bulk inserts.

conclusion

Each database has its own unique advantages, and choosing the right one depends on the specific use case:

  • • If fast reads and data consistency are important to you, PostgreSQL 17 is the best choice.
  • • If efficient data ingestion and batch processing are required, MySQL 9.0 will be a good choice.
  • • For flexible schema design and quick insertion, MongoDB 8.0 may be more suitable.

Understanding the unique strengths and weaknesses of these databases will help you make informed decisions based on your application's performance needs.


Monday, January 6, 2025

[Tip of day 1] PostgreSQL query filter conditions changed, performance improved 100 times!

 Sometimes we need to evaluate and improve the clauses in our queries WHERE. Even a small change to the filter criteria can have a huge impact on query performance.

Avoid using functions in filter conditions

reason

Applying functions to columns during the filtering phase can degrade performance. The database needs to apply the function to the dataset before filtering. Let's look at a simple example of filtering on a timestamp field:

SELECT count(*
FROM orders
WHERE CAST(order_timestamp AS DATE> '2024-02-01';

The above query for the 100000000 row dataset runs 01 min 53 secbecause it needs to change the data type of the column from timestamp to date before applying the filter order_timestamp. However, that is not necessary! Note that the query above could be rewritten as:

SELECT count(*
FROM orders
WHERE order_timestamp > '2024-02-01 00:00:00';

The rewritten query uses the original timestamp field without casting. After making this minor change, the query now 20 secruns nearly 6 times faster than the original query.



warn

Not all functions can be avoided, as some functions may be needed to retrieve a part of a column value (consider the following substringexample) or to reconstruct it. However, every time you are going to add a function to a filter condition, consider whether you can use the original data type operators instead.

Best Practices

When you apply a filter to a column, try to format the filter instead of the column.

The above is a perfect example of changing the format of the filter from a date 2024-02-01to a timestamp 2024-02-01 00:00:00, which allows us to use the original timestamp data format and operators.

Professional advice

If you must apply a function, you can try two approaches:

  • • Create an index on the expression. This   is possible in both PostgreSQL  and  MySQL.
  • • Use database triggers to populate additional columns to be transformed
  • Improved subquery

  • reason

    Subqueries are often used in filter conditions to retrieve a set of values ​​to apply to the filter condition. A common example is when you need to retrieve a list of recently active users.

  • SELECT *
    FROM users
    WHERE id IN (
    SELECT DISTINCT user_id 
    FROM sessions 
    WHERE session_date = '2024-02-01');

  • The query above SESSIONSretrieves a list of distinct users from the table and then USERSapplies a filter to the table. However, there are several more performant ways to achieve the same result. One example is to use EXISTS:

  • SELECT * 
    FROM users
    WHERE EXISTS (
    SELECT user_id 
    FROM sessions 
    WHERE user_id = id and session_date = '2024-02-01'
    );

  • EXISTSis faster because it does not need SESSIONto retrieve a list of distinct users from the table, but only verifies if at least one row exists in the table for a particular user. The above use case only changes the subquery part, and the performance changes 02 min 08 secfrom 18 sec.

  • warn

    In extreme cases, slight changes in the subquery may provide different results.

  • Best Practices

    When you need to use subqueries, take the time to learn and understand what methods are available and what they allow you to achieve. Many times there are multiple methods and certain functions will provide better response times.