Storing documents in PostgreSQL does not turn it into a document database. Querying embedded documents in JSONB requires GIN indexes, which are not effective for range or pagination queries. Microsoft recognized some limitations of JSONB and GIN indexes and developed the DocumentDB extension for BSON storage. However, this does not resolve the pagination issues: in DocumentDB, GIN indexes are replaced by RUM indexes, but they show the same limitation.
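To illustrate the claim on plain PostgreSQL, here is a minimal sketch on a hypothetical jsonb table: a GIN index can serve the equality predicate, but it cannot return rows in order, so a pagination query still has to sort all matching rows:

CREATE TABLE demo (doc jsonb);
CREATE INDEX ON demo USING gin (doc jsonb_path_ops);

EXPLAIN
SELECT * FROM demo
 WHERE doc @> '{"a": 1}'              -- the GIN index can serve this filter
 ORDER BY doc->>'ts' DESC LIMIT 10;   -- but not this order: expect a Sort node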
The Microsoft Azure service emulating MongoDB on PostgreSQL is called "vCore-based Azure Cosmos DB for MongoDB." It utilizes the PostgreSQL extension, DocumentDB, which, surprisingly, shares its name with AWS's Amazon DocumentDB, another MongoDB emulation service.
Install DocumentDB
To verify this, I used FerretDB v2, which bundles PostgreSQL with the DocumentDB extension and adds a MongoDB API emulation on top.
I start FerretDB using the Docker Compose file provided by the documentation. I've added a mongosh service to run the MongoDB client, and an auto_explain configuration to the PostgreSQL startup command.
services:
  postgres:
    image: ghcr.io/ferretdb/postgres-documentdb:17-0.102.0-ferretdb-2.1.0
    platform: linux/amd64
    restart: on-failure
    environment:
      - POSTGRES_USER=username
      - POSTGRES_PASSWORD=password
      - POSTGRES_DB=postgres
    volumes:
      - ./data:/var/lib/postgresql/data
    command:
      postgres -c shared_preload_libraries=auto_explain,pg_stat_statements,pg_cron,pg_documentdb_core,pg_documentdb -c auto_explain.log_min_duration=0 -c auto_explain.log_analyze=on -c auto_explain.log_buffers=on -c auto_explain.log_nested_statements=on
  ferretdb:
    image: ghcr.io/ferretdb/ferretdb:2.1.0
    restart: on-failure
    ports:
      - 27017:27017
    environment:
      - FERRETDB_POSTGRESQL_URL=postgres://username:password@postgres:5432/postgres
  mongosh:
    image: mongo
    deploy:
      replicas: 0
    command: mongosh mongodb://username:password@ferretdb/
I start the services and follow the PostgreSQL logs to display the output of Auto Explain:
docker compose up -d
docker compose logs -f postgres
I connect to the MongoDB API emulation:
docker compose run --rm -it mongosh
It emulates MongoDB 7.0 on top of PostgreSQL 17 with the DocumentDB extension:
Current Mongosh Log ID: 6813828db4f75c9887d861df
Connecting to: mongodb://@ferretdb/?directConnection=true&appName=mongosh+2.5.0
Using MongoDB: 7.0.77
Using Mongosh: 2.5.0
------
The server generated these startup warnings when booting
2025-05-01T14:17:52.106Z: Powered by FerretDB v2.1.0 and DocumentDB 0.102.0 (PostgreSQL 17.4).
2025-05-01T14:17:52.124Z: Please star 🌟 us on GitHub: https://github.com/FerretDB/FerretDB.
------
Test with FerretDB
I create a simple collection with ten thousand documents:
for (let i = 0; i < 10000; i++) {
db.demo.insertOne( {
a: 1 ,
b: Math.random(),
ts: new Date()
} );
}
Here is what Auto Explain logged in PostgreSQL:
postgres-1 | 2025-05-01 13:49:54.012 UTC [37] LOG: duration: 0.219 ms plan:
postgres-1 | Query Text:
postgres-1 | Query Parameters: $1 = '\x13000000070068137c029be252893cd8696900', $2 = '\x34000000075f69640068137c029be252893cd86969106100f30100000162006466d90936e7d73f09747300a6671c8c9601000000'
postgres-1 | Insert on documents_13 collection (cost=0.00..0.01 rows=1 width=80) (actual time=0.216..0.217 rows=1 loops=1)
postgres-1 | Buffers: shared hit=20
postgres-1 | -> Values Scan on "values" (cost=0.00..0.01 rows=1 width=80) (actual time=0.003..0.003 rows=1 loops=1)
postgres-1 | 2025-05-01 13:49:54.012 UTC [37] LOG: duration: 0.485 ms plan:
postgres-1 | Query Text: SELECT p_result::bytea, p_success FROM documentdb_api.insert($1, $2::bytea, $3::bytea)
postgres-1 | Query Parameters: $1 = 'test', $2 = '\x9900000002696e73657274000500000064656d6f0004646f63756d656e7473003c00000003300034000000106100f30100000162006466d90936e7d73f09747300a6671c8c96010000075f69640068137c029be252893cd869690000086f7264657265640001036c736964001e00000005696400100000000431ae93d9c7074468a630c26b67a8709b00022464620005000000746573740000', $3 = NULL
postgres-1 | Function Scan on insert (cost=0.01..0.02 rows=1 width=33) (actual time=0.482..0.482 rows=1 loops=1)
postgres-1 | Buffers: shared hit=20
I'm happy that FerretDB provides a MongoDB-like API to avoid calling the raw DocumentDB functions like:
documentdb_api.insert('test', '\x9900000002696e73657274000500000064656d6f0004646f63756d656e7473003c00000003300034000000106100f30100000162006466d90936e7d73f09747300a6671c8c96010000075f69640068137c029be252893cd869690000086f7264657265640001036c736964001e00000005696400100000000431ae93d9c7074468a630c26b67a8709b00022464620005000000746573740000'::bytea, NULL::bytea)
In PostgreSQL with the DocumentDB extension, inserting a small document (rows=1) into a collection without secondary indexes touches 20 pages (Buffers: shared hit=20). While the syntax resembles MongoDB's, the performance profile differs because of PostgreSQL's heap tables and 8k blocks, which introduce additional overhead.
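To see where those buffers go, one can check the page counts of the underlying relations (a sketch; the relation names are taken from the \d+ output shown later in this post, and the 20 hits likely include the heap page, the primary-key B-tree, and ancillary structures such as the free space map):

SELECT relname, pg_relation_size(oid) / 8192 AS pages
  FROM pg_class
 WHERE relname IN ('documents_15', 'collection_pk_15');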
I created a simple index that adheres to the MongoDB Equality, Sort, Range rule. This index is designed for queries utilizing an equality filter on "a" and sorting based on "ts":
db.demo.createIndex({ "a": 1 , ts: -1 }) ;
My goal is to test the most frequent pattern in OLTP applications: pagination queries. This pattern exists in many domains: retrieving the last ten orders for a customer, the last ten measurements from a device, or the last ten payments on an account.
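In plain SQL, this is the classic top-N query, which a composite B-tree index serves by reading only ten entries (a sketch on a hypothetical orders table):

CREATE TABLE orders (
  customer_id bigint,
  created_at  timestamptz,
  payload     jsonb
);
CREATE INDEX ON orders (customer_id, created_at DESC);

-- Last ten orders for one customer: the index scan can stop after ten rows.
SELECT * FROM orders
 WHERE customer_id = 42
 ORDER BY created_at DESC
 LIMIT 10;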
Heap table and RUM index
I connect to PostgreSQL and check the SQL table that stores the documents:
# docker compose run -it -e PGUSER=username -e PGPASSWORD=password postgres psql -h postgres postgres
psql (17.4 (Debian 17.4-1.pgdg120+2))
Type "help" for help.
postgres=# \d+ documentdb_data.documents_15*
Table "documentdb_data.documents_15"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
-----------------+--------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
shard_key_value | bigint | | not null | | plain | | |
object_id | documentdb_core.bson | | not null | | extended | | |
document | documentdb_core.bson | | not null | | extended | | |
creation_time | timestamp with time zone | | | | plain | | |
Indexes:
"collection_pk_15" PRIMARY KEY, btree (shard_key_value, object_id)
"documents_rum_index_35" documentdb_rum (document documentdb_api_catalog.bson_rum_single_path_ops (path=a, tl='2691'), document documentdb_api_catalog.bson_rum_single_path_ops (path=ts, tl='2691'))
Check constraints:
"shard_key_value_check" CHECK (shard_key_value = '15'::bigint)
Access method: heap
The table includes two extended-storage columns; it is a heap table with a primary-key index and a secondary index that supports the collection index I created. It is important to note that this is a RUM index, not a GIN index. The difference between the two is well explained in an Alibaba blog post.
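To confirm which access method each index uses, you can query the standard catalogs (a sketch):

-- List the indexes of the collection table with their access methods:
SELECT i.indexrelid::regclass AS index_name, am.amname
  FROM pg_index i
  JOIN pg_class c ON c.oid = i.indexrelid
  JOIN pg_am am ON am.oid = c.relam
 WHERE i.indrelid = 'documentdb_data.documents_15'::regclass;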
The table occupies 154 pages, and the index needs to read only 3 pages to find the first row:
postgres=# explain (analyze , buffers, serialize binary)
select * from documentdb_data.documents_15
;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Seq Scan on documents_15 (cost=0.00..254.00 rows=10000 width=89) (actual time=0.010..0.994 rows=10000 loops=1)
Buffers: shared hit=154
Planning Time: 0.070 ms
Serialization: time=4.160 ms output=1026kB format=binary
Execution Time: 6.176 ms
(5 rows)
postgres=# explain (analyze , buffers, serialize binary) select * from documentdb_data.documents_15 order by shard_key_value,object_id limit 1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.29..0.36 rows=1 width=89) (actual time=0.021..0.021 rows=1 loops=1)
Buffers: shared hit=3
-> Index Scan using _id_ on documents_15 (cost=0.29..760.10 rows=10000 width=89) (actual time=0.020..0.020 rows=1 loops=1)
Buffers: shared hit=3
Planning Time: 0.091 ms
Serialization: time=0.005 ms output=1kB format=binary
Execution Time: 0.085 ms
(7 rows)
Simple query (Equality)
The first query I tested has no pagination. On my small collection, it retrieves 14 documents for one value of "a":
test> db.demo.countDocuments( { a: 1 } );
14
test> db.demo.find( { a: 1 } ).explain("executionStats")
;
{
queryPlanner: {
Plan: {
'Node Type': 'Bitmap Heap Scan',
'Parallel Aware': false,
'Async Capable': false,
'Relation Name': 'documents_15',
'Plan Rows': 5000,
'Recheck Cond': "(document OPERATOR(documentdb_api_catalog.@=) 'BSONHEX0c0000001061000100000000'::documentdb_core.bson)",
Plans: [
{
'Parent Relationship': 'Outer',
'Parallel Aware': false,
'Index Name': 'a_1_ts_-1',
'Startup Cost': 0,
'Total Cost': 0,
'Plan Rows': 100,
'Node Type': 'Bitmap Index Scan',
'Async Capable': false,
'Plan Width': 0,
'Index Cond': "(document OPERATOR(documentdb_api_catalog.@=) 'BSONHEX0c0000001061000100000000'::documentdb_core.bson)"
}
],
Alias: 'collection',
'Startup Cost': 1.25,
'Total Cost': 146.58,
'Plan Width': 53
}
},
explainVersion: '1',
command: { find: 'demo', filter: { a: 1 }, '$db': 'test' },
serverInfo: {
host: 'a898d7c3cd9a',
port: 27017,
version: '7.0.77',
gitVersion: '05ed2b952c612533cb12c1ff1a0319a4e7f2e4b5',
ferretdb: { version: 'v2.1.0' }
},
ok: 1
}
The access methods for RUM and GIN indexes are quite similar, utilizing bitmaps for operations like 'Bitmap Index Scan' and 'Bitmap Heap Scan'. However, FerretDB only displays the planner's estimates rather than execution statistics, even when using explain("executionStats").
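If you did not preload auto_explain in the container command as I did above, a sketch of enabling the same logging at runtime (it requires superuser and applies to new connections only):

ALTER SYSTEM SET session_preload_libraries = 'auto_explain';
ALTER SYSTEM SET auto_explain.log_min_duration = 0;        -- log every statement
ALTER SYSTEM SET auto_explain.log_analyze = on;            -- actual rows and timing
ALTER SYSTEM SET auto_explain.log_buffers = on;            -- buffer counts
ALTER SYSTEM SET auto_explain.log_nested_statements = on;  -- statements inside functions
SELECT pg_reload_conf();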
Additional insights can be gathered from PostgreSQL Auto Explain:
postgres-1 | 2025-05-01 14:01:47.749 UTC [74] LOG: duration: 0.041 ms plan:
postgres-1 | Query Text: SELECT (index_spec).index_name FROM documentdb_api_catalog.collection_indexes WHERE index_id = 35
postgres-1 | Seq Scan on collection_indexes (cost=0.00..1.09 rows=1 width=32) (actual time=0.030..0.031 rows=1 loops=1)
postgres-1 | Filter: (index_id = 35)
postgres-1 | Rows Removed by Filter: 7
postgres-1 | Buffers: shared hit=1
postgres-1 | 2025-05-01 14:01:47.749 UTC [45] LOG: duration: 1.023 ms plan:
postgres-1 | Query Text:
postgres-1 | Query Parameters: $1 = 'BSONHEX6600000004636f6e74696e756174696f6e00050000000010676574706167655f6261746368436f756e7400ca00000010676574706167655f626174636853697a6548696e74000000000110676574706167655f626174636853697a6541747472000100000000'
postgres-1 | Custom Scan (DocumentDBApiScan) (cost=0.42..150.16 rows=1667 width=85) (actual time=0.974..1.012 rows=14 loops=1)
postgres-1 | Page Row Count: 202 rows
postgres-1 | Page Size Hint: 16777216 bytes
postgres-1 | Buffers: shared hit=16
postgres-1 | -> Bitmap Heap Scan on documents_15 collection (cost=0.42..146.00 rows=1667 width=89) (actual time=0.968..0.994 rows=14 loops=1)
postgres-1 | Recheck Cond: (document OPERATOR(documentdb_api_catalog.@=) 'BSONHEX0c0000001061000100000000'::documentdb_core.bson)
postgres-1 | Filter: documentdb_api_internal.cursor_state(document, 'BSONHEX6600000004636f6e74696e756174696f6e00050000000010676574706167655f6261746368436f756e7400ca00000010676574706167655f626174636853697a6548696e74000000000110676574706167655f626174636853697a6541747472000100000000'::documentdb_core.bson)
postgres-1 | Heap Blocks: exact=14
postgres-1 | Buffers: shared hit=16
postgres-1 | -> Bitmap Index Scan on "a_1_ts_-1" (cost=0.00..0.00 rows=100 width=0) (actual time=0.949..0.950 rows=14 loops=1)
postgres-1 | Index Cond: (document OPERATOR(documentdb_api_catalog.@=) 'BSONHEX0c0000001061000100000000'::documentdb_core.bson)
postgres-1 | Buffers: shared hit=2
postgres-1 | 2025-05-01 14:01:47.749 UTC [45] LOG: duration: 39.943 ms plan:
postgres-1 | Query Text: SELECT cursorpage::bytea, continuation::bytea, persistconnection, cursorid FROM documentdb_api.find_cursor_first_page($1, $2::bytea, $3)
postgres-1 | Query Parameters: $1 = 'test', $2 = '\x5a0000000266696e64000500000064656d6f000366696c746572000c0000001061000100000000036c736964001e00000005696400100000000431ae93d9c7074468a630c26b67a8709b00022464620005000000746573740000', $3 = '0'
postgres-1 | Function Scan on find_cursor_first_page (cost=0.00..0.02 rows=1 width=73) (actual time=39.937..39.938 rows=1 loops=1)
postgres-1 | Buffers: shared hit=202 read=1 dirtied=1
There's a lot happening, but the crucial detail is the number of pages and rows that have been processed. The Bitmap Index Scan efficiently located the 14 entries needed for the result (rows=14) by only accessing two index pages (Buffers: shared hit=2). The Bitmap Heap Scan expanded this by including one heap page per document (Heap Blocks: exact=14).
On top of this, a custom scan (DocumentDBApiScan) keeps track of the MongoDB cursor and paging. It reports the following:
Page Row Count: 202 rows
Page Size Hint: 16777216 bytes
Buffers: shared hit=16
The PostgreSQL scans read 16 pages, but the DocumentDBApiScan emulates MongoDB's 16MB pages and reports 202 rows. I don't know exactly how to interpret these numbers: DocumentDB is not PostgreSQL, and even though it is open source, its code lacks the internal documentation quality of PostgreSQL. I don't think there are really 202 rows in that page, as only 14 were read from storage, but it seems to have iterated over those 202 rows by re-reading PostgreSQL pages, as indicated by Buffers: shared hit=202.
Pagination query (Equality, Sort)
OLTP applications commonly implement pagination to limit their results to what is displayed to the user. I executed a query ordered by timestamp to retrieve the last ten entries:
test> db.demo.find(
{ a: 1 } ).sort({ts:-1}).limit(10).explain("executionStats")
;
{
queryPlanner: {
Plan: {
'Parallel Aware': false,
'Async Capable': false,
'Startup Cost': 267.13,
'Total Cost': 267.15,
'Plan Rows': 10,
'Plan Width': 85,
Plans: [
{
'Node Type': 'Sort',
'Parallel Aware': false,
'Async Capable': false,
'Startup Cost': 267.13,
'Total Cost': 279.63,
'Plan Width': 85,
'Parent Relationship': 'Outer',
'Plan Rows': 5000,
'Sort Key': [
"(documentdb_api_catalog.bson_orderby(document, 'BSONHEX0d00000010747300ffffffff00'::documentdb_core.bson)) DESC NULLS LAST"
],
Plans: [
{
'Node Type': 'Bitmap Heap Scan',
'Parent Relationship': 'Outer',
'Parallel Aware': false,
'Async Capable': false,
'Relation Name': 'documents_15',
'Startup Cost': 1.25,
'Recheck Cond': "(document OPERATOR(documentdb_api_catalog.@=) 'BSONHEX0c0000001061000100000000'::documentdb_core.bson)",
Plans: [
{
'Parent Relationship': 'Outer',
'Parallel Aware': false,
'Async Capable': false,
'Startup Cost': 0,
'Plan Rows': 100,
'Index Cond': "(document OPERATOR(documentdb_api_catalog.@=) 'BSONHEX0c0000001061000100000000'::documentdb_core.bson)",
'Node Type': 'Bitmap Index Scan',
'Index Name': 'a_1_ts_-1',
'Total Cost': 0,
'Plan Width': 0
}
],
Alias: 'collection',
'Total Cost': 159.08,
'Plan Rows': 5000,
'Plan Width': 85
}
]
}
],
'Node Type': 'Limit'
}
},
explainVersion: '1',
command: {
find: 'demo',
filter: { a: 1 },
sort: { ts: -1 },
limit: 10,
'$db': 'test'
},
serverInfo: {
host: 'a898d7c3cd9a',
port: 27017,
version: '7.0.77',
gitVersion: '05ed2b952c612533cb12c1ff1a0319a4e7f2e4b5',
ferretdb: { version: 'v2.1.0' }
},
ok: 1
}
The sort operation performed after the scan is inefficient: it reads all rows, sorts them, and then limits the result to ten. I expected this outcome because a bitmap scan does not preserve order, even though the index entries are sorted.
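For comparison, on regular columns a plain B-tree index scan preserves order, so the Sort node disappears and LIMIT stops after ten index entries (a sketch on a hypothetical table):

CREATE TABLE events (a int, ts timestamptz);
CREATE INDEX ON events (a, ts DESC);

EXPLAIN
SELECT * FROM events
 WHERE a = 1
 ORDER BY ts DESC
 LIMIT 10;
-- With enough rows, the expected plan is: Limit -> Index Scan (no Sort node)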
The PostgreSQL execution plan of the FerretDB query confirms the extra sort:
postgres-1 | 2025-05-01 14:04:18.760 UTC [87] LOG: duration: 0.917 ms plan:
postgres-1 | Query Text: SELECT version(), documentdb_api.binary_extended_version()
postgres-1 | Result (cost=0.00..0.01 rows=1 width=64) (actual time=0.913..0.914 rows=1 loops=1)
postgres-1 | 2025-05-01 14:04:18.807 UTC [87] LOG: duration: 4.137 ms plan:
postgres-1 | Query Text: SELECT * FROM documentdb_api_catalog.collections WHERE database_name = $1 AND collection_name = $2
postgres-1 | Query Parameters: $1 = 'test', $2 = 'demo'
postgres-1 | Index Scan using collections_pkey on collections (cost=0.15..8.17 rows=1 width=248) (actual time=4.133..4.133 rows=1 loops=1)
postgres-1 | Index Cond: ((database_name = 'test'::text) AND (collection_name = 'demo'::text))
postgres-1 | Buffers: shared hit=2
postgres-1 | 2025-05-01 14:04:18.807 UTC [87] CONTEXT: SQL statement "SELECT * FROM documentdb_api_catalog.collections WHERE database_name = $1 AND collection_name = $2"
postgres-1 | 2025-05-01 14:04:18.862 UTC [88] LOG: duration: 0.020 ms plan:
postgres-1 | Query Text: SELECT (index_spec).index_name FROM documentdb_api_catalog.collection_indexes WHERE index_id = 35
postgres-1 | Seq Scan on collection_indexes (cost=0.00..1.09 rows=1 width=32) (actual time=0.010..0.011 rows=1 loops=1)
postgres-1 | Filter: (index_id = 35)
postgres-1 | Rows Removed by Filter: 7
postgres-1 | Buffers: shared hit=1
postgres-1 | 2025-05-01 14:04:18.862 UTC [87] LOG: duration: 0.188 ms plan:
postgres-1 | Query Text:
postgres-1 | Limit (cost=267.13..267.15 rows=10 width=85) (actual time=0.183..0.185 rows=10 loops=1)
postgres-1 | Buffers: shared hit=24
postgres-1 | -> Sort (cost=267.13..279.63 rows=5000 width=85) (actual time=0.183..0.183 rows=10 loops=1)
postgres-1 | Sort Key: (documentdb_api_catalog.bson_orderby(document, 'BSONHEX0d00000010747300ffffffff00'::documentdb_core.bson)) DESC NULLS LAST
postgres-1 | Sort Method: quicksort Memory: 26kB
postgres-1 | Buffers: shared hit=24
postgres-1 | -> Bitmap Heap Scan on documents_15 collection (cost=1.25..159.08 rows=5000 width=85) (actual time=0.055..0.119 rows=14 loops=1)
postgres-1 | Recheck Cond: (document OPERATOR(documentdb_api_catalog.@=) 'BSONHEX0c0000001061000100000000'::documentdb_core.bson)
postgres-1 | Heap Blocks: exact=14
postgres-1 | Buffers: shared hit=16
postgres-1 | -> Bitmap Index Scan on "a_1_ts_-1" (cost=0.00..0.00 rows=100 width=0) (actual time=0.031..0.031 rows=14 loops=1)
postgres-1 | Index Cond: (document OPERATOR(documentdb_api_catalog.@=) 'BSONHEX0c0000001061000100000000'::documentdb_core.bson)
postgres-1 | Buffers: shared hit=2
postgres-1 | 2025-05-01 14:04:18.862 UTC [87] LOG: duration: 59.993 ms plan:
postgres-1 | Query Text: SELECT cursorpage::bytea, continuation::bytea, persistconnection, cursorid FROM documentdb_api.find_cursor_first_page($1, $2::bytea, $3)
postgres-1 | Query Parameters: $1 = 'test', $2 = '\x780000000266696e64000500000064656d6f000366696c746572000c000000106100010000000003736f7274000d00000010747300ffffffff00106c696d6974000a000000036c736964001e00000005696400100000000431ae93d9c7074468a630c26b67a8709b00022464620005000000746573740000', $3 = '0'
postgres-1 | Function Scan on find_cursor_first_page (cost=0.00..0.02 rows=1 width=73) (actual time=59.987..59.988 rows=1 loops=1)
postgres-1 | Buffers: shared hit=524
The same 14 rows have been scanned and sorted to identify the final 10 results. However, this method won't scale effectively as more rows are added.
For instance, if I insert ten thousand rows for this value and run the pagination again, the inefficiency becomes apparent:
for (let i = 0; i < 10000; i++) {
db.demo.insertOne( {
a: 1 ,
b: Math.random(),
ts: new Date()
} );
}
db.demo.find( { a: 1 } ).sort({ts:-1}).limit(10)
;
Here is the PostgreSQL EXPLAIN (ANALYZE, BUFFERS) from Auto Explain:
postgres-1 | 2025-05-01 15:14:08.942 UTC [250] LOG: duration: 0.027 ms plan:
postgres-1 | Query Text: SELECT * FROM documentdb_api_catalog.collections WHERE database_name = $1 AND collection_name = $2
postgres-1 | Query Parameters: $1 = 'test', $2 = 'demo'
postgres-1 | Index Scan using collections_pkey on collections (cost=0.15..8.17 rows=1 width=248) (actual time=0.024..0.024 rows=1 loops=1)
postgres-1 | Index Cond: ((database_name = 'test'::text) AND (collection_name = 'demo'::text))
postgres-1 | Buffers: shared hit=2
postgres-1 | 2025-05-01 15:14:08.942 UTC [250] CONTEXT: SQL statement "SELECT * FROM documentdb_api_catalog.collections WHERE database_name = $1 AND collection_name = $2"
postgres-1 | 2025-05-01 15:14:09.042 UTC [252] LOG: duration: 0.018 ms plan:
postgres-1 | Query Text: SELECT (index_spec).index_name FROM documentdb_api_catalog.collection_indexes WHERE index_id = 35
postgres-1 | Seq Scan on collection_indexes (cost=0.00..1.09 rows=1 width=32) (actual time=0.010..0.011 rows=1 loops=1)
postgres-1 | Filter: (index_id = 35)
postgres-1 | Rows Removed by Filter: 7
postgres-1 | Buffers: shared hit=1
postgres-1 | 2025-05-01 15:14:09.043 UTC [250] LOG: duration: 88.349 ms plan:
postgres-1 | Query Text:
postgres-1 | Limit (cost=533.41..533.44 rows=10 width=85) (actual time=88.334..88.336 rows=10 loops=1)
postgres-1 | Buffers: shared hit=185
postgres-1 | -> Sort (cost=533.41..558.41 rows=10000 width=85) (actual time=88.333..88.334 rows=10 loops=1)
postgres-1 | Sort Key: (documentdb_api_catalog.bson_orderby(document, 'BSONHEX0d00000010747300ffffffff00'::documentdb_core.bson)) DESC NULLS LAST
postgres-1 | Sort Method: top-N heapsort Memory: 27kB
postgres-1 | Buffers: shared hit=185
postgres-1 | -> Bitmap Heap Scan on documents_15 collection (cost=2.50..317.32 rows=10000 width=85) (actual time=1.840..15.702 rows=10014 loops=1)
postgres-1 | Recheck Cond: (document OPERATOR(documentdb_api_catalog.@=) 'BSONHEX0c0000001061000100000000'::documentdb_core.bson)
postgres-1 | Heap Blocks: exact=169
postgres-1 | Buffers: shared hit=177
postgres-1 | -> Bitmap Index Scan on "a_1_ts_-1" (cost=0.00..0.00 rows=200 width=0) (actual time=1.795..1.795 rows=10014 loops=1)
postgres-1 | Index Cond: (document OPERATOR(documentdb_api_catalog.@=) 'BSONHEX0c0000001061000100000000'::documentdb_core.bson)
postgres-1 | Buffers: shared hit=8
postgres-1 | 2025-05-01 15:14:09.044 UTC [250] LOG: duration: 105.136 ms plan:
postgres-1 | Query Text: SELECT cursorpage::bytea, continuation::bytea, persistconnection, cursorid FROM documentdb_api.find_cursor_first_page($1, $2::bytea, $3)
postgres-1 | Query Parameters: $1 = 'test', $2 = '\x780000000266696e64000500000064656d6f000366696c746572000c000000106100010000000003736f7274000d00000010747300ffffffff00106c696d6974000a000000036c736964001e0000000569640010000000048d95dce6e29f4fb3b34276d28e0a528100022464620005000000746573740000', $3 = '0'
postgres-1 | Function Scan on find_cursor_first_page (cost=0.00..0.02 rows=1 width=73) (actual time=105.130..105.131 rows=1 loops=1)
postgres-1 | Buffers: shared hit=685
For the same size of the result (rows=10), it had to read all index entries for the equality predicate (rows=10014), retrieve all documents (rows=10014) from many scattered pages (Heap Blocks: exact=169), and sort them. In my small lab where all pages are in memory, the impact on the response time is minimal, but it is sufficient to understand the scalability problem.
The time complexity of the query is proportional to the size of the table rather than the size of the result. You start production with acceptable response times, but over the years it slows down. Along the way, you cross the threshold where the working set no longer fits in cache, and you start suffering from disk latency. The user doesn't understand, because the result is always the same: ten documents.
This is a very common scenario in OLTP, but the benchmarks that try to sell a PostgreSQL managed service as a document database do not include it.
No flexible schema in SQL, even with JSON
In this case, optimizing the query could be feasible because all fields are at the document's root level, eliminating the need for an inverted index. A B-tree index on expressions, such as btree on ( ((document->>'a')::int), ((document->>'b')::float) ), could work, but DocumentDB does not support the ->> operator. Moreover, such an index would not address the general issue, since document databases like MongoDB allow a field to be a scalar as well as a flexible type such as an object or an array.
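For reference, on a plain jsonb column in vanilla PostgreSQL such an expression index is possible, but it covers only scalar values and a fixed schema (a sketch; ordering on the ISO-8601 text avoids a non-immutable cast to timestamptz in the index expression, since such strings sort chronologically):

CREATE TABLE demo_jsonb (doc jsonb);
CREATE INDEX ON demo_jsonb ( ((doc->>'a')::int), (doc->>'ts') DESC );

-- Equality on "a" plus top-N on "ts" can be served by the index alone:
SELECT * FROM demo_jsonb
 WHERE (doc->>'a')::int = 1
 ORDER BY doc->>'ts' DESC
 LIMIT 10;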
PostgreSQL has many index types and allows some schema flexibility with the JSON datatype, but, unlike in a document database, the choice of index type depends on the schema.
Comparison with MongoDB
When executed on MongoDB, you will consistently find that 10 index keys are examined, and 10 documents are fetched, regardless of the collection's size:
executionStats: {
executionSuccess: true,
nReturned: 10,
totalKeysExamined: 10,
totalDocsExamined: 10,
executionStages: {
stage: 'LIMIT',
nReturned: 10,
works: 11,
advanced: 10,
limitAmount: 10,
inputStage: {
stage: 'FETCH',
nReturned: 10,
works: 10,
advanced: 10,
docsExamined: 10,
alreadyHasObj: 0,
inputStage: {
stage: 'IXSCAN',
nReturned: 10,
works: 10,
advanced: 10,
keyPattern: { a: 1, ts: -1 },
indexName: 'a_1_ts_-1',
isMultiKey: false,
multiKeyPaths: { a: [], ts: [] },
isUnique: false,
isSparse: false,
isPartial: false,
indexVersion: 2,
direction: 'forward',
indexBounds: { a: [ '[1, 1]' ], ts: [ '[MaxKey, MinKey]' ] },
keysExamined: 10,
seeks: 1,
dupsTested: 0,
dupsDropped: 0
}
}
}
},
Conclusion
I've written this to convey several messages, at a time when some cloud vendors are riding the wave of MongoDB and PostgreSQL popularity and pretending to offer both in one.
- Ignore the so-called benchmarks that deliberately overlook critical use cases such as pagination queries. Indexes help not only in locating rows based on equality or range predicates, but also in retrieving them in the expected order for pagination.
- Reduce your cloud bill and don't run unnecessary benchmarks on large instances. You can assess scalability in a small lab, on a laptop, or on a free tier. The execution plan provides the best indicator: are you reading only the necessary index entries and documents to build the result set, or are you scanning a growing database because of limited indexing options?
- Do not trust comparisons based on feature similarity without understanding how things work. I've seen this recently on Reddit, where many contributors overlooked that document databases must also index the documents for efficient queries:
you can just create a table in postgres that is a key and a JSON field and boom, you have a document store
This is wrong, unless "document store" means full-document access by primary key, like an object store. But in that case, why use PostgreSQL, especially given that it has only heap tables, which are not the best fit for primary-key access?
JSON in PostgreSQL enables document storage, yet it behaves like any other column: modifying even a single byte requires rewriting the entire document. You can index expressions with regular indexes, but only for a single entry per document. Multiple keys within the same document can be indexed, but with more limited options. Therefore, while JSON adds value to a relational schema, it's a misconception to think "boom, you have a document store" by storing everything in JSONB or by plugging a BSON extension into PostgreSQL.
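For example, changing a single field with the standard jsonb_set function writes a complete new row version, including the full (possibly TOASTed) document (a sketch on the hypothetical demo_jsonb table from above):

-- Updates one field, but MVCC copies the whole document into a new row version:
UPDATE demo_jsonb
   SET doc = jsonb_set(doc, '{b}', to_jsonb(0.5))
 WHERE (doc->>'a')::int = 1;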
It's encouraging to see vendors of managed RDBMS services recognize the effectiveness of the document model and the MongoDB API, prompting them to emulate it with SQL functions. While PostgreSQL users may encounter fewer indexing options, this can serve as a valuable starting point for evaluating the developer experience before transitioning to MongoDB for improved performance with multi-key indexes.