In this article we will explain how to load test a MongoDB cluster.
For this demo, we will use the Atlas UI to create a sharded cluster (3 shards) running MongoDB 6.x.
Once created, we will whitelist our IP address, create some credentials and load some data.
To do so, we can use a very useful tool: mgenerate4j (available on GitHub). This tool takes a JSON template that it uses to generate documents on the fly and insert them into MongoDB.
If you don't want to compile it, you can download the latest release from this link: https://github.com/dioxic/mgenerate4j/releases/tag/v0.0.7
wget https://github.com/dioxic/mgenerate4j/releases/download/v0.0.7/mgenerate.jar
Below is an example template. You can find the documentation here: https://dioxic.github.io/mgenerate4j/
{
  "name": "$name",
  "age": "$age",
  "field1": "$paragraph",
  "field2": {
    "$string": {
      "length": 20,
      "pool": "MONGO40*"
    }
  },
  "emails": {
    "$array": {
      "of": "$email",
      "number": 20
    }
  }
}
Now, you can execute the following command to start inserting documents:
java -jar mgenerate.jar load --uri "connstring" template.json --collection test -n 10000
After executing this command, you should see output similar to the following:
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
time                    inserts/s     operations/s     bulk ops/s     latency p50     latency p95     latency p99     load factor     progress
---------------------------------------------------------------------------------------------------------------------------------------------------
2025-01-09T11:19:54     197           197              2              996ms           996ms           996ms           196.0%          3%
2025-01-09T11:19:55     5302          5302             53             61.4ms          496ms           996ms           591.0%          54%
2025-01-09T11:19:56     8331          8331             83             47.6ms          108ms           141ms           424.0%          100%
Completed in 2.57s (3886 inserts/s)
Now, because this is a sharded cluster and the collection has not been sharded yet, remember that all the data is stored in only one of the shards (the primary shard), so you are not using all the processing power of the cluster.
You can connect to the cluster using mongosh:
mongosh "connstring"
After connecting, you can shard the collection:
sh.shardCollection("test.test", {_id : 1})
This will shard the collection by _id. Because the template used by mgenerate4j doesn't have an _id field, the driver will create one for you, and it will always be an ObjectId.
The 12-byte ObjectId consists of:
- A 4-byte timestamp, representing the ObjectId's creation, measured in seconds since the Unix epoch.
- A 5-byte random value generated once per process. This random value is unique to the machine and process.
- A 3-byte incrementing counter, initialized to a random value.
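To make the layout above concrete, here is a minimal Python sketch that splits a 24-character hex ObjectId into those three parts using only the standard library. The function name `parse_object_id` and the sample id are made up for this example; they do not come from any MongoDB driver.

```python
from datetime import datetime, timezone


def parse_object_id(hex_id: str) -> dict:
    """Split a 24-character hex ObjectId into its three documented parts."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex characters)")
    seconds = int.from_bytes(raw[0:4], "big")  # 4-byte timestamp, big-endian
    return {
        "timestamp": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "random": raw[4:9].hex(),                     # 5-byte per-process random value
        "counter": int.from_bytes(raw[9:12], "big"),  # 3-byte counter
    }


# Hypothetical ObjectId, invented for this illustration.
parts = parse_object_id("677fb0ea1a2b3c4d5e000001")
print(parts["timestamp"].isoformat(), parts["random"], parts["counter"])
```

Since the timestamp occupies the most significant bytes, two ids created a few seconds apart always compare in creation order, whatever the random and counter parts contain.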
As you can see, because the ObjectId starts with a timestamp, the _id values increase almost monotonically, which means that all new writes go to the chunk covering the highest key range and therefore to the same shard. This is called a hot shard, and it prevents us from using all the processing power of the cluster.
If you write lots of data to the sharded cluster, you will see that the data ends up evenly distributed anyway. This is because a background process called the balancer splits the chunks in a shard and moves them from one shard to another. These migrations add read load to the shard that is receiving all the writes and write load to the destination shard.
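The hot shard effect can be illustrated with a toy model of range-based routing. This is only a sketch under simplifying assumptions: the chunk boundaries, the shard names, and the `route` function are hypothetical, and integer keys stand in for ObjectIds. It is not how mongos is implemented.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical layout: three chunks, one per shard. The values are the lower
# bounds of the second and third chunk; the last chunk is open-ended.
CHUNK_LOWER_BOUNDS = [1000, 2000]
SHARDS = ["shard0", "shard1", "shard2"]


def route(key: int) -> str:
    """Return the shard whose key range (lower bound inclusive) contains key."""
    return SHARDS[bisect_right(CHUNK_LOWER_BOUNDS, key)]


def writes_per_shard(keys) -> Counter:
    """Count how many writes each shard would receive for the given keys."""
    return Counter(route(k) for k in keys)


# Keys that only grow, like the timestamp prefix of an ObjectId: every new key
# is larger than all existing boundaries, so it falls in the last chunk.
print(writes_per_shard(range(5000, 5100)))
```

In this model, all 100 writes are routed to `shard2`, while the other two shards receive nothing.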
For this case, we will leave it like this, but in other articles we will see how to overcome this problem.
Now, let's insert the first 1 million documents.
time                    inserts/s     operations/s     bulk ops/s     latency p50     latency p95     latency p99     load factor     progress
---------------------------------------------------------------------------------------------------------------------------------------------------
2025-01-09T11:32:46     5143          5143             51             70.2ms          134ms           153ms           391.0%          99%
2025-01-09T11:32:46     4796          4796             48             69.6ms          377ms           377ms           502.0%          100%
Completed in 191s (5229 inserts/s)
It took a little over 3 minutes, which is a lot: it is common to see around 50,000 inserts/s, roughly 10 times what we got here.
Let's try with multiple instances of mgenerate4j. To launch each process in the background, we can append the following to the command:
> /dev/null 2>&1 &
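If you prefer to script this instead of repeating the command by hand, the same idea can be sketched in Python. The helper names `build_command` and `launch_instances` are made up for this example. The command line is the one used earlier in this article, and `mgenerate.jar` and `template.json` are assumed to be in the current directory.

```python
import subprocess


def build_command(uri: str, template: str = "template.json",
                  collection: str = "test", count: int = 10000) -> list[str]:
    """Build the mgenerate4j command line shown earlier in this article."""
    return ["java", "-jar", "mgenerate.jar", "load", "--uri", uri,
            template, "--collection", collection, "-n", str(count)]


def launch_instances(uri: str, instances: int, **kwargs) -> list[subprocess.Popen]:
    """Start several loaders in parallel, discarding their output
    (the equivalent of appending `> /dev/null 2>&1 &` to each command)."""
    return [
        subprocess.Popen(build_command(uri, **kwargs),
                         stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        for _ in range(instances)
    ]
```

For example, `launch_instances("connstring", instances=4)` would start four loaders at once. Calling `wait()` on each returned process blocks until that loader has finished.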
I executed the same test with 4 instances and then with 8 instances of mgenerate4j on the same laptop.
| Number of instances | inserts/s | 
|---|---|
| 1 | 6,000 | 
| 4 | 40,000 | 
| 8 | 65,000 | 
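To read these numbers more easily, the short sketch below computes the speedup relative to a single instance and the throughput per instance. The figures are copied from the table above; the function names are made up for this example.

```python
# Throughput figures from the table above: instances -> inserts/s.
measured = {1: 6_000, 4: 40_000, 8: 65_000}


def speedup(results: dict[int, int]) -> dict[int, float]:
    """Throughput of each run relative to the single-instance run."""
    baseline = results[1]
    return {n: rate / baseline for n, rate in results.items()}


def per_instance(results: dict[int, int]) -> dict[int, float]:
    """Average inserts/s contributed by each instance in a run."""
    return {n: rate / n for n, rate in results.items()}


for n in measured:
    print(f"{n} instance(s): {speedup(measured)[n]:.1f}x baseline, "
          f"{per_instance(measured)[n]:,.0f} inserts/s per instance")
```

The throughput per instance drops between 4 and 8 instances, which suggests that some bottleneck is being approached. These numbers alone do not tell us whether it is the laptop, the network, or the hot shard.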
In the screenshot below you can observe the number of inserts/s on the mongos.
