Server Selection in Next Generation MongoDB Drivers


I love to cook. Sometimes, my guests like something so much that they ask for the recipe. Occasionally, I have to confess there isn't one — I just made it up as I went along!

Improvisation is fine in the kitchen, but it's not a great approach for consistency in software development. The MongoDB Drivers team is responsible for writing and maintaining eleven drivers across ten languages. We want our drivers to have similar behaviors, even while staying idiomatic for each language.

One way we do that is by writing and sharing driver specification documents for those behaviors that we'd like to have in common across all drivers. Just as a recipe helps a chef serve a consistently great dish night after night, these specifications guide software development for consistency across all drivers, both at MongoDB and in our community.

One of the most recent specifications we've developed covers server selection. Production MongoDB deployments typically consist of multiple servers, either as a replica set or as a sharded cluster. Server selection describes the process by which a driver chooses the right server for any given read or write operation, taking into account the last known status of all servers. The specification also covers when to recheck server status and when to give up if an appropriate server isn't available.

The rest of this article describes our design goals and how server selection will work in the next generation of MongoDB drivers.

Design Goals

The most important goal is that server selection be predictable. If an application is developed against a standalone server, later deployed in production against a replica set, and finally used with a sharded cluster, the application code should stay the same; only its configuration should need to change.

For example, if some part of an application queries a secondary, that should succeed with a standalone server (where the notion of primary and secondary doesn't apply), work as expected against a replica set, and keep working in a sharded cluster, where secondary reads are proxied by a mongos.
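As a minimal sketch of that predictability using PyMongo, the Python driver (the hostname is made up, and only the connection string would change between deployments):

    from pymongo import MongoClient

    # The read preference is pure configuration; the query below stays
    # the same whether the URI points at a standalone server, a replica
    # set, or a mongos.
    client = MongoClient(
        "mongodb://db.example.com:27017/?readPreference=secondaryPreferred"
    )

    # Standalone: uses the only server. Replica set: prefers a secondary.
    # Sharded cluster: the mongos proxies the secondary read.
    doc = client.test.restaurants.find_one()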

The second design goal is that server selection be resilient whenever possible. That means that in the face of detectable server failures, drivers should try to continue with alternative servers rather than immediately fail with an error.

For a write, that means waiting for a primary to become available or switching to another mongos (for a sharded cluster). For a read, that means selecting an alternative server, if the read preference allows.
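In rough Python pseudocode, that resilience might look like the sketch below; get_known_servers and is_suitable are hypothetical stand-ins for a driver's internal topology bookkeeping, not any real driver API:

    import random
    import time

    def select_server(get_known_servers, is_suitable, timeout_ms=30000):
        """Illustrative only: keep rechecking server state until a
        suitable server appears or the selection timeout expires."""
        deadline = time.monotonic() + timeout_ms / 1000.0
        while time.monotonic() < deadline:
            suitable = [s for s in get_known_servers() if is_suitable(s)]
            if suitable:
                return random.choice(suitable)
            time.sleep(0.5)  # wait briefly, then recheck server status
        raise RuntimeError("no suitable server within serverSelectionTimeoutMS")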

The third design goal is that server selection be low-latency. That means that if more than one server is appropriate for an operation, servers with a lower average round-trip time (RTT) should be preferred over others.

Overview of the Server Selection Specification

The Server Selection specification has four major parts:

  1. Configuration
  2. Average Round-Trip Time (RTT)
  3. Read Preferences
  4. Server Selection Algorithm

Configuration

Server selection is governed primarily by two configuration variables:

  • serverSelectionTimeoutMS. The serverSelectionTimeoutMS variable gives the amount of time in milliseconds that drivers should allow for server selection before giving up and raising an error. Applications can set this higher or lower depending on whether they prefer to wait patiently or to fail fast (e.g. to show a "fail whale" web page). The default is 30 seconds, which is enough time for a typical new-primary election to occur during failover.
  • localThresholdMS. If more than one server is appropriate for an operation, the localThresholdMS variable defines the size of the acceptable "latency window" in milliseconds relative to the server with the best average RTT. One server in the latency window will be selected at random. When this is zero, only the server with the best average RTT will be selected. When this is very large, any appropriate server could be selected. The default is 15 milliseconds, which allows only a little RTT variance. Both options can be set through driver configuration, as in the sketch after this list.
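For instance, a PyMongo-style client might configure both options like this (a minimal sketch; the hostname is made up, and other drivers expose the same options under equivalent names):

    from pymongo import MongoClient

    # Hypothetical host; both options below are standard client options.
    client = MongoClient(
        "mongodb://db.example.com:27017",
        serverSelectionTimeoutMS=5000,  # give up after 5s instead of the 30s default
        localThresholdMS=35,            # accept servers within 35ms of the best average RTT
    )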

For example, suppose Servers A through E are all appropriate for an operation – perhaps all mongos servers able to handle a write operation – and localThresholdMS has been set to 100. Server A has the lowest average RTT at 15ms, so it defines the lower bound of the latency window. The upper bound is at 115ms, so only Servers A, B, and C fall within the latency window.
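Here is a minimal sketch of that window computation; the RTT figures for Servers B through E are invented to match the description above:

    import random

    # Hypothetical average RTTs in milliseconds for Servers A through E.
    avg_rtt_ms = {"A": 15, "B": 42, "C": 110, "D": 130, "E": 200}
    local_threshold_ms = 100

    # The window runs from the best average RTT (15ms) up to
    # best + localThresholdMS (115ms).
    best = min(avg_rtt_ms.values())
    in_window = [s for s, rtt in avg_rtt_ms.items() if rtt <= best + local_threshold_ms]

    print(sorted(in_window))         # ['A', 'B', 'C']
    print(random.choice(in_window))  # the selected server, chosen at random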
