In this article, we set up a MongoDB replica set with an arbiter on EC2 using CloudFormation templates. We first briefly introduce MongoDB, Amazon EC2 and Amazon CloudFormation, and then provide the CloudFormation templates that set up the MongoDB replica set with the arbiter.
Finally, we provide a CloudFormation template to set up the MongoDB Monitoring Service on a separate EC2 instance.

Readers should already have some experience with MongoDB and Amazon EC2.

Introduction

What is MongoDB?

MongoDB is an open source document-oriented NoSQL database.

Instead of storing data in tables, as is done in a “classical” relational database, MongoDB stores structured data as JSON-like documents with dynamic schemas, in a binary format called BSON.

Earlier versions of MongoDB drew criticism (see the well-known “MongoDB is Web scale” satire), but the issues raised have since been resolved; for example, journaling is now enabled by default. MongoDB is now mature and production-ready, and it is used extensively by MTV, Craigslist and Foursquare, among others.

Whether you should use it depends on your requirements. The DZone Reference Card on Getting Started with NoSQL might help. It advises basing the choice on Brewer’s CAP theorem: “It’s impossible for a distributed computer system to simultaneously provide Consistency, Availability and Partition Tolerance”. Decide which two of these you need most, and you know what type of database you need.

MongoDB offers many of the features that relational databases offer, such as indexing and querying on any field. Some NoSQL databases and services, such as Amazon DynamoDB, lack this, which makes their use rather limited.

However, MongoDB does not enforce consistency between documents. You can put a reference (DBRef) in a record in collection1 (a MongoDB “collection” is the equivalent of an RDBMS table) to a record in collection2 and then delete the record in collection2. No error is thrown. The reference is still there, but when someone tries to resolve it, no record is found in collection2.
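
To make this concrete, here is a sketch of a document in collection1 that holds a DBRef to a document in collection2. The field names (title, author) and the _id values are made up for illustration; only the $ref and $id keys belong to the DBRef convention.

```json
{
    "_id" : 1,
    "title" : "A document in collection1",
    "author" : { "$ref" : "collection2", "$id" : 42 }
}
```

If the document with _id 42 is later deleted from collection2, this document is left untouched and its author reference simply resolves to nothing.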

In return, however, MongoDB offers easy replication and sharding, which make it excel at Availability and Partition Tolerance. Unlike relational databases, which are usually scaled vertically (“making the machine bigger”), MongoDB can easily be scaled horizontally (“adding more machines”).

Note: It is not true that relational databases cannot be scaled horizontally. However, this is usually done by adding an extra layer between the application layer and the database layer, or by handling it in the application layer itself (for example, by adding code that determines in which database a record resides). Needless to say, the required referential integrity usually makes this quite difficult, and in some cases referential integrity is loosened a bit to make horizontal scaling of the relational database possible.

Similarly, MongoDB databases rarely end up in an inconsistent state in practice, since data is checked in the application layer before it is put in or removed from the database. Consistency is just not enforced by the database itself.

As a rule of thumb: if you need consistency more than horizontal scalability, go RDBMS; if you need horizontal scalability more than consistency, go NoSQL.

What is Amazon EC2?

The Amazon Elastic Compute Cloud (Amazon EC2) is a central part of Amazon’s cloud computing platform, Amazon Web Services (AWS).
Users can rent virtual computers on EC2 to which they get full root access.
They pay by the hour for active servers, hence the term “elastic”. EC2 makes scalable deployment of applications easy by providing a web service through which users can boot instances from a machine image (for example, a Linux image on which Apache/PHP/WordPress/MySQL or Java/Tomcat is already installed).

Although using Amazon EC2 is not necessarily less expensive than buying your own hardware, it is a good choice for most startups, since instances can easily be added, removed, or replaced by larger or smaller ones. Your costs scale with your load, so you are protected against overinvestment and can scale up in a matter of minutes when traffic peaks.

Many startups, such as Foursquare and BitBucket, run entirely on Amazon EC2/AWS.

What is Amazon CloudFormation?

There are three interfaces through which EC2 instances and other AWS resources can be set up. The first is the web interface, which provides an intuitive graphical interface. The second consists of the EC2 Command Line Tools, which make it possible to launch and manage instances through scripting and which are used by third-party AWS tools such as the AWS/EC2 Plugin for Eclipse.

The third one is CloudFormation. With Amazon CloudFormation, you describe AWS resources in a template. You then instruct Amazon CloudFormation to read this template and initialize the resources associated with it.

If you’ve never used CloudFormation before, you should go through its Getting Started Guide. We only briefly discuss the basic layout of a CloudFormation template file here. Four sections appear in almost every template. We will use snippets of a Tomcat 7 template as examples.
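
As an orientation aid, the overall layout of such a template can be sketched as follows. This is a skeleton only and the Description text is made up. CloudFormation requires at least one entry under Resources, so the skeleton cannot be launched until the sections are filled in.

```json
{
    "AWSTemplateFormatVersion" : "2010-09-09",
    "Description" : "Skeleton only: shows where the four sections go",

    "Parameters" : { },
    "Mappings" : { },
    "Resources" : { },
    "Outputs" : { }
}
```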

Parameters

When a template is loaded, CloudFormation prompts the user to enter a value for every parameter that is defined in the Parameters section.

An example Parameters section:

"Parameters" : {
        "KeyName" : {
            "Description" : "Name of an existing EC2 KeyPair to enable SSH access",
            "Type" : "String"
        },

        "InstanceType" : {
            "Type" : "String",
            "Default" : "m1.large",
            "AllowedValues" : [ "t1.micro", "m1.small", "m1.medium", "m1.large", "m1.xlarge", "m2.xlarge", "m2.2xlarge", "m2.4xlarge", "c1.xlarge", "cc1.4xlarge" ],
            "Description" : "EC2 instance type (e.g. t1.micro, m1.small, m1.medium, m1.large, m1.xlarge, m2.xlarge)"
        },

        "SecurityGroupName" : {
            "Description" : "Security group name for the instance",
            "Type" : "String"
        }
}

With the above example, the user will be prompted for the name of an SSH key pair, the type of the EC2 instance that will be created by the template, and the name of the security group to which this instance will belong.
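
Parameters can also carry validation constraints, so that a bad value is rejected before any resource is created. As a sketch, the KeyName parameter could be tightened as shown below. The length limits and the pattern are illustrative assumptions and are not taken from the Tomcat 7 template.

```json
"KeyName" : {
    "Description" : "Name of an existing EC2 KeyPair to enable SSH access",
    "Type" : "String",
    "MinLength" : "1",
    "MaxLength" : "64",
    "AllowedPattern" : "[-_ a-zA-Z0-9]*",
    "ConstraintDescription" : "can contain only alphanumeric characters, spaces, dashes and underscores"
}
```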

Mappings

The Mappings section defines lookup tables that map keys to values.

An example Mappings section:

"Mappings" : {
        "RegionImageZone" : {
            "us-east-1"      : { "64" : "ami-e565ba8c"},
            "us-west-2"      : { "64" : "ami-3ac64a0a"},
            "us-west-1"      : { "64" : "ami-e78cd4a2"},
            "eu-west-1"      : { "64" : "ami-f9231b8d"},
            "ap-southeast-1" : { "64" : "ami-be3374ec"},
            "ap-northeast-1" : { "64" : "ami-e47acbe5"},
            "sa-east-1"      : { "64" : "ami-a6855bbb"}
        }
}

The above mapping could be used in a template to determine the ID of the AMI (Amazon Machine Image) to use for a given region. The entries point to the basic Amazon Linux AMI for the different regions. Since the template knows in which region it is running, providing and using a mapping like this is usually a good idea.

Resources

This is the most important section of the template: it describes the resources to initialize. EC2 instances are typically defined here, but other kinds of AWS resources can be defined as well, such as security rules and S3 (Amazon Simple Storage Service, often used extensively with EC2 instances) resources.

In our example, the only resource associated with the template is an EC2 instance on which a Tomcat 7 server will be installed:

"Resources" : {
        "Tomcat7" : {
            "Type" : "AWS::EC2::Instance",
            "Metadata" : {
                "AWS::CloudFormation::Init" : {
                    "config" : {
                        "packages" : {
                            "yum" : {
                                "apr-devel" : [],
                                "openssl-devel" : [],
                                "gcc" : [],
                                "java-1.6.0-openjdk-devel" : [],
                                "tomcat7": []
                            }
                        }
                    }
                }
            },

            "Properties" : {
                "InstanceType" : { "Ref" : "InstanceType" },
                "ImageId" : { "Fn::FindInMap" : [ "RegionImageZone", { "Ref" : "AWS::Region" }, "64" ] },
                "SecurityGroups" : [ { "Ref" : "SecurityGroupName" } ],
                "KeyName" : { "Ref" : "KeyName" },
                "Tags" : [
                    { "Key" : "Name", "Value" : "Tomcat7" }
                ]
            }
        }
}

Notice how both the Parameters section (with “Ref”) and the Mappings section (with “Fn::FindInMap”) are used to initialize the instance.
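
As a worked example, suppose the stack is launched in us-east-1 and the user enters m1.large, a key pair named my-keypair and a security group named tomcat-sg (the last two names are hypothetical). Once CloudFormation has resolved the references, the Properties section is effectively equivalent to:

```json
"Properties" : {
    "InstanceType" : "m1.large",
    "ImageId" : "ami-e565ba8c",
    "SecurityGroups" : [ "tomcat-sg" ],
    "KeyName" : "my-keypair",
    "Tags" : [
        { "Key" : "Name", "Value" : "Tomcat7" }
    ]
}
```

The ImageId comes from looking up the region (us-east-1) and then the key “64” in the RegionImageZone mapping shown earlier.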

Outputs

The Outputs section declares the values that the template reports once the stack has been created. This could be the public IP of the created instance.

An example Outputs section:

"Outputs" : {
        "InstanceName" : {
            "Value" : { "Fn::GetAtt" : [ "Tomcat7", "PublicDnsName" ] },
            "Description" : "public DNS name of the new Tomcat7"
        }
}
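
A template can declare several outputs. As a sketch, the following variant reports the public IP address and a URL assembled with Fn::Join. The output names and port 8080 (Tomcat's default HTTP port) are assumptions and are not part of the original template.

```json
"Outputs" : {
    "PublicIp" : {
        "Value" : { "Fn::GetAtt" : [ "Tomcat7", "PublicIp" ] },
        "Description" : "Public IP address of the new Tomcat7 instance"
    },
    "TomcatUrl" : {
        "Value" : { "Fn::Join" : [ "", [ "http://", { "Fn::GetAtt" : [ "Tomcat7", "PublicDnsName" ] }, ":8080" ] ] },
        "Description" : "URL of the Tomcat 7 server, assuming the default port 8080"
    }
}
```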

Outputs are visible in the “Outputs” tab of the CloudFormation web interface:

[Figure: Amazon CloudFormation Template Outputs]

The CloudFormation templates for setting up a MongoDB replica set with an Arbiter

What is a MongoDB replica set?

A MongoDB replica set is a form of asynchronous master/slave replication (primary/secondary in MongoDB’s own terminology) that adds automatic failover and automatic recovery of member nodes. It is strongly recommended never to deploy MongoDB to a single standalone machine, but always to replica sets of 3 machines or a higher odd number. A normal full replica set features 3 nodes that store data. In a replica set with 2 nodes and an arbiter, only the first 2 nodes store data, and the master replicates the data to just the one slave. The arbiter is only there to vote when a new master needs to be elected. Without it, the surviving data node could not obtain a majority of votes when the other data node goes down, and the set would fall back to read-only mode.
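
To illustrate, a replica set configuration document for such a setup could look like the sketch below. The set name rs0 and the host names are placeholders. The arbiterOnly flag marks the third member as an arbiter, which votes in elections but holds no data.

```json
{
    "_id" : "rs0",
    "members" : [
        { "_id" : 0, "host" : "mongo-node-1.example.com:27017" },
        { "_id" : 1, "host" : "mongo-node-2.example.com:27017" },
        { "_id" : 2, "host" : "mongo-arbiter.example.com:27017", "arbiterOnly" : true }
    ]
}
```

A document of this shape is what is passed to rs.initiate() in the mongo shell. With three voting members, two votes form a majority, so the set can still elect a master when any single member is down.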

The 10gen templates vs our templates

10gen, the MongoDB company, already has extensive documentation on how to set up MongoDB on EC2, including documentation on how to set up a replica set on EC2, and even provides CloudFormation templates to set up a replica set.

Our templates describe a more budget-friendly MongoDB replica set, consisting of 2 normal nodes and an arbiter rather than 3 normal nodes. They are heavily based on these 10gen CloudFormation templates.

We did make some changes to these templates that we consider improvements:

  • Instead of 4 EBS volumes in RAID 10 per node, our template initializes 8 EBS volumes in RAID 10 per node. This is what the 10gen documentation recommends.
  • The nodiratime directive was added to all mounts in /etc/fstab. This is also what the 10gen documentation recommends.
  • File descriptor limits were raised in /etc/security/limits.conf (“* hard nofile 100000” and “* soft nofile 100000”). Raising file descriptor limits is also recommended by 10gen for production deployments.
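
As a sketch of how such a setting can be expressed in a template, the files key of AWS::CloudFormation::Init can write the limits to the instance. Note the assumptions: this fragment writes a separate drop-in file under /etc/security/limits.d (the file name is made up) instead of editing /etc/security/limits.conf, it is not copied from our templates, and it only takes effect if cfn-init is run on the instance.

```json
"AWS::CloudFormation::Init" : {
    "config" : {
        "files" : {
            "/etc/security/limits.d/90-mongodb.conf" : {
                "content" : { "Fn::Join" : [ "", [
                    "* soft nofile 100000\n",
                    "* hard nofile 100000\n"
                ] ] },
                "mode" : "000644",
                "owner" : "root",
                "group" : "root"
            }
        }
    }
}
```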

We also made the following changes. These are not necessarily improvements:

  • Security group definitions are not included in our templates, so you need to create these yourself before you can use our templates. Although this requires a bit more EC2 knowledge, you probably don’t want to tie these security groups to only this CloudFormation stack. If you run the template again later, you probably want to put the new instances in the same security group again, not create a new one with the same security rules.
  • Although 10gen recommends not using anything smaller than a large instance (m1.large) for MongoDB production deployments, large instances are expensive to test on. Therefore, we made it possible to select the Micro (t1.micro) and Small (m1.small) instance types for the normal MongoDB nodes as well.
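
For reference, a security group for the replica set members could be described in a separate template along the lines of the sketch below. The resource name, description and CIDR ranges are placeholders to adapt to your own network. Port 27017 is MongoDB's default port and port 22 is SSH.

```json
"Resources" : {
    "MongoSecurityGroup" : {
        "Type" : "AWS::EC2::SecurityGroup",
        "Properties" : {
            "GroupDescription" : "MongoDB replica set members (example)",
            "SecurityGroupIngress" : [
                { "IpProtocol" : "tcp", "FromPort" : "22", "ToPort" : "22", "CidrIp" : "203.0.113.0/24" },
                { "IpProtocol" : "tcp", "FromPort" : "27017", "ToPort" : "27017", "CidrIp" : "10.0.0.0/8" }
            ]
        }
    }
}
```

Keeping the group in its own stack means it outlives any single replica set stack, which matches the reasoning given above.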

The CloudFormation templates for a MongoDB replica set with an Arbiter

IMPORTANT! Remember that running these templates on CloudFormation will create AWS resources, and Amazon will bill you for their usage! These templates create EC2 instances, EBS volumes and some IAM security resources. When t1.micro instances are chosen, it should be possible to test these templates entirely within the AWS Free Tier, but we do not guarantee this.

The templates:

  • ReplicaSetWithArbiter.template: This template will first initialize a slave and an arbiter instance by using the templates below, and then init the master instance.
  • MongoDbServer.template: This template describes the instance that will be configured as the slave of the replica set. When run separately, it creates a standalone MongoDB server.
  • Arbiter.template: This template describes the Arbiter instance.

To set up the replica set, it is enough to provide the URL of the first template to CloudFormation. It will call the other two templates itself.

If you are heading towards a production MongoDB deployment, it is probably best to download and host these templates yourself. In that case, you will need to change the TemplateURL of the Arbiter and the Secondary instance in the ReplicaSetWithArbiter.template file to your own URL.
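
For orientation, a nested template is referenced through a resource of type AWS::CloudFormation::Stack. The sketch below shows the general shape only. The resource name, the bucket URL and the parameters passed down are hypothetical, so check them against the actual ReplicaSetWithArbiter.template before editing.

```json
"ArbiterStack" : {
    "Type" : "AWS::CloudFormation::Stack",
    "Properties" : {
        "TemplateURL" : "https://s3.amazonaws.com/your-bucket/Arbiter.template",
        "Parameters" : {
            "KeyName" : { "Ref" : "KeyName" },
            "SecurityGroupName" : { "Ref" : "SecurityGroupName" }
        }
    }
}
```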

A CloudFormation template for the MongoDB Monitoring Service

The MongoDB Monitoring Service (MMS) enables you to proactively monitor your MongoDB cluster. First, you set up an MMS agent that has access to your MongoDB nodes. This agent then fetches metrics from your MongoDB nodes and sends them to 10gen servers, which show you various graphs about your MongoDB deployment.

First, you will need to sign up for this monitoring service through http://www.10gen.com/mongodb-monitoring-service. After signing up, you can retrieve your API key and Secret key, both of which are necessary to set up the agent.

You can then use the following CloudFormation template to set up the MMS agent on an EC2 Micro instance:

MongoMonitoringServer.template

After setting up this Micro instance, you only need to add the hosts through the MMS web interface, as explained in its documentation. In the case of a replica set, the IP address of at least one MongoDB node is enough; MMS will find the other nodes.