General Issues

Why choose MongoDB?

  • Because MongoDB scales well horizontally. Just remember that horizontal scaling brings communication overhead between elements to keep them coordinated. Besides, an increased number of parts increases the chance of individual element failure, so elements should be redundant. Generally there is a trade-off between functionality and performance, and MongoDB tries to add features up to the point where they would degrade scaling ability. To preserve scalability, MongoDB currently does not support joins (instead it uses embedding, keeping commonly used data together in a single JSON document in the first place; see the sketch after this list) or complex transactions (since distributed transactions need concurrency control that is hard to scale).
  • Because MongoDB enables rapid development of production quality applications.
  • Because MongoDB supports complex data types.
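As a minimal sketch of embedding (collection and field names are hypothetical), data that a relational schema would keep in a separate, joined table lives inside the document itself:

db.bycles.insert({
  _id: "kron-2015",
  brand: "Kron",
  price: 1200,
  reviews: [{user: "joe", score: 9}, {user: "avarel", score: 7}]  // embedded, no join needed
})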

 

Why is JSON used?

JSON is a good way of dealing with structured documents: it is very readable, and it is close to how developers represent object data.

JSON data types are: number, boolean, string, array, object, and null.

BSON is a binary representation of JSON that enables fast scanning and offers extra data types such as ObjectId, Date and BinData.

MongoDB has a dynamic schema, much like dynamically typed languages: the schema is not pre-declared and is not resolved at compile time. This gives agility in application development and flexibility in data representation as requirements evolve over time.
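
For example (a hypothetical collection), documents with different shapes can coexist in one collection, and no schema migration is needed to add a field:

db.bycles.insert({_id: 1, brand: "Kron", price: 1200})
db.bycles.insert({_id: 2, brand: "Giant", gears: 21, tags: ["mtb", "29er"]})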

In order to start an instance, create a database directory and start one mongod instance:

mkdir /data
mongod --dbpath /data --fork --logpath /data/log.a

Help
For database level help

mongod> help
mongod> db.help()

For collection level help

 mongod> db.mycoll.help()

For sharding level help

 mongos> sh.help()

mongoimport

mongoimport enables importing collections from raw files such as JSON, TSV and CSV, and has a pipe-like architecture. In these files, each document should be represented on its own line.

mongoimport --stopOnError --db mydb --collection mycoll < products.json

will import the data into the mycoll collection of the mydb database. The operation will halt at the first error encountered. If a document has no explicit “_id”, one is created for you.
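
A hypothetical products.json for the command above, one document per line; the second document has no "_id", so one will be generated:

{"_id": "kron", "brand": "Kron", "price": 1200}
{"brand": "Giant", "price": 900}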

Queries

 mongod> db.bycles.find().limit(10).toArray()

pulls the whole result set into a JavaScript array at once, instead of iterating 20 at a time, so it is better to put a limit on it.

 mongod> db.bycles.find().limit(10).skip(2)

Remember that the query does not actually run until the cursor is iterated; all of these modifiers are applied server-side.

Query operators: $gt, $gte, $lt, $lte, $or, $in, $type, $not, $nin (not in), $exists
Update operators: $inc, $set, $addToSet

 mongod> db.bycles.find({for: {$exists: true}})

sort() will not filter out documents in which the sort field is not present. If you want to filter those out, add an $exists condition:

 mongod> db.bycles.find({price:{$exists:true}}).sort({price:-1})
 var cursor = db.bycles.find().limit(100); while (cursor.hasNext()) print(cursor.next().x);

Remember that a three-member replica set is the simplest recommended production-ready configuration.

update
An update may be a full document replacement or a partial update of fields. Options are {upsert: true/false} and {multi: true/false}; upsert makes sure that if no matching document exists, one is inserted.

db.bycle.update({"_id": "kron"}, {$inc: {sales:1}}, true)

so that if “kron” has not sold at all yet, a document is created with sales set to 1.

save is a mongo shell helper (not a server operation) for updates. Assume that my_obj is a JSON object with an _id field. Then

 db.bycle.update({_id: my_obj._id}, my_obj)

may be replaced with

 db.bycle.save(my_obj)
 db.bycle.update({_id:100}, {$set: {price: 100}})

to set a key to a value (adding the key if it is absent)

 db.bycle.update({_id:100}, {$push: {review_scores: 77}})

to push a value onto the array review_scores, creating the array if it is not already present.

 db.bycle.update({_id:100}, {$addToSet: {review_scores: 77}})

to add a value to the array review_scores only if that value is not already present.

 db.bycle.remove({_id:100})

to remove from collection.

 db.bycle.remove({})

to remove all documents from collection.
The BSON wire protocol operations cover CRUD: Query, Insert, Update, Remove, GetMore.
Bulk operations may be ordered or unordered:

var operation = db.bycles.initializeOrderedBulkOp(); // or db.bycles.initializeUnorderedBulkOp();
operation.find({item: "abc"}).remove()
operation.find({item: "efg"}).update({$inc:{points:1}})
operation.execute()
db.runCommand({getLastError: 1, w: 1, wtimeout: 10})
db.bycles.ensureIndex()
db.bycles.dropIndex()
db.currentOp()
db.killOp()
db.bycles.stats()

will give information about collection statistics

db.bycles.drop()

will drop the collection, including its catalog metadata and its very existence, which is different from remove({}), which only deletes the documents.
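
A quick sketch of the difference:

db.bycles.remove({})    // documents gone; collection and its indexes remain
db.bycles.getIndexes()  // still lists the indexes
db.bycles.drop()        // collection, indexes, and catalog entry are all gone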

db.serverStatus()

will give detailed information about server status

databases

The local database is used in replication (it holds the oplog) and also keeps the startup log.

storage engine

Storage engines are the interface between the mongodb server and the hardware it runs on. The engine affects how data is written, stored, and read from disk, and it determines the format of indexes and the data file format on disk.

mmap

mongod --storageEngine mmapv1

MMAPv1 uses memory-mapped files: files are mapped into virtual memory, and when the data of interest is not in memory, a page fault occurs; fsync later writes changes back to disk. MMAPv1 performs collection-level locking. The journal stores what you are about to do before you do it, to keep data consistent in the event of a failure. Data in memory is directly mapped, and therefore is in BSON format. MMAPv1 uses power-of-2 allocation, which results in fewer moves of documents that grow at a constant rate, less fragmentation, and avoids moving a document for a small increment. Document moves are bad because they require index updates.

db.dropDatabase()
db.createCollection("foo", {noPadding: true})

in order to disable power-of-2 allocation (for example when we know that our documents are fixed-size and we want to save space).

WiredTiger

mongod --storageEngine wiredTiger

WiredTiger provides compression and document-level locking. The engine choice is per mongod. It stores data in B-trees. WiredTiger compressors are snappy (the default, fast) and zlib (more compression); no compression is also an option.
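
Per-collection compressor choice can be sketched with collection creation options (assuming a mongod running WiredTiger; "archive" is a hypothetical collection):

db.createCollection("archive", {storageEngine: {wiredTiger: {configString: "block_compressor=zlib"}}})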

Indexes

Creating, discovering, and deleting indexes may be performed by:

db.bycles.createIndex({a:1, b:1}, {unique:true})

to create unique indexes

db.bycles.createIndex({a:1, b:1}, {sparse:true})

for creating sparse indexes, which save space in the index by not including documents that lack the indexed fields

db.bycles.createIndex({created_at: 1}, {expireAfterSeconds: 3600})

for creating TTL indexes, whose documents expire and are deleted some amount of time after the value in the indexed field; a TTL index must be on a single date-valued field (created_at here is a hypothetical example)

db.bycles.getIndexes()
db.bycles.dropIndex()
db.bycles.createIndex({loc: "2d"})

2-dimensional Cartesian index

db.bycles.createIndex({loc: "2dsphere"})

2-dimensional spherical geospatial index.
An index scan will be much faster than a table scan / collection scan. Indexes are implemented as B-trees. By default duplicate keys are allowed, which may be disabled with the unique option. Index keys may be of any type, and mixed key types are also possible. The _id index is created automatically. Arrays may also be indexed (multikey indexes), with each element of the array forming an entry. Subdocuments and subfields may be indexed as well; a sketch follows below.

regular expression

 db.bycles.find({name: /in/})

matches documents whose name contains “in”

 db.bycles.createIndex({name: "text"})

creates a special text index on a string field. Each individual word will be indexed separately in a B-tree, much like a multikey index.

 db.bycles.find({$text: {$search: "canondale"}})

to search using the text index
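
To rank matches by relevance, the textScore metadata can be projected and sorted on:

 db.bycles.find({$text: {$search: "canondale"}}, {score: {$meta: "textScore"}}).sort({score: {$meta: "textScore"}})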

 db.bycles.createIndex({price: 1}, {background: true})

to create an index in the background, preserving read/write availability on the primary. On secondaries, indexes are always built in the foreground, blocking operations meanwhile.
Generally, more indexes make reads faster but writes slower. And it is faster to import first and build indexes afterwards, rather than create the index and then bulk-import the data.

Usually read operations, and write operations on the primary, are safe to kill. Killing writes on secondaries will cause sync problems. Compact command jobs should also not be killed. Do not kill internal operations such as migrations.

 db.currentOp() 
 db.killOp() 
 db.setProfilingLevel(1,10)

0 = off, 1 = log slow operations (with the millisecond threshold for “slow” given as the second argument), 2 = log everything

 db.system.profile.find() 

will show the results stored in the profile collection
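
For example, to list the five most recent operations slower than 10 ms (millis and ts are fields of the profile documents):

 db.system.profile.find({millis: {$gt: 10}}).sort({ts: -1}).limit(5).pretty()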

 db.getProfilingStatus()
 db.system.namespaces.find()

to see that the profile log is a small (so it fits in memory), fast-write (no indexes) circular queue, i.e. a capped collection

 db.system.profile.stats() 

to see profile statistics

mongostat --port 27003

a command-line binary resembling iostat. It shows the number of inserts, deletes, queries and commands; flushes (data files are fsynced in the background every 60 seconds); storage engine mapped memory size; page faults and virtual size; database locking; network traffic; and the number of connections.

mongotop --port 27003

shows collection-level read/write durations.

Replication

Replication is keeping redundant copies of data. It is used for high availability, durability (data safety), disaster recovery, and sometimes scalability (reading from secondaries for geographic reasons). Asynchronous replication is used because of possible latency on commodity networks, so there is eventual consistency. MongoDB replication is statement-based: statements are replicated and executed on the secondaries (though they may be converted into more basic statements; one remove may be converted into many basic _id-based removes). Replication is also possible between servers running different storage engines, with or without compression. Different versions of mongod may also run in the members of a replica set, to enable rolling upgrades.
MongoDB drivers are replica-set aware. Replication provides automatic failover and automatic node recovery. In MongoDB, writes go to the primary, but reads may go to secondaries with the ReadPreference option.
We specify a replica set name for our system to provide a namespace. To start a replica set that will wait for initialization (assuming the same host, with a working directory of /Users/db/):

mkdir db1 db2 db3;
mongod --port 27001 --replSet cycle1 --dbpath /Users/db/db1/ --logpath /Users/db/log.1 --logappend --smallfiles --oplogSize 50 --fork
mongod --port 27002 --replSet cycle1 --dbpath /Users/db/db2/ --logpath /Users/db/log.2 --logappend --smallfiles --oplogSize 50 --fork
mongod --port 27003 --replSet cycle1 --dbpath /Users/db/db3/ --logpath /Users/db/log.3 --logappend --smallfiles --oplogSize 50 --fork

then, in order to initiate the set, connect to one of the members:

mongo --port 27001
var cfg = {_id: "cycle1", members: [{_id:0, host:"localhost:27001"},{_id:1, host:"localhost:27002"},{_id:2, host:"localhost:27003"}]};
rs.initiate(cfg);
rs.status();
rs.reconfig();
db.isMaster();
rs.conf();

It is better to use host: ”10gen.local:27001″ instead of localhost. Best practice is not to use IP addresses, or names from /etc/hosts; use DNS and pick an appropriate TTL record (a TTL of a few minutes, 1 to 5 minutes). optimeDate indicates the last write operation of a member in the replica set. lastHeartbeat gives the status of the other members from the point of view of the member running rs.status().
To keep a member from becoming primary for five minutes, we may use:

rs.stepDown(300);
rs.freeze(300);

Replica set information is stored in the local database, which also holds the oplog and system catalog and is not replicated. The query below returns the same content as rs.conf():

use local
db.system.replSet.find().pretty()
rs.slaveOk()

to read from secondaries (i.e. to accept eventually consistent reads). Reasons for reading from a secondary may be geographic (to avoid latency), availability during failover, and workload distribution (pointing an analytics server at a secondary). Read preference options are: primary, primaryPreferred, secondary, secondaryPreferred, and nearest (in terms of network latency). When opening a connection from a driver, we may specify one of these. We may use nearest when we are at a remote site, and secondary for analytics jobs. To spread reads evenly across nodes, we may also consider nearest.
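
A sketch of selecting a read preference via a driver-style connection string (exact option support depends on the driver version):

mongo "mongodb://localhost:27001,localhost:27002,localhost:27003/test?replicaSet=cycle1&readPreference=secondaryPreferred"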

rs.reconfig()

command should be applied on the primary; therefore a majority of members must be up, so that a primary can be elected to reconfigure.
Arbiter nodes hold no data at all. They just vote in elections, in order to break ties.

var cfg = {_id: "cycle1", members: [{_id:0, host:"localhost:27001", arbiterOnly: true},{_id:1, host:"localhost:27002", priority:0},{_id:2, host:"localhost:27003", hidden: true}]};

Zero priority means never eligible to become primary. Hidden members cannot become primary; clients cannot see hidden servers and cannot query them. slaveDelay: 8*3600 makes a member lag 8 hours behind: a delayed, rolling backup that guards against fat-finger problems.
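
A sketch of reconfiguring member 2 as a hidden, eight-hour-delayed member (hidden members must have priority 0):

var cfg = rs.conf();
cfg.members[2].priority = 0;
cfg.members[2].hidden = true;
cfg.members[2].slaveDelay = 8*3600;  // lag eight hours behind the primary
rs.reconfig(cfg);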
If write is propagated to majority, then it is durable.

db.cycles.update({_id: "kron"}, {$set: {comment: "A"}}, {writeConcern: {w: 3}})

If we want a write acknowledgement for a cluster-wide commit, we may use:

db.cycles.insert({"model": "kron"});
db.getLastError("majority", 8000)

There may be several use cases.
For a trivial web page view-count increment with no user impact, or for a log server, we may choose not to get an acknowledgement for the update. However, for anything important we should check for majority acknowledgement; calling getLastError() is the way to be sure about cluster-wide writes and should be the default method. Waiting for “all” would probably be for flow control, maybe when batch-writing a million documents. A write concern of 1 gives basic error checking, for example duplicate key checking. We may also choose to check once every N writes. Remember that we do not need to call getLastError() explicitly under default write concerns.
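
A sketch of matching the write concern to the importance of the write, with hypothetical views and orders collections, assuming a 3.x shell where the CRUD helpers accept a writeConcern option:

// fire-and-forget page-view counter: no acknowledgement requested
db.views.update({page: "home"}, {$inc: {count: 1}}, {writeConcern: {w: 0}})
// important write: wait for a majority of the replica set, with a timeout
db.orders.insert({item: "kron"}, {writeConcern: {w: "majority", wtimeout: 8000}})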
Since MongoDB replication is based on operations instead of bytes, different storage engines may be used within a replica set.

Scalability

We may start mongo with a helper script, and the functions in the script will be available in the shell:

mongo --shell setup_script.sh --port 27107

MongoDB uses range-based sharding on the shard key. Metadata mapping key ranges to shard locations keeps track of where data lives. Being range-based makes range queries somewhat more efficient.

db.cycles.find({"brand": "/^k/"})

will match brands starting with “k”; if the collection is sharded on brand name, this will probably be served by a single shard. Smaller chunk sizes require more migrations, but each shard will be better balanced (the default chunk size is around 64MB). Chunks that grow too large are split, and when the number of chunks becomes unbalanced, chunks are migrated. During migration the chunk is still readable and writable, so it stays live. The balancer tries to balance the number of chunks across shards. Config servers are small mongod processes storing metadata about the shards. They synchronize the same data with a two-phase commit across the config server set. If one config server is down, metadata changes (splits and migrations) are not possible, but other operations are. mongos processes are just routers/load balancers: they store no data, they learn which shard to contact from the config servers, and they merge the returned results if required.
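
The chunk metadata the balancer works with can be inspected from a mongos:

use config
db.chunks.find({ns: "cycles.brands"}, {min: 1, max: 1, shard: 1}).sort({min: 1})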

# start_shard_cluster.sh
# do not forget to make the script executable with chmod +x start_shard_cluster.sh
# create directories for shard mongod instances
mkdir a0 a1 a2 b0 b1 b2 c0 c1 c2 d0 d1 d2;
# create directory for config server metadata
mkdir cf0 cf1 cf2;
# start config servers
mongod --configsvr --dbpath cf0 --port 26050 --fork --logpath log.cf0 --logappend
mongod --configsvr --dbpath cf1 --port 26051 --fork --logpath log.cf1 --logappend
mongod --configsvr --dbpath cf2 --port 26052 --fork --logpath log.cf2 --logappend
# start shard server mongod instances
mongod --shardsvr --replSet a --dbpath a0 --logpath log.a0 --port 27000 --logappend --smallfiles --oplogSize 50
mongod --shardsvr --replSet a --dbpath a1 --logpath log.a1 --port 27001 --logappend --smallfiles --oplogSize 50
mongod --shardsvr --replSet a --dbpath a2 --logpath log.a2 --port 27002 --logappend --smallfiles --oplogSize 50
mongod --shardsvr --replSet b --dbpath b0 --logpath log.b0 --port 27100 --logappend --smallfiles --oplogSize 50
mongod --shardsvr --replSet b --dbpath b1 --logpath log.b1 --port 27101 --logappend --smallfiles --oplogSize 50
mongod --shardsvr --replSet b --dbpath b2 --logpath log.b2 --port 27102 --logappend --smallfiles --oplogSize 50
mongod --shardsvr --replSet c --dbpath c0 --logpath log.c0 --port 27200 --logappend --smallfiles --oplogSize 50
mongod --shardsvr --replSet c --dbpath c1 --logpath log.c1 --port 27201 --logappend --smallfiles --oplogSize 50
mongod --shardsvr --replSet c --dbpath c2 --logpath log.c2 --port 27202 --logappend --smallfiles --oplogSize 50
mongod --shardsvr --replSet d --dbpath d0 --logpath log.d0 --port 27300 --logappend --smallfiles --oplogSize 50
mongod --shardsvr --replSet d --dbpath d1 --logpath log.d1 --port 27301 --logappend --smallfiles --oplogSize 50
mongod --shardsvr --replSet d --dbpath d2 --logpath log.d2 --port 27302 --logappend --smallfiles --oplogSize 50
# start mongos processes
mongos --configdb 10.gen.local:26050,10.gen.local:26051,10.gen.local:26052 --fork --logappend --logpath log.ms0 --port 27017
mongos --configdb 10.gen.local:26050,10.gen.local:26051,10.gen.local:26052 --fork --logappend --logpath log.ms1 --port 26061
mongos --configdb 10.gen.local:26050,10.gen.local:26051,10.gen.local:26052 --fork --logappend --logpath log.ms2 --port 26062
mongos --configdb 10.gen.local:26050,10.gen.local:26051,10.gen.local:26052 --fork --logappend --logpath log.ms3 --port 26063
echo
ps -A | grep mongo
# check very last line of each log to see if something is wrong
echo
tail -n 1 log.cf0
tail -n 1 log.cf1
tail -n 1 log.cf2
tail -n 1 log.a0
tail -n 1 log.a1
tail -n 1 log.a2
tail -n 1 log.b0
tail -n 1 log.b1
tail -n 1 log.b2
tail -n 1 log.c0
tail -n 1 log.c1
tail -n 1 log.c2
tail -n 1 log.d0
tail -n 1 log.d1
tail -n 1 log.d2
tail -n 1 log.ms0
tail -n 1 log.ms1
tail -n 1 log.ms2
tail -n 1 log.ms3

As a best practice, run mongos on 27017, the default mongo access port, and do not use 27017 for config servers or shard mongods, since these typically need not, and should not, be accessed by clients. Then, for each shard, we need to initiate the replica set, and then add the shard to the cluster.

# just connect to one member in a set and add the others
mongo --port 27000
rs.status()
rs.initiate()
rs.add("10.gen.local:27001")
rs.add("10.gen.local:27001")
rs.status();
rs.conf();
# just connect to one member in a set and add the others
mongo --port 27100
rs.status()
rs.initiate()
rs.add("10.gen.local:27101")
rs.add("10.gen.local:27101")
rs.status();
# just connect to one member in a set and add the others
mongo --port 27200
rs.status()
rs.initiate()
rs.add("10.gen.local:27201")
rs.add("10.gen.local:27201")
rs.status();
# just connect to one member in a set and add the others
mongo --port 27300
rs.status()
rs.initiate()
rs.add("10.gen.local:27301")
rs.add("10.gen.local:27301")
rs.status();
# connect to mongos; if the port is omitted it will be 27017
mongo --port 27017
# add shards with setname/host syntax
sh.addShard("a/10.gen.local:27000");
sh.addShard("b/10.gen.local:27100");
sh.addShard("c/10.gen.local:27200");
sh.addShard("d/10.gen.local:27300");
sh.status()

In mongos, we may look at the shards as:

use config
db.shards.find()

Sharding a collection
By default, collections are not sharded. All unsharded collections live on the first shard of the cluster, the database's primary shard.

# enable sharding of database
mongos> sh.enableSharding("cycles")
# shard the collection, giving its full name and a shard key, and saying whether the key is unique
mongos> sh.shardCollection("cycles.brands", {_id:1}, true)

Look at cardinality and granularity when choosing shard keys; if required, choose a compound shard key to increase granularity.
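
For instance (hypothetical cycles.sales collection), if brand alone has too few distinct values, a compound key raises granularity:

mongos> sh.shardCollection("cycles.sales", {brand: 1, _id: 1})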
When we have a bulk initial load, we may want to pre-split the data, because we may be loading into the primary shard faster than automatic migration can move chunks away. This may be done with:

mongos> sh.splitAt("cycle.brands", {"price": 2000})

Some best practices on sharding:

  • Shard if the collection is big; otherwise the extra complexity added will not be justified.
  • Be careful with monotonically increasing shard keys, such as timestamps or BSON ObjectIds, since these direct all inserts to one shard.
  • Consider pre-splitting manually in case we need to use bulk inserts.
  • Shard keys are fixed and cannot be changed later.
  • Adding new shards to a cluster is easy, but it takes some time for chunks to migrate.
  • Use logical names, especially for config servers. Let DNS do the job of resolving IPs.
  • Put mongos on the default port, and keep shard mongod instances away from direct client access.
Security

Security options in MongoDB include:

  • --auth for securing client access; mongos and mongod are run with --auth
  • --keyFile for inter-cluster security, using a shared key
  • Besides, to run MongoDB with encryption over the wire we should compile mongo with the --ssl option. By default, authentication is performed encrypted, but data is transferred in the plain.

     mongod --dbpath newdb --auth 

    will allow connections from localhost, so the first user can be created (the localhost exception). The admin database stores the users and roles of the cluster, system-wide.

    use admin
    db.createUser({"user": "sifa", "pwd": "kismet", "roles": ["userAdminAnyDatabase"]})
    

    Then a connection may be made by specifying the username.

    mongo localhost/admin -u sifa -p kismet
    

    This user will be able to create users, but not read or write data. After logging in, create users eligible to read and write (without administrative permissions such as creating users):

    db.createUser({"user": "joe", "pwd": "dalton", "roles":["readWriteAnyDatabase"]})
    db.createUser({"user": "avarel", "pwd": "dalton", "roles":["readWrite"]})
    

    Notice that avarel only has privileges to read and write the database in which the user was created.
    Some possible roles are listed below; a sketch of per-database grants follows the list.
    Some possible roles are;

  • read
  • readAnyDatabase
  • readWrite
  • readWriteAnyDatabase
  • dbAdmin
  • dbAdminAnyDatabase
  • userAdmin
  • userAdminAnyDatabase
  • clusterAdmin
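
    A sketch of granting per-database roles (user, password, and database names are hypothetical):

    use cycles
    db.createUser({"user": "analyst", "pwd": "secret", "roles": [{role: "read", db: "cycles"}]})
    db.grantRolesToUser("analyst", [{role: "readWrite", db: "reports"}])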
Backups

Backups for an individual server or replica set may be performed by:

  • mongodump --oplog, followed by mongorestore --oplogReplay, will back up and restore a specific database; the oplog options are good for a hot backup.
  • file system snapshot: here we must be sure that journaling is enabled, otherwise the snapshot may be inconsistent.
  •  db.fsyncLock() 

    will flush all data to disk and prevent any further writes, to make taking a file system snapshot easy (afterwards, release the lock with db.fsyncUnlock())

  • backup from a secondary: take the secondary offline, copy the files, and bring it back up again
  • Sharded cluster backup:

  • Turn off the balancer with sh.stopBalancer(), to be sure that there is no metadata movement
  • Backup the config database, via mongodump --db config, or stop one config server and copy its files
  • Backup one member of each shard
  • Start the balancer again with sh.startBalancer()
  • # stop balancer
    mongo --host my_mongos --eval "sh.stopBalancer()"
    # take config dump
    mongodump --host my_mongos_or_configs --db config --out backups/configdb
    # take shard backups
    mongodump --host my_probably_shard_secondary_1 --oplog --out backups/shard1
    # start balancer
    mongo --host my_mongos --eval "sh.startBalancer()"
    

Capped collections
Capped collections are basically circular queues with a pre-allocated maximum size; documents in them cannot be manually deleted or updated.
TTL collections
These hold documents that are automatically deleted after some amount of time, via a special TTL index.
GridFS
For storing blobs larger than the BSON limit of 16MB per document.
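
A sketch with the mongofiles command-line tool (file and database names are hypothetical):

mongofiles --db cycles put video_tour.mp4
mongofiles --db cycles list
mongofiles --db cycles get video_tour.mp4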
