M201 MongoDB Performance study notes day I



This will serve as a small memento of M201 MongoDB Performance.

Lesson highlights for day I

As memory operations are much faster than I/O operations, MongoDB depends heavily on memory, especially for;

  • aggregation
  • index traversing
  • writes (first performed in memory)
  • query engine
  • connections (1 MB per connection)

CPU power will be needed for;

  • storage engine (WiredTiger)
  • concurrency model in use (by default all CPU cores are used)
  • page compression
  • data calculation
  • aggregation framework
  • map reduce

The recommended RAID architecture for MongoDB is RAID 10.

Applications connect to mongos, which connects to the config servers and shards.

Applications should choose wisely;

  • read concern
  • write concern
  • read preference

Lab setup

Here I will provision my mongo instance with Vagrant and Puppet. My sample configuration will be;

vagrantfile (Vagrantfile)

Vagrant.configure("2") do |config|
  # The most common configuration options are documented and commented below.
  # For a complete reference, please see the online documentation at
  # https://docs.vagrantup.com.

  # Every Vagrant development environment requires a box.
  config.vm.box = "debian81"

  # Create a private network, which allows host-only access to the machine
  # using a specific IP.
  config.vm.hostname = "mongodb"
  config.vm.network :private_network, ip: ""

  # Provider-specific configuration so you can fine-tune various
  # backing providers for Vagrant.
  config.vm.provider "virtualbox" do |vb|
    vb.memory = 2048
    vb.cpus = 1
  end

  # Enable provisioning with Puppet. Additional provisioners such as
  # shell, Chef, Ansible, Salt, and Docker are also available.
  config.vm.provision :puppet do |puppet|
    puppet.module_path = "puppet/modules"
    puppet.manifests_path = "puppet/manifests"
    puppet.options = ['--verbose']
  end

  # Use ssh keys of the host.
  config.ssh.private_key_path = ['~/.vagrant.d/insecure_private_key', '~/.ssh/id_rsa', '.vagrant/machines/default/virtualbox/private_key']
  config.ssh.forward_agent = true
end

puppet manifest (puppet/manifests/default.pp)

# set path for executables
Exec { path => [ "/bin/", "/sbin/", "/usr/bin/", "/usr/sbin/" ] }

# list packages that should be installed
$system_packages = ['vim', 'git', 'g++', 'make']

# perform an apt-get update
exec { 'update':
  command => 'apt-get update',
  require => Exec['mongodb_source_add'],
}

# install system packages after an update
package { $system_packages:
  ensure  => "installed",
  require => Exec['update'],
}

# Import the public key used by the package management system
# sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 0C49F3730359A14518585931BC711F9BA15703C6
exec { 'mongodb_key_get':
  command => 'apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 0C49F3730359A14518585931BC711F9BA15703C6',
}

# Create a /etc/apt/sources.list.d/mongodb-enterprise.list file for MongoDB.
# echo "deb http://repo.mongodb.com/apt/debian jessie/mongodb-enterprise/3.4 main" | sudo tee /etc/apt/sources.list.d/mongodb-enterprise.list
exec { 'mongodb_source_add':
  command => 'echo "deb http://repo.mongodb.com/apt/debian jessie/mongodb-enterprise/3.4 main" | tee /etc/apt/sources.list.d/mongodb-enterprise.list',
  require => Exec['mongodb_key_get'],
}

# Install the MongoDB Enterprise packages
package { 'mongodb-enterprise':
  ensure          => "installed",
  install_options => ['-y'],
  require         => Exec['update'],
}

Lab for day I

The lab requires performing a simple query on an imported JSON database. I followed these steps;

get people.json with wget

wget https://university.mongodb.com/static/MongoDB_2017_M201_February/handouts/people.a74d7de502b1.json


start mongod

Start the mongod instance, and check the contents of the log file through tail

sudo service mongod start

Then we may perform queries on the databases as we wish
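For example, the import and a first query might look like the following (the database and collection names are my assumptions, not from the course):

```shell
# import the downloaded file into the 'test' database, 'people' collection
mongoimport --db test --collection people --file people.a74d7de502b1.json

# then query it from the mongo shell
mongo test --eval 'printjson(db.people.findOne())'
```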


Packaging files for a basic bug collector

As a program's user base grows and diversifies, it becomes natural to think about automatic bug collection. The first step toward this automation will be preparing and packaging things for an automated bug collecting system. In this post, I will document the initial steps we went through to trigger generation of a bug report package after recovery from a hard kill (kill -9) or crash.

The most important part of postmortem analysis will be investigating crash dumps. Unfortunately, Windows does not produce crash dumps by default. However, this may easily be enabled by a registry modification. For global configuration, add the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps key to the registry with the corresponding parameters for dump type, dump folder, and dump count. These global values may be overridden by application-level configuration, placed under an application.exe key, as may be seen in the following RegEdit window;


In the figure above, we asked Windows to generate mini crash dumps in the d:\test_folder\multithread\datafiles\cores directory.

Now assume we have a fatal bug such as;


After a crash, a dump file should appear in the corresponding folder.


Our bug report will include the crash dump, log files and program configuration.


It is important for the log file to include information about the actual software that is running. As old-school version information based on major/minor defines is error prone, we may make version control commit information propagate through the build software. An example of embedding git commit information may be seen in a previous post. This makes the code of concern available with a simple checkout, rather than searching for a version change, which has the possibility of being non-unique.

Assuming the program is built for multiple operating systems, zlib will be used for packaging. One drawback of zlib is its lack of native support for multiple files. This will be handled by a wrapper class preparing a tar file from the bug package contents.

What tar does is collect many files into a tarball archive file. Each file is represented by a 512-byte header and a number of 512-byte data chunks containing the original file, with (optionally zero) padding to round it up to a multiple of 512 bytes.



In our case, a wrapper class around the zlib library is prepared, which handles searching for the available files and creating the tarball before compression.
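A sketch of that tar-then-compress layout, using Python's standard tarfile and zlib modules (the real wrapper is a C++ class; the file names and contents here are made up):

```python
import io
import tarfile
import zlib

# hypothetical bug package contents
files = {"crash.dmp": b"\x00" * 1000, "app.log": b"log line\n" * 10}

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name, data in files.items():
        info = tarfile.TarInfo(name)
        info.size = len(data)  # the 512-byte header records the real size
        tar.addfile(info, io.BytesIO(data))  # data is zero-padded to a 512 multiple

archive = buf.getvalue()
assert len(archive) % 512 == 0   # everything lives in 512-byte blocks
package = zlib.compress(archive)  # single-stream compression over the whole tarball
```

This mirrors the design choice above: tar supplies the multi-file container, zlib supplies the compression.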

On the Linux side, the required steps for allowing crash dumps were handled before; they require modification of /etc/security/limits.conf for the core size setting and /etc/sysctl.conf for the core name and path configuration.



OpenStack | DevStack setup



DevStack deployment

As we know the basics of OpenStack, let's experiment with its simplest form, DevStack.

Install a virtualization program if not already present

Here I will install KVM on my Debian host, following the information given in the Debian wiki. I will install qemu-kvm for the kernel module, libvirt-bin for the virtualization daemon, and virtinst for command-line guest creation.


Then add the relevant users to the kvm and libvirt groups


and install virt-manager for GUI control. Adding a guest OS will be trivial using virt-manager.


KVM's GUI virt-manager resembles the somewhat more popular VirtualBox / VMware. If virt-manager fails to start virtual machines due to a default network error like;


Try starting network manually using;

virsh net-start default


Here I created a template from scratch in order to enable custom modifications supporting the bare minimums common to this context. Another option may be downloading a template from one of the already present alternatives. As our template is ready, let's work on Vagrant. Vagrant will be used as a configuration controller/provisioner that will help us do things in a reproducible and programmatic way. For now it may seem like a matrix in a matrix to use OpenStack in Vagrant, but let's continue and see what it will look like in the end. Preparing a Vagrant base box for VirtualBox was handled before, and the same principles may be applied here. After the template is ready, try creating a base box with;


Here Vagrant states its preference for VirtualBox. As we use KVM, this love story seems impossible unless we take the necessary measures. In order to use KVM, we need to install a plugin, vagrant-libvirt, that adds a libvirt provider to Vagrant. First install the dependencies listed below (some of them should already be installed by now);


Then install the vagrant-libvirt plugin. Here I installed the plugin as root, but it is better to install it under the login that will be using Vagrant.


In order to create a base box, we may follow the box format specification. First create a temporary directory, copy the guest image (which will normally be in /var/lib/libvirt/images/), create a JSON document (metadata.json) and a configuration file (Vagrantfile), and finally unite these into a single file using tar, as presented below.


While copying, it is better to also rename the image to box.img. Then create the metadata and configuration files as;



And tar these into a .box file,
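A sketch of all of these steps together (the metadata fields and virtual_size value are my assumptions from the box format specification; box.img here is just a placeholder for the real qcow2 image copied from /var/lib/libvirt/images/):

```shell
# placeholder standing in for the copied and renamed qcow2 guest image
printf 'qcow2-image-placeholder' > box.img

# minimal metadata for the libvirt provider (virtual_size in GB is an assumption)
cat > metadata.json <<'EOF'
{"provider": "libvirt", "format": "qcow2", "virtual_size": 40}
EOF

# minimal Vagrantfile shipped inside the box
cat > Vagrantfile <<'EOF'
Vagrant.configure("2") do |config|
  config.vm.provider :libvirt do |libvirt|
    libvirt.driver = "kvm"
  end
end
EOF

# unite the three files into a single .box archive
tar czvf debian86.box ./metadata.json ./Vagrantfile ./box.img
```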


Now, we may add the base box (called debian86 here) as;


Now, create a directory for the project, and in that directory, initialize a vagrant project with

vagrant init

Here, we will get a blank Vagrantfile for Vagrant configuration. Change the Vagrantfile to enable Puppet for provisioning, and create the necessary files accordingly. Two divergences from the link are;

  • set preferred hypervisor at the start of Vagrantfile, which will be libvirt,

This is optional, and you may also use --provider=libvirt with the corresponding vagrant commands, such as vagrant up / vagrant status.

  • And, if not already done, generate ssh keys with;

or remove the corresponding entry in the ssh key path in the Vagrantfile configuration. It is better to create one if you intend to use your host keys in the guest through key forwarding. Then configure the vagrant files in order to;

  • clone the devstack repository
  • create a basic configuration
  • create an appropriate user
  • and install openstack

Putting it all together, my sample configuration for devstack will be;

Vagrantfile (./Vagrantfile)


# Vagrantfile API/syntax version. Don't touch unless you know what you're doing!
VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|

  # Template box
  config.vm.box = "debian86"

  # Guest hostname and ip
  config.vm.hostname = "devstack"
  config.vm.network :private_network, ip: ""

  # Guest hardware, 2048MB ram and 1 cpu
  config.vm.provider "libvirt" do |lv|
    lv.memory = 2048
    lv.cpus = 1
  end

  # Configure shared folders
  config.nfs.functional = false
  config.vm.synced_folder ".", "/vagrant", disabled: true
  config.vm.synced_folder "synced_folder", "/synced_folder", type: "rsync", create: true

  # Set puppet as provisioner and configure puppet modules path
  config.vm.provision :puppet do |puppet|
    puppet.module_path = "puppet/modules"
    puppet.manifests_path = "puppet/manifests"
    puppet.options = ['--verbose']
  end

  # Use ssh keys of host
  config.ssh.forward_agent = true
  config.ssh.private_key_path = ['~/.vagrant.d/insecure_private_key', '~/.ssh/id_rsa']
end


Puppet manifest (./puppet/manifests/default.pp)

# set path for executables
Exec { path => [ "/bin/", "/sbin/", "/usr/bin/", "/usr/sbin/" ] }

# list packages that should be installed
$system_packages = [ 'vim', 'g++', 'make', 'git', 'python', 'python-pip']

# perform an update
exec { 'update':
  command => 'apt-get update',
}

# install system packages after an update
package { $system_packages:
  ensure  => "installed",
  require => Exec['update'],
}

# create a user "stack"
user { 'stack':
  ensure     => "present",
  home       => "/home/stack",
  managehome => true,
  notify     => Exec['update_sudoers'],
}

# add user 'stack' in sudoers list
exec { 'update_sudoers':
  command     => "/bin/echo \"stack ALL=(ALL) NOPASSWD: ALL\" >> /etc/sudoers",
  refreshonly => true,
  require     => User['stack'],
}

# clone devstack repository
exec { 'clone_repository':
  creates  => '/home/stack/devstack',
  cwd      => "/home/stack",
  command  => "git clone -v https://git.openstack.org/openstack-dev/devstack",
  user     => 'stack',
  provider => shell,
  require  => [Exec['update_sudoers'], Package['git']],
}

# copy template configuration file,
# remember that "puppet:///modules/localconf/local.conf" will
# match to %vagrant_root%/modules/localconf/files/local.conf
file { 'local_conf':
  path    => '/home/stack/devstack/local.conf',
  ensure  => file,
  source  => "puppet:///modules/localconf/local.conf",
  owner   => 'stack',
  group   => 'stack',
  mode    => '0744',
  require => Exec['clone_repository'],
  notify  => Exec['install_devstack'],
}

# install devstack
exec { 'install_devstack':
  cwd         => "/home/stack/devstack",
  command     => "./stack.sh",
  user        => 'stack',
  provider    => shell,
  refreshonly => true,
  require     => File['local_conf'],
}

Puppet module (./puppet/modules/localconf/files/local.conf)


Then start the engine;

vagrant up

Here, for my Debian 8.6 host and vagrant-libvirt 0.0.36, I encountered a DHCP lease problem that made my guests unable to obtain an IP;


There is an issue about this in the git repository of fog-libvirt. While waiting for a permanent fix,

as a solution


may be modified as suggested;


Also beware that there is a bug causing halted guests with a GUI to end up in a suspended state. A temporary solution may be using the guest's own controls through the virt-manager console while waiting for a fix.

After ./stack.sh is executed by the Puppet manifest, our guest should have Keystone, Glance, Nova, Cinder, Neutron and Horizon installed.


We may access the OpenStack CLI by;


The next step will be getting familiar with DevStack.

OpenStack | basics



OpenStack is a cloud computing infrastructure used for managing cloud computing resources. I am actually new to the concept and am trying to learn by following an excellent resource from edX. As I proceed, I want to perform practical experiments as much as possible and document the resulting takeaways in order to help my future self and anybody else interested. Being a cloud infrastructure, OpenStack depends on virtualization, so it will be better to start with these concepts;

Virtualization, Containers and Cloud Computing

Virtualization manages and abstracts hardware resources between operating systems, much like operating systems perform a similar task for processes. Cloud computing uses shared resources on an on-demand basis. It operates on top of virtualization and container computing to eliminate on-premise hardware and thereby provide scalability and elasticity. The ultimate aim of cloud computing is to offer an on-demand, pay-as-you-go computing service, much like today's electrical infrastructure. As compute power becomes similar to electrical power, you just need to connect, much like plugging in a cable to use electricity. The details of implementation, maintenance and distribution will be handled by professionals and are usually not the concern of the end user. Having said this, today's cloud computing, being far away from this idealization, is offered in three broad alternatives;

  • Software as a Service (SaaS), where the provider offers access to a specific application, much like Office 365. Usually end users interact with a SaaS cloud.
  • Platform as a Service (PaaS), where the provider offers a suite of applications as a bundle of hardware, storage, operating system and middleware. Build platforms may be thought of as an example.
  • Infrastructure as a Service (IaaS), where the provider offers infrastructure to host virtual machines. OpenStack, Microsoft Azure, VMware vCloud Air and Amazon Web Services are examples of IaaS.


Besides, cloud computing enables easy access to IT basics through self-deployment, eliminating the need for an IT administrator to deploy a machine for you.

Virtualization may be of;

  • Hardware virtualization, software abstraction of hardware.
  • Storage virtualization, Software Defined Storage (SDS), abstraction of the actual disks and the computers accessing these disks.
  • Network virtualization, Software Defined Networking (SDN), abstraction of the physical network infrastructure to provide logical network infrastructures.

Virtualization provides efficient use of physical resources and power. In hypervisor-based virtualization, virtual machines run on a small optimized kernel. KVM, Xen and VMware ESXi are known alternatives of this kind. In host-based virtualization, the virtualization software runs on a host operating system. VMware Player and VirtualBox are examples.

Containers are lightweight compared to virtual machines, eliminating the need for each virtual machine to have its own kernel. They depend on the idea of sharing the same kernel between users to form containers. Multiple operating system instances use the same kernel, so it may be taken as virtualization at the operating system level. A container image contains applications, user libraries and dependencies, whereas kernel space components are provided by the host operating system. Every container has namespaces (global system resources), cgroups (used to reserve and allocate isolated resources), and a union file system. Containers are small compared to virtual machines, and many of them can be run on top of a single kernel. Besides, a user in one container is not able to access resources in another, so it is fairly secure. Running multiple copies of a single application is a perfect use case for containers. However, the isolation of containers is weaker than that of virtual machines, and if the kernel goes down, all containers go down with it.

Now we have resources, how can we manage them?

Now we have hypervisors, and we have somewhat quantized compute resources. We will definitely want to control and interconnect these to provide on-demand scalable compute power, and we will also want clever ways of storing input/output data. One solution for these is the OpenStack platform.


OpenStack is a bundle of infrastructure services, the core ones being: Nova for compute, which is an interface to hypervisors; Swift for object storage, which performs distributed and replicated binary storage; Neutron for networking, which brings software defined networking to the cloud; Cinder for block storage, which enables persistent storage for virtual machines; Keystone for identity, which administers users, roles, tenants and services; and Glance for images, which eliminates installing and enables deploying images.


Nova is an interface to the hypervisor, which spawns, schedules and decommissions machines on demand. It is responsible for managing the compute instance life cycle. Nova has a distributed architecture, with Nova agents running on the hypervisors and the Nova service process running on the cloud controller.


Neutron allows software defined networking, enabling inter-instance networking between deployed images. It provides logical networks on top of the physical architecture.



Swift proposes a distributed, replicated and scalable solution for binary object storage.


Swift provides a REST API for applications and distributes requests to multiple physical devices for replication, reliability, scalability and performance.


As instance storage is ephemeral (like a live-CD boot image), changes are not persistent. Cinder provides persistent storage to instances. It may use Swift or Ceph as backend object storage.


Keystone provides a central repository for authentication and authorization. Services and endpoints are introduced to Keystone. Besides, users and roles are created and assigned to projects (known as tenants), by default kept in MariaDB.


Glance is used to store virtual machine disk images, which are then instantiated on demand. They may either be downloaded from repositories or custom created to represent the requirements of an organization. Glance may use Swift or Ceph as object storage for scalability, or just use local storage for simple/small environments.


Horizon is a user friendly web interface dashboard for easy management of instances.


Ceilometer is used for metering and billing.


Heat is used for deploying stacks of instances.


Magnum is used as a container manager for OpenStack.


Congress is used as a policy enforcer in OpenStack.


The OpenStack shared file system service is provided by Manila.

Other important services concern time synchronization, the message queue, and the database for storing cloud-related information. Manual deployment of OpenStack requires these services to be set up manually.

OpenStack components are accessed through RESTful APIs to enable uniform access.

To sum up, the basic OpenStack nodes will be;

The controller node will typically perform centralized controller functionality. It may be a single node or a cluster with redundancy and high availability. The network controller node will provide network services to the cloud. There will be compute nodes that host Nova agents, and there will be storage nodes containing Swift or Ceph. As a bundle, DevStack contains all of these for a development and testing environment (it is not intended to be used in production).

OpenStack can be deployed by;

  • Manual deployment
  • Scripted deployment with PackStack and DevStack
  • Large scale automatic deployment with TripleO and Director

As a starting point, I will take the easy path and deploy a DevStack instance. I will have a DevStack guest controlled by Vagrant. It seems like a matrix in a matrix, but as Vagrant provides a controlled, reproducible development environment, it will make my life easier in the long term and is worth this a priori effort. This will be covered in OpenStack | DevStack setup.

A personal time regulation institute



Last week I found myself dealing with time again and again. I was embarrassed to attend a meeting one hour late, as it turned out I was not clever enough to decipher my Chrome calendar timezone. Then, I dealt with bugs related to recent timezone changes.

As I spare time from new features, I try to refactor old-school code. Modularity by abstraction is the sustainable way of implementing complex structures, and converting platform-dependent code to common abstractions seems a good way of improvement.

However, sometimes this turns out to be harder done than said. As we will see, localtime() has a tendency to not reflect timezone changes on Windows. We test the use of time(), localtime() and GetLocalTime() using the following sample code

#include <iostream>
#include <ctime>
#include <chrono>
#include <thread> // for sleep_for
#include <windows.h>
#include <stdio.h>

unsigned int k = 0;

int main()
{
    while (true)
    {
        std::cout << "-----------------------" << k++ << "--------------------" << std::endl;

        /***** Get local time from time() & localtime() *****/
        std::time_t call_start_sec_tt = std::time(nullptr);
        tm* o_ct_ptr = localtime(&call_start_sec_tt);
        tm o_ct = *o_ct_ptr;
        const int time_part_size = 24; //4+1+2+1+2+1+2+1+2+1+2+1+3+1
        char old_time_part[time_part_size];
        // another way to display may be use of ctime(&call_start_sec_tt);
        snprintf(old_time_part, time_part_size, "%d %.2d %.2d %.2d %.2d %.2d", ((o_ct.tm_year)+1900),
                 ((o_ct.tm_mon) + 1), (o_ct.tm_mday), (o_ct.tm_hour), (o_ct.tm_min), (o_ct.tm_sec));
        std::cout << "Local time from \"time()\": " << old_time_part << std::endl;

        /***** Get local time from std::chrono::system_clock::now() *****/
        std::chrono::system_clock::time_point today = std::chrono::system_clock::now();
        std::time_t tt = std::chrono::system_clock::to_time_t(today);
        tm* n_ct_ptr = localtime(&tt);
        tm n_ct = *n_ct_ptr;
        char new_time_part[time_part_size];
        snprintf(new_time_part, time_part_size, "%d %.2d %.2d %.2d %.2d %.2d", ((n_ct.tm_year)+1900),
                 ((n_ct.tm_mon) + 1), (n_ct.tm_mday), (n_ct.tm_hour), (n_ct.tm_min), (n_ct.tm_sec));
        std::cout << "Local time from \"system_clock::now()\": " << new_time_part << std::endl;

        /***** Get local time from Windows GetLocalTime() *****/
        SYSTEMTIME lt;
        GetLocalTime(&lt);
        char win_time_part[time_part_size];
        snprintf(win_time_part, time_part_size, "%d %.2d %.2d %.2d %.2d %.2d",
                 lt.wYear, lt.wMonth, lt.wDay, lt.wHour, lt.wMinute, lt.wSecond);
        std::cout << "Local time from \"GetLocalTime()\": " << win_time_part << std::endl;

        // modern but dangerous way to make the thread sleep for 1 second in windows
        // std::this_thread::sleep_for(std::chrono::seconds(1));
        // oldskool but safe way to make the thread sleep for 1 second
        Sleep(1000);
    }
    return 0;
}

When we modify the time, changes are immediately available to the process;


However, timezone changes seem to be ineffective for ongoing processes when using localtime(). So when we start a new process we see the change, but ongoing processes reflect the old version. The left console shows the output of a process started before the change, and the right console shows the output of one started after it. Here the Windows-specific GetLocalTime() seems to be the solution.


And as a final surprise, beware of

std::this_thread::sleep_for (std::chrono::seconds(1))

as your intended sleep duration may change tremendously with a changing system time; this seems undocumented and may take time to track down.


To sum up, I now see the importance of time more clearly and agree with Mr. Tanpinar on the necessity of establishing a Time Regulation Institute.

Displaying command line arguments of windows processes



In Windows, in order to see the command line arguments of processes, we have to make the corresponding column displayed. From the task manager Details tab,


Right click any column name and choose Select columns in the popup window, which will list the column options. Here be sure to check the Command line option.


The result will be command line information included in the task manager.


An alternative will be using wmic as;


Adding Python2 and Python3 kernels at Jupyter



Being live documents, notebooks enable reproducible research, and Jupyter is one of the pioneers. Assume we want to use both recent and legacy versions of Python as kernels, and we have Python2, Python3 and Jupyter already installed. An initial instance of Jupyter notebook will provide only one (probably the latest) kernel, as can be seen below.

jupyter kernelspace - initial kernelspace

jupyter kernelspace - initial notebook

In order to see both Python2 and Python3 kernels, we should install and then introduce the desired kernel using

python2 -m pip install --upgrade ipykernel
python2 -m ipykernel install

In the figure below, it can be seen that the Python2 kernel is already installed but yet to be introduced. If instead we see Python2 listed and Python3 missing, we need to replace 2 with 3 in the corresponding commands.

jupyter kernelspace - installing python2 ipykernel

Result will be listing in kernelspace and choice option at web interface as follows;

jupyter kernelspace - final kernelspace

jupyter kernelspace - final notebook

memory dump in windows



In Linux, there are many alternatives for in vivo profiling, whether debug symbols are included or not. For a small survey, see my previous post. On the Windows side, the best free and useful alternative I have been able to find is memory dumps, used to get process snapshots.

Assume that we have a process and we want to investigate what it is dealing with. The reason may be malfunction, profiling, etc. From task manager, select the process and request a memory dump.

memory dump - create a memory dump

In order to get useful information from this dump, we need public and hopefully private symbols, which should be in the program database, usually kept in a pdb file. In order to have debug information stored in the program database, we may use the /Zi option when building.

memory dump - compiling to generate symbols
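That compile step might look like the following (the source and output file names are illustrative; /Fe simply names the output executable):

```shell
:: /Zi writes full debug information into a program database (.pdb) next to the binary
cl /Zi /Fe:multithread.exe main.cpp
```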

Now we need a platform to investigate our memory dump using the symbol information we have. Get WinDbg from Microsoft (the Windows Driver Kit will do the job).

Run WinDbg, choose the dump file, set the paths for Windows and application symbols (if not already done), and discard the wow64 stuff if you are investigating a 32-bit dump, as;

memory dump - windbg

Now we may investigate where the threads are lingering, using commands like kb (call stack)

memory dump - kb

Thanks to Mr. Turgu for the initial idea of memory dumps. As he has chosen to join the dark side, may God let his soul rest in peace.

My small study memento for MongoDB certification exam


General Issues

Why to choose MongoDB?

  • Because MongoDB scales well horizontally. Just remember that horizontal scaling means communication overhead between elements to make them run in coordination. Besides, an increased number of parts increases the chance of failure of individual elements, so these should be redundant. Generally there is a trade-off between functionality and performance, and MongoDB tries to add features up to a point without degrading scaling ability. To increase scalability, MongoDB currently does not support joins (but does use embedding, keeping data that is generally used together in the same place as a JSON document) or complex transactions (since distributed transactions need concurrency control that is hard to scale).
  • Because MongoDB enables rapid development of production quality applications.
  • Because MongoDB supports complex data types.


Why JSON is used?

JSON is a good way of dealing with structured documents: it is very readable, and it is also closer to how developers represent the data of objects.

JSON data types are: numbers, boolean, string, array, object, null
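As a quick illustration of those types in a round trip through Python's standard json module (the document contents are made up):

```python
import json

doc = json.loads(
    '{"n": 1.5, "flag": true, "name": "kron",'
    ' "tags": ["a", "b"], "nested": {"x": 1}, "missing": null}'
)

assert isinstance(doc["n"], float)   # number
assert doc["flag"] is True           # boolean
assert doc["name"] == "kron"         # string
assert doc["tags"] == ["a", "b"]     # array
assert doc["nested"] == {"x": 1}     # object
assert doc["missing"] is None        # null
```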

BSON is a binary representation of JSON that enables fast scanning and offers extra data types such as ObjectId, Date and BinData.

MongoDB has a dynamic schema, much like dynamically typed languages: it is not pre-declared and is resolved at runtime. This gives agility in application development and flexibility in data representation as requirements evolve over time.

In order to start an instance, create the database directory and fire up one mongod instance

mkdir /data
mongod --dbpath /data --fork --logpath /data/log.a

For database level help

mongod> help
mongod> db.help()

For collection level help

 mongod> db.mycoll.help()

For sharding level help

 mongos> sh.help()


mongoimport enables importing collections from raw files such as JSON, TSV and CSV, and has a pipe-like architecture. In the files, each document should be represented on its own line.

mongoimport --stopOnError --db mydb --collection mycoll < products.json

will read data into the mycoll collection of the mydb database. The operation will halt at the first error encountered. If there is no explicit "_id" in a document, one is created for you.


 mongod> db.bycles.find().limit(10).toArray()

gets the whole query result into a JavaScript array, without iterating 20 by 20, so it is better to put a limit.


remember that the query does not get run until all of these are applied on the server side.

Queries $gt, $gte, $lt, $lte, $or, $in, $type, $not, $nin (not in), $exists
Updates $inc, $set, $addToSet

 mongod> db.bycles.find({for: {$exists: true}})

sort will not filter out the entities in which the sort field is not present. If you want to filter out those, filter by $exists as;

 mongod> db.bycles.find({price:{$exists:true}}).sort({price:-1})
 var cursor = db.bycles.find().limit(100); while (cursor.hasNext()) print(cursor.next().x);

Remember that a three-member replica set is the simplest production-ready configuration that is recommended.

An update may be a full document update or a partial update with the fields {upsert:true/false} and {multi:true/false}; upsert makes sure that if the document to update does not exist, one is placed.

db.bycle.update({"_id": "kron"}, {$inc: {sales:1}}, true)

which means: if it has not been sold at all yet, set sales to 1 (the third argument enables upsert).

save is a mongo shell operation (not server-side) for updates. Assume that my_obj is a JSON object. Then;

 db.bycle.update({_id: my_obj._id}, my_obj)

may be replaced with

 db.bycle.save(my_obj)

Partial updates use operators, e.g.

 db.bycle.update({_id:100}, {$set: {price: 100}})

to add a key value,

 db.bycle.update({_id:100}, {$push: {review_scores: 77}})

to push a value into the array review_scores, creating the array if not already present.

 db.bycle.update({_id:100}, {$addToSet: {review_scores: 77}})

to push a value into the array review_scores only if it is not already present.


to remove a document from a collection.


to remove all documents from a collection.
The BSON wire protocol covers the CRUD operations: Query, Insert, Update, Remove, GetMore.
Bulk operations may be ordered or unordered

var operation = db.bycles.initializeOrderedBulkOp(); // or db.bycles.initializeUnorderedBulkOp()
operation.find({item: "abc"}).remove();
operation.find({item: "efg"}).update({$inc: {points: 1}});
operation.execute();
db.runCommand({getLastError: 1, wtimeout: 10})

 db.bycles.stats()

will give information about collection statistics


 db.bycles.drop()

will remove the collection itself, including its catalog data, which is different from remove({})


 db.serverStatus()

will give detailed information about server status


The local database is used in replication and also keeps the startup log.

Storage engines

Storage engines are the interface between the mongodb server and the hardware it runs on. The engine affects how data is written, stored and read from disk, and it determines the format of indexes and of the data files on disk.


mongod --storageEngine mmapv1

a memory-mapped engine: it maps files into virtual memory, and when the data of interest is not in memory, a page fault occurs; fsync is performed to write changes back. MMAPv1 performs collection level locking. The journal stores what you are about to do before doing it, to keep data consistent in the event of a failure. Data in memory is directly mapped, and therefore is in BSON format. MMAPv1 uses power of 2 allocation, which results in fewer moves for documents growing at a constant rate, less fragmentation, and prevents moving a document for a small increment. Document moves are bad because they require index updates.

db.createCollection("foo", {noPadding: true})

In order to disable power of 2 allocation (maybe we know that our documents are fixed-size, and we want to save space).
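A sketch of the power of 2 allocation idea; the minimum bucket size here is an assumption for illustration, not the engine's actual constant:

```javascript
// Power-of-2 allocation rounds each record up to the next power-of-two
// size, so a steadily growing document rarely needs to move on disk.
function allocSize(docBytes) {
  let size = 32;                    // assumed minimum bucket, for illustration
  while (size < docBytes) size *= 2;
  return size;
}

// A document growing from 100 to 1000 bytes only moves when it crosses
// a bucket boundary (128 -> 256 -> 512 -> 1024), not on every growth.
let moves = 0;
let bucket = allocSize(100);
for (let bytes = 100; bytes <= 1000; bytes += 50) {
  if (allocSize(bytes) > bucket) {
    bucket = allocSize(bytes);
    moves++;
  }
}
console.log(moves); // 3 moves over the whole growth, instead of one per increment
```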


mongod --storageEngine wiredTiger

WiredTiger provides compression and document level locking. It is configured per mongod. It stores data in B-trees. WiredTiger compression options are snappy (default, fast) and zlib (more compression); no compression is also a choice.


Creating, discovering and deleting indexes may be performed with db.bycles.createIndex(), db.bycles.getIndexes() and db.bycles.dropIndex(),

db.bycles.createIndex({a:1, b:1}, {unique:true})

to create unique indexes

db.bycles.createIndex({a:1, b:1}, {sparse:true})

for creating sparse indexes, which save index space by not indexing documents that lack the indexed fields

db.bycles.createIndex({created_at: 1}, {expireAfterSeconds: 3600})

for creating TTL indexes, so that documents expire some amount of time after the value in the indexed date field (note that a TTL index must be on a single date field, not a compound key)

db.bycles.createIndex({loc: "2d"})

2 dimensional Cartesian index

db.bycles.createIndex({loc: "2dsphere"})

2 dimensional sphere geospatial index
An index scan will be much faster than a table scan / collection scan. Indexes are implemented as B-trees. By default duplicate keys are allowed, which may be disabled with the unique option. Index keys may be of any type, and mixed types are possible. The _id index is created automatically. Arrays may also be indexed (multikey indexes), with each element of the array forming an entry. Subdocuments and subfields may be indexed as well.
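Conceptually, a multikey index creates one B-tree entry per array element; a toy sketch (not the server's actual structure):

```javascript
// Sketch: a multikey index on an array field gets one entry per element,
// each pointing back at the owning document's _id.
function multikeyEntries(docs, field) {
  const entries = [];
  for (const doc of docs) {
    const value = doc[field];
    const keys = Array.isArray(value) ? value : [value];
    for (const key of keys) entries.push({ key, id: doc._id });
  }
  // keep entries in key order, as a B-tree would
  return entries.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

const docs = [
  { _id: 1, tags: ["road", "carbon"] },
  { _id: 2, tags: ["mtb"] },
];
const index = multikeyEntries(docs, "tags");
// three entries: carbon -> 1, mtb -> 2, road -> 1
console.log(index.map(e => e.key)); // [ 'carbon', 'mtb', 'road' ]
```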

regular expression

 db.bycles.find({name: /in/})

matches documents whose name contains "in"

 db.bycles.createIndex({name: "text"})

creates a special text index on a string field. Each individual word will be indexed separately in a B-tree, much like a multikey index
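The idea can be sketched as follows (the tokenizer below is an assumption for illustration; the real text index also handles stemming and stop words):

```javascript
// Sketch: a text index tokenizes the string and indexes each word
// separately, much like a multikey index over the word array.
function textIndexEntries(doc, field) {
  const words = doc[field].toLowerCase().split(/\W+/).filter(Boolean);
  return [...new Set(words)].map(word => ({ key: word, id: doc._id }));
}

const entries = textIndexEntries({ _id: 1, name: "Kron road bike" }, "name");
console.log(entries.map(e => e.key)); // [ 'kron', 'road', 'bike' ]
```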

 db.bycles.find({$text: {$search: "canondale"}})

to search in text index

 db.bycles.createIndex({price:1}}, {background:true})

to create index at background for read write availability in primary. In secondaries, indexes are allways created in foreground blocking operations meanwhile.
Generally, more indexes make reads faster but writes slower. It is also faster to import data first and then build indexes, rather than create the indexes and then bulk import.

Usually, read operations and write operations on the primary are safe to kill. Killing writes on secondaries will cause sync problems. A compact command job should also not be killed. Do not kill internal operations such as migrations.


db.setProfilingLevel(1, 100)

sets the profiler level: 0 = off, 1 = slow operations only (with a ms threshold for "slow", 100 here), 2 = all operations


db.system.profile.find()

will give the results stored in the system.profile collection


db.system.profile.stats()

shows that the profile log is a small (to fit in memory), fast-write (no indexes) circular queue, i.e. a capped collection


db.getProfilingStatus()

to see the current profiling settings

mongostat --port 27003

a command line binary that resembles iostat. It shows the number of inserts, deletes, queries and commands, flushes (data files are fsynced in the background every 60 sec), storage engine mapped memory size, page faults, virtual size, database locking, network traffic, and the number of connections.

mongotop --port 27003

shows collection level read and write durations.


Replication means keeping redundant copies of data, used for high availability, durability (data safety), disaster recovery, and sometimes scalability (reading from secondaries, e.g. for geographic distribution). Asynchronous replication is used because of possible latency on commodity networks, and therefore there is eventual consistency. MongoDB replication is statement based: statements are replicated and executed on the secondaries (though they may be converted into more basic statements; one multi-document remove is converted into individual _id based removes). Replication is also possible between servers running different storage engines, with or without compression. Different versions of mongod may also run in the members of a replica set, to enable rolling upgrades.
MongoDB drivers are replica set aware. Replication provides automatic failover and automatic node recovery. In Mongo, writes go to the primary, but reads may go to secondaries with the read preference option.
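A toy sketch of statement based replication (document shapes and the oplog entry format here are assumptions for illustration):

```javascript
// Sketch: the primary's single multi-document remove is replicated as
// individual _id-based removes, which keeps replay deterministic.
const primary = [
  { _id: 1, sold: true },
  { _id: 2, sold: false },
  { _id: 3, sold: true },
];

// oplog entries produced by one remove({sold: true}) on the primary
const oplog = primary.filter(d => d.sold).map(d => ({ op: "remove", _id: d._id }));

function apply(data, entry) {
  if (entry.op === "remove") return data.filter(d => d._id !== entry._id);
  return data;
}

let secondary = [...primary];           // secondary starts in sync
for (const entry of oplog) secondary = apply(secondary, entry);
console.log(secondary.map(d => d._id)); // [ 2 ]
```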
We specify a replica set name for our system to provide a namespace. To start a replica set that will wait for initialization (assuming all members on the same host, with working directory /Users/db/);

mkdir db1 db2 db3;
mongod --port 27001 --replSet cycle1 --dbpath /Users/db/db1/ --logpath /Users/db/log.1 --logappend --smallfiles --oplogSize 50 --fork
mongod --port 27002 --replSet cycle1 --dbpath /Users/db/db2/ --logpath /Users/db/log.2 --logappend --smallfiles --oplogSize 50 --fork
mongod --port 27003 --replSet cycle1 --dbpath /Users/db/db3/ --logpath /Users/db/log.3 --logappend --smallfiles --oplogSize 50 --fork

then, in order to initiate the replica set, connect to one of the members;

mongo --port 27001
var cfg = {_id: "cycle1", members: [{_id: 0, host: "localhost:27001"}, {_id: 1, host: "localhost:27002"}, {_id: 2, host: "localhost:27003"}]};
rs.initiate(cfg);

It is better to use host: "10gen.local:27001" instead of localhost. Best practice is to use neither IP addresses nor names from /etc/hosts; use DNS, and pick an appropriate TTL record (a TTL of a few minutes, 1 to 5). optimeDate indicates the last write operation applied on a member of the replica set. lastHeartbeat gives the status of the other members from the point of view of the member where rs.status() is run.
To disable a member from becoming primary for five minutes, we may use;

rs.freeze(300)
Replica set information is stored in the local database, which also holds the oplog and system catalog, and is not replicated. The configuration document stored there is the same as rs.conf().

use local

rs.slaveOk()

to read from secondaries (accepting eventually consistent reads). Reasons for reading from a secondary may be geographic (avoiding latency), availability during failover, and workload distribution (pointing an analytics server at a secondary). Read preference options are: primary, primaryPreferred, secondary, secondaryPreferred and nearest (in terms of network latency). When opening a connection from a driver, we may specify one of these. We may use nearest when we are in a remote region, and secondary for analytics jobs; to spread reads evenly across nodes, nearest is also a candidate.


rs.reconfig(cfg)

must be applied on the primary, therefore a majority of members must be up so that a primary can be elected to accept the reconfiguration.
Arbiter nodes hold no data at all; they only vote in elections, in order to break ties.

var cfg = {_id: "cycle1", members: [{_id:0, host:"localhost:27001", arbiterOnly: true},{_id:1, host:"localhost:27002", priority:0},{_id:2, host:"localhost:27003", hidden: true}]};

Zero priority means never eligible to become primary. Hidden members can not become primary, clients can not see them, and can not query them. slaveDelay: 8*3600 makes a member lag 8 hours: a delayed, rolling backup to guard against fat finger problems.
If write is propagated to majority, then it is durable.

db.cycles.update( { _id : "kron" }, { $set : { comment : "A" } }, { writeConcern: { w : 3 } } )

if we want an write acknowledgement for cluster wide commit, we may use,

db.cycles.insert({"model": "kron"});
db.getLastError("majority", 8000)

There may be several use cases.
For a trivial web page view counter with no user impact, or for a log server, we may choose not to get an acknowledgement for the update. However, for anything important we should check for majority acknowledgement; calling getLastError() is the way to be sure about cluster wide writes, and should be the default method. Waiting for "all" would probably be for flow control, maybe when batch writing a million documents. A write concern of 1 gives basic error checking, maybe for duplicate key detection. We may also choose to call it once every N writes. Remember that we do not need to call getLastError() explicitly with default write concerns.
Since MongoDB replication is based on operations instead of bytes, different storage engines may be used in replication sets.


We may connect mongo with a helper script, and the functions in the script will then be available in the shell

mongo --shell setup_script.sh --port 27107

Mongo uses range based sharding on the shard key. Metadata mapping key ranges to shard locations keeps track of where the data lives. Being range based makes range queries somewhat more efficient.

db.cycles.find({brand: /^k/})

will find brands starting with k; if the collection is sharded on brand name, this may be routed to a single shard. Smaller chunk sizes require more migrations, but each shard will be better balanced (the default chunk size is around 64MB). Chunks that grow too large are split, and when the number of chunks becomes unbalanced, chunks are migrated. During migration the chunk remains readable and writable, i.e. live. The balancer tries to balance the number of chunks. Config servers are small mongod processes storing metadata about the shards; they synchronize the same data with a two phase commit across the config servers. If one config server is down, metadata changes (splits and migrations) are not possible, but other operations are. mongos processes are just routers: they store no data, learn from the config servers which shard to contact, and merge the incoming results if required.
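Range based routing can be sketched as a lookup in chunk metadata (the chunk boundaries and shard names below are made up for illustration):

```javascript
// Sketch: route a shard key value to a shard using range metadata of the
// kind the config servers keep. Each chunk covers [min, max) -> shard.
const chunks = [
  { min: "a", max: "g", shard: "a" },
  { min: "g", max: "n", shard: "b" },
  { min: "n", max: "\uffff", shard: "c" },
];

function shardFor(key) {
  const chunk = chunks.find(c => key >= c.min && key < c.max);
  return chunk.shard;
}

// A range query on the shard key only touches the chunks its range overlaps.
console.log(shardFor("kron"));       // "b"
console.log(shardFor("cannondale")); // "a"
```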

# start_shard_cluster.sh
# do dot forget to make script executable by chmod +x start_shard_cluster.sh
# create directories for shard mongod instances
mkdir a0 a1 a2 b0 b1 b2 c0 c1 c2 d0 d1 d2;
# create directory for config server metadata
mkdir cf0 cf1 cf2;
# start config servers
mongod --configsvr --dbpath cf0 --port 26050 --fork --logpath log.cf0 --logappend
mongod --configsvr --dbpath cf1 --port 26051 --fork --logpath log.cf1 --logappend
mongod --configsvr --dbpath cf2 --port 26052 --fork --logpath log.cf2 --logappend
# start shard servers (replica set members)
mongod --shardsvr --replSet a --dbpath a0 --logpath log.a0 --port 27000 --logappend --smallfiles --oplogSize 50
mongod --shardsvr --replSet a --dbpath a1 --logpath log.a1 --port 27001 --logappend --smallfiles --oplogSize 50
mongod --shardsvr --replSet a --dbpath a2 --logpath log.a2 --port 27002 --logappend --smallfiles --oplogSize 50
mongod --shardsvr --replSet b --dbpath b0 --logpath log.b0 --port 27100 --logappend --smallfiles --oplogSize 50
mongod --shardsvr --replSet b --dbpath b1 --logpath log.b1 --port 27101 --logappend --smallfiles --oplogSize 50
mongod --shardsvr --replSet b --dbpath b2 --logpath log.b2 --port 27102 --logappend --smallfiles --oplogSize 50
mongod --shardsvr --replSet c --dbpath c0 --logpath log.c0 --port 27200 --logappend --smallfiles --oplogSize 50
mongod --shardsvr --replSet c --dbpath c1 --logpath log.c1 --port 27201 --logappend --smallfiles --oplogSize 50
mongod --shardsvr --replSet c --dbpath c2 --logpath log.c2 --port 27202 --logappend --smallfiles --oplogSize 50
mongod --shardsvr --replSet d --dbpath d0 --logpath log.d0 --port 27300 --logappend --smallfiles --oplogSize 50
mongod --shardsvr --replSet d --dbpath d1 --logpath log.d1 --port 27301 --logappend --smallfiles --oplogSize 50
mongod --shardsvr --replSet d --dbpath d2 --logpath log.d2 --port 27302 --logappend --smallfiles --oplogSize 50
# start mongos processes
mongos --configdb 10gen.local:26050,10gen.local:26051,10gen.local:26052 --fork --logappend --logpath log.ms0 --port 27017
mongos --configdb 10gen.local:26050,10gen.local:26051,10gen.local:26052 --fork --logappend --logpath log.ms1 --port 26061
mongos --configdb 10gen.local:26050,10gen.local:26051,10gen.local:26052 --fork --logappend --logpath log.ms2 --port 26062
mongos --configdb 10gen.local:26050,10gen.local:26051,10gen.local:26052 --fork --logappend --logpath log.ms3 --port 26063
ps -A | grep mongo
# check very last line of each log to see if something is wrong
tail -n 1 log.cf0
tail -n 1 log.cf1
tail -n 1 log.cf2
tail -n 1 log.a0
tail -n 1 log.a1
tail -n 1 log.a2
tail -n 1 log.b0
tail -n 1 log.b1
tail -n 1 log.b2
tail -n 1 log.c0
tail -n 1 log.c1
tail -n 1 log.c2
tail -n 1 log.d0
tail -n 1 log.d1
tail -n 1 log.d2
tail -n 1 log.ms0
tail -n 1 log.ms1
tail -n 1 log.ms2
tail -n 1 log.ms3

As a best practice, run mongos on 27017, the default mongo access port, and do not use 27017 for config servers or shard mongods, since typically these need not and should not be accessed directly by clients. Then, for each shard, we need to initiate the replica set, and then add the shard to the cluster

# just connect to one member of each set, then rs.initiate() and rs.add() the others
mongo --port 27000
mongo --port 27100
mongo --port 27200
mongo --port 27300
# connect to mongos; if the port is omitted it will be 27017
mongo --port 27017
# add each shard with the "setName/host:port" syntax, e.g. sh.addShard("a/10gen.local:27000")
In mongos we may look at the shards as

use config
db.shards.find()
Sharding a collection
By default, collections are not sharded. All unsharded collections live on the primary (first) shard of the cluster.

# enable sharding of database
mongos> sh.enableSharding("cycles")
# enable sharding of collection giving full name and specify a shard key, and say if this is unique or not
mongos> sh.shardCollection("cycles.brands", {_id:1}, true)

Look for cardinality and granularity in choosing shard keys, if required choose compound shard keys to increase granularity.
When we have bulk initial loads, we may want to pre-split the data, because we may be loading into the primary shard faster than automatic migration can rebalance. This may be done with;

mongos> sh.splitAt("cycles.brands", {"price": 2000})

Some best practices on sharding will be

  • Shard if the collection is big; otherwise the extra complexity added will not be justified.
  • Be careful of monotonically increasing shard keys, such as timestamps or BSON ObjectIds, since these direct all inserts to one shard
  • We may consider pre-splitting manually in case we need to use bulk inserts
  • Shard keys are fixed and can not be changed later
  • Adding new shards to a cluster is easy, but it takes some time for chunks to migrate
  • Use logical names, especially for config servers; let DNS do the job of resolving IPs
  • Put mongos on the default port, and shield shard mongod instances from direct client access
Security

    Security options in mongo include

  • --auth for securing client access; mongos and mongod are run with --auth
  • --keyFile for intra-cluster security, using a shared key
  • Besides, to run mongodb with encryption over the wire, we should compile mongo with the --ssl option. By default, authentication is performed with encryption, but data is transferred in plain text.

     mongod --dbpath newdb --auth 

    will allow a connection from localhost to create the first user. The admin database stores cluster- and system-wide users and roles.

    use admin
    db.createUser({"user": "sifa", "pwd": "kismet", "roles": ["userAdminAnyDatabase"]})

    Then connection may be performed by specifying username.

    mongo localhost/admin -u sifa -p kismet

    This user will be able to create users, but not read or write data. After logging in, create a user eligible to read and write (without administrative permissions such as creating users);

    db.createUser({"user": "joe", "pwd": "dalton", "roles":["readWriteAnyDatabase"]})
    db.createUser({"user": "avarel", "pwd": "dalton", "roles":["readWrite"]})

    Notice that avarel only has privileges to read and write the database specified.
    Some possible roles are;

  • read
  • readAnyDatabase
  • readWrite
  • readWriteAnyDatabase
  • dbAdmin
  • dbAdminAnyDatabase
  • userAdmin
  • userAdminAnyDatabase
  • clusterAdmin
  • Backups for an individual server or replica set may be performed by

  • mongodump --oplog and mongorestore --oplogReplay will dump and restore a specific database; the oplog options are good for hot backup.
  • file system snapshot: here we must be sure that journaling is enabled, otherwise the snapshot may be lagging.
  •  db.fsyncLock() 

    will flush all the data to disk and prevent any further writes, making it easy to take a file system snapshot (release the lock with db.fsyncUnlock())

  • backup from a secondary: take the secondary offline, copy its files, and bring it back up
  • Sharded cluster backup:
  • Turn off the balancer, sh.stopBalancer(), in order to be sure that there is no metadata movement
  • Backup the config database, by mongodump --db config, or stop one config server and copy its files
  • Backup one member of each shard
  • Start the balancer again, sh.startBalancer()
  • # stop balancer
    mongo --host my_mongos --eval "sh.stopBalancer()"
    # take config dump
    mongodump --host my_mongos_or_configs --db config --out backups/configdb
    # take shard backups
    mongodump --host my_probably_shard_secondary_1 --oplog --out backups/shard1
    # start balancer
    mongo --host my_mongos --eval "sh.startBalancer()"

    Capped collections
    Capped collections are basically circular queues with a pre-allocated maximum size; documents can not be manually deleted or grown by updates.
    TTL collections
    These auto-delete documents using a special index.
    GridFS
    For storing blobs larger than the BSON limit of 16MB per document.
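Conceptually, a capped collection behaves like a bounded circular queue; a toy sketch in plain JavaScript (not the storage engine's implementation):

```javascript
// Sketch: a capped collection is a pre-allocated circular queue --
// once full, each insert evicts the oldest documents.
class CappedCollection {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.docs = [];
  }
  insert(doc) {
    this.docs.push(doc);
    if (this.docs.length > this.maxDocs) this.docs.shift(); // evict oldest
  }
}

const log = new CappedCollection(3);
for (let i = 1; i <= 5; i++) log.insert({ _id: i });
console.log(log.docs.map(d => d._id)); // [ 3, 4, 5 ] -- oldest entries evicted
```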

    stack underflow



    If you do cross compiling, you should remember the different stack handling styles of the operating systems concerned (NYU has a pretty good comparison). For example, in Linux flavors, stack size is an environment setting. We may check the values with ulimit -s or ulimit -a.

    stack - linux ulimit

    Default value of stack size, which is 8192 kB, may be changed through modifying /etc/security/limits.conf just as core size and number of file handle modifications;

    stack - linux size changing through limitsconf

    Size changes will reflect to new terminals,

    stack - linux ulimit after modification

    In Windows, the stack size preference is baked into each executable and may be checked with dumpbin /headers, which comes with Visual Studio (the Express edition will do if you are a poor man).

    stack - windows dumpbin

    The default stack size in Visual Studio builds is 0x100000 bytes, which corresponds to 1024 kB. If we want to make the stack size 16384 kB as in the Linux case, we should link with the /F option declaring the desired stack size in bytes, as /F 16777216.

    stack - windows link

    Remember that 16777216 decimal bytes corresponds to 0x1000000 bytes, and that makes 16384 kB.
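    The arithmetic can be double checked quickly (plain JavaScript, using hex literals):

```javascript
// Quick check of the stack-size arithmetic above.
const defaultStack = 0x100000;  // dumpbin default, in bytes
const biggerStack  = 0x1000000; // value passed via /F 16777216

console.log(defaultStack / 1024); // 1024 (kB)
console.log(biggerStack);         // 16777216 (bytes)
console.log(biggerStack / 1024);  // 16384 (kB)
```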

    stack - windows dumpbin after modification