Getting started with Ruby and MongoDB using Software Collections

MongoDB has recently become a very popular document database, and RHSCL 1.1 includes both the mongodb24 and ror40 Software Collections, including the supported Ruby drivers for MongoDB. So let’s take a short look at what MongoDB actually is and how to get started with it from Ruby using Software Collections. Note that we will use the ruby200, ror40, mongodb24 and v8314 collections together on RHEL 7 Beta, although you can follow this article on RHEL 6 as well. If you need Ruby 1.9.3, you can use the ruby193 Software Collection, which also contains the MongoDB adapters (in RHSCL 1.1). The only apparent difference is that ruby193 is not split into separate Ruby and Ruby on Rails collections, so all the gems are available in ruby193.

What is MongoDB

MongoDB is an open-source document database that is getting a lot of traction and is used by some big enterprises such as Craigslist and SourceForge. If you haven’t heard of it yet, the main difference from traditional relational database management systems (RDBMS) is that it provides a dynamic, table-less schema. Data is stored as documents that can contain other documents (you can think of a document as an object in Ruby). Documents contain elements, which are basically JSON-like field and value pairs. Since you don’t need a fixed database schema for your attributes, no schema migrations are necessary. Moreover, MongoDB is built for speed and performance in a distributed environment, offering features like sharding (distributing data across multiple machines) and map-reduce (a paradigm for mapping a selection of data and reducing it into aggregated results).

BSON

BSON is the actual JSON-like format used internally by MongoDB to store data. It’s a binary format in which zero or more field and value pairs are stored as a single entity called a document. If you take the hash {"hello": "world"}, then

"\x16\x00\x00\x00\x02hello\x00
 \x06\x00\x00\x00world\x00\x00"

is its representation in BSON. The basic types are byte (1 byte, 8 bits), int32 (4 bytes, 32-bit signed integer), int64 (8 bytes, 64-bit signed integer) and double (8 bytes, 64-bit IEEE 754 floating point); they are the terminals of the BSON grammar. In the example above, the key "hello" is represented as hello\x00, so \x00 and the other bytes are the terminals. Elements (key-value pairs), sequences of elements and the document itself are the non-terminals of the grammar. You can read the full spec at www.bsonspec.org, but simply put, MongoDB uses this binary BSON format to store data, and a document is a sort of nested hash, much like a hash in Ruby itself.
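
To make that byte layout concrete, here is a minimal sketch in plain Ruby (no gems, Ruby 2.0 or newer assumed because of String#b) that encodes a flat hash of string values by hand. The encode_string_document helper is a hypothetical name made up for this illustration and handles only the string element type (0x02); the bson gem covers the full spec.

```ruby
# Encode a flat hash with string values as a BSON document.
# Hypothetical helper for illustration only: it supports just the
# string element type (0x02) and no nested documents.
def encode_string_document(hash)
  elements = hash.map do |key, value|
    key_bytes   = key.to_s.b
    value_bytes = value.to_s.b
    "\x02".b +                                  # element type: string
      key_bytes + "\x00".b +                    # key as a null-terminated cstring
      [value_bytes.bytesize + 1].pack("l<") +   # int32 length, counting the trailing null
      value_bytes + "\x00".b                    # value bytes plus trailing null
  end.join

  # Document = int32 total size (including itself), elements, final null byte.
  [4 + elements.bytesize + 1].pack("l<") + elements + "\x00".b
end

doc = encode_string_document("hello" => "world")
puts doc.bytesize             # 22, which is 0x16, the first byte of the document
puts doc.unpack("H*").first   # 160000000268656c6c6f0006000000776f726c640000
```

Reading the hex dump left to right gives the document length, the type byte, the key, the string length, the value and the closing null byte.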

Installation of MongoDB

Installing the MongoDB client and server using Software Collections is fairly easy:

$ su -c "yum install -y mongodb24 mongodb24-mongodb"

mongodb24 is the meta package for the collection; it pulls in the mongodb24-mongodb-server package that we need for running MongoDB as a server. mongodb24-mongodb is the client (the mongo shell). It is not installed automatically, so we need to list it explicitly (the mongodb24 Software Collection contains several clients, so the meta package installs only the server part of MongoDB). The most important package is mongodb24-mongodb-server, which contains the server itself along with its config and init files. Notice that v8314-v8 is also installed automatically as a dependency of MongoDB. v8314 is a Software Collection of v8, the famous JavaScript engine from Google Chrome.

After the installation is done, let’s check that mongodb24 is available and enable it:

$ scl -l
mongodb24
v8314
$ scl enable mongodb24 bash

That should enable both mongodb24 and v8314 since mongodb24 needs v8314 to run.
Then we can start the MongoDB server from the collection and check whether it runs:

$ su -c "systemctl start mongodb24-mongod.service"
$ su -c "systemctl status mongodb24-mongod.service"
mongodb24-mongod.service - High-performance, schema-free document-oriented database
   Loaded: loaded (/usr/lib/systemd/system/mongodb24-mongod.service; disabled)
   Active: active (running) since Thu 2014-02-27 17:24:33 CET; 6s ago
  Process: 3233 ExecStart=/usr/bin/scl enable $MONGODB24_SCLS_ENABLED -- /opt/rh/mongodb24/root/usr/bin/mongod $OPTIONS run (code=exited, status=0/SUCCESS)
 Main PID: 3243 (mongod)
   CGroup: /system.slice/mongodb24-mongod.service
           └─3243 /opt/rh/mongodb24/root/usr/bin/mongod --quiet -f /opt/rh/mo...

Feb 27 17:24:24 localhost.localdomain systemd[1]: Starting High-performance, ...
Feb 27 17:24:24 localhost.localdomain scl[3233]: about to fork child process....
Feb 27 17:24:24 localhost.localdomain scl[3233]: forked process: 3243
Feb 27 17:24:24 localhost.localdomain scl[3233]: all output going to: /var/l...g
Feb 27 17:24:33 localhost.localdomain scl[3233]: child process started succe...g
Feb 27 17:24:33 localhost.localdomain systemd[1]: Started High-performance, s...
Hint: Some lines were ellipsized, use -l to show in full.

If you are not yet running RHEL 7 beta, you will need to change those commands to su -c "service mongodb24-mongod {start,status}" for RHEL 6.

Then let’s try out the MongoDB client known as mongo shell:

$ mongo
MongoDB shell version: 2.4.9
connecting to: test
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see
	http://docs.mongodb.org/
Questions? Try the support group
	http://groups.google.com/group/mongodb-user
>

And let’s save some data to a new database:

> show dbs
local	0.078125GB
> use my_db
switched to db my_db
> db.products.insert( { item: "First item" } )
> db.products.find()
{ "_id" : ObjectId("530f6896b3ae9716674004e9"), "item" : "First item" }
> show databases
local	0.078125GB
my_db	0.203125GB

As you can see, we can list databases with show dbs or show databases, switch databases with use [database_name], and access the currently selected database and its collections through db. MongoDB has collections and documents (objects) instead of tables and rows, so we can immediately start inserting objects written as hashes. Notice that my_db was created automatically as well.

The most interesting thing to explain here is the _id attribute with the ObjectId("530f6896b3ae9716674004e9") value. You can think of it as a primary key from relational databases. It’s most likely unique and created automatically by MongoDB. Internally it is a 12-byte BSON type consisting of a 4-byte value representing the seconds since the Unix epoch, a 3-byte machine identifier, a 2-byte process id, and a 3-byte counter starting with a random value. Therefore we can actually get the date and time the object was created from this ObjectId:

> ObjectId("530f6896b3ae9716674004e9").getTimestamp()
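
The same timestamp can be recovered in plain Ruby by slicing the 24 hex characters according to the layout described above. This is only a sketch: split_object_id is a hypothetical helper written for this article, not part of the mongo or bson gems, and it leaves the machine, process and counter fields as raw hex.

```ruby
# Split an ObjectId hex string into its four fields.
# Hypothetical helper for illustration; only the timestamp is interpreted.
def split_object_id(hex)
  raise ArgumentError, "expected 24 hex characters" unless hex =~ /\A\h{24}\z/

  {
    time:    Time.at(hex[0, 8].to_i(16)).utc,  # 4 bytes: seconds since the Unix epoch
    machine: hex[8, 6],                        # 3 bytes: machine identifier
    pid:     hex[14, 4],                       # 2 bytes: process id
    counter: hex[18, 6]                        # 3 bytes: counter
  }
end

parts = split_object_id("530f6896b3ae9716674004e9")
puts parts[:time]   # 2014-02-27 16:32:22 UTC
```

The decoded time falls on the same afternoon as the systemctl output shown earlier, which is when the document was inserted.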

For more information about mongo shell check the reference manual. When you are done playing in the mongo shell, type exit to exit.

Official Ruby client

In RHSCL 1.1 beta we now ship rubygem-mongo, the official Ruby adapter for MongoDB. Let’s try to install it and connect to our existing database from Ruby.

You can install it by running:

$ su -c "yum install ror40-rubygem-mongo ror40-rubygem-bson_ext"

Installing the MongoDB Ruby client pulls in ror40-rubygem-bson automatically; installing ror40-rubygem-bson_ext as well lets the bson gem use the faster C extension. Then enable the ror40 collection:

$ scl enable ror40 bash

This also enables the ruby200 Software Collection as a dependency. Before we take a look at the mongo gem, let’s see how to work with the BSON format itself in irb:

irb(main):004:0> require 'bson'
irb(main):005:0> hash = { "attribute" => "value", :'2' => 3 }
=> {"attribute"=>"value", :"2"=>3}
irb(main):006:0> bson = BSON.serialize(hash)
=> #<BSON::ByteBuffer:0x00000001d188b8 @str="!\x00\x00\x00\x02attribute\x00\x06\x00\x00\x00value\x00\x102\x00\x03\x00\x00\x00\x00", @cursor=33, @max_size=4194304>
irb(main):007:0> BSON.deserialize(bson)
=> {"attribute"=>"value", "2"=>3}

We can easily create BSON from any Ruby hash with BSON.serialize(hash) and convert it back using BSON.deserialize(bson). Note that the symbol key :'2' comes back as the string "2", because BSON keys are always strings. This is what the mongo gem uses internally.

Now we can move on to the mongo driver. To create a connection to MongoDB we need to know the host and port. The default port for MongoDB is 27017, and we are connecting to localhost:

irb(main):010:0> require 'mongo'
=> true
irb(main):011:0> mongo_client = Mongo::MongoClient.new('localhost', 27017)
=> #<Mongo::MongoClient:0x00000000d44d88 @host="localhost", @port=27017, @id_lock=#, @primary=["localhost", 27017], @primary_pool=#, @mongos=false, @tag_sets=[], @acceptable_latency=15, @max_message_size=48000000, @max_bson_size=16777216, @slave_ok=nil, @ssl=nil, @unix=false, @socket_opts={}, @socket_class=Mongo::TCPSocket, @auths=[], @pool_size=1, @pool_timeout=5.0, @op_timeout=nil, @connect_timeout=30, @logger=nil, @read=:primary, @default_db="test", @write_concern={:w=>1, :j=>false, :fsync=>false, :wtimeout=>nil}, @read_primary=true>

If that worked we can go on and access our previously created database:

irb(main):012:0> mongo_client.database_names
=> ["local", "my_db"]
irb(main):013:0> mongo_client.database_info.each { |info| puts info.inspect }
["local", 83886080]
["my_db", 218103808]
=> {"local"=>83886080, "my_db"=>218103808}
irb(main):014:0> db = mongo_client.db('my_db')
=> #<Mongo::DB:0x00000000d665c8 @name="my_db", @connection=#<Mongo::MongoClient:0x00000000d44d88 @host="localhost", @port=27017, @id_lock=#, @primary=["localhost", 27017], @primary_pool=#, @mongos=false, @tag_sets=[], @acceptable_latency=15, @max_message_size=48000000, @max_bson_size=16777216, @slave_ok=nil, @ssl=nil, @unix=false, @socket_opts={}, @socket_class=Mongo::TCPSocket, @auths=[], @pool_size=1, @pool_timeout=5.0, @op_timeout=nil, @connect_timeout=30, @logger=nil, @read=:primary, @default_db="test", @write_concern={:w=>1, :j=>false, :fsync=>false, :wtimeout=>nil}, @read_primary=true>, @strict=nil, @pk_factory=nil, @write_concern={:w=>1, :j=>false, :fsync=>false, :wtimeout=>nil}, @read=:primary, @tag_sets=[], @acceptable_latency=15, @cache_time=300>
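
The sizes returned by database_info are in bytes, while the mongo shell printed gigabytes. A quick conversion in plain Ruby, using the values from the session above, suggests that the two agree (bytes_to_gb is a hypothetical helper name, not a driver method):

```ruby
BYTES_PER_GB = 1024.0**3

# Convert a size in bytes to gigabytes (1 GB taken as 1024^3 bytes).
def bytes_to_gb(bytes)
  bytes / BYTES_PER_GB
end

# Values copied from the database_info output above.
sizes = { "local" => 83886080, "my_db" => 218103808 }
sizes.each { |name, bytes| puts "#{name}\t#{bytes_to_gb(bytes)}GB" }
# local	0.078125GB
# my_db	0.203125GB
```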

We can see that our my_db database is there, so we select it using the Mongo::MongoClient#db method. It would be wise to check our collections as well:

irb(main):015:0> Mongo::DB.instance_methods
=> [:strict=, :strict?, :name, :write_concern, :connection, :cache_time, :cache_time=, :read, :read=, :tag_sets, :tag_sets=, :acceptable_latency, :acceptable_latency=, :authenticate, :issue_authentication, :add_stored_function, :remove_stored_function, :add_user, :remove_user, :logout, :issue_logout, :collection_names, :collections, :collections_info, :create_collection, :collection, :[], :drop_collection, :get_last_error, :error?, :previous_error, :reset_error_history, :dereference, :eval, :rename_collection, :drop_index, :index_information, :stats, :ok?, :command, :full_collection_name, :pk_factory, :pk_factory=, :profiling_level, :profiling_level=, :profiling_info, :validate_collection, :legacy_write_concern, :write_concern_from_legacy, :get_write_concern, :nil?, :===, :=~, :!~, :eql?, :hash, :<=>, :class, :singleton_class, :clone, :dup, :taint, :tainted?, :untaint, :untrust, :untrusted?, :trust, :freeze, :frozen?, :to_s, :inspect, :methods, :singleton_methods, :protected_methods, :private_methods, :public_methods, :instance_variables, :instance_variable_get, :instance_variable_set, :instance_variable_defined?, :remove_instance_variable, :instance_of?, :kind_of?, :is_a?, :tap, :send, :public_send, :respond_to?, :extend, :display, :method, :public_method, :define_singleton_method, :object_id, :to_enum, :enum_for, :==, :equal?, :!, :!=, :instance_eval, :instance_exec, :__send__, :__id__]
irb(main):016:0> db.collection_names
=> ["system.indexes", "products"]

I listed the instance methods for Mongo::DB and found that I can use the #collection_names method to list my collections. The “products” collection is really there, but it’s not the only collection in the database. So what about “system.indexes”? As the name suggests, it’s a collection holding this database’s indexes. It’s a system collection that MongoDB creates automatically. MongoDB can create other system collections too, so it’s better to avoid naming your collections “system.*” altogether.
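
If you only care about your own collections, you can filter the system ones out of the list by their prefix. A minimal sketch in plain Ruby, where user_collection_names is a hypothetical helper (not a driver method) and the input is the array returned by db.collection_names above:

```ruby
# Drop MongoDB system collections (names starting with "system.") from a list.
# Hypothetical helper for illustration only.
def user_collection_names(names)
  names.reject { |name| name.start_with?("system.") }
end

p user_collection_names(["system.indexes", "products"])   # ["products"]
```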

And that’s it for today. For the next steps check the mongo gem manual.

Don’t forget that you always need to enable both the ror40 and mongodb24 collections. You can do so with:

$ scl enable ror40 mongodb24 bash

And check that all four collections are enabled by running:

$ env | grep X_SCLS
X_SCLS=ruby200 v8314 mongodb24 ror40
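
The same check can be done from Ruby before your script touches the database. This is a small sketch in plain Ruby; missing_collections and REQUIRED_COLLECTIONS are hypothetical names, and the list simply mirrors the four collections used in this article:

```ruby
# The collections this article relies on.
REQUIRED_COLLECTIONS = %w[ruby200 v8314 mongodb24 ror40]

# Return the required collections that are not listed in the X_SCLS value.
# Hypothetical helper for illustration only.
def missing_collections(x_scls, required = REQUIRED_COLLECTIONS)
  required - x_scls.to_s.split
end

missing = missing_collections(ENV["X_SCLS"])
if missing.empty?
  puts "All required collections are enabled."
else
  puts "Not enabled: #{missing.join(', ')}"
end
```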
