MongoDB data management in Python

Check out the new site at https://rkblog.dev.

pymongo is the Python module for accessing MongoDB, a schema-free, document-oriented database with a lot of goodies. Because it isn't a relational database like MySQL, data management looks a bit different. A server holds databases, which contain collections (like tables), which contain documents (like rows). A collection has no defined schema, so it can contain documents of various shapes and sizes. A document is a set of key-value pairs in a JSON-like format (in Python, a dictionary).
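
As a rough picture of that hierarchy, with no server involved, the sketch below models one database as plain Python dictionaries and lists. The names test_db and first_collection match the examples used in this article; the nested-dictionary layout itself is only an illustration, not how MongoDB stores data.

```python
import json

# Illustration only: a "database" maps collection names to collections,
# a "collection" is a list of documents, a "document" is a dictionary.
test_db = {
    "first_collection": [
        {"title": "Tom", "age": 23},
        # same collection, different shape: no schema forces the keys to match
        {"title": "Kate", "age": 21, "location": "UK", "phone": 1111111},
    ]
}

for document in test_db["first_collection"]:
    # each of these documents survives a round trip through JSON unchanged
    as_json = json.dumps(document)
    assert json.loads(as_json) == document
    print(sorted(document))

# the set of keys used by each document
key_sets = [set(document) for document in test_db["first_collection"]]
```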

Server and the database

You can install MongoDB on Linux from your distribution's repository, or download ready-to-use packages for your operating system from the MongoDB site. To start the server, run:
mongod --dbpath=/path/to/folder
where the path points to a folder (empty at first) for the database files. pymongo can be installed with pip:
pip install pymongo
Now we are ready to use MongoDB from Python.

Managing data in MongoDB

The API is easy to use. We connect to the server, select a database, select a collection, and we are ready to add, edit, and delete documents:
from pymongo import MongoClient

# MongoClient replaces the Connection class of older pymongo releases;
# with no arguments it connects to localhost on the default port 27017
client = MongoClient()
# test_db is the database name
db = client.test_db
# first_collection is the collection name
collection = db.first_collection

# how many documents are in the collection?
print(collection.count_documents({}))
The example above selects the "test_db" database and its "first_collection" collection. MongoDB creates both lazily, on the first write, if they don't exist yet. Finally the code counts the documents in the collection (none for now). To add a document or a list of documents, use the collection's insert methods:
from pymongo import MongoClient

client = MongoClient()
db = client.test_db
collection = db.first_collection

# insert_one adds a single document
collection.insert_one({"title": "Tom", "age": 23})
# documents in one collection may have different keys
collection.insert_one({"title": "Kate", "age": 21, "location": "UK", "phone": 1111111})
# insert_many adds a list of documents
collection.insert_many([{"title": "Anna 1", "age": 33}, {"title": "Anna 2", "age": 34}])

print(collection.count_documents({}))
This example adds four documents. Their shapes don't have to match, since this is a schema-free database. Each inserted document gets a unique _id value. To select documents, use the find and find_one methods:
from pymongo import MongoClient
from pymongo import DESCENDING

client = MongoClient()
db = client.test_db
collection = db.first_collection

# get all documents
docs = collection.find()
for doc in docs:
    print(doc)

# get one document
print()
print(collection.find_one({"title": "Anna 1"}))
print()

# get documents that have a "title" key, highest _id first,
# skipping the first match and returning at most two
docs = collection.find({"title": {"$exists": True}}).sort("_id", DESCENDING).skip(1).limit(2)
for doc in docs:
    print(doc)
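The chained query above is easier to follow as four steps: filter, sort, skip, limit. The sketch below replays those steps in plain Python over a list of dictionaries, so it runs without a server. It only illustrates the semantics and is not how pymongo or MongoDB implement them; the run_query helper and the integer _id values are made up for the example (real documents get ObjectId values).

```python
def run_query(documents, required_key, sort_key, descending, skip, limit):
    """Hypothetical helper mimicking find({key: {"$exists": True}}).sort().skip().limit()."""
    # $exists: True keeps only documents that contain the key
    matched = [doc for doc in documents if required_key in doc]
    # sort orders the matches by one key
    ordered = sorted(matched, key=lambda doc: doc[sort_key], reverse=descending)
    # skip drops the first `skip` documents, limit caps what is left
    return ordered[skip:skip + limit]


# stand-ins for the four inserted documents, with made-up integer ids
documents = [
    {"_id": 1, "title": "Tom", "age": 23},
    {"_id": 2, "title": "Kate", "age": 21, "location": "UK", "phone": 1111111},
    {"_id": 3, "title": "Anna 1", "age": 33},
    {"_id": 4, "title": "Anna 2", "age": 34},
]

result = run_query(documents, "title", "_id", descending=True, skip=1, limit=2)
titles = [doc["title"] for doc in result]
print(titles)  # ['Anna 1', 'Kate']
```

MongoDB applies sort first, then skip, then limit, regardless of the order in which the three methods are chained on the cursor.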
find and find_one both accept a rich set of arguments for narrowing the selection: equality filters, greater than, less than, key exists, and others. On the cursor returned by find, skip skips the first X documents, limit caps the result at a given number of documents, and sort orders the documents by a given key. Our previously inserted documents look like this:
{'_id': ObjectId('4b9bb0c5e1382315db000000'), 'title': 'Tom', 'age': 23}
{'_id': ObjectId('4b9bb0c5e1382315db000001'), 'title': 'Kate', 'age': 21, 'location': 'UK', 'phone': 1111111}
{'_id': ObjectId('4b9bb0c5e1382315db000002'), 'title': 'Anna 1', 'age': 33}
{'_id': ObjectId('4b9bb0c5e1382315db000003'), 'title': 'Anna 2', 'age': 34}
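The _id values above are not random. An ObjectId is 12 bytes, shown as 24 hex digits, and its first 4 bytes are the creation time in seconds since the Unix epoch. The sketch below decodes that timestamp with only the standard library; the helper name objectid_created_at is made up for this example (pymongo's own ObjectId class exposes the same information as generation_time).

```python
from datetime import datetime, timezone


def objectid_created_at(hex_id):
    """Hypothetical helper: creation time encoded in a 24-digit hex ObjectId string."""
    if len(hex_id) != 24:
        raise ValueError("an ObjectId has 24 hex digits")
    # the first 8 hex digits (4 bytes) are a big-endian Unix timestamp
    seconds = int(hex_id[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


created = objectid_created_at("4b9bb0c5e1382315db000000")
print(created.isoformat())  # 2010-03-13T15:35:33+00:00
```

All four documents above share the prefix 4b9bb0c5, so they were created within the same second.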
To update a document, use one of the collection's update methods. They take at least two arguments: a dictionary of conditions describing the document or documents to update, and a second dictionary describing the changes:
from datetime import datetime, timezone

from pymongo import MongoClient

client = MongoClient()
db = client.test_db
collection = db.first_collection

# update_one changes the first document matching the filter;
# $set writes the "date" key and leaves the other keys untouched
collection.update_one({"title": "Anna 2"}, {"$set": {"date": datetime.now(timezone.utc)}})

print(collection.find_one({"title": "Anna 2"}))

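In the example we update one document, setting its "date" key. Updates also have three important options: safe, which if True makes pymongo check that the update finished; multi, which if True updates every matching document rather than only the first; and upsert, which if True creates the document if no match exists. In current pymongo releases safe is gone because MongoClient acknowledges writes by default, multi is expressed by calling update_many instead of update_one, and upsert remains a keyword argument.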
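
To make the multi and upsert options concrete, here is a small in-memory imitation of a $set update over a list of dictionaries. It is a sketch of the semantics only, under the assumption that the query uses plain equality conditions; apply_set_update is a hypothetical helper, not part of pymongo, and a real upsert would also assign an _id.

```python
def apply_set_update(documents, query, changes, multi=False, upsert=False):
    """Hypothetical helper: apply {"$set": changes} to documents matching an equality query.

    Returns the number of documents that were updated.
    """
    matched = 0
    for doc in documents:
        if all(doc.get(key) == value for key, value in query.items()):
            doc.update(changes)  # $set overwrites or adds only the given keys
            matched += 1
            if not multi:
                break  # without multi, only the first match is updated
    if matched == 0 and upsert:
        # upsert: nothing matched, so create a document from query + changes
        documents.append({**query, **changes})
    return matched


# sample data reusing titles and ages from the article's documents
people = [
    {"title": "Anna 1", "age": 33},
    {"title": "Anna 2", "age": 34},
]

# default: update the first match only
apply_set_update(people, {"title": "Anna 2"}, {"location": "UK"})
# multi: an empty query matches everything, so both documents change
apply_set_update(people, {}, {"checked": True}, multi=True)
# upsert: no "Tom" exists yet, so a new document is created
apply_set_update(people, {"title": "Tom"}, {"age": 23}, upsert=True)
print(people)
```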

A single MongoDB document can hold up to 4 MB by default (later MongoDB releases raised the limit to 16 MB). Larger files, even very large ones, can be managed efficiently with the GridFS extension.
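
The idea behind GridFS is to split a file into chunks that each fit comfortably in one document, plus one metadata document describing the file. The sketch below shows only that splitting step in plain Python. The field names follow the GridFS chunk layout (files_id, n, data), but split_into_chunks, the 4-byte chunk size, and the sample file id are assumptions chosen to keep the example tiny; with pymongo you would use its gridfs package rather than code like this.

```python
def split_into_chunks(file_id, data, chunk_size):
    """Hypothetical helper: split bytes into GridFS-style chunk documents, numbered from 0."""
    return [
        {"files_id": file_id, "n": index, "data": data[start:start + chunk_size]}
        for index, start in enumerate(range(0, len(data), chunk_size))
    ]


payload = b"0123456789"
# made-up file id and an unrealistically small chunk size, for illustration
chunks = split_into_chunks("file-1", payload, chunk_size=4)
for chunk in chunks:
    print(chunk["n"], chunk["data"])

# joining the chunks in order of n restores the original bytes
restored = b"".join(chunk["data"] for chunk in sorted(chunks, key=lambda c: c["n"]))
```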

The collection object we use to manage a collection's data has many other methods, for managing indexes, deleting documents, and more; see the documentation. There is also a very good video presentation about MongoDB and its advanced features.

Some numbers

Speed of a few operations on a laptop:
  • Connect, select collection: 0.3 s
  • Fetch 10K documents where somekey = somevalue (from a 100K-document collection): 0.5 s
  • Fetch with index on somekey: 0.48 s
  • Fetch with index on somekey and limit 100: 0.32 s
  • Insert 10K documents: 0.5 s
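
Numbers like these are easy to reproduce with a small timing helper. The sketch below times any callable with time.perf_counter; the timed helper and the in-memory workload are stand-ins made up for this example. To measure MongoDB itself you would pass a function that performs the insert or the query instead.

```python
import time


def timed(operation):
    """Hypothetical helper: run operation() once, return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = operation()
    elapsed = time.perf_counter() - start
    return result, elapsed


def build_documents():
    # stand-in workload: build 10K documents in memory, no database involved
    return [{"somekey": index % 10, "value": index} for index in range(10_000)]


documents, seconds = timed(build_documents)
print(f"Built {len(documents)} documents in {seconds:.4f} s")
```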

MongoDB Browser

I'm working on a simple database browser written in PyQt4, mongobrowser. Currently it can connect to a local database, select a collection, browse documents, and add or edit them in a simple GUI. You need Python and PyQt4 installed to run the app (python run.py).
RkBlog

Python programming, 14 March 2010

