MongoDB data management in Python
Check out the new site at https://rkblog.dev.
14 March 2010
pymongo is the Python module for accessing MongoDB, a schema-free key-value database with lots of goodies. Because it isn't a relational database like MySQL, data management looks a bit different. We have databases, which contain collections (like tables), which contain documents (like rows). A collection doesn't have a defined schema; it can contain documents of various shapes and sizes. A document is a set of key-value data in JSON format (in Python, a dictionary).
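As a rough illustration of that model, with no server involved, a collection can be pictured as a plain list of Python dictionaries that don't have to share the same keys. The list below is only a stand-in for a real collection, and the variable names are made up for the example.

```python
import json

# A stand-in for a collection: documents are plain dictionaries,
# and nothing forces them to share the same keys.
collection = [
    {"title": "Tom", "age": 23},
    {"title": "Kate", "age": 21, "location": "UK", "phone": 1111111},
]

# Collect every key used by at least one document.
all_keys = sorted(set(key for doc in collection for key in doc))
print(all_keys)

# Each document maps directly onto a JSON object.
for doc in collection:
    print(json.dumps(doc, sort_keys=True))
```

The second document simply carries two extra keys; no table definition has to change to make room for them.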
Server and the database
You can install MongoDB on Linux from your distribution's package repository, or download a ready-to-use package for your operating system from the MongoDB site. To start the server, run:
mongod --dbpath=/path/to/folder
The path should point to a folder (empty at first) where the database files will be stored. pymongo can be installed using setuptools:
easy_install pymongo
Now we are ready to use MongoDB from Python.
Managing data in MongoDB
The API is easy to use. We connect to the server, select a database, select a collection, and we are ready to add, edit, and delete documents:
# -*- coding: utf-8 -*-
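from pymongo import Connection
# Connection() with no arguments connects to localhost on the default port (27017);
# later pymongo releases replaced this class with MongoClient
connection = Connection()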
# test_db is the database name
db = connection.test_db
# first_collection is the collection name
collection = db.first_collection
# how many documents in collection?
print collection.count()

# -*- coding: utf-8 -*-
from pymongo import Connection
from pymongo import ASCENDING, DESCENDING
connection = Connection()
db = connection.test_db
collection = db.first_collection
# insert single documents; they don't need to share the same keys
collection.insert({"title": "Tom", "age": 23})
collection.insert({"title": "Kate", "age": 21, "location": "UK", "phone": 1111111})
# insert several documents at once by passing a list
collection.insert([{"title": "Anna 1", "age": 33}, {"title": "Anna 2", "age": 34}])
print collection.count()

# -*- coding: utf-8 -*-
from pymongo import Connection
from pymongo import ASCENDING, DESCENDING
connection = Connection()
db = connection.test_db
collection = db.first_collection
# get all documents
docs = collection.find()
for i in docs:
    print i
# get one document
print
print collection.find_one({"title": "Anna 1"})
print
# get documents that have a "title" key, sorted by _id in descending order,
# skipping the first match and returning at most two
docs = collection.find({"title": {'$exists': True}}).sort('_id', DESCENDING).skip(1).limit(2)
for i in docs:
    print i
The first loop prints every document in the collection:
{u'age': 23, u'_id': ObjectId('4b9bb0c5e1382315db000000'), u'title': u'Tom'}
{u'phone': 1111111, u'age': 21, u'_id': ObjectId('4b9bb0c5e1382315db000001'), u'location': u'UK', u'title': u'Kate'}
{u'age': 33, u'_id': ObjectId('4b9bb0c5e1382315db000002'), u'title': u'Anna 1'}
{u'age': 34, u'_id': ObjectId('4b9bb0c5e1382315db000003'), u'title': u'Anna 2'}
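To make the chained query easier to follow, here is a plain-Python model of what $exists, sort, skip, and limit do to a list of documents. It is only an approximation that runs without a server: the integer _id values are assumptions standing in for real ObjectId values, and the helper name query_titles is hypothetical.

```python
def query_titles(docs, skip, limit):
    """Model of find({"title": {"$exists": True}}).sort("_id", DESCENDING).skip(skip).limit(limit)."""
    # $exists: keep only the documents that have a "title" key
    matched = [doc for doc in docs if "title" in doc]
    # sort by _id, largest first (DESCENDING)
    matched.sort(key=lambda doc: doc["_id"], reverse=True)
    # skip the first `skip` results, then return at most `limit` of the rest
    return matched[skip:skip + limit]


# Integer _id values are an assumption made to keep the example readable.
docs = [
    {"_id": 0, "title": "Tom", "age": 23},
    {"_id": 1, "title": "Kate", "age": 21},
    {"_id": 2, "title": "Anna 1", "age": 33},
    {"_id": 3, "title": "Anna 2", "age": 34},
    {"_id": 4, "note": "no title here"},
]

result = query_titles(docs, skip=1, limit=2)
for doc in result:
    print(doc["title"])
```

The document without a title is filtered out, the rest are ordered from the highest _id down, the first one ("Anna 2") is skipped, and the next two are returned.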
# -*- coding: utf-8 -*-
from pymongo import Connection
from pymongo import ASCENDING, DESCENDING
from datetime import datetime
connection = Connection()
db = connection.test_db
collection = db.first_collection
# update the document
collection.update({"title": "Anna 2"}, {"$set": {"date": datetime.utcnow()}})
print collection.find_one({"title": "Anna 2"})
In this example we update one document, setting its "date" key. The update method also has three important options: safe, which if True makes pymongo check that the update finished; multi, which if True updates every matching document rather than just one; and upsert, which if True creates the document if it doesn't exist.
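The effect of multi and upsert can be pictured with a small in-memory model. This is a simplified sketch, not pymongo's implementation: the update function below is hypothetical, supports only exact-match queries and the $set modifier, and leaves out the safe option because there is no server to acknowledge anything.

```python
def update(docs, spec, change, multi=False, upsert=False):
    """Simplified model of update() with "$set"; returns how many documents were changed or created."""
    fields = change["$set"]
    matches = [d for d in docs if all(d.get(k) == v for k, v in spec.items())]
    if not matches:
        if upsert:
            # upsert: nothing matched, so create a document from the query plus the new fields
            new_doc = dict(spec)
            new_doc.update(fields)
            docs.append(new_doc)
            return 1
        return 0
    # without multi, only the first matching document is touched
    targets = matches if multi else matches[:1]
    for doc in targets:
        doc.update(fields)
    return len(targets)


docs = [
    {"title": "Anna", "age": 33},
    {"title": "Anna", "age": 34},
]

print(update(docs, {"title": "Anna"}, {"$set": {"checked": True}}))
print(update(docs, {"title": "Anna"}, {"$set": {"checked": True}}, multi=True))
print(update(docs, {"title": "Bob"}, {"$set": {"age": 40}}))
print(update(docs, {"title": "Bob"}, {"$set": {"age": 40}}, upsert=True))
```

The first call touches one "Anna" document, the second touches both, the third finds nothing to change, and the last one creates the "Bob" document because upsert is set.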
MongoDB can store files of up to 4 MB in a document by default. Larger files, even very large ones, can be managed efficiently with the GridFS extension.
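The idea behind GridFS is to split a large file into smaller chunks and store each chunk as its own document. The sketch below models only that splitting step in plain Python; the chunk size, the field names, and the helper names are illustrative assumptions, not GridFS defaults or its actual API.

```python
def split_into_chunks(data, chunk_size):
    """Split a byte string into numbered chunk documents."""
    return [
        {"n": index, "data": data[start:start + chunk_size]}
        for index, start in enumerate(range(0, len(data), chunk_size))
    ]


def join_chunks(chunks):
    """Rebuild the original byte string from its chunks, ordered by n."""
    ordered = sorted(chunks, key=lambda chunk: chunk["n"])
    return b"".join(chunk["data"] for chunk in ordered)


payload = b"x" * 2500  # stand-in for the contents of a large file
chunks = split_into_chunks(payload, chunk_size=1024)  # 1024 is an arbitrary example size

print(len(chunks))
print(join_chunks(chunks) == payload)
```

Because every chunk carries its position, the chunks can be stored and fetched in any order and the file can still be reassembled.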
The collection object that we use to manage the data of a given collection has many other methods, for managing indexes, deleting documents, and more. See the documentation for details. There is also a very good video presentation about MongoDB and its advanced features.
Some numbers
Speed of a few operations on a laptop:
- Connect, select collection: 0.3 s
- Fetch 10K documents where somekey = somevalue (from 100K docs collection): 0.5 s
- Fetch with index on somekey: 0.48 s
- Fetch with index on somekey and limit 100: 0.32 s
- Insert 10K documents: 0.5 s
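Measurements like these can be taken with a small timing helper. The sketch below is a self-contained stand-in: it times a full scan and a dictionary-based lookup over in-memory documents rather than real MongoDB queries, so its numbers say nothing about the server. The helper name timed, the data, and the dictionary playing the role of an index are all assumptions made for the example.

```python
from timeit import default_timer


def timed(func):
    """Run func once and return (result, elapsed seconds)."""
    start = default_timer()
    result = func()
    return result, default_timer() - start


# 100K in-memory documents; every tenth one has somekey == 0
docs = [{"_id": i, "somekey": i % 10} for i in range(100000)]

# A dictionary from value to documents plays the role of an index on somekey
index = {}
for doc in docs:
    index.setdefault(doc["somekey"], []).append(doc)

scan, scan_time = timed(lambda: [d for d in docs if d["somekey"] == 0])
lookup, lookup_time = timed(lambda: index[0])

print("scan:   %d documents in %.4f s" % (len(scan), scan_time))
print("lookup: %d documents in %.4f s" % (len(lookup), lookup_time))
```

Both approaches return the same 10K documents; only the time needed to find them differs. To time real queries, the two lambdas would be replaced with calls against the collection.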
MongoDB Browser
I'm working on a simple database browser written in PyQt4: mongobrowser. Currently it can connect to a local database, select a collection, browse documents, and add or edit them in a simple GUI. You need Python and PyQt4 installed to run the app (python run.py); a screenshot is available.