MongoDB data management in Python
A description of the pymongo driver and of data management in the schema-free MongoDB database
pymongo is the Python module for accessing MongoDB, a schema-free key-value database with a lot of goodies. As this isn't a relational database like MySQL, data management looks a bit different. We have databases, which contain collections (like tables), which contain documents (like rows). A collection doesn't have a defined schema. It can contain documents of various shapes and sizes. A document is a set of key-value data in JSON format (for Python, a dictionary).
Server and the database
You can install MongoDB on Linux using packages from the repository, or download ready-to-use packages for your operating system from the MongoDB site. To start the server you have to execute the mongod binary.
Managing data in MongoDB
The API is easy to use. We connect to the server, select a database, select a collection, and we are ready to add, edit, and delete documents. Selecting "test_db" creates it if it doesn't exist, and selecting the "first_collection" collection likewise creates it if it doesn't exist. Counting the documents in that collection at this point returns none.
To add a document or a list of documents, use the insert method. In this article we add 4 documents. The documents don't need a fixed schema, as this is a schema-free database. When you add a document, it gets a unique _id value.
To select documents, use the find and find_one methods. Both methods support an advanced set of arguments for limiting the selected documents. You can use equality filters, greater than, lower than, key exists, and other filters. The skip method skips the first X documents, limit caps the selected set of documents at a given number, and sort orders documents by a given key.
Our previously inserted documents look like this:
{u'age': 23, u'_id': ObjectId('4b9bb0c5e1382315db000000'), u'title': u'Tom'}
{u'phone': 1111111, u'age': 21, u'_id': ObjectId('4b9bb0c5e1382315db000001'), u'location': u'UK', u'title': u'Kate'}
{u'age': 33, u'_id': ObjectId('4b9bb0c5e1382315db000002'), u'title': u'Anna 1'}
{u'age': 34, u'_id': ObjectId('4b9bb0c5e1382315db000003'), u'title': u'Anna 2'}
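The connect, insert, and find steps above can be sketched as follows. This is a minimal sketch, not the article's original listing: it assumes a recent pymongo (3.7 or later, where MongoClient, insert_many, and count_documents replace the older Connection, insert, and count calls) and a mongod listening on localhost:27017. The helper names open_collection, add_people, older_than, and with_phone are hypothetical.

```python
"""Sketch of the connect / insert / find workflow, assuming pymongo >= 3.7."""


def open_collection(uri="mongodb://localhost:27017",
                    db_name="test_db", coll_name="first_collection"):
    """Connect and return the collection; MongoDB creates both names on first write."""
    # Imported lazily so the helpers below can be read and tried without a server.
    from pymongo import MongoClient

    client = MongoClient(uri)  # assumption: a local server on the default port
    return client[db_name][coll_name]


# The four documents from the article, without the server-generated _id values.
PEOPLE = [
    {"title": "Tom", "age": 23},
    {"title": "Kate", "age": 21, "phone": 1111111, "location": "UK"},
    {"title": "Anna 1", "age": 33},
    {"title": "Anna 2", "age": 34},
]


def add_people(collection):
    """Insert the sample documents and return how many _id values were generated."""
    # insert_many adds an _id key to each dict it receives, so pass copies.
    result = collection.insert_many([dict(person) for person in PEOPLE])
    return len(result.inserted_ids)


def older_than(collection, age, skip=0, limit=10):
    """Return titles of people older than `age`, sorted by age, paged with skip/limit."""
    cursor = (
        collection.find({"age": {"$gt": age}})  # "greater than" filter
        .sort("age", 1)                         # 1 means ascending
        .skip(skip)
        .limit(limit)
    )
    return [doc["title"] for doc in cursor]


def with_phone(collection):
    """Return the first document that has a "phone" key, or None if there is none."""
    return collection.find_one({"phone": {"$exists": True}})


# Usage against a running server (not executed here):
#   people = open_collection()
#   people.count_documents({})   # number of documents currently in the collection
#   add_people(people)
#   older_than(people, 22)
```

The helpers take the collection as an argument, so the query logic stays separate from the connection details.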
The update method changes existing documents. In this article's example we update one document, changing its "date" key. The update method also has three important options (args): safe, which if True forces pymongo to check that the update finished; multi, which if True updates more than one matching document; and upsert, which if True creates the document if it doesn't exist.
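An update of the "date" key might look like the sketch below. It assumes pymongo 3.0 or later, where update_one and update_many replace the older update method: multi=True corresponds to update_many, upsert is still a keyword argument, and safe has given way to write concerns (MongoClient acknowledges writes by default). The function names touch and touch_all are hypothetical.

```python
import datetime


def touch(collection, title, when=None, upsert=False):
    """Set the "date" key on the first document with the given title."""
    when = when or datetime.datetime.now(datetime.timezone.utc)
    result = collection.update_one(
        {"title": title},            # which document to update
        {"$set": {"date": when}},    # change only the "date" key
        upsert=upsert,               # True: create the document if nothing matches
    )
    return result.matched_count, result.modified_count


def touch_all(collection, min_age, when=None):
    """Set "date" on every document with age above min_age (the old multi=True)."""
    when = when or datetime.datetime.now(datetime.timezone.utc)
    result = collection.update_many(
        {"age": {"$gt": min_age}},
        {"$set": {"date": when}},
    )
    return result.modified_count
```

Using $set keeps the other keys of the document intact.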
A single MongoDB document could hold files of up to 4 MB by default at the time of writing (current releases allow 16 MB). Large or even very large files can be managed efficiently with the GridFS extension.
The collection object, which we use to manage the data of a given collection, has many other methods for managing indexes, deleting documents, and more. You can look at the documentation. There is also a very good video presentation about MongoDB and its advanced features.
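As one illustration of those other methods, here is a hedged sketch that builds an index and deletes documents. It assumes pymongo 3.0 or later (create_index and delete_many); the function name index_and_prune and the default key and threshold are hypothetical.

```python
def index_and_prune(collection, key="age", max_age=30):
    """Create an ascending index on `key`, then delete documents above max_age."""
    # create_index returns the name of the index, for example "age_1".
    index_name = collection.create_index([(key, 1)])
    # delete_many removes every matching document and reports how many it removed.
    deleted = collection.delete_many({key: {"$gt": max_age}}).deleted_count
    return index_name, deleted
```

To remove a single document instead, delete_one takes the same kind of filter.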
Some numbers
Speed of a few operations on a laptop:
- Connect, select collection: 0.3 s
- Fetch 10K documents where somekey = somevalue (from a 100K-document collection): 0.5 s
- Fetch with index on somekey: 0.48 s
- Fetch with index on somekey and limit 100: 0.32 s
- Insert 10K documents: 0.5 s
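The article does not say how these timings were taken. One plausible way to collect similar numbers is a small wall-clock helper like the sketch below, which uses only the standard library; the name timed is hypothetical, and results will vary with hardware and server version.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Usage with a pymongo collection (not executed here). list() forces the cursor
# to fetch every document, so the query itself is included in the timing:
#   docs, seconds = timed(lambda: list(collection.find({"somekey": "somevalue"})))
```

Wrapping find in list() matters because a cursor is lazy and does no work until it is iterated.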