RkBlog

Hardware, programming and astronomy tutorials and reviews.

Django/Python app memory usage - examples

Examples of memory usage problems (currently: Whoosh searching, sitemap generation)

Whoosh

After launching my new website, JobMaster.rk.edu.pl, which used several technologies that were new to me, I ran into a few memory usage issues on the production server (Nginx, FastCGI). Whoosh, used for full-text search, was leaking memory on every search. The code was based on the arnebrodowski.de example. The problem was in the search view, on the line searcher = ix.searcher(): memory grew with every search (I used "memstat -v" to check memory usage). The solution was to move that line to settings.py so the object isn't created on every request (it can be reused because it doesn't depend on the search term). So I placed all the "reusable" code in settings.py:
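from whoosh import index, store, fields
from whoosh.qparser import QueryParser

# WHOOSH_INDEX (the path to the index directory) must be defined earlier in settings.py
WHOOSH_SCHEMA = fields.Schema(title=fields.TEXT(stored=True),
				content=fields.TEXT,
				slug=fields.ID(stored=True, unique=True))

storage = store.FileStorage(WHOOSH_INDEX)
ix = index.Index(storage, schema=WHOOSH_SCHEMA)
PARSER = QueryParser("content", schema=ix.schema)
SEARCHER = ix.searcher()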
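To show why moving the searcher to module level helps, here is a minimal sketch in plain Python 3 that does not need Whoosh. ExpensiveSearcher is a hypothetical stand-in for the object returned by ix.searcher() (it is not part of Whoosh), and search_view_bad / search_view_good are made-up names for the two variants of the view. It only demonstrates that the shared object is built once per process instead of once per request; it does not reproduce the leak itself.

```python
class ExpensiveSearcher(object):
    """Hypothetical stand-in for the object returned by ix.searcher()."""

    created = 0  # how many instances were built in this process

    def __init__(self):
        ExpensiveSearcher.created += 1
        # pretend to hold some index data in memory
        self.docs = {"python": ["django-job"], "django": ["django-job"]}

    def search(self, term):
        return self.docs.get(term, [])


# Built once at import time, like SEARCHER in settings.py
SEARCHER = ExpensiveSearcher()


def search_view_bad(term):
    # builds a new searcher for every request
    return ExpensiveSearcher().search(term)


def search_view_good(term):
    # reuses the shared module-level searcher
    return SEARCHER.search(term)


for _ in range(100):
    search_view_good("python")
print("instances after 100 good requests:", ExpensiveSearcher.created)

for _ in range(100):
    search_view_bad("python")
print("instances after 100 bad requests:", ExpensiveSearcher.created)
```

In a FastCGI setup the module-level object lives as long as the worker process, so each worker pays the construction cost once.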
The "reuse" solution came up when I was testing whoosh with a simple server-client setup on localhost. Here is whoosh-client.py:
# -*- coding: utf-8 -*-
from socket import *
for i in range(0, 100):
	s = socket(AF_INET, SOCK_STREAM) # make socket
	s.connect(('localhost', 8889)) # make the connection
	s.send('popular search term')
	res = s.recv(102400)
	s.close()
	print i
	print res
	print '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'
The leaking server, whoosh-server-bad.py:
# -*- coding: utf-8 -*-
from socket import *

import simplejson as json

from whoosh import index, store, fields
from whoosh.qparser import QueryParser

# whoosh index data
WHOOSH_INDEX = '/path/to/whoosh/index/'
WHOOSH_SCHEMA = fields.Schema(title=fields.TEXT(stored=True),
				content=fields.TEXT,
				slug=fields.ID(stored=True, unique=True))

# whoosh config
storage = store.FileStorage(WHOOSH_INDEX)
ix = index.Index(storage, schema=WHOOSH_SCHEMA)
parser = QueryParser("content", schema=ix.schema)


# server config
s = socket(AF_INET, SOCK_STREAM) # make socket
s.bind(('', 8889)) 
s.listen(5)

while 1:
	client,addr = s.accept() # accept the connection
	print 'Connection from ', addr
	while 1:
		# get the query term
		query = client.recv(1024)
		if not query: break
		
		# prepare query for searching
		query = query.replace('+', ' AND ').replace(' -', ' NOT ').replace(', ', ' AND ')
		hits = []
		try:
			
			qry = parser.parse(query)
		except Exception: # the query could not be parsed
			qry = None
		if qry is not None:
			# THIS WILL LEAK
			searcher = ix.searcher()
			hits = searcher.search(qry)
		
		# prepare results for sending
		if len(hits) > 0:
			res = []
			for i in hits:
				res.append({'slug': i['slug'], 'title': i['title']})
			hits =  json.dumps(res)
		else:
			hits = '||'
		client.send(hits) # send data to client
	client.close()
And the non-leaking whoosh-server.py, with "ix.searcher()" moved out of the while loop:
# -*- coding: utf-8 -*-
from socket import *

import simplejson as json

from whoosh import index, store, fields
from whoosh.qparser import QueryParser

# whoosh index data
WHOOSH_INDEX = '/path/to/whoosh/index/'
WHOOSH_SCHEMA = fields.Schema(title=fields.TEXT(stored=True),
				content=fields.TEXT,
				slug=fields.ID(stored=True, unique=True))

# whoosh config
storage = store.FileStorage(WHOOSH_INDEX)
ix = index.Index(storage, schema=WHOOSH_SCHEMA)
searcher = ix.searcher()
parser = QueryParser("content", schema=ix.schema)

# server config
s = socket(AF_INET, SOCK_STREAM) # make socket
s.bind(('', 8889))
s.listen(5)

while 1:
	client,addr = s.accept() # accept the connection
	print 'Connection from ', addr
	while 1:
		# get the query term
		query = client.recv(1024)
		if not query: break
		
		# prepare query for searching
		query = query.replace('+', ' AND ').replace(' -', ' NOT ').replace(', ', ' AND ')
		hits = []
		try:
			qry = parser.parse(query)
		except Exception: # the query could not be parsed
			qry = None
		if qry is not None:
			hits = searcher.search(qry)
		
		# prepare results for sending
		if len(hits) > 0:
			res = []
			for i in hits:
				res.append({'slug': i['slug'], 'title': i['title']})
			hits =  json.dumps(res)
		else:
			hits = '||'
		client.send(hits) # send data to client
	client.close()
Start the bad server in one terminal, use ps -aux in a second terminal to check its memory usage, and launch whoosh-client in a third. The memory used by the server should increase with every client run. With the good server, memory usage won't go up.
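The same kind of check can be done from inside Python instead of watching ps. The sketch below is an assumption-laden illustration, not what I ran in 2009: it uses the tracemalloc module from the Python 3 standard library (the scripts above are Python 2), and leaky_search / clean_search are hypothetical functions that simulate a call that keeps memory alive and one that releases it.

```python
import tracemalloc

_leaked = []  # simulates objects that are never released


def leaky_search(term):
    # hypothetical: every call keeps about 10 KB alive
    _leaked.append(bytearray(10 * 1024))
    return term.upper()


def clean_search(term):
    # allocates the same amount, but it is freed when the function returns
    scratch = bytearray(10 * 1024)
    return term.upper()


def memory_growth(func, calls=100):
    """Return how many traced bytes are still allocated after `calls` runs of func."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    for _ in range(calls):
        func("popular search term")
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


print("leaky growth in bytes:", memory_growth(leaky_search))
print("clean growth in bytes:", memory_growth(clean_search))
```

tracemalloc only sees allocations made through Python's allocator, so for a leak inside a C extension the process-level numbers from ps or memstat remain the more reliable check.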

Sitemaps

It looks like Django's built-in Sitemaps framework uses a lot of RAM when generating big sitemaps. This time I had a really big sitemap: about 9000 elements and a 1.7 MB sitemap.xml file. When I requested the sitemap after a server restart, memory usage jumped from 6 to 105 MB. Subsequent requests did not increase memory usage any further. The solution was to serve a static sitemap.xml file.
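One possible way to produce such a static file is to write the entries to disk one at a time, so the whole sitemap never has to sit in memory. This is only a sketch in Python 3 using the standard library: write_sitemap and job_urls are hypothetical names, and the example.com URL pattern is made up. In a real project the URLs would come from the database, and the script could run from cron or a management command.

```python
import os
import tempfile
from xml.sax.saxutils import escape


def write_sitemap(urls, path):
    """Stream <url> entries to `path` one by one; `urls` can be any iterable."""
    count = 0
    with open(path, "w", encoding="utf-8") as out:
        out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        out.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
        for url in urls:
            # escape &, < and > so the file stays valid XML
            out.write("  <url><loc>%s</loc></url>\n" % escape(url))
            count += 1
        out.write("</urlset>\n")
    return count


def job_urls(total):
    # hypothetical URL pattern, generated lazily
    for pk in range(1, total + 1):
        yield "http://example.com/job/%d/" % pk


# demo: write 9000 entries into a temporary directory
target = os.path.join(tempfile.mkdtemp(), "sitemap.xml")
written = write_sitemap(job_urls(9000), target)
print("wrote", written, "urls to", target)
```

The web server can then serve the generated file directly, so the Django process is not involved in sitemap requests at all.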

28 July 2009;