Django/Python app memory usage - examples
28 July 2009
Whoosh
After launching my new website, JobMaster.rk.edu.pl, which used some technologies I was working with for the first time, I ran into a few memory usage issues on the production server (Nginx, FastCGI). Whoosh, used for full-text search, was leaking memory on every search. The code was based on the arnebrodowski.de example. The problem was in the search view, on the line searcher = ix.searcher(), which leaked on every search (I used "memstat -v" to check memory usage). The solution was to move that line to settings.py so the object is not created for every request (it can be reused, as it doesn't depend on the search term). So in settings.py I placed all the "reusable" code:

from whoosh import index, store, fields
from whoosh.qparser import QueryParser

WHOOSH_INDEX = '/path/to/whoosh/index/'
WHOOSH_SCHEMA = fields.Schema(title=fields.TEXT(stored=True),
                              content=fields.TEXT,
                              slug=fields.ID(stored=True, unique=True))
storage = store.FileStorage(WHOOSH_INDEX)
ix = index.Index(storage, schema=WHOOSH_SCHEMA)
PARSER = QueryParser("content", schema=ix.schema)
# created once at startup and reused by every request
SEARCHER = ix.searcher()
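
The same idea in isolation, as a minimal sketch in Python 3. FakeSearcher is a hypothetical stand-in for the object returned by ix.searcher() (it is not part of Whoosh) and only counts how many instances were created. The first handler builds a searcher per request; the second reuses a module-level instance, the way SEARCHER in settings.py is reused.

```python
class FakeSearcher:
    """Hypothetical stand-in for an expensive-to-create searcher object."""
    created = 0  # how many instances have been created so far

    def __init__(self):
        FakeSearcher.created += 1

    def search(self, term):
        return ["result for %s" % term]


def search_per_request(term):
    # Creates a new searcher on every call (the pattern used in the leaking view).
    searcher = FakeSearcher()
    return searcher.search(term)


# Created once at import time, like SEARCHER in settings.py.
SHARED_SEARCHER = FakeSearcher()


def search_shared(term):
    # Reuses the single module-level searcher.
    return SHARED_SEARCHER.search(term)


def count_created(handler, requests=100):
    """Return how many searchers `handler` created while serving `requests` calls."""
    before = FakeSearcher.created
    for _ in range(requests):
        handler("popular search term")
    return FakeSearcher.created - before
```

Here count_created(search_per_request) returns 100 and count_created(search_shared) returns 0, so with the shared object there is nothing new to leak per request.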
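
The leak can be reproduced outside Django with a simple socket client and server. The client sends the same search term 100 times to a server on port 8889:

# -*- coding: utf-8 -*-
from socket import *

for i in range(0, 100):
    s = socket(AF_INET, SOCK_STREAM)  # make socket
    s.connect(('localhost', 8889))  # make the connection
    s.send('popular search term')
    res = s.recv(102400)
    s.close()
    print i
    print res
    print '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'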

The server below creates a new searcher for every query, so it leaks:

# -*- coding: utf-8 -*-
from socket import *
import simplejson as json
from whoosh import index, store, fields
from whoosh.qparser import QueryParser

# whoosh index data
WHOOSH_INDEX = '/path/to/whoosh/index/'
WHOOSH_SCHEMA = fields.Schema(title=fields.TEXT(stored=True),
                              content=fields.TEXT,
                              slug=fields.ID(stored=True, unique=True))

# whoosh config
storage = store.FileStorage(WHOOSH_INDEX)
ix = index.Index(storage, schema=WHOOSH_SCHEMA)
parser = QueryParser("content", schema=ix.schema)

# server config
s = socket(AF_INET, SOCK_STREAM)  # make socket
s.bind(('', 8889))
s.listen(5)

while 1:
    client, addr = s.accept()  # accept the connection
    print 'Connection from ', addr
    while 1:
        # get the query term
        query = client.recv(1024)
        if not query:
            break
        # prepare query for searching
        query = query.replace('+', ' AND ').replace(' -', ' NOT ').replace(', ', ' AND ')
        hits = []
        try:
            qry = parser.parse(query)
        except Exception:
            qry = None
        if qry is not None:
            # THIS WILL LEAK
            searcher = ix.searcher()
            hits = searcher.search(qry)
        # prepare results for sending
        if len(hits) > 0:
            res = []
            for i in hits:
                res.append({'slug': i['slug'], 'title': i['title']})
            hits = json.dumps(res)
        else:
            hits = '||'
        client.send(hits)  # send data to client
    client.close()

This version creates the searcher once, before the accept loop, and reuses it for every query:

# -*- coding: utf-8 -*-
from socket import *
import simplejson as json
from whoosh import index, store, fields
from whoosh.qparser import QueryParser

# whoosh index data
WHOOSH_INDEX = '/path/to/whoosh/index/'
WHOOSH_SCHEMA = fields.Schema(title=fields.TEXT(stored=True),
                              content=fields.TEXT,
                              slug=fields.ID(stored=True, unique=True))

# whoosh config
storage = store.FileStorage(WHOOSH_INDEX)
ix = index.Index(storage, schema=WHOOSH_SCHEMA)
searcher = ix.searcher()  # created once, reused for every query
parser = QueryParser("content", schema=ix.schema)

# server config
s = socket(AF_INET, SOCK_STREAM)  # make socket
s.bind(('', 8889))
s.listen(5)

while 1:
    client, addr = s.accept()  # accept the connection
    print 'Connection from ', addr
    while 1:
        # get the query term
        query = client.recv(1024)
        if not query:
            break
        # prepare query for searching
        query = query.replace('+', ' AND ').replace(' -', ' NOT ').replace(', ', ' AND ')
        hits = []
        try:
            qry = parser.parse(query)
        except Exception:
            qry = None
        if qry is not None:
            hits = searcher.search(qry)
        # prepare results for sending
        if len(hits) > 0:
            res = []
            for i in hits:
                res.append({'slug': i['slug'], 'title': i['title']})
            hits = json.dumps(res)
        else:
            hits = '||'
        client.send(hits)  # send data to client
    client.close()
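
"memstat -v" looks at the process from the outside. A rough in-process check is also possible. The sketch below is Python 3 and uses the standard tracemalloc module, which only sees memory allocated through Python's allocators. The names leaky_search, clean_search and measure_growth are hypothetical, and the leak is simulated by appending to a module-level list; it is not the Whoosh leak itself.

```python
import tracemalloc

_leaked = []  # simulated leak: references that are never released


def leaky_search(term):
    # Keeps a new 10 kB buffer alive on every call.
    _leaked.append(bytearray(10000))
    return term.upper()


def clean_search(term):
    # Allocates the same buffer but lets it be freed when the call returns.
    buf = bytearray(10000)
    return term.upper()


def measure_growth(func, calls=100):
    """Return the change in traced memory (bytes) after `calls` calls to func."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    for _ in range(calls):
        func("popular search term")
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before
```

Comparing measure_growth(leaky_search) with measure_growth(clean_search) should show the first growing by roughly the size of the retained buffers while the second stays close to zero.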
Sitemaps
It looks like Django's built-in sitemaps framework uses a lot of RAM when generating big sitemaps. This time I had a really big sitemap: about 9000 elements and a 1.7 MB sitemap.xml file. When I requested the sitemap after a server restart, memory usage jumped from 6 to 105 MB. Subsequent requests did not increase memory usage any further. The solution was to serve a static sitemap.xml file.
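
One way to produce such a static file, as a minimal sketch: write the XML once (for example from a scheduled job) and let Nginx serve it. The function write_static_sitemap and any URLs passed to it are hypothetical; in a Django project the URLs would come from the models. This is Python 3 using only the standard library, and it writes entries one at a time instead of building the whole document in memory.

```python
from xml.sax.saxutils import escape


def write_static_sitemap(urls, path):
    """Write a sitemaps.org-style sitemap.xml to `path`; return the entry count."""
    count = 0
    with open(path, "w", encoding="utf-8") as out:
        out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        out.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
        for url in urls:
            # Escape &, < and > so the file stays valid XML.
            out.write("  <url><loc>%s</loc></url>\n" % escape(url))
            count += 1
        out.write("</urlset>\n")
    return count
```

Because urls can be any iterable, including a generator, the 9000 entries never have to sit in memory at the same time.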