RkBlog

Hardware, programming and astronomy tutorials and reviews.

Django and Lupy

Lupy is a full-text search engine written in Python, a port of the Jakarta Lucene search engine. The project is retired. A description can be found on the old project page, and you can download Lupy from a Gentoo mirror. The /examples directory contains some useful example code. Here I'll show the basics of indexing and searching data that is stored in a database.

Creating Search Index

Place this code in a view:
# import
from lupy.indexer import Index
# create an index named "foobar"; create=True overwrites an existing index
index = Index('foobar', create=True)

# get all data
pages = Page.objects.all()
for p in pages:
	# index every page
	index.index(text=p.text, __title=p.title, _slug=p.slug)
index.optimize()
For a model:
class Page(models.Model):
	title = models.CharField(maxlength=255) # page real title (for title tag and h1 in templates)
	slug = models.SlugField(maxlength=255, unique=True) # the wiki URL "title"
	description = models.CharField(maxlength=255) # short description (meta description, some link generation)
	text = models.TextField() # the page text
	changes = models.CharField(maxlength=255) # description of changes, no blanks!
	creation_date = models.DateTimeField(auto_now_add = True)
	modification_date = models.DateTimeField(auto_now = True)
	modification_user = models.CharField(maxlength=30)
	modification_ip = models.CharField(maxlength=20, blank=True)
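
The indexing loop above can also be pulled out of the view into a plain function, which makes it easier to call from a script and to try out without a real index. The sketch below is only an illustration: page_fields, rebuild_index, RecordingIndex and FakePage are hypothetical names that are not part of Lupy or the wiki code, and RecordingIndex merely records calls so the snippet runs without Lupy installed.

```python
from collections import namedtuple


def page_fields(page):
    """Build the keyword arguments for index.index() from one page."""
    return {
        'text': page.text,      # the searched text
        '__title': page.title,  # read back from a hit as h.get('title')
        '_slug': page.slug,     # read back from a hit as h.get('slug')
    }


def rebuild_index(index, pages):
    """Index every page, optimize once at the end, return the page count."""
    count = 0
    for page in pages:
        index.index(**page_fields(page))
        count += 1
    index.optimize()
    return count


class RecordingIndex(object):
    """Hypothetical stand-in for lupy.indexer.Index; it only records calls."""

    def __init__(self):
        self.documents = []
        self.optimize_calls = 0

    def index(self, **fields):
        self.documents.append(fields)

    def optimize(self):
        self.optimize_calls += 1


# Hypothetical stand-in for rows of the Page model.
FakePage = namedtuple('FakePage', ['title', 'slug', 'text'])

fake_index = RecordingIndex()
indexed = rebuild_index(fake_index, [
    FakePage('Home', 'home', 'welcome to the wiki'),
    FakePage('Python', 'python', 'python tutorials'),
])
```

With Lupy installed, the equivalent real call would presumably be rebuild_index(Index('foobar', create=True), Page.objects.all()).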

Simple Search

from lupy.indexer import Index
index = Index('foobar', create=False)
# search term - python
hits = index.find('python')
for h in hits:
	# slug is a Lupy search index field which we added when indexing
	print 'Found in ', h.get('slug')


Real Usage

I've added this code to the wiki "add page" view, after the data is validated and saved:
if settings.WIKI_SEARCH_WITH_LUPY:
	from lupy.indexer import Index
	from os.path import isdir
	if isdir('diamandaSearchCache'):
		index = Index('diamandaSearchCache', create=False)
	else:
		index = Index('diamandaSearchCache', create=True)
	index.index(
		text=page_data['text'].decode("utf-8"),
		__title=page_data['title'].decode("utf-8"),
		__description=page_data['description'].decode("utf-8"),
		_slug=page_data['slug']
	)
	index.optimize()
Here diamandaSearchCache is the name of the Lupy index directory, and WIKI_SEARCH_WITH_LUPY is a True/False setting in settings.py that switches Lupy on or off. This code indexes the newly added page.
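
The open-or-create check in the view can be folded into a small helper. This is a minimal sketch under assumptions: open_index and fake_factory are hypothetical names, and the factory is only assumed to accept (path, create=...) the same way Index is called in the code above. The demonstration uses a temporary directory and a stand-in factory so it runs without Lupy.

```python
import os
import shutil
import tempfile


def open_index(path, index_factory):
    """Open the index at path, creating it only if the directory is missing."""
    return index_factory(path, create=not os.path.isdir(path))


def fake_factory(path, create):
    """Hypothetical stand-in for lupy.indexer.Index; reports the flag it got."""
    return {'path': path, 'create': create}


# Demonstration with a temporary directory instead of a real index.
workdir = tempfile.mkdtemp()
try:
    existing = open_index(workdir, fake_factory)
    missing = open_index(os.path.join(workdir, 'no-such-index'), fake_factory)
finally:
    shutil.rmtree(workdir)
```

In the wiki view this would be used as index = open_index('diamandaSearchCache', Index).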

Searching is more advanced: it runs a boolean OR search over the words of the phrase. The code is part of the wiki search view:
if data.has_key('lupy'):
	from lupy.index.term import Term
	from lupy.search.indexsearcher import IndexSearcher
	from lupy.search.term import TermQuery
	from lupy.search.boolean import BooleanQuery
	
	index =  IndexSearcher('diamandaSearchCache')
	query = data['string'].split(' ')
	q = BooleanQuery()
	# how many words are in the phrase?
	if len(query) > 1:
		for a in query:
			t = Term('text', a.decode("utf-8"))
			tq = TermQuery(t)
			q.add(tq, False, False)
	# one word
	else:
		t = Term('text', query[0].decode("utf-8"))
		tq = TermQuery(t)
		q.add(tq, True, False)
	hits = index.search(q)
	pages = []
	for h in hits:
		pages.append({'title': h.get('title'),'description': h.get('description'),'slug': h.get('slug')})
	# to get from best to worst order
	pages.reverse()
	return render_to_response('wiki/' + settings.ENGINE + '/search.html', {'pages': pages, 'lupy': lupy, 'string': data['string'], 'google': google, 'lupyuse': True, 'theme': settings.THEME, 'engine': settings.ENGINE})
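
To make the boolean OR idea concrete, here is a small pure-Python sketch that does not use Lupy at all. It builds a toy inverted index and returns every document that contains at least one word of the phrase. The function names and sample documents are made up for illustration, and the ranking (number of matched words) is a deliberate simplification, not the scoring Lupy applies.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercase word to the set of slugs whose text contains it."""
    inverted = defaultdict(set)
    for slug, text in documents.items():
        for word in text.lower().split():
            inverted[word].add(slug)
    return inverted


def or_search(inverted, phrase):
    """Return slugs matching ANY word of the phrase, best match first.

    The score is the number of distinct query words a document contains;
    ties are broken alphabetically by slug.
    """
    scores = defaultdict(int)
    for word in set(phrase.lower().split()):
        for slug in inverted.get(word, ()):
            scores[slug] += 1
    return sorted(scores, key=lambda slug: (-scores[slug], slug))


# Made-up sample data: slug -> page text.
documents = {
    'django-intro': 'django is a python web framework',
    'lupy-search': 'lupy is a python search engine',
    'astronomy': 'observing the moon with a telescope',
}
inverted = build_inverted_index(documents)
results = or_search(inverted, 'python search')
```

Here results lists lupy-search first, because it matches both words, followed by django-intro, which matches only one.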
The search form has a submit button named lupy:
<input type="submit" value="{% trans "boolean OR search" %}" name="lupy" />
The name indicates which search (LIKE, Google or Lupy) the user chose. Results are shown in the template:
{% if lupyuse %}
	{% for page in pages %}
	<img src="/site_media/wiki/img/2.png" alt="" /> <a href="/wiki/page/{{ page.slug }}/">{{ page.title }}</a> - {{ page.description }}<br />
	{% endfor %}
{% endif %}
Python programming, 14 July 2008, Piotr Maliński