Django and Lupy
Check out the new site at https://rkblog.dev.
14 July 2008
Lupy is a full-text search engine written in Python, a port of the Jakarta Lucene search engine. The project is retired. A description can be found on the old project page, and you can download Lupy from a Gentoo mirror. The /examples directory contains some nice example code. Here I'll show the basics of indexing and searching data originally stored in the database.
Creating Search Index
Place this code in a view:
# import
from lupy.indexer import Index
# create an index named "foobar"; create=True overwrites an existing one
index = Index('foobar', create=True)
# get all data
pages = Page.objects.all()
for p in pages:
    # index every page
    index.index(text=p.text, __title=p.title, _slug=p.slug)
index.optimize()
The Page model used in this example:
from django.db import models

class Page(models.Model):
    title = models.CharField(maxlength=255) # page real title (for title tag and h1 in templates)
    slug = models.SlugField(maxlength=255, unique=True) # the wiki URL "title"
    description = models.CharField(maxlength=255) # short description (meta description, some link generation)
    text = models.TextField() # the page text
    changes = models.CharField(maxlength=255) # description of changes, no blanks!
    creation_date = models.DateTimeField(auto_now_add = True)
    modification_date = models.DateTimeField(auto_now = True)
    modification_user = models.CharField(maxlength=30)
    modification_ip = models.CharField(maxlength=20, blank=True)
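As a rough picture of what an index like this contains, the sketch below builds a tiny inverted index in plain Python 3. It is not Lupy's implementation: `tokenize`, `build_index`, and the sample pages are hypothetical names made up for illustration, and a real engine also stores positions, frequencies, and on-disk segments.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map every word to the set of slugs whose text contains it.

    `pages` is an iterable of (slug, text) pairs, standing in for
    Page.objects.all() in the view above.
    """
    index = defaultdict(set)
    for slug, text in pages:
        for word in tokenize(text):
            index[word].add(slug)
    return index


# Hypothetical sample data, not taken from the article.
sample_pages = [
    ("django-intro", "Django is a Python web framework"),
    ("lupy-notes", "Lupy is a Python port of Lucene"),
]
toy_index = build_index(sample_pages)
```

Looking up a word is then a dictionary access: `toy_index["python"]` holds both slugs, while `toy_index["lucene"]` holds only `lupy-notes`.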
Simple Search
from lupy.indexer import Index
index = Index('foobar', create=False)
# search term - python
hits = index.find('python')
for h in hits:
    # slug is a Lupy search index field which we added when indexing
    print 'Found in ', h.get('slug')
Real Usage
I've added this code to the wiki "add page" view, after the data is validated and saved:
if settings.WIKI_SEARCH_WITH_LUPY:
    from lupy.indexer import Index
    from os.path import isdir
    # reuse the existing index directory, or create it on first use
    if isdir('diamandaSearchCache'):
        index = Index('diamandaSearchCache', create=False)
    else:
        index = Index('diamandaSearchCache', create=True)
    index.index(
        text=page_data['text'].decode("utf-8"),
        __title=page_data['title'].decode("utf-8"),
        __description=page_data['description'].decode("utf-8"),
        _slug=page_data['slug'])
    index.optimize()
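The directory check above can be folded into a small helper. This is only a sketch in Python 3: `open_index` and `FakeIndex` are hypothetical names, and the helper takes the index class as a parameter so the pattern can be tried without Lupy installed. It relies only on the `Index(path, create=...)` call already used in this article.

```python
from os.path import isdir


def open_index(path, index_class):
    """Open the index at `path`, creating it only if the directory is missing.

    `index_class` is expected to accept (path, create=bool), like the
    Index constructor used above.
    """
    return index_class(path, create=not isdir(path))


class FakeIndex:
    """Stand-in for lupy.indexer.Index that just records its arguments."""

    def __init__(self, path, create):
        self.path = path
        self.create = create
```

In the view this would be called as `open_index('diamandaSearchCache', Index)`.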
Searching is more advanced: it runs a boolean OR search over each word in the phrase. The code is part of the wiki search view:
if data.has_key('lupy'):
    from lupy.index.term import Term
    from lupy.search.indexsearcher import IndexSearcher
    from lupy.search.term import TermQuery
    from lupy.search.boolean import BooleanQuery
    index = IndexSearcher('diamandaSearchCache')
    query = data['string'].split(' ')
    q = BooleanQuery()
    # how many words in the phrase?
    if len(query) > 1:
        # several words: each one is an optional clause (neither required nor prohibited)
        for a in query:
            t = Term('text', a.decode("utf-8"))
            tq = TermQuery(t)
            q.add(tq, False, False)
    else:
        # one word: add it as a required clause
        t = Term('text', query[0].decode("utf-8"))
        tq = TermQuery(t)
        q.add(tq, True, False)
    hits = index.search(q)
    pages = []
    for h in hits:
        pages.append({'title': h.get('title'), 'description': h.get('description'), 'slug': h.get('slug')})
    # to get from best to worst order
    pages.reverse()
    return render_to_response('wiki/' + settings.ENGINE + '/search.html', {'pages': pages, 'lupy': lupy, 'string': data['string'], 'google': google, 'lupyuse': True, 'theme': settings.THEME, 'engine': settings.ENGINE})
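The OR semantics used above can be pictured with a toy example: a page matches if it contains at least one of the query words, and pages matching more words rank higher. This is a plain Python 3 sketch under those assumptions, not Lupy's scoring; `or_search` and the sample index are hypothetical.

```python
from collections import Counter


def or_search(index, phrase):
    """Return slugs matching any word of `phrase`, best match first.

    `index` maps a lowercase word to the set of slugs containing it.
    The score is simply the number of distinct query words a page matches.
    """
    # split() without an argument drops the empty strings that
    # split(' ') would produce for repeated spaces.
    words = set(phrase.lower().split())
    scores = Counter()
    for word in words:
        for slug in index.get(word, ()):
            scores[slug] += 1
    # Sort by descending score, then by slug for a stable order.
    return sorted(scores, key=lambda slug: (-scores[slug], slug))


# Hypothetical index, not taken from the article.
toy_index = {
    "python": {"django-intro", "lupy-notes"},
    "lucene": {"lupy-notes"},
    "django": {"django-intro", "wiki-setup"},
}
results = or_search(toy_index, "python  lucene")
```

Here `results` lists `lupy-notes` first because it matches both words, followed by `django-intro`, which matches only one.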
In the search template, the form gets an extra submit button and the results are listed below it:
<input type="submit" value="{% trans "boolean OR search" %}" name="lupy" />
{% if lupyuse %}
    {% for page in pages %}
        <img src="/site_media/wiki/img/2.png" alt="" /> <a href="/wiki/page/{{ page.slug }}/">{{ page.title }}</a> - {{ page.description }}<br />
    {% endfor %}
{% endif %}