Parsing tags with django-content-bbcode in examples
Some time ago I released django-content-bbcode, a BBCode-like tag parser. In this article I'll show some usage examples for the parser, from simple search and replace to more complex tags that use the database to build the response.
Examples
I wrote a bit about creating tags for django-content-bbcode on GitHub, but let us start with a basic example. Let's say we want to turn this:
into a clickable URL. "anchor" is the tag name (rk:NAME), followed by one attribute, "href". Tags can also have a closing tag:
The inner content of such tags is parsed as well. django-content-bbcode parses both types of tags and passes the parsed data to the function that handles the given tag. The function gets a list of dictionaries, one for each occurrence of the tag with that name:
def anchor(occurrences, text):
    for occurrence in occurrences:
        href = occurrence['attributes']['href']
        text = text.replace(occurrence['tag'], '<a href="%s">link</a>' % href)
    return text
For this to work, place the code in a tags.py file in one of your Django applications (the one using or providing the given tag). You will also have to create a registered_tags dictionary, where the key is the tag name and the value is the callable that handles it. There is an example on GitHub.
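A minimal sketch of what such a tags.py could look like, assuming the registered_tags layout described above. The handler repeats the anchor function from this article; only the dictionary at the bottom is new.

```python
# tags.py (sketch): maps tag names to the functions that handle them.

def anchor(occurrences, text):
    """Replace every anchor tag occurrence with an HTML link."""
    for occurrence in occurrences:
        href = occurrence['attributes']['href']
        text = text.replace(occurrence['tag'], '<a href="%s">link</a>' % href)
    return text


# Key: the tag name as used in the text; value: the callable that handles it.
registered_tags = {'anchor': anchor}
```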
Our "anchor" function gets two arguments: a list of dictionaries (occurrences) and the text in which the tags were found. Every occurrence dictionary holds the "href" attribute under its "attributes" key. The special "tag" key holds the raw tag itself, so we can replace it in the text with any result we want. In this case we iterate over the list and replace every tag with some HTML code.
Tags with a closing tag also have their inner content under the "code" key.
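To make the data shape concrete, here is a sketch that calls a handler by hand with a hand-built occurrence dictionary. The "tag", "attributes" and "code" keys are the ones described above; the `quote` handler, its "author" attribute and the exact raw tag syntax are made up for illustration. In a real project the parser builds these dictionaries for you.

```python
def quote(occurrences, text):
    """Hypothetical handler for a tag that has a closing tag."""
    for occurrence in occurrences:
        author = occurrence['attributes'].get('author', 'unknown')
        # 'code' holds the content found between the opening and closing tag.
        html = '<blockquote>%s <cite>%s</cite></blockquote>' % (
            occurrence['code'], author)
        text = text.replace(occurrence['tag'], html)
    return text


# Assumed raw tag syntax, used only to have something to replace.
raw = '[rk:quote author="Ann"]Hello[/rk:quote]'
occurrences = [{'tag': raw, 'attributes': {'author': 'Ann'}, 'code': 'Hello'}]
result = quote(occurrences, 'Intro. ' + raw + ' Outro.')
```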
Linkable headlines
Let's say we want linkable headlines in an article, so that it's possible to link directly to a given headline:
<a name="1" title="Linkable headlines"></a>
<h3><a href="#1">Linkable headlines</a></h3>
def h(occurrences, text):
    for number, tag in enumerate(occurrences):
        tag_number = number + 1
        result = ('<a name="' + str(tag_number) + '" title="' + tag['code'] + '"></a>'
                  '<h' + tag['attributes']['id'] + '><a href="#' + str(tag_number) + '">' + tag['code'] + '</a></h' +
                  tag['attributes']['id'] + '>')
        text = text.replace(tag['tag'], result)
    return text
We replace every tag occurrence with the HTML we generated. Every occurrence is numbered to give each "a" element a unique name. The "id" attribute determines the headline size: h1 or smaller. This example is poorly styled, though: it builds HTML by string concatenation and mixes markup with logic. A better implementation renders a template:
from django.template.loader import render_to_string

def h(occurrences, text):
    for number, tag in enumerate(occurrences):
        tag['tag_number'] = number + 1
        result = render_to_string('tags/headline.html', tag)
        text = text.replace(tag['tag'], result)
    return text
<a name="{{ tag_number }}" title="{{ code }}"></a>
<h{{ attributes.id }}><a href="#{{ tag_number }}">{{ code }}</a></h{{ attributes.id }}>
Simple database operations
We can also implement more complex tags that use the database or another source to render the result. As a basic example, here is a list of the most recently registered users:
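from django.contrib.auth.models import User
from django.template.loader import render_to_string

def noobs(occurrences, text):
    # Render the list once and reuse it for every occurrence of the tag.
    noob_users = User.objects.all().order_by('-date_joined')[:5]
    response = render_to_string('tags/noobs.html', {'users': noob_users})
    for occurrence in occurrences:
        text = text.replace(occurrence['tag'], response)
    return text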
With a template:
<ul>
{% for user in users %}
<li>{{ user.username }}</li>
{% endfor %}
</ul>
When creating tags that use the database, it's good to think about caching, either at the tag function level or in the view or template in which the slow-rendering tag appears.
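Caching at the tag function level could be sketched like this. It assumes a cache object with get(key) returning None on a miss and set(key, value, timeout), which is the shape of Django's django.core.cache.cache. The names cached_fragment, make_noobs and DictCache and the 'tags:noobs' key are hypothetical, and DictCache exists only so the sketch can be tried without Django.

```python
def cached_fragment(cache, key, render, timeout=300):
    """Return the cached HTML for `key`, rendering and storing it on a miss."""
    html = cache.get(key)
    if html is None:
        html = render()
        cache.set(key, html, timeout)
    return html


def make_noobs(cache, render_users):
    """Build a tag handler that reuses the cached user list."""
    def noobs(occurrences, text):
        if not occurrences:
            return text
        response = cached_fragment(cache, 'tags:noobs', render_users)
        for occurrence in occurrences:
            text = text.replace(occurrence['tag'], response)
        return text
    return noobs


class DictCache(object):
    """In-memory stand-in with the same get/set shape (no expiry)."""

    def __init__(self):
        self.data = {}

    def get(self, key, default=None):
        return self.data.get(key, default)

    def set(self, key, value, timeout=None):
        self.data[key] = value  # the timeout is ignored in this stand-in
```

In a Django project, render_users would wrap the query and the render_to_string call from the noobs function, and the cache argument would be django.core.cache.cache.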
Data such as a list of the latest registered users or the latest news could be implemented in views, or in templates with the help of a function from TEMPLATE_CONTEXT_PROCESSORS. The tag-based solution lets you create and modify the content and structure of a page as you like, without changing the code. It's not always needed or desired, but on wiki-like pages, for example, being able to structure content without coding is essential. It can also be handy for static page generators that output formatted static pages for use on GitHub Pages and the like.
We can also run into a problem where many tags on one page execute many queries. Take, for example, a tag that inserts a link to and description of an article given by its slug: many tags mean many slug queries. We can solve this by fetching all the articles before iterating over the occurrences:
def art(occurrences, text):
    from articles.models import Article
    # Collect every slug first so that a single query fetches all articles.
    slugs = []
    for i in occurrences:
        slugs.append(i['attributes']['slug'])
    pages = Article.objects.filter(slug__in=slugs).select_related('site')
    for i in pages:
        text = text.replace('[ rk:art slug="' + i.slug + '" ]',
                            '<li><a href="%s">%s</a> - %s</li>' % (i.get_absolute_url(), i.title, i.short_description))
    return text
We gather all the slugs and then make one query using the "IN" operator. It's also worth checking whether the code needs related objects that should be added to select_related. "IN" queries work well as long as the list isn't very big. Also, pass a flat list of IDs rather than a queryset, as a queryset may end up as a monster query with subselects executed multiple times. Note too that this implementation builds HTML in code, which could be done better ;)
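One possible variant, sketched here under assumptions, replaces each occurrence's own raw tag (the "tag" key) instead of rebuilding the tag string, so it does not depend on the exact spacing inside the tag. The helper name replace_articles and the slug to (url, title, description) mapping are hypothetical; in Django the mapping could be built from the single slug__in query.

```python
def replace_articles(occurrences, text, articles_by_slug):
    """Replace each tag with a list item built from a pre-fetched article.

    `articles_by_slug` maps slug -> (url, title, description).
    """
    for occurrence in occurrences:
        slug = occurrence['attributes']['slug']
        article = articles_by_slug.get(slug)
        if article is None:
            continue  # unknown slug: leave the raw tag in the text
        url, title, description = article
        html = '<li><a href="%s">%s</a> - %s</li>' % (url, title, description)
        text = text.replace(occurrence['tag'], html)
    return text
```

Looking articles up in a dictionary keeps the number of queries at one however many tags the page contains.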
Syntax highlighting
We can also use external packages such as Pygments for syntax highlighting or Pillow for thumbnail creation. For example, code highlighting could look like this:
from pygments import highlight
from pygments.lexers import get_lexer_by_name
from pygments.formatters import HtmlFormatter

def syntax(occurrences, text):
    pygments_formatter = HtmlFormatter()
    langs = {}
    for i in occurrences:
        language = i['attributes'].get('lang', 'text')
        lexer = get_lexer_by_name(language)
        parsed = highlight(i['code'], lexer, pygments_formatter)
        text = text.replace(i['tag'], parsed)
        # css styles for given lang
        langs['<style>%s</style>' % pygments_formatter.get_style_defs()] = True
    # add CSS to the text
    styles = ''
    for style in langs.keys():
        styles = styles + style
    text = '%s%s' % (text, styles)
    return text
Aside from the hackish CSS injection (the styles could also be added to the site CSS globally), the function looks similar to the others. The "lang" attribute determines the language for highlighting, and the "code" key holds the code to be highlighted.
You can also change the function's implementation, for example swap the highlighting library, without having to change every text in which the tag is used.