Parsing tags with django-content-bbcode in examples

Check out the new site at https://rkblog.dev.

13 April 2014

Some time ago I released django-content-bbcode, a BBCode-like tag parser. In this article I'll show some examples of using such a parser, from simple search and replace to more complex tags that query the database to build the response.

Examples

I've written a bit about creating tags for django-content-bbcode on GitHub. But let us start with a basic example. Let's say we want to turn this:

[rk:anchor href="http://www.google.pl"]

into a clickable link. "anchor" is the tag name (rk:NAME), followed by one attribute, "href". Tags can also have a closing tag:

[rk:anchor href="http://www.google.pl"]click me![/rk:anchor]

For such tags the inner content is parsed as well. django-content-bbcode parses both types of tags and passes the parsed data, as dictionaries, to the function that handles the given tag. The function gets a list of dictionaries, one for every occurrence of the tag with that name:

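def anchor(occurrences, text):
    # occurrences: one dictionary per [rk:anchor ...] tag found in the text
    for occurrence in occurrences:
        href = occurrence['attributes']['href']
        # occurrence['tag'] is the raw tag text, so it can be swapped for HTML
        text = text.replace(occurrence['tag'], '<a href="%s">link</a>' % href)
    return text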

For it to work, place the code in a tags.py file in one of your Django applications (the one that uses or provides the tag). You also have to create a registered_tags dictionary, where the key is the tag name and the value is the callable that handles it. There is an example on GitHub.

Our "anchor" function gets two arguments: a list of dictionaries (occurrences) and the text in which those tags were found. Every occurrence dictionary has the "href" attribute under its "attributes" key. Under the special key "tag" there is the raw tag itself, so we can replace it in the text with any result we want. Here we iterate over the list and replace every tag with some HTML code.

Tags with a closing tag also have the inner content under the "code" key.
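Putting these pieces together, a minimal tags.py could look roughly like the sketch below. The registered_tags dictionary follows the description above; the rest is an assumption: the occurrence dictionaries are taken to carry only the keys mentioned here ("tag", "attributes" and, for closed tags, "code"), and for a closed tag the raw "tag" value is assumed to span everything up to the closing [/rk:anchor]. Using the inner content as the link text is a variation for this sketch, not the article's handler.

```python
# tags.py (sketch): placed in a Django application, as described above.


def anchor(occurrences, text):
    """Handle every [rk:anchor href="..."] occurrence, with or without a closing tag."""
    for occurrence in occurrences:
        href = occurrence['attributes']['href']
        # Assumption: closed tags carry their inner content under "code";
        # self-closing ones may not, so fall back to a generic label.
        label = occurrence.get('code') or 'link'
        text = text.replace(occurrence['tag'], '<a href="%s">%s</a>' % (href, label))
    return text


# Tag name (as in rk:NAME) -> callable that handles all of its occurrences.
registered_tags = {
    'anchor': anchor,
}
```

Fed a hand-built occurrence for [rk:anchor href="https://rkblog.dev"]click me![/rk:anchor] with "code" set to "click me!", the handler returns <a href="https://rkblog.dev">click me!</a>, while the self-closing form still becomes a plain "link" anchor.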

Linkable headlines

Let's say we want linkable headlines in an article, so that it's possible to link directly to a given headline:

<a name="1" title="Linkable headlines"></a>
<h3><a href="#1">Linkable headlines</a></h3>

It can be done with a tag like this one:

[rk:h id="3"]Linkable headlines[/rk:h]

A basic handler would look like this:

def h(occurrences, text):
    for number, tag in enumerate(occurrences):
        # number occurrences from 1 to get unique anchor names
        tag_number = number + 1
        # the "id" attribute is used as the headline level: h1, h2, ...
        result = ('<a name="' + str(tag_number) + '" title="' + tag['code'] + '"></a>'
                  '<h' + tag['attributes']['id'] + '><a href="#' + str(tag_number) + '">' + tag['code'] + '</a></h' +
                  tag['attributes']['id'] + '>')
        text = text.replace(tag['tag'], result)
    return text

We replace every tag occurrence with the HTML code we generated. Every occurrence is numbered to provide a unique anchor ("A" tag) name. The "id" attribute determines the headline level: h1 or smaller. This example has bad coding style: it builds HTML in Python code and mixes tag handling with presentation. A better implementation would look like this:

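from django.template.loader import render_to_string


def h(occurrences, text):
    for number, tag in enumerate(occurrences):
        # extra key for the template; "code" and "attributes" come from the parser
        tag['tag_number'] = number + 1
        # the occurrence dictionary itself serves as the template context
        result = render_to_string('tags/headline.html', tag)
        text = text.replace(tag['tag'], result)
    return text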

It uses a Django template (tags/headline.html) to render the response:

<a name="{{ tag_number }}" title="{{ code }}"></a>
<h{{ attributes.id }}><a href="#{{ tag_number }}">{{ code }}</a></h{{ attributes.id }}>

No HTML is generated in Python code, and tag handling is separated from response rendering.
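Django templates escape {{ code }} automatically, while the string-concatenation version above inserts the headline text as-is. To try the handler outside a Django project, a rough stand-in using only the standard library (html.escape plus str.format in place of a Django template) could look like the sketch below. Two details are additions for this sketch, not the article's: the level taken from "id" is clamped to h1..h6, and each replacement is limited to one match (count=1) so identical headline tags get separate numbers, assuming the parser reports every repeated tag as its own occurrence.

```python
from html import escape

# Stand-in for tags/headline.html; nothing here escapes automatically,
# so escape() is called explicitly on the headline text.
HEADLINE = ('<a name="{number}" title="{title}"></a>'
            '<h{level}><a href="#{number}">{title}</a></h{level}>')


def h(occurrences, text):
    for number, tag in enumerate(occurrences, start=1):
        # int() raises ValueError for a non-numeric id; clamp to h1..h6
        level = min(max(int(tag['attributes']['id']), 1), 6)
        result = HEADLINE.format(
            number=number,
            title=escape(tag['code']),  # escapes <, >, & and quotes
            level=level,
        )
        # count=1: identical raw tags are replaced one at a time
        text = text.replace(tag['tag'], result, 1)
    return text
```

With the template-based version you get the escaping for free; the sketch only shows what would have to be done by hand otherwise.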

Simple database operations

We can also implement more complex tags that use the database or other sources to render the result. As a basic example, let's implement a list of the latest registered users:

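from django.contrib.auth.models import User
from django.template.loader import render_to_string


def noobs(occurrences, text):
    # one query and one render, no matter how many times the tag is used
    noob_users = User.objects.order_by('-date_joined')[:5]
    response = render_to_string('tags/noobs.html', {'users': noob_users})
    for occurrence in occurrences:
        text = text.replace(occurrence['tag'], response)
    return text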

With a template (tags/noobs.html):

<ul>
{% for user in users %}
    <li>{{ user.username }}</li>
{% endfor %}
</ul>

When creating tags that use the database, it's good to think about caching, either at the tag function level or in the view or template in which the slow-rendering tag is used.
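In a Django project the cache framework (for example django.core.cache.cache.get_or_set, or the {% cache %} template tag) is the natural tool for this. As a dependency-free illustration of caching at the tag function level, here is a hypothetical ttl_cached decorator that keeps the rendered "latest users" fragment in process memory for a number of seconds. render_noobs is a placeholder for the query plus render_to_string() call; the decorator only suits argument-less functions and is not thread-safe.

```python
import time


def ttl_cached(seconds):
    """Hypothetical decorator: reuse an argument-less function's result for `seconds`."""
    def decorator(func):
        state = {'value': None, 'expires': None}

        def wrapper():
            now = time.monotonic()
            if state['expires'] is None or now >= state['expires']:
                state['value'] = func()  # the slow part runs only here
                state['expires'] = now + seconds
            return state['value']
        return wrapper
    return decorator


@ttl_cached(60)
def render_noobs():
    # Placeholder: the real tag would run the User query and render_to_string().
    return '<ul><li>new_user</li></ul>'


def noobs(occurrences, text):
    response = render_noobs()  # reused for up to 60 seconds in this process
    for occurrence in occurrences:
        text = text.replace(occurrence['tag'], response)
    return text
```

A per-process cache like this is lost on restart and not shared between workers, which is why a shared cache backend is usually the better fit in production.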

Data such as a list of the latest registered users or the latest news could also be implemented in views, or in templates with the help of a function from TEMPLATE_CONTEXT_PROCESSORS. The tag-based solution lets you create and modify the content and structure of a page as you wish, without changing any code. It's not always needed or desired, but on wiki-like pages, for example, it's quite essential to be able to build content structure without coding it. It can also be handy for static site generators that output formatted static pages for GitHub Pages and similar hosts.

We can also run into a problem where many tags on one page execute many queries. Take, for example, a tag that inserts a link and description of an article given by its slug: many tags mean many slug queries. We can solve it by fetching all articles before iterating over the occurrences:

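def art(occurrences, text):
    from articles.models import Article
    # collect every slug first so all articles are fetched in one query
    slugs = [occurrence['attributes']['slug'] for occurrence in occurrences]
    pages = Article.objects.filter(slug__in=slugs).select_related('site')
    pages_by_slug = {page.slug: page for page in pages}
    for occurrence in occurrences:
        page = pages_by_slug.get(occurrence['attributes']['slug'])
        if page is None:
            continue  # unknown slug: leave the tag untouched
        html = '<li><a href="%s">%s</a> - %s</li>' % (
            page.get_absolute_url(), page.title, page.short_description)
        # replace the raw tag exactly as the parser found it
        text = text.replace(occurrence['tag'], html)
    return text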

We gather all slugs and then make one query using the "IN" operator. It's also worth checking whether the code needs related objects that should be pulled in with select_related. "IN" queries are helpful as long as the list isn't very big. Also, pass a flat list of values rather than a queryset, as a queryset may end up as a monster query with subselects executed multiple times. Note that this implementation still generates HTML in code, which could be done better ;)
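To make the difference concrete without a database, here is a small sketch that counts lookups. CountingArticleStore is a hypothetical in-memory stand-in for Article.objects: get() plays the role of one query per tag, filter_in() the role of a single filter(slug__in=...) query. It only mirrors the query pattern, not the ORM itself.

```python
class CountingArticleStore:
    """Hypothetical in-memory stand-in for Article.objects that counts queries."""

    def __init__(self, titles_by_slug):
        self.titles_by_slug = titles_by_slug
        self.queries = 0

    def get(self, slug):
        # Like Article.objects.get(slug=slug), but returns None instead of raising.
        self.queries += 1
        return self.titles_by_slug.get(slug)

    def filter_in(self, slugs):
        # Like Article.objects.filter(slug__in=slugs): one query for all slugs.
        self.queries += 1
        return {slug: self.titles_by_slug[slug]
                for slug in slugs if slug in self.titles_by_slug}


def titles_one_by_one(occurrences, store):
    # N tags -> N queries
    return [store.get(o['attributes']['slug']) for o in occurrences]


def titles_batched(occurrences, store):
    # N tags -> 1 query; duplicate slugs are collapsed by the set
    found = store.filter_in({o['attributes']['slug'] for o in occurrences})
    return [found.get(o['attributes']['slug']) for o in occurrences]
```

For four tags (two of them with the same slug) the first function issues four queries and the second only one, with identical results.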

Syntax highlighting

We can also use external packages, like Pygments for syntax highlighting or Pillow for thumbnail creation. For example, code highlighting would look like this:

from pygments import highlight
from pygments.lexers import get_lexer_by_name
from pygments.formatters import HtmlFormatter
from pygments.util import ClassNotFound


def syntax(occurrences, text):
    pygments_formatter = HtmlFormatter()
    for i in occurrences:
        # the "lang" attribute picks the lexer; default to plain text
        language = i['attributes'].get('lang', 'text')
        try:
            lexer = get_lexer_by_name(language)
        except ClassNotFound:
            # unknown language name: fall back to no highlighting
            lexer = get_lexer_by_name('text')
        parsed = highlight(i['code'], lexer, pygments_formatter)
        text = text.replace(i['tag'], parsed)

    # add the CSS once; it depends on the formatter style, not on the language
    if occurrences:
        text = '%s<style>%s</style>' % (text, pygments_formatter.get_style_defs())
    return text

Aside from the hackish CSS injection (the styles could also be added to the site CSS globally), the function looks similar to the others. The "lang" attribute determines the language used for highlighting, and the "code" key holds the code to be highlighted.

You can also change the function implementation, for example switch to a different highlighting library, without having to change every text in which the tag is used.
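As a sketch of such a swap, here is a hypothetical dependency-free replacement for the Pygments handler that only escapes the code into a <pre> block (no colours). It assumes the tag is registered under the name "syntax" and that occurrences have the same "tag", "attributes" and "code" keys as before; the language-NAME class name is an arbitrary choice.

```python
from html import escape


def syntax(occurrences, text):
    """Hypothetical fallback: no highlighting, just escaped code in <pre>."""
    for occurrence in occurrences:
        language = occurrence['attributes'].get('lang', 'text')
        block = '<pre class="language-%s"><code>%s</code></pre>' % (
            escape(language), escape(occurrence['code']))
        text = text.replace(occurrence['tag'], block)
    return text


# Same tag name as before, so texts that already use the tag stay unchanged.
registered_tags = {'syntax': syntax}
```

Only the handler changes; every [rk:syntax lang="..."] tag already stored in articles keeps working.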

RkBlog

Django web framework tutorials, 13 April 2014
