Searching multimedia sites in Python using public APIs

Check out the new site at https://rkblog.dev.

9 October 2009

YouTube

YouTube has an extensive API documented on code.google.com. Its most interesting part is the GData API, which lets us manage video clips. Some of its methods don't require any authentication or keys. For example, to search for videos you can use this feed: http://gdata.youtube.com/feeds/api/videos?q=TERM&v=2. In Python this feed can easily be parsed with feedparser:

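import feedparser

YURL = 'http://gdata.youtube.com/feeds/api/videos?q=%s&v=2'

# build the search feed URL for the term "python"
rssurl = YURL % 'python'
# fetch and parse the Atom feed
data = feedparser.parse(rssurl)
# every entry in the feed is one video clip
for i in data['entries']:
	title = i['title']
	print title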
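For readers on Python 3 without feedparser, here is a minimal sketch of the same search using only the standard library. urlencode builds the query string, so multi-word terms are escaped properly, and ElementTree reads the entry titles from the Atom feed, which is the job feedparser does in the snippet above. It assumes the response is a plain Atom document in the http://www.w3.org/2005/Atom namespace. The helper names youtube_search_url and entry_titles are my own, and the download step (for example urllib.request.urlopen) is left out so the parsing can be tried on any saved feed.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Search feed endpoint from the article; parameters are URL-encoded here
# so multi-word terms such as "python tips" are passed safely.
YOUTUBE_FEED = 'http://gdata.youtube.com/feeds/api/videos'
# GData feeds are Atom documents, so their elements live in the Atom namespace.
ATOM_NS = '{http://www.w3.org/2005/Atom}'


def youtube_search_url(term):
    """Return the search feed URL for the given term (API version 2)."""
    return YOUTUBE_FEED + '?' + urlencode({'q': term, 'v': 2})


def entry_titles(atom_xml):
    """Return the title of every <entry> in an Atom feed document."""
    root = ET.fromstring(atom_xml)
    return [entry.findtext(ATOM_NS + 'title')
            for entry in root.findall(ATOM_NS + 'entry')]
```

For example, youtube_search_url('python tips') returns http://gdata.youtube.com/feeds/api/videos?q=python+tips&v=2. Only direct <entry> children of the feed are read, which matches the layout of an Atom feed.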

Slideshare

Slideshare has its own API, which requires an API key generated after a free registration. In Python you can use PySlideshare, which covers nearly all API methods. One of the unsupported methods is searching for slides, so to search we have to call the API directly, requesting the method URL with additional GET arguments. The required arguments are api_key, ts (the Unix timestamp of the request) and hash (the SHA-1 hex digest of the secret key followed by the timestamp). Here is an example of searching slides using the API:

import urllib2
import hashlib, time

SLIDESHARE = 'API KEY'
SLIDESHARE_SECRET = 'SECRET KEY'

# request timestamp
ts = int(time.time())
# the hash
m = hashlib.sha1()
m.update(SLIDESHARE_SECRET+str(ts))
shash = m.hexdigest()

SLIDE = 'http://www.slideshare.net/api/2/search_slideshows?api_key=%s&ts=%s&hash=%s' % (SLIDESHARE, ts, shash)

# search for the given term
surl = SLIDE + '&q=%s' % 'python'
opener = urllib2.build_opener()
opener.addheaders = [('user-agent', 'Opera/9.64 (X11; Linux x86_64; U; en) Presto/2.1.1')]
o = opener.open(surl)
xml = o.read()
print xml

Data is returned in XML format.
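The signing step is the only tricky part of the Slideshare call, and it changes slightly under Python 3, where hashlib accepts bytes rather than str. Below is a minimal sketch of building a signed search URL, assuming the same parameters as the snippet above (api_key, ts, hash and q) and the same hash recipe (secret key followed by the timestamp). The function names are hypothetical, and nothing is sent over the network.

```python
import hashlib
import time
from urllib.parse import urlencode

SLIDESHARE_SEARCH = 'http://www.slideshare.net/api/2/search_slideshows'


def slideshare_hash(secret, ts):
    """SHA-1 hex digest of the shared secret followed by the timestamp."""
    # hashlib works on bytes in Python 3, so encode the concatenated string.
    return hashlib.sha1((secret + str(ts)).encode('utf-8')).hexdigest()


def slideshare_search_url(api_key, secret, term, ts=None):
    """Build a signed search_slideshows URL; ts defaults to the current time."""
    if ts is None:
        ts = int(time.time())
    params = {
        'api_key': api_key,
        'ts': ts,
        'hash': slideshare_hash(secret, ts),
        'q': term,
    }
    # urlencode also escapes the search term, unlike plain string formatting.
    return SLIDESHARE_SEARCH + '?' + urlencode(params)
```

Passing ts explicitly makes the signature reproducible, which is handy for comparing it with the value computed by the Python 2 snippet. In normal use leave it out so the current time is taken.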

Vimeo

Vimeo also has a nice API. To use it we have to register and generate an API key. Some methods need only the key, while others are protected by additional authentication. There is a Python module for this API, python-vimeo, but you can always use the API directly:

import urllib2

VIMEO = 'API KEY'

VIMEO_LINK = 'http://vimeo.com/api/rest/v2?api_key=%s&method=vimeo.videos.search' % VIMEO

vurl = VIMEO_LINK + '&query=%s' % 'python'
opener = urllib2.build_opener()
opener.addheaders = [('user-agent', 'Opera/9.64 (X11; Linux x86_64; U; en) Presto/2.1.1')]
o = opener.open(vurl)
xml = o.read()
print xml
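In Python 3, urllib2 was split into urllib.request and urllib.error, and per-request headers are usually passed to a Request object instead of being set through opener.addheaders. Here is a sketch of the Vimeo search call in that style, assuming the same REST endpoint, method name and query parameter as the snippet above. vimeo_search_request is a hypothetical helper that only builds the request, so it can be inspected before anything is sent.

```python
import urllib.request
from urllib.parse import urlencode

VIMEO_REST = 'http://vimeo.com/api/rest/v2'
# Browser-like User-Agent string copied from the article's examples.
USER_AGENT = 'Opera/9.64 (X11; Linux x86_64; U; en) Presto/2.1.1'


def vimeo_search_request(api_key, term):
    """Build (but do not send) a vimeo.videos.search request."""
    query = urlencode({'api_key': api_key,
                       'method': 'vimeo.videos.search',
                       'query': term})
    # headers=... plays the role of opener.addheaders in the urllib2 version.
    return urllib.request.Request(VIMEO_REST + '?' + query,
                                  headers={'User-Agent': USER_AGENT})

# To actually send it:
#     with urllib.request.urlopen(vimeo_search_request('API KEY', 'python')) as resp:
#         xml = resp.read()
```

Keeping request construction separate from sending makes it easy to check the final URL and headers without a network connection.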

Scribd

Scribd is a document storage website, similar to Slideshare but focused on typical documents. You can use its public API with the help of python-scribd. To use the API you need to register and generate an API key. To search for documents we can use:

import urllib2

SCRIBD = 'API Key'

SCRIBD_URL = 'http://api.scribd.com/api?method=docs.search&api_key=%s&scope=all' % SCRIBD

surl = SCRIBD_URL + '&query=python'
opener = urllib2.build_opener()
opener.addheaders = [('user-agent', 'Opera/9.64 (X11; Linux x86_64; U; en) Presto/2.1.1')]
o = opener.open(surl)
xml = o.read()
print xml
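Like the Slideshare and Vimeo calls, the Scribd search returns raw XML. Rather than hard-coding each service's schema, a small schema-agnostic helper can collect the text of every element with a given tag name from any of these responses. The tag name is something you must read off a real response, and this sketch assumes the elements carry no XML namespace (if they do, pass the tag as '{namespace}name'). texts_by_tag is a hypothetical helper, not part of python-scribd.

```python
import xml.etree.ElementTree as ET


def texts_by_tag(xml_data, tag):
    """Return the text of every element named `tag`, at any depth.

    `xml_data` may be the bytes returned by resp.read() or a str.
    """
    root = ET.fromstring(xml_data)
    # iter() walks the whole tree, so the exact nesting does not matter.
    return [el.text for el in root.iter(tag)]
```

For example, texts_by_tag(xml, 'title') would return a list of titles if a response wraps each one in a <title> element; an unknown tag simply yields an empty list.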

The examples shown in this article were used to build my media catalogue, which gathers slides, documents and video clips from these sites (except Scribd, whose API is a bit unfriendly: its search results don't include all the data needed to generate embed codes).

RkBlog

Python programming, 9 October 2009
