Searching multimedia sites in Python using public APIs
Check out the new site at https://rkblog.dev.
9 October 2009
YouTube
YouTube has a large API described on code.google.com, but the most interesting part is the gdata API, which allows us to manage video clips. Some of the API methods don't require any authentication or keys. For example, to search for videos you can use this feed: http://gdata.youtube.com/feeds/api/videos?q=TERM&v=2. In Python this feed can be easily parsed with feedparser:
import feedparser

YURL = 'http://gdata.youtube.com/feeds/api/videos?q=%s&v=2'
rssurl = YURL % 'python'
# feedparser downloads and parses the feed in one call
data = feedparser.parse(rssurl)
# print the title of every video found
for i in data['entries']:
    title = i['title']
    print(title)
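The feed URL above is built by plain string substitution, which works for a single word such as 'python' but not for terms containing spaces or characters like '&'. A minimal sketch of building the same URL with the query string encoded by the standard library follows. The function name youtube_search_url and the YOUTUBE_FEED constant are made up for this illustration.

```python
from urllib.parse import urlencode

# Base address of the search feed described above.
YOUTUBE_FEED = 'http://gdata.youtube.com/feeds/api/videos'


def youtube_search_url(term, version=2):
    """Return the search feed URL for `term` with an encoded query string.

    Hypothetical helper, not part of feedparser or the gdata API.
    """
    # urlencode escapes spaces and reserved characters in the term
    query = urlencode({'q': term, 'v': version})
    return '%s?%s' % (YOUTUBE_FEED, query)


print(youtube_search_url('monty python'))
```

The returned string can be passed to feedparser.parse() exactly like rssurl in the example above.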
Slideshare
Slideshare has its own API, which requires an API key generated after free registration. In Python you can use PySlideshare, which covers nearly all API methods. One of the unsupported methods is slide searching. To search for slides we have to use the API directly by requesting the method URL with additional GET arguments. The required arguments are api_key, ts (timestamp of the request) and hash (a SHA-1 hash of the secret key followed by the timestamp). Here is an example of searching slides using the API:
import hashlib
import time
import urllib.request

SLIDESHARE = 'API KEY'
SLIDESHARE_SECRET = 'SECRET KEY'
# request timestamp
ts = int(time.time())
# the hash: SHA-1 of the secret key followed by the timestamp
m = hashlib.sha1()
m.update((SLIDESHARE_SECRET + str(ts)).encode('utf-8'))
shash = m.hexdigest()
SLIDE = 'http://www.slideshare.net/api/2/search_slideshows?api_key=%s&ts=%s&hash=%s' % (SLIDESHARE, ts, shash)
# search for the given term
surl = SLIDE + '&q=%s' % 'python'
# send a browser-like user-agent with the request
opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent', 'Opera/9.64 (X11; Linux x86_64; U; en) Presto/2.1.1')]
o = opener.open(surl)
# the response body arrives as bytes, so decode it before printing
xml = o.read().decode('utf-8')
print(xml)
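Every Slideshare request needs the same three arguments, so the signing step can be separated from the search itself. The sketch below assumes the scheme used in the snippet above (SHA-1 of the secret key followed by the timestamp). The names signed_params and slideshare_search_url are invented for this example and are not part of PySlideshare.

```python
import hashlib
import time
from urllib.parse import urlencode

SEARCH_URL = 'http://www.slideshare.net/api/2/search_slideshows'


def signed_params(api_key, secret, ts=None):
    """Return the api_key, ts and hash arguments for one request.

    Hypothetical helper; `ts` can be passed in to get a repeatable result.
    """
    if ts is None:
        ts = int(time.time())
    # same order as in the snippet above: secret key first, then timestamp
    digest = hashlib.sha1((secret + str(ts)).encode('utf-8')).hexdigest()
    return {'api_key': api_key, 'ts': ts, 'hash': digest}


def slideshare_search_url(term, api_key, secret, ts=None):
    """Return a signed search URL for `term` (hypothetical helper)."""
    params = signed_params(api_key, secret, ts)
    params['q'] = term
    # urlencode also escapes spaces in the key placeholders and the term
    return SEARCH_URL + '?' + urlencode(params)


print(slideshare_search_url('python', 'API KEY', 'SECRET KEY'))
```

Because the timestamp is part of the hash, a new set of parameters has to be generated for each request rather than reused.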
Vimeo
Vimeo also has a nice API. To use it we have to register and generate an API key. Some methods need only the key, while others are protected by extra authentication. There is a Python module for this API, python-vimeo, but you can always use the API directly:
import urllib.request

VIMEO = 'API KEY'
VIMEO_LINK = 'http://vimeo.com/api/rest/v2?api_key=%s&method=vimeo.videos.search' % VIMEO
# search for the given term
vurl = VIMEO_LINK + '&query=%s' % 'python'
# send a browser-like user-agent with the request
opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent', 'Opera/9.64 (X11; Linux x86_64; U; en) Presto/2.1.1')]
o = opener.open(vurl)
# the response body arrives as bytes, so decode it before printing
xml = o.read().decode('utf-8')
print(xml)
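The opener setup with a custom user-agent is repeated in every direct API call in this article, so it could be wrapped in a small function. This is only a sketch using the standard library. The names build_request and fetch, and the 10 second timeout, are assumptions made for the example.

```python
import urllib.request

# The same user-agent string the snippets in this article send.
USER_AGENT = 'Opera/9.64 (X11; Linux x86_64; U; en) Presto/2.1.1'


def build_request(url, user_agent=USER_AGENT):
    """Return a Request for `url` carrying the custom User-Agent header."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def fetch(url, timeout=10):
    """Download `url` and return the response body as text.

    Hypothetical helper; the timeout value is an arbitrary choice.
    """
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        # use the charset declared by the server, falling back to UTF-8
        charset = response.headers.get_content_charset() or 'utf-8'
        return response.read().decode(charset)
```

With such a helper, the last four lines of the Vimeo example would shrink to a single call like print(fetch(vurl)).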
Scribd
Scribd is a document storage website, similar to Slideshare but focused on typical documents. You can use the public API with the help of python-scribd. To use the API you need to register and generate an API key. To search for documents we can use:
import urllib.request

SCRIBD = 'API Key'
SCRIBD_URL = 'http://api.scribd.com/api?method=docs.search&api_key=%s&scope=all' % SCRIBD
# search for the given term
surl = SCRIBD_URL + '&query=python'
# send a browser-like user-agent with the request
opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent', 'Opera/9.64 (X11; Linux x86_64; U; en) Presto/2.1.1')]
o = opener.open(surl)
# the response body arrives as bytes, so decode it before printing
xml = o.read().decode('utf-8')
print(xml)
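The direct API calls in this article only print the raw XML. To do something with the results, the response can be parsed with the standard xml.etree.ElementTree module. The sketch below runs on a made-up sample: the element names (rsp, result_set, result, doc_id, title) are assumptions for illustration only, so check them against the XML each service actually returns.

```python
import xml.etree.ElementTree as ET

# Hypothetical response invented for this example; not real API output.
SAMPLE = """<rsp stat="ok">
  <result_set>
    <result><doc_id>101</doc_id><title>Python basics</title></result>
    <result><doc_id>102</doc_id><title>Python and XML</title></result>
  </result_set>
</rsp>"""


def extract_results(xml_text):
    """Return a list of (doc_id, title) tuples found in `xml_text`."""
    root = ET.fromstring(xml_text)
    results = []
    # iter() walks the whole tree, so the nesting depth does not matter
    for node in root.iter('result'):
        results.append((node.findtext('doc_id'), node.findtext('title')))
    return results


for doc_id, title in extract_results(SAMPLE):
    print(doc_id, title)
```

The same approach should work for the Slideshare and Vimeo responses once the tag names are adjusted.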
The examples shown in this article were used to build my media catalogue, which gathers slides, documents and video clips from those sites (except Scribd, whose API is a bit unfriendly: the search results don't include all the data needed to generate the embed codes).