Python TV scraper development
Introduction
Historically, Kodi has supported XML scraper addons that parse online information sources about movies, TV shows, music and so on. However, this approach has its limitations. First, XML scraping definitions built on regular expressions are difficult to write and maintain. Second, many information sources now provide REST APIs that return JSON data, and regular expressions are not suitable for parsing JSON. That is why Python scrapers were introduced.
Python scrapers are written in Python and have a structure similar to media addons (media plugins). They are also called via special plugin:// URLs with query parameters. The main query parameter is action, which defines the scraping stage.
An example of a URL query string passed to a scraper plugin:
?action=getdetails&url=foo&pathSettings=%7B%22foo%22%3A+%22bar%22%7D
Like other plugins, scrapers interact with Kodi via the functions of the xbmcplugin Python API module and pass information to Kodi through xbmcgui.ListItem instances.
Basically, a Python scraper is a media plugin that passes information to the Kodi database instead of presenting lists of playable media items.
A TV show scraper must support the action calls described below.
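Here is a minimal sketch of that calling convention using only the standard library. The query string, which Kodi hands to a plugin as sys.argv[2], is decoded into a dict and then routed on its action value. The handler functions and their return values are hypothetical placeholders; a real scraper would call xbmcplugin functions inside them. The sketch lowercases the action name so that both NfoUrl and nfourl match. That is a defensive choice, not a documented requirement.

```python
from urllib.parse import parse_qsl


def parse_params(query):
    """Decode a plugin query string such as '?action=find&title=NCIS' into a dict."""
    # parse_qsl undoes %XX escapes and '+' for spaces; drop the leading '?' first.
    return dict(parse_qsl(query.lstrip('?')))


# Hypothetical handlers: placeholders for the real scraping stages.
def find(params):
    return 'find: ' + params.get('title', '')


def get_details(params):
    return 'getdetails: ' + params.get('url', '')


# Lowercased action names mapped to handlers; nfourl, getepisodelist,
# getepisodedetails and getartwork would be registered the same way.
ACTIONS = {
    'find': find,
    'getdetails': get_details,
}


def run(query):
    params = parse_params(query)
    action = params.get('action', '').lower()
    handler = ACTIONS.get(action)
    if handler is None:
        raise ValueError('unsupported action: %r' % action)
    return handler(params)

# Inside Kodi this would typically be run(sys.argv[2]), with int(sys.argv[1])
# as the plugin handle for xbmcplugin calls.
```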
Actions
Find
The find action is used to search for a specific TV show by title and, optionally, year, which are passed as additional query parameters of the plugin call.
This action should use xbmcplugin.addDirectoryItem or xbmcplugin.addDirectoryItems to pass xbmcgui.ListItem instances to Kodi.
If only one instance is passed, it is considered a perfect match. Otherwise the media file won't be matched automatically and will need to be resolved manually by selecting the correct item from a list.
The xbmcgui.ListItem instances must be assigned the following properties:
- label: passed as the label parameter to the ListItem constructor. This is the label presented to the user during scraping.
- url: passed as the url parameter of xbmcplugin.addDirectoryItem() (or as the url element of each item tuple for xbmcplugin.addDirectoryItems()). This should be a unique string that can be used to request all the necessary TV show info from the data provider's website or API, for example a link to the TV show's page on a TV information website or a unique ID for requesting TV show information from a REST API.
- thumb (optional): set via the ListItem.setArt() instance method under the thumb key. This should be a URL of a TV show poster, for example.
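The filtering step that decides how many items the find action returns does not need to call Kodi, so it is easy to test on its own. The sketch below assumes a hypothetical provider whose search hits are dicts with name, premiered (YYYY-MM-DD), url and optional poster keys. When Kodi passes a year, hits from other years are dropped. A unique title and year therefore yields the single item that Kodi treats as a perfect match. The commented lines show roughly how each tuple would be handed to Kodi.

```python
def build_find_results(results, year=None):
    """Convert search hits into (label, url, thumb) tuples for the find action."""
    items = []
    for show in results:
        show_year = (show.get('premiered') or '')[:4]
        # When Kodi passed a year, drop hits known to be from another year.
        if year and show_year and show_year != str(year):
            continue
        label = '%s (%s)' % (show['name'], show_year) if show_year else show['name']
        items.append((label, show['url'], show.get('poster')))
    return items

# Inside Kodi (not runnable outside it), each tuple would be passed on roughly as:
#   li = xbmcgui.ListItem(label)
#   if thumb:
#       li.setArt({'thumb': thumb})
#   xbmcplugin.addDirectoryItem(handle, url, li)
```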
NfoUrl (TV show)
The NfoUrl action is called instead of find if a tvshow.nfo file is present in the TV show directory.
The entire contents of the .NFO file are passed as the nfo parameter. This action should use xbmcplugin.addDirectoryItem to pass a single xbmcgui.ListItem instance to Kodi.
The xbmcgui.ListItem instance must be assigned the following properties:
- url: passed as the url parameter of xbmcplugin.addDirectoryItem(). This should be a unique string that can be used to request all the necessary TV show info from the data provider's website or API, for example a link to the TV show's page on a TV show information website or a unique ID for requesting TV show information from a REST API.
- Unique IDs: set via the ListItem.setUniqueIDs() instance method. A scraper should set at least the default unique ID from the TV information site it works with. Optionally it can also set unique IDs from other online TV show databases; these are needed to correctly retrieve the TV show's artwork.
- episodeguide: set via the ListItem.setInfo() instance method. This should be a unique string that can be used to retrieve the list of TV show episodes with all the necessary info.
If tvshow.nfo contains all the necessary information about the TV show in XML format, this action may pass no xbmcgui.ListItem instance to Kodi.
In this case Kodi uses all the information parsed from the XML file and does not call the getdetails action.
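Here is a sketch of pulling the show ID out of the nfo parameter. Purely for illustration, it assumes a scraper built on TVmaze. It first looks for an XML uniqueid element with type="tvmaze", then falls back to searching the raw text for a TVmaze show URL. Both the ID type and the URL pattern are assumptions to adapt to your provider.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical: TVmaze show pages look like https://www.tvmaze.com/shows/60/<slug>.
SHOW_URL_RE = re.compile(r'tvmaze\.com/shows/(\d+)')


def show_id_from_nfo(nfo):
    """Return the TVmaze show ID found in a tvshow.nfo string, or None."""
    # Case 1: an XML NFO with <uniqueid type="tvmaze">60</uniqueid>.
    try:
        root = ET.fromstring(nfo)
    except ET.ParseError:
        root = None  # not (only) XML, e.g. a bare URL
    if root is not None:
        for uid in root.iter('uniqueid'):
            if uid.get('type') == 'tvmaze' and uid.text:
                return uid.text.strip()
    # Case 2: a show URL anywhere in the text.
    match = SHOW_URL_RE.search(nfo)
    return match.group(1) if match else None

# Inside Kodi the ID would then be attached roughly as:
#   li = xbmcgui.ListItem()
#   li.setUniqueIDs({'tvmaze': show_id}, 'tvmaze')
#   xbmcplugin.addDirectoryItem(handle, show_url, li)
```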
NfoUrl (episode)
The NfoUrl action is also called for each episode .NFO file present in a TV show directory.
The entire contents of the .NFO file are passed as the nfo parameter. This action should use xbmcplugin.addDirectoryItem to pass a single xbmcgui.ListItem instance to Kodi.
The xbmcgui.ListItem instance must be assigned the following properties:
- url: passed as the url parameter of xbmcplugin.addDirectoryItem(). This should be a unique string that can be used to request all the necessary episode info from the data provider's website or API, for example a link to the episode's page on a TV show information website or a unique ID for requesting episode information from a REST API.
If an episode .NFO file contains all the necessary information about the episode in XML format, this action may pass no xbmcgui.ListItem instance to Kodi.
In this case Kodi uses all the information parsed from the XML file and does not call the getepisodedetails action.
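For episode NFO files, the scraper has to decide whether the XML already describes the episode or whether it should hand Kodi a URL instead. The sketch below treats an episodedetails document with a title as complete, which is a simplifying assumption. Otherwise it looks for a hypothetical TVmaze episode URL in the text.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical episode URL format, e.g. https://www.tvmaze.com/episodes/1234/<slug>
EPISODE_URL_RE = re.compile(r'https?://\S*tvmaze\.com/episodes/\d+\S*')


def episode_url_from_nfo(nfo):
    """Return an episode URL to pass to Kodi, or None if the NFO is self-sufficient."""
    try:
        root = ET.fromstring(nfo)
    except ET.ParseError:
        root = None
    # Assumed heuristic: a full <episodedetails> with a title needs no scraping,
    # so the action passes nothing and Kodi uses the parsed XML.
    if root is not None and root.tag == 'episodedetails' and root.findtext('title'):
        return None
    match = EPISODE_URL_RE.search(nfo)
    return match.group(0) if match else None
```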
getdetails
The getdetails action must pass a single xbmcgui.ListItem via the xbmcplugin.setResolvedUrl() function. This action receives the url query parameter from the previous stages and should set as much information on the xbmcgui.ListItem instance as possible using the appropriate methods.
One of the required properties set via the ListItem.setInfo method is episodeguide. This should be a unique string that can be used to retrieve the list of TV show episodes with all the necessary info.
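Here is a sketch of turning a provider's show record into info labels for ListItem.setInfo('video', ...), including the JSON-encoded episodeguide. The record layout is a hypothetical example: id, name, summary, premiered, genres, and an externals dict of IDs from other databases. The sketch also assumes tvmaze is the scraper's own database.

```python
import json


def show_info(show):
    """Map a hypothetical show record to setInfo('video', ...) labels."""
    # The scraper's own ID first; all IDs are stored as strings.
    ids = {'tvmaze': str(show['id'])}
    for key, value in (show.get('externals') or {}).items():
        if value:  # skip databases the provider has no ID for
            ids[key] = str(value)
    premiered = show.get('premiered') or ''
    info = {
        'title': show['name'],
        'plot': show.get('summary') or '',
        'genre': show.get('genres') or [],
        'premiered': premiered,
        # getepisodelist later receives this string as its url parameter.
        'episodeguide': json.dumps(ids),
    }
    if premiered[:4].isdigit():
        info['year'] = int(premiered[:4])
    return info

# Inside Kodi, roughly:
#   li = xbmcgui.ListItem(info['title'])
#   li.setInfo('video', info)
#   xbmcplugin.setResolvedUrl(handle, True, li)
```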
getepisodelist
The getepisodelist action should use xbmcplugin.addDirectoryItem or xbmcplugin.addDirectoryItems to pass xbmcgui.ListItem instances to Kodi with information about the available TV show episodes. This action receives the url query parameter, which is the episodeguide property set by the getdetails or NfoUrl action.
The xbmcgui.ListItem instances must be assigned the following properties:
- url: passed as the url parameter of xbmcplugin.addDirectoryItem() (or as the url element of each item tuple for xbmcplugin.addDirectoryItems()). This is a unique string that can be used to retrieve information about a specific episode.
- season: season number, passed via the ListItem.setInfo method.
- episode: episode number, passed via the ListItem.setInfo method.
- aired: episode air date (if available), passed via the ListItem.setInfo method.
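Here is a sketch of building the getepisodelist entries from a provider's episode list. Each episode is assumed to be a dict with url, season, number and optional airdate keys. As a simplification, episodes without a number are skipped.

```python
def episode_entries(episodes):
    """Turn hypothetical episode records into (url, info) pairs for getepisodelist."""
    entries = []
    for ep in episodes:
        if ep.get('number') is None:
            continue  # simplifying assumption: ignore unnumbered specials
        info = {'season': int(ep['season']), 'episode': int(ep['number'])}
        if ep.get('airdate'):
            info['aired'] = ep['airdate']
        entries.append((ep['url'], info))
    return entries

# Inside Kodi, each pair would be passed on roughly as:
#   li = xbmcgui.ListItem()
#   li.setInfo('video', info)
#   xbmcplugin.addDirectoryItem(handle, url, li)
```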
getepisodedetails
The getepisodedetails action must pass a single xbmcgui.ListItem via the xbmcplugin.setResolvedUrl() function. This action receives the url query parameter from the previous stage and should set as much information on the xbmcgui.ListItem instance as possible using the appropriate methods.
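The same idea applies to getepisodedetails. This sketch maps a hypothetical episode record to setInfo labels. The duration info label is in seconds, so a runtime reported in minutes (as assumed here) is converted.

```python
def episode_info(ep):
    """Map a hypothetical episode record to setInfo('video', ...) labels."""
    info = {
        'title': ep.get('name') or '',
        'season': int(ep['season']),
        'episode': int(ep['number']),
        'plot': ep.get('summary') or '',
    }
    if ep.get('airdate'):
        info['aired'] = ep['airdate']
    if ep.get('runtime'):
        # Assumed: the provider reports minutes; the duration label expects seconds.
        info['duration'] = int(ep['runtime']) * 60
    return info

# Inside Kodi, roughly:
#   li = xbmcgui.ListItem(info['title'])
#   li.setInfo('video', info)
#   xbmcplugin.setResolvedUrl(handle, True, li)
```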
getartwork
The getartwork action must pass a single xbmcgui.ListItem via the xbmcplugin.setResolvedUrl() function. This action receives the id query parameter, which is the default unique ID set by the NfoUrl or getdetails action, or parsed from an XML tvshow.nfo file if the scraper hasn't set one. This action should set the available artwork using the ListItem.addAvailableArtwork() and ListItem.setAvailableFanart() instance methods.
Warning: if your .NFO file contains a default unique ID that is different from the ID of the TV show database your scraper works with, you must set the correct default unique ID in the NfoUrl action. Otherwise getartwork will receive the default unique ID from your .NFO file, and your scraper may not be able to find any artwork using this ID.
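Here is a sketch of sorting a provider's images into the two ListItem calls. The image records (type, url, optional season) and the 'background' type name are hypothetical. Fanart entries use the image key of the dicts passed to setAvailableFanart(). For addAvailableArtwork(), a season of -1 is assumed to mean "not tied to a season".

```python
def artwork_calls(images):
    """Split image records into addAvailableArtwork() args and setAvailableFanart() dicts."""
    artwork, fanart = [], []
    for img in images:
        if img['type'] == 'background':
            # Fanart goes to setAvailableFanart() as a list of {'image': url} dicts.
            fanart.append({'image': img['url']})
        else:
            # -1 is assumed to mean "show-level art, not season-specific".
            artwork.append((img['url'], img['type'], img.get('season', -1)))
    return artwork, fanart

# Inside Kodi the results would be applied roughly as:
#   li = xbmcgui.ListItem()
#   for url, art_type, season in artwork:
#       li.addAvailableArtwork(url, art_type, season=season)
#   li.setAvailableFanart(fanart)
#   xbmcplugin.setResolvedUrl(handle, True, li)
```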
episodeguide
The episodeguide string is passed to the getepisodelist action (as its url parameter) to retrieve the list of episodes. In legacy XML scrapers it was an actual URL of a page that was supposed to contain the list of TV show episodes. In Python scrapers it can be any string that allows the scraper to correctly identify a TV show and retrieve the list of its episodes. However, the commonly accepted convention is that episodeguide should be a JSON-encoded string containing the IDs of the specific TV show in various online databases. At a minimum it should include the show's ID in the database the scraper works with. An example of a JSON-encoded episodeguide string:
{"tvmaze": "60", "tvrage": "4628", "tvdb": "72108", "imdb": "tt0364845"}
Note that all IDs should be strings.
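Here is a small sketch of producing and reading such a string with the json module. make_episodeguide() converts every ID to a string, as the convention requires. read_episodeguide() returns None instead of raising when it receives something that isn't a JSON object, such as a legacy URL. Both function names are hypothetical.

```python
import json


def make_episodeguide(ids):
    """Encode show IDs as an episodeguide string; every value becomes a string."""
    return json.dumps({name: str(value) for name, value in ids.items() if value})


def read_episodeguide(episodeguide, key):
    """Return one database's ID from an episodeguide string, or None."""
    try:
        ids = json.loads(episodeguide)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return None
    if not isinstance(ids, dict):
        return None
    return ids.get(key)
```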
Path Settings
A scraper receives the current path settings via the pathSettings query parameter as a JSON-encoded string with each action call.
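Here is a sketch of decoding those settings on each call. It returns an empty dict when pathSettings is missing or malformed. That fallback is a defensive assumption, not documented Kodi behavior.

```python
import json
from urllib.parse import parse_qsl


def path_settings(query):
    """Decode the pathSettings parameter of a plugin call into a dict ({} if absent)."""
    params = dict(parse_qsl(query.lstrip('?')))
    raw = params.get('pathSettings')
    if not raw:
        return {}
    try:
        settings = json.loads(raw)
    except ValueError:
        return {}
    # Only a JSON object is meaningful here; anything else is ignored.
    return settings if isinstance(settings, dict) else {}
```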
Example TV Shows Scraper
A very high-level example of a Python TV show scraper can be found in the Kodi source code as the metadata.demo.tv addon.