Difference between revisions of "Python tv scraper development"

From Official Kodi Wiki
{{mininav|[[Scrapers]] {{l2|[[Development]]}} }}


{{Wiki_revamp}}
=Introduction=
 
Historically, Kodi has supported [https://kodi.wiki/view/HOW-TO:Write_media_scrapers XML scraping addons] that parse online information sources
about movies, TV shows, music and so on. However, this approach has its limitations. First, XML parsing definitions built on regular expressions are difficult to write and maintain.
Second, many information sources now provide REST APIs, and regular expressions are not suitable for parsing the JSON data they return. That is why Python
scrapers were introduced.
 
Python scrapers are written in Python and have a structure similar to [https://kodi.wiki/view/HOW-TO:Video_addon media addons] (media plugins).
They are also called via special <code>plugin://</code> URLs with query parameters. The main query parameter is <code>action</code>, which defines the scraping stage.
An example of a URL query string passed to a scraper plugin: <code>?action=getdetails&url=foo&pathSettings=%7B%22foo%22%3A+%22bar%22%7D</code>.
Like other plugins, scrapers interact with Kodi via the functions of the <code>xbmcplugin</code> Python API module and pass information to Kodi through <code>xbmcgui.ListItem</code> instances.
Basically, a Python scraper is a media plugin that passes information to the Kodi database instead of presenting lists of playable media items.
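
As a rough illustration of how such a query string can be decoded, the sketch below uses only the Python standard library. It assumes that, as with other Kodi plugins, the query string reaches the script in <code>sys.argv[2]</code>; the helper name <code>parse_query</code> is hypothetical.

```python
from urllib.parse import parse_qsl


def parse_query(query):
    """Decode a plugin query string such as '?action=find&title=foo' into a dict."""
    # The query may start with "?", which parse_qsl would treat as part of the first key.
    return dict(parse_qsl(query.lstrip('?')))


# The example query string from the text; inside Kodi it would be read from sys.argv[2].
params = parse_query('?action=getdetails&url=foo&pathSettings=%7B%22foo%22%3A+%22bar%22%7D')
action = params['action']  # the scraping stage, here 'getdetails'
```

Percent-encoded values are decoded by <code>parse_qsl</code>, so <code>params['pathSettings']</code> already holds plain JSON text at this point.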
 
A TV show scraper must support the "action" calls described below.
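
One possible way to route these calls is a small dispatch table keyed by action name. This is only a sketch: the <code>dispatch</code> function and the placeholder handlers are hypothetical, and matching the action name case-insensitively is a defensive assumption rather than documented behavior.

```python
def dispatch(params, handlers):
    """Run the handler registered for the requested action and return its result."""
    # Lower-casing is a defensive assumption so that "NfoUrl" and "nfourl" both match.
    action = params.get('action', '').lower()
    handler = handlers.get(action)
    if handler is None:
        raise ValueError('Unsupported scraper action: {!r}'.format(action))
    return handler(params)


# Placeholder handlers that only report which stage was requested.
# A real scraper would talk to its data provider and to Kodi here.
HANDLERS = {
    'find': lambda params: 'search for ' + params.get('title', ''),
    'nfourl': lambda params: 'parse NFO contents',
    'getdetails': lambda params: 'show details for ' + params.get('url', ''),
    'getepisodelist': lambda params: 'episode list for ' + params.get('url', ''),
    'getepisodedetails': lambda params: 'episode details for ' + params.get('url', ''),
    'getartwork': lambda params: 'artwork for ' + params.get('id', ''),
}

result = dispatch({'action': 'getdetails', 'url': 'foo'}, HANDLERS)
```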
 
=Actions=
 
==Find==
 
The <code>find</code> action searches for a specific TV show by title and, optionally, year, which are passed as additional query parameters of the plugin call.
This action should use <code>xbmcplugin.addDirectoryItem</code> or <code>xbmcplugin.addDirectoryItems</code> to pass <code>xbmcgui.ListItem</code> instances to Kodi.
If only one instance is passed, it is considered a perfect match. Otherwise the media file is not matched automatically and must be resolved manually by selecting
the correct item from a list.
The <code>xbmcgui.ListItem</code> instances must be assigned the following properties:
 
* '''label''': passed as the <code>label</code> parameter to the class constructor. This is the label that is presented to the user during scraping.
 
* '''url''': passed as the <code>url</code> argument when the item is added with <code>xbmcplugin.addDirectoryItem</code>. This should be a unique string that can be used to request all the necessary TV show info from the data provider's
website or API. It can be, for example, a link to a TV show page on a TV information website or a unique ID for requesting TV show information from a REST API.
 
* '''thumb''' (optional): passed via the <code>setArt()</code> instance method of <code>xbmcgui.ListItem</code>. This should be a URL of a TV show poster, for example.
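
The three properties above could be set as in the following minimal sketch. Several things here are assumptions for illustration: <code>search_provider</code> is a hypothetical stand-in for a real provider query and returns canned data, the <code>example.com</code> URL is made up, the URL is handed to Kodi through the <code>url</code> argument of <code>xbmcplugin.addDirectoryItem</code>, and the closing <code>xbmcplugin.endOfDirectory</code> call follows the habit of regular plugins.

```python
def format_label(title, year=None):
    """Build the label shown to the user, e.g. 'Demo Show (2020)'."""
    return '{} ({})'.format(title, year) if year else title


def search_provider(title, year=None):
    """Hypothetical stand-in for a real provider query; returns canned matches."""
    return [{'id': '12345', 'title': title, 'year': year,
             'poster': 'https://tv.example.com/posters/12345.jpg'}]


def find_show(handle, title, year=None):
    """Pass every search match to Kodi as a ListItem."""
    # xbmcgui and xbmcplugin exist only inside Kodi, hence the local imports.
    import xbmcgui
    import xbmcplugin

    for show in search_provider(title, year):
        item = xbmcgui.ListItem(label=format_label(show['title'], show['year']),
                                offscreen=True)
        if show.get('poster'):
            item.setArt({'thumb': show['poster']})
        # The url identifies the show in later stages; isFolder=True is an assumption.
        xbmcplugin.addDirectoryItem(handle=handle, url=show['id'],
                                    listitem=item, isFolder=True)
    xbmcplugin.endOfDirectory(handle)
```

Inside Kodi, <code>handle</code> would be the integer plugin handle that Kodi supplies in <code>sys.argv[1]</code>.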
 
==NfoUrl==
 
The <code>NfoUrl</code> action is called as an alternative to <code>find</code> if a <code>tvshow.nfo</code> file is present in the TV show directory.
The entire contents of the .NFO file are passed as the <code>nfo</code> parameter. This action should use <code>xbmcplugin.addDirectoryItem</code> to pass a single <code>xbmcgui.ListItem</code> instance to Kodi.
 
The <code>xbmcgui.ListItem</code> instance must be assigned the following properties:
 
* '''url''': passed as the <code>url</code> argument when the item is added with <code>xbmcplugin.addDirectoryItem</code>. This should be a unique string that can be used to request all the necessary TV show info from the data provider's
website or API. It can be, for example, a link to a TV show page on a TV show information website or a unique ID for requesting TV show information from a REST API.
 
* '''Unique IDs''': set via the <code>ListItem.setUniqueIDs()</code> instance method. A scraper should set at least the default unique ID from the TV information site it works with.
Optionally, it can also set unique IDs from other online TV show databases.
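
A minimal sketch of this stage follows, assuming the .NFO file contains a link to the show page on the provider's site. The URL pattern, the <code>example</code> ID type and the helper names are all hypothetical, and the closing <code>xbmcplugin.endOfDirectory</code> call is an assumption borrowed from regular plugins.

```python
import re

# Hypothetical URL scheme of a TV information site; a real scraper matches its own provider.
SHOW_URL_RE = re.compile(r'https?://tv\.example\.com/shows/(\d+)')


def extract_show_id(nfo):
    """Return the provider's show ID found in the NFO text, or None."""
    match = SHOW_URL_RE.search(nfo)
    return match.group(1) if match else None


def handle_nfo(handle, nfo):
    """Pass a single ListItem to Kodi if the NFO text identifies a show."""
    # xbmcgui and xbmcplugin exist only inside Kodi, hence the local imports.
    import xbmcgui
    import xbmcplugin

    show_id = extract_show_id(nfo)
    if show_id is not None:
        item = xbmcgui.ListItem(offscreen=True)
        # 'example' is a made-up ID type; the second argument marks it as the default one.
        item.setUniqueIDs({'example': show_id}, 'example')
        xbmcplugin.addDirectoryItem(handle=handle, url=show_id,
                                    listitem=item, isFolder=True)
    xbmcplugin.endOfDirectory(handle)
```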
 
==getdetails==
 
The <code>getdetails</code> action must pass a single <code>xbmcgui.ListItem</code> via the <code>xbmcplugin.setResolvedUrl()</code> function. This action receives the '''url''' query parameter
from the previous stages and should set as much information on the <code>xbmcgui.ListItem</code> instance as possible using the appropriate methods.
One of the required properties set via the <code>ListItem.setInfo</code> method is <code>episodeguide</code>. This should be a unique string that can be used to retrieve
the list of TV show episodes with all the necessary info.
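
The sketch below shows one way this stage might look. The provider lookup <code>fetch_show</code> is hypothetical and returns canned data, the choice of info labels beyond <code>episodeguide</code> is only illustrative, and reusing the show ID as the episode guide string is an assumption.

```python
def fetch_show(url):
    """Hypothetical provider lookup; returns canned data for illustration."""
    return {'id': url, 'title': 'Demo Show',
            'plot': 'A show used for illustration.', 'premiered': '2020-01-15'}


def build_show_info(show):
    """Collect the info labels that are handed to ListItem.setInfo()."""
    return {
        'mediatype': 'tvshow',
        'title': show['title'],
        'tvshowtitle': show['title'],
        'plot': show['plot'],
        'premiered': show['premiered'],
        # Arrives later as the url parameter of the getepisodelist action.
        'episodeguide': show['id'],
    }


def get_details(handle, url):
    """Resolve the call with a single ListItem describing the show."""
    # xbmcgui and xbmcplugin exist only inside Kodi, hence the local imports.
    import xbmcgui
    import xbmcplugin

    show = fetch_show(url)
    item = xbmcgui.ListItem(show['title'], offscreen=True)
    item.setInfo('video', build_show_info(show))
    # 'example' is a made-up ID type used as the default unique ID.
    item.setUniqueIDs({'example': show['id']}, 'example')
    xbmcplugin.setResolvedUrl(handle=handle, succeeded=True, listitem=item)
```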
 
==getepisodelist==
 
The <code>getepisodelist</code> action should use <code>xbmcplugin.addDirectoryItem</code> or <code>xbmcplugin.addDirectoryItems</code> to pass <code>xbmcgui.ListItem</code> instances to Kodi
with information about the available TV show episodes. This action receives the '''url''' query parameter, which is the <code>episodeguide</code> property set by the <code>getdetails</code> action.
A scraper needs to set a '''url''' for each <code>xbmcgui.ListItem</code> instance: a unique string that can be used to retrieve information about a specific episode.
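
A possible shape for this stage is sketched below using <code>xbmcplugin.addDirectoryItems</code>. The provider lookup <code>fetch_episodes</code> is hypothetical and returns canned data; which info labels to set per episode and the closing <code>xbmcplugin.endOfDirectory</code> call are assumptions.

```python
def fetch_episodes(episodeguide):
    """Hypothetical provider lookup; returns canned episodes for illustration."""
    return [
        {'id': 'ep-101', 'season': 1, 'episode': 1, 'title': 'Pilot', 'aired': '2020-01-15'},
        {'id': 'ep-102', 'season': 1, 'episode': 2, 'title': 'Second', 'aired': '2020-01-22'},
    ]


def episode_entries(episodes):
    """Return (url, info labels) pairs, one per episode."""
    return [
        (ep['id'], {'mediatype': 'episode', 'title': ep['title'],
                    'season': ep['season'], 'episode': ep['episode'],
                    'aired': ep['aired']})
        for ep in episodes
    ]


def get_episode_list(handle, url):
    """Pass one ListItem per episode to Kodi in a single call."""
    # xbmcgui and xbmcplugin exist only inside Kodi, hence the local imports.
    import xbmcgui
    import xbmcplugin

    items = []
    for episode_url, info in episode_entries(fetch_episodes(url)):
        item = xbmcgui.ListItem(info['title'], offscreen=True)
        item.setInfo('video', info)
        # Each tuple is (url, listitem, isFolder).
        items.append((episode_url, item, False))
    xbmcplugin.addDirectoryItems(handle, items, len(items))
    xbmcplugin.endOfDirectory(handle)
```

Each episode URL set here is what the <code>getepisodedetails</code> action later receives as its '''url''' parameter.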
 
==getepisodedetails==
 
The <code>getepisodedetails</code> action must pass a single <code>xbmcgui.ListItem</code> via the <code>xbmcplugin.setResolvedUrl()</code> function. This action receives the '''url''' query parameter
from the previous stage and should set as much information on the <code>xbmcgui.ListItem</code> instance as possible using the appropriate methods.
 
==getartwork==
 
The <code>getartwork</code> action must pass a single <code>xbmcgui.ListItem</code> via the <code>xbmcplugin.setResolvedUrl()</code> function. This action receives the '''id''' query parameter,
which is the default unique ID set by the '''NfoUrl''' or '''getdetails''' action. This action should set the available artwork using the <code>ListItem.addAvailableArtwork()</code>
and <code>ListItem.setAvailableFanart()</code> instance methods.
 
'''Warning''': if your .NFO file contains a default unique ID that differs from the ID used by the TV show database your scraper works with,
you must set the correct default unique ID in the <code>NfoUrl</code> action. Otherwise <code>getartwork</code> will receive the default unique ID from your .NFO file
and your scraper may not be able to find any artwork using this ID.
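
The artwork stage could be sketched as follows. The provider lookup <code>fetch_artwork</code> and its <code>example.com</code> URLs are hypothetical; the sketch also assumes that <code>setAvailableFanart()</code> accepts a list of dictionaries with <code>image</code> and <code>preview</code> keys and that <code>addAvailableArtwork()</code> takes the image URL followed by the art type.

```python
def fetch_artwork(show_id):
    """Hypothetical provider lookup; returns canned artwork URLs."""
    base = 'https://tv.example.com/art/{}/'.format(show_id)
    return {
        'posters': [base + 'poster1.jpg'],
        'fanart': [base + 'fanart1.jpg', base + 'fanart2.jpg'],
    }


def build_fanart(urls):
    """Shape fanart URLs as dicts with 'image' and 'preview' keys."""
    # Reusing the full image as its own preview is a simplification.
    return [{'image': url, 'preview': url} for url in urls]


def get_artwork(handle, show_id):
    """Resolve the call with a single ListItem carrying the available artwork."""
    # xbmcgui and xbmcplugin exist only inside Kodi, hence the local imports.
    import xbmcgui
    import xbmcplugin

    art = fetch_artwork(show_id)
    item = xbmcgui.ListItem(offscreen=True)
    for poster in art['posters']:
        item.addAvailableArtwork(poster, 'poster')
    item.setAvailableFanart(build_fanart(art['fanart']))
    xbmcplugin.setResolvedUrl(handle=handle, succeeded=True, listitem=item)
```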
 
=Path Settings=
 
A scraper receives the current path settings via the <code>pathSettings</code> query parameter as a JSON-encoded string with each action call.
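
Decoding the settings might look like the sketch below. It assumes the query string has already been parsed into a dictionary of decoded values; the helper name <code>load_path_settings</code> and the <code>language</code> setting are hypothetical.

```python
import json


def load_path_settings(params):
    """Decode the JSON-encoded pathSettings parameter into a dict."""
    raw = params.get('pathSettings')
    # Treating an absent or empty parameter as "no settings" is an assumption.
    return json.loads(raw) if raw else {}


settings = load_path_settings({'action': 'find', 'pathSettings': '{"foo": "bar"}'})
# 'language' is a made-up setting name; fall back to a default when it is missing.
language = settings.get('language', 'en')
```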
 
=Example TV Shows Scraper=
 
A very high-level example of a Python TV show scraper can be found in the Kodi source code as the [https://github.com/xbmc/xbmc/tree/master/addons/metadata.demo.tv metadata.demo.tv] addon.

Revision as of 18:50, 19 June 2022
