Add-on:Parsedom for xbmc plugins

Latest revision as of 04:19, 17 October 2021

Parsedom for xbmc plugins

Author: TheCollective

License: GPLv3
Source: https://github.com/HenrikDK/xbmc-common-plugin-functions
Summary: Parsedom for xbmc plugins.

Installing

This add-on is installed from the Add-on browser located in Kodi as follows:

  1. Settings
  2. Add-ons
  3. Install from repository
  4. Parsedom for xbmc plugins
  5. Install

Testing/Status

Integration and unit tests are run continuously by Jenkins:

http://tc.tobiasussing.dk/jenkins/view/Common%20Functions/

Developers

This DOM parser is a fast replacement for Beautiful Soup.

The module also provides a few other useful functions.

Development and support thread: http://forum.kodi.tv/showthread.php?t=116498

Setup

To use the parsedom functions edit your addon.xml like this.

 <requires>
   <import addon="xbmc.python" version="2.0"/>
   <import addon="script.module.parsedom" version="0.9.1"/> <!-- Add this -->
 </requires>

Then add the following to your Python file.

 import CommonFunctions
 common = CommonFunctions
 common.plugin = "Your Plugin name-1.0"

If you want fetchPage to support cookies, you must include the following in your default.py

 import cookielib
 import urllib2
 cookiejar = cookielib.LWPCookieJar()
 cookie_handler = urllib2.HTTPCookieProcessor(cookiejar)
 opener = urllib2.build_opener(cookie_handler)

Debugging

To enable debugging set the following values in default.py

common.dbg = True # Default
common.dbglevel = 3 # Default

Whenever you debug your own code you should also debug in parseDOM.

parseDOM(self, html, name = "", attrs = {}, ret = False)

  • html(string or list) - String to parse, or list of strings to parse.
  • name(string) - Element to match ( for instance "span" )
  • attrs(dictionary) - Dictionary with attributes you want matched in the element ( for instance { "id": "span3", "class": "oneclass.*anotherclass", "attribute": "a random tag" } )
  • ret(string or False) - Attribute in element to return value of. If not set(or False), returns content of DOM element.

returns list

Getting element content.

 link_html = "<a href='bla.html'>Link Test</a>"
 ret = common.parseDOM(link_html, "a")
 print repr(ret) # Prints ['Link Test']

Getting an element attribute.

 link_html = "<a href='bla.html'>Link Test</a>"
 ret = common.parseDOM(link_html, "a", ret = "href")
 print repr(ret) # Prints ['bla.html']

Get element with matching attribute.

 link_html = "<a href='bla1.html' id='link1'>Link Test1</a><a href='bla2.html' id='link2'>Link Test2</a><a href='bla3.html' id='link3'>Link Test3</a>"
 ret1 = common.parseDOM(link_html, "a", attrs = { "id": "link1" }, ret = "href")
 ret2 = common.parseDOM(link_html, "a", attrs = { "id": "link2" })
 ret3 = common.parseDOM(link_html, "a", attrs = { "id": "link3" }, ret = "id")
 print repr(ret1) # Prints ['bla1.html']
 print repr(ret2) # Prints ['Link Test2']
 print repr(ret3) # Prints ['link3']

When scraping sites it is prudent to scrape in steps, since real websites are often complicated.

Take this example where you want to get all the user uploads.

<div id="content">
 <div id="sidebar">
  <div id="latest">
   <a href="/video?8wxOVn99FTE">Miley Cyrus - When I Look At You</a><br />
   <a href="/video?46">Puppet theater</a><br />
   <a href="/video?98">VBLOG #42</a><br />
   <a href="/video?11">Fourth upload</a><br />
  </div>
 </div>
 <div id="user">
  <div id="uploads">
   <a href="/video?12">First upload</a><br />
   <a href="/video?23">Second upload</a><br />
   <a href="/video?34">Third upload</a><br />
   <a href="/video?41">Fourth upload</a><br />
  </div>
 </div>
</div>

The first step is to limit your search to the correct area.

One should always find the innermost DOM element that contains the needed data.

 ret = common.parseDOM(html, "div", attrs = { "id": "uploads" })

The variable ret now contains

  ['<a href="/video?12">First upload</a><br />
  <a href="/video?23">Second upload</a><br />
  <a href="/video?34">Third upload</a><br />
  <a href="/video?41">Fourth upload</a><br />']

Now we get the video URLs.

 videos = common.parseDOM(ret, "a", ret = "href")
 print repr(videos) # Prints ['/video?12', '/video?23', '/video?34', '/video?41']
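
The same two-step idea (narrow down to the container first, then read the links) can be sketched outside Kodi with the Python 3 standard library. This is only an illustration and not how parseDOM is implemented; the class name LinkCollector and the shortened sample markup are assumptions made for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values of <a> tags inside the <div> with a given id."""

    def __init__(self, container_id):
        super().__init__()
        self.container_id = container_id
        self.depth = 0   # nesting depth of <div> tags inside the container
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            # Start counting at the wanted container, keep counting inside it.
            if self.depth or attrs.get("id") == self.container_id:
                self.depth += 1
        elif tag == "a" and self.depth and "href" in attrs:
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1


html = """
<div id="content">
 <div id="sidebar">
  <div id="latest">
   <a href="/video?46">Puppet theater</a><br />
  </div>
 </div>
 <div id="user">
  <div id="uploads">
   <a href="/video?12">First upload</a><br />
   <a href="/video?23">Second upload</a><br />
  </div>
 </div>
</div>
"""

collector = LinkCollector("uploads")
collector.feed(html)
videos = collector.links  # only the links inside <div id="uploads">
```

Links in the sidebar are skipped because the collector only records anchors while it is inside the chosen container.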

XML with parseDOM

The parseDOM function supports XML syntax. We have used it for some simple XML structures, but there is no guarantee that parseDOM will handle XML better than minidom.

 xml = '<?xml version="1.0" encoding="utf-8"?><rsp generated_in="0.1760" stat="ok" />'
 stat = self.common.parseDOM(xml, "rsp", ret = "stat")
 print repr(stat) # Prints ['ok']
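
For comparison, here is a minimal sketch of reading the same attribute with minidom from the Python standard library (written for Python 3). It is shown only as the alternative mentioned above and does not use parseDOM.

```python
from xml.dom import minidom

xml = '<?xml version="1.0" encoding="utf-8"?><rsp generated_in="0.1760" stat="ok" />'

# Parse the document and read the attribute from the root element.
document = minidom.parseString(xml)
stat = document.documentElement.getAttribute("stat")
```

Unlike parseDOM, which returns a list, getAttribute returns a single string.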

Here is a slightly more complex example.

 xml = '''<?xml version="1.0" encoding="utf-8"?>
 <rsp generated_in="0.0132" stat="ok">
   <contacts on_this_page="1" page="1" perpage="25" total="1">
     <contact display_name="MileyCyrus" id="458471" is_plus="0" is_pro="0" is_staff="0" mutual="1" profileurl="http://somesite/user458471" realname="MileyCyrus" username="user458471" videosurl="http://somesite/user458471/videos">
       <portraits>
         <portrait height="30" width="30">http://somesite/portraits/defaults/d.30.jpg</portrait>
         <portrait height="75" width="75">http://somesite/portraits/defaults/d.75.jpg</portrait>
         <portrait height="100" width="100">http://somesite/portraits/defaults/d.100.jpg</portrait>
         <portrait height="300" width="300">http://somesite/portraits/defaults/d.300.jpg</portrait>
       </portraits>
     </contact>
   </contacts>
 </rsp>'''

 ids = self.common.parseDOM(xml, "contact", ret = "id")
 titles = self.common.parseDOM(xml, "contact", ret = "display_name")
 portraits = self.common.parseDOM(xml, "portraits")
 next = "false"

 result = []

 for i in range(0, len(ids)):
   group = {}
   group['contact'] = ids[i]
   group['Title'] = titles[i]

   thumbs_width = self.common.parseDOM(portraits, "portrait", ret = "width")    
   thumbs_url = self.common.parseDOM(portraits, "portrait")
   for j in range(0, len(thumbs_width)):
       if (int(thumbs_width[j]) <= 300):
          group['thumbnail'] = thumbs_url[j]

   result.append(group)
 print repr(result) # Prints [{'contact': '458471', 'thumbnail': 'http://somesite/portraits/defaults/d.300.jpg', 'Title': 'MileyCyrus'}]

Again, parseDOM has only been tried on simple XML structures and has not been tested systematically with XML.

fetchPage(self, dict)

Fetches a page from the internet.

returns dict.

Input dictionary variables:

  • link(string): The URL.
  • cookie(string): The cookies that need to be set.
  • refering(string): The referring URL.
  • post_data(dict): Data to POST to the link.

The dict returned contains

  • content: HTML content
  • new_url: Redirect url
  • header: Header information
  • status: http return status

result = common.fetchPage({"link": "http://www.example.com/index.html"})
if result["status"] == 200:
   print "content: %s" %result["content"]

result = common.fetchPage({"link": "http://www.example.com/doesnotexist.html"})
if result["status"] == 500:
   print "redirect url: %s" %result["new_url"]
   print "header: %s" %result["header"]
   print "content: %s" %result["content"]
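
To make the shape of the returned dict concrete, here is a rough stand-in written with the Python 3 standard library. It is not the module's code: the name fetch_page_sketch is hypothetical, and reporting every failure as status 500 is an assumption based on the example above.

```python
import urllib.parse
import urllib.request


def fetch_page_sketch(params):
    """Hypothetical stand-in that returns the same four keys as fetchPage."""
    result = {"content": "", "new_url": "", "header": "", "status": 500}

    data = None
    if "post_data" in params:
        # A dict is sent as a form-encoded POST body.
        data = urllib.parse.urlencode(params["post_data"]).encode("utf-8")

    try:
        request = urllib.request.Request(params["link"], data=data)
        if "refering" in params:
            request.add_header("Referer", params["refering"])
        if "cookie" in params:
            request.add_header("Cookie", params["cookie"])

        with urllib.request.urlopen(request, timeout=10) as response:
            result["content"] = response.read().decode("utf-8", "replace")
            result["new_url"] = response.geturl()
            result["header"] = str(response.headers)
            result["status"] = response.getcode() or 200
    except (OSError, ValueError) as error:
        # URLError and HTTPError are OSError subclasses; ValueError covers
        # malformed links. Assumption: all failures are reported as 500.
        result["content"] = str(error)

    return result
```

Calling it with a link that cannot be opened returns a dict whose "status" is 500, so callers can branch on the status the same way as in the example above.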

log(self, str, level = 0)

Sends the string to the xbmc.log function if the level provided is less than the level set in dbglevel.

Returns None

 import CommonFunctions
 common = CommonFunctions
 common.plugin = "PluginName"
 common.dbg = True
 common.dbglevel = 3

 def helloWorld():
   common.log("Ran this")
   common.log("Ignored this", 4)
   common.log("Ran this as well", 2)

 helloWorld()

Will give the following output in the xbmc.log

[PluginName] helloWorld : 'Ran this'
[PluginName] helloWorld : 'Ran this as well'
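
The level check can be sketched in plain Python 3 as follows. This is an illustration only: messages are collected in a list named logged instead of being passed to xbmc.log, and looking up the caller's name with inspect is an assumption about how the function name ends up in the output.

```python
import inspect

plugin = "PluginName"  # stand-ins for common.plugin, common.dbg, common.dbglevel
dbg = True
dbglevel = 3
logged = []            # collected here instead of calling xbmc.log


def log(message, level=0):
    # Only messages whose level is below dbglevel are kept.
    if dbg and level < dbglevel:
        caller = inspect.stack()[1].function
        logged.append("[%s] %s : '%s'" % (plugin, caller, message))


def helloWorld():
    log("Ran this")
    log("Ignored this", 4)
    log("Ran this as well", 2)


helloWorld()
```

After the call, logged holds the two messages with levels 0 and 2; the level 4 message is dropped.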

openFile(self, filepath, options = "w")

Opens a binary or text file handle

Returns filehandle

file = common.openFile("myfile.txt", "wb")
file.write("my data")
file.close()

getUserInput(self, title = "Input", default="", hidden=False)

This function raises a keyboard for user input

Returns string

search = common.getUserInput("Artist", "") # Will ask the user to write an Artist to search for

def_search = common.getUserInput("Artist", "Miley Cyrus") # Will default to Miley Cyrus if the user doesn't enter another artist.

getUserInputNumbers(self, title = "Input", default="", hidden=False)

This function raises a keyboard numpad for user input

Returns int

pin = common.getUserInputNumbers("Userpin", "") # Will ask the user to write a pin. 

def_pin = common.getUserInputNumbers("Userpin", "1234") # Will default to 1234 if the user doesn't enter another pin.

getParameters(self, dict)

Converts the request url passed on by xbmc to the plugin into a dict of key-value pairs

returns dict

params = common.getParameters(sys.argv[2]) # sys.argv[2] would be something like "?path=/root/favorites&login=true"
print repr(params) # Prints '{ "path": "/root/favorites", "login": "true" }'
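
The conversion can be approximated with the Python 3 standard library, as in the sketch below. The function name get_parameters_sketch is hypothetical, and parse_qsl also decodes percent-escapes, which the module's own getParameters may or may not do.

```python
from urllib.parse import parse_qsl


def get_parameters_sketch(parameter_string):
    """Turn a "?key=value&key2=value2" string into a dict."""
    return dict(parse_qsl(parameter_string.lstrip("?")))


params = get_parameters_sketch("?path=/root/favorites&login=true")
```

All values come back as strings, so a flag such as login has to be compared against "true" rather than True.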

replaceHTMLCodes(self, str)

Replaces HTML codes with ASCII.

Returns string

clean_string = common.replaceHTMLCodes("&amp;&quot;&hellip;&gt;&lt;&#39;")
print clean_string # Prints &"...><'
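
Outside Kodi, a similar result can be had from html.unescape in the Python 3 standard library. This is a sketch for comparison, not the module's implementation; one difference is that html.unescape turns &hellip; into the single ellipsis character rather than three dots.

```python
import html

# Decode named and numeric character references in one call.
clean_string = html.unescape("&amp;&quot;&hellip;&gt;&lt;&#39;")
```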

stripTags(self, str)

Removes all DOM elements.

Returns string

clean_string = common.stripTags("I want this text <img src='' alt='without this'>")
print clean_string # Prints "I want this text"
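
A simple regular-expression version of the same idea is sketched below in Python 3. The name strip_tags_sketch is hypothetical, and the module's own stripTags may treat whitespace or malformed markup differently.

```python
import re


def strip_tags_sketch(text):
    """Remove anything that looks like a tag: '<', non-'>' characters, '>'."""
    return re.sub(r"<[^>]*>", "", text)


clean_string = strip_tags_sketch("I want this text <img src='' alt='without this'>")
```

This sketch leaves the space before the removed tag in place, so call strip() on the result if surrounding whitespace matters.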

makeAscii(self, str)

This function implements a horrible hack related to python 2.4's terrible unicode handling.

Returns string

clean_string = common.makeAscii("test נלה מהי test")
print clean_string # Prints "test   test"
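
In Python 3 the same effect, dropping every non-ASCII character, can be sketched with an encode and decode round trip. The name make_ascii_sketch is hypothetical and the module's makeAscii may be implemented differently.

```python
def make_ascii_sketch(text):
    """Drop every character that cannot be encoded as ASCII."""
    return text.encode("ascii", "ignore").decode("ascii")


clean_string = make_ascii_sketch("test נלה מהי test")
```

Only the Hebrew letters are removed; the spaces around and between the two Hebrew words remain.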