python test.py
Traceback (most recent call last):
  File "test.py", line 13, in <module>
    test(html)
  File "test.py", line 10, in test
    html5lib.treebuilders.dom.dom2sax(dom, handler)
  File "/home/vkuznets/packages/html5lib-0.95/html5lib-0.95/html5lib/treebuilders/dom.py", line 271, in dom2sax
    for child in node.childNodes: dom2sax(child, handler, nsmap)
  File "/home/vkuznets/packages/html5lib-0.95/html5lib-0.95/html5lib/treebuilders/dom.py", line 256, in dom2sax
    del attributes[(attr.namespaceURI, attr.nodeName)]
KeyError: (None, u'xml:lang')
With previous versions (at least 0.11) there is no error. I assume this attribute may be invalid in the XML namespace, but even so I don't think it is acceptable for the parser to crash. I have seen a lot of real-world HTML documents that have this attribute.
Please advise.
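
To illustrate the failure mode in the last traceback frame, here is a minimal sketch in plain Python. It is not html5lib's code: the `attributes` dict, its contents, and both helper functions are hypothetical, and it only assumes that the map does not contain the `(None, 'xml:lang')` key at the point of deletion. It shows that `del` raises `KeyError` on a missing key, while `dict.pop` with a default does not.

```python
# Hypothetical stand-in for the attribute map; NOT html5lib's actual data.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def remove_attr_strict(attributes, key):
    # Same operation as the failing line: raises KeyError if key is absent.
    del attributes[key]


def remove_attr_tolerant(attributes, key):
    # pop() with a default returns None instead of raising when key is absent.
    return attributes.pop(key, None)


# Assumed contents: the map holds a different key than the one being deleted.
attributes = {(XML_NS, "lang"): "en"}
key = (None, "xml:lang")

try:
    remove_attr_strict(attributes, key)
    strict_failed = False
except KeyError:
    strict_failed = True

removed = remove_attr_tolerant(attributes, key)
```

Whether a tolerant deletion like this would be the right fix in dom2sax is a separate question; the sketch only shows why the strict form crashes.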
http://code.google.com/p/html5lib/issues/detail?id=200
Reported by vovanec, Mar 6, 2012