shong
2011-09-14 05:11:16

Hi fred

Thanks for sharing your solution!

Best regards
Shong

fmarin156
2011-09-13 11:42:19

Hi,

I tried to parse HTML with an XML parser (JDOM), but it's not very easy, mainly because XML has to be well-formed and HTML often isn't.

For example, a tag may be closed with a proper closing tag (<script>...</script>) or written as a self-closing tag (<script ..... />), and the XML parser stops with an error in this last case!
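To make the general problem concrete, here is a minimal sketch that feeds typical HTML to an XML parser. It uses the JDK's built-in DOM parser (`javax.xml.parsers`) instead of JDOM so that it needs no extra jar; JDOM's `SAXBuilder` also sits on top of an underlying SAX parser, so it rejects the same kind of input. The class and method names (`XmlParserOnHtml`, `isWellFormedXml`) are hypothetical, and the unclosed `<br>`/`<p>` example is just one common way HTML fails XML's well-formedness rules.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class XmlParserOnHtml {

    /** Returns true if the JDK's XML parser accepts the markup as a well-formed document. */
    public static boolean isWellFormedXml(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing "[Fatal Error]" to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            // Not well-formed: the parser gives up at the first fatal error.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Every element is closed, including the self-closing <br/>: accepted.
        String xhtml = "<html><body><p>Hello<br/>world</p></body></html>";
        // <br> and <p> are left open, which HTML tolerates but XML does not: rejected.
        String html = "<html><body><p>Hello<br>world</body></html>";
        System.out.println(isWellFormedXml(xhtml)); // true
        System.out.println(isWellFormedXml(html));  // false
    }
}
```

An HTML-oriented parser such as Jericho is designed to tolerate this kind of markup instead of stopping at the first error.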

So I looked for a "good" HTML parser and chose "Jericho"; it's reasonably well documented and fairly easy to use.

Another thing: parsing an XML stream with JDOM (and Xerces) is very slow (3 minutes for one page), while the same parsing with Jericho takes barely a few seconds.
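The thread doesn't say why JDOM/Xerces took minutes per page. One common cause, offered here only as a possibility, is that a non-validating Xerces parser still downloads the external DTD named in the page's DOCTYPE by default (for XHTML, a file on w3.org), and that download can be very slow. If that is the cause, switching off the Xerces `load-external-dtd` feature avoids the network fetch. The sketch below uses the JDK's built-in parser; JDOM's `SAXBuilder` has a `setFeature` method that should accept the same feature name. The names `NoExternalDtd` and `parseWithoutDtd` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class NoExternalDtd {

    // Xerces feature (also recognised by the JDK's bundled Xerces): when false,
    // a non-validating parser skips fetching the external DTD from the DOCTYPE.
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    public static Document parseWithoutDtd(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        factory.setFeature(LOAD_EXTERNAL_DTD, false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        // The DOCTYPE references a remote DTD; with the feature off it is never downloaded.
        String xhtml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
                + "<html><head><title>Test</title></head><body><p>Hi</p></body></html>";
        Document doc = parseWithoutDtd(xhtml);
        System.out.println(doc.getElementsByTagName("title").item(0).getTextContent()); // Test
    }
}
```

Note that with the DTD skipped, HTML entities such as `&nbsp;` are no longer defined, so pages using them will still fail to parse as XML.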

fred
