Thread XML::LibXML + HTML::TreeBuilder should abort parsing on broken HTML (19 answers)
Opened by bikus at 2010-05-04 16:21

Guest bikus
 2010-05-05 14:01
#136828
And in any case, I can also use HTML::TreeBuilder as the parser with XML::LibXML:

Code (perl):
#!/usr/bin/env perl
use warnings; 
use strict;

use HTML::TreeBuilder;
# use XML::LibXML;
# WebSource::Parser stands in for XML::LibXML here so that HTML::TreeBuilder does the HTML parsing
use WebSource::Parser;


eval {
        print "Parsing with XML::LibXML\n";
#       my $parser = XML::LibXML->new();
        my $parser = WebSource::Parser->new;

        my $doc = $parser->parse_html_string(<<'EOT');
<html>
<A>
EOT
        print $doc->toString;
};
print $@ if ($@); 



eval{
        print "\n\nParsing with HTML::TreeBuilder\n";
        my $tree = HTML::TreeBuilder->new; # empty tree
        $tree->parse(<<'EOT');
<html>
<A>
EOT
        $tree->dump; 
};
print $@ if ($@);

Last edited: 2010-05-05 14:05:58 +0200 (CEST)
