Thread I/O Operations: Applying paths to an HTML parser (17 answers)
Opened by lin at 2010-10-03 13:29

lin
 2010-10-03 21:38
#141668
User since
2010-09-26
35 posts
User
Hi Perl folks, good evening!

Here I am again: I had a go at it myself.
From the findings above I picked out the following:

my @files = File::Find::Rule->file()
->name('einzelergebnis*.html')


I cut this part out of the code snippet I was using yesterday.
Code (perl):
#!/usr/bin/perl 
use strict; 
use warnings; 
use diagnostics; 
use File::Find::Rule; 
my @files = File::Find::Rule->file() 
->name('einzelergebnis*.html') 
->in('.'); 


I carried those two small lines over into the following code as path definitions, and then ran it from the command line.


Code (perl):
#!/usr/bin/perl

use strict;
use warnings;
use diagnostics;
use File::Find::Rule;
use HTML::TokeParser;

#my $file = 'school.html'

my @files = File::Find::Rule->file()
                ->name('einzelergebnis*.html')
                ->in('.');
my $p = HTML::TokeParser->new($file) or die "Can't open: $!";
my %school;
while (my $tag = $p->get_tag('div', '/html')) {
    # first move to the right div that contains the information
    last if $tag->[0] eq '/html';
    next unless exists $tag->[1]{'id'} and $tag->[1]{'id'} eq 'inhalt_large';
    
    $p->get_tag('h1');
    $school{'location'} = $p->get_text('/h1');
    
    while (my $tag = $p->get_tag('div')) {
        last if exists $tag->[1]{'id'} and $tag->[1]{'id'} eq 'fusszeile';
        
        # get the school name from the heading
        next unless exists $tag->[1]{'class'} and $tag->[1]{'class'} eq 'fm_linkeSpalte';
        $p->get_tag('h2');
        $school{'name'} = $p->get_text('/h2');
        
        # verify format for school type
        $tag = $p->get_tag('span');
        unless (exists $tag->[1]{'class'} and $tag->[1]{'class'} eq 'schulart_text') {
            warn "unexpected format: parsing stopped";
            last;
        }
        $school{'type'} = $p->get_text('/span');
        
        # verify format for address
        $tag = $p->get_tag('p');
        unless (exists $tag->[1]{'class'} and $tag->[1]{'class'} eq 'einzel_text') {
            warn "unexpected format: parsing stopped";
            last;
        }
        $school{'address'} = clean_address($p->get_text('/p'));
        
        # find the description
        $tag = $p->get_tag('p');
        $school{'description'} = $p->get_text('/p');
    }
}

print qq/$school{'name'}\n/;
print qq/$school{'location'}\n/;
print qq/$school{'type'}\n/;

foreach (@{$school{'address'}}) {
    print "$_\n";
}

print qq/\nDescription: $school{'description'}\n/;

sub clean_address {
    my $text = shift;
    my @lines = split "\n", $text;
    foreach (@lines) {
        s/^\s+//;
        s/\s+$//;
    }
    return \@lines;    # array reference, because the caller dereferences it with @{...}
} 



Results:

# perl perl_script_four.pl

Quote
suse-linux:/usr/perl # perl perl_script_four.pl
Global symbol "$file" requires explicit package name at perl_script_four.pl line 15.
Execution of perl_script_four.pl aborted due to compilation errors (#1)
(F) You've said "use strict" or "use strict vars", which indicates
that all variables must either be lexically scoped (using "my" or "state"),
declared beforehand using "our", or explicitly qualified to say
which package the global variable is in (using "::").

Uncaught exception from user code:
Global symbol "$file" requires explicit package name at perl_script_four.pl line 15.
Execution of perl_script_four.pl aborted due to compilation errors.
at perl_script_four.pl line 73
suse-linux:/usr/perl #
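The message points at the one place where `$file` is still used: the `my $file = ...` line is commented out, and the new `@files` array is never handed to the parser. A minimal sketch of one way to connect the two, assuming each file should simply be parsed in turn. The helper `first_heading` is a made-up name for illustration and only reads the first `<h1>`; it is not the full school parser from the post.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find::Rule;
use HTML::TokeParser;

# Hypothetical helper: return the text of the first <h1> in one file,
# or undef if the file contains no <h1>.
sub first_heading {
    my ($file) = @_;
    my $p = HTML::TokeParser->new($file)
        or die "Can't open $file: $!";
    return unless $p->get_tag('h1');
    return $p->get_trimmed_text('/h1');
}

# Collect the matching files, as in the post.
my @files = File::Find::Rule->file()
                            ->name('einzelergebnis*.html')
                            ->in('.');

# $file is declared by the loop, so "use strict" is satisfied,
# and the parser gets one real file name per iteration.
for my $file (@files) {
    my $heading = first_heading($file);
    print "$file: ", (defined $heading ? $heading : '(no h1 found)'), "\n";
}
```

The existing `while` loop from the post could go inside such a sub in place of the `<h1>` lookup, with `%school` declared per file so that one result does not overwrite the previous one.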


So, first I need to work out what is going on here. There are probably quite a few errors in there...




Even if I change line 15 and write it as follows, it does not get any better:

Code (perl):
my $p = HTML::TokeParser->new('einzelergebnis*.html') or die "Can't open: $!";


Result:

Quote
suse-linux:/usr/perl # perl perl_script_four.pl
Uncaught exception from user code:
Can't open: No such file or directory at perl_script_four.pl line 15.
at perl_script_four.pl line 15
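"No such file or directory" fits the idea that a plain string is treated as one literal file name, so the `*` is never expanded. A small sketch that shows the difference with core modules only. The temporary directory and the two sample files are made up for the demonstration, and `bsd_glob` from File::Glob is used instead of the built-in `glob` only because it does not split the pattern on whitespace.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;
use File::Glob qw(bsd_glob);

# Throwaway directory with two sample files (names invented for the demo).
my $dir = tempdir(CLEANUP => 1);
for my $name ('einzelergebnis1.html', 'einzelergebnis2.html') {
    my $path = File::Spec->catfile($dir, $name);
    open my $out, '>', $path or die "Can't write $path: $!";
    print {$out} "<html><body><h1>Test</h1></body></html>\n";
    close $out or die "Can't close $path: $!";
}

# The pattern as a plain string: open() looks for a file whose name
# literally contains '*', and no such file exists.
my $pattern = File::Spec->catfile($dir, 'einzelergebnis*.html');
my $opened_literally = open(my $in, '<', $pattern) ? 1 : 0;
print "literal open: ", ($opened_literally ? "ok" : "failed ($!)"), "\n";

# bsd_glob() expands the wildcard into the matching file names.
my @matches = sort(bsd_glob($pattern));
print "glob found: $_\n" for @matches;
```

Each name in `@matches` is then an ordinary path that can be passed to `HTML::TokeParser->new` one at a time.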


I will have to keep looking. If anyone has a tip, I would be very grateful!
Last edited: 2010-10-03 21:55:57 +0200 (CEST)
