2 Dateien/Tabellen mergen/vergleichen - #155335 (Allgemeines zu Perl)

FIFO

2012-01-08 10:19

User since
2005-06-01
469 Artikel
BenutzerIn

user image

Für sowas gibt's doch Datenbanken :)

In Perl würde ich versuchen, das in eine Datenbank-artige Struktur zu packen, die mir auch später noch andere Operationen erlaubt. Das geht natürlich auch einfacher, aber es ist nicht immer gut, sich darauf zu verlassen, dass z.B. die Struktur der Dateien unveränderlich ist (Reihenfolge der Spalten, etc.).

Gruß FIFO

Code (perl): (dl )

use warnings;
use strict;

use Data::Dump qw(dump);

my %db;             # Datenbank
my $using = 'V1';   # Spalte fuer JOIN

# Daten einlesen, hier aus DATA-Segment

for my $table ('table1', 'table2') {
    my $id_col;
    while (my $line = <DATA>) {
        last if $line !~ /\S/;

        # Spaltennamen incl. Reihenfolge sichern, 
        # Spaltennummer der $using-Spalte ermitteln
        
        if (! exists $db{'col_names'}{$table}) {
            my @cols = split ' ', $line; 
            grep { $id_col = $_ if $cols[$_] eq $using } 0..$#cols;
            die "$table does not contain col '$using'\n" if ! defined $id_col;
            @{$db{'col_names'}{$table}} = @cols;
            next;
        }

        my @cols = split ' ', $line;
        my $id = $cols[$id_col];

        # Daten in Datenbanktabelle einlesen
        $db{$table}{$id} = {
            map { @{$db{'col_names'}{$table}}[$_] => $cols[$_] } 0..$#cols
        };
    }
}

# Spaltennamen fuer Differenz-Tabellen und Join-Tabelle erzeugen,
# Namen, die nur in table2 vorhanden sind, an join-Spaltennamen anhaengen

@{$db{'col_names'}{'not_in_table1'}} = @{$db{'col_names'}{'table2'}};
@{$db{'col_names'}{'not_in_table2'}} = @{$db{'col_names'}{'table1'}};
@{$db{'col_names'}{'table_join'}}    = @{$db{'col_names'}{'table1'}};


for my $col (@{$db{'col_names'}{'table2'}}) {
    if (! grep { $_ eq $col } @{$db{'col_names'}{'table_join'}}) {
        push @{$db{'col_names'}{'table_join'}}, $col;
        }
}

# Daten mergen, Datensaetze aus table2 ueberschreiben die aus table1

for my $id (keys %{$db{'table2'}}) {
    if (exists $db{'table1'}{$id}) {
        %{$db{'table_join'}{$id}} = %{$db{'table2'}{$id}};
        }
    else {
        %{$db{'not_in_table1'}{$id}} = %{$db{'table2'}{$id}};
    }
}

for my $id (keys %{$db{'table1'}}) {
    if (! exists $db{'table2'}{$id}) {
        %{$db{'not_in_table2'}{$id}} = %{$db{'table1'}{$id}};
        %{$db{'table_join'}{$id}} = %{$db{'table1'}{$id}};
    }
}

# Entkommentieren, um Datenstruktur anzusehen:
# dump(%db);

# Ausgabe

no warnings 'uninitialized';

for my $table ('table_join', 'not_in_table1', 'not_in_table2') {
    printf(
        "$table: %d Zeilen\n\n%s\n",
        scalar(keys %{$db{$table}}),
        join("\t", @{$db{'col_names'}{$table}})
    );

    for my $row (sort keys %{$db{$table}}) {
        printf(
            "%s\n",
            join("\t",
                map { $db{$table}{$row}{$_} } @{$db{'col_names'}{$table}}
            )
        );
    }
    print "\n";
}

__DATA__
 V1 V2 V3 V4 V5
ID1 aa nn zz ii
ID2 ss rr oo ll
ID4 qq mm öö pp
ID5 nn vv bb xx

 V1 V2 V3 V4 V5 V6
ID1 aa nn zz ii 45
ID3 ff xx yy tt 45
ID4 qq mm öö pp 45
ID6 uu kk nn ff 45
ID7 qq ww ee rr 45

Ausgabe:

Code: (dl )

table_join: 4 Zeilen

V1	V2	V3	V4	V5	V6
ID1	aa	nn	zz	ii	45
ID2	ss	rr	oo	ll	
ID4	qq	mm	öö	pp	45	
ID5	nn	vv	bb	xx	

not_in_table1: 3 Zeilen

V1	V2	V3	V4	V5	V6
ID3	ff	xx	yy	tt	45
ID6	uu	kk	nn	ff	45
ID7	qq	ww	ee	rr	45

not_in_table2: 2 Zeilen

V1	V2	V3	V4	V5
ID2	ss	rr	oo	ll
ID5	nn	vv	bb	xx

Editiert von FIFO: Rohdaten korrigiert, Typo
Last edited: 2012-01-09 12:57:36 +0100 (CET)

Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it? -- Brian Kernighan: "The Elements of Programming Style"