Pattern Match - #140867 (General Perl topics)

Guest TchuTchu

2010-08-23 18:02

Hello everyone,

I'm stuck at one point and just can't work out the solution (maybe I'm blind, or a bit slow). I would like to check a scalar variable
($record) against changing contents using a pattern match.
I have marked the spot in the code with *1.

The values to check for are already stored, without whitespace, in the array @ids. I have already tried quite a few variants, but unfortunately without success.
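
One thing that may be worth ruling out at this point is the ID being interpreted as a regex, or matching somewhere other than the ACCESSION line. The following is only a minimal sketch under assumptions: the helper name `record_has_accession` and the sample record fragment are made up for illustration and are not part of the script in this post. It trims the ID, quotes it with `\Q...\E`, and anchors the match to the ACCESSION line.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $record has an ACCESSION line whose
# first accession number equals $id exactly, otherwise 0.
sub record_has_accession {
    my ($record, $id) = @_;

    # Strip stray whitespace, e.g. a trailing newline left over from
    # reading the ID list without chomp
    $id =~ s/^\s+|\s+$//g;
    return 0 if $id eq '';

    # \Q...\E quotes regex metacharacters in $id, /m lets ^ match at the
    # start of every line, and (?!\S) demands whitespace or end of string
    # directly after the ID so that a mere prefix does not count.
    return $record =~ /^ACCESSION\s+\Q$id\E(?!\S)/m ? 1 : 0;
}

# Made-up record fragment, modelled on the GenBank layout
my $record = "LOCUS       AF503441                1598 bp    DNA\n"
           . "ACCESSION   AF503441\n"
           . "VERSION     AF503441.1  GI:30315242\n";

for my $id ("AF503441\n", "AF50344", "AF503441.1") {
    (my $shown = $id) =~ s/\s+$//;
    printf "%-12s %s\n", $shown,
        record_has_accession($record, $id) ? "match" : "no match";
}
```

If a plain `/$id/` never matches, printing each ID between delimiters (for example `print "[$id]\n"`) is a quick way to see whether invisible characters are still attached to it.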

I hope someone can help me. If more data is needed, just let me know.

Here is the relevant part of the code:

Code (perl):

sub get_next_record {

    my($fh) = @_;

    my($offset);
    my($record) = '';
    my($save_input_separator) = $/;

    $/ = "//\n";

    $record = <$fh>;

    $/ = $save_input_separator;

    return $record;
}

sub parse_annotation {

    my($annotation) = @_; 
    my(%results) = (  );

    while( $annotation =~ /^[A-Z].*\n(^\s.*\n)*/gm ) {
        my $value = $&;
        (my $key = $value) =~ s/^([A-Z]+).*/$1/s;
        $results{$key} = $value;
    }

    return %results;
}

sub get_annotation_and_dna {

    my($record) = @_;

    my($annotation) = '';
    my($dna) = '';

    # Now separate the annotation from the sequence data
    ($annotation, $dna) = ($record =~ /^(LOCUS.*ORIGIN\s*\n)(.*)\/\/\n/s);

    # clean the sequence of any whitespace, digits, or / characters
    #  (the / has to be written \/ in the character class, because
    #   / is the regex delimiter here, so it must be "escaped" with \)
    # The guard avoids a warning when the record did not match above
    $dna =~ s/[\s\/\d]//g if defined $dna;

    return($annotation, $dna);
}

sub parse_features {

    my($features) = @_;   # entire FEATURES field in a scalar variable

    # Declare and initialize variables
    my(@features) = ();   # used to store the individual features

    # Extract the features
    while( $features =~ /^ {5}\S.*\n(^ {21}\S.*\n)*/gm ) {

        my $feature = $&;
        push(@features, $feature);

    }

    return @features;
}

# Open library
$fh = open_file($library);
open(TT, ">", "TT.txt") or die "Cannot open TT.txt: $!";

while ($record = get_next_record($fh)){
                
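        # Split the record into annotation and DNA first, because
        # parse_annotation needs the annotation of the current record
        ($annotation, $dna) = get_annotation_and_dna($record);

        # Get the fields from the annotation of this GenBank record
        %fields = parse_annotation($annotation);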
        
        # Extract the features from the FEATURES table
        @features = parse_features($fields{'FEATURES'});
        
        
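        foreach my $id (@ids) {
        # This is where the problem is!!! *1
        # \Q...\E matches the ID literally; only the ACCESSION line is checked
                if ($record =~ /^ACCESSION\s+\Q$id\E(?!\S)/m) {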
                
                        print $id."\n";
                        print_sequence($dna, 60);
                        
                        # Print out the features
                        foreach my $feature (@features) {
 
                                # extract the name of the feature (or "feature key")
                                my($featurename) = ($feature =~ /^ {5}(\S+)/);
 
                                print TT "******** $featurename *********\n";
                                print TT $feature;
                                
                                
                        }
                }
        }
}
close TT;

//MODEDIT GwenDragon: removed duplicate CODE tag

Here is the content of the $record variable:

Code:

LOCUS       AF503441                1598 bp    DNA     linear   PLN 20-JAN-2004
DEFINITION  Nicotiana langsdorffii x Nicotiana sanderae nectarin 5 (Nec5) gene,
            partial cds.
ACCESSION   AF503441 # This is the part being searched for (number only)
VERSION     AF503441.1  GI:30315242
KEYWORDS    .
SOURCE      Nicotiana langsdorffii x Nicotiana sanderae
  ORGANISM  Nicotiana langsdorffii x Nicotiana sanderae
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; core eudicotyledons;
            asterids; lamiids; Solanales; Solanaceae; Nicotianoideae;
            Nicotianeae; Nicotiana.
...

FEATURES             Location/Qualifiers
     source          1..1598
                     /organism="Nicotiana langsdorffii x Nicotiana sanderae"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:164110"
     gene            663..>1598
                     /gene="Nec5"
     mRNA            663..>1598
                     /gene="Nec5"
     CDS             788..>1598

What I actually want to achieve is:

Check $record for the correct ID => if it is found, extract the sequence and the features for that ID from the record variable.
=> If not, take the next ID and search again, for all records and all IDs.

The point is to use the ID to determine whether I am in the right record, so that I can cut out the matching data.
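
The workflow described above could be sketched roughly as follows. This is not the script from this post but a self-contained illustration under assumptions: the two-record library text, its sequence data, the ID `ZZ999999`, and the names `%wanted` and `%dna_for` are all invented. Instead of testing every ID against every record, it reads the accession number out of each record once and looks it up in a hash built from @ids.

```perl
use strict;
use warnings;

# Made-up two-record library in GenBank-like layout (sequence data invented)
my $library_text = <<'END';
LOCUS       AF503441                  12 bp    DNA
ACCESSION   AF503441
FEATURES             Location/Qualifiers
     gene            1..12
ORIGIN
        1 acgtacgtac gt
//
LOCUS       XX000001                   8 bp    DNA
ACCESSION   XX000001
FEATURES             Location/Qualifiers
     source          1..8
ORIGIN
        1 ttttcccc
//
END

# IDs to look for, turned into a hash so each record needs only one lookup
my @ids    = ('AF503441', 'ZZ999999');
my %wanted = map { $_ => 1 } @ids;

my %dna_for;    # accession => cleaned sequence

# Read from an in-memory filehandle; a handle on a real file works the same way
open(my $fh, '<', \$library_text) or die "Cannot open in-memory library: $!";
{
    local $/ = "//\n";    # one record per read; old value restored at block exit
    while (my $record = <$fh>) {
        # Take the first accession number from the ACCESSION line
        my ($accession) = $record =~ /^ACCESSION\s+(\S+)/m or next;
        next unless $wanted{$accession};

        # The sequence is everything between the ORIGIN line and the closing //
        my ($dna) = $record =~ /^ORIGIN.*?\n(.*)^\/\/\n/ms or next;
        $dna =~ s/[\s\d]//g;    # drop whitespace and position numbers
        $dna_for{$accession} = $dna;
    }
}
close $fh;

print "$_: $dna_for{$_}\n" for sort keys %dna_for;
```

The hash lookup keeps the work per record constant regardless of how many IDs there are, and `local $/` replaces the manual save and restore of the input record separator.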
Last edited: 2010-08-24 08:57:39 +0200 (CEST)