>EM.1.1 EM.1 LK029911.1 +
CTGTCAATCCCGCACATTAAAGCATCTCTCTCCCCTTGCCTATTTTCTCGCCCCTTCCccatcctccttctctctcgctctctcctaGCATAGCTCCAAGCGCGAGGAACA
>EM.1.2 EM.1 LK029911.1 +
TCCTTTTAAGTACTGCAAACATACCTGGCGGCTTAAGCTTGCAGTATAATGTCAAGTGCTTCCCTACCACTTATTGCTTATAATTGTGATCACCCGTCAAATTGGTCGTTCTTGGAATATAGGCTCTTACTTTTAGAGCATGGTTATTTAAACAGGGGGACGAAAATCCCTCTTTGCTCAATGTAAGTCAGCTAGTTGTACTTACGTTTTGAACTCCACGTCCACACTGGAACAGCATTCGCTATGGACTGTGTCACACGACAAGCAGGTATGTTCATTT
>EM.1.3 EM.1 LK029911.1 +
CTGTCAATCCCGCACATTAAAGCATCTCTCTCCCCTTGCCTATTTTCTCGCCCCTTCCccatcctccttctctctcgctc
很多上面的這些個(gè)序列,要求是 EM.1相同的情況下,EM.1.1、EM.1.2、EM.1.3下面對(duì)的序列哪個(gè)長(zhǎng)就把那個(gè)提取出來(lái),像這個(gè)結(jié)果就要是
> EM.1
TCCTTTTAAGTACTGCAAACATACCTGGCGGCTTAAGCTTGCAGTATAATGTCAAGTGCTTCCCTACCACTTATTGCTTATAATTGTGATCACCCGTCAAATTGGTCGTTCTTGGAATATAGGCTCTTACTTTTAGAGCATGGTTATTTAAACAGGGGGACGAAAATCCCTCTTTGCTCAATGTAAGTCAGCTAGTTGTACTTACGTTTTGAACTCCACGTCCACACTGGAACAGCATTCGCTATGGACTGTGTCACACGACAAGCAGGTATGTTCATTT
下面是我寫的程序,寫的有問(wèn)題,我自己看不出,因?yàn)榈贸龅慕Y(jié)果有25個(gè)是重復(fù)的,就是提出來(lái)有25個(gè)是提取了兩次
#!perl
use strict;
use warnings;
use Data::Dumper;
die "usage $0 input output" unless @ARGV==2;
open IN,$ARGV[0];
my $zn;
my $n;
my %seq;
while(<IN>){
if(s/^>//){
my @its=split;
$n=$its[0];
$zn=$its[1];
}else{
chomp;
$seq{$zn}{$n}.=$_;
}
}
close IN;
#print Dumper(%seq),"\n";
open OUT,">","$ARGV[1]";
my %seq1;
my %seq2;
my $seq;
foreach my $zw (keys %seq){
# my $seq=$seq{$zw};
my @length;
foreach my $w (keys %{$seq{$zw}}){
$seq=$seq{$zw}{$w};
# $seq="\U$seq";
my $length=length($seq);
push @length,$length;
$seq1{$zw}{$w}{$length}{$seq}=1;
}
# print Dumper(%seq1);
my $longest=$length[-1];
foreach(@length){
$longest=$_ if $_ > $longest;
}
foreach my $w (keys %{$seq1{$zw}}){
foreach (keys %{$seq1{$zw}{$w}}){
if(exists $seq1{$zw}{$w}{$longest}){
my $seqs=Dumper(keys %{$seq1{$zw}{$w}{$longest}});
my @seqs=keys %{$seq1{$zw}{$w}{$longest}};
# print "@seqs\n";
print OUT ">$zw\n@seqs\n";
}
}
}
} 作者: sunzhiguolu 時(shí)間: 2015-09-28 15:27 本帖最后由 sunzhiguolu 于 2015-09-28 17:32 編輯