Attachment: test.zip (5.03 KB, uploaded 2013-03-15 11:11)
Original files:
文件1.txt
Entrez_Gene_Id Tumor_Sample_Barcode
182 TCGA-02-0043-01A-01W
220 TCGA-02-0089-01A-01W
220 TCGA-02-0028-01A-01W
286 TCGA-02-0083-01A-01W
286 TCGA-02-0028-01A-01W
287 TCGA-02-0015-01A-01W-0318-08
287 TCGA-08-0354-01A-01W-0318-08
287 TCGA-02-0064-01A-01W-0206-08
301 TCGA-02-0028-01A-01W
310 TCGA-02-0083-01A-01W-0206-08
324 TCGA-02-0015-01A-01W-0318-08
472 TCGA-02-0010-01A-01W-0189-08
472 TCGA-02-0114-01A-01W-0206-08
473 TCGA-02-0114-01A-01W
477 TCGA-02-0083-01A-01W
529 TCGA-02-0083-01A-01W-0206-08
——————————————————————————————————
文件2.txt
Term Genes
hsa05200:Pathways in cancer 1436, 6469, 5579, 6772, 1956, 5578, 2033, 862, 3082, 1029, 2335, 867, 7472, 3320, 2956, 7048, 4233, 5159, 5601, 2064, 5728, 675, 2735, 5156, 1441, 3815, 324, 2322, 4193, 5925, 2737, 5727, 7157, 2260, 5290, 4089, 5294, 5295, 999
hsa04510:Focal adhesion 5579, 1956, 5578, 3082, 2335, 1289, 4233, 5159, 5601, 57144, 2064, 1281, 5728, 5156, 3371, 2321, 85366, 9564, 3791, 5170, 3690, 1301, 3611, 1793, 5290, 1277, 2909, 5294, 5295, 7057
hsa05218:Melanoma 5728, 5156, 1956, 3082, 1029, 4193, 5925, 7157, 2260, 5290, 5294, 5295, 4233, 5159, 999
hsa05213:Endometrial cancer 5290, 2064, 5728, 1956, 2309, 5170, 324, 3611, 5294, 5295, 7157, 999
hsa05214:Glioma 5728, 5579, 5156, 1956, 5578, 1029, 4193, 5925, 7157, 5290, 5294, 5295, 5159
hsa05215:Prostate cancer 5728, 2064, 5156, 1956, 2033, 5170, 4193, 5925, 7157, 2260, 5290, 3320, 5294, 5295, 5159
hsa05223:Non-small cell lung cancer 5290, 2064, 5579, 5578, 1956, 2309, 1029, 5170, 5925, 5294, 5295, 7157
hsa05212:Pancreatic cancer 5601, 2064, 675, 6772, 1956, 1029, 5925, 7157, 5290, 4089, 7048, 5294, 5295
hsa05210:Colorectal cancer 5601, 5156, 1956, 324, 7157, 5290, 4089, 2956, 7048, 5294, 5295, 4233, 5159
hsa04070:Phosphatidylinositol signaling system 5290, 5728, 5579, 5578, 3710, 79837, 5287, 8527, 5288, 5294, 5286, 5295
————————————————————————————————————————————————————————————————————
Goal:
Term Genes
hsa05200:Pathways in cancer TCGA-02-0084-01A-01W TCGA-08-0390-01A-01W TCGA-06-0185-01A-01W-0254-08 TCGA-06-0185-01A-01W TCGA-02-0010-01A-01W-0189-08 TCGA-02-0083-01A-01W-0206-08 TCGA-02-0028-01A-01W TCGA-02-0083-01A-01W-0206-08 TCGA-08-0390-01A-01W TCGA-02-0114-01A-01W TCGA-02-0114-01A-01W-0206-08 TCGA-02-0083-01A-01W-0206-08 TCGA-02-0099-01A-01W TCGA-02-0083-01A-01W TCGA-02-0043-01A-01W TCGA-02-0014-01A-01W-0189-08 TCGA-02-0010-01A-01W TCGA-02-0083-01A-01W TCGA-02-0014-01A-01W-0189-08 TCGA-02-0113-01A-01W TCGA-06-0213-01A-01W-0254-08 TCGA-02-0014-01A-01W-0189-08 TCGA-06-0133-01A-02W-0224-08 TCGA-08-0347-01A-01W-0318-08 TCGA-02-0014-01A-01W-0189-08 TCGA-02-0083-01A-01W-0206-08 TCGA-02-0015-01A-01W-0318-08 TCGA-06-0125-01A-01W TCGA-02-0085-01A-01W-0206-08 TCGA-06-0206-01A-01W-0254-08 TCGA-02-0083-01A-01W TCGA-06-0125-01A-01W TCGA-06-0188-01A-01W TCGA-08-0245-01A-01W-0318-08 TCGA-06-0201-01A-01W TCGA-02-0028-01A-01W-0189-08 TCGA-02-0114-01A-01W TCGA-02-0048-01A-01W TCGA-06-0210-01A-01W
hsa05218:Melanoma TCGA-06-0213-01A-01W-0254-08 TCGA-08-0347-01A-01W-0318-08 TCGA-02-0010-01A-01W-0189-08 TCGA-08-0390-01A-01W TCGA-02-0114-01A-01W TCGA-02-0085-01A-01W-0206-08 TCGA-06-0206-01A-01W-0254-08 TCGA-06-0188-01A-01W TCGA-08-0245-01A-01W-0318-08 TCGA-06-0201-01A-01W TCGA-02-0114-01A-01W TCGA-02-0048-01A-01W TCGA-02-0010-01A-01W TCGA-02-0083-01A-01W TCGA-06-0210-01A-01W
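My code is as follows:

#!/usr/bin/perl
use strict;
use warnings;

# $ARGV[0] = 文件1.txt, $ARGV[1] = 文件2.txt, $ARGV[2] = output file
open(my $in1, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
open(my $in2, '<', $ARGV[1]) or die "Cannot open $ARGV[1]: $!";
open(my $out, '>', $ARGV[2]) or die "Cannot open $ARGV[2]: $!";

# Build gene ID -> sample barcode from file 1 (a hash keeps one value per key).
my %h;
while (<$in1>) {
    chomp;
    my @m = split /\t/, $_;
    $h{$m[0]} = $m[1];
}

# For each term in file 2, replace every gene ID that has a barcode in %h.
while (<$in2>) {
    chomp;
    my @a = split /\t/, $_;
    my @m = split /, /, $a[1];
    for (0 .. $#m) {
        $m[$_] = $h{$m[$_]} if exists $h{$m[$_]};
    }
    print $out "$a[0]\t@m\n";
}
————————————————————————————————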
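Problems:
1. In 文件1.txt the first column (gene ID) contains repeats, but the result maps each gene ID to only one barcode.
For example: 286 TCGA-02-0083-01A-01W
286 TCGA-02-0028-01A-01W
Only one of these two barcodes appears in the result (with this code, the one read last); the other is missing.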
2. The result contains duplicates.
For example, in hsa05200:Pathways in cancer this barcode appears twice:
TCGA-02-0014-01A-01W-0189-08
TCGA-02-0014-01A-01W-0189-08
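Both problems come from how the data is stored. A Perl hash keeps one value per key, so `$h{$m[0]} = $m[1]` overwrites the earlier barcode whenever a gene ID repeats; and several genes in one term can map to the same barcode, so collecting barcodes without checking for repeats produces duplicates. A minimal sketch of the two usual idioms, a hash of arrays and a `%seen` hash, run on three lines of 文件1.txt inlined as a string (the variable names are illustrative, not from the original post):

```perl
use strict;
use warnings;

# Three tab-separated lines from 文件1.txt, inlined for illustration.
my $file1 = "286\tTCGA-02-0083-01A-01W\n"
          . "286\tTCGA-02-0028-01A-01W\n"
          . "301\tTCGA-02-0028-01A-01W\n";

# Read the string through an in-memory filehandle, as if it were a file.
open(my $in, '<', \$file1) or die "Cannot open in-memory file: $!";

# Hash of arrays: each gene ID maps to an array reference, so a repeated
# gene ID appends its barcode instead of overwriting the previous one.
my %samples_for;
while (my $line = <$in>) {
    chomp $line;
    my ($gene_id, $barcode) = split /\t/, $line;
    push @{ $samples_for{$gene_id} }, $barcode;
}
close $in;

# Collect the barcodes of genes 286 and 301. TCGA-02-0028-01A-01W belongs
# to both, so %seen keeps only its first occurrence.
my (%seen, @barcodes);
for my $gene_id (286, 301) {
    for my $barcode (@{ $samples_for{$gene_id} }) {
        push @barcodes, $barcode unless $seen{$barcode}++;
    }
}
print "@barcodes\n";
# prints "TCGA-02-0083-01A-01W TCGA-02-0028-01A-01W"
```

`$seen{$barcode}++` returns the old count (0 the first time), so `unless` lets each barcode through exactly once while keeping the original order.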
Could anyone advise,
how should this Perl script be written?
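A minimal sketch of a complete script, not a definitive answer. It stores every barcode of a gene ID in a hash of arrays (so repeated IDs in 文件1.txt are all kept) and uses a `%seen` hash to skip barcodes already collected for the current term. It assumes both files are tab-separated with exactly one header line each, that gene IDs in 文件2.txt are separated by commas with optional spaces, and that gene IDs with no barcode in 文件1.txt should be dropped (the original code leaves such IDs in place). The sub name `map_terms` is hypothetical; the output keeps the original layout of term, a tab, then space-separated barcodes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Replace the gene IDs of each term in file 2 with all matching sample
# barcodes from file 1, listing each barcode at most once per term.
# Assumptions: tab-separated files, one header line in each file, and
# gene IDs without a barcode in file 1 are dropped from the output.
sub map_terms {
    my ($gene_fh, $term_fh, $out_fh) = @_;

    # Gene ID -> array ref of barcodes; repeated gene IDs append.
    my %samples_for;
    <$gene_fh>;    # discard the header line of file 1
    while (my $line = <$gene_fh>) {
        chomp $line;
        my ($gene_id, $barcode) = split /\t/, $line;
        next unless defined $barcode;    # skip blank or malformed lines
        push @{ $samples_for{$gene_id} }, $barcode;
    }

    # Copy the header line of file 2 unchanged.
    my $header = <$term_fh>;
    print {$out_fh} $header if defined $header;

    while (my $line = <$term_fh>) {
        chomp $line;
        my ($term, $genes) = split /\t/, $line, 2;
        next unless defined $genes;

        my (%seen, @barcodes);
        for my $gene_id (split /,\s*/, $genes) {
            # A gene ID absent from file 1 yields an empty list here.
            for my $barcode (@{ $samples_for{$gene_id} || [] }) {
                push @barcodes, $barcode unless $seen{$barcode}++;
            }
        }
        print {$out_fh} "$term\t@barcodes\n";
    }
}

# Same command line as the original: file 1, file 2, output file.
if (@ARGV == 3) {
    open(my $in1, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
    open(my $in2, '<', $ARGV[1]) or die "Cannot open $ARGV[1]: $!";
    open(my $out, '>', $ARGV[2]) or die "Cannot open $ARGV[2]: $!";
    map_terms($in1, $in2, $out);
    close $out or die "Cannot close $ARGV[2]: $!";
}
```

It runs with the same three arguments as the original script. Keeping the lookup in a sub that takes filehandles also makes it easy to try on a few lines held in a string via an in-memory filehandle.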