This post was last edited by 小風0000 on 2016-08-16 05:07
    dat = '''
    SNPID A702Y A704Y A706Y A708Y A710Y
    ARS-BFGL-BAC-10172 CC CC CC CC CC
    ARS-BFGL-BAC-1020 CC CC CT CC CC
    '''
    names = ["SNPID", "A702Y", "A710Y"]
The data is fairly large, about 40,000 rows and 7,000 columns, and I need to pull out 800 of those columns. Does anyone have a good way to do this?
    import sys

    script, originalFN, targetFN = sys.argv
    # column names come from the header (first line) of the big file
    originalInds = open(originalFN).readline().strip().split()
    # one wanted column name per line
    targetInds = [line.strip() for line in open(targetFN)]
    targetF = open("targetInds.txt", "w")
    # find index
    idx = [originalInds.index(ind) for ind in targetInds if ind in originalInds]
    idx.insert(0, 0)  # always keep the first column (SNPID)
    # output
    for num, line in enumerate(open(originalFN)):
        print(num)
        # note: the whole line is split again for every selected column
        tmp = [line.strip().split()[i] for i in idx]
        targetF.write(" ".join(tmp) + "\n")
    targetF.close()
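This is the code I wrote: it first looks up the indices of the wanted column names, then extracts those columns from the big file one line at a time. It is a bit slow. Any help appreciated!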
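One likely reason for the slowness is that the script splits the whole 7,000-field line again for each of the 800 selected columns, and also prints a progress number for every row. A minimal Python 3 sketch of the same index-then-extract idea, splitting each line only once, might look like the following. The function name `extract_columns` is made up for illustration, and it assumes whitespace-separated fields with every row as long as the header.

```python
import io


def extract_columns(src, dst, wanted):
    """Copy column 0 plus the wanted columns from src to dst.

    src and dst are open text-mode file objects (hypothetical helper);
    wanted is a list of column names looked up in the header of src.
    """
    header = src.readline().split()
    # Build a name -> position map once, keeping the first occurrence
    position = {}
    for i, name in enumerate(header):
        position.setdefault(name, i)
    # Always keep column 0 (SNPID); skip names missing from the header
    # and skip column 0 itself so it is not written twice
    idx = [0] + [position[n] for n in wanted if position.get(n, 0) != 0]
    dst.write(" ".join(header[i] for i in idx) + "\n")
    for line in src:
        fields = line.split()  # split each line exactly once
        if fields:  # ignore blank lines
            dst.write(" ".join(fields[i] for i in idx) + "\n")
    return idx


# Try it on the small sample from the post
sample = io.StringIO(
    "SNPID A702Y A704Y A706Y A708Y A710Y\n"
    "ARS-BFGL-BAC-10172 CC CC CC CC CC\n"
    "ARS-BFGL-BAC-1020 CC CC CT CC CC\n"
)
out = io.StringIO()
extract_columns(sample, out, ["SNPID", "A702Y", "A710Y"])
print(out.getvalue())
```

For the real files, the same function could be called with `open(originalFN)` and `open("targetInds.txt", "w")` in place of the two `io.StringIO` objects.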