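I've just started learning this. My map and reduce scripts work fine on Linux when I simulate the job with pipes.
The map script doesn't need an input file; it opens the files it needs directly inside the script. Is that allowed?
Could someone help me work out whether the error is in my map script, or whether it happens because I didn't copy the required files to HDFS before running the job?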
#!/usr/bin/env python
# usage: "train" or "submit"
import sys

id2indx = {}     # maps an item/user id to its index
tot_num = 0      # next free index
indx_list = []   # (id, index) pairs in insertion order

# build index for items
iprof = open('item.txt')
for line in iprof:
    iid = int(line.split()[0])
    if iid not in id2indx:
        id2indx[iid] = tot_num
        indx_list.append((iid, tot_num))
        print('%s\t%d' % (id2indx[iid], iid))
        tot_num += 1
#for (k, v) in indx_list:
#    print('------- %d -> %d\n' % (k, v))
iprof.close()

# build index for users
uprof = open('user_profile.txt')
for line in uprof:
    uid = int(line.split()[0])
    if uid not in id2indx:
        id2indx[uid] = tot_num
        indx_list.append((uid, tot_num))
        tot_num += 1
        print('%s\t%d' % (id2indx[uid], uid))
uprof.close()
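
One way to tell the two suspected causes apart is to have the script report which side files it cannot see before it tries to open them. The sketch below is an assumption about how such a check could look, not part of the original map.py: `missing_files` and `check_side_files` are hypothetical helpers, and the sketch assumes item.txt and user_profile.txt are expected in the task's current working directory (for example because they were shipped with the streaming `-file` option). Messages go to stderr so they do not mix with the tab-separated records on stdout.

```python
import os
import sys

# Side files the mapper opens by relative path (names taken from map.py above).
SIDE_FILES = ['item.txt', 'user_profile.txt']


def missing_files(names, directory='.'):
    """Return the names that do not exist as regular files in `directory`."""
    return [n for n in names if not os.path.isfile(os.path.join(directory, n))]


def check_side_files(names=SIDE_FILES, directory='.'):
    """Report missing side files on stderr; return True when all are present."""
    missing = missing_files(names, directory)
    for name in missing:
        # stderr keeps diagnostics out of the mapper's key/value output.
        sys.stderr.write('missing side file: %s (cwd=%s)\n' % (name, os.getcwd()))
    return not missing
```

Calling `check_side_files()` at the top of map.py and exiting with a non-zero status when it returns False would make a missing file visible in the task's stderr log, rather than leaving only a generic failure.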
Every submission fails. Since the job doesn't need any input or output files, I just created an empty file named hello.
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-0.20.2-streaming.jar -mapper ./python/map.py -reducer ./python/reduce.py
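
For comparison, a streaming job is normally given `-input` and `-output` paths, and local scripts or data files are shipped to the task nodes with `-file`. The sketch below only assembles such an argument list and prints it; it does not run anything. `build_streaming_args` is a hypothetical helper, and the `hello` input path and `out` output path are placeholders for illustration, not values confirmed by the command above.

```python
def build_streaming_args(jar, mapper, reducer, input_path, output_path, files=()):
    """Assemble the argument list for a Hadoop Streaming job (not executed here)."""
    args = ['hadoop', 'jar', jar,
            '-input', input_path,
            '-output', output_path,
            '-mapper', mapper,
            '-reducer', reducer]
    for path in files:
        # Each -file entry is shipped with the job to the task nodes.
        args.extend(['-file', path])
    return args


# Placeholder paths for illustration only (assumptions, not from the post).
args = build_streaming_args(
    jar='$HADOOP_HOME/contrib/streaming/hadoop-0.20.2-streaming.jar',
    mapper='map.py',
    reducer='reduce.py',
    input_path='hello',
    output_path='out',
    files=['python/map.py', 'python/reduce.py', 'item.txt', 'user_profile.txt'],
)
print(' '.join(args))
```

Shipping item.txt and user_profile.txt this way is what would let the relative `open('item.txt')` calls in map.py find them in the task's working directory.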