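I am trying to build a cluster (CentOS 6.5). Following the installation guide at http://blog.csdn.net/educast/article/details/7168467, I installed Torque 2.5.13 and Maui 3.3.1. Referring to the cluster-building tutorial by Zhang Yun (張鋆) of Nankai University, I also installed mpiexec 0.84 in place of pbs_sched.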
As root on the head node, pbs_server, pbs_mom and maui all start normally, and pbs_mom also starts normally on the compute nodes. pbsnodes shows the status of every node.
However, when I submit a test job with $ qsub submit.pbs, no error appears in the terminal, yet the result file is empty. Checking first_task.o0, I found these error messages:
/usr/local/sbin/pbs_iff: error while loading shared libraries: libimf.so: cannot open shared object file: No such file or directory
mpiexec: Error: get_hosts: pbs_connect: Unauthorized Request .
Searching for libimf.so, I found copies in /opt/intel/composer_xe_2013.3.163/compiler/lib/intel64, /opt/intel/composer_xe_2013.3.163/compiler/lib/ia32 and /opt/intel/composer_xe_2013.3.163/compiler/lib/mic. I tried adding all of them to LD_LIBRARY_PATH in both /etc/profile and /etc/bashrc and sourced the files, but that still did not solve the problem.
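As an illustration only, an /etc/profile fragment of the kind described above might look like the sketch below. It is not the exact text that was added: it assumes that only the intel64 directory matters for 64-bit binaries on CentOS 6.5, and it skips the prepend when the directory is already listed, so sourcing the file repeatedly does not keep growing the variable. Such a fragment only affects shells that actually read these files, and the dynamic loader ignores LD_LIBRARY_PATH for setuid programs (pbs_iff is typically installed setuid root).

```shell
#!/bin/sh
# Sketch of an /etc/profile fragment (assumption: only the intel64
# directory is relevant for 64-bit binaries on CentOS 6.5).
INTEL_LIB=/opt/intel/composer_xe_2013.3.163/compiler/lib/intel64

# Prepend INTEL_LIB unless it is already present, so re-sourcing the
# file does not add duplicate entries.
case ":${LD_LIBRARY_PATH}:" in
  *":${INTEL_LIB}:"*) ;;  # already listed, nothing to do
  *) LD_LIBRARY_PATH="${INTEL_LIB}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" ;;
esac
export LD_LIBRARY_PATH
```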
In fact, the same "libimf.so not found" error also occurs when I run $ sudo /etc/init.d/pbs_server start. The server only starts normally after I log in with su:
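To check whether the environment is what differs between the su and sudo cases, one option is to list the libraries that ldd cannot resolve for pbs_server. The helper below is a hypothetical sketch (the name report_missing_libs is made up for illustration). Saving it as a script and running it once from the root shell obtained with su and once with sudo would show whether libimf.so is missing only under sudo, which would be consistent with sudo not passing LD_LIBRARY_PATH through by default.

```shell
#!/bin/sh
# Hypothetical helper: print the shared libraries that ldd reports as
# "not found" for the given binary, one per line (nothing if all resolve).
report_missing_libs() {
  ldd "$1" 2>/dev/null | awk '/not found/ {print $1}'
}

# Path taken from the post; adjust if pbs_server is installed elsewhere.
report_missing_libs /usr/local/sbin/pbs_server
```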
[root@magnetics weitong]# /etc/init.d/pbs_server start
/var/spool/torque/server_priv/serverdb
Starting TORQUE Server: [  OK  ]
[weitong@magnetics ~]$ sudo /etc/init.d/pbs_server start
[sudo] password for weitong:
/var/spool/torque/server_priv/serverdb
Starting TORQUE Server: /usr/local/sbin/pbs_server: error while loading shared libraries: libimf.so: cannot open shared object file: No such file or directory
[FAILED]
Attached: submit.pbs
#!/bin/sh
# Use 8 CPUs on mag02 and 8 CPUs on magnetics
#PBS -l nodes=mag02:ppn=8+magnetics:ppn=8
#PBS -q batch
#PBS -j oe
# Job name (any name will do)
#PBS -N first_task
cd /home/weitong
/usr/local/mpitorque/bin/mpiexec ./Work/Computing/hellocluster > result