I have a cluster problem that I've been struggling with for several days without getting anywhere. I'm hoping someone more experienced can point me in the right direction.

Overview:
Two nodes share one SAN storage array. The OS is RHEL 5.3, and the NFS service is configured with the cluster software that ships with Red Hat. GFS is not configured.

cluster.conf is as follows:

<?xml version="1.0"?>
<cluster alias="NFSCluster" config_version="123" name="NFSCluster">
  <fence_daemon post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="elgar" nodeid="1" votes="1">
    </clusternode>
    <clusternode name="chopin" nodeid="2" votes="1">
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
  </fencedevices>
  <rm log_level="7">
    <failoverdomains>
      <failoverdomain name="NFSCDomain" ordered="0" restricted="0">
        <failoverdomainnode name="elgar" priority="1"/>
        <failoverdomainnode name="chopin" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <ip address="10.217.212.238" monitor_link="1"/>
      <fs device="/dev/mapper/nfscserver-proj" force_fsck="0" force_unmount="1" fsid="53751" fstype="ext3" mountpoint="/proj" name="fs-proj" options="usrquota,grpquota" self_fence="1"/>
      <fs device="/dev/mapper/nfscserver-export" force_fsck="0" force_unmount="1" fsid="38296" fstype="ext3" mountpoint="/export" name="fs-export" options="usrquota,grpquota" self_fence="1"/>
      <fs device="/dev/mapper/nfscserver-alpha" force_fsck="0" force_unmount="1" fsid="25724" fstype="ext3" mountpoint="/alpha" name="fs-alpha" options="usrquota,grpquota" self_fence="1"/>
      <fs device="/dev/mapper/nfscserver-sim" force_fsck="0" force_unmount="1" fsid="47898" fstype="ext3" mountpoint="/sim" name="fs-sim" options="usrquota,grpquota" self_fence="1"/>
      <fs device="/dev/mapper/nfscserver-cad" force_fsck="0" force_unmount="1" fsid="10950" fstype="ext3" mountpoint="/cad" name="fs-cad" options="usrquota,grpquota" self_fence="1"/>
      <nfsexport name="NFS-E2K"/>
      <nfsclient name="nfsclt-proj" options="rw" path="/proj" target="*"/>
      <nfsclient name="nfsclt-sim" options="rw" path="/sim" target="*"/>
      <nfsclient name="nfsclt-cad" options="rw" path="/cad" target="*"/>
      <nfsclient name="nfsclt-home1" options="rw" path="/export/home1" target="*"/>
      <nfsclient name="nfsclt-alpha" options="rw" path="/alpha" target="*"/>
    </resources>
    <service autostart="1" domain="NFSCDomain" name="NFCServices" recovery="relocate" nfslock="1">
      <ip ref="10.217.212.238">
        <fs ref="fs-export">
          <nfsexport ref="NFS-E2K">
            <nfsclient ref="nfsclt-home1"/>
          </nfsexport>
        </fs>
        <fs ref="fs-alpha">
          <nfsexport ref="NFS-E2K">
            <nfsclient ref="nfsclt-alpha"/>
          </nfsexport>
        </fs>
        <fs ref="fs-sim">
          <nfsexport ref="NFS-E2K">
            <nfsclient ref="nfsclt-sim"/>
          </nfsexport>
        </fs>
        <fs ref="fs-cad">
          <nfsexport ref="NFS-E2K">
            <nfsclient ref="nfsclt-cad"/>
          </nfsexport>
        </fs>
        <fs ref="fs-proj">
          <nfsexport ref="NFS-E2K">
            <nfsclient ref="nfsclt-proj"/>
          </nfsexport>
        </fs>
      </ip>
    </service>
  </rm>
</cluster>

With self_fence set to 0, failover does not work properly. The corresponding log (from the active node, which is being rebooted):

Mar 26 20:25:41 chopin rpc.statd[11846]: unlink (/tmp/statd-apollo.z11831/sm.bak/10.217.212.222): Permission denied
Mar 26 20:25:41 chopin rpc.statd[11846]: unlink (/tmp/statd-apollo.z11831/sm.bak/10.217.212.254): Permission denied
Mar 26 20:25:41 chopin rpc.statd[11846]: unlink (/tmp/statd-apollo.z11831/sm.bak/10.217.212.207): Permission denied
Mar 26 20:25:43 chopin rpc.statd[11846]: Caught signal 15, un-registering and exiting.
Mar 26 20:25:43 chopin clurgmgrd: [5845]: <err> 'umount /export' failed, error=0

With self_fence set to 1, failover does not work properly either. The corresponding log (from the active node, which is being rebooted):

Mar 26 20:25:41 chopin rpc.statd[11846]: unlink (/tmp/statd-apollo.z11831/sm.bak/10.217.212.254): Permission denied
Mar 26 20:25:41 chopin rpc.statd[11846]: unlink (/tmp/statd-apollo.z11831/sm.bak/10.217.212.207): Permission denied
Mar 26 20:25:43 chopin rpc.statd[11846]: Caught signal 15, un-registering and exiting.
Mar 26 20:25:43 chopin clurgmgrd: [5845]: <err> 'umount /export' failed, error=0
Mar 26 20:25:43 chopin clurgmgrd: [5845]: <alert> umount failed - REBOOTING

At this point, the log on the backup node shows:

Mar 29 12:26:47 elgar kernel: dlm: closing connection to node 2
Mar 29 12:26:47 elgar fenced[5086]: chopin not a cluster member after 0 sec post_fail_delay
Mar 29 12:26:47 elgar openais[5066]: [CLM ] r(0) ip(10.217.212.236)
Mar 29 12:26:47 elgar clurgmgrd[6647]: <info> State change: chopin DOWN
Mar 29 12:26:47 elgar fenced[5086]: fencing node "chopin"
Mar 29 12:26:47 elgar openais[5066]: [CLM ] Members Left:
Mar 29 12:26:47 elgar fenced[5086]: fence "chopin" failed
Mar 29 12:26:47 elgar openais[5066]: [CLM ] r(0) ip(10.217.212.237)
Mar 29 12:26:47 elgar openais[5066]: [CLM ] Members Joined:
Mar 29 12:26:47 elgar openais[5066]: [CLM ] CLM CONFIGURATION CHANGE
Mar 29 12:26:47 elgar openais[5066]: [CLM ] New Configuration:
Mar 29 12:26:47 elgar openais[5066]: [CLM ] r(0) ip(10.217.212.236)
Mar 29 12:26:47 elgar openais[5066]: [CLM ] Members Left:
Mar 29 12:26:47 elgar openais[5066]: [CLM ] Members Joined:
Mar 29 12:26:47 elgar openais[5066]: [SYNC ] This node is within the primary component and will provide service.
Mar 29 12:26:47 elgar openais[5066]: [TOTEM] entering OPERATIONAL state.
Mar 29 12:26:47 elgar openais[5066]: [CLM ] got nodejoin message 10.217.212.236
Mar 29 12:26:47 elgar openais[5066]: [CPG ] got joinlist message from node 1
Mar 29 12:26:48 elgar kernel: bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
Mar 29 12:26:52 elgar fenced[5086]: fencing node "chopin"
Mar 29 12:26:52 elgar fenced[5086]: fence "chopin" failed
Mar 29 12:26:53 elgar mountd[5610]: export request from 10.217.212.230 failed.
Mar 29 12:26:53 elgar last message repeated 3 times
Mar 29 12:26:57 elgar fenced[5086]: fencing node "chopin"
Mar 29 12:26:57 elgar fenced[5086]: fence "chopin" failed
Mar 29 12:27:02 elgar fenced[5086]: fencing node "chopin"
Mar 29 12:27:02 elgar fenced[5086]: fence "chopin" failed

Could anyone tell me:
1. Why can't some of the filesystems be unmounted? Is there a way to unmount them?
2. Why does the backup node keep trying to fence? This happens whether or not we have a fencing device configured.

Thanks
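
For reference, here is a minimal sketch of how the cluster.conf posted above could be checked for nodes that have no usable fence method. It is offered as a possible way to narrow down the repeated `fence "chopin" failed` messages, not as a confirmed diagnosis. The function name `nodes_without_fencing` and the trimmed `SAMPLE` string are hypothetical and only for illustration; the element names (`clusternode`, `fence`, `method`, `device`, `fencedevices`, `fencedevice`) follow the usual cluster.conf layout.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the posted cluster.conf (only the parts relevant to fencing).
# The XML declaration is left out so the string can be parsed directly.
SAMPLE = """<cluster alias="NFSCluster" config_version="123" name="NFSCluster">
  <fence_daemon post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="elgar" nodeid="1" votes="1">
    </clusternode>
    <clusternode name="chopin" nodeid="2" votes="1">
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
  </fencedevices>
</cluster>"""


def nodes_without_fencing(conf_xml):
    """Return the names of cluster nodes that reference no defined fence device."""
    root = ET.fromstring(conf_xml)
    # Names of all fence devices declared under <fencedevices>.
    defined = {d.get("name") for d in root.findall("./fencedevices/fencedevice")}
    missing = []
    for node in root.findall("./clusternodes/clusternode"):
        # Devices this node refers to via <fence><method><device name="..."/>.
        used = {d.get("name") for d in node.findall("./fence/method/device")}
        if not (used & defined):
            missing.append(node.get("name"))
    return missing


if __name__ == "__main__":
    print(nodes_without_fencing(SAMPLE))  # prints ['elgar', 'chopin']
```

Run against the configuration as posted, this reports both nodes, since `<fencedevices>` is empty and neither `<clusternode>` contains a `<fence>` block. Whether that fully explains the fencing loop is an assumption that would need to be confirmed on the actual cluster.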