Cluster Failover
Cluster membership is determined simply by which nodes are connected to the rest of the cluster; there is no configuration
setting explicitly defining the list of all possible cluster nodes. Therefore, every time a node joins the cluster,
the total size of the cluster is increased and when a node leaves (gracefully) the size is decreased.
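Translation attempt: Cluster membership is determined simply by which nodes are connected to the cluster; there is no special definition of which nodes could possibly belong to it. However, each time a node joins the cluster, the cluster size increases by 1, and conversely, when a node leaves (gracefully), the size decreases by 1.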
The size of the cluster is used to determine the required votes to achieve quorum. A quorum vote is done when a
node or nodes are suspected to no longer be part of the cluster (they do not respond). This no-response timeout is the evs.suspect_timeout setting in the wsrep_provider_options (default 5 sec),
Translation attempt: The cluster size is used to determine the number of quorum votes. A quorum vote takes place whenever a node is suspected of having left the cluster (it does not respond). This no-response time is set by evs.suspect_timeout in wsrep_provider_options (default 5 seconds),
and when a node goes down ungracefully, write operations will be blocked on the cluster for slightly longer than that timeout.
Translation attempt: and if a node leaves ungracefully, write operations are locked for slightly longer than that timeout setting.
Once the node (or nodes) is determined to be disconnected, then the remaining nodes cast a quorum vote and if a
majority remain from the total nodes connected from before the disconnect, then that partition remains up.
Translation attempt: Once a node is detected as truly disconnected, the remaining nodes initiate a quorum vote, and if an elected node from before the disconnect is still alive, the previous partitioning is kept.
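The majority rule described above can be sketched in a few lines of Python. This is only an illustration, not how the cluster software implements the vote: the function name `has_quorum` is hypothetical, and the sketch assumes every node carries equal weight and that quorum means a strict majority of the nodes connected before the disconnect.

```python
def has_quorum(remaining_nodes: int, previous_size: int) -> bool:
    """Return True if the surviving partition keeps quorum.

    Assumptions (not taken from the quoted text): all nodes have equal
    weight, and quorum is a strict majority (more than half) of the
    cluster size from just before the disconnect.
    """
    if previous_size <= 0 or not 0 <= remaining_nodes <= previous_size:
        raise ValueError("invalid node counts")
    return 2 * remaining_nodes > previous_size


# One node of three fails ungracefully: two of three remain.
print(has_quorum(2, 3))  # True
# One node of two fails ungracefully: one of two remains.
print(has_quorum(1, 2))  # False
```

Note that the comparison is made against the size before the disconnect, which is why a graceful leave (which shrinks the cluster size first) behaves differently from a crash.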
In the case of a network partition, some nodes will be alive and active on each side of the network disconnect.
?????
In this case, only the partition with quorum will continue; the partition(s) without quorum will go to the non-Primary state.
?????
Because of this, it’s not possible to safely have automatic failover in a 2-node cluster,
Translation attempt: Because a two-node cluster cannot fail over properly,
because the failure of one node will cause the remaining node to go non-Primary.
Translation attempt: because the failure of one node will cause the remaining node to become ????? (non-primary?)
Further, a cluster with an even number of nodes (say two nodes in two different switches)
Translation attempt: Further, a cluster with an even number of nodes (called two nodes in different on/off switches)
has some possibility of a split brain condition: if network connectivity is lost between the two partitions,
Translation attempt: If the two partitions lose connectivity, the conditions for split brain arise,
neither would retain quorum, and so both would go to Non-Primary.
Translation attempt: neither can maintain quorum, so all of them become ?????
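To see why an even split leaves no side running, here is a small Python sketch that labels each side of a network partition. The function name `partition_states` is hypothetical, and the sketch assumes equal node weights and a strict-majority quorum.

```python
def partition_states(partition_sizes):
    """Label each partition 'Primary' or 'non-Primary' after a network split.

    Assumptions (illustrative only): all nodes have equal weight, and a
    partition stays Primary only if it holds a strict majority of the
    nodes that were connected before the split.
    """
    total = sum(partition_sizes)
    return [
        "Primary" if 2 * size > total else "non-Primary"
        for size in partition_sizes
    ]


# Two nodes on two switches, link between the switches lost.
print(partition_states([1, 1]))  # ['non-Primary', 'non-Primary']
# Three nodes, one isolated from the other two.
print(partition_states([2, 1]))  # ['Primary', 'non-Primary']
```

With an even split no partition can hold more than half of the nodes, so every side drops to non-Primary and nothing fails over automatically.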
Therefore: for automatic failover,
the “rule of 3s” is recommended. It applies at various levels of infrastructure, depending on how far the cluster is spread
out to avoid single points of failure. For example:
• A cluster on a single switch should have 3 nodes
• A cluster spanning switches should be spread evenly across at least 3 switches
• A cluster spanning networks should span at least 3 networks
• A cluster spanning data centers should span at least 3 data centers
All of this is meant to keep split brain situations from breaking automatic failover.
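The rule of 3s can be checked against a planned layout with a short Python sketch. The function name and the switch and data center labels below are hypothetical; the sketch assumes equal node weights, a strict-majority quorum, and that losing a switch, network, or data center takes down all of its nodes ungracefully at once.

```python
from collections import Counter


def survives_any_single_domain_failure(node_domains):
    """Check whether the cluster keeps quorum if any one failure domain is lost.

    node_domains holds one label per node naming the switch, network, or
    data center that node sits in (labels are illustrative).
    Assumptions: equal node weights, strict-majority quorum, and a domain
    failure removes all of its nodes ungracefully at the same time.
    """
    if not node_domains:
        raise ValueError("cluster has no nodes")
    total = len(node_domains)
    nodes_per_domain = Counter(node_domains)
    return all(
        2 * (total - lost) > total for lost in nodes_per_domain.values()
    )


# Three nodes spread evenly across three switches.
print(survives_any_single_domain_failure(["sw1", "sw2", "sw3"]))  # True
# Four nodes across two switches: losing either switch leaves exactly half.
print(survives_any_single_domain_failure(["sw1", "sw1", "sw2", "sw2"]))  # False
# Three nodes across two data centers: losing dc1 leaves one of three.
print(survives_any_single_domain_failure(["dc1", "dc1", "dc2"]))  # False
```

The last case shows why node count alone is not enough: three nodes still need three failure domains to survive the loss of any one of them.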
---------------------------
I got stuck translating some parts of the text, and overall I feel I have not grasped the meaning. Corrections and pointers are welcome.