Distributed System XI
时间过的太快了真可怕,假期就要结束了我还啥也没做。这一篇Byzatine Fault Tolerance,拜占庭错误,以及检测方式,PBFT,Daniel这一个讲过很多很多遍啊因为每次我都去OH然后每次都有人问于是窝听了很多遍233333。因为有Daniel的板书所以单独占一篇(并不)
Background
Failure Models
- A system is k fault tolerant if it can survive faults in k components and still meet its specifications.
Some faults:
- Communication:
- Message transmission can be unreliable
- Time taken to deliver a message is unbounded
- Adversary can intercept messages
- Processes
- Can fail or team up to produce wrong results
Byzantine Fault Tolerance (Lamport)
Byzantine Agreement problem
System of N processes, where each process i will provide a value vi to each other. Some number of these processes may be incorrect.
Goal: Each process learn the true values sent by each of the correct processes
Assumptions:
- Every message that is sent is delivered correctly
- The receiver knows who sent the message
- Message delivery time is bounded
Each process sends its value to the other processes. Correct processes send the same (correct) value to all. Faulty processes may send different values to each if desired (or no message).
Byzantine General Problem
The Problem: “Several divisions of the Byzantine army are camped outside an enemy city, each division commanded by its own general. After observing the enemy, they must decide upon a common plan of action. Some of the generals may be traitors, trying to prevent the loyal generals from reaching agreement.”
Goal:
- All loyal lieutenants obey the same order.
- If the commanding general is loyal, the every loyal lieutenant obeys the order he sends.
Assumptions:
- Every message that is sent is delivered correctly
- The receiver of a message knows who sent it
- The absence of a message can be detected.
对于传统的RSM(replicated state machine), 出现的错误是node crash/network partitions, 不会出现像拜占庭错误这样随机发送的错误。因此需要2f + 1个拷贝就可以达到f fault tolerant. 但是对于拜占庭错误存在的情况下,则需要3f + 1个拷贝。
Paxos under Byzantine faults: 由于拜占庭错误而无法达成一致
Phase1 | Phase2 |
---|---|
在f个fault的情况下,每个quorum的size必须不大于N - f, 这里的quorum应该是指达成一致的server最小数量。而为了正确性,两个quorum的重合部分必须不小于f + 1,因此得到N >= 3f + 1.
Quorum >= N - f | Servers >= 3f + 1 |
---|---|
Agreement in Fault Systems
Circumstances under which distributed agreement can be reached. Most distributed systems assume:
- processes behave asynchronously
- messages are unicast
- communication delays are unbounded (see red blocks)
Example:
- Phase 1: Generals announce their troop strengths to each other
- Phase 2: Each process uses the messages to create a vector of responses
- Phase 3: Each process sends its vector to all other processes.
- Phase 4: Each process the information received from every other process to do its computation.
Phase1 | Phase1 | Phase1 |
---|---|---|
Phase2 | Phase3,4 |
---|---|
Signed messages
Additional assumptions:
- A loyal general’s signature cannot be forged and any alteration of the contents of the signed message can be detected.
- Anyone can verify the authenticity of a general’s signature.
Algorithm SM(m): solve Byzantine General’s problem if at most m traitors
For each i:
- If lieutenant i receives a message of the form v:0 from the commander and he has not received any order, then he lets Vi equal {v} and he sends v:0:i to every other lieutenant.
- If lieutenant i receives a message of the form v:0:j1:…: jk and v is not in the set Vi then he adds v to Vi and if k < m, he sends the message v:0:j1:…: other than j1,…, jk jk:i to every other lieutenant
- For each i: When lieutenant i will receive no more messages, he obeys the order in choice(Vi).
利用签名,在f fault tolerance要求下,N >= 2f + 1. 如果lieutenant发生拜占庭错误签名会发生改变,如果general发生拜占庭错误lieutenants收到的签名会改变。
Practical Byzantine Fault Tolerance
Replicate service across many nodes:
- Assumption: only a small fraction of nodes are Byzantine
- Rely on a super computation majority of votes to decide on correct
- Makes some weak synchrony (message delay) assumptions to ensure liveness
PBFT property: tolerates <=f failures using a RSM with 3f+1 replicas
Main ideas:
- Static configuration (same 3f+1 nodes)
- Deal with malicious primary: 3 phase protocol to agree on sequence number
- Deal with loss of agreement: a bigger quorum (2f+1 out of 3f+1 nodes)
- Need to authenticate communications
Strategy:
- Primary runs the protocol in the normal case
- Replicas watch the primary and do a view change if it fails
Normal case:
- Primary sends pre-prepare message to all
- Pre-prepare contains <v#, seq#, op>
- Replicas check the pre-prepare and if it is ok
- All to all communication
- Replicas wait for 2f+1 matching prepares
- Record operation in log as prepared
- Send commit message to all
- Commit contains <i,v#, seq# ,op>
- Achieve:
- All honest nodes that are prepared prepare the same value
- At least f+1 honest nodes have sent prepare/pre prepare
PBFT的一般情况,每个replica要等待至少2f + 1个prepares,在f个错误的情况下才能达成一致。达成一致的replica才能发送commit。因此client等待至少f + 1个replies能得到一致的结果。
Limitations:
- Expensive
- Protection is achieved only when <= f nodes fail
- Does not prevent many types of attacks
Daniel板书
参考资料: