Distributed System XI

时间过的太快了真可怕，假期就要结束了我还啥也没做。这一篇Byzatine Fault Tolerance，拜占庭错误，以及检测方式，PBFT，Daniel这一个讲过很多很多遍啊因为每次我都去OH然后每次都有人问于是窝听了很多遍233333。~~因为有Daniel的板书所以单独占一篇(并不)~~

Background

Failure Models

A system is k fault tolerant if it can survive faults in k components and still meet its specifications.

Some faults:

Communication:
- Message transmission can be unreliable
- Time taken to deliver a message is unbounded
- Adversary can intercept messages
Processes
- Can fail or team up to produce wrong results

Byzantine Fault Tolerance (Lamport)

Byzantine Agreement problem

System of N processes, where each process i will provide a value vi to each other. Some number of these processes may be incorrect.

Goal: Each process learn the true values sent by each of the correct processes

Assumptions:

Every message that is sent is delivered correctly
The receiver knows who sent the message
Message delivery time is bounded

Each process sends its value to the other processes. Correct processes send the same (correct) value to all. Faulty processes may send different values to each if desired (or no message).

Byzantine General Problem

The Problem: “Several divisions of the Byzantine army are camped outside an enemy city, each division commanded by its own general. After observing the enemy, they must decide upon a common plan of action. Some of the generals may be traitors, trying to prevent the loyal generals from reaching agreement.”

Goal:

All loyal lieutenants obey the same order.
If the commanding general is loyal, the every loyal lieutenant obeys the order he sends.

Assumptions:

Every message that is sent is delivered correctly
The receiver of a message knows who sent it
The absence of a message can be detected.

对于传统的RSM(replicated state machine), 出现的错误是node crash/network partitions, 不会出现像拜占庭错误这样随机发送的错误。因此需要2f + 1个拷贝就可以达到f fault tolerant. 但是对于拜占庭错误存在的情况下，则需要3f + 1个拷贝。

Paxos under Byzantine faults: 由于拜占庭错误而无法达成一致

Phase1	Phase2

在f个fault的情况下，每个quorum的size必须不大于N - f, 这里的quorum应该是指达成一致的server最小数量。而为了正确性，两个quorum的重合部分必须不小于f + 1，因此得到N >= 3f + 1.

Quorum >= N - f	Servers >= 3f + 1

Agreement in Fault Systems

Circumstances under which distributed agreement can be reached. Most distributed systems assume:

processes behave asynchronously
messages are unicast
communication delays are unbounded (see red blocks)

Example:

Phase 1: Generals announce their troop strengths to each other
Phase 2: Each process uses the messages to create a vector of responses
Phase 3: Each process sends its vector to all other processes.
Phase 4: Each process the information received from every other process to do its computation.

Phase1	Phase1	Phase1

Phase2	Phase3,4

Signed messages

Additional assumptions:

A loyal general’s signature cannot be forged and any alteration of the contents of the signed message can be detected.
Anyone can verify the authenticity of a general’s signature.

Algorithm SM(m): solve Byzantine General’s problem if at most m traitors

For each i:

If lieutenant i receives a message of the form v:0 from the commander and he has not received any order, then he lets Vi equal {v} and he sends v:0:i to every other lieutenant.
If lieutenant i receives a message of the form v:0:j1:…: jk and v is not in the set Vi then he adds v to Vi and if k < m, he sends the message v:0:j1:…: other than j1,…, jk jk:i to every other lieutenant
For each i: When lieutenant i will receive no more messages, he obeys the order in choice(Vi).

利用签名，在f fault tolerance要求下，N >= 2f + 1. 如果lieutenant发生拜占庭错误签名会发生改变，如果general发生拜占庭错误lieutenants收到的签名会改变。

Practical Byzantine Fault Tolerance

Replicate service across many nodes:

Assumption: only a small fraction of nodes are Byzantine
Rely on a super computation majority of votes to decide on correct
Makes some weak synchrony (message delay) assumptions to ensure liveness

PBFT property: tolerates <=f failures using a RSM with 3f+1 replicas

Main ideas:

Static configuration (same 3f+1 nodes)
Deal with malicious primary: 3 phase protocol to agree on sequence number
Deal with loss of agreement: a bigger quorum (2f+1 out of 3f+1 nodes)
Need to authenticate communications

Strategy:

Primary runs the protocol in the normal case
Replicas watch the primary and do a view change if it fails

Normal case:

Primary sends pre-prepare message to all
Pre-prepare contains <v#, seq#, op>
Replicas check the pre-prepare and if it is ok
All to all communication
Replicas wait for 2f+1 matching prepares
- Record operation in log as prepared
- Send commit message to all
- Commit contains <i,v#, seq# ,op>
Achieve:
- All honest nodes that are prepared prepare the same value
- At least f+1 honest nodes have sent prepare/pre prepare

PBFT的一般情况，每个replica要等待至少2f + 1个prepares，在f个错误的情况下才能达成一致。达成一致的replica才能发送commit。因此client等待至少f + 1个replies能得到一致的结果。

Limitations:

Expensive
Protection is achieved only when <= f nodes fail
Does not prevent many types of attacks

Daniel板书

参考资料：

PBFT paper

MIT online lecture of BFT

FEATURED TAGS

Java Hashset Linkedhashset BFS Array DFS Hashmap Design ICC DP Stack Python Database BinarySearch ArrayList SegmentTree BIT TopologicalSort Bit BST TreeSet TreeMap Trie String LinkedList KMP D&C Greedy Math Backtracking Tree UnionFind TwoPointers C/C++ GC Web Sort Graph DS DL PP ML Sampling Minimax Heap

Distributed System XI

15640 DS11 - Byzantine Fault Tolerance