Vector Clock | Notion

https://riak.com/posts/technical/why-vector-clocks-are-hard/

这里有一个很好的关于Vector Clock的现实中的例子.

https://www.youtube.com/watch?v=Me607bDYS50

Untitled

我们通过这种方式给数据加version. 通过vector clock, 我们可以判断两个version是否有父子关系, 还是有conflict

Versioning + Vector Clock

Versioning means treating each data modification as a new immutable version of data

A vector clock is a [server, version] pair associated with a data item. It can be used to check if one version precedes, succeeds, or in conflict with others.

Assume a vector clock is represented by D([S1, v1], [S2, v2], …, [Sn, vn]), where D is a data item, v1 is a version counter, and s1 is a server number, etc. If data item D is written to server Si, the system must perform one of the following tasks. • Increment vi if [Si, vi] exists. • Otherwise, create a new entry [Si, 1].

Even though vector clocks can detect conflicts, there are two notable downsides. First, vector clocks add complexity to the client because it needs to implement conflict resolution logic. Second, the [server: version] pairs in the vector clock could grow rapidly. To fix this problem, we set a threshold for the length, and if it exceeds the limit, the oldest pairs are removed. This can lead to inefficiencies in reconciliation because the descendant relationship cannot be determined accurately. However, based on Dynamo paper [4], Amazon has not yet encountered this problem in production; therefore, it is probably an acceptable solution for most companies.

https://www.educative.io/courses/grokking-modern-system-design-interview-for-engineers-managers/versioning-data-and-achieving-configurability

For example, if there are network partitions or multiple server failures, write requests may be processed by nodes not in the top n nodes in the preference list. As a result we can have a long version like this: ([A,10],[B,4],[C,1],[D,2],[E,1],[F,3],[G,5],[H,7],[I,2],[J,2],[K,1],[L,1]). It’s a hassle to store and maintain such a long version history.

We can limit the size of the vector clock in these situations. We employ a clock truncation strategy to store a timestamp with each (node, counter) pair to show when the data item was last updated by the node. Vector clock pairs are purged when the number of (node, counter) pairs exceeds a predetermined threshold (let’s say 10). Because the descendant linkages can’t be precisely calculated, this truncation approach can lead to a lack of efficiency in reconciliation.