I am writing to discuss the fraction of missing pieces under various scenarios. We have explored two primary approaches:
- A fixed number of pieces (static).
- A number of pieces that increases over time (dynamic).
\textbf{First:}
In the first experiment, we assume a total of 1,000,000 pieces and 100,000 farmers, where each farmer can store 1,000 pieces. In this setup the replication factor is 100 (100,000 farmers times 1,000 pieces each, divided by 1,000,000 pieces), and no pieces are missing.
However, when we reduce the number of farmers to 10,000, the replication factor drops to 10. In that case, the observed fraction of missing pieces is 116/1,000,000.
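The static setup can be sketched in a few lines of Python. This is only an illustration, not the code behind the numbers above: the post does not state how a farmer picks its pieces, so the sketch assumes each farmer draws its pieces uniformly at random without repetition. The function names `replication_factor` and `simulate_static_missing` are hypothetical, and the fractions it produces need not match the reported ones.

```python
import random


def replication_factor(num_pieces, num_farmers, capacity):
    """Average number of stored copies per piece."""
    return num_farmers * capacity / num_pieces


def simulate_static_missing(num_pieces, num_farmers, capacity, seed=0):
    """Fraction of pieces stored by no farmer.

    Assumption (not from the post): every farmer samples `capacity`
    distinct pieces uniformly at random from all `num_pieces` pieces.
    """
    rng = random.Random(seed)
    covered = bytearray(num_pieces)  # covered[i] == 1 once some farmer stores piece i
    for _ in range(num_farmers):
        for piece in rng.sample(range(num_pieces), capacity):
            covered[piece] = 1
    missing = num_pieces - sum(covered)
    return missing / num_pieces


if __name__ == "__main__":
    # Scaled-down run with the same replication factor of 10 as the second case.
    print(replication_factor(10_000, 100, 1_000))
    print(simulate_static_missing(10_000, 100, 1_000))
```

Calling the functions with the full parameters from the post (1,000,000 pieces, 10,000 farmers, 1,000 pieces per farmer) also works, but takes noticeably longer in pure Python.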
\textbf{Second:}
For the dynamic scenario, where \( n \) increases progressively, we use the following scheme: the height grows by \( n_0 \) plus a uniformly random value in the range \( [\text{min}, \text{multiplier} \cdot n_0] \). In this setup, every farmer joins the system at \( n = 1000 \). The objective is to devise a strategy that reduces the likelihood of data loss.
We assume \( n_{\max} = 1{,}000{,}000 \) and \( \text{min} = 100 \).
Each farmer repeatedly computes its next height until the maximum is reached.
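The height update can be sketched as follows. This is one possible reading of the scheme, not necessarily the one used for the results below: it assumes \( n_0 \) is the farmer's current height and that the next height is \( n_0 \) plus an integer drawn uniformly from \( [\text{min}, \text{multiplier} \cdot n_0] \), stopping before \( n_{\max} \) would be exceeded. The names `height_sequence` and `latest_heights` are hypothetical, and the piece-selection step at each height is left out because the post does not specify it.

```python
import random


def height_sequence(n_start, n_max, min_step, multiplier, rng):
    """Heights visited by one farmer, starting at n_start.

    Assumed reading of the scheme: next = n0 + U[min_step, multiplier * n0],
    where n0 is the current height. Requires an integer multiplier and
    min_step <= multiplier * n_start.
    """
    heights = [n_start]
    n0 = n_start
    while True:
        nxt = n0 + rng.randint(min_step, multiplier * n0)  # inclusive on both ends
        if nxt > n_max:
            break  # the next step would pass the maximum, so stop here
        heights.append(nxt)
        n0 = nxt
    return heights


def latest_heights(num_farmers, n_start, n_max, min_step, multiplier, seed=0):
    """Last height reached by each farmer (the quantity one would histogram)."""
    rng = random.Random(seed)
    return [
        height_sequence(n_start, n_max, min_step, multiplier, rng)[-1]
        for _ in range(num_farmers)
    ]


if __name__ == "__main__":
    rng = random.Random(42)
    print(height_sequence(1_000, 1_000_000, 100, 4, rng))
```

Under this reading, a smaller multiplier gives smaller steps, so each farmer visits more heights before reaching the maximum.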
Multiplier = 4
With the multiplier set to 4 and 100,000 farmers, each able to store 1,000 pieces, the fraction of missing pieces is 9,876/1,000,000.
The figure below is a histogram of the latest height of farmers.
When the number of farmers is reduced to 10,000, the fraction becomes 87,794/1,000,000. This is expected, since the replication factor drops from 100 to 10.
Multiplier = 2
By setting the multiplier to 2, we essentially offer each farmer a broader scope for selecting pieces. This setting appears to be both more realistic and more practical.
With 100,000 farmers, the fraction of missing pieces is 0.0064% (64/1,000,000). You can see which pieces weren’t selected by any farmers.
However, with only 10,000 farmers, the fraction of missing pieces rises to 0.0583% (583/1,000,000).
In conclusion, the results suggest that a multiplier of 2 is preferable: for both 100,000 and 10,000 farmers it leaves far fewer pieces missing than a multiplier of 4. In a subsequent discussion, I will look at potential modifications to the selection rule to further mitigate data loss.
You can also find the code here: