In the previous part of this two-part article series, I introduced all ingredients that are necessary to create a reputation mechanism for distributed oracle networks. We will directly continue on our journey on how to make the data supply for smart contracts a bit more secure.
The main contribution of my Master’s thesis was the identification of possible formulas that we could use to calculate the reputation of an oracle node within a distributed oracle network. By using a Blockchain and saving oracle answers to that irreversible data structure, we get a history about all answers that an oracle node gave in the past (see Figure 1). It is possible to use that history, to calculate a reputation score for a specific oracle node and thus eventually predict the future behaviour and detect malicious nodes.
The main research questions of my thesis were:
What existing reputation mechanisms / formulas could be used for distributed oracle networks
What possible reputation dimensions / parameters could be used in that scenario? (Latency, speed,…)
What specific attack scenarios exist for a Blockchain based distributed oracle network based on existing attack scenarios for normal P2P reputation mechanisms.
Reputation mechanisms have a long history in P2P systems. I did a lot of research and identified three basis mechanisms:
Beta Reputation System: Audun Jøsang and Roslan Ismail. The beta reputation system.
Bayesian Reputation System: Wang and J. Vassileva. Bayesian network-based trust model.
Fuzzy Reputation System: Nathan Griffiths, Kuo Ming Chao, and Muhammad Younas. Fuzzy trust for peer-to-peer systems.
Maybe I will give a short introduction about these in future articles.
The first step is to identify possible parameters / reputation dimensions for defining reputation in a distributed oracle network. Some examples will make it clearer what the term reputation dimensions or parameters mean:
Time in the system (how long is a node already participating in the system)
Last activity time (when was the last answer of a node?)
Quality of the provided data (relative to other answers)
Latency (relative to other answers)
Data size (is the peer only serving small requests?)
The calculation of these parameters is straightforward:
Time in the system: Current time – first answer time
Last activity time: Current time – last answer time
Quality: Relative distance of an answer compared to the other answers. Example
Real answer: 20,
Worst answer: 10,
Answer: 15 -> distance 0.5 in the linear model
Latency: Relative latency, starting from the first answer timestamp to the node’s answer timestamp
Data size: Fixed reputation step sizes Bytes, KB, MB,…
The general known attack scenarios for reputation systems in P2P networks are:
Self-promotion: Giving yourself good ratings
Traitor: First act honestly to build a high reputation and the using this to harm the network
Whitewashing: Rejoin the network under a different identity to reset the reputation
Slandering: Give a bad rating to other participants to harm their reputation
DoS: Spam the network
Orchestrated: Combination of multiple
To test the three proposed formulas, I set up a simulation which consists of generated answers and blocks. The simulation included 100 blocks of the format as shown in Figure 2. The included parameters were already described earlier as well as the tested formulas. I defined different scenarios testing all single reputation dimensions (quality, time in the system, activity,..) and combined them later using some predefined weighting scheme.
Three examples of the reputation at certain time-steps are the time in the system (Figure 3), the quality (Figure 4) and the combined traitor scenario (Figure 4) (a peer is first providing good quality and then decreasing the quality).
Honestly, my research is just the beginning of a long journey and a very small piece. I simulated three possible formulas to calculate the reputation of an oracle node based on its answer history derived from the Blockchain. So what conclusions can we make from the findings in my thesis?
Reduction of the attack scenarios to a subset (because we use a blockchain)
Self-promotion only from formula exploitation
No collusion in the reputation distribution because the reputation is derived directly from the answer history
Whitewasher attack is still possible but related to the formula
Traitor attack is still possible
51 % attack for Blockchains to manipulate the answer history is possible
Identification of various reputation dimensions
Formulas are generally usable with some tweaks, the best result was made with an extended bayes version incorporating partial reputation
Combination of parameters is necessary but how to weight?
I know this part was heavy, but if you are really interested, I would recommend to read my thesis. The final presentation is uploaded here:
Last year was an exciting year. In October 2018 I finally graduated from my Master’s in Computer Science. The topic of my thesis was about Blockchain / Oracle Reputation Systems. When I started to find a topic, I realised the lack of research material and references in the Blockchain space. So, I had to find a topic where I could incorporate previous research papers.
The general question for me about Blockchain and the real-world applicability of smart contract was about how it is possible to incorporate external data securely. Imagine that you want to implement a smart contract based on some external data or event and somebody manipulates that data. You will possibly trigger a payment that is irreversible. As a short wrap-up the data-feeding mechanism for smart contracts is shown in Figure 1. An external computer called oracle, fetches data from an online resource and feeds this data to a smart contract. The oracle can send data continuously or respond to events that were triggered by the smart contract.
After digging through a lot of whitepapers dealing with oracle networks and possible security architectures, I realized that there is not a single solution, but we have to use small pieces that can make the whole system more secure. The main pieces that I identified are:
Using distributed oracle networks instead of single oracles
Using multiple data sources
Using trusted computing environments / hardware extensions
Using incentivation schemes (for acting honestly)
Using reputation systems that can help both decision-making and incentivation
To give you a better feeling about my thesis and provide a introduction, the articles are split in two. The first article (this one) will be an introduction about all parts that are necessary to understand my thesis. The second part will then present my methodology and results as well as a conclusion.
A smart contract is some piece of code that can enforce an agreement that is coded within it. It can be used to trigger automated payments and lives within a Blockchain. That means it is stored on all participating computing nodes and then executed redundantly on every machine. A possible architecture for the implementation of smart contracts is using a virtual machine that executes the code. A very simple example of a smart contract could be an insurance contract involving weather data. The weather report is constantly sent to the smart contract and if the weather is really bad (e.g. there is a thunderstorm), eligible customers get a compensation automatically.
External Data / Oracle Networks
As already indicated, a real smart contract needs external data (like a weather report, betting results,…). This data could come from online sources – let’s say different weather forecast agencies. As a smart contract cannot fetch data itself yet, the data must be sent proactively by external data providers. These data providers are called oracles. An oracle is an ordinary computing device fetching data and sending it to the smart contract (see again figure 1).
The main security threat for smart contracts is including external data, as this can trigger unwanted payments. For this issue, common projects such as TownCrier, ChainLink or Witnet suggest using hardware-based trusted computing architectures for oracle nodes. These hardware modules can run code in a secure hardware environment. However, you have to trust the hardware vendor to provide a secure architecture. Having Intel’s meltdown in mind, this was not the best solution for me, but maybe a small piece.
Another component is the use of multiple oracles and multiple data sources. Thinking about this, an oracle network (P2P network) can be formulated itself as a Blockchain where the results of external data is stored within the Blockchain (see Figure 2). As the Blockchain is irreversible, it is always clear, which participating node gave which answer for which request. To get a better intuition, I proposed a block format in my thesis which you can see in Figure 3. The blocks contain the data (answer) for each requested data. For simplicity, I decided to use numerical data, but it wouldn’t be a problem to expand this to text data.
Having a P2P oracle network that fetches data and stores the result into blocks, you might think: Why do we need all of this? The answer lies in identifying malicious peers. The main security threat for a P2P oracle network are malicious peers that either want to exploit incentivation schemes or harm the network by providing wrong data. Using a reputation mechanism, it could be possible to identify malicious peers or generate incentivation schemes that are based on the actual reputation of a peer. Coming back to my thesis, the main research questions was about formulas to measure reputation in a distributed oracle network and includes a simulation for honest and dishonest peers. This part will be explained in the next article.
In the first article of the two part-series, we have seen the concept of feeding data to a smart contract using oracles. As smart contracts should trigger automated payments (actually that is the idea of a programmed contract), this poses substantial security issues, regarding external data sources and oracles themselves. Solutions for this problem could be small pieces like using secure hardware architectures (mainly Intel SGX is proposed), multiple data sources and setting up a P2P oracle network where participants get an incentive for providing data. By using a Blockchain as a medium to store the node’s answers irreversibly, we get a history of a node’s answers and can use that to calculate a reputation for identifying malicious peers.