Ethereum

Master Thesis: Blockchain Reputation Oracle Networks 2

In the previous part of this two-part article series, I introduced all the ingredients that are necessary to create a reputation mechanism for distributed oracle networks. We now continue directly on our journey to make the data supply for smart contracts a bit more secure.

The main contribution of my Master’s thesis was the identification of possible formulas that we could use to calculate the reputation of an oracle node within a distributed oracle network. By using a Blockchain and saving oracle answers to that irreversible data structure, we get a history of all answers that an oracle node gave in the past (see Figure 1). We can use that history to calculate a reputation score for a specific oracle node and thus eventually predict its future behaviour and detect malicious nodes.

Figure 1: (Numeric) oracle answers saved in a Blockchain data structure.

The main research questions of my thesis were:

  • What existing reputation mechanisms / formulas could be used for distributed oracle networks?
  • What possible reputation dimensions / parameters could be used in that scenario? (Latency, speed, …)
  • What specific attack scenarios exist for a Blockchain-based distributed oracle network, given the known attack scenarios for normal P2P reputation mechanisms?

References

Reputation mechanisms have a long history in P2P systems. I did a lot of research and identified three basic mechanisms:

  • Beta Reputation System: Audun Jøsang and Roslan Ismail. The beta reputation system.
  • Bayesian Reputation System: Wang and J. Vassileva. Bayesian network-based trust model.
  • Fuzzy Reputation System: Nathan Griffiths, Kuo Ming Chao, and Muhammad Younas. Fuzzy trust for peer-to-peer systems.

Maybe I will give a short introduction about these in future articles.
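As a small taste of the first one: in its textbook form, the beta reputation system counts positive and negative outcomes and uses the expected value of the resulting beta distribution as the score. The sketch below assumes a binary good/bad rating per answer, which is not necessarily the variant simulated in the thesis, and the function names are made up for illustration.

```python
def beta_reputation(positive, negative):
    """Expected value of a Beta(positive + 1, negative + 1) distribution.

    A node without any history starts at the neutral score 0.5.
    """
    return (positive + 1) / (positive + negative + 2)


def reputation_over_time(outcomes):
    """Score after each answer; `outcomes` is a list of booleans (True = good)."""
    positive = negative = 0
    scores = []
    for good in outcomes:
        if good:
            positive += 1
        else:
            negative += 1
        scores.append(beta_reputation(positive, negative))
    return scores


print(beta_reputation(0, 0))  # 0.5, no history yet
print(beta_reputation(8, 2))  # 0.75, eight good and two bad answers
```

Because the score only depends on two counters, it can be recomputed at any time from the answer history stored in the Blockchain.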

Parameters

The first step is to identify possible parameters / reputation dimensions for defining reputation in a distributed oracle network. Some examples will make it clearer what the terms reputation dimension and parameter mean:

  • Time in the system (how long is a node already participating in the system)
  • Last activity time (when was the last answer of a node?)
  • Quality of the provided data (relative to other answers)
  • Latency (relative to other answers)
  • Data size (is the peer only serving small requests?)

The calculation of these parameters is straightforward:

  • Time in the system: Current time – first answer time
  • Last activity time: Current time – last answer time
  • Quality: Relative distance of an answer compared to the other answers. Example:
    • Real answer: 20
    • Worst answer: 10
    • Answer: 15 -> distance 0.5 in the linear model (the answer is 5 away from the real answer, the worst answer is 10 away)
  • Latency: Relative latency, starting from the first answer timestamp to the node’s answer timestamp
  • Data size: Fixed reputation steps by order of magnitude (bytes, KB, MB, …)
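The calculations in the list above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names are hypothetical, timestamps are plain numbers, the latency is normalised by the time window between the first and the last answer (the list does not say what it is relative to), and the data size steps use powers of 1024.

```python
def time_in_system(now, first_answer_time):
    """How long the node has been participating."""
    return now - first_answer_time


def last_activity(now, last_answer_time):
    """How long ago the node gave its last answer."""
    return now - last_answer_time


def quality_distance(answer, real_answer, worst_answer):
    """Linear model: 0.0 matches the real answer, 1.0 is as far off as the worst answer."""
    spread = abs(worst_answer - real_answer)
    if spread == 0:
        return 0.0  # all answers agree, nothing to penalise
    return abs(answer - real_answer) / spread


def relative_latency(answer_time, first_answer_time, last_answer_time):
    """0.0 for the fastest answer, 1.0 for the slowest (assumed normalisation)."""
    window = last_answer_time - first_answer_time
    if window == 0:
        return 0.0
    return (answer_time - first_answer_time) / window


def size_step(num_bytes):
    """Fixed steps: 0 for bytes, 1 for KB, 2 for MB, ..."""
    step = 0
    while num_bytes >= 1024:
        num_bytes //= 1024
        step += 1
    return step


print(quality_distance(answer=15, real_answer=20, worst_answer=10))  # 0.5
```

The last line reproduces the quality example from the list: the answer 15 lies halfway between the real answer 20 and the worst answer 10.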

Attack Scenarios

The generally known attack scenarios for reputation systems in P2P networks are:

  • Self-promotion: Giving yourself good ratings
  • Traitor: First act honestly to build a high reputation and then use it to harm the network
  • Whitewashing: Rejoin the network under a different identity to reset the reputation
  • Slandering: Give a bad rating to other participants to harm their reputation
  • DoS: Spam the network
  • Orchestrated: Combination of multiple attacks

Simulation

To test the three proposed formulas, I set up a simulation consisting of generated answers and blocks. The simulation included 100 blocks in the format shown in Figure 2. The included parameters and the tested formulas were already described above. I defined different scenarios that test each reputation dimension on its own (quality, time in the system, activity, …) and combined them later using a predefined weighting scheme.

Figure 2: Block format of the simulation

Examples

Three examples of the reputation at certain time steps are the time in the system (Figure 3), the quality (Figure 4) and the combined traitor scenario (Figure 5), in which a peer first provides good quality and then decreases the quality.

Figure 3: Reputation is continuously rising the longer a peer is in the system
Figure 4: A peer is providing a constant quality of 0.6 (0.4 bad quality)
Figure 5: A traitor first provides good quality (to get a high reputation) and then provides bad quality.

Conclusion

Honestly, my research is just a very small piece and the beginning of a long journey. I simulated three possible formulas to calculate the reputation of an oracle node based on its answer history derived from the Blockchain. So what conclusions can we draw from the findings in my thesis?

  • Reduction of the attack scenarios to a subset (because we use a Blockchain)
    • Self-promotion is only possible by exploiting the formula
    • No collusion in the reputation distribution because the reputation is derived directly from the answer history
    • A whitewashing attack is still possible, but depends on the formula
    • A traitor attack is still possible
    • A 51 % attack on the Blockchain to manipulate the answer history is possible
  • Identification of various reputation dimensions
  • The formulas are generally usable with some tweaks; the best result was achieved with an extended Bayesian version incorporating partial reputation
  • A combination of parameters is necessary, but how to weight them is still an open question

I know this part was heavy, but if you are really interested, I recommend reading my thesis. The final presentation is uploaded here:


Download the thesis:

https://1drv.ms/b/s!Anfdi0f-Wv4Hhugy8rf74-I51WuBng

Master Thesis: Blockchain Reputation Oracle Networks 1

Last year was an exciting year. In October 2018 I finally graduated from my Master’s in Computer Science. My thesis was about Blockchain / Oracle Reputation Systems. When I started looking for a topic, I realised how little research material and how few references exist in the Blockchain space. So, I had to find a topic where I could incorporate previous research papers.

The general question for me about Blockchain and the real-world applicability of smart contracts was how external data can be incorporated securely. Imagine that you want to implement a smart contract based on some external data or event and somebody manipulates that data. You could trigger a payment that is irreversible. As a short wrap-up, the data-feeding mechanism for smart contracts is shown in Figure 1. An external computer, called an oracle, fetches data from an online resource and feeds this data to a smart contract. The oracle can send data continuously or respond to events that were triggered by the smart contract.

Figure 1: Oracle feeding data to a smart contract

After digging through a lot of whitepapers dealing with oracle networks and possible security architectures, I realized that there is not a single solution, but we have to use small pieces that can make the whole system more secure. The main pieces that I identified are: 

  • Using distributed oracle networks instead of single oracles 
  • Using multiple data sources 
  • Using trusted computing environments / hardware extensions 
  • Using incentivization schemes (for acting honestly)
  • Using reputation systems that can help both decision-making and incentivization

To give you a better feeling for my thesis and provide an introduction, the articles are split in two. The first article (this one) is an introduction to all the parts that are necessary to understand my thesis. The second part will then present my methodology and results as well as a conclusion.

Smart Contracts 

A smart contract is some piece of code that can enforce an agreement that is coded within it. It can be used to trigger automated payments and lives within a Blockchain. That means it is stored on all participating computing nodes and then executed redundantly on every machine. A possible architecture for the implementation of smart contracts is using a virtual machine that executes the code. A very simple example of a smart contract could be an insurance contract involving weather data. The weather report is constantly sent to the smart contract and if the weather is really bad (e.g. there is a thunderstorm), eligible customers get a compensation automatically.
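To make the weather insurance example more concrete, here is a minimal sketch in plain Python. It is not written in a real smart contract language and ignores everything a Blockchain adds (redundant execution, signatures, actual currency transfers); the class, its fields and the "thunderstorm" condition string are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class WeatherInsurance:
    payout: int          # compensation per customer and bad-weather event
    insured: list        # identifiers of eligible customers
    paid: dict = field(default_factory=dict)  # customer -> total compensation so far

    def report_weather(self, condition):
        """Called with every incoming weather report."""
        if condition != "thunderstorm":
            return  # nothing to pay out for normal weather
        for customer in self.insured:
            self.paid[customer] = self.paid.get(customer, 0) + self.payout


contract = WeatherInsurance(payout=50, insured=["alice", "bob"])
contract.report_weather("sunny")
contract.report_weather("thunderstorm")
print(contract.paid)  # {'alice': 50, 'bob': 50}
```

Note that the payout happens as soon as the report arrives. The contract has no way to check whether the report is true, which is exactly why the source of that data matters so much.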

External Data / Oracle Networks 

As already indicated, a real smart contract needs external data (like a weather report, betting results, …). This data could come from online sources, let’s say different weather forecast agencies. As a smart contract cannot fetch data itself yet, the data must be sent proactively by external data providers. These data providers are called oracles. An oracle is an ordinary computing device fetching data and sending it to the smart contract (see again Figure 1).

Oracle Security 

The main security threat for smart contracts is the inclusion of external data, as this can trigger unwanted payments. To address this issue, well-known projects such as TownCrier, ChainLink or Witnet suggest using hardware-based trusted computing architectures for oracle nodes. These hardware modules can run code in a secure hardware environment. However, you have to trust the hardware vendor to provide a secure architecture. With Intel’s Meltdown vulnerability in mind, this was not the best solution for me, but maybe a small piece.

Another component is the use of multiple oracles and multiple data sources. Thinking about this, an oracle network (P2P network) can itself be formulated as a Blockchain in which the results of external data requests are stored (see Figure 2). As the Blockchain is irreversible, it is always clear which participating node gave which answer to which request. To give a better intuition, I proposed a block format in my thesis, which you can see in Figure 3. The blocks contain the data (answer) for each request. For simplicity, I decided to use numerical data, but it wouldn’t be a problem to expand this to text data.

Figure 2: Oracle Network including various data sources
Figure 3: Possible block format for distributed oracle networks
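The idea of deriving a per-node answer history from such blocks can be sketched as follows. The field names are hypothetical and only loosely follow the description above (numeric answers per request); they are not the exact block format from Figure 3.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Answer:
    node_id: str      # which oracle node answered
    request_id: str   # which data request was answered
    value: float      # numeric answer
    timestamp: int    # when the answer was given


@dataclass(frozen=True)
class Block:
    index: int
    previous_hash: str  # link to the previous block
    answers: tuple      # all answers recorded in this block


def answer_history(chain):
    """Group all answers in the chain by node, in block order."""
    history = defaultdict(list)
    for block in chain:
        for answer in block.answers:
            history[answer.node_id].append(answer)
    return dict(history)


chain = [
    Block(0, "", (Answer("node-a", "req-1", 20.0, 100), Answer("node-b", "req-1", 15.0, 101))),
    Block(1, "hash-of-block-0", (Answer("node-a", "req-2", 7.5, 200),)),
]
print(len(answer_history(chain)["node-a"]))  # 2
```

Because blocks are never changed after they are written, this history can be recomputed by anyone who holds a copy of the chain.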

Having a P2P oracle network that fetches data and stores the results in blocks, you might think: Why do we need all of this? The answer lies in identifying malicious peers. The main security threat for a P2P oracle network is malicious peers that either want to exploit incentivization schemes or harm the network by providing wrong data. Using a reputation mechanism, it could be possible to identify malicious peers or to create incentivization schemes that are based on the actual reputation of a peer. Coming back to my thesis, the main research question was about formulas to measure reputation in a distributed oracle network, and the thesis includes a simulation of honest and dishonest peers. This part will be explained in the next article.

Conclusion

In the first article of this two-part series, we have seen the concept of feeding data to a smart contract using oracles. As smart contracts should trigger automated payments (that is the whole idea of a programmed contract), this poses substantial security issues regarding both the external data sources and the oracles themselves. Solutions for this problem could be small pieces like using secure hardware architectures (mainly Intel SGX is proposed), multiple data sources and setting up a P2P oracle network where participants get an incentive for providing data. By using a Blockchain as a medium to store the nodes’ answers irreversibly, we get a history of each node’s answers and can use that to calculate a reputation for identifying malicious peers.

Blockchained Mobility Hackathon

From July 20th to 22nd, I attended the Blockchained Mobility Hackathon organized by Datarella (http://datarella.com/blockchained-mobility-hackathon/). The list of sponsors like IOTA, BMW, VW and Bosch was quite surprising as they usually do not collaborate a lot. Thank you for the super organization and sponsorship, it was an awesome experience!

Friday 20th

On Friday, the Hackathon started with an interesting panel discussion. As I still wonder about business models for decentralized Blockchain platforms, the panel participants surprised me with their idea of a shared, decentralized mobility platform. The concept is a platform where mobility providers can offer their services. The real driver for this new collaborative mindset seems to be the fear of players like Uber, which could dictate market rates because of their monopolistic position. Personally, I find it hard to believe that this is going to happen, as we are still lacking open APIs that could have been introduced 10 years ago. From an idealistic viewpoint, it would be fantastic for the customers, as it can lower negotiation costs and simplify inter-provider communication.

One funny side note for me was the talk of the Bavarian minister for Digital Agenda, Europe and Media. He stated that digitalization is happening in every governmental department. What was again the huge function set of my digital ID? ( No offence 😉 )

Saturday 21st

On Saturday everything was about pitching ideas that are connected to a mobility ecosystem. The image below shows a scenario where a person wants to travel from Munich to Berlin. All information like reservations, bookings or travel information should be stored on a distributed ledger (DL). Use cases in that scenario could be finding and paying for car sharing, travelling by train or going by air taxi. All provider information is linked over the DL, which makes inter-provider communication possible. The overall goal should be to make the trip as easy as possible. With open providers and a shared mobility platform, the customer could benefit hugely from the simplicity.

Image Source: http://datarella.com/mobility-ecosystem/

The project that I joined was an incentivization system for loyal customers. The idea is that mobility providers can offer their customers loyalty tokens (ERC20 tokens). The number of tokens offered is relative to the usage frequency. If you see that a customer’s usage is slightly decreasing, you can immediately offer more tokens. The tokens should be interchangeable between multiple mobility providers, which is a huge benefit for the customer. The difference to offering money directly (like Ether) is that the value stays in the mobility ecosystem, as the tokens can only be spent on mobility services.

Sunday 22nd

After some hacking on Saturday, we could pitch our solution on Sunday. Unfortunately, we could not win with our solution. A key feature of our case was the business model behind it. The platform that we proposed could be financed by selling our loyalty tokens to mobility providers. The mobility providers could benefit from lower advertising and customer acquisition costs as they can incentivize the customer for using their service at a time when the customer acquisition costs are not that high. The less the customer is using your service, the more it will cost you to get him back.

Further Reading

You can find more blog posts (also about other projects) here:

http://datarella.com/european-mobility-players-are-getting-serious-compete-collaborate-at-blockchainedmobility-hackathon/

https://www.wired.de/article/bei-der-blockchain-wollen-volkswagen-bmw-und-co-schneller-als-die-konkurrenz-sein

Quo Vadis Blockchain?

In the past two months, I did a really deep dive into Blockchain technology. When you read about Blockchain in the news, you get the impression that everything is ready to start. You have the choice of an endless list of Blockchain providers, and there is a huge number of startups in that space that draw your attention.

Actually, when you try to set up a simple project, you will run into a lot of problems with that technology at the moment. Here is a short list of problems and thoughts that I faced on my first baby steps with Blockchain.

1 Which Blockchain to use?

Let’s start with a really tough one. Which Blockchain technology should I use? There are thousands out there!

The problem here is that every startup will tell you that they built the best Blockchain. So one way to go is to look at the market capitalization and network usage. What makes a Blockchain secure is that a lot of (different) miners are involved that share the mining power. So if you ask yourself that question, you will arrive at Bitcoin and Ethereum, as they have the biggest networks. Other possibilities are Stellar, Neo, NEM, …

Another question could be how robust a current Blockchain technology is. Bitcoin has the longest history, and Ethereum got attacked multiple times. So my personal suggestion is to choose the Blockchain that got attacked most often, because only then can you be sure that it has already become more secure than Blockchains that were never attacked.

2 Do you really need a Blockchain?

You have to think clearly about what your final goal is. Do you really need a Blockchain for that? Keep in mind that Blockchains are very slow due to their distributed consensus. Think about using a normal database first. A Blockchain is only needed if you cannot fully trust the other participants. The most senseless Blockchains for me are private ones. If you want to do business with somebody, of course, there has to be trust. So you can program your Smart Contracts just as usual (without a Blockchain), share the code with your contractor, review it together, and you are fine. No real need for a Blockchain. The only real use case in a private scenario could be security through decentralization. But what about spreading your infrastructure all over the globe with a lot of replication instead? That gives you more control.

3 External data

A very big problem at the moment is the use of external data in a Smart Contract. As Smart Contracts have to be deterministic for the consensus, you cannot make external calls to get external data (calls are only possible to other Smart Contracts).

The only thing you can do is to build an oracle (external data service) that feeds your data into the smart contract. Of course, this is a single point of failure. So you have to use multiple oracles with multiple data sources. But what if your data source gets hacked? Do you want to trigger a payment, just because somebody fed wrong data into your contract? This is a very critical issue at the moment and a lot of startups try to work on that. At the moment, I do not know any service that is really secure. The field of external data is itself a really big area of research.
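One common way to combine the answers of several oracles is to take the median, so that a single wrong value cannot move the result arbitrarily. The following is a minimal sketch, assuming numeric answers; the function name and the minimum of three answers are arbitrary choices for illustration.

```python
from statistics import median


def aggregate(answers, min_answers=3):
    """Combine numeric oracle answers; refuses to decide on too few answers."""
    if len(answers) < min_answers:
        raise ValueError("not enough oracle answers")
    return median(answers)


# Two honest temperature readings and one manipulated value.
print(aggregate([20.1, 19.9, 500.0]))  # 20.1
```

This only helps as long as the majority of oracles and their data sources are honest and independent. If all oracles read from the same hacked source, the median is just as wrong as every single answer.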

4 Scalability

Scalability is a problem that every Blockchain has at the moment. Ethereum is just launching its hybrid Proof of Stake / Proof of Work network. Nobody really knows if this will work in the long term. So if somebody wants to sell you a Blockchain with Proof of Importance, Proof of Stake, Proof of Whatever, don’t rely on it. The only thing working (and tested) at the moment is simple, power-wasting Proof of Work.

5 Smart Contract Security

Every few months, you will read that another Smart Contract got hacked and that the attackers could steal all the money connected to that Smart Contract. At the moment, it is not really possible to prove the security of Smart Contract code. So don’t do things with an extreme amount of money. We still have to find out how to do it in a good and secure way. An audit of your Smart Contract code is absolutely mandatory to achieve a minimum level of security.

6 Privacy

Remember that everything on the Blockchain is completely public (and it has to be for the consensus algorithm and validation). No private customer data can be involved. So the main idea is to calculate parameters off-chain and just input the result into the Smart Contract. Different technologies are working on that problem, but nothing is really there.

7 Data Storage

Where to store your data? Storing it in a Smart Contract is super expensive! Some technologies where you can store your data immutably are IPFS and Ethereum Swarm. Both are still in development and not production ready (IPFS is more mature). So you have to store your data in a Cloud (and your clients have to trust you again). This also leads to a single point of failure.

8 Limited Computations

When you want to do computations in a Smart Contract, you are very limited in your capabilities. First of all, computations are expensive, because they are replicated on all miners’ nodes. Secondly, even if you are willing to pay a lot for your computations, Smart Contract languages like Ethereum’s Solidity are very limited: no real floating point numbers, no higher math functions, etc. A solution for this could be off-chain computations as proposed by TrueBit. This is a network itself and absolutely not production ready. What to do? Calculate your stuff on your normal Cloud machines and feed the data into your Smart Contract. Indeed, this is again a single point of failure and a single attack point, thus not very secure.
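The usual workaround for missing floating point numbers is fixed-point arithmetic: every value is stored as an integer that has been multiplied by a fixed scale factor. Here is a minimal sketch in Python, assuming two decimal places; the scale and the helper names are chosen for illustration only.

```python
SCALE = 100  # assumed precision: two decimal places, so 19.99 is stored as 1999


def fixed_mul(a, b):
    """Multiply two fixed-point values; the result is rounded down."""
    return a * b // SCALE


def fixed_div(a, b):
    """Divide two fixed-point values; the result is rounded down."""
    return a * SCALE // b


price = 1999  # 19.99
rate = 15     # 0.15
print(fixed_mul(price, rate))  # 299, i.e. 2.99
print(fixed_div(100, 300))     # 33, i.e. 0.33
```

Everything stays in integer arithmetic, so every node computes exactly the same result. The price is that each operation rounds, and the order of operations can change the final value.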

9 Code Immutability

In software development, we are used to fixing bugs by releasing new versions. For a Blockchain Smart Contract, this is not possible, as the code is immutable. What you can do is use a proxy Smart Contract that routes your requests to the most recent version of your contract. The question you have to ask yourself is how to move the state of a Smart Contract to a new one. Customers send their funds to the current version of a Smart Contract and “sign” that version. If you move the funds, what about the legal issues? Is it still the same contract after you updated it? This is also an open topic.
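The routing idea behind a proxy contract can be sketched in plain Python. The classes and the fee logic are hypothetical, and the hard part mentioned above, moving state and funds to the new version, is deliberately left out.

```python
class ContractV1:
    def fee(self, amount):
        return amount // 10  # bug: charges 10 % instead of the intended 5 %


class ContractV2:
    def fee(self, amount):
        return amount // 20  # fixed: 5 %


class Proxy:
    """Has a stable address and forwards calls to the current implementation."""

    def __init__(self, implementation, owner):
        self._implementation = implementation
        self._owner = owner

    def upgrade(self, caller, new_implementation):
        if caller != self._owner:
            raise PermissionError("only the owner may upgrade")
        self._implementation = new_implementation

    def fee(self, amount):
        return self._implementation.fee(amount)


proxy = Proxy(ContractV1(), owner="deployer")
print(proxy.fee(100))  # 10
proxy.upgrade("deployer", ContractV2())
print(proxy.fee(100))  # 5
```

The customers keep talking to the same proxy, but whoever is allowed to call `upgrade` can change the rules afterwards. That is exactly the trust and legal question raised above.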

10 Lacking research

In the Blockchain space, you mostly find projects with pseudo-scientific whitepapers. Don’t take these as real research papers, because they are not. The problem at the moment is that we have no real scientific foundation for the Blockchain. There are a few underlying base technologies, but scientific publications are rare. Everything is just starting.

Summary

Life in the Blockchain space is hard but also very interesting. Nothing is really ready at the moment; the only practical use is digital currencies, as in the case of Bitcoin. Everything else suffers from security issues or limited capabilities. Still, you should keep an eye on future developments, as Blockchain technologies can revolutionize the whole way we deploy and manage applications. My guess is that it will take 2-5 more years until we can really do something useful.