June 2018

Understanding Blockchain: Peer Discovery and Establishing a Connection with Python

I am always curious about how things really work. A blockchain involves many different modules and mechanisms that are worth investigating. A frequently asked question is how connections in a network like Bitcoin are established. I will walk you through the documentation and build a Python example that first finds peers in the Bitcoin network and then connects to one of them.

Node Discovery

The Bitcoin documentation covers this topic well. Everything is described here: https://en.bitcoin.it/wiki/Satoshi_Client_Node_Discovery.

When you run the Bitcoin client for the first time, you have no address database saved on your local disk, so there has to be a mechanism for connecting to the network for the first time. The documentation lists all the methods a node uses to learn about other peers. We will go through the following items to get a feeling for how it works.

  1. Nodes discover their own external address by various methods.
  2. Nodes make DNS requests to receive IP addresses.
  3. Nodes can use addresses hard-coded into the software.
  4. Nodes exchange addresses with other nodes.
  5. Nodes store addresses in a database and read that database on startup.

1. Nodes discover their own external address by various methods

Because Bitcoin is a P2P network, your client has both incoming and outgoing connections while it runs. To allow other peers to connect to you, you have to tell them your external IP address. Finding it is no different from navigating to a web page like https://whatismyipaddress.com and reading your IP. As stated in the documentation, the client tries to connect to 91.198.22.70 (checkip.dyndns.org) on port 80 (https://en.bitcoin.it/wiki/Satoshi_Client_Node_Discovery#Local_Client.27s_External_Address). Try it yourself in Python:

# Import the regex and requests libraries
import re

import requests

def get_external_ip():
    # Make a request to checkip.dyndns.org as proposed
    # in https://en.bitcoin.it/wiki/Satoshi_Client_Node_Discovery#Local_Client.27s_External_Address
    response = requests.get('http://checkip.dyndns.org', timeout=10).text

    # Filter the response with a regex for an IPv4 address (raw string avoids escape warnings)
    match = re.search(r"(?:[0-9]{1,3}\.){3}[0-9]{1,3}", response)
    if match is None:
        raise ValueError("No IPv4 address found in the response")
    return match.group()

external_ip = get_external_ip()
print(external_ip)
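The regular expression above only checks that something looks like an IPv4 address, so a string such as 999.1.1.1 would also match. A slightly stricter sketch validates each match with Python's standard ipaddress module. It assumes the checkip.dyndns.org page keeps its simple "Current IP Address: ..." HTML body; the function name parse_checkip_response and the sample response are hypothetical.

```python
import ipaddress
import re

# Four dot-separated groups of 1-3 digits; may still be out of range (e.g. 999)
IPV4_PATTERN = re.compile(r"(?:[0-9]{1,3}\.){3}[0-9]{1,3}")

def parse_checkip_response(html):
    """Return the first valid IPv4 address found in the response, or None."""
    for candidate in IPV4_PATTERN.findall(html):
        try:
            # ip_address() raises ValueError for octets above 255
            return ipaddress.ip_address(candidate)
        except ValueError:
            continue
    return None

# Hypothetical response body in the format checkip.dyndns.org is assumed to return
sample = ("<html><head><title>Current IP Check</title></head>"
          "<body>Current IP Address: 203.0.113.7</body></html>")
print(parse_checkip_response(sample))
```

Returning None instead of raising lets the caller decide whether a missing address is fatal; a node can still make outgoing connections without knowing its external address.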

2. Nodes make DNS requests to receive IP addresses.

In step 1 we got our external IP address, which we need so that we can share it with other clients. At the moment, nobody knows about us, and we have no database yet that contains peer addresses we could connect to. When we start the client for the first time, we can get such a list of peers by making DNS requests that return a bunch of addresses. The client is compiled with the following hard-coded list of DNS seeds (see https://en.bitcoin.it/wiki/Satoshi_Client_Node_Discovery#DNS_Addresses):

  • seed.bitcoin.sipa.be
  • dnsseed.bluematt.me
  • dnsseed.bitcoin.dashjr.org
  • seed.bitcoinstats.com
  • seed.bitcoin.jonasschnelli.ch
  • seed.btc.petertodd.org

If we query one of these DNS seeds, we get multiple peer addresses back. Let’s go to https://mxtoolbox.com/DNSLookup.aspx and type in

  • seed.bitcoin.sipa.be

Can you see all the A records? That is a list of peers! If we save a few of them, we can establish connections to those nodes.

You can also do this programmatically in Python (short note: this is not defensive programming, it is for educational purposes only):

# Import the socket and time libraries
import socket
import time

def get_node_addresses():
    # The list of DNS seeds as hard-coded in the Bitcoin client
    # see https://en.bitcoin.it/wiki/Satoshi_Client_Node_Discovery#DNS_Addresses
    dns_seeds = [
        ("seed.bitcoin.sipa.be", 8333),
        ("dnsseed.bluematt.me", 8333),
        ("dnsseed.bitcoin.dashjr.org", 8333),
        ("seed.bitcoinstats.com", 8333),
        ("seed.bitcoin.jonasschnelli.ch", 8333),
        ("seed.btc.petertodd.org", 8333),
    ]

    # The list where we store our found peers
    found_peers = []
    # Loop over our seed list
    for (seed_host, port) in dns_seeds:
        try:
            # Resolve the DNS seed and get its A records (IPv4 only)
            for info in socket.getaddrinfo(seed_host, port,
                                           socket.AF_INET, socket.SOCK_STREAM,
                                           socket.IPPROTO_TCP):
                # The IP address and port are at index [4][0] and [4][1]
                # for example: ('13.250.46.106', 8333)
                found_peers.append((info[4][0], info[4][1]))
        except socket.gaierror:
            # This seed could not be resolved, continue with the next one
            continue
    # Return the peers in every case, not only when an error occurred
    return found_peers

peers = get_node_addresses()
print(peers)
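DNS seeds can return the same address more than once, and getaddrinfo also returns IPv6 (AAAA) results if you pass socket.AF_UNSPEC instead of socket.AF_INET. Assuming you only want unique (ip, port) pairs in the order they were first seen, a small post-processing step could look like this. The helper name unique_peers is made up for this example, and the tuples below are hand-written in getaddrinfo's shape using documentation-range addresses.

```python
import socket

def unique_peers(addrinfo_results):
    """Reduce getaddrinfo() 5-tuples to unique (ip, port) pairs, keeping first-seen order."""
    seen = set()
    peers = []
    for _family, _type, _proto, _canonname, sockaddr in addrinfo_results:
        # sockaddr is (ip, port) for IPv4 and (ip, port, flowinfo, scope_id) for IPv6
        peer = (sockaddr[0], sockaddr[1])
        if peer not in seen:
            seen.add(peer)
            peers.append(peer)
    return peers

# Hand-written tuples in the shape getaddrinfo() returns
fake_results = [
    (socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP, "", ("192.0.2.1", 8333)),
    (socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP, "", ("192.0.2.1", 8333)),
    (socket.AF_INET6, socket.SOCK_STREAM, socket.IPPROTO_TCP, "", ("2001:db8::1", 8333, 0, 0)),
]
print(unique_peers(fake_results))
```

For a real seed you would pass socket.getaddrinfo(seed_host, 8333, socket.AF_UNSPEC, socket.SOCK_STREAM) to it; keep in mind that the rest of this post assumes IPv4 peers.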

3. Nodes can use addresses hard-coded into the software.

If none of the DNS seeds can be reached, the client falls back to a list of peer addresses that are hard-coded into the software.
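A minimal sketch of that fallback order, assuming a resolver function that returns a (possibly empty) list of (ip, port) peers: if DNS seeding yields nothing, the hard-coded list is used. The addresses below are placeholders from the 192.0.2.0/24 documentation range, not real Bitcoin nodes (the real client ships its own compiled-in list), and bootstrap_peers is a hypothetical name.

```python
# Placeholder addresses (RFC 5737 documentation range), NOT real Bitcoin seed nodes
HARDCODED_PEERS = [
    ("192.0.2.10", 8333),
    ("192.0.2.11", 8333),
]

def bootstrap_peers(resolve_dns_seeds, hardcoded=HARDCODED_PEERS):
    """Prefer peers from DNS seeds; fall back to the hard-coded list if none are found."""
    try:
        peers = resolve_dns_seeds()
    except OSError:
        # e.g. socket.gaierror (a subclass of OSError) when no DNS server is reachable
        peers = []
    # "or []" also covers a resolver that returns None
    return list(peers or []) or list(hardcoded)

print(bootstrap_peers(lambda: []))                        # falls back to the placeholders
print(bootstrap_peers(lambda: [("198.51.100.5", 8333)]))  # uses the DNS result
```

With the DNS example above, the call would be bootstrap_peers(get_node_addresses).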

4. Nodes exchange addresses with other nodes.

When another node connects to you, or you connect to another node, you exchange IP addresses. These addresses are stored in a database on your machine, together with a timestamp. The addresses a node has in its database are relayed to other connected peers. This is how your local database grows.
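To make the idea of a growing address database concrete, here is a small in-memory sketch. It assumes each relayed entry is an (ip, port, timestamp) tuple and keeps only the newest timestamp per address. The real client's address manager is considerably more involved (it sorts addresses into "new" and "tried" tables with buckets), so treat AddressBook as a hypothetical simplification.

```python
import time

class AddressBook:
    """Hypothetical, simplified peer-address store: (ip, port) -> last-seen Unix timestamp."""

    def __init__(self):
        self.entries = {}

    def add(self, ip, port, timestamp=None):
        """Insert an address, keeping the newest timestamp if it is already known."""
        if timestamp is None:
            timestamp = int(time.time())
        key = (ip, port)
        if timestamp > self.entries.get(key, -1):
            self.entries[key] = timestamp

    def add_relayed(self, addresses):
        """Merge a batch of (ip, port, timestamp) tuples received from a peer."""
        for ip, port, timestamp in addresses:
            self.add(ip, port, timestamp)

    def freshest(self, n):
        """Return up to n addresses, most recently seen first (e.g. to relay to other peers)."""
        ordered = sorted(self.entries.items(), key=lambda item: item[1], reverse=True)
        return [(ip, port, ts) for (ip, port), ts in ordered[:n]]

book = AddressBook()
book.add_relayed([("192.0.2.1", 8333, 1000), ("192.0.2.2", 8333, 2000)])
book.add_relayed([("192.0.2.1", 8333, 3000)])  # newer sighting of a known address
print(book.freshest(2))
```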

To see what an address relay message looks like, refer to:

https://en.bitcoin.it/wiki/Protocol_documentation#getaddr

https://en.bitcoin.it/wiki/Satoshi_Client_Node_Discovery#Handling_Message_.22getaddr.22
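As a rough sketch of what goes over the wire, assume the message header layout used later in this post (magic, 12-byte command, length, checksum) and the addr payload layout from the protocol documentation linked above: a var_int count followed by 30-byte entries of a 4-byte timestamp, 8-byte services, 16-byte IP and 2-byte big-endian port. Under those assumptions, building a getaddr request (which has an empty payload) and parsing an addr reply could look like this. All helper names are hypothetical and the addr payload in the example is hand-made.

```python
import hashlib
import ipaddress
import struct

MAINNET_MAGIC = bytes.fromhex("F9BEB4D9")

def build_message(command, payload):
    """Wrap a payload in the 24-byte header: magic, command, length, checksum."""
    checksum = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    return (MAINNET_MAGIC + command.encode("ascii").ljust(12, b"\x00")
            + struct.pack("<I", len(payload)) + checksum + payload)

def read_var_int(data, offset):
    """Decode a Bitcoin var_int at offset; return (value, new_offset)."""
    prefix = data[offset]
    if prefix < 0xFD:
        return prefix, offset + 1
    fmt = {0xFD: "<H", 0xFE: "<I", 0xFF: "<Q"}[prefix]
    value = struct.unpack_from(fmt, data, offset + 1)[0]
    return value, offset + 1 + struct.calcsize(fmt)

def parse_addr_payload(payload):
    """Return a list of (timestamp, services, ip, port) tuples from an addr payload."""
    count, offset = read_var_int(payload, 0)
    entries = []
    for _ in range(count):
        timestamp, services = struct.unpack_from("<IQ", payload, offset)
        raw_ip = payload[offset + 12:offset + 28]
        port = struct.unpack_from(">H", payload, offset + 28)[0]
        ip = ipaddress.IPv6Address(raw_ip)
        # Unwrap IPv4-mapped addresses (::ffff:a.b.c.d) to plain IPv4
        ip = ip.ipv4_mapped or ip
        entries.append((timestamp, services, str(ip), port))
        offset += 30
    return entries

getaddr = build_message("getaddr", b"")
print(getaddr.hex())

# A hand-made addr payload with one entry for 192.0.2.1:8333 (documentation address)
entry = (struct.pack("<IQ", 1528000000, 1) + bytes(10) + b"\xff\xff"
         + bytes([192, 0, 2, 1]) + struct.pack(">H", 8333))
print(parse_addr_payload(b"\x01" + entry))
```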

5. Nodes store addresses in a database and read that database on startup.

As all the nodes you discovered from DNS and the relay messages of other peers are stored in a database, you can use that database on the next startup.
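The real client keeps this database in a binary file (peers.dat). The sketch below is only a stand-in that stores (ip, port, timestamp) entries as JSON, to illustrate the save-on-shutdown, load-on-startup cycle; the file name peers.json and both function names are assumptions for illustration.

```python
import json
import os
import tempfile

def save_peers(path, peers):
    """Write a list of (ip, port, timestamp) tuples to a JSON file."""
    with open(path, "w") as f:
        json.dump([list(p) for p in peers], f)

def load_peers(path):
    """Read the saved peers back; return an empty list if no database exists yet."""
    if not os.path.exists(path):
        # First start: no database, so the client has to fall back to the DNS seeds
        return []
    with open(path) as f:
        return [tuple(p) for p in json.load(f)]

# Round trip through a temporary directory (hypothetical file name "peers.json")
db_path = os.path.join(tempfile.mkdtemp(), "peers.json")
print(load_peers(db_path))  # nothing saved yet
save_peers(db_path, [("192.0.2.1", 8333, 1528000000)])
print(load_peers(db_path))
```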

Establishing a Connection

In the previous steps, we investigated how we get a list of node addresses when we start our client for the first time. No matter whether we get the addresses from our local database (if we already started the client before) or from the DNS seeds, the next step is to establish a connection to a peer so that we can exchange information and participate in the network.

Let’s establish a connection to the first responding peer:

import socket

# Connect to the first responding peer from our DNS list
def connect(peers, timeout=5):
    for peer_index, peer in enumerate(peers):
        print("Trying to connect to", peer)
        # Use a fresh socket for every attempt; a socket whose connect() failed should not be reused
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(timeout)
        try:
            # Try to establish the TCP connection
            sock.connect(peer)
            return sock, peer_index
        except OSError:
            # The peer did not respond, so close this socket and test the next one
            # (a loop instead of recursion, so a long peer list cannot exceed the recursion limit)
            sock.close()
    raise ConnectionError("None of the peers responded")

sock, peer_index = connect(peers)

When we connect to another peer, we have to send a version message immediately. The format of this message is described at https://bitcoin.org/en/developer-reference#version. It contains information such as our IP address, the protocol version we speak, and so on.
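Before packing anything, it can help to write the field layout down. This sketch lists the fields of the version payload in order with struct format strings, assuming an empty user agent (encoded as the single byte 0x00) and including the optional relay flag, and lets struct.calcsize add up the sizes. The "<" prefix forces little-endian byte order without padding; IP addresses and ports are sent in network byte order, hence ">" for the ports.

```python
import struct

# (field name, struct format) in the order they appear in the version payload
VERSION_LAYOUT = [
    ("version", "<i"),             # int32 protocol version
    ("services", "<Q"),            # uint64 service bit field
    ("timestamp", "<q"),           # int64 Unix time
    ("addr_recv_services", "<Q"),
    ("addr_recv_ip", "16s"),       # IPv6 or IPv4-mapped IPv6 address
    ("addr_recv_port", ">H"),      # network byte order
    ("addr_from_services", "<Q"),
    ("addr_from_ip", "16s"),
    ("addr_from_port", ">H"),
    ("nonce", "<Q"),
    ("user_agent", "B"),           # var_str length byte; 0 means an empty user agent
    ("start_height", "<i"),
    ("relay", "<?"),
]

sizes = {name: struct.calcsize(fmt) for name, fmt in VERSION_LAYOUT}
print(sizes)
print("total payload bytes:", sum(sizes.values()))
```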

Because every message has to be converted to its binary representation, we can use Python's struct module (https://docs.python.org/3/library/struct.html).
The trick is to look up each field under https://bitcoin.org/en/developer-reference#version and find the corresponding format character under https://docs.python.org/3/library/struct.html#format-characters. Let’s look at an example:

The protocol description on the Bitcoin website states that we first have to provide the version:

Bytes | Name | Data Type | Required/Optional | Description
4 | version | int32_t | Required | The highest protocol version understood by the transmitting node. See the protocol version section.

The version is a 4-byte int32_t, so we look up the matching format character in the Python documentation. Its format table contains the following row:

Format | C Type | Python type | Standard size | Notes
i | int | integer | 4 | (3)

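This means we call struct.pack("<i", 70015) to get the corresponding binary value. The "<" prefix selects little-endian byte order with standard sizes, which is what the Bitcoin protocol uses for integers; a bare "i" uses your machine's native byte order, which only happens to match on little-endian CPUs such as x86. We proceed like this through the whole protocol (see the code example):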
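To check the byte order by hand, here are the encodings that appear most often in the version message. The values 70015 and 8333 come from the text; the expected bytes follow from 70015 = 0x0001117F and 8333 = 0x208D. The third value shows how a 16-byte address field can carry an IPv4 address in IPv4-mapped IPv6 form (10 zero bytes, 0xFFFF, then the four IPv4 bytes).

```python
import socket
import struct

# int32 protocol version, little-endian: least significant byte first
version_bytes = struct.pack("<i", 70015)
print(version_bytes.hex())  # 7f110100

# uint16 port, big-endian (network byte order)
port_bytes = struct.pack(">H", 8333)
print(port_bytes.hex())  # 208d

# 16-byte address field holding 127.0.0.1 as an IPv4-mapped IPv6 address
ip_field = bytes(10) + b"\xff\xff" + socket.inet_aton("127.0.0.1")
print(ip_field.hex())  # 00000000000000000000ffff7f000001
```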

import hashlib
import random
import socket
import struct
import time

def create_version_message():
    # Encode all values to the binary representation described on https://bitcoin.org/en/developer-reference#version
    # and https://docs.python.org/3/library/struct.html#format-characters
    # Integers are little-endian ("<"); IP addresses and ports use network byte order (">")

    # The current protocol version, look it up under https://bitcoin.org/en/developer-reference#protocol-versions
    version = struct.pack("<i", 70015)

    # Services that we support, a bit field: 1 (NODE_NETWORK) for a full node, 0 for none
    services = struct.pack("<Q", 0)

    # The current timestamp
    timestamp = struct.pack("<q", int(time.time()))

    # Services that the receiver supports
    add_recv_services = struct.pack("<Q", 0)

    # The receiver's IP, we got it from the DNS example above. The field has 16 bytes,
    # so an IPv4 address is sent as an IPv4-mapped IPv6 address (::ffff:a.b.c.d)
    add_recv_ip = bytes(10) + b"\xff\xff" + socket.inet_aton(peers[peer_index][0])

    # The receiver's port (Bitcoin default is 8333)
    add_recv_port = struct.pack(">H", peers[peer_index][1])

    # Should be identical to services, was added later by the protocol
    add_trans_services = struct.pack("<Q", 0)
    # Our IP or 127.0.0.1, also IPv4-mapped
    add_trans_ip = bytes(10) + b"\xff\xff" + socket.inet_aton("127.0.0.1")
    # Our port
    add_trans_port = struct.pack(">H", 8333)

    # A nonce to detect connections to ourselves
    # If we receive the same nonce that we sent, we have connected to ourselves
    nonce = struct.pack("<Q", random.getrandbits(64))
    # Can be a user agent like /Satoshi:0.15.1/; we leave it empty (a var_str of length 0)
    user_agent_bytes = struct.pack("<B", 0)
    # The block starting height, you can find the latest on http://blockchain.info/
    starting_height = struct.pack("<i", 525453)
    # We do not relay data and thus want to avoid receiving tx messages
    relay = struct.pack("<?", False)

    # Let's combine everything into our payload
    payload = version + services + timestamp + add_recv_services + add_recv_ip + add_recv_port + \
              add_trans_services + add_trans_ip + add_trans_port + nonce + user_agent_bytes + starting_height + relay

    # To meet the protocol specifications, we also have to create a header
    # The general header format is described here https://en.bitcoin.it/wiki/Protocol_documentation#Message_structure

    # The magic bytes indicate the originating network (Mainnet or Testnet)
    # The known values can be found here https://en.bitcoin.it/wiki/Protocol_documentation#Common_structures
    magic = bytes.fromhex("F9BEB4D9")

    # The command we want to send, e.g. version
    # It must be null padded to 12 bytes in total (version = 7 bytes + 5 zero bytes)
    command = b"version".ljust(12, b"\x00")
    # The payload length
    length = struct.pack("<I", len(payload))
    # The checksum: first 4 bytes of a double SHA-256, as described in https://en.bitcoin.it/wiki/Protocol_documentation#Message_structure
    checksum = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]

    # Build up the message
    return magic + command + length + checksum + payload

# Send out our version message (sendall keeps sending until every byte is written)
sock.sendall(create_version_message())

Wow, this was a lot! But our message is ready and has been sent out 🙂

So how do we actually know that it worked? We can receive a message from the other peer and decode it.

import struct
import time

def decode_received_message(recv_message):
    # Decode the magic number (f9beb4d9 on mainnet)
    recv_magic = recv_message[:4].hex()
    # Decode the command (should be version) and strip the null padding
    recv_command = recv_message[4:16].rstrip(b"\x00")

    # Decode the payload length (struct.unpack returns a tuple, so take element 0)
    recv_length = struct.unpack("<I", recv_message[16:20])[0]

    # Decode the checksum
    recv_checksum = recv_message[20:24]

    # Decode the payload; the buffer may already contain the next message, so cut it at the announced length
    recv_payload = recv_message[24:24 + recv_length]

    # Decode the protocol version of the other peer (the first 4 bytes of the version payload)
    recv_version = struct.unpack("<i", recv_payload[:4])[0]
    return (recv_magic, recv_command, recv_length, recv_checksum, recv_payload, recv_version)


# Give the peer a moment to answer
time.sleep(1)

# Receive the message
decoded_values = decode_received_message(sock.recv(8192))
print("Version: ", decoded_values[-1])
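sock.recv(8192) returns whatever bytes have arrived so far, which may be less than one full message or more than one. A more careful sketch reads exactly the 24-byte header, then exactly as many payload bytes as the header announces, and checks the magic and checksum before trusting the payload. recv_exact and read_message are hypothetical helpers, not part of any library; the local demo uses a socket pair instead of a real peer and a stand-in payload that contains only the 4-byte version field.

```python
import hashlib
import socket
import struct

MAINNET_MAGIC = bytes.fromhex("F9BEB4D9")

def recv_exact(sock, n):
    """Read exactly n bytes from the socket or raise if the peer closes the connection."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf

def read_message(sock):
    """Read one message; return (command, payload) after validating magic and checksum."""
    header = recv_exact(sock, 24)
    magic, command, length, checksum = struct.unpack("<4s12sI4s", header)
    if magic != MAINNET_MAGIC:
        raise ValueError("unexpected network magic " + magic.hex())
    payload = recv_exact(sock, length)
    if hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4] != checksum:
        raise ValueError("checksum mismatch")
    return command.rstrip(b"\x00").decode("ascii"), payload

# Local demo: a connected socket pair stands in for the peer
a, b = socket.socketpair()
payload = struct.pack("<i", 70015)  # stand-in payload: only the version field
checksum = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
a.sendall(MAINNET_MAGIC + b"version".ljust(12, b"\x00")
          + struct.pack("<I", len(payload)) + checksum + payload)
command, received = read_message(b)
print(command, struct.unpack("<i", received[:4])[0])
a.close()
b.close()
```

With the connection from above, read_message(sock) could replace the time.sleep(1) and sock.recv(8192) pair, and calling it again would return the next message the peer sends, often its verack.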

That’s it! We have first discovered the peers in our network and then established a manual connection. Digging into this was really helpful for my personal understanding of the Bitcoin protocol. View the full code here: https://gist.github.com/sappelt/9e60af207219bfb6c6d07c6dab38bcaa

This Python bitcoind client is really helpful: https://github.com/ricmoo/pycoind

Also see this video (Python 2):