Blockchain is definitely one of the biggest buzzwords in the IT world lately. Uses beyond cryptocurrencies continue to fascinate the business world. But how can you be sure of Hyperledger Fabric security?
Stories of hacked crypto exchanges and their users never end. This makes me suspect that many operators don’t really know what they’re doing and don’t fully grasp the security implications. This article is for people interested in blockchain technology who want to understand its security aspects and take appropriate security measures.
Specifically, this article covers important security aspects of the Linux Foundation’s Hyperledger Fabric, one of the main enterprise blockchain platforms, as well as of Apache Kafka and Apache ZooKeeper, the auxiliary tools Hyperledger requires for messaging and service discovery.
Developers often make the mistake of trying to implement Hyperledger Fabric security themselves. They focus on its functional security while leaving basics such as network or firewalling open to potential attacks. There is, however, a reason. Configuring a blockchain stack is far more complicated than configuring MySQL server, to name one example.
A typical Hyperledger/Kafka/ZooKeeper installation uses over 10 different ports on different types of hosts:
- 7050 for Orderer (central service)
- 2181, 2888, 3888 for all ZooKeeper instances
- 9092 for all Kafka instances
- 9000 for Kafka Manager (central service)
- 7054 for all Fabric Certificate Authority (CA) instances
- 7051, 7053, 8053 for Peers
- 2377, 7946, 4789 for Docker swarm
Most of these components are for internal use, so you don’t need to expose them outside your Hyperledger server nodes. Peers, on the other hand, link together over a P2P network, so they need to see each other, and clients connect directly to Peers, Certificate Authorities and the Orderer. Of course, you can change the default port numbers.
Since each component can be deployed behind NAT or even inside a container (mostly Docker or LXC), each component can also use custom external ports to discover and connect to the others. For example, the Chinese Alibaba Cloud implements this by using ports 32050, 32060, and 32070 for the Orderer service.
Even when you expose all services on default ports, there are a lot of network connections between components, and a lot of firewall rules to write and manage. This is precisely why many people give up on proper network security for their Hyperledger installations and simply pass all traffic between all “trusted” hosts. Obviously, this approach is only as secure as those hosts are trustworthy.
How to do it properly
There are several approaches. However, all of them are based on generating lists of IP ranges dynamically. You can do this either with simple shell script iterations and string concatenations, or by compiling some form of iptables/ebtables profiles. Honestly, any other method specific to your chosen firewall solution will work, as long as it allows merging IP rules with port rules, for instance by passing traffic through several chains.
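As a minimal sketch of this dynamic approach, assuming a plain-text inventory file (hosts.txt, one IP or CIDR range per line) and a custom iptables chain called HL-SERVICES (both names are illustrative), rules can be generated like this:

```shell
#!/bin/sh
# Generate iptables ACCEPT rules for Hyperledger service ports,
# restricted to a dynamically built list of trusted host IPs.
# hosts.txt is an assumed input file: one IP or CIDR range per line.

PORTS="7050 7051 7053 9092 2181"   # illustrative subset of the ports above

gen_rules() {
    hosts_file="$1"
    while read -r ip; do
        [ -z "$ip" ] && continue
        for port in $PORTS; do
            echo "iptables -A HL-SERVICES -s $ip -p tcp --dport $port -j ACCEPT"
        done
    done < "$hosts_file"
    echo "iptables -A HL-SERVICES -j DROP"   # default deny at the end
}

# Print the rules for review; pipe to sh only after checking the output.
if [ -f hosts.txt ]; then
    gen_rules hosts.txt
fi
```

The script only prints the rules, so the generated set can be reviewed (or diffed against the currently loaded rules) before being applied.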
A suggested approach is to make a three-layer solution where:
- the first layer is based on AWS VPC or Direct Connect assuming that the whole solution runs on AWS. It just narrows access to defined port numbers
- the second layer is a host-based firewall (iptables), possibly wrapped by your chosen management solution. This restricts access to given IP ranges, constructed dynamically
- the third layer is also host-based, imposing granular rules per IP range and service
Why divide the same set of host-based rules (the second and third layers) into two separate levels? This method has two advantages:
- splitting rules management into separate streams: trusted hosts management and Hyperledger services management
- additional protection against human mistakes: setting too wide an access at one level doesn’t open your Hyperledger installation to attacks, because the other level still restricts traffic
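A sketch of how the second and third layers could look as separate iptables chains; the chain names and the trusted range are illustrative, and the rules are printed rather than executed so they can be reviewed first:

```shell
# Sketch of the two host-based layers as separate iptables chains
# (chain names and the 10.10.0.0/24 range are illustrative). The rule set
# is only printed, so it can be reviewed before applying on a real host.
RULES=$(cat <<'EOF'
# Layer 2: trusted hosts only (managed from the host inventory)
iptables -N HL-TRUSTED
iptables -A INPUT -j HL-TRUSTED
iptables -A HL-TRUSTED -s 10.10.0.0/24 -j HL-SERVICES
iptables -A HL-TRUSTED -j DROP

# Layer 3: granular per-service rules (managed with the Hyperledger stack)
iptables -N HL-SERVICES
iptables -A HL-SERVICES -p tcp --dport 7051 -j ACCEPT   # Peer
iptables -A HL-SERVICES -p tcp --dport 2181 -j ACCEPT   # ZooKeeper client port
iptables -A HL-SERVICES -j DROP
EOF
)
printf '%s\n' "$RULES"
```

A mistake in one chain (for example, a too-wide range in HL-TRUSTED) still leaves the per-service restrictions of the other chain in force.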
If you use Docker, a good idea is to integrate the second level rules with Docker networking rules. Do this by replacing default PREROUTING rules by similar rules restricted to given IP ranges. More on that in my GitHub repository.
And, of course, services used internally within a single host, such as ZooKeeper nodes, shouldn’t be exposed outside that host at all. Have them listen only on 127.0.0.1, or, if you use Docker, only on the Docker local network.
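A quick way to audit this is to check which TCP listeners are bound beyond the loopback interface. The sketch below works on `ss -ltn`-style address:port fields; the piped-in sample stands in for real output of `ss -ltn`:

```shell
# Audit sketch: flag TCP listeners not bound to loopback, based on
# "address:port" fields as printed by: ss -ltn | awk 'NR>1 {print $4}'
flag_exposed() {
    while read -r addr; do
        case "$addr" in
            127.0.0.1:*|"[::1]:"*) ;;      # loopback only: fine
            *) echo "exposed: $addr" ;;    # anything else is reachable from outside
        esac
    done
}

# Sample data standing in for real ss output:
printf '127.0.0.1:2181\n0.0.0.0:9092\n' | flag_exposed
# → exposed: 0.0.0.0:9092
```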
Process/application isolation and resource accounting
The Hyperledger application stack is quite complicated:
- Hyperledger Fabric’s main component is in Go
- Apache Kafka is in Scala (runs on JVM)
- Apache ZooKeeper is in Java
- Apache CouchDB (optional Peer state database) is in Erlang
Each component has a different configuration style, low-level software dependencies such as a specific version of the Go stack, and details related to resource accounting. Therefore running everything on a single host is not a good idea. I recommend two approaches:
- a static approach, based on LXC containers (LXC is the same container technology that Docker builds on, but LXC containers are persistent and act more like virtual machines, except without resource reservation), preferably managed with Proxmox
- a dynamic approach, based on Kubernetes; I don’t recommend bare Docker for such a complicated stack, except for the development phase
What’s the difference between these approaches? The first one is better for running small, long-running setups for internal purposes like integrating blockchain-based applications: debugging Hyperledger on LXC is much easier than on Docker. And of course, it’s much easier to implement the proper firewalling scheme, as I mentioned above.
What LXC lacks is easy scalability, and that’s why you should run production Hyperledger on Kubernetes. Both Proxmox and Kubernetes have built-in resource accounting.
At the time of writing this article, Amazon has just announced Go language support in AWS Elastic Beanstalk. In the near future, this may be a nice alternative to LXC for small staging/integration/pre-production setups. I will be watching this development.
You can also set up the whole Hyperledger stack using Docker. This is arguably a preferred solution for developing Hyperledger-based applications. There are two basic recommendations for such a deployment:
- run processes within containers as a non-root user (this requires preparing such a configuration in the Dockerfile), and start the containers themselves as non-root, to prevent exploitation in case new bugs similar to CVE-2016-9962 or CVE-2019-5736 are discovered
- control who has access to “docker” system group (and can manage containers on the host)
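A hedged sketch of both recommendations; the image tag, UID and mount path are illustrative, and the command is printed rather than executed:

```shell
# Sketch: run a Fabric peer container as a non-root user with reduced
# privileges. The image tag, UID and mount path are illustrative; the image
# is assumed to define a matching non-root USER in its Dockerfile.
DOCKER_CMD="docker run -d \
  --user 1000:1000 \
  --read-only \
  --cap-drop ALL \
  -v /var/hyperledger/production:/var/hyperledger/production \
  hyperledger/fabric-peer:1.4 peer node start"

echo "$DOCKER_CMD"   # review before executing on a real host

# Audit who can manage containers on this host (members of the docker group):
getent group docker || true
```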
Of course, similar rules apply to all Hyperledger environments, no matter which platform you choose: control system-level access (e.g. to prevent unauthorized copying of Hyperledger internal files), and perform at least some basic security hardening.
Log collection and analysis
Hyperledger Fabric provides several options to fine-tune logging so you can easily configure it to coexist with your chosen log collection and analysis solution. Think Splunk, ELK or EFK, or even logcheck/logwatch.
But which log analysis solution should you choose? The one that you or your team are most familiar with. If none, then ELK (Elasticsearch + Logstash + Kibana) is the safe choice: it’s both popular and has pretty decent functionality.
What you should bear in mind at this stage is proper log file handling, especially if you run the Hyperledger stack on bare Docker. A common mistake here is redirecting logs to /dev/null, which causes them to be lost. Instead, dump logs to files and import them into ELK using Filebeat, or parse them directly with tools such as logcheck.
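A minimal sketch of the file-based alternative: a wrapper that appends a service’s output to a per-service log file that Filebeat or logcheck can then pick up (the directory and the wrapped command are illustrative):

```shell
# Minimal sketch: instead of sending a component's output to /dev/null,
# append it to a per-service log file for later collection and analysis.
# LOG_DIR and the wrapped command are illustrative.
LOG_DIR="./logs"          # in production e.g. /var/log/hyperledger
mkdir -p "$LOG_DIR"

run_logged() {
    name="$1"; shift
    # Append both stdout and stderr of the wrapped command to the log file
    "$@" >> "$LOG_DIR/$name.log" 2>&1
}

# Example: wrap a command and keep its output
run_logged demo echo "peer started"
```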
Proper data backup
There are two different approaches to building a proper data backup/restore solution for Hyperledger Fabric:
- just back up the entire contents of the /var/hyperledger/production directory. There is a lot of data there in a real production installation, so this approach might seem wasteful. However, it’s not: data storage is relatively cheap compared to the many hours of work from someone who knows Hyperledger Fabric security fundamentals
- handle Peer transient storage separately from ledgers, chains, and private data. This can raise backup efficiency and lower costs. Be aware, though, that it requires deep (and current!) knowledge of Hyperledger Fabric internals to implement properly; otherwise such a backup could be incomplete and unusable, leaving you effectively without a backup
If you already have several people with Hyperledger Fabric security knowledge, then maintaining such a complicated (but more efficient!) backup should not be a problem. However, if not, my recommendation is to put more money into data storage to avoid the risk of losing data.
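A minimal sketch of the first, back-up-everything approach; the directories below are illustrative stand-ins for /var/hyperledger/production and your backup target:

```shell
# Sketch of the simple "back up everything" approach. DATA_DIR and
# BACKUP_DIR are illustrative; on a real peer DATA_DIR would be
# /var/hyperledger/production.
DATA_DIR="./production"
BACKUP_DIR="./backups"

backup_ledger() {
    mkdir -p "$BACKUP_DIR"
    stamp=$(date +%Y%m%d-%H%M%S)
    # Archive the whole data directory, preserving its top-level name
    tar -czf "$BACKUP_DIR/hyperledger-$stamp.tar.gz" \
        -C "$(dirname "$DATA_DIR")" "$(basename "$DATA_DIR")"
    echo "$BACKUP_DIR/hyperledger-$stamp.tar.gz"
}

# Demo: create sample data and take a backup
mkdir -p "$DATA_DIR/ledgersData"
echo demo > "$DATA_DIR/ledgersData/blockfile_000000"
ARCHIVE=$(backup_ledger)
```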
If you have enough experienced people who understand Hyperledger Fabric security architecture and its implications for data backup, restore and encryption, you can implement data encryption with the internal Hyperledger Fabric encryption library, so that only peers with decryption keys can use the internal files.
File encryption increases overall security, though it degrades backup compression and performance, and it can break GDPR compliance. Fully automatic data recovery also becomes more complicated: there is no mechanism for automatically re-applying deletions for users who have invoked their right to be forgotten, and encryption at this level makes such a mechanism impossible to implement. This is still fine for automated backups, but it rules out fully automated restore processes.
All the above security aspects are “just” low-level technical aspects. They exist more or less for any hosted IT environment. Now, let’s discuss some more functional aspects of Hyperledger Fabric itself.
Hyperledger Fabric Security
Connection encryption with TLS and proper certificate handling are the most important aspects of functional security. Blockchain data is secure by design; however, the functional part of this security relies on the proper configuration of the Hyperledger certificate authority (CA) with proper key management.
Having the CA part configured, it’s time to set up Attribute-Based Access Control, which allows smart contracts to operate on specific client attributes. This, along with enabling TLS client authentication on peer nodes, sets the overall trust level of the whole network reasonably high.
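For illustration, an attribute can be attached to an identity at registration time with the Fabric CA client; the identity name and attribute key below are made up, and the commands are printed for review rather than executed:

```shell
# Sketch: register an identity with a custom attribute via Fabric CA.
# The identity name and attribute key are illustrative; the :ecert suffix
# asks the CA to embed the attribute in the enrollment certificate, where
# chaincode can read it.
ABAC_CMD="fabric-ca-client register --id.name appUser --id.type client \
  --id.attrs 'app.admin=true:ecert'"
echo "$ABAC_CMD"   # review, then run against a configured fabric-ca-client

# On the peer side, TLS client authentication (mutual TLS) is enabled with:
echo "CORE_PEER_TLS_CLIENTAUTHREQUIRED=true"
```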
Of course, apart from the network level, there’s still the host level, at which malicious actors can steal data, at least as long as it isn’t encrypted. So you should also consider the best method to encrypt it: Hyperledger Fabric native encryption, filesystem-level encryption such as LUKS, or a cloud-provider-level service such as the AWS Key Management Service. Which to choose depends on your whole architecture and, in particular, on which layers you want fully automated and which should require manual intervention in case of failure.
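As an example of the filesystem-level option, a typical LUKS setup consists of the steps below; the device name is illustrative and the commands are destructive, so this sketch only prints them:

```shell
# Filesystem-level encryption (LUKS) as a reviewable sketch. The device
# name /dev/xvdf is illustrative and luksFormat destroys existing data,
# hence the steps are printed rather than executed.
LUKS_STEPS=$(cat <<'EOF'
cryptsetup luksFormat /dev/xvdf
cryptsetup open /dev/xvdf hyperledger-data
mkfs.ext4 /dev/mapper/hyperledger-data
mount /dev/mapper/hyperledger-data /var/hyperledger/production
EOF
)
printf '%s\n' "$LUKS_STEPS"
```

With this approach, data at rest is unreadable without the LUKS passphrase or key file, but the `open`/`mount` steps require either manual intervention or careful key automation at boot, which is exactly the trade-off mentioned above.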
Kafka and ZooKeeper functional security
Securing Hyperledger itself doesn’t make much sense when the underlying components are not secured in terms of connection encryption, authentication and authorization. Also, don’t forget to properly secure access to the Kafka Manager panel: exposing it directly to the Internet is obviously a bad idea. Instead, put it behind a proxy, such as Nginx or HAProxy, that also handles SSL termination.
From a technical point of view, ZooKeeper is a simple set of TCP servers with some queues, a distributed key-value store and a quorum algorithm, which is the heart of service discovery for all services relying on ZooKeeper. This part is, however, more complicated to secure, since ZooKeeper’s functionality is much more than just a message broker.
As with Kafka, ZooKeeper needs a proper configuration of connection encryption (SSL, including keystore configuration at the JVM level), authentication (SASL) and authorization (ACLs).
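A sketch of what that could look like for ZooKeeper 3.5+ (earlier versions lack native TLS support); paths and passwords are placeholders, and the fragment is written to a local file for review rather than to the live zoo.cfg:

```shell
# Sketch for ZooKeeper 3.5+: TLS via the Netty connection factory plus a
# SASL authentication provider. Paths and passwords are placeholders;
# the fragment is written to a local file so it can be reviewed first.
cat > zoo.cfg.security <<'EOF'
serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
secureClientPort=2281
ssl.keyStore.location=/etc/zookeeper/keystore.jks
ssl.keyStore.password=CHANGEME
ssl.trustStore.location=/etc/zookeeper/truststore.jks
ssl.trustStore.password=CHANGEME
authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
EOF

# The JAAS file with SASL credentials is passed at the JVM level:
echo 'SERVER_JVMFLAGS="-Djava.security.auth.login.config=/etc/zookeeper/jaas.conf"'
```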
ZooKeeper is harder to configure and maintain properly, since stability problems with the particular instance chosen as quorum leader lead to another election. The election process is fully automatic; however, until the new leader is chosen, ZooKeeper suspends its service discovery functionality.
Of course, a single event, such as a manual service restart, is absolutely not a problem. However, if many instances have random problems, or if your upgrade procedure restarts all upgraded services at the same time (which is very common with most Puppet manifests for ZooKeeper found on the Internet), then it can affect the stability of your whole network of services, not only the ZooKeeper quorum itself.
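The safer pattern is a rolling restart that touches one node at a time and waits for it to rejoin the quorum. In this sketch, restart_node and node_healthy are stubs; in a real setup they would wrap, for example, `ssh $node systemctl restart zookeeper` and the `ruok` four-letter-word health check:

```shell
# Rolling restart sketch: one ZooKeeper node at a time, waiting for each
# node to come back before touching the next. restart_node and node_healthy
# are stubs standing in for real commands (systemctl restart, ruok check).
ZK_NODES="zk1 zk2 zk3"

restart_node() { echo "restarting $1"; }            # stub
node_healthy() { echo "checking $1"; return 0; }    # stub for: echo ruok | nc $1 2181

rolling_restart() {
    for node in $ZK_NODES; do
        restart_node "$node"
        # Wait until this node rejoins the quorum before moving on
        until node_healthy "$node"; do sleep 5; done
    done
}

rolling_restart
```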
As you can see, ensuring Hyperledger Fabric security is not an easy task. In fact, it’s one of the hardest tasks in my almost four-year career at Espeo Software and over 20 years in IT overall.
There is a lot of complex software running on different application stacks, all of it using lots of data. If every component used static configuration, any infrastructure-as-code tool could easily manage it. Instead, the components rely on service discovery, so you have a living, breathing, fragile service network. You also need a trained team that really understands the impact of their actions.
Having such people onboard is something that distinguishes companies that think seriously about their blockchain business from ones that only want to sell the buzzword and run.