“Cloud” (or “cloud computing”) was one of the biggest buzzwords of 2009. Nine years on, the vast majority of people still identify “the cloud” with Amazon and its AWS (Amazon Web Services) offering. There are, however, several other cloud computing vendors worth looking at, especially if you’re building a blockchain solution designed to support a large group of users. These are primarily the Google Cloud Platform, Microsoft Azure, Oracle Cloud, Rackspace and Hetzner Cloud. Below, I’ll discuss their advantages and show how we deployed our blockchain infrastructure.
The dominance of AWS
Amazon’s dominance in people’s minds is somewhat justified, because Amazon was the precursor and main promoter of the concept of cloud computing. Amazon’s services are the best known and have the highest reliability and the best documentation. In short, they’re the role model for the competition.
But there are also a number of other vendors, for example the Chinese Alibaba Cloud or the Polish e24cloud. These are more or less successful AWS clones, often with broadly similar APIs. Technologically they rarely bring anything new, but they operate in regions poorly served by competitors (e.g. Alibaba Cloud in China).
Location, location, location
Let’s begin with datacenter locations. As I’ll show later, this can be an issue for blockchain infrastructure. As the physical distance between the client and the datacenter grows, network latency increases. In transaction systems, this can determine the order in which transactions from individual clients are processed and, consequently, profitability or other economic parameters.
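To make the distance argument concrete, here is a rough sketch of the lower bound that physics puts on round-trip time. It assumes that light travels through fiber at about 200,000 km/s (roughly two thirds of its speed in a vacuum) and that the cable follows a straight great-circle route. Real routes are longer and add routing and queuing delay, so measured latency is always higher. The function names and the city coordinates are illustrative only.

```python
import math

# Assumption: light in optical fiber covers roughly 200 km per millisecond.
FIBER_KM_PER_MS = 200.0
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def min_round_trip_ms(distance_km):
    """Theoretical minimum round-trip time over fiber, in milliseconds."""
    return 2 * distance_km / FIBER_KM_PER_MS


# Approximate city coordinates, used only as examples.
warsaw = (52.23, 21.01)
frankfurt = (50.11, 8.68)
ashburn = (39.04, -77.49)

near_ms = min_round_trip_ms(haversine_km(*warsaw, *frankfurt))
far_ms = min_round_trip_ms(haversine_km(*warsaw, *ashburn))
print(f"Warsaw -> Frankfurt: at least {near_ms:.1f} ms round trip")
print(f"Warsaw -> Ashburn:   at least {far_ms:.1f} ms round trip")
```

If two clients submit a transaction at the same moment, the one with the shorter round trip will typically be processed first, which is exactly the ordering effect described above.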
AWS covers most of the world but doesn’t have datacenters in Africa, China, or Russia. The datacenters in India, Brazil, and Australia don’t offer the full range of services. So if we want to launch a service that depends heavily on connection quality (e.g. blockchain or high-frequency trading), it may be reasonable to use several different cloud vendors at the same time.
For example, one of the main advantages of Microsoft Azure is that it has over 50 datacenters in various regions of the world. These include the central states of the USA, Eastern Canada, Switzerland, Norway, China, India, Australia, South Korea, South Africa and the United Arab Emirates. In these regions, AWS has relatively high network latency.
More pros and cons of other cloud computing vendors
Google Cloud Platform
In addition to services based on open source software (Linux, Docker, MySQL, Postgres, MongoDB, HBase, etc.), Google Cloud Platform also provides its own services, for example Bigtable and Realtime Database. They handle large amounts of data more efficiently than open source technology alone, and offer more efficient load balancing than AWS services. The price for this, however, is vendor lock-in, i.e. the difficulty of moving away from this particular vendor.
Microsoft Azure
In addition to its many datacenter locations, Azure is also the best place to run all kinds of Windows-based solutions. This can be important if our blockchain stack uses ready-made .NET libraries that have no Linux implementation.
Hetzner Cloud
It’s a relatively new service from Hetzner Online, a company that has so far specialized in web hosting and low-cost dedicated servers. The Cloud offering brought a significant improvement in quality over the company’s earlier products while keeping prices very low. It still can’t compete with AWS in terms of stability, but that seems to be only a matter of time. Its unique advantage is a datacenter in Finland.
Let’s take a look at the solutions I’ve (we’ve!) used at Espeo to manage multi-cloud infrastructure, as well as at the blockchain platform itself.
First approach — manual management
Our first approach was, of course, manual management. By this I mean logging into different cloud consoles from several different browsers. This approach worked quite well as long as we controlled about 5-6 AWS accounts and one account with each of the other cloud vendors. With such a small number of accounts, it was still possible to manage them efficiently by hand. It seemed that an investment in proper tooling would take far too long to pay off, especially since we didn’t yet know which technologies to stick to and which to avoid.
Second approach — tools. Open source?
The second approach was to analyze the available tools, with the condition that they had to be open source. Among others, we were interested in Terraform (from the creators of Vagrant). Very quickly, however, we got the impression that almost all existing open source tools are written for a completely different business model than the one Espeo works in: either for managing your own infrastructure (for one company or one group of companies) or, at best, for managing large projects in the Infrastructure as Code model. The latter means describing infrastructure elements in a language created specifically for that purpose.
Infrastructure as Code is, of course, a very sensible approach, but it has a disadvantage: it doesn’t work well for very small projects, which are often at the MVP stage and run on a single server. In such cases, Infrastructure as Code is like shooting a fly with a cannon. It gets the job done, but most customers will immediately ask why they should pay so much for it.
Third approach — Polynimbus
Ultimately, we decided to use Polynimbus. It supports 8 different cloud vendors and is a relatively simple (compared to Terraform) resource pool, which suited our needs perfectly. Polynimbus supports an unlimited number of AWS accounts and requires minimal configuration for each of them: basically just the access key, the secret access key, and the default region. Everything else, including e.g. the fast-changing AMI IDs of system images, is detected automatically.
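As an illustration of how little that is, the sketch below renders the three per-account values in the layout the AWS CLI uses for its `~/.aws/credentials` and `~/.aws/config` files. Whether Polynimbus stores accounts in exactly this form is an assumption on my part, and the function name, profile name and key values are made-up placeholders.

```python
import configparser
import io


def render_aws_profile(account, access_key, secret_key, region):
    """Return (credentials_text, config_text) for one named AWS CLI profile."""
    credentials = configparser.ConfigParser()
    credentials[account] = {
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }
    # In ~/.aws/config, named profiles use a "profile <name>" section header.
    config = configparser.ConfigParser()
    config[f"profile {account}"] = {"region": region}

    def dump(parser):
        buffer = io.StringIO()
        parser.write(buffer)
        return buffer.getvalue()

    return dump(credentials), dump(config)


# Placeholder values only; never commit real keys to a repository.
creds_text, config_text = render_aws_profile(
    "client-a", "AKIAEXAMPLEKEYID", "example-secret-key", "eu-west-1"
)
print(creds_text)
print(config_text)
```

With one such profile per client account, adding another AWS account means adding three values and nothing more.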
Let’s take a look at our entire blockchain infrastructure:
As you can see, Polynimbus is one element of a tightly integrated stack. It manages the full lifecycle of an instance, regardless of whether it runs on AWS (EC2), Azure, Oracle or another vendor. Creating an instance looks like this:
- Polynimbus – the actual creation of a new instance.
- ZoneManager – adding a DNS record to Amazon Route53, binding the destination hostname to the IP address returned by Polynimbus.
- Server Farmer – provisioning of the instance; at this stage various aspects of server security are configured, along with central event logging, backups and automatic updates, and then the instance is plugged into the farm (i.e. the central management system).
- Ansible – application provisioning, starting with Docker and support tools. Then the Go stack is built (non-standard due to Hyperledger requirements), after which Hyperledger Fabric and Consul services are installed and configured, the latter in client or server mode. In general, there is no real need to run more than two Consul instances per availability zone.
- Next, the integration with a separate Apache Kafka cluster is configured, as well as with CircleCI.com, which is responsible for the CI/CD processes, i.e. deploying new versions of the application. The final step is for CircleCI.com to start the Fabric node.
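The order of the steps above matters, because each one depends on the result of the previous one (DNS, for instance, needs the IP address returned at creation). The sketch below shows only that ordering. Every function in it is a hypothetical placeholder and does not reflect the actual interface of Polynimbus, ZoneManager, Server Farmer or Ansible; the hostname and IP address are examples.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical record of one instance moving through the pipeline."""
    hostname: str
    vendor: str
    ip_address: str = ""
    completed: list = field(default_factory=list)


# Placeholders standing in for calls to the real tools.
def create_instance(instance):
    instance.ip_address = "203.0.113.10"  # address from a documentation range


def add_dns_record(instance):
    if not instance.ip_address:
        raise RuntimeError("DNS needs the IP address returned at creation")


def provision_server(instance):
    pass  # security, logging, backups, updates, joining the farm


def provision_application(instance):
    pass  # Docker, Go stack, Hyperledger Fabric, Consul


def configure_integrations(instance):
    pass  # Kafka cluster and CI/CD


PIPELINE = [
    ("polynimbus", create_instance),
    ("zonemanager", add_dns_record),
    ("server-farmer", provision_server),
    ("ansible", provision_application),
    ("integrations", configure_integrations),
]


def run_pipeline(instance, steps=PIPELINE):
    """Run the steps in order; an exception stops the run at the failing step."""
    for name, step in steps:
        step(instance)
        instance.completed.append(name)
    return instance


node = run_pipeline(Instance(hostname="fabric-node-1.example.com", vendor="aws"))
print(node.completed)
```

Because the vendor is just a field on the instance, the same sequence applies whether the first step creates the machine on AWS, Azure, Oracle or elsewhere.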
Independence at last
Importantly for both us and our clients, Polynimbus gives us full independence from any cloud vendor. If we get a dedicated, more advantageous price offer, e.g. from Oracle, we don’t have to stay with AWS just for technical reasons.
There are real limitations to keep in mind, though. Not all of the capacity of each additional instance can be allocated to the application itself, because the Consul cluster has to be accounted for as well: Hyperledger should connect to Consul in its own availability zone, so each availability zone must contain one or two Consul instances.
Thanks to this, we avoid a situation where a global network failure disrupts the application. In a correctly configured multi-cloud, multi-region, multi-AZ environment, a global network failure simply means that the affected nodes stop serving current traffic; it has no other consequences. And thanks to an efficient management stack, if we anticipate longer problems, we can add new nodes with other cloud vendors and in other regions.
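The rule of one or two Consul instances per availability zone is easy to check automatically. Below is a minimal sketch of such a check. The `(zone, role)` tuple layout, the role names and the function name are my own illustrative choices, not part of Consul or of our tooling.

```python
from collections import Counter


def zones_breaking_consul_rule(nodes, minimum=1, maximum=2):
    """Return the availability zones whose Consul count is outside the range.

    `nodes` is an iterable of (zone, role) pairs; the role "consul" marks
    a Consul instance. Both conventions are illustrative assumptions.
    """
    nodes = list(nodes)
    consul_count = Counter(zone for zone, role in nodes if role == "consul")
    zones = {zone for zone, _ in nodes}
    return sorted(z for z in zones if not minimum <= consul_count[z] <= maximum)


fleet = [
    ("eu-west-1a", "fabric"), ("eu-west-1a", "consul"),
    ("eu-west-1b", "fabric"),  # no local Consul instance
    ("eu-west-1c", "fabric"), ("eu-west-1c", "consul"),
    ("eu-west-1c", "consul"), ("eu-west-1c", "consul"),  # one more than needed
]
print(zones_breaking_consul_rule(fleet))  # ['eu-west-1b', 'eu-west-1c']
```

A zone without any Consul instance is the case that matters most, since nodes there would have to reach Consul across zones; a zone with more than two merely wastes capacity.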