Carbon emissions, as a key driver of environmental change, are coming increasingly under scrutiny by government regulators and in the court of investor opinion. Recent moves by the Biden administration to limit greenhouse gases and by the SEC to force all public companies to disclose even low levels of carbon footprint impact have garnered significant media attention, and these reporting and compliance trends are only likely to accelerate over time as the effects of climate change become more visible and pronounced.
The two most popular public blockchains, Bitcoin and Ethereum, employ a proof-of-work algorithm that consumes vast amounts of processing power, with Bitcoin alone using around 136 terawatt-hours of electricity per year, more than the Netherlands or Argentina. Not only are these public chains massively inefficient on a per-transaction basis, but their power-hungry algorithms have inevitably led to block construction – known as mining – migrating to countries where environmental laws are weaker and electrical power is produced from dirty sources, such as coal. This environmentally destructive footprint is inconsistent with the environmental stance of most U.S. public companies, with the U.S. government’s focus on carbon footprint reduction, and with public opinion.
Private chains – such as Hyperledger Fabric – rely on 1990s-era “scale to peak capacity” approaches that do not support auto-scaling or other dynamic capacity mechanisms. While more efficient than Ethereum’s proof-of-work protocol, they suffer from massive under-utilization of data storage mechanisms, and their need for heavy, “always on” compute capacity drains power (and produces carbon footprint) 24x7x365 regardless of actual transaction rates.
More modern approaches, such as Vendia’s blockchain, rely on more efficient serverless technologies and sustainable public cloud services. By exploiting these cloud-native technologies, modern blockchains offer tight cost enveloping and a carbon footprint that is actually lower than conventional (“centralized”) IT approaches to sharing data through hosted databases and APIs. Features designed to minimize file redundancy further enhance the ability of IT teams to improve storage efficiency without compromising functionality or security. Enterprises and companies of all sizes can benefit from both the speed of delivery and the improved cost and carbon footprint outcomes derived from SaaS-delivered blockchain capability using these newer approaches, allowing them to build cost effective cross-cloud data fabric, partner data sharing and operational data service solutions while simultaneously improving their carbon footprint stance.
First-generation chains: Promising tech, unacceptable environmental costs
Signs of climate change routinely make headlines, media attention that is increasingly shared with government and private industry attempts to control greenhouse gas emissions. Steps by the current U.S. administration to reduce carbon footprints and their resulting environmental damage include a variety of programs targeting supply chains, power production, and – most recently – SEC reporting requirements for public companies. While lowering greenhouse emissions and improving IT efficiency have been on the minds of CIOs for some time, this increased transparency and accountability is just the beginning of a push for compliance that will eventually rival SOC and PCI in its impact on R&D, business operations and investor reporting. Companies, especially larger enterprises, need to begin planning now for the inevitable impact of exposing their IT portfolio choices to the broader public.
Blockchain technologies offer companies a promising new platform for building everything from operational data store (ODS) systems that can span public cloud providers to secure partner data sharing that replaces conventional API-based solutions with blockchain-powered “smart APIs.” However, leveraging first generation blockchain technologies comes with unacceptable environmental costs:
- Ethereum, originally touted as the “world computer,” shares with Bitcoin an environmentally destructive proof-of-work algorithm that is actually designed to consume vast amounts of computing power as a mechanism to disincentivize fraud. Regardless of the technical advantages or disadvantages of this particular approach, it has led in practice to block mining shifting to countries with the lowest cost of electricity, which is inevitably based on dirty production methods, including coal, that exacerbate pollution and carbon footprint. Bitcoin, for example, already consumes more electricity per year than some entire countries. Expanding the use of Ethereum by orders of magnitude, as would be required to give it the processing capability of conventional IT operational systems, would have untold impact on the environment using the existing approach. While the Ethereum community has long discussed moving to more efficient mechanisms, progress has been slow and lacking in real-world impact for the last several years.
- Private and permissioned chains, a category dominated by Hyperledger Fabric, continue to rely on last-century “scale to peak capacity” approaches. Unlike modern cloud native systems designed to exploit more efficient container packing and serverless technologies, Hyperledger Fabric, Quorum, and other “private chains” rely on single server deployments that offer no internal scaling mechanisms and that cannot be easily spun up or down, defeating attempts to apply auto-scaling or other capacity management techniques. This leads to an “always on” solution that employs 100% of computing, database, and storage capacity 24x7x365…even if no actual work is being performed.
As a result, blockchain technology has become associated in public opinion with a high, and largely unacceptable, carbon footprint. That’s unfortunate, because blockchains can actually improve carbon footprint, when implemented correctly. More modern approaches to blockchain protocols have focused not just on improving cost effectiveness and ease of use but also improving compute and storage efficiency, making it possible to actually decrease carbon emissions relative to conventional IT approaches.
In cryptocurrencies and other “public” blockchains, proof of stake has largely replaced proof of work in more modern implementations. Although proof of stake has occasionally been criticized as another form of centralization, it does avoid the high carbon footprint required by the Sybil attack-resistance proof-of-work approach. Public chains also serve a large, worldwide ecosystem, so at least the more popular ones enjoy a reasonable level of utilization.
Public chains still suffer from other forms of inefficiency: even when employing proof of stake, they are required to expend a large percentage of their computational resources maintaining Byzantine and denial-of-service attack resistance, rather than using those same resources to actually compute results. They also need to maintain a “least common denominator” approach to data modeling and storage that can serve anyone in their community, and cannot rely on optimizations based on data models or access patterns.
Worse, public chains are, well, public: by construction, every node needs to maintain a copy of all information and updates from all sources, regardless of access patterns. So even experimental or “test” data from a no-longer-existent startup will have to be copied and maintained by every node in the network, in perpetuity. Similarly, if two companies want to use a public chain to communicate but don’t necessarily need (or perhaps even want) others to participate in the exchange, every other node (and all auditing clients listening for updates) still has to be informed, making both data distribution and data storage vastly inefficient over time due to the intentionally access-pattern-agnostic approach of public chain design. Techniques designed to ameliorate these problems, such as sharding and “L2 caches,” have their own drawbacks, usually including the fact that they are both more centralized in their approaches and that they place the burden of picking a “subcommunity” with which to communicate on every client.
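The replication cost described above is simple multiplication, and a minimal sketch makes the scale visible. The node count and payload size below are hypothetical values chosen only for illustration; they are not measurements of any real network.

```python
def replicated_storage_gb(payload_gb: float, copies: int) -> float:
    """Total storage consumed when `copies` nodes each keep the full payload."""
    if payload_gb < 0 or copies < 1:
        raise ValueError("payload must be non-negative and copies at least 1")
    return payload_gb * copies


# Hypothetical figures, for illustration only.
NODES_IN_PUBLIC_NETWORK = 5_000  # assumption: every node stores every update
PARTIES_WHO_NEED_DATA = 2        # the two companies actually exchanging data
PAYLOAD_GB = 10.0

public_total = replicated_storage_gb(PAYLOAD_GB, NODES_IN_PUBLIC_NETWORK)
scoped_total = replicated_storage_gb(PAYLOAD_GB, PARTIES_WHO_NEED_DATA)
overhead_factor = public_total / scoped_total

print(f"Stored on every public node: {public_total:,.0f} GB")
print(f"Stored only by the two parties: {scoped_total:,.0f} GB")
print(f"Replication overhead: {overhead_factor:,.0f}x")
```

Under these assumed numbers, the overhead grows linearly with the number of nodes, which is why it worsens as a public network becomes more popular.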
These public chain drawbacks don’t improve over time or with technology; in fact, as the throughput of streamed data and the total volume of stored data increase, they actually get worse. For all of these durable structural reasons, private chains will remain a more efficient and “greener” technology for applications such as partner data sharing, cross-cloud operational data stores, and real-time data fabrics than public chains.
First generation private chains, such as Hyperledger Fabric and Quorum, rely on known identities for node operators and therefore do not require either proof of work or proof of stake to safely “mint” a block. However, as data sharing and data storage platforms go, they are far less efficient than modern, cloud-based approaches to storing and sharing data, such as Amazon DynamoDB or Azure Cosmos DB. Cloud-based solutions such as these make more efficient use of infrastructure and electricity for several reasons:
- They are multi-tenanted, achieving aggregate utilization that is far higher than an individual company or deployment could produce through sharing of resources, without having to compromise burst capacity.
- Their storage capacity is continually expanding, avoiding “scale to peak” concerns that cause over-provisioning of legacy blockchain storage resources.
- They have flexible fleet sizing and work allocation algorithms, enabling them to direct compute power where needed, avoiding “scale to peak” concerns that cause over-provisioning of legacy blockchain compute resources.
- Their algorithms are inherently fault tolerant across containers, servers, and availability (fault) zones, avoiding the requirement for applications to create fully redundant deployments. By contrast, legacy blockchains require multiple nodes to overcome server or availability zone failures, resulting in a much larger computational and storage footprint to achieve the same end result.
Given that public cloud services have solved many of these challenges for centralized data sharing solutions, it’s natural to wonder if they couldn’t be similarly applied to decentralized data sharing solutions, i.e. blockchains. And indeed, second generation blockchain approaches have done just that.
Serverless blockchains: The greenest chains
Numerous public cloud services are now described as “serverless”. While the term may seem somewhat ironic (given that they are, obviously, running on servers), the label conveys some important elements of both developer experience and implementation efficiency:
- Massive multi-tenancy – Serverless implementations take the cloud’s ability to multi-tenant to the extreme, allowing a set of shared resources much smaller than individual servers to process workloads from different customers without compromising security or operational workload isolation.
- Per-request scaling – Serverless approaches generally build automatic scaling into their algorithms directly, so that every request to them is also implicitly a scaling directive to the platform as a whole, which can then recruit more (or fewer) resources and allocate workloads among the massive fleet intelligently…a feat that individual company and limited on-premise deployments cannot pull off. Unlike conventional blockchain approaches, those built using a serverless approach gain its automatic, request-level scaling abilities, enabling them to offer tight cost enveloping and efficient per-transaction resource footprints.
- Implicit fault tolerance – One of the benefits of having a massive fleet and dynamic workload (re)allocation is that each customer’s workload can be protected from individual machine or even entire availability zone outages without the need to write code or modify their deployments. Unlike a conventional blockchain based on a single-server deployment model, blockchains based on serverless technologies become implicitly fault tolerant.
- Scales to zero – Unlike conventional deployments, such as those used by all first generation blockchains, serverless applications can turn completely “off”, meaning they have no ongoing footprint when no useful work is being performed. Instead, other customers of the underlying services take advantage of those (multi-tenanted) infrastructure, HVAC, electricity, and other resources to gain useful work from them. Critically, eliminating this baseline cost also eliminates the equivalent carbon footprint for the customer.
These multiple advantages of serverless technologies “pass through” into platforms built from them, as is the case with serverless blockchain technologies such as Vendia’s. What’s more, they not only improve on older private blockchain technologies that are “always on”, they actually improve on most conventional (centralized) approaches to building data sharing platforms, as the next section explores.
Conventional data center and commercial IT server utilization is notoriously low, with estimates ranging from 5-15% (i.e., 85-95% waste). That’s not surprising, because any individual company’s applications and solutions have typical usage patterns, with peaks and long idle stretches. Trying to “fill in the low spots” with their own or outsourced third-party workloads is tantamount to building their own version of a hosted serverless compute platform – a challenge out of reach for all but the largest and most well staffed IT centers of the Fortune 50. For everyone else, their independent and isolated workloads effectively doom them to low server utilization rates, even when those servers are running in the public cloud.
Companies that need to build “public” APIs to share data across departments or organizations internally, to share data with business partners (in supply chains and other multi-company arrangements), or to create multi-cloud solutions find themselves in a predicament here: Building custom implementations to host the APIs, connect the APIs to the data, apply data integrity and constraint checks, create connectors to cloud and application data streams, implement event hooks and other notification solutions, and so on is an uphill battle. Not only are these implementations complex distributed systems that require high caliber engineering talent to develop and deploy, they also require ongoing 24×7 operations support. Because they allow data to transit between companies, clouds, or organizations with differing compliance regimes, they face the highest levels of risk and scrutiny with respect to security, regulations, and policy enforcement. And because they are “single use” applications, they also suffer from low utilization. In the aggregate, owning a large portfolio of poorly utilized IT solutions, combined with upcoming reporting and transparency requirements, will be a significant liability for CIOs and CEOs to manage.
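To see how a “scale to peak” deployment ends up in the utilization range quoted above, consider a minimal sketch. The 24-hour load profile is invented for illustration, and `utilization` is a hypothetical helper, not part of any vendor’s tooling.

```python
def utilization(hourly_load: list[float], provisioned_capacity: float) -> float:
    """Fraction of provisioned capacity-hours that performed useful work."""
    if provisioned_capacity <= 0 or not hourly_load:
        raise ValueError("need a positive capacity and a non-empty load profile")
    used = sum(hourly_load)
    provisioned = provisioned_capacity * len(hourly_load)
    return used / provisioned


# Hypothetical business-hours workload, in requests per hour (assumption).
hourly_load = [0.0] * 8 + [20.0] * 4 + [100.0] * 2 + [20.0] * 4 + [0.0] * 6

# "Scale to peak": capacity is sized for the busiest hour and left running.
peak_capacity = max(hourly_load)
always_on = utilization(hourly_load, peak_capacity)
waste = 1.0 - always_on

# Hours in which any work happened at all; a scale-to-zero platform
# holds no dedicated footprint for this workload outside these hours.
active_hours = sum(1 for load in hourly_load if load > 0)

print(f"Always-on utilization: {always_on:.0%}, idle capacity: {waste:.0%}")
print(f"Hours with any work: {active_hours} of {len(hourly_load)}")
```

With this assumed profile, the peak-sized server does useful work for only a small share of its provisioned capacity-hours, and sits entirely idle for more than half the day.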
Modern blockchains offer a unique solution to these problems: By making it easy and secure to share real-time, operational data both internally and with partners, they lower time to market, remove project and security risks, and minimize the “undifferentiated heavy lift” of creating redundant data- and code-sharing platforms. By using modern, serverless blockchains, companies can simultaneously shift from 10% utilization in “homegrown” solutions to 100% utilization, because serverless solutions are only active when actual work is being performed, by construction. By leveraging the SaaS-style delivery of these blockchains, companies can also dramatically reduce the levels of staffing required to both develop and then operate the resulting systems, effectively shifting much of that burden onto the public cloud and blockchain service providers themselves, lowering IT costs even further. Finally, companies benefit from the massively multi-tenanted nature of the underlying cloud infrastructure, combined with the security and safety of having professionally managed fleets and software systems that are fully outsourced and staffed 24x7x365 around the globe. In short, adopting serverless blockchains allows companies to achieve higher utilization, lower environmental impact, faster time to market, and lower costs versus conventional approaches to building data-sharing solutions such as public APIs.
Efficient file sharing solutions
While databases may be the stars of enterprise data storage and sharing applications, the bulk of data owned and managed by companies is actually in the form of files. Thus, how files are shared, stored, exchanged, duplicated, and governed ends up having a larger effect on greenhouse gas emissions than database storage. Files are also key to partner data sharing solutions, as they often form the basis for both de jure and de facto industry data exchange standards.
The best modern blockchains manage files “on chain” along with scalar (database-held) data, treating them as native data types. But that alone isn’t enough: To avoid the environmental impact of duplicating large volumes of (often large) data files, it’s also necessary to avoid creating unnecessary duplicates in the form of redundant copies of the data in every partner’s IT stack.
To accomplish this, blockchains such as Vendia also include sharing controls and dynamic file exchange. These features allow customers to “set the dial” anywhere from fully redundant copies (maximum operational isolation but also maximum environmental impact from redundant storage) to fully dynamic, where only a single copy is stored and then fetched on demand when other users with appropriate permission request it. In between are hybrid strategies, such as caching (fetch on first use) and quorum (maintain a small number of copies in strategic locations, such as one per public cloud). Without these critical operational controls, along with conventional governance and access controls, redundant file storage would quickly balloon out of control, invalidating any gains made from improved sharing of scalar data. This is one of the reasons that “public” chain file sharing solutions, such as IPFS and Filecoin, have not grown to be even a small fraction of a percent of cloud data storage solutions such as Amazon S3: the high cost, high latency, and low throughput of such systems blunt their decentralized advantages for all but the smallest size (and highest valued) files, making them a poor choice for most IT file sharing needs, such as partner data exchange.
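The “dial” described above can be sketched as a function that counts how many copies of a file each strategy would store. This is an illustrative model only: `SharingStrategy` and `stored_copies` are hypothetical names, not Vendia’s API, and the partner and cloud counts are assumptions.

```python
from enum import Enum


class SharingStrategy(Enum):
    FULLY_REDUNDANT = "fully_redundant"  # every partner keeps a copy
    QUORUM = "quorum"                    # e.g., one copy per public cloud
    CACHING = "caching"                  # copy made on first use
    FULLY_DYNAMIC = "fully_dynamic"      # single copy, fetched on demand


def stored_copies(strategy: SharingStrategy, partners: int,
                  clouds: int = 1, partners_who_read: int = 0) -> int:
    """Number of copies of one file held across all partners (simplified model)."""
    if partners < 1 or clouds < 1 or partners_who_read < 0:
        raise ValueError("partners and clouds must be >= 1, readers >= 0")
    if strategy is SharingStrategy.FULLY_REDUNDANT:
        return partners
    if strategy is SharingStrategy.QUORUM:
        return min(partners, clouds)
    if strategy is SharingStrategy.CACHING:
        # The owner's copy plus one per other partner that has fetched the file.
        return min(partners, 1 + partners_who_read)
    return 1  # FULLY_DYNAMIC


# Hypothetical scenario: 10 partners across 3 clouds; 4 have read the file.
copies = {
    s.value: stored_copies(s, partners=10, clouds=3, partners_who_read=4)
    for s in SharingStrategy
}
print(copies)
```

In this simplified model, storage footprint scales with the copy count, so moving the dial from fully redundant toward fully dynamic trades operational isolation for a smaller footprint.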
Adopting modern blockchains
Because blockchain technology ranges from the environmentally destructive (Bitcoin, Ethereum) to “merely” low utilization (Hyperledger Fabric, Quorum) to guarantees of perfect application utilization (serverless solutions such as Vendia’s), IT professionals facing technology choices need to be careful to ensure they are adopting technologies that will be both cost effective and present their companies in the best possible light when carbon footprint reporting goes fully into effect. The following list will help identify technologies that improve a company’s carbon footprint stance, rather than damaging it:
- Does the solution employ proof-of-work algorithms? Proof of work is a known source of extremely high carbon emissions and “dirty” energy consumption that could mar a company’s public image, even if used indirectly.
- Does the solution support on-chain file management with high availability and redundancy, low latency, and IT-ready redundancy controls? Files are a large portion of an IT organization’s storage footprint; without a solution for managing and tracking content duplication and exchange, redundant file storage will quickly dominate attempts to share data effectively. Availability and redundancy should be built-in features, not client- or application-derived outcomes.
- Is the solution serverless or “single server” in nature? Serverless solutions offer 100% application utilization by construction, in addition to offering built-in scaling and fault tolerance. Single machine deployments are not fault tolerant, cannot scale, and are “always on” solutions (aka “scale to peak capacity”) that negatively impact a company’s carbon footprint.
- Is the solution delivered as a SaaS offering? SaaS offerings not only dramatically reduce development and maintenance costs, they also enable multi-tenanted approaches that increase efficiency and lower aggregate costs and carbon footprint further.
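For teams that want to apply the checklist above consistently across vendors, it can be encoded as a small data structure. The sketch below is one possible encoding; the field names and the example candidate are hypothetical, and the answers must come from each vendor’s own documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockchainCandidate:
    """Answers to the four checklist questions for one candidate solution."""
    name: str
    uses_proof_of_work: bool
    on_chain_files_with_redundancy_controls: bool
    serverless: bool
    delivered_as_saas: bool


def carbon_concerns(candidate: BlockchainCandidate) -> list[str]:
    """Return the checklist items on which the candidate falls short."""
    concerns = []
    if candidate.uses_proof_of_work:
        concerns.append("employs proof of work")
    if not candidate.on_chain_files_with_redundancy_controls:
        concerns.append("no on-chain file management with redundancy controls")
    if not candidate.serverless:
        concerns.append("single-server, always-on deployment")
    if not candidate.delivered_as_saas:
        concerns.append("not delivered as SaaS")
    return concerns


# Hypothetical candidate, for illustration only.
example = BlockchainCandidate(
    name="example-private-chain",
    uses_proof_of_work=False,
    on_chain_files_with_redundancy_controls=False,
    serverless=False,
    delivered_as_saas=True,
)
example_concerns = carbon_concerns(example)
print(f"{example.name}: {len(example_concerns)} concern(s)")
for concern in example_concerns:
    print(f"  - {concern}")
```

An empty list means the candidate passes all four questions; it does not by itself measure carbon footprint.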
In a few short years, “saving the environment” has gone from a fringe movement to one of the top concerns of nations, influencing domestic and international policy. With new reporting requirements already present and further corporate compliance and reporting requirements highly likely, now is the time for CIOs, CEOs, and others to evaluate their IT choices and put strategies in place to lower carbon emissions over the long haul. Focusing on data and compute – the two key drivers of cost and power consumption – will enable companies to identify areas of improvement. With the increasing role of blockchains as mechanisms to share both code and data across companies and clouds, identifying which blockchain technologies and providers improve carbon footprint rather than worsen it is an important question facing IT decision-makers and architects at all levels of an organization. The checklist provided in this article can serve as a vendor selection tool to help make informed decisions and guide a company towards a “carbon and cost” efficient solution.
Tim Wagner is a co-founder of Vendia, the inventor of AWS Lambda and a former general manager of AWS Lambda and Amazon API Gateway services. He has also served as VP of Engineering at Coinbase.