Celestia is a data availability layer based on the Cosmos SDK that provides a data space where sovereign rollups can safely and inexpensively store transaction data.
Unlike smart contract rollups, sovereign rollups conduct settlement on the p2p layer of their own nodes and can hard fork freely.
Celestia guarantees data availability by extending the original data with 2D Reed-Solomon encoding and by having light nodes randomly sample data chunks (data availability sampling).
Namespaced Merkle Trees (NMTs) are a data structure that organizes each rollup's data under its own namespace, allowing rollup networks to easily download all of their relevant data from Celestia.
Various well-known projects, including Manta, Eclipse, Astria, and Dymension, are under development on Celestia, which is expected to grow into a substantial modular ecosystem following Ethereum's.
Approximately three years have passed since Vitalik Buterin published 'A rollup-centric ethereum roadmap', and Ethereum-based rollup networks have built more successful ecosystems than alternative L1 networks. Notable optimistic rollups like Optimism and Arbitrum, though both EVM-based, have each built their own ecosystems, and the recently launched, OP Stack-based Base network is also flourishing with support from Coinbase and successful consumer apps like friend.tech. Besides optimistic rollups, the zk rollup ecosystem, including zkSync Era, Polygon zkEVM, Linea, and StarkNet, is also gradually growing.
A common feature of these rollup networks is that they are Ethereum-based smart contract rollups. These rollups have a rollup contract on the Ethereum network, utilizing the Ethereum network as a settlement and data availability (DA) layer. Because they store transaction data on the Ethereum network, even if malicious activity occurs on the rollup network, a valid state can be restored based on the transaction data stored on the Ethereum network, hence relying on the powerful security of the Ethereum network. Additionally, the fraud proof process of optimistic rollups and zero-knowledge proof verification of zk rollups occur on the Ethereum network. This also means that the Ethereum network confers finality on the smart contract rollup state and, therefore, rollup networks that share the Ethereum network as a settlement layer can easily build a trust-minimized bridge to communicate safely with each other.
Smart contract rollups have several drawbacks. The first is that hard forks are difficult. Since the smart contract residing on the settlement layer manages everything related to the rollup, smart contract rollups cannot easily hard fork. Admittedly, as Jon Charbonneau points out in an article, the state of a smart contract rollup can also be determined by the rollup's full nodes, so from a certain perspective it can hard fork much like a sovereign rollup. However, this only applies to rollup-native assets: assets transferred via the settlement layer's rollup bridge become worthless in the event of a hard fork unless the settlement layer reaches a social consensus to support it. Additionally, bugs in the rollup contract and multi-sig-based management introduce potential vulnerabilities.
A sovereign rollup is a new form of rollup network that addresses these drawbacks. Like a smart contract rollup, it relies on a DA layer for security by storing its transaction data there, but it validates transactions through its own p2p network rather than through smart contracts, and therefore does not use a settlement layer.
Since sovereign rollups have no separate rollup contract, they differ from smart contract rollups in several ways. The first is the ability to hard fork. In existing Layer 1 networks, a hard fork inevitably fragments security unless all nodes are persuaded to upgrade their clients. If a smart contract rollup hard forks, the liquidity bridged from the settlement layer becomes useless. Sovereign rollups, on the other hand, depend on the DA layer for security yet can hard fork freely because there is no rollup contract binding them.
So far, few sovereign rollups have been launched on the mainnet, but numerous sovereign rollups are expected to be launched soon, along with the Celestia DA layer. This report will explore how Celestia enhances the scalability of rollup networks.
Celestia is one of the most popular modular blockchain projects, also known for popularizing the concept of modular blockchains. Unlike monolithic blockchains, modular blockchains refer to networks that distribute and handle the roles of a blockchain separately: 1) Execution 2) Sequencing 3) Settlement 4) Data Availability 5) Consensus.
Execution - Downloading transactions, executing them, and then updating the state.
Sequencing - Collecting users' transactions, determining their order, and producing blocks.
Settlement - Verifying the validity of transactions in the execution layer through fraud proofs or validity proofs.
Data Availability - Storing transaction data from the execution layer, ensuring it is always available.
Consensus - Determining the order of transactions, bundles, and blocks.
Celestia is a blockchain specialized in storing transaction data for sovereign rollups: as a DA layer, it is responsible solely for data availability and consensus. Because it performs no execution and lets light nodes securely verify data availability via Data Availability Sampling (DAS), Celestia can serve as a secure and scalable DA layer. Let's delve into Celestia's various attributes.
Unlike monolithic blockchains, Celestia neither verifies nor executes transactions. Verification and execution are entrusted to the rollup networks, while Celestia focuses exclusively on data availability and consensus. Celestia, a blockchain based on the Cosmos SDK, orders rollup transactions and batches according to fees.
Transaction verification occurs within the P2P layer of rollup network nodes. Unlike smart contract rollups, where the smart contract of the settlement layer determines the canonical state of the rollup, nodes locally decide on the canonical state of Sovereign Rollups using Celestia and provide fraud/validity proofs through the P2P layer. Is it truly safe for the canonical state of the rollup to be determined locally?
2.3.1 Data Availability Problem (DAP)
This is where data availability plays a crucial role. Data availability refers to whether the data of newly produced blocks is accessible. Suppose you run a full node of Rollup A. As long as Celestia is secure, you can access all data related to Rollup A stored in Celestia, which lets you validate transactions and, further, generate fraud proofs (for optimistic rollups) or validity proofs (for zk-rollups) that can be relayed even to light nodes. Therefore, if data availability is assured, it is safe for nodes to determine the rollup's canonical state locally.
However, suppose a malicious block containing invalid transactions, or with some of its transaction data withheld (i.e., the data is not available), is introduced into the network. Since a full node of a sovereign rollup downloads all data, it can immediately detect this and reject the block. Light nodes, however, download only block headers, so they cannot detect the malicious block, and the canonical state perceived by full nodes and light nodes diverges. Full nodes cannot even send fraud proofs to light nodes, because fraud proofs are built from transaction data and cannot be created when that data is missing.
This is known as the Data Availability Problem (DAP), and it's vital to resolve it to ensure that light nodes are not compromised in terms of security. Celestia addresses the DAP through 1) erasure coding and 2) Data Availability Sampling (DAS). Through these two functionalities, transaction data stored in Celestia is always available, and light nodes can maintain a security level almost equivalent to full nodes.
2.3.2 Erasure Coding
Erasure coding extends the original block data by adding redundant data. The redundancy is not arbitrary: it is constructed so that the original data can be reconstructed from only a portion of the extended data. Celestia uses Reed-Solomon codes for erasure coding.
Reed-Solomon codes are widely used in everyday technologies such as CDs, QR codes, and barcodes. You may have noticed that a CD still plays despite dust or scratches, and that a QR code or barcode still scans when partially obscured. This is because Reed-Solomon codes allow the original data to be reconstructed even if part of it is damaged or lost. When k pieces of extra data are added to n pieces of original data, the original can still be reconstructed even if up to k of the n+k pieces are lost.
The illustration above succinctly shows how Reed-Solomon codes extend data. Suppose there are n(=4) original data points on the coordinate plane, as shown on the left. Exactly one polynomial of degree at most n-1(=3) passes through n points with distinct x-coordinates. Once this polynomial is derived from the original data, we evaluate it at k(=4) additional x-coordinates to obtain the extra points. Any 4 of the total n+k(=8) points then determine the same degree-3 polynomial, from which the original data can be reconstructed. Note that Reed-Solomon codes perform this arithmetic over a finite field (i.e., with modular arithmetic), so the extended values stay the same size as the original ones instead of growing.
While Reed-Solomon codes allow the original data to be reconstructed when some of it is missing, they do not ensure that the data was extended correctly in the first place. Celestia therefore uses fraud proofs to catch incorrectly extended data. The example above is the simplest case, 1D Reed-Solomon encoding, for which a fraud proof requires data on the order of O(n). Celestia instead uses 2D Reed-Solomon encoding, described below, which reduces the data needed for a fraud proof to O(√n), since a proof only needs to cover a single row or column.
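The interpolation idea can be sketched in a few lines of Python. This is a minimal illustration rather than Celestia's codec: it assumes arithmetic modulo the prime 2^31 - 1 (production Reed-Solomon libraries typically work over binary extension fields instead), places the n original values at x = 0..n-1, and evaluates the interpolating polynomial at x = n..n+k-1 to produce the k extra pieces. The function names `rs_extend` and `rs_recover` are hypothetical.

```python
# Minimal Reed-Solomon sketch via polynomial interpolation over GF(P).
# Assumption: P = 2**31 - 1 is a stand-in prime field; real codecs differ.
P = 2**31 - 1  # all arithmetic is mod P, so values never grow beyond P


def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through `points` (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def rs_extend(data, k):
    """Keep the n original values (at x = 0..n-1) and append k parity values (at x = n..n+k-1)."""
    n = len(data)
    points = list(enumerate(data))
    return data + [lagrange_eval(points, x) for x in range(n, n + k)]


def rs_recover(known, n):
    """Rebuild the n original values from any n surviving (x, y) pairs."""
    if len(known) < n:
        raise ValueError(f"need at least {n} pieces, got {len(known)}")
    points = known[:n]
    return [lagrange_eval(points, x) for x in range(n)]


if __name__ == "__main__":
    data = [3, 1, 4, 1]                # n = 4 original values
    extended = rs_extend(data, k=4)    # n + k = 8 pieces
    # Lose any 4 pieces; here only x = 1, 4, 6, 7 survive.
    survivors = [(x, extended[x]) for x in (1, 4, 6, 7)]
    print(rs_recover(survivors, n=4))  # [3, 1, 4, 1]
```

Any 4 of the 8 pieces work, including the 4 parity pieces alone, which is exactly the n-out-of-(n+k) property described above. Lagrange interpolation is quadratic in n, which is fine for illustration; large-scale encoders generally use faster algorithms.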
Source: Celestia
In Celestia, the original data is divided into k×k chunks and extended to 2k×2k by applying Reed-Solomon encoding to each row and then to each column, as depicted in the above illustration. Merkle roots are then computed for each of the 2k rows and each of the 2k columns, 4k roots in total, which are combined through another Merkle tree to produce the final Merkle root. Light nodes use this root when sampling the data.
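The same idea extends to two dimensions. The sketch below reuses the prime-field interpolation trick (an assumption: GF(2^31 - 1), not Celestia's actual field), extends every row of a k×k square to 2k values, extends every column of the result to 2k values, and then hashes each row and column into a root and commits to all 4k roots. Celestia computes row and column roots with Namespaced Merkle Trees; plain SHA-256 Merkle trees are used here for simplicity, and all helper names are hypothetical.

```python
import hashlib

P = 2**31 - 1  # stand-in prime field (assumption)


def lagrange_eval(points, x):
    """Evaluate at x the polynomial of degree < len(points) through `points` (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def extend(values):
    """1D Reed-Solomon: n values -> 2n values (originals first, then n parity values)."""
    n = len(values)
    points = list(enumerate(values))
    return list(values) + [lagrange_eval(points, x) for x in range(n, 2 * n)]


def extend_square(square):
    """k x k original square -> 2k x 2k extended square (rows first, then columns)."""
    k = len(square)
    rows = [extend(row) for row in square]                                 # k x 2k
    cols = [extend([rows[i][j] for i in range(k)]) for j in range(2 * k)]  # 2k columns of length 2k
    return [[cols[j][i] for j in range(2 * k)] for i in range(2 * k)]      # back to row-major


def merkle_root(leaves):
    """Plain SHA-256 Merkle root; duplicates the last node on odd levels (simplification)."""
    level = [hashlib.sha256(b"\x00" + leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]


def data_root(extended):
    """Hash each of the 2k rows and 2k columns, then commit to all 4k roots."""
    n = len(extended)

    def enc(v):
        return v.to_bytes(4, "big")  # every value is < 2**31, so 4 bytes suffice

    row_roots = [merkle_root([enc(v) for v in row]) for row in extended]
    col_roots = [merkle_root([enc(extended[i][j]) for i in range(n)]) for j in range(n)]
    return merkle_root(row_roots + col_roots), row_roots, col_roots


if __name__ == "__main__":
    ext = extend_square([[1, 2], [3, 4]])  # k = 2 -> 4 x 4
    for row in ext:
        print(row)  # [1, 2, 3, 4] / [3, 4, 5, 6] / [5, 6, 7, 8] / [7, 8, 9, 10]
    root, row_roots, col_roots = data_root(ext)
    print(len(row_roots) + len(col_roots))  # 8 = 4k roots
```

Because the code is linear, the lower-right quadrant comes out the same whether it is produced from rows or from columns, so each of the 2k rows and 2k columns is itself a valid Reed-Solomon codeword. That is what lets a row or a column be decoded independently later.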
2.3.3 Data Availability Sampling (DAS)
Light nodes in Celestia request and sample random portions of the extended data from full nodes. Each light node downloads a subset of the 2k×2k chunks and verifies that each sampled chunk belongs to the block by checking it against the Merkle root. Because of erasure coding, once enough extended chunks have been collected, the original data can be reconstructed; as long as the network's light nodes collectively hold enough samples, the block remains available.
How much data must a malicious block producer withhold among the 2k×2k chunks to make the original data unrecoverable? The answer is that withholding k+1 chunks in each of k+1 rows, with the withheld chunks falling in the same k+1 columns, makes the entire data unrecoverable. In other words, if a (k+1)×(k+1) set of chunks, (k+1)^2 in total, is missing from the (2k)^2 chunks, the block cannot be recovered even from the samples that light nodes have collected.
As in the above picture, suppose k+1 chunks are missing in each of k+1 rows. In Celestia, the data has been extended by a factor of 2 in both rows and columns, so a row or column can be decoded only if at least k of its 2k chunks are available. In the first row, k+1 of the 2k chunks are missing, leaving only k-1, so it cannot be recovered; the same holds for the second, third, ..., and (k+1)th rows, since more than half of the data is missing in each. Applying the same logic to the k+1 affected columns shows that column decoding cannot restore the missing chunks either, so the data as a whole cannot be recovered.
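Checking that a sampled chunk belongs to the committed data is a Merkle inclusion proof. The following is a minimal sketch assuming a plain SHA-256 tree over a power-of-two number of chunks; in Celestia a sample is, as we understand it, proven against a row or column root that is in turn committed to the data root, and that two-step structure is omitted here. The function names are hypothetical.

```python
import hashlib


def h_leaf(chunk: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + chunk).digest()  # domain-separated leaf hash


def h_node(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()  # domain-separated inner hash


def build_levels(chunks):
    """All tree levels, leaves first; assumes len(chunks) is a power of two."""
    levels = [[h_leaf(c) for c in chunks]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h_node(prev[i], prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def prove(levels, index):
    """Sibling hashes from the leaf at `index` up to (not including) the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])  # the sibling differs only in the lowest bit
        index //= 2
    return path


def verify(root, chunk, index, path):
    """Light-node side: recompute the root from one sampled chunk and its proof."""
    node = h_leaf(chunk)
    for sibling in path:
        node = h_node(node, sibling) if index % 2 == 0 else h_node(sibling, node)
        index //= 2
    return node == root


if __name__ == "__main__":
    chunks = [bytes([i]) * 4 for i in range(8)]
    levels = build_levels(chunks)
    root = levels[-1][0]
    proof = prove(levels, 5)
    print(verify(root, chunks[5], 5, proof))  # True
    print(verify(root, b"fake", 5, proof))    # False
```

The proof for one chunk contains only log2 of the number of chunks hashes (3 for 8 chunks), which is why sampling stays cheap for light nodes even as blocks grow.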
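The row-and-column argument can be checked with a small simulation. This sketch models only which chunks are available, assuming (as the Reed-Solomon property guarantees) that any row or column with at least k of its 2k chunks can be fully decoded; decoding is repeated over rows and columns until nothing changes. The function names are hypothetical.

```python
def recoverable(missing, k):
    """Return True if iterative row/column decoding restores every chunk of a 2k x 2k square.

    `missing` is a set of (row, col) positions withheld by the block producer.
    Assumes any row or column with >= k of its 2k chunks can be fully decoded.
    """
    n = 2 * k
    have = [[(r, c) not in missing for c in range(n)] for r in range(n)]
    progress = True
    while progress:
        progress = False
        for r in range(n):
            if k <= sum(have[r]) < n:  # decodable but not yet complete
                have[r] = [True] * n
                progress = True
        for c in range(n):
            if k <= sum(have[r][c] for r in range(n)) < n:
                for r in range(n):
                    have[r][c] = True
                progress = True
    return all(all(row) for row in have)


def withhold_square(size):
    """Withhold a size x size block of chunks in the top-left corner."""
    return {(r, c) for r in range(size) for c in range(size)}


if __name__ == "__main__":
    k = 4  # 8 x 8 extended square
    print(recoverable(withhold_square(k), k))              # True: a k x k hole is repaired
    print(recoverable(withhold_square(k + 1), k))          # False: (k+1) x (k+1) blocks recovery
    print(recoverable({(0, c) for c in range(2 * k)}, k))  # True: a missing row is rebuilt from columns
```

Withholding a whole row (2k chunks) is harmless because every column still has 2k-1 chunks, whereas the aligned (k+1)×(k+1) pattern leaves every affected row and column one chunk short of decodable.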
In other words, to make a block unrecoverable, a malicious block producer must withhold at least (k+1)^2 chunks, which light nodes can easily detect through DAS. Light nodes query random chunks among the 2k×2k extended chunks, and the probability that a single query hits a missing chunk is (k+1)^2/(2k)^2, which is quite high: with k=32, it is 33^2/64^2 ≈ 26.6%. Therefore, malicious blocks can be detected with very high probability through multiple queries. If P(n) is the probability of failing to detect a malicious block after n independent queries, then P(n) = (1 - (k+1)^2/(2k)^2)^n, and for k=32:
P(1) = 1 - 0.2659 ≈ 73.4%
P(3) = (0.7341)^3 ≈ 39.6%
P(15) = (0.7341)^15 ≈ 0.97%
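The probabilities above can be reproduced with a short script. It assumes, as the formula does, that each query picks a chunk uniformly and independently (sampling with replacement); drawing distinct chunks would make detection slightly more likely. The helper names are hypothetical.

```python
import math


def withheld_fraction(k: int) -> float:
    """Fraction of the (2k)^2 extended chunks withheld in the minimal (k+1) x (k+1) attack."""
    return (k + 1) ** 2 / (2 * k) ** 2


def miss_probability(k: int, samples: int) -> float:
    """P(samples): chance that every sample lands on an available chunk."""
    return (1 - withheld_fraction(k)) ** samples


def samples_needed(k: int, confidence: float) -> int:
    """Smallest sample count whose detection probability 1 - P(s) reaches `confidence`."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - withheld_fraction(k)))


if __name__ == "__main__":
    k = 32
    print(f"{withheld_fraction(k):.1%}")           # 26.6%
    for s in (1, 3, 15):
        print(s, f"{miss_probability(k, s):.2%}")  # 73.41%, 39.57%, 0.97%
    print(samples_needed(k, 0.99))                 # 15
```

Because the miss probability decays exponentially in the number of samples, each additional query multiplies it by roughly 0.73, independent of how large the block is for a fixed k.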
So, when k=32, just 15 queries let a light node detect a malicious block with over 99% probability. In summary, Celestia's defining feature is that it always ensures data availability, achieved by 1) extending data through 2D Reed-Solomon encoding and 2) having light nodes perform data availability sampling on the extended data. Once light nodes have sampled enough chunks of the extended data, the block can be safely recovered, and if a malicious block producer makes it unrecoverable by withholding at least (k+1)^2 chunks, not only full nodes but also light nodes can easily detect it.
Compared to existing blockchains, where light nodes had no choice but to trust full nodes for security, the most significant feature in Celestia is that light nodes can directly contribute to network security. If the number of light nodes participating in the network increases, the number of data chunks being queried also increases, which means that the size of Celestia's blocks can safely increase. In conventional blockchains, the minimum hardware specification required for full nodes increases as the block size increases, leading to potential network centralization issues. However, in Celestia, the block size can increase just with a higher number of light nodes, and since light nodes can participate in DAS even with lower hardware specifications, it does not cause network centralization, which is one of the major features of Celestia.
Currently, Ethereum-based rollup networks store transaction data as calldata, which incurs significantly high data storage costs due to Ethereum's small block space. As of September 2023, Arbitrum spends between $20k-$60k per day, and zkSync Era between $50k-$70k per day on fees due to data storage. (The upcoming Cancun-Deneb upgrade to Ethereum, through EIP-4844, will introduce a new storage space called blob, intended to reduce the data storage costs incurred by rollup networks).
Thanks to erasure coding and DAS, Celestia can offer large data storage spaces to rollup networks. Therefore, sovereign rollups on Celestia can store transaction data much more cheaply compared to Ethereum-based rollups. Celestia plans to introduce a mechanism similar to Ethereum's EIP-1559 to determine data storage fees and even introduce a burning mechanism.
Celestia stores the data of numerous rollup networks as a DA layer. Therefore, various transaction data from different rollups are mixed in Celestia. How can each rollup retrieve transaction data corresponding to them? For this, Celestia introduces Namespaced Merkle Trees (NMTs).
Source: Celestia
An NMT (Namespaced Merkle Tree) is a Merkle tree whose leaves are sorted by namespace identifier and whose nodes each record the minimum and maximum namespace identifiers beneath them, which allows a rollup to prove that it has fetched all transaction data relevant to it. For instance, suppose there are 8 pieces of data (D0-D7), as shown in the above diagram. Each piece of data has a namespace identifier (e.g., D0: 1, D4: 2). Leaf nodes (N0-N7), the hash values of each piece of data, are created and used to build a Merkle tree.
If the rollup with namespace 2 wants to fetch all of its data, Celestia provides D3, D4, D5, and D6, and to prove that this is all the data for namespace 2, it provides N2, N8, and N7. The rollup can compute the Merkle root from the given data and nodes and verify that it matches the one in the block header. Now suppose Celestia provided only D4 and D5. Then N11 and N12 must be provided to compute the Merkle root, but N11 covers namespaces 2-3 and N12 covers namespaces 1-2, so each contains data belonging to namespace 2, and the rollup can tell that some of its data has been withheld. In other words, even though the data of numerous rollups is mixed together in Celestia, NMTs let each rollup fetch exactly the data it needs and verify that nothing is missing.
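The namespace-range check can be sketched as follows. This is a simplified model, not Celestia's NMT implementation: each node stores (min namespace, max namespace, hash), the tree assumes a power-of-two number of leaves, and a proof is the list of sibling subtree roots outside the requested range. Verification recomputes the root and then checks completeness: every sibling on the left must end before the namespace and every sibling on the right must start after it. The namespaces of D1, D2, and D7 are inferred from the example's N11 (2-3) and N12 (1-2) ranges, the mapping to the figure's node labels is our reading of the example, and all names are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    min_ns: int    # smallest namespace ID under this node
    max_ns: int    # largest namespace ID under this node
    digest: bytes


def leaf(ns: int, data: bytes) -> Node:
    return Node(ns, ns, hashlib.sha256(b"\x00" + ns.to_bytes(8, "big") + data).digest())


def inner(left: Node, right: Node) -> Node:
    payload = b"\x01"
    for n in (left, right):  # the hash commits to each child's namespace range
        payload += n.min_ns.to_bytes(8, "big") + n.max_ns.to_bytes(8, "big") + n.digest
    return Node(min(left.min_ns, right.min_ns), max(left.max_ns, right.max_ns),
                hashlib.sha256(payload).digest())


def subtree_root(leaves, lo, hi):
    if hi - lo == 1:
        return leaves[lo]
    mid = (lo + hi) // 2
    return inner(subtree_root(leaves, lo, mid), subtree_root(leaves, mid, hi))


def prove_range(leaves, start, end, lo=0, hi=None):
    """Roots of the maximal subtrees lying entirely outside [start, end), left to right."""
    hi = len(leaves) if hi is None else hi
    if hi <= start or lo >= end:
        return [subtree_root(leaves, lo, hi)]
    if hi - lo == 1:
        return []
    mid = (lo + hi) // 2
    return prove_range(leaves, start, end, lo, mid) + prove_range(leaves, start, end, mid, hi)


def verify_namespace(root, size, start, end, ns, items, proof):
    """Check that `items` are leaves [start, end) under `root` and are ALL of namespace `ns`."""
    leaves = [leaf(ns, d) for d in items]
    if len(leaves) != end - start:
        return False
    remaining = list(proof)
    left_sibs, right_sibs = [], []

    def rebuild(lo, hi):
        if hi <= start or lo >= end:
            node = remaining.pop(0)
            (left_sibs if hi <= start else right_sibs).append(node)
            return node
        if hi - lo == 1:
            return leaves[lo - start]
        mid = (lo + hi) // 2
        return inner(rebuild(lo, mid), rebuild(mid, hi))

    try:
        if rebuild(0, size) != root or remaining:
            return False
    except IndexError:  # proof too short
        return False
    # Completeness: no sibling on either side may still contain namespace `ns`.
    return all(n.max_ns < ns for n in left_sibs) and all(n.min_ns > ns for n in right_sibs)


if __name__ == "__main__":
    namespaces = [1, 1, 1, 2, 2, 2, 2, 3]  # D0..D7 (inferred from the example)
    data = [f"D{i}".encode() for i in range(8)]
    leaves = [leaf(ns, d) for ns, d in zip(namespaces, data)]
    root = subtree_root(leaves, 0, 8)

    full = prove_range(leaves, 3, 7)       # the figure's N8, N2, N7
    print(verify_namespace(root, 8, 3, 7, 2, data[3:7], full))     # True

    partial = prove_range(leaves, 4, 6)    # the figure's N12 (1-2) and N11 (2-3)
    print(verify_namespace(root, 8, 4, 6, 2, data[4:6], partial))  # False
```

In the partial case the Merkle root still matches; the proof fails only at the completeness step, because the namespace ranges on both sibling nodes overlap namespace 2.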
Blobstream is a technology that allows Ethereum-based rollups to use Celestia as a DA layer. Rollups using Blobstream are similar in form to validiums. A validium is an Ethereum-based L2 scaling solution that stores transaction data off-chain rather than on Ethereum.
Source: Celestia
As shown in the figure above, an L2 using Blobstream uses Ethereum as its settlement layer and Celestia as its DA layer. Because the settlement and DA layers are separate, Ethereum must be able to verify that the data stored in Celestia is available. This is done by the Blobstream DA contract on Ethereum, which receives DA attestations from Celestia.
A DA attestation is the Merkle root of L2 data signed by Celestia validators, proving that the data is available on Celestia. The Blobstream DA contract checks that a DA attestation received from Celestia has been signed by validators holding more than 2/3 of the voting power, thereby confirming that the related data is available on Celestia. Eclipse Mainnet, which publicly uses Ethereum as its settlement layer and Celestia as its DA layer, has recently become a representative example of Blobstream usage. Arbitrum's rollup framework, Arbitrum Orbit, also supports Celestia as a DA layer through Blobstream.
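The quorum rule can be sketched as a weighted vote count. This sketch assumes the threshold is measured in voting power, as in Tendermint-style light clients, and that signatures have already been checked, so it only receives the set of validators whose signatures were valid. The function name and validator IDs are hypothetical and do not reflect the Blobstream contract's interface.

```python
def has_quorum(voting_power: dict, signers: set) -> bool:
    """True if validators in `signers` hold strictly more than 2/3 of the total voting power.

    `voting_power` maps a validator ID to its power; `signers` holds the IDs whose
    signatures were already verified (signature checking itself is out of scope).
    IDs not in the validator set contribute nothing.
    """
    total = sum(voting_power.values())
    signed = sum(power for v, power in voting_power.items() if v in signers)
    # Integer form of signed / total > 2/3, avoiding floating-point rounding.
    return 3 * signed > 2 * total


if __name__ == "__main__":
    powers = {"val-a": 40, "val-b": 30, "val-c": 20, "val-d": 10}  # hypothetical validator set
    print(has_quorum(powers, {"val-a", "val-b"}))           # True  (70 of 100)
    print(has_quorum(powers, {"val-b", "val-c", "val-d"}))  # False (60 of 100)
```

In the second call three of the four validators sign but hold only 60% of the power, so the check fails; whether a quorum is counted per validator or per unit of voting power makes a real difference.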
Source: Celestia
Blobstream consists of two components: the orchestrator and the relayer. An 'attestation' is a request for a signature. When a data commitment is created in the Blobstream module, the orchestrator queries and signs it, then broadcasts the signature to the Blobstream P2P network. The relayer queries the attestation from a Celestia full node and the corresponding signatures from the Blobstream P2P network. Once it has collected signatures from validators holding more than 2/3 of the voting power, it submits them to the Blobstream DA contract on the target EVM chain.
A drawback of Blobstream is its high cost, because the on-chain light client on Ethereum must verify Celestia validators' signatures over the data roots. To solve this, Succinct Labs developed Blobstream X, which uses a ZK-Tendermint light client to simplify the verification of Celestia's signatures, reducing costs and other burdens.
Source: Celestia
In September 2023, Celestia introduced the TIA token. The total supply of TIA is 1 billion, with 60 million (6%) designated for the genesis airdrop. The token distribution and lock-up schedule for the initial supply are shown in the illustration above, and the utilities of the TIA token include:
Block Rewards: As a blockchain based on the Cosmos SDK, TIA tokens are utilized as block rewards for validators.
Data Storage Costs: Rollup networks pay TIA tokens to Celestia to store transaction data. Paying higher fees prioritizes inclusion in a block.
Rollup Token: Analogous to ETH's role in Ethereum-based rollups, sovereign rollups based on Celestia can use TIA as a native currency.
Governance: TIA tokens are used for governance regarding network parameter decisions and community pools.
As Celestia is a notable modular blockchain project outside the Ethereum ecosystem, numerous projects plan to use it as a DA layer. Here are some examples.
3.1.1 Manta Pacific
Manta Pacific, developed by p0x Labs, is a Celestia-based EVM rollup utilizing Caldera and OP Stack. Manta Pacific introduced Universal Circuits 2.0, allowing developers to deploy zk applications without learning zk-specific languages.
3.1.2 Eclipse Mainnet
Eclipse Mainnet is an L2 network that uses Ethereum as a settlement layer and Celestia as a DA layer, making it an example of a Celestium. A fascinating aspect of Eclipse is its use of SolanaVM rather than the EVM as the execution environment, aiming to combine Ethereum's robust security with SolanaVM's speed.
The shared sequencing layer outsources the sequencing, or transaction ordering process, in rollup networks, providing various advantages like 1) censorship resistance, 2) improved liveness, and 3) cross-rollup composability.
3.2.1 Astria
Astria is a shared sequencing layer in the Celestia ecosystem whose sequencers run the CometBFT consensus algorithm. Rollup networks using Astria can rely on two levels of finality: a soft commitment from the Astria shared sequencer before the data is included in Celestia, and a hard commitment once it has been included.
3.2.2 Fairblock Network
Fairblock Network leverages pre-execution privacy solutions and uses Celestia as a DA layer. By encrypting transactions via identity-based encryption (IBE) and ordering them while contents remain hidden, Fairblock addresses malicious MEV, censorship, and front-running issues.
3.2.3 Radius
Radius is a shared sequencing layer that uses Practical Verifiable Delay Encryption (PVDE) and ZKPs: user transactions are encrypted with PVDE and decrypted only after they have been ordered, mitigating malicious censorship even with a single sequencer.
Rollup-as-a-Service assists developers in easily deploying rollup networks. Several RaaS projects currently offer Celestia as one of the DA layer options:
Rollkit is a framework for sovereign rollups, notable for providing an ABCI-compatible client interface.
OP Stack, an open-source tech stack based on Optimism and widely used in the Ethereum ecosystem, can utilize Celestia as a DA layer. Also, Caldera, based on OP Stack, can utilize Celestia.
Dymension provides Dymension Hub, which uses Celestia as a DA layer, and the Dymension RDK, which makes it easy to deploy application-specific rollups (RollApps).
AltLayer is a no-code tool enabling easy rollup deployment.
Sovereign SDK is a framework that makes developing zk sovereign rollups easy and supports zk rollups on Celestia.
Arbitrum's rollup framework, Arbitrum Orbit, supports Celestia as a DA layer in addition to Arbitrum One and Arbitrum Nova.
3.4.1 Osmosis
Osmosis, a prominent DEX in the Cosmos ecosystem and a liquidity hub for the IBC ecosystem, plans to offer various features in the Celestia ecosystem, such as providing cross-chain liquidity to Celestia-based sovereign rollups and abstracting data fees to allow payment in tokens other than TIA.
3.4.2 Catalyst
Catalyst, a cross-chain AMM protocol, enables swapping assets across numerous networks like Ethereum, Cosmos, Optimism, and Eclipse and plans to support easy cross-chain swaps for Celestia-based rollups.
In a current landscape dominated by smart contract rollups dependent on Ethereum, Celestia will allow rollups to retain sovereignty, providing only consensus and DA services to sovereign rollups. In essence, a blockchain encapsulates participants' social consensus into a structured protocol, with this social consensus being the pivotal element. Unlike conventional smart contract rollups that are inherently reliant on the social consensus of the settlement layer, sovereign rollups have the capability to forge their own independent social consensus, even as Layer 2 solutions. Emerging from the foundations laid by the LazyLedger whitepaper in 2019, Celestia aspires to pave the way for new possibilities—specifically, the advent of sovereign rollups—by offering a secure and expansively scalable data availability layer.
Thanks to Kate for designing the graphics for this article.