KYVE, our decentralized data lake, is made up of data pools, each specific to a certain type of data set. What exactly are these data pools, how do they work, and why? In this course, learn all about the core technology of KYVE’s data lake.

What Is A KYVE Data Pool?

If you have already taken KYVE’s Fundamentals Level 1 and Level 2 courses, you should be familiar with KYVE’s protocol layer and its data pools.

But for those who are new, KYVE’s protocol layer, also known as a decentralized data lake, is made up of data pools, each pertaining to a specific data set. Within each pool, 50 protocol validators are incentivized to fetch, store, and validate the requested data.

These data pools make up KYVE’s trustless data sets and carry out our mission of providing ecosystems with secure, trustless data to build with or analyze.

How Do KYVE’s Data Pools Work?

Anyone can create a data pool! Blockchain foundations, projects, data scientists, developers, and more may all want to create a data pool in order to provide reliable, free access to the trustless data they need.

Because KYVE is decentralized, creating a pool requires submitting a pool creation proposal through our governance. The proposal must specify the following:

  • One or more data sources that the pool needs to validate and archive (the more data sources available, the better for decentralization)
  • A runtime that defines how to validate the data
  • A web3 storage provider where the validated data should be stored (for example, Arweave)
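
To make these three requirements concrete, here is a minimal sketch of what a proposal could look like as a data structure. The class name, field names, URLs, and runtime identifier are all hypothetical and chosen for illustration only; this is not KYVE’s actual proposal schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PoolProposal:
    """Hypothetical shape of a pool creation proposal (not KYVE's real schema)."""
    name: str
    data_sources: List[str]   # one or more sources to validate and archive
    runtime: str              # identifier of the runtime that defines validation
    storage_provider: str     # web3 storage provider, e.g. "Arweave"

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means nothing is missing."""
        issues = []
        if not self.data_sources:
            issues.append("at least one data source is required")
        if not self.runtime:
            issues.append("a runtime is required")
        if not self.storage_provider:
            issues.append("a storage provider is required")
        return issues


# Example values are placeholders, not real endpoints or runtimes.
proposal = PoolProposal(
    name="example-chain-archive",
    data_sources=["https://rpc-1.example.org", "https://rpc-2.example.org"],
    runtime="example-runtime",
    storage_provider="Arweave",
)
print(proposal.problems())
```

Listing two data sources in the example reflects the point above: more sources give validators more to cross-check against.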

A pool needs funding in order to run. Each data pool can have up to 50 funders, who split the cost of the payouts to well-behaving protocol validators. It’s important to note that funding a pool brings no direct monetary gain; the benefit is the trustless data the pool makes available. Funders can join and help split the costs at any point.

Once a data pool is created and funded, the real action happens! The protocol validators join the pool and get to work. One of the 50 validators is selected as the “uploader”, responsible for fetching, bundling, and uploading the data to the storage provider.

The uploader is chosen by weighted, pseudo-random selection, where the weight is the total delegation a validator has in that pool. Therefore, the more delegation a validator has (either self-delegated or delegated by other users), the more likely it is to be selected as the uploader for the next round.
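
The idea of delegation-weighted selection can be sketched in a few lines of Python. This is only an illustration of the general technique: the function name, the validator names, and the use of a per-round integer seed are assumptions, and the real protocol derives its randomness differently (see our docs for the actual mechanism).

```python
import random
from typing import Dict


def select_uploader(delegations: Dict[str, int], seed: int) -> str:
    """Pick one validator with probability proportional to its delegation."""
    validators = sorted(delegations)  # fixed order so a given seed is reproducible
    weights = [delegations[v] for v in validators]
    rng = random.Random(seed)  # stand-in for the protocol's pseudo-random source
    return rng.choices(validators, weights=weights, k=1)[0]


# Hypothetical delegation amounts for three validators in one pool.
delegations = {"alice": 700, "bob": 200, "carol": 100}

# Simulate many rounds, using the round number as the seed.
picks = [select_uploader(delegations, seed=round_id) for round_id in range(10_000)]
for name in delegations:
    print(name, picks.count(name) / len(picks))
```

Over many rounds, each validator’s share of uploader slots should land close to its share of the total delegation, while any single round stays unpredictable.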

Once the data is uploaded, the rest of the validators vote to reach consensus on whether the data is correct, cross-checking it against the other data sources they have compiled or been provided with. If the data is considered correct, the uploader is rewarded, and the data bundle is tracked for easy retrieval.
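
The voting step can be pictured roughly as follows. In this sketch, each validator compares a hash of the uploaded bundle with a hash of the data it fetched itself, and the bundle is accepted when more than half of the votes are positive. Both the hash comparison and the simple-majority threshold are assumptions made for illustration; this course does not specify the exact rule the protocol uses.

```python
import hashlib
import json
from typing import Dict, List


def digest(bundle: List[dict]) -> str:
    """Deterministic SHA-256 fingerprint of a bundle of data items."""
    encoded = json.dumps(bundle, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def cast_vote(uploaded: List[dict], locally_fetched: List[dict]) -> bool:
    """A validator votes 'valid' when the upload matches its own copy of the data."""
    return digest(uploaded) == digest(locally_fetched)


def bundle_is_valid(votes: Dict[str, bool]) -> bool:
    """Assumed rule: strictly more than half of the votes must be 'valid'."""
    if not votes:
        return False
    valid_votes = sum(1 for vote in votes.values() if vote)
    return valid_votes * 2 > len(votes)


# Hypothetical bundle and validator names.
uploaded = [{"height": 1, "hash": "abc"}, {"height": 2, "hash": "def"}]
tampered = [{"height": 1, "hash": "abc"}, {"height": 2, "hash": "xyz"}]

votes = {
    "validator-1": cast_vote(uploaded, uploaded),
    "validator-2": cast_vote(uploaded, uploaded),
    "validator-3": cast_vote(uploaded, tampered),  # this validator saw different data
}
print(votes, bundle_is_valid(votes))
```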

  • Read further into how an “uploader” is selected in our docs
  • How exactly is data retrieved from these data pools? Find out in our Fundamentals Level 2 course

Data Pool Use Cases

Because data pool technology is quite general, it can be used in many different ways.

One example is archiving an entire blockchain. As blockchains grow, the data they produce becomes harder and harder to store and keep track of, especially for main network participants like nodes* and validators*. As a result, the staggering cost of simply staying online has eroded the incentive to run archival nodes, which means fewer devices are holding onto the past.

This is a major issue for node runners, not only when maintaining their node but also when joining a new network, where it is increasingly hard to get all the blocks they need to state-sync. However, because KYVE archives entire blockchains, they can easily and freely tap into this data to keep maintaining their node or to join a new network.

This historical data can also serve as a kind of archival node, and it benefits data analysts and developers whose dashboards and DApps rely on specific data sets.

KYVE’s data pools can also access off-chain data, such as sports results, weather data, stock prices, and more. This opens up a whole new wave of opportunities for DApps, letting them offer special services to their users, such as decentralized sports betting.

It can also help data scientists and analysts who need reliable access to accurate weather or climate data from around the globe in order to build their climate-neutral solutions.

The use cases are endless! Be sure to take our core use cases course to find out more.

Conclusion & Resources

Congratulations! You made it through the KYVE SKYNC course! You are now a KYVE protocol layer pro who understands the core tech behind its data pools.

Want to go deeper into the technology of KYVE? Feel free to check these resources:

  • Want to know more about KYVE’s data pools? Discover our Docs
  • Ran into a few words you didn’t know? The words marked with * in this article are defined in KYVE’s Glossary!