Welcome to the KYVE Network course! 

It’s important to note that KYVE refers to our blockchain and decentralized data lake, whereas KYVE Network refers to the full suite of developer tools built around it. In this lesson, we’ll cover the basics of these developer tools!

Without further ado, let’s get started! →

KYVE Network Stack

KYVE is a Layer 1 blockchain*, meaning it provides its own blockchain infrastructure for building an ecosystem of decentralized applications (dApps) around its solution of validating data in a decentralized way and making it accessible to all. 

To best bring forward this mission, the KYVE Network is built as a stack of developer tools on top of its blockchain:

  1. KYVE: A decentralized data lake that fetches, stores and validates the data;
  2. Data Pipeline: To provide easy access and implementation of trustless data;
  3. And soon, an Oracle: To provide easy access to very specific off-chain data sets.

KYVE, our decentralized data lake, is at the core of the KYVE Network, providing decentralized, valid data that anyone can use to build trustlessly in Web3. Initially, in order to access KYVE’s data, one had to build their own sourcing solution, such as an indexer*. 

While this is still possible, our team opted to provide a complete tech stack for all trustless data requirements to promote easy development within the KYVE Network. We started by releasing our Data Pipeline for those who want a no-code solution for importing KYVE data into their preferred backends, such as MongoDB, Google BigQuery, SQL databases, and more.

As the year goes on, the tech stack will expand to best adapt to all data situations, opening up to more use cases, such as providing a decentralized oracle for accessing specific data sets.

  • Continue the course to find out more about each tool!
  • Want to find out more on the KYVE Network Stack? Visit our website.
  • Run into a few words you weren’t aware of? The words marked with * in this article are defined in KYVE’s Glossary!

What’s a Data Lake?

In Web2*, a data lake* is known as a repository that allows organizations to store all their structured and unstructured data at any scale. The data within a data lake can be in raw format and can be stored in its original state, without the need for a predefined schema or structure. This allows for more flexibility and scalability in data storage and processing.

Once data is stored in a data lake, it can be processed and analyzed using a variety of tools and technologies. These tools allow for data to be transformed, cleaned, and integrated, making it ready for analysis and reporting.

We’ve taken this concept of the traditional data lake and modernized it with decentralization and built-in validation, creating KYVE.

KYVE is a Layer 1 blockchain with two layers: a consensus layer that manages the overall chain infrastructure, and a protocol layer, which works as the data lake. The data lake is made up of pools* of data, each dedicated to a certain theme or source of data, like Solana blockchain data or climate data. Within each pool is a network of validators in charge of fetching, validating, and storing the requested data, and they are rewarded in tokens for good behavior by those funding the pool. 

These pools can be created and/or funded by anyone, most likely developers needing access to a certain type of data for the projects they’re building. Pool creation is fully customizable: the developer can decide what data they want the data lake to fetch and validate, where it should be stored, along with other parameters such as the amount of rewards for the validators within that pool. KYVE leverages the Web3 ecosystem by leaving the storage aspect to platforms specialized in that field, such as Filecoin, Arweave, IPFS, and more. 

Once the requested data has been fetched, validated, and stored, developers can access it for free by requesting the access point from KYVE directly, or via KYVE’s Data Pipeline. From there, they can import and transform the data into any format they need to best support their build.
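To make the pool mechanics above concrete, here’s a toy sketch in Python (purely illustrative, not KYVE’s actual protocol code) of how a bundle of fetched data might only be accepted once a majority of a pool’s validators vote it valid:

```python
from collections import Counter

def finalize_bundle(bundle_id, votes):
    """Toy model of one pool round: a bundle of fetched data is only
    accepted (and stored) if a majority of validators vote it valid."""
    tally = Counter(votes.values())
    accepted = tally["valid"] > len(votes) / 2
    return {"bundle": bundle_id, "accepted": accepted}

# Hypothetical round: 4 of 5 validators agree the fetched data is correct.
votes = {"val-1": "valid", "val-2": "valid", "val-3": "invalid",
         "val-4": "valid", "val-5": "valid"}
result = finalize_bundle("bundle-42", votes)  # accepted, since 4 > 5/2
```

In the real network the economics matter too: validators earn token rewards for honest work, so disagreeing with the majority is costly.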

Example: Imagine a developer building a dashboard for users to track their transactions across multiple different blockchains or Decentralized Exchanges* (DEX). The developer can request a KYVE pool to be created around the user data from the preferred blockchains or DEX. 

From there, protocol validators within KYVE handle the rest: fetching the data, validating it to make sure it is fully correct and up to date, then storing it. The developer can then easily plug this data (via Data Pipeline or their own sourcing solution) into their dashboard code, and voila!

In this sense, KYVE’s data lake does allow custom data sourcing, storage, and access like a typical data lake would, but makes the storage part more secure, scalable, and unique via its Web3 partners and, of course, makes sure the data running through KYVE is truly correct through decentralized validation. 


Data Lake vs Data Warehouse

Data Lake* is not a very common term, especially in our day-to-day. It often gets confused with a similar-sounding solution called a “Data Warehouse”. Although the terms sound alike, the solutions’ use cases are not the same. And for KYVE, it’s important to be able to tell the difference!

To put it simply:

Data lakes, as mentioned in the previous chapter, provide a vast pool of all types of data that’s easily accessible and adaptable for all users. KYVE’s data lake takes this concept to the next level by enabling developers to source both Web2 and Web3 data, while also ensuring it’s fully correct and easily accessible in a decentralized way.

How does it work? The KYVE data lake fetches any type of data, brings it into what we call a “storage pool*” (each set of data has its own pool, for example, data on the blockchain Solana has a Solana storage pool), stores the data onto storage backends, then validates the data in a decentralized way. After this process, anyone can easily access the valid data for free by requesting the access point from KYVE. Get more details in the KYVE Fundamentals course.

Who uses a data lake? In the Web2 world, data lakes are often used by data analysts and engineers. We can expect the same for KYVE, but more specifically, developers who need this data to properly build and scale their dApps/blockchains. To access this data, users can rely on solutions like KYVE’s Data Pipeline*.

Data warehouses, on the other hand, contain multiple databases for storing already structured, filtered data for specific analytic purposes. For example, analyzing business insights.

Who uses data warehouses? In Web2, they’re often used for more operational corporate analyses, using it as their core intelligence center for reporting company metrics.

However, with a very segregated data storage structure, data warehouses bring some limitations… Since the data is already filtered into segregated sections, access in warehouses can be quite limited and/or time-consuming. Also, managing and/or altering this data can be very challenging, as it has already been structured in a certain way.

Overall, data warehouses aren’t particularly user-friendly for Web3 developers who require constant access to flexible data in order to appropriately fit their applications.

Now you know: both solutions are important, but NOT interchangeable. And for KYVE’s situation, a data lake is much more suitable!


What is an ELT Pipeline?

KYVE’s Data Pipeline is an ELT pipeline for data sourcing. But what exactly does that mean? Let’s break it down…

A quite common way for developers, analysts, and others to source and implement data is via a set of processes called ETL (extract, transform, load): they extract data from a source, transform it into the format they need, then load it onto their storage backend to be able to work with it. 

However, this order of processes creates limitations for its users: since they must transform the data before loading it onto their data backends, transforming it into something else later means starting the entire process over.

Since KYVE’s data lake stores and validates all types of raw data that can be used in many different ways, it just makes sense to go for an ELT* (extract, load, transform) approach. This allows users to keep the original data in their database and transform it in as many ways as they need, making it very flexible for different use cases.

Example: Let’s imagine a developer that wants to build a weather forecasting app that displays local weather data. The developer needs to source raw weather data from various sources such as weather APIs, satellite data, and ground-based weather stations. The raw data comes in different formats such as JSON, XML, and CSV. 

By transforming the raw data into a standardized format, the developer can make sure that the weather data is accurate and consistent, and that the app can be easily updated with new data as it becomes available.
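As a sketch of this ELT flow in plain Python (the sources and field names below are hypothetical), the developer loads each raw payload untouched and only normalizes it into a common schema when the app needs it:

```python
import csv
import io
import json

# --- Extract: raw payloads arrive in whatever format each source uses ---
api_payload = '{"city": "Berlin", "temp_c": 21.5}'   # JSON from a weather API
station_payload = "city,temp_c\nParis,18.0\n"        # CSV from a ground station

# --- Load: store the raw data as-is, so it can be re-transformed later ---
raw_store = [("api", api_payload), ("station", station_payload)]

# --- Transform: normalize into one schema only when the app needs it ---
def to_records(source, payload):
    if source == "api":
        return [json.loads(payload)]
    if source == "station":
        return [{"city": row["city"], "temp_c": float(row["temp_c"])}
                for row in csv.DictReader(io.StringIO(payload))]

records = [rec for src, payload in raw_store for rec in to_records(src, payload)]
```

Because `raw_store` keeps the originals, adding a new output schema later is just another transform function, with no need to re-extract anything.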

How exactly does the Data Pipeline work? 

KYVE actually leverages Airbyte, an open-source data-integration platform where you can easily implement a path for directing data from one place to another. You can think of Data Pipeline as a custom plug-in that feeds KYVE’s data into your data management platforms. 

The process of bringing KYVE’s data onto your storage platform requires no coding, just a few simple clicks, making the process of accessing and using trustless data that much easier. 
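Conceptually, such a connection boils down to three choices: a source pool, a destination, and a sync mode. The sketch below is illustrative only; the field names are hypothetical and do not reflect Airbyte’s or KYVE’s exact configuration schema:

```python
# Hypothetical connection settings; field names are illustrative only,
# not Airbyte's or KYVE's exact configuration schema.
connection = {
    "source": {"type": "kyve", "pool_id": 1},                   # which KYVE pool to read from
    "destination": {"type": "bigquery", "dataset": "kyve_raw"}, # where the data lands
    "sync_mode": "incremental",                                 # only pull newly validated bundles
}
```

In the actual Data Pipeline UI, these choices are made with clicks rather than configuration files.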

  • Want to know more about KYVE’s Data Pipeline? Read into this article.
  • Want to learn how to use KYVE’s Data Pipeline? Be sure to take our course on the KYVE Network.

What is an Oracle?

An oracle* acts as a bridge between the blockchain and the outside world, allowing smart contracts to access data and trigger actions in response to real-world events. Oracles are necessary in decentralized applications, as they provide the necessary information for smart contracts to execute automatically and transparently without relying on centralized intermediaries.

Most oracles have a centralized structure, meaning one entity fetches and provides the data. This can make the oracle very fast, since only one entity makes decisions when going through the data. However, it also means only one entity needs to be compromised to take over the entire oracle and disrupt the data flow.

Example: Just last year, a crypto trading platform on Solana called Mango Markets suffered a targeted attack on its own token. Hackers were able to artificially raise the price of the $MNGO token by as much as 5-10x the original price by manipulating the pricing data in two main oracles. 

These two oracles then provided the incorrect pricing data to their clients — many different crypto exchange platforms — where the hacker was able to take full advantage and sell their “high-priced” $MNGO. 

Although it was the oracles’ data that was manipulated, Mango Markets claimed that the oracles were working just as they should, seeing that they were reporting the price logged in the data they were collecting. However, this hack could have been easily avoided if these oracles were validating the data in a decentralized way before providing it to exchange platforms. 
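One common decentralized defense (a simplified sketch, not Mango’s or KYVE’s actual mechanism) is to aggregate prices from many independent reporters with a median, so a single manipulated feed barely moves the result:

```python
from statistics import median

def aggregate_price(reports):
    """Median of independent price reports: one outlier cannot drag
    the aggregate far, unlike a single-source feed."""
    return median(reports.values())

# Hypothetical reporters and prices, for illustration only.
honest = {"r1": 0.040, "r2": 0.041, "r3": 0.039, "r4": 0.040}
attacked = dict(honest, r4=0.400)  # one reporter inflates its price 10x

# With the median, the manipulated feed shifts the aggregate only slightly.
```

The same principle underlies decentralized validation in general: no single reporter’s claim is trusted until enough independent parties agree on it.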

This is why KYVE aims to solve the oracle problem by providing decentralized validation, as well as by releasing its own decentralized oracle specifically built for bridging data between blockchains, enabling concise data sourcing while maintaining a trustless, secure environment. 

Not only this, but there is a lot more to discover regarding the potential of decentralized oracles, and our team is working hard to bring forward an innovative, impactful solution. More information about KYVE’s upcoming oracle will be released in due time, so be sure to stay tuned!


Conclusion & Resources

Congratulations! You made it through the KYVE Network Course! You are now a KYVE Network pro!

Want to go deeper into the technology of KYVE? Feel free to check these resources: