# OpenBlocks Takehome

## Summary
This is a quickly assembled setup demonstrating a mature approach to consuming data from various sources and presenting it to any team in the org.
It includes automated data ingestion, an observability and discoverability stack, transformation via the industry-standard dbt, and a design modular enough to scale.
A diagram can be found in `OB Pipeline Diagram.png`.
## Structure
```
├── README.md                  # this file
├── dbt                        # the dbt project
│   └── models
│       └── staging            # staging models
├── airbyte                    # the airbyte quickstart
├── open-metadata              # the open-metadata info
└── OB Pipeline Diagram.png
```
## Notes on flexibility
This proposed model can work immediately for any chain for which a node is available. As a test, you can tie in the Cloudflare Ethereum node via Airbyte and have it continuously bring in live data while also backfilling blocks back to genesis.
The dbt job to roll up by day or by block is a trivial change to the SQL of the models we have now.
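As an illustration, a daily rollup model might look like the following sketch. The model and column names here (`stg_blocks`, `block_timestamp`, `tx_count`) are hypothetical placeholders, not names taken from this repo:

```sql
-- models/marts/blocks_daily.sql (illustrative sketch, not from this repo)
-- Assumes a staging model stg_blocks exposing block_number,
-- block_timestamp, and tx_count; swap in the real staging columns.
select
    date_trunc('day', block_timestamp) as block_date,
    count(*)                           as block_count,
    sum(tx_count)                      as transaction_count
from {{ ref('stg_blocks') }}
group by 1
```

Rolling up by block instead is just a matter of grouping on `block_number` rather than the truncated timestamp.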
Airflow can be used to perform interesting tasks, like tracking certain wallets and/or finding transactions within certain thresholds.
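For example, an Airflow task could periodically run a query along these lines against the warehouse. This is a hedged sketch: the `stg_transactions` and `watched_wallets` models, their columns, and the threshold value are all assumptions for illustration:

```sql
-- Illustrative alerting query; table and column names are assumptions.
select t.hash, t.from_address, t.to_address, t.value_eth
from {{ ref('stg_transactions') }} t
where t.value_eth >= 1000  -- example value threshold
   or t.from_address in (select address from {{ ref('watched_wallets') }})
   or t.to_address   in (select address from {{ ref('watched_wallets') }})
```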
Open-Metadata offers a quick glance across all the datasets: with 5 or 10 chains (or even more), it can show the health of the API endpoints at once and quickly surface any possible holes in the data.
Open-Metadata also acts as an internal Data Mart, allowing discussion of the data, tasks associated with the data (tasks for the data team), and sharing of queries, all in one logical location, with no need to click around various sites.