- Add incremental model for syncing the NEAR Lake S3 bucket
- Add S3 UDFs

README.md (87 lines changed)

Curated SQL Views and Metrics for the Near Blockchain.

What's Near? Learn more [here](https://near.org/)

## Setup

### Prerequisites

1. Complete the steps in the [Data Curator Onboarding Guide](https://docs.metricsdao.xyz/data-curation/data-curator-onboarding).
   - Note that the Data Curator Onboarding Guide assumes that you will ask to be added as a contributor to a MetricsDAO project. Ex: https://github.com/MetricsDAO/near_dbt.
   - However, if you have not yet been added as a contributor, or you'd like to take an even lower-risk approach, you can always follow the [Fork and Pull Workflow](https://reflectoring.io/github-fork-and-pull/) by forking the project you'd like to contribute to into your own GitHub account. Just make sure to:
     - Fork the MetricsDAO repository.
     - Clone your forked repository. Ex: `git clone https://github.com/YourAccount/near_dbt`.
     - Create a branch for the changes you'd like to make. Ex: `git branch readme-update`.
     - Switch to the branch. Ex: `git checkout readme-update`.
     - Make your changes on the branch and follow the rest of the steps in the [Fork and Pull Workflow](https://reflectoring.io/github-fork-and-pull/) to notify the MetricsDAO repository owners to review your changes.
2. Download [Docker for Desktop](https://www.docker.com/products/docker-desktop).
   - (Optional) You can run the Docker tutorial.
3. Install [VSCode](https://code.visualstudio.com/).

### Prerequisites: Additional Windows Subsystem for Linux (WSL) Setup

4. For Windows users, you'll need to install WSL and connect VSCode to WSL by:
   - Right-clicking VSCode and running it as admin.
   - Installing [WSL](https://docs.microsoft.com/en-us/windows/wsl/install) by typing `wsl --install` in VSCode's terminal.
   - Following the rest of the [VSCode WSL instructions](https://code.visualstudio.com/docs/remote/wsl) to create a new WSL user.
   - Installing the Remote Development extension (ms-vscode-remote.vscode-remote-extensionpack) in VSCode.
   - Finally, restarting VSCode in the directory in which you'd like to work. For example:
     - `cd ~/metricsDAO/data_curation/near_dbt`
     - `code .`

### Create the Environment Variables

1. Create a `.env` file with the following contents (note `.env` will not be committed to source) in the near_dbt directory (ex: near_dbt/.env):

```
SF_ACCOUNT=zsniary-metricsdao
SF_USERNAME=<your_metrics_dao_snowflake_username>
SF_PASSWORD=<your_metrics_dao_snowflake_password>
...
SF_WAREHOUSE=DEFAULT
SF_ROLE=PUBLIC
SF_SCHEMA=SILVER
```

**Replace** the SF_USERNAME and SF_PASSWORD with the temporary Snowflake user name and password you received in the Snowflake step of the [Data Curator Onboarding Guide](https://docs.metricsdao.xyz/data-curation/data-curator-onboarding).

2. New to DBT? It's pretty dope. Read up on it [here](https://www.getdbt.com/docs/).

Run the following commands from inside the Near directory (**you must have completed the Setup steps above^^**)

## Variables

To control which external table environment a model references, as well as whether a Stream is invoked at runtime, use the following control variables (see the sketch at the end of this section):

* STREAMLINE_INVOKE_STREAMS
  * When True, invokes Streamline on model run as normal.
  * When False, NO-OP.
* STREAMLINE_USE_DEV_FOR_EXTERNAL_TABLES
  * When True, uses the DEV Streamline schema, e.g. `Ethereum_DEV`.
  * When False, uses the PROD Streamline schema, e.g. `Ethereum`.

The default value for both is False.

* Usage:
  `dbt run --var '{"STREAMLINE_USE_DEV_FOR_EXTERNAL_TABLES": True, "STREAMLINE_INVOKE_STREAMS": True}' -m ...`

To control the creation of UDF or SP macros with dbt run:

* UPDATE_UDFS_AND_SPS
  * When True, executes all macros included in the on-run-start hooks within dbt_project.yml on model run as normal.
  * When False, none of the on-run-start macros are executed on model run.

The default value is False.

* Usage:
  `dbt run --var '{"UPDATE_UDFS_AND_SPS": True}' -m ...`
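
As a minimal sketch of how a model might read these variables: the `var()` calls mirror the guard used in `macros/create_udfs.sql` added by this commit, and `streamline.near_dev.blocks` appears in the sync model below, but the PROD schema name `near` and the columns shown are assumptions for illustration only.

```sql
-- Hypothetical model sketch, not code from this repo.
-- Pick the external-table schema from STREAMLINE_USE_DEV_FOR_EXTERNAL_TABLES
-- and skip the Streamline call unless STREAMLINE_INVOKE_STREAMS is True.
{% set ext_schema = 'near_dev' if var('STREAMLINE_USE_DEV_FOR_EXTERNAL_TABLES') else 'near' %}

SELECT
    block_id,
    {% if var('STREAMLINE_INVOKE_STREAMS') %}
    streamline.udf_introspect(block_id :: STRING) AS streamline_response -- invoke Streamline
    {% else %}
    NULL AS streamline_response -- NO-OP when the variable is False
    {% endif %}
FROM
    streamline.{{ ext_schema }}.blocks
```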

### DBT Environment

1. In VSCode's terminal, type `cd near_dbt`.

### DBT Project Docs

1. In VSCode, open another terminal.
2. In this new terminal, run `make dbt-docs`. This will compile your dbt documentation and launch a web server at http://localhost:8080

Documentation is automatically generated and hosted using Netlify.

[Netlify Status](https://app.netlify.com/sites/mdao-near/deploys)

## Project Overview

Blocks and transactions are fed into the above two Near tables utilizing the Chainwalkers…

| Column          | Type          | Description                                                        |
| --------------- | ------------- | ------------------------------------------------------------------ |
| record_id       | VARCHAR       | A unique id for the record generated by Chainwalkers               |
| offset_id       | NUMBER(38, 0) | Synonymous with block_id for Near                                  |
| block_id        | NUMBER(38, 0) | The height of the chain this block corresponds with                |
| block_timestamp | TIMESTAMP     | The time the block was minted                                      |
| network         | VARCHAR       | The blockchain network (e.g. mainnet, testnet)                     |
| chain_id        | VARCHAR       | Synonymous with blockchain name for Near                           |
| tx_count        | NUMBER(38, 0) | The number of transactions in the block                            |
| header          | json variant  | A json queryable column containing the block's header information  |
| ingested_at     | TIMESTAMP     | The time this data was ingested into the table by Snowflake        |
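
For orientation, here is a hedged example of reading the `header` variant column with Snowflake's JSON path syntax. The `NEAR` database and `SILVER` schema come from the setup notes above; the table name `blocks` and the keys inside `header` are assumptions for illustration only.

```sql
-- Hypothetical query: pull a few recent blocks and read keys out of header.
SELECT
    block_id,
    block_timestamp,
    tx_count,
    header :hash :: STRING   AS block_hash,    -- assumed key inside header
    header :height :: NUMBER AS header_height  -- assumed key inside header
FROM
    near.silver.blocks                         -- assumed table location
WHERE
    block_timestamp >= DATEADD('day', -1, CURRENT_TIMESTAMP())
ORDER BY
    block_id DESC
LIMIT 10;
```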

| Column          | Type          | Description                                                             |
| --------------- | ------------- | ----------------------------------------------------------------------- |
| record_id       | VARCHAR       | A unique id for the record generated by Chainwalkers                    |
| tx_id           | VARCHAR       | A unique on chain identifier for the transaction                        |
| tx_block_index  | NUMBER(38, 0) | The index of the transaction within the block. Starts at 0.             |
| offset_id       | NUMBER(38, 0) | Synonymous with block_id for Near                                       |
| block_id        | NUMBER(38, 0) | The height of the chain this block corresponds with                     |
| block_timestamp | TIMESTAMP     | The time the block was minted                                           |
| network         | VARCHAR       | The blockchain network (e.g. mainnet, testnet)                          |
| chain_id        | VARCHAR       | Synonymous with blockchain name for Near                                |
| tx_count        | NUMBER(38, 0) | The number of transactions in the block                                 |
| header          | json variant  | A json queryable column containing the block's header information       |
| tx              | array         | An array of json queryable objects containing each tx and decoded logs  |
| ingested_at     | TIMESTAMP     | The time this data was ingested into the table by Snowflake             |
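
Because each record carries the full `tx` array, a common pattern is to explode it with `LATERAL FLATTEN`. As above, the table location is an assumption used only for illustration.

```sql
-- Hypothetical query: one output row per element of the tx array.
SELECT
    t.tx_id,
    t.block_id,
    f.value AS tx_object                       -- the full json object for this tx
FROM
    near.silver.transactions t,                -- assumed table location
    LATERAL FLATTEN(input => t.tx) f
LIMIT 100;
```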

Data in this DBT project is written to the `NEAR` database in MetricsDAO.

This database has 2 schemas, one for `DEV` and one for `PROD`. As a contributor you have full permission to write to the `DEV` schema. However, the `PROD` schema can only be written to by MetricsDAO's DBT Cloud account. The DBT Cloud account controls running / scheduling models against the `PROD` schema.

## Branching / PRs

When creating a PR please include the following details in the PR description: …

## More DBT Resources:

* Learn more about dbt [in the docs](https://docs.getdbt.com/docs/introduction)
* Check out [Discourse](https://discourse.getdbt.com/) for commonly asked questions and answers
* Check out [the blog](https://blog.getdbt.com/) for the latest news on dbt's development and best practices

dbt_project.yml

```yaml
clean-targets: # directories to be removed by `dbt clean`
  - "dbt_packages"

on-run-start:
  - "{{ create_udfs() }}"
  - "{{create_sps()}}"
  - "{{create_get_nearblocks_fts()}}"

# ...

vars:
  "dbt_date:time_zone": GMT
  STREAMLINE_INVOKE_STREAMS: False
  STREAMLINE_USE_DEV_FOR_EXTERNAL_TABLES: False
  UPDATE_UDFS_AND_SPS: False
```

macros/create_udfs.sql (new file, 17 lines)

```sql
{% macro create_udfs() %}
    {% if var("UPDATE_UDFS_AND_SPS") %}
        {% if target.database != "NEAR_COMMUNITY_DEV" %}
            {% set sql %}
            CREATE SCHEMA IF NOT EXISTS silver;
            CREATE SCHEMA IF NOT EXISTS streamline;
            {{ create_udf_introspect() }}
            {{ create_udf_s3_list_directories() }}
            {{ create_udf_s3_list_objects() }}
            {{ create_udf_s3_copy_objects() }}
            {{ create_udf_s3_copy_objects_overwrite() }}

            {% endset %}
            {% do run_query(sql) %}
        {% endif %}
    {% endif %}
{% endmacro %}
```

macros/streamline_udfs.sql (new file, 62 lines)

```sql
{% macro create_udf_introspect() %}
CREATE OR REPLACE EXTERNAL FUNCTION streamline.udf_introspect(
    echo STRING
) returns text api_integration = aws_stg_us_east_1_api max_batch_rows = 10 AS {% if target.name == "prod" %}
    'https://jfqhk99kj1.execute-api.us-east-1.amazonaws.com/stg/s3/introspect'
{% else %}
    'https://jfqhk99kj1.execute-api.us-east-1.amazonaws.com/stg/s3/introspect'
{%- endif %};
{% endmacro %}

{% macro create_udf_s3_list_directories() %}
CREATE OR REPLACE EXTERNAL FUNCTION streamline.udf_s3_list_directories(
    path STRING
) returns ARRAY api_integration = aws_stg_us_east_1_api max_batch_rows = 10 AS {% if target.name == "prod" %}
    'https://jfqhk99kj1.execute-api.us-east-1.amazonaws.com/stg/s3/list_directories'
{% else %}
    'https://jfqhk99kj1.execute-api.us-east-1.amazonaws.com/stg/s3/list_directories'
{%- endif %};
{% endmacro %}

{% macro create_udf_s3_list_objects() %}
CREATE OR REPLACE EXTERNAL FUNCTION streamline.udf_s3_list_objects(
    path STRING
) returns ARRAY api_integration = aws_stg_us_east_1_api max_batch_rows = 10 AS {% if target.name == "prod" %}
    'https://jfqhk99kj1.execute-api.us-east-1.amazonaws.com/stg/s3/list_objects'
{% else %}
    'https://jfqhk99kj1.execute-api.us-east-1.amazonaws.com/stg/s3/list_objects'
{%- endif %};
{% endmacro %}

{% macro create_udf_s3_copy_objects() %}
CREATE OR REPLACE EXTERNAL FUNCTION streamline.udf_s3_copy_objects(
    paths ARRAY,
    source STRING,
    target STRING
) returns ARRAY api_integration = aws_stg_us_east_1_api headers = (
    'overwrite' = '1'
) max_batch_rows = 1 AS {% if target.name == "prod" %}
    'https://jfqhk99kj1.execute-api.us-east-1.amazonaws.com/stg/s3/copy_objects'
{% else %}
    'https://jfqhk99kj1.execute-api.us-east-1.amazonaws.com/stg/s3/copy_objects'
{%- endif %};
{% endmacro %}

{% macro create_udf_s3_copy_objects_overwrite() %}
CREATE OR REPLACE EXTERNAL FUNCTION streamline.udf_s3_objects_overwrite(
    paths ARRAY,
    source STRING,
    target STRING
) returns ARRAY api_integration = aws_stg_us_east_1_api headers = (
    'overwrite' = '1'
) max_batch_rows = 1 AS {% if target.name == "prod" %}
    'https://jfqhk99kj1.execute-api.us-east-1.amazonaws.com/stg/s3/copy_objects'
{% else %}
    'https://jfqhk99kj1.execute-api.us-east-1.amazonaws.com/stg/s3/copy_objects'
{%- endif %};
{% endmacro %}
```
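
Once the on-run-start hooks have created these external functions (e.g. via `dbt run --var '{"UPDATE_UDFS_AND_SPS": True}' -m ...`), they can be smoke-tested directly in Snowflake. These calls are illustrative only; the exact behavior depends on the backing API, and the bucket path is taken from the sync model added in this commit.

```sql
-- Illustrative calls to the S3 UDFs defined above (not part of the repo).
SELECT streamline.udf_introspect('hello');                                  -- echoes through the API integration
SELECT streamline.udf_s3_list_directories('s3://near-lake-data-mainnet/'); -- presumably lists prefixes under the path
```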

models/silver/streamline/streamline__s3_sync.sql (new file, 44 lines)

```sql
{{ config(
    materialized = 'view',
    tags = ['streamline'],
    post_hook = "select * from {{this.schema}}.{{this.identifier}}",
    comment = "incrementally sync Streamline.near_dev.blocks and Streamline.near_dev.shards"
) }}

WITH latest_block AS (

    SELECT
        MAX(SPLIT_PART(file_name, '/', 1)) AS block_id
    FROM
        TABLE(
            information_schema.external_table_file_registration_history(
                table_name => 'streamline.near_dev.blocks',
                start_time => DATEADD('hour', {{ var("S3_LOOKBACK_HOURS", -2) }}, SYSDATE()))
        )
    WHERE
        operation_status = 'REGISTERED_NEW'
),
blocks AS (
    -- Look back 1000 blocks from the max block_id and count forward 4000
    SELECT
        ROW_NUMBER() over (
            ORDER BY
                SEQ4()
        ) AS series,
        b.block_id :: INTEGER - 1000 + series AS new_block_id,
        RIGHT(REPEAT('0', 12) || new_block_id :: STRING, 12) AS prefix
    FROM
        TABLE(GENERATOR(rowcount => 4000)),
        latest_block b
)
SELECT
    b.prefix,
    streamline.udf_s3_copy_objects(
        streamline.udf_s3_list_objects(
            's3://near-lake-data-mainnet/' || b.prefix || '/*'
        ),
        's3://near-lake-data-mainnet/',
        's3://stg-us-east-1-serverless-near-lake-mainnet-fsc/'
    )
FROM
    blocks b
```
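
The `prefix` expression zero-pads each candidate block height to 12 characters, which is the key format the model then appends to `s3://near-lake-data-mainnet/`. A quick worked example of that expression in isolation (the sample height is arbitrary):

```sql
-- Worked example of the zero-padding expression used above.
SELECT RIGHT(REPEAT('0', 12) || 9820210 :: STRING, 12) AS prefix; -- returns '000009820210'
```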