diff --git a/README.md b/README.md
index 6eebb21..3e0405a 100644
--- a/README.md
+++ b/README.md
@@ -5,36 +5,38 @@ Curated SQL Views and Metrics for the Near Blockchain.
 What's Near? Learn more [here](https://near.org/)
 
 ## Setup
+
 ### Prerequisites
 
 1. Complete the steps in the [Data Curator Onboarding Guide](https://docs.metricsdao.xyz/data-curation/data-curator-onboarding).
-   * Note that the Data Curator Onboarding Guide assumes that you will ask to be added as a contributor to a MetricsDAO project. Ex: https://github.com/MetricsDAO/near_dbt.
-   * However, if you have not yet been added as a contributor, or you'd like to take an even lower-risk approach, you can always follow the [Fork and Pull Workflow](https://reflectoring.io/github-fork-and-pull/) by forking a copy of the project to which you'd like to contribute to a local copy of the project in your github account. Just make sure to:
+   - Note that the Data Curator Onboarding Guide assumes that you will ask to be added as a contributor to a MetricsDAO project. Ex: https://github.com/MetricsDAO/near_dbt.
+   - However, if you have not yet been added as a contributor, or you'd like to take an even lower-risk approach, you can always follow the [Fork and Pull Workflow](https://reflectoring.io/github-fork-and-pull/) by forking the project you'd like to contribute to into your own GitHub account. Just make sure to:
     - Fork the MetricsDAO repository.
     - Git clone from your forked repository. Ex: `git clone https://github.com/YourAccount/near_dbt`.
    - Create a branch for the changes you'd like to make. Ex: `git branch readme-update`.
-    - Switch to the branch. Ex: `git checkout readme-update`.
-    - Make your changes on the branch and follow the rest of the steps in the [Fork and Pull Workflow](https://reflectoring.io/github-fork-and-pull/) to notify the MetricsDAO repository owners to review your changes.
-2. Download [Docker for Desktop](https://www.docker.com/products/docker-desktop).
-   * (Optional) You can run the Docker tutorial.
+    - Switch to the branch. Ex: `git checkout readme-update`.
+    - Make your changes on the branch and follow the rest of the steps in the [Fork and Pull Workflow](https://reflectoring.io/github-fork-and-pull/) to notify the MetricsDAO repository owners to review your changes.
+2. Download [Docker for Desktop](https://www.docker.com/products/docker-desktop).
+   - (Optional) You can run the Docker tutorial.
 3. Install [VSCode](https://code.visualstudio.com/).
 
 ### Prerequisites: Additional Windows Subsystem for Linux (WSL) Setup
 
-4. For Windows users, you'll need to install WSL and connect VSCode to WSL by
-   * Right clicking VSCode and running VSCode as admin.
-   * Installing [WSL](https://docs.microsoft.com/en-us/windows/wsl/install) by typing `wsl --install` in VScode's terminal.
-   * Following the rest of the [VSCode WSL instruction](https://code.visualstudio.com/docs/remote/wsl) to create a new WSL user.
-   * Installing the Remote Development extension (ms-vscode-remote.vscode-remote-extensionpack) in VSCode.
-   * Finally, restarting VSCode in a directory in which you'd like to work. For example,
-     - `cd ~/metricsDAO/data_curation/near_dbt`
-     - `code .`
+4. For Windows users, you'll need to install WSL and connect VSCode to WSL by:
+   - Right-clicking VSCode and running VSCode as admin.
+   - Installing [WSL](https://docs.microsoft.com/en-us/windows/wsl/install) by typing `wsl --install` in VSCode's terminal.
+   - Following the rest of the [VSCode WSL instructions](https://code.visualstudio.com/docs/remote/wsl) to create a new WSL user.
+   - Installing the Remote Development extension (ms-vscode-remote.vscode-remote-extensionpack) in VSCode.
+   - Finally, restarting VSCode in a directory in which you'd like to work. For example:
+     - `cd ~/metricsDAO/data_curation/near_dbt`
+     - `code .`
 
 ### Create the Environment Variables
 
 1. Create a `.env` file with the following contents (note `.env` will not be committed to source) in the near_dbt directory (ex: near_dbt/.env):
-   ```
+```
 SF_ACCOUNT=zsniary-metricsdao
 SF_USERNAME=
 SF_PASSWORD=
@@ -43,9 +45,9 @@ What's Near? Learn more [here](https://near.org/)
 SF_WAREHOUSE=DEFAULT
 SF_ROLE=PUBLIC
 SF_SCHEMA=SILVER
-   ```
+```
-   **Replace** the SF_USERNAME and SF_PASSWORD with the temporary Snowflake user name and password you received in the Snowflake step of the [Data Curator Onboarding Guide](https://docs.metricsdao.xyz/data-curation/data-curator-onboarding).
+**Replace** the SF_USERNAME and SF_PASSWORD with the temporary Snowflake user name and password you received in the Snowflake step of the [Data Curator Onboarding Guide](https://docs.metricsdao.xyz/data-curation/data-curator-onboarding).
 
 2. New to DBT? It's pretty dope. Read up on it [here](https://www.getdbt.com/docs/)
 
@@ -53,6 +55,31 @@ What's Near? Learn more [here](https://near.org/)
 
 Run the following commands from inside the Near directory (**you must have completed the Setup steps above^^**)
 
+## Variables
+
+To control which external table environment a model references, as well as whether Streamline is invoked at runtime, use the following control variables:
+
+* STREAMLINE_INVOKE_STREAMS
+  - When True, invokes Streamline on the model run as normal.
+  - When False, the run is a no-op.
+* STREAMLINE_USE_DEV_FOR_EXTERNAL_TABLES
+  - When True, uses the DEV Streamline schema for external tables (`streamline.near_dev`).
+  - When False, uses the PROD Streamline schema for external tables.
+
+Both variables default to False.
+
+* Usage:
+  `dbt run --vars '{"STREAMLINE_USE_DEV_FOR_EXTERNAL_TABLES": True, "STREAMLINE_INVOKE_STREAMS": True}' -m ...`
+
+To control whether the UDF and SP creation macros run as part of `dbt run`:
+
+* UPDATE_UDFS_AND_SPS
+  - When True, executes all macros included in the on-run-start hooks within dbt_project.yml on the model run as normal.
+  - When False, none of the on-run-start macros are executed on the model run.
+
+Defaults to False.
+
+* Usage:
+  `dbt run --vars '{"UPDATE_UDFS_AND_SPS": True}' -m ...`
+
 ### DBT Environment
 
 1. In VSCode's terminal, type `cd near_dbt`.
@@ -61,10 +88,10 @@ Run the following commands from inside the Near directory (**you must have compl
 
 ### DBT Project Docs
 
-1. In VSCode, open another terminal.
+1. In VSCode, open another terminal.
 2. In this new terminal, run `make dbt-docs`. This will compile your dbt documentation and launch a web-server at http://localhost:8080
 
-Documentation is automatically generated and hosted using Netlify.
+Documentation is automatically generated and hosted using Netlify.
 [![Netlify Status](https://api.netlify.com/api/v1/badges/12fc0079-7428-4771-9923-38ee6599db0f/deploy-status)](https://app.netlify.com/sites/mdao-near/deploys)
 
 ## Project Overview
 
@@ -94,12 +121,12 @@ Blocks and transactions are fed into the above two Near tables utilizing the Cha
 
 | Column          | Type          | Description                                                       |
 | --------------- | ------------- | ----------------------------------------------------------------- |
 | record_id       | VARCHAR       | A unique id for the record generated by Chainwalkers              |
-| offset_id       | NUMBER(38,0)  | Synonmous with block_id for Near                                  |
-| block_id        | NUMBER(38,0)  | The height of the chain this block corresponds with               |
+| offset_id       | NUMBER(38, 0) | Synonymous with block_id for Near                                 |
+| block_id        | NUMBER(38, 0) | The height of the chain this block corresponds with               |
 | block_timestamp | TIMESTAMP     | The time the block was minted                                     |
 | network         | VARCHAR       | The blockchain network (i.e. mainnet, testnet, etc.)              |
 | chain_id        | VARCHAR       | Synonmous with blockchain name for Near                           |
-| tx_count        | NUMBER(38,0)  | The number of transactions in the block                           |
+| tx_count        | NUMBER(38, 0) | The number of transactions in the block                           |
 | header          | json variant  | A json queryable column containing the blocks header information  |
 | ingested_at     | TIMESTAMP     | The time this data was ingested into the table by Snowflake       |
 
@@ -109,13 +136,13 @@ Blocks and transactions are fed into the above two Near tables utilizing the Cha
 
 | Column          | Type          | Description                                                             |
 | --------------- | ------------- | ----------------------------------------------------------------------- |
 | record_id       | VARCHAR       | A unique id for the record generated by Chainwalkers                    |
 | tx_id           | VARCHAR       | A unique on chain identifier for the transaction                        |
-| tx_block_index  | NUMBER(38,0)  | The index of the transaction within the block. Starts at 0.             |
-| offset_id       | NUMBER(38,0)  | Synonmous with block_id for Near                                        |
-| block_id        | NUMBER(38,0)  | The height of the chain this block corresponds with                     |
+| tx_block_index  | NUMBER(38, 0) | The index of the transaction within the block. Starts at 0.             |
+| offset_id       | NUMBER(38, 0) | Synonymous with block_id for Near                                       |
+| block_id        | NUMBER(38, 0) | The height of the chain this block corresponds with                     |
 | block_timestamp | TIMESTAMP     | The time the block was minted                                           |
 | network         | VARCHAR       | The blockchain network (i.e. mainnet, testnet, etc.)                    |
 | chain_id        | VARCHAR       | Synonmous with blockchain name for Near                                 |
-| tx_count        | NUMBER(38,0)  | The number of transactions in the block                                 |
+| tx_count        | NUMBER(38, 0) | The number of transactions in the block                                 |
 | header          | json variant  | A json queryable column containing the blocks header information        |
 | tx              | array         | An array of json queryable objects containing each tx and decoded logs  |
 | ingested_at     | TIMESTAMP     | The time this data was ingested into the table by Snowflake             |
 
@@ -124,7 +151,7 @@ Blocks and transactions are fed into the above two Near tables utilizing the Cha
 
 Data in this DBT project is written to the `NEAR` database in MetricsDAO.
 
-This database has 2 schemas, one for `DEV` and one for `PROD`. As a contributer you have full permission to write to the `DEV` schema. However the `PROD` schema can only be written to by Metric DAO's DBT Cloud account. The DBT Cloud account controls running / scheduling models against the `PROD` schema.
+This database has 2 schemas, one for `DEV` and one for `PROD`. As a contributor, you have full permission to write to the `DEV` schema. However, the `PROD` schema can only be written to by MetricsDAO's DBT Cloud account. The DBT Cloud account controls running / scheduling models against the `PROD` schema.
 
 ## Branching / PRs
 
@@ -138,6 +165,6 @@ When creating a PR please include the following details in the PR description:
 
 ## More DBT Resources:
 
-- Learn more about dbt [in the docs](https://docs.getdbt.com/docs/introduction)
-- Check out [Discourse](https://discourse.getdbt.com/) for commonly asked questions and answers
-- Check out [the blog](https://blog.getdbt.com/) for the latest news on dbt's development and best practices
+* Learn more about dbt [in the docs](https://docs.getdbt.com/docs/introduction)
+* Check out [Discourse](https://discourse.getdbt.com/) for commonly asked questions and answers
+* Check out [the blog](https://blog.getdbt.com/) for the latest news on dbt's development and best practices
diff --git a/dbt_project.yml b/dbt_project.yml
index fb71a49..6fd4d82 100644
--- a/dbt_project.yml
+++ b/dbt_project.yml
@@ -25,6 +25,7 @@ clean-targets: # directories to be removed by `dbt clean`
   - "dbt_packages"
 
 on-run-start:
+  - "{{ create_udfs() }}"
   - "{{create_sps()}}"
   - "{{create_get_nearblocks_fts()}}"
 
@@ -51,3 +52,6 @@ tests:
 
 vars:
   "dbt_date:time_zone": GMT
+  STREAMLINE_INVOKE_STREAMS: False
+  STREAMLINE_USE_DEV_FOR_EXTERNAL_TABLES: False
+  UPDATE_UDFS_AND_SPS: False
diff --git a/macros/create_udfs.sql b/macros/create_udfs.sql
new file mode 100644
index 0000000..0df2c28
--- /dev/null
+++ b/macros/create_udfs.sql
@@ -0,0 +1,17 @@
+{% macro create_udfs() %}
+    {% if var("UPDATE_UDFS_AND_SPS") %}
+        {% if target.database != "NEAR_COMMUNITY_DEV" %}
+            {% set sql %}
+            CREATE SCHEMA IF NOT EXISTS silver;
+            CREATE SCHEMA IF NOT EXISTS streamline;
+            {{ create_udf_introspect() }}
+            {{ create_udf_s3_list_directories() }}
+            {{ create_udf_s3_list_objects() }}
+            {{ create_udf_s3_copy_objects() }}
+            {{ create_udf_s3_copy_objects_overwrite() }}
+
+            {% endset %}
+            {% do run_query(sql) %}
+        {% endif %}
+    {% endif %}
+{% endmacro %}
diff --git a/macros/streamline_udfs.sql b/macros/streamline_udfs.sql
new file mode 100644
index 0000000..6669255
--- /dev/null
+++ b/macros/streamline_udfs.sql
@@ -0,0 +1,62 @@
+{% macro create_udf_introspect() %}
+    CREATE
+    OR REPLACE EXTERNAL FUNCTION streamline.udf_introspect(
+        echo STRING
+    ) returns text api_integration = aws_stg_us_east_1_api max_batch_rows = 10 AS {% if target.name == "prod" %}
+        'https://jfqhk99kj1.execute-api.us-east-1.amazonaws.com/stg/s3/introspect'
+    {% else %}
+        'https://jfqhk99kj1.execute-api.us-east-1.amazonaws.com/stg/s3/introspect'
+    {%- endif %};
+{% endmacro %}
+
+{% macro create_udf_s3_list_directories() %}
+    CREATE
+    OR REPLACE EXTERNAL FUNCTION streamline.udf_s3_list_directories(
+        path STRING
+    ) returns ARRAY api_integration = aws_stg_us_east_1_api max_batch_rows = 10 AS {% if target.name == "prod" %}
+        'https://jfqhk99kj1.execute-api.us-east-1.amazonaws.com/stg/s3/list_directories'
+    {% else %}
+        'https://jfqhk99kj1.execute-api.us-east-1.amazonaws.com/stg/s3/list_directories'
+    {%- endif %};
+{% endmacro %}
+
+{% macro create_udf_s3_list_objects() %}
+    CREATE
+    OR REPLACE EXTERNAL FUNCTION streamline.udf_s3_list_objects(
+        path STRING
+    ) returns ARRAY api_integration = aws_stg_us_east_1_api max_batch_rows = 10 AS {% if target.name == "prod" %}
+        'https://jfqhk99kj1.execute-api.us-east-1.amazonaws.com/stg/s3/list_objects'
+    {% else %}
+        'https://jfqhk99kj1.execute-api.us-east-1.amazonaws.com/stg/s3/list_objects'
+    {%- endif %};
+{% endmacro %}
+
+{% macro create_udf_s3_copy_objects() %}
+    CREATE
+    OR REPLACE EXTERNAL FUNCTION streamline.udf_s3_copy_objects(
+        paths ARRAY,
+        source STRING,
+        target STRING
+    ) returns ARRAY api_integration = aws_stg_us_east_1_api headers = (
+        'overwrite' = '1'
+    ) max_batch_rows = 1 AS {% if target.name == "prod" %}
+        'https://jfqhk99kj1.execute-api.us-east-1.amazonaws.com/stg/s3/copy_objects'
+    {% else %}
+        'https://jfqhk99kj1.execute-api.us-east-1.amazonaws.com/stg/s3/copy_objects'
+    {%- endif %};
+{% endmacro %}
+
+{% macro create_udf_s3_copy_objects_overwrite() %}
+    CREATE
+    OR REPLACE EXTERNAL FUNCTION streamline.udf_s3_objects_overwrite(
+        paths ARRAY,
+        source STRING,
+        target STRING
+    ) returns ARRAY api_integration = aws_stg_us_east_1_api headers = (
+        'overwrite' = '1'
+    ) max_batch_rows = 1 AS {% if target.name == "prod" %}
+        'https://jfqhk99kj1.execute-api.us-east-1.amazonaws.com/stg/s3/copy_objects'
+    {% else %}
+        'https://jfqhk99kj1.execute-api.us-east-1.amazonaws.com/stg/s3/copy_objects'
+    {%- endif %};
+{% endmacro %}
diff --git a/models/silver/streamline/streamline__s3_sync.sql b/models/silver/streamline/streamline__s3_sync.sql
new file mode 100644
index 0000000..af70d44
--- /dev/null
+++ b/models/silver/streamline/streamline__s3_sync.sql
@@ -0,0 +1,44 @@
+{{ config(
+    materialized = 'view',
+    tags = ['streamline'],
+    post_hook = "select * from {{this.schema}}.{{this.identifier}}",
+    comment = "incrementally sync Streamline.near_dev.blocks and Streamline.near_dev.shards"
+) }}
+
+WITH latest_block AS (
+
+    SELECT
+        MAX(SPLIT_PART(file_name, '/', 1)) AS block_id
+    FROM
+        TABLE(
+            information_schema.external_table_file_registration_history(
+                table_name => 'streamline.near_dev.blocks',
+                start_time => DATEADD('hour', {{ var("S3_LOOKBACK_HOURS", -2) }}, SYSDATE()))
+        )
+    WHERE
+        operation_status = 'REGISTERED_NEW'
+),
+blocks AS (
+    -- Look back 1000 blocks from the max block_id and count forward 4000
+    SELECT
+        ROW_NUMBER() over (
+            ORDER BY
+                SEQ4()
+        ) AS series,
+        b.block_id :: INTEGER - 1000 + series AS new_block_id,
+        RIGHT(REPEAT('0', 12) || new_block_id :: STRING, 12) AS prefix
+    FROM
+        TABLE(GENERATOR(rowcount => 4000)),
+        latest_block b
+)
+SELECT
+    b.prefix,
+    streamline.udf_s3_copy_objects(
+        streamline.udf_s3_list_objects(
+            's3://near-lake-data-mainnet/' || b.prefix || '/*'
+        ),
+        's3://near-lake-data-mainnet/',
+        's3://stg-us-east-1-serverless-near-lake-mainnet-fsc/'
+    )
+FROM
+    blocks b
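
The new `streamline__s3_sync` model is tagged `streamline` in its config, so it can be exercised on its own together with the control variables documented in the README. A minimal invocation sketch (variable values shown for a DEV run; adjust as needed):

```
dbt run -m tag:streamline --vars '{"STREAMLINE_INVOKE_STREAMS": True, "STREAMLINE_USE_DEV_FOR_EXTERNAL_TABLES": True, "UPDATE_UDFS_AND_SPS": True}'
```

With the variables left at their default of False, the `create_udfs()` on-run-start hook is a no-op and no Streamline external functions are (re)created.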
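
The prefix arithmetic in `streamline__s3_sync.sql` is easier to follow with a concrete value: the block height is zero-padded to 12 characters so it lines up with the key prefixes the model lists under `s3://near-lake-data-mainnet/`. A standalone Snowflake sketch (the height 77000000 is an arbitrary example):

```sql
-- Left-pad a block height to 12 characters, as the model does for S3 key prefixes.
SELECT
    RIGHT(REPEAT('0', 12) || 77000000 :: STRING, 12) AS prefix;
-- Returns '000077000000'
```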
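
For readers new to the `json variant` columns described in the README's Project Overview tables, variant fields can be queried directly with Snowflake's `:` path syntax. This is a rough sketch only: the fully qualified table name and the `height` field inside `header` are illustrative assumptions, not taken from this repo.

```sql
-- Hypothetical query against the blocks table documented above.
-- Table name (near.silver.blocks) and header:height are assumptions.
SELECT
    block_id,
    block_timestamp,
    tx_count,
    header:height :: NUMBER AS header_height
FROM near.silver.blocks
WHERE block_timestamp >= DATEADD('day', -1, CURRENT_TIMESTAMP())
ORDER BY block_id DESC
LIMIT 10;
```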