mirror of
https://github.com/FlipsideCrypto/analytics-workflow-templates.git
synced 2026-02-06 11:17:52 +00:00
add some base cursor rules
This commit is contained in:
parent
1ffd146c6b
commit
3dfb150555
9
.cursor/mcp.json
Normal file
@@ -0,0 +1,9 @@
|
||||
{
|
||||
"mcpServers": {
|
||||
"datamate": {
|
||||
"url": "http://localhost:7701/sse",
|
||||
"type": "sse",
|
||||
"updatedAt": "2025-07-07T21:44:10.464Z"
|
||||
}
|
||||
}
|
||||
}
|
||||
158
.cursor/rules/dbt-documentation-standards.mdc
Normal file
@@ -0,0 +1,158 @@
|
||||
---
|
||||
description:
|
||||
globs: models/descriptions/*,*.yml,models/gold/**/*.sql
|
||||
alwaysApply: false
|
||||
---
|
||||
# dbt Documentation Standards
|
||||
When working with dbt projects, ensure comprehensive documentation that supports LLM-driven analytics workflows. This includes rich table and column descriptions that provide complete context for understanding blockchain data.
|
||||
|
||||
## Table Documentation Standards
|
||||
Every dbt model must have an accompanying YAML (`.yml`) file that provides model documentation.
|
||||
|
||||
### Basic YML File Format
|
||||
Every dbt model yml file must follow this basic structure:
|
||||
|
||||
```yaml
version: 2

models:
  - name: [model_name]
    description: "{{ doc('table_name') }}"
    tests:
      - [appropriate_tests_for_the_model]

    columns:
      - name: [COLUMN_NAME]
        description: "{{ doc('column_name') }}"
        tests:
          - [appropriate_tests_for_the_column]
```
|
||||
|
||||
#### Required Elements:
|
||||
- **version: 2** - Must be the first line
|
||||
- **models:** - Top-level key containing the model definitions
|
||||
- **name:** - The exact name of the dbt model (without .sql extension)
|
||||
- **description:** - Reference to markdown documentation using `{{ doc('table_name') }}`
|
||||
- **columns:** - List of all columns in the model with their documentation
|
||||
|
||||
#### Column Documentation Format:
|
||||
```yaml
- name: [COLUMN_NAME_IN_UPPERCASE]
  description: "{{ doc('column_name') }}"
  tests:
    - [test_name]:
        [test_parameters]
```
|
||||
|
||||
### Table Descriptions
|
||||
Table documentation must include 4 standard elements, formatted in markdown. As the base documentation file is YML, the table description must be written in a dbt documentation markdown file in the `models/descriptions/` directory. The Table YML can then use the jinja doc block to reference it.
|
||||
|
||||
The 4 standard categories (designed for LLM client consumption to aid in data model discovery and selection):
|
||||
|
||||
1. **Description** (the "what"): What the model is mapping from the blockchain, data scope and coverage, transformations and business logic applied. DO NOT EXPLAIN THE DBT MODEL LINEAGE. This is not important for the use case of LLM-driven blockchain analytics.
|
||||
2. **Key Use Cases**: Examples of when this table might be used and for what analysis, specific analytical scenarios and applications
|
||||
3. **Important Relationships**: How this table might be used alongside OTHER GOLD LEVEL models, including dependencies and connections to other key gold models. For example, a table like logs, events, or receipts may contain information from a larger transaction. To build this description, you SHOULD review the dbt model lineage to understand genuine model relationships. You must convert the model name to a database object using the pattern `<schema>__<table_name>.sql` = `<schema>.<table_name>`, for example `core__fact_blocks.sql` = `core.fact_blocks`.
|
||||
4. **Commonly-used Fields**: Fields most important to conducting analytics. Determining these requires an understanding of the data model, what the columns are (via their descriptions), and how those fields aid in analytics. One way to infer this understanding is to analyze curated models. Anything that is not a core model in the `gold/core/` directory is curated: core models clean and map basic data objects of the blockchain like blocks, transactions, events, logs, etc., while curated models map specialized areas of activity like defi, nfts, governance, etc. Blockchain data is often logged to a data table as a JSON object with a rich mapping of event-based details.
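
The model-name-to-database-object conversion described under **Important Relationships** can be sketched in a few lines of Python. This is a minimal illustration, not part of the rule file: the helper name `model_to_relation` is hypothetical, and it assumes every gold model file follows the `<schema>__<table_name>.sql` pattern.

```python
from pathlib import PurePosixPath


def model_to_relation(model_file: str) -> str:
    """Convert '<schema>__<table_name>.sql' into '<schema>.<table_name>'.

    Hypothetical helper: assumes the double-underscore naming convention
    described in this rule.
    """
    # Drop any directory prefix and the .sql extension.
    stem = PurePosixPath(model_file).stem
    # Split on the FIRST double underscore only.
    schema, sep, table = stem.partition("__")
    if not sep or not schema or not table:
        raise ValueError(f"expected <schema>__<table_name>, got {model_file!r}")
    return f"{schema}.{table}"
```

For example, `model_to_relation("core__fact_blocks.sql")` yields `core.fact_blocks`, matching the example given in the rule.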
|
||||
|
||||
### Lineage Analysis
|
||||
Before writing table descriptions:
|
||||
- Read the dbt model SQL to understand the logic
|
||||
- Follow upstream dependencies to understand data flow
|
||||
- Review source models and transformations
|
||||
- Understand the business context and use cases
|
||||
- Review the column descriptions to establish an understanding of the model as a whole
|
||||
- At the gold level, an ez_ table typically sources data from a fact_ table and this relationship should be documented. Ez_ tables add business logic such as (but not limited to) labels and USD price information
|
||||
|
||||
## Column Documentation Standards
|
||||
|
||||
### Rich Descriptions
|
||||
Each column description must include:
|
||||
- Clear definition of what the field represents
|
||||
- Data type and format expectations
|
||||
- Business context and use cases
|
||||
- Examples where helpful (especially for blockchain-specific concepts)
|
||||
- Relationships to other fields when relevant
|
||||
- Any important caveats or limitations
|
||||
|
||||
### Blockchain-Specific Context
|
||||
For blockchain data:
|
||||
- Reference official protocol documentation for technical accuracy. Use web search to find official developer documentation for the subject blockchain.
|
||||
- Explain blockchain-specific concepts (gas, consensus, etc.)
|
||||
- Provide examples using the specific blockchain's conventions
|
||||
- Clarify differences from other blockchains when relevant
|
||||
|
||||
### YAML Requirements
|
||||
- Column names MUST BE CAPITALIZED in YAML files
|
||||
- Use `{{ doc('column_name') }}` references for consistent descriptions across models. The doc block must refer to a valid description in `models/descriptions`
|
||||
- Include appropriate tests for data quality
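
One way to check the requirement that every `doc()` reference resolves to a docs block is a small text scan. The sketch below is an assumption-laden illustration: the function name `missing_doc_blocks` is hypothetical, it works on raw file text with regular expressions rather than parsing YAML or Jinja, and it is not how dbt itself resolves references.

```python
import re

# Matches {{ doc('name') }} with flexible whitespace and either quote style.
DOC_REF = re.compile(r"""\{\{\s*doc\(\s*['"]([^'"]+)['"]\s*\)\s*\}\}""")
# Matches the opening tag of a docs block, e.g. {% docs amount_raw %}.
DOCS_BLOCK = re.compile(r"\{%-?\s*docs\s+(\w+)\s*-?%\}")


def missing_doc_blocks(yaml_text: str, markdown_text: str) -> set[str]:
    """Return doc() names referenced in the YAML text but not defined in the markdown text."""
    referenced = set(DOC_REF.findall(yaml_text))
    defined = set(DOCS_BLOCK.findall(markdown_text))
    return referenced - defined
```

In practice the markdown text would be the concatenation of all files under `models/descriptions/`; an empty result means every reference has a matching docs block.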
|
||||
|
||||
## Examples of Good Documentation
|
||||
|
||||
### Table Documentation Example
|
||||
```markdown
|
||||
{% docs table_transfers %}
|
||||
## Description
|
||||
This table tracks all token transfers on the <name> blockchain, capturing movements of native tokens and fungible tokens between accounts. The data includes both successful and failed transfers, with complete transaction context and token metadata.
|
||||
|
||||
## Key Use Cases
|
||||
- Token flow analysis and wallet tracking
|
||||
- DeFi protocol volume measurements
|
||||
- Cross-chain bridge monitoring
|
||||
- Whale movement detection and alerts
|
||||
- Token distribution and holder analysis
|
||||
|
||||
## Important Relationships
|
||||
- Subset of `gold.transactions`
|
||||
- Maps events emitted in `gold.events` or `gold.logs`
|
||||
- Utilizes token price data from `gold.prices` to compute USD columns
|
||||
|
||||
## Commonly-used Fields
|
||||
- `tx_hash`: Essential for linking to transaction details and verification
|
||||
- `sender_id` and `receiver_id`: Core fields for flow analysis and network mapping
|
||||
- `amount_raw` and `amount_usd`: Critical for value calculations and financial analysis
|
||||
- `token_address`: Key for filtering by specific tokens and DeFi analysis
|
||||
- `block_timestamp`: Primary field for time-series analysis and trend detection
|
||||
{% enddocs %}
|
||||
```
|
||||
|
||||
### Column Documentation Example
|
||||
```markdown
|
||||
{% docs amount_raw %}
|
||||
Unadjusted amount of tokens as it appears on-chain (not decimal adjusted). This is the raw token amount before any decimal precision adjustments are applied. For example, if transferring 1 native token, the amount_raw would be 1000000000000000000000000 (1e24) since <this blockchain's native token> has 24 decimal places. This field preserves the exact on-chain representation of the token amount for precise calculations and verification.
|
||||
{% enddocs %}
|
||||
```
|
||||
Important consideration: be sure to research and confirm figures such as decimal places. If unknown DO NOT MAKE UP A NUMBER.
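
The relationship between a raw and a decimal-adjusted amount is simple arithmetic: divide the raw integer by 10 raised to the token's decimal places. The sketch below uses the 24-decimal figure from the example above purely as an illustration; the helper name `adjust_amount` is hypothetical, and the real decimal count must always come from verified token metadata.

```python
from decimal import Decimal


def adjust_amount(amount_raw: int, decimals: int) -> Decimal:
    """Convert an on-chain integer amount to a decimal-adjusted amount.

    `decimals` must be the verified precision of the token; never guess it.
    Division uses the default Decimal context (28 significant digits).
    """
    return Decimal(amount_raw) / (Decimal(10) ** decimals)


# 1 native token with 24 decimal places, as in the example above.
one_token = adjust_amount(10**24, 24)
```

`Decimal` is used instead of `float` so that large on-chain integers do not lose precision in the common case.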
|
||||
|
||||
## Common Patterns to Follow
|
||||
- Start with a clear definition
|
||||
- Provide context about why the field exists
|
||||
- Include examples for complex concepts
|
||||
- Explain relationships to other fields
|
||||
- Mention any important limitations or considerations
|
||||
- Use consistent terminology throughout the project
|
||||
|
||||
## Quality Standards
|
||||
|
||||
### Completeness
|
||||
- Every column must have a clear, detailed description
|
||||
- Table descriptions must explain the model's purpose and scope
|
||||
- Documentation must be self-contained without requiring external context
|
||||
- All business logic and transformations must be explained
|
||||
|
||||
### Accuracy
|
||||
- Technical details must match official blockchain documentation
|
||||
- Data types and formats must be correctly described
|
||||
- Examples must use appropriate blockchain conventions
|
||||
- Relationships between fields must be accurately described
|
||||
|
||||
### Clarity
|
||||
- Descriptions must be clear and easy to understand
|
||||
- Complex concepts must be explained with examples
|
||||
- Terminology must be consistent throughout the project
|
||||
- Language must support LLM understanding
|
||||
|
||||
### Consistency
|
||||
- Use consistent terminology across all models
|
||||
- Follow established documentation patterns
|
||||
- Maintain consistent formatting and structure
|
||||
- Ensure similar fields have similar descriptions
|
||||
186
.cursor/rules/dbt-overview-standard.mdc
Normal file
@@ -0,0 +1,186 @@
|
||||
---
|
||||
description:
|
||||
globs: __overview__.md,models/descriptions/*,models/gold/**/*.sql
|
||||
alwaysApply: false
|
||||
---
|
||||
# dbt Overview Standards
|
||||
When working with dbt projects, ensure comprehensive documentation exists about the project and a base reference to the gold models exists in the `__overview__.md` file for both human and LLM consumption.
|
||||
|
||||
## Project __overview__
|
||||
The file `models/descriptions/__overview__.md` is the entry point to the model description and documentation and must contain a rich description of what the project is.
|
||||
The file MUST START AND END WITH DBT JINJA DOCS TAGS `{% docs __overview__ %}` and `{% enddocs %}`
|
||||
|
||||
## Quick Links Section Requirements
|
||||
The `__overview__.md` file MUST contain a "Quick Links to Table Documentation" section that provides direct navigation to all gold model documentation. This section must include a simple list, organized by gold schema, with the models and a hyperlink to the model documentation. If there is an existing section like "using dbt docs" that instructs the user on how to navigate dbt docs or a list of links to flipside and dbt, remove it! These are outdated.
|
||||
|
||||
### Required Elements:
|
||||
**Hyperlinks to Gold Model Documentation** - A comprehensive list of all gold models organized by schema. The schema can be inferred from the model name as the slug before the double underscore. For example, `core__fact_blocks` is a model named `fact_blocks` in the schema `core`.
|
||||
|
||||
### Gold Model Links Structure:
|
||||
The quicklinks section must be organized by schema and use the relative link to load the page generated by dbt documentation. The relative link structure is `#!/model/dbt_uniqueId`, where dbt's `uniqueId` format is `node_type.project_name.model_name`. For every link in this section, `node_type` is `model`. `project_name` is the name of the dbt project, set by the `name` key in `dbt_project.yml`. `model_name` is the name of the model as determined by the name of the sql file OR the value of `name` in the model's associated `.yml` file. For example, the uniqueId for the blockchain's `fact_blocks` model would be `model.<blockchain_name>_models.core__fact_blocks`, making the relative URL `#!/model/model.<blockchain_name>_models.core__fact_blocks`.
|
||||
|
||||
```markdown
## **Quick Links to Table Documentation**

**Click on the links below to jump to the documentation for each schema.**

### [Schema Name] Tables

**[Model Type Tables:]**

- model_1
- model_2

### CORE Tables
**Dimension Tables:**
- [model_1](relative/path/to/model)

**Fact Tables:**
- [core__fact_blocks](relative/path/to/model)

```
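
The relative link format described in this section can be illustrated with a short Python sketch. The helper name `docs_link` and the project name `example_models` are hypothetical placeholders; the only thing taken from the rule is the `#!/model/model.<project_name>.<model_name>` pattern.

```python
def docs_link(project_name: str, model_name: str) -> str:
    """Build a markdown link to a model's page in the generated dbt docs site."""
    # uniqueId format: node_type.project_name.model_name, with node_type "model".
    unique_id = f"model.{project_name}.{model_name}"
    return f"[{model_name}](#!/model/{unique_id})"


# Hypothetical project name, for illustration only.
link = docs_link("example_models", "core__fact_blocks")
```

Here `link` is `[core__fact_blocks](#!/model/model.example_models.core__fact_blocks)`.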
|
||||
|
||||
### Schema Organization Rules:
|
||||
1. **Group by Schema**: Organize models by their schema (core, defi, nft, price, social, governance, etc.)
|
||||
2. **Use Exact Schema Names**: Use the exact schema names as they appear in the database (e.g., `<blockchain_database>.CORE`, `<blockchain_database>.DEFI`, `<blockchain_database>.NFT`)
|
||||
3. **Model Type Subgrouping**: Within each schema, subgroup by model type:
|
||||
- **Dimension Tables:** (dim_* models)
|
||||
- **Fact Tables:** (fact_* models)
|
||||
- **Easy Views:** (ez_* models)
|
||||
4. **Link Format**: Use the exact dbt docs link format: `#!/model/model.[project_name].[schema]__[model]`
|
||||
5. **Model Naming**: Use the exact model name as it appears in the file system (without .sql extension)
|
||||
|
||||
### Implementation Guidelines for Coding Agents:
|
||||
1. **Scan Directory Structure**: Read `models/gold/` directory to identify all schema subdirectories
|
||||
2. **Extract Model Names**: For each schema directory, list all `.sql` files and remove the `.sql` extension
|
||||
3. **Determine Schema Mapping**: Map model names to database schema names:
|
||||
dbt models in this project utilize a double underscore in the model name to denote schema vs table <schema>__<table_name>.sql:
|
||||
- `core__fact_blocks` → `<blockchain_database>.CORE.FACT_BLOCKS`
|
||||
- `defi__ez_dex_swaps` → `<blockchain_database>.DEFI.EZ_DEX_SWAPS`
|
||||
4. **Categorize Models**: Group models by prefix:
|
||||
- `dim_*` → Dimension Tables
|
||||
- `fact_*` → Fact Tables
|
||||
- `ez_*` → Easy Views
|
||||
- `udf_*`, `udtf_*` → Custom Functions
|
||||
5. **Generate Links**: Create markdown links using the proper format
|
||||
6. **Maintain Order**: Keep models in alphabetical order within each category
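
The guidelines above can be sketched as a small generator. This is a hedged illustration rather than a prescribed implementation: the function names, the `Other Tables` fallback label, and the sample project and model names are all assumptions, and the input is a plain list of model names (for instance gathered with `Path("models/gold").rglob("*.sql")`).

```python
from collections import defaultdict

# Prefix-to-category mapping taken from the guidelines above.
CATEGORIES = [
    ("dim_", "Dimension Tables"),
    ("fact_", "Fact Tables"),
    ("ez_", "Easy Views"),
    ("udf_", "Custom Functions"),
    ("udtf_", "Custom Functions"),
]
# "Other Tables" is an assumed fallback for names matching no prefix.
CATEGORY_ORDER = ["Dimension Tables", "Fact Tables", "Easy Views",
                  "Custom Functions", "Other Tables"]


def categorize(table_name: str) -> str:
    for prefix, label in CATEGORIES:
        if table_name.startswith(prefix):
            return label
    return "Other Tables"


def build_quick_links(project_name: str, model_names: list[str]) -> str:
    """Render the quick links section, grouped by schema and model type."""
    grouped = defaultdict(lambda: defaultdict(list))
    for name in model_names:
        schema, _, table = name.partition("__")  # <schema>__<table_name>
        grouped[schema][categorize(table)].append(name)

    lines = ["## **Quick Links to Table Documentation**", ""]
    for schema in sorted(grouped):
        lines += [f"### {schema.upper()} Tables", ""]
        for label in CATEGORY_ORDER:
            models = sorted(grouped[schema].get(label, []))  # alphabetical
            if not models:
                continue
            # Category headers are plain text; only models are hyperlinked.
            lines.append(f"**{label}:**")
            lines += [f"- [{m}](#!/model/model.{project_name}.{m})" for m in models]
            lines.append("")
    return "\n".join(lines)
```

Sorting happens at render time, so the input list can be in any order, such as the order in which files were discovered on disk.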
|
||||
|
||||
### Validation Requirements:
|
||||
- Every gold model must have a corresponding link
|
||||
- Links must use the correct dbt docs format
|
||||
- Schema names must match the actual database schema
|
||||
- Model names must match the actual file names (without .sql extension)
|
||||
- Links must be organized by schema and model type
|
||||
- All links must be functional and point to valid dbt documentation
|
||||
- Do NOT add a hyperlink to the category headers. Only hyperlink individual models
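
Part of this validation can be automated by comparing the links found in `__overview__.md` with the list of gold model names. The sketch below is illustrative only: `check_quick_links` is a hypothetical helper, it matches links with a regular expression, and it does not verify that the generated dbt docs page actually loads.

```python
import re

# Captures project name and model name from links such as
# [core__fact_blocks](#!/model/model.example_models.core__fact_blocks)
LINK = re.compile(r"\[[^\]]+\]\(#!/model/model\.([^.\s)]+)\.([^)\s]+)\)")


def check_quick_links(overview_text: str, project_name: str,
                      gold_models: list[str]) -> tuple[list[str], list[str]]:
    """Return (models missing a link, linked names that are not gold models)."""
    linked = {
        model
        for project, model in LINK.findall(overview_text)
        if project == project_name  # ignore links to other projects
    }
    expected = set(gold_models)
    return sorted(expected - linked), sorted(linked - expected)
```

Both returned lists being empty indicates that every gold model has a link and no link points at a model that no longer exists.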
|
||||
|
||||
## XML Tag Requirements
|
||||
Every `__overview__.md` file MUST include structured `<llm>` XML tags for easy interpretation by an LLM.
|
||||
```xml
<llm>
  <blockchain>[Protocol Name]</blockchain>
  <aliases>[Common Aliases]</aliases>
  <ecosystem>[Execution Environment or Layer Type, for example EVM, SVM, IBC, Layer 1, Layer 2]</ecosystem>
  <description>[Rich 3-5 sentence description of the blockchain, its consensus mechanism, key features, and developer/user benefits including if the blockchain was built for a specific use case.]</description>
  <external_resources>
    <block_scanner>[Link to the primary block scanner for the blockchain]</block_scanner>
    <developer_documentation>[Link to the primary developer documentation, maintained by the blockchain devs]</developer_documentation>
  </external_resources>
  <expert>
    <constraints>
      <table_availability>
        <!-- Specify which tables/schemas are available for this blockchain -->
        <!-- Example: "Ensure that your queries use only available tables for [BLOCKCHAIN]" -->
      </table_availability>

      <schema_structure>
        <!-- Explain how the database is organized (dimensions, facts, naming conventions) -->
        <!-- Example: "Understand that dimensions and facts combine to make ez_ tables" -->
      </schema_structure>
    </constraints>

    <optimization>
      <performance_filters>
        <!-- Define key filtering strategies for query performance -->
        <!-- Example: "use filters like block_timestamp over the last N days to improve speed" -->
      </performance_filters>

      <query_structure>
        <!-- Specify preferred SQL patterns and structures -->
        <!-- Example: "Use CTEs, not subqueries, as readability is important" -->
      </query_structure>

      <implementation_guidance>
        <!-- Provide guidelines for advanced SQL features -->
        <!-- Example: "Be smart with aggregations, window functions, etc." -->
      </implementation_guidance>
    </optimization>

    <domain_mapping>
      <token_operations>
        <!-- Map token-related queries to specific tables -->
        <!-- Example: "For token transfers, use ez_token_transfers table" -->
      </token_operations>

      <defi_analysis>
        <!-- Specify DeFi-related tables and their use cases -->
        <!-- Example: "For DeFi analysis, use ez_bridge_activity, ez_dex_swaps, ez_lending" -->
      </defi_analysis>

      <nft_analysis>
        <!-- Define NFT-specific tables and functionality -->
        <!-- Example: "For NFT queries, utilize ez_nft_sales table in nft schema" -->
      </nft_analysis>

      <specialized_features>
        <!-- Cover blockchain-specific features or complex data structures -->
        <!-- Example: "The XYZ data is complex, so ensure you ask clarifying questions" -->
      </specialized_features>
    </domain_mapping>

    <interaction_modes>
      <direct_user>
        <!-- Define behavior for direct user interactions -->
        <!-- Example: "Ask clarifying questions when dealing with complex data" -->
      </direct_user>

      <agent_invocation>
        <!-- Specify response format when invoked by other AI agents -->
        <!-- Example: "When invoked by another AI agent, respond with relevant query text" -->
      </agent_invocation>
    </interaction_modes>

    <engagement>
      <exploration_tone>
        <!-- Set the overall tone and encouragement for data exploration -->
        <!-- Example: "Have fun exploring the [BLOCKCHAIN] ecosystem through data!" -->
      </exploration_tone>
    </engagement>
  </expert>
</llm>
```
|
||||
Place these XML tags at the end of the documentation (BUT STILL BEFORE THE JINJA ENDDOCS TAG).
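
The placement rules for `__overview__.md` (docs tags first and last, `<llm>` block before the closing tag) can be checked with a few string operations. This sketch is an assumption: `check_overview_layout` is a hypothetical helper, and it compares exact tag strings, so whitespace variants that Jinja would accept (such as `{%- docs __overview__ -%}`) are reported as problems.

```python
OPEN_TAG = "{% docs __overview__ %}"
CLOSE_TAG = "{% enddocs %}"


def check_overview_layout(text: str) -> list[str]:
    """Return a list of layout problems; an empty list means the layout is valid."""
    problems = []
    stripped = text.strip()
    if not stripped.startswith(OPEN_TAG):
        problems.append("file must start with the docs tag")
    if not stripped.endswith(CLOSE_TAG):
        problems.append("file must end with the enddocs tag")

    llm_close = stripped.rfind("</llm>")
    end_tag = stripped.rfind(CLOSE_TAG)
    if llm_close == -1:
        problems.append("missing <llm> block")
    elif end_tag != -1 and llm_close > end_tag:
        problems.append("<llm> block must come before the enddocs tag")
    return problems
```

An agent could run this after rewriting the overview and treat any returned message as a reason to revise the file before finishing.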
|
||||
|
||||
## Update Process for Coding Agents:
|
||||
To update the overview, the coding agent MUST:
|
||||
|
||||
1. **Scan Current Gold Models**:
|
||||
- Read the entire `models/gold/` directory structure
|
||||
- Identify all `.sql` files across all schema subdirectories
|
||||
- Extract model names (remove `.sql` extension)
|
||||
|
||||
2. **Generate Updated Quicklinks Section**:
|
||||
- Follow these implementation guidelines
|
||||
- Create a complete new quicklinks section with all current gold models
|
||||
- Maintain proper schema organization and model type grouping
|
||||
|
||||
3. **Update __overview__.md**:
|
||||
- Replace the entire "Quick Links to Table Documentation" section with the newly generated content
|
||||
- Ensure proper markdown formatting and link structure
|
||||
- Create or update the XML tag block
|
||||
|
||||
4. **Validation Check**:
|
||||
- Verify all gold models have corresponding links
|
||||
- Confirm links use correct dbt docs format
|
||||
- Check that schema names and model names are accurate
|
||||
- Ensure alphabetical ordering within categories
|
||||
76
.cursor/rules/general-coding-standards.mdc
Normal file
@@ -0,0 +1,76 @@
|
||||
---
|
||||
description:
|
||||
globs:
|
||||
alwaysApply: true
|
||||
---
|
||||
# dbt Model Standards for Flipside Crypto
|
||||
|
||||
## General Rules
|
||||
- Follow the existing code style and patterns in the codebase
|
||||
- Write clear, concise, and well-documented code
|
||||
- Use meaningful variable, function, column and model names
|
||||
- Handle errors gracefully and provide helpful error messages
|
||||
- Test your code thoroughly before submitting
|
||||
- Follow the existing project structure and conventions
|
||||
|
||||
## Code Quality
|
||||
- Write self-documenting code with clear and consistent names
|
||||
- Use consistent formatting and indentation
|
||||
- Implement proper error handling and logging
|
||||
- Follow DRY (Don't Repeat Yourself) principles
|
||||
- Use meaningful commit messages
|
||||
- Use snake_case for all objects (tables, columns, models)
|
||||
- Maintain column naming consistency through the pipeline
|
||||
|
||||
## dbt Model Structure
|
||||
- Models are connected through ref() and source() functions
|
||||
- Data flows from source -> bronze -> silver -> gold layers
|
||||
- Each model has upstream dependencies and downstream consumers
|
||||
- Column-level lineage is maintained through transformations
|
||||
- Parse ref() and source() calls to identify direct dependencies
|
||||
- Track column transformations from upstream models
|
||||
- Consider impact on downstream consumers
|
||||
- Preserve business logic across transformations
|
||||
|
||||
## Model Naming and Organization
|
||||
- Follow naming patterns: schema prefixes `bronze__`, `silver__`, `core__` and object prefixes `fact_`, `dim_`, `ez_`, where a double underscore indicates a break between a model schema and object name. For example, `core__fact_blocks` equates to `<database>.core.fact_blocks`.
|
||||
- Organize by directory structure: bronze/, silver/, gold/, etc.
|
||||
- Upstream models appear on the LEFT side of the DAG
|
||||
- Current model is the focal point
|
||||
- Downstream models appear on the RIGHT side of the DAG
|
||||
|
||||
## Modeling Standards
|
||||
- Use snake_case for all objects
|
||||
- Prioritize incremental processing always
|
||||
- Follow source/bronze/silver/gold layering
|
||||
- Document chain-specific assumptions
|
||||
- Include incremental predicates to improve performance
|
||||
- For gold layer models, include search optimization following Snowflake's recommended best practices
|
||||
- Cluster models on appropriate fields
|
||||
|
||||
## Testing Requirements
|
||||
- Ensure proper token decimal handling
|
||||
- Implement unique tests for primary keys
|
||||
- Implement recency tests for tables that are expected to have frequent data updates
|
||||
- Add not_null tests for required columns
|
||||
- Use relationships tests for foreign keys
|
||||
|
||||
## Performance Guidelines
|
||||
- Optimize for high TPS (blockchain data)
|
||||
- Handle large state change volumes efficiently
|
||||
- Index frequently queried dimensions
|
||||
- Consider partition pruning strategies
|
||||
- Implement appropriate clustering keys
|
||||
- Optimize database queries for large datasets
|
||||
- Use appropriate indexing strategies
|
||||
- Monitor resource usage and optimize accordingly
|
||||
- Consider the impact of changes on existing systems
|
||||
|
||||
## Documentation
|
||||
- Document data sources
|
||||
- Map entity relationships
|
||||
- Include model descriptions in yml files per expanded rule [dbt-documentation-standards.mdc](mdc:.cursor/rules/dbt-documentation-standards.mdc)
|
||||
- Document column descriptions and business logic
|
||||
- Explain incremental logic and predicates
|
||||
- Note any data quality considerations
|
||||
|
||||
220
.cursor/rules/review-dbt-documentation.mdc
Normal file
@@ -0,0 +1,220 @@
|
||||
---
|
||||
description:
|
||||
globs:
|
||||
alwaysApply: false
|
||||
---
|
||||
# Review dbt Documentation Process
|
||||
|
||||
## Overview
|
||||
This document outlines the comprehensive process for reviewing and improving column and table descriptions for gold models in dbt projects. The goal is to provide robust, rich details that improve context for LLM-driven analytics workflows, ensuring that documentation is complete, accurate, and self-contained without requiring external expert files.
|
||||
|
||||
## Objectives
|
||||
- Create clear, detailed documentation that supports LLM understanding of blockchain data
|
||||
- Ensure technical accuracy by referencing official protocol documentation
|
||||
- Provide rich context for each table and column to enable effective analytics
|
||||
- Maintain consistency across all models and schemas
|
||||
- Support automated analytics workflows without requiring expert context files
|
||||
|
||||
## Pre-Review Requirements
|
||||
|
||||
### 1. Research Phase
|
||||
**Blockchain Protocol Documentation**
|
||||
- Search and read official developer documentation for the target blockchain. Utilize web search to find authentic and accurate developer documentation
|
||||
- Review technical specifications, whitepapers, and API documentation
|
||||
- Understand the blockchain's consensus mechanism, data structures, and conventions
|
||||
- Research common use cases and analytics patterns specific to the blockchain
|
||||
- Identify key technical concepts that need explanation (e.g., gas mechanics, consensus, token standards)
|
||||
|
||||
**External Resources to Consult**
|
||||
- Official blockchain documentation
|
||||
- Developer guides and tutorials
|
||||
- Technical specifications and whitepapers
|
||||
- Community documentation and forums
|
||||
- Block explorers and API documentation
|
||||
|
||||
### 2. Project Context Analysis
|
||||
- Review the `__overview__.md` file and rewrite it per the @dbt-overview-standard rule to create a summary of what this particular blockchain is, unique characteristics, and any other general information about the chain itself
|
||||
- Review existing documentation patterns and terminology
|
||||
- Understand the data flow and model lineage structure
|
||||
|
||||
## Review Process
|
||||
|
||||
### Step 1: Model Analysis
|
||||
**SQL Logic Review**
|
||||
- Read the dbt model SQL file to understand the transformations and business logic
|
||||
- Follow upstream dependencies to understand data flow from source to gold layer
|
||||
- Review bronze source models, silver staging models, and intermediate transformations
|
||||
- Identify any complex joins, aggregations, or business logic that needs explanation
|
||||
- Understand the incremental logic and any filtering conditions
|
||||
|
||||
**Lineage Analysis**
|
||||
- Map the complete data lineage from source to gold model
|
||||
- Identify key transformations and their purposes
|
||||
- Understand relationships between related models for the sole purpose of generating a robust description
|
||||
- Do not include data lineage analysis in the table description
|
||||
|
||||
### Step 2: Column Description Review
|
||||
**Individual Column Analysis**
|
||||
For each column in the model:
|
||||
|
||||
1. **Technical Understanding**
|
||||
- Read the SQL to understand how the column is derived
|
||||
- Check upstream models if the column comes from a transformation
|
||||
- Understand the data type and format expectations
|
||||
- Identify any business logic applied to the column
|
||||
|
||||
2. **Blockchain Context**
|
||||
- Research the blockchain-specific meaning of the column
|
||||
- Reference official documentation for technical accuracy
|
||||
- Understand how this field relates to blockchain concepts
|
||||
- Identify any blockchain-specific conventions or requirements
|
||||
|
||||
3. **Documentation Assessment**
|
||||
- Review existing column description
|
||||
- Evaluate completeness and clarity
|
||||
- Check for missing context or examples
|
||||
- Ensure the description supports LLM understanding
|
||||
|
||||
**Required Elements for Column Descriptions**
|
||||
- Clear definition of what the field represents
|
||||
- Data type and format expectations
|
||||
- Business context and use cases
|
||||
- Examples where helpful (especially for blockchain-specific concepts)
|
||||
- Relationships to other fields when relevant
|
||||
- Any important caveats or limitations
|
||||
- Blockchain-specific context and conventions
|
||||
|
||||
### Step 3: Table Description Review
|
||||
**Current State Assessment**
|
||||
- Review the updated column descriptions
|
||||
- Review existing table description in the YAML file
|
||||
- Evaluate completeness and clarity
|
||||
- Identify missing context or unclear explanations
|
||||
|
||||
**Required Elements for Table Descriptions**
|
||||
Table documentation must include 4 standard elements, formatted in markdown. As the base documentation file is YML, the table description must be written in a dbt documentation markdown file in the `models/descriptions/` directory. The Table YML can then use the jinja doc block to reference it.
|
||||
|
||||
The 4 standard categories are fully defined in the @dbt-documentation-standards rule. They are:
|
||||
|
||||
1. **Description**
|
||||
2. **Key Use Cases**
|
||||
3. **Important Relationships**
|
||||
4. **Commonly-used Fields**
|
||||
|
||||
### Step 4: Documentation File Review
|
||||
**Individual Documentation Files**
|
||||
- Check if each column has a corresponding `.md` file in `models/descriptions/`
|
||||
- Review existing documentation for completeness and accuracy
|
||||
- Update or create documentation files as needed
|
||||
|
||||
**Documentation File Format**
|
||||
```markdown
|
||||
{% docs column_name %}
|
||||
[Rich, detailed description including:
|
||||
- Clear definition
|
||||
- Data format and examples
|
||||
- Business context
|
||||
- Blockchain-specific details
|
||||
- Relationships to other fields
|
||||
- Important considerations]
|
||||
{% enddocs %}
|
||||
```
|
||||
|
||||
### Step 5: YAML File Review
|
||||
**YAML Structure Validation**
|
||||
- Ensure column names are CAPITALIZED in YAML files
|
||||
- Verify all columns reference documentation using `{{ doc('column_name') }}`
|
||||
- Check that appropriate tests are included
|
||||
- Validate the overall YAML structure
|
||||
|
||||
**YAML File Format**
|
||||
```yaml
version: 2

models:
  - name: [model_name]
    description: "{{ doc('table_name') }}"

    columns:
      - name: [COLUMN_NAME_IN_UPPERCASE]
        description: "{{ doc('column_name') }}"
        tests:
          - [appropriate_tests]
```
|
||||
|
||||
## Review Checklist
|
||||
|
||||
### Table Level
|
||||
- [ ] Table documentation is in markdown file in `models/descriptions/` directory
|
||||
- [ ] Table YAML references documentation using jinja doc block
|
||||
- [ ] **Description** section explains what blockchain data is being modeled
|
||||
- [ ] **Key Use Cases** section provides specific analytical scenarios
|
||||
- [ ] **Important Relationships** section explains connections to other GOLD models
|
||||
- [ ] **Commonly-used Fields** section identifies critical columns and their importance
|
||||
- [ ] Documentation is optimized for LLM client consumption
|
||||
|
||||
### Column Level
|
||||
- [ ] Each column has a comprehensive description
|
||||
- [ ] Data types and formats are clearly specified
|
||||
- [ ] Business context and use cases are explained
|
||||
- [ ] Examples are provided for complex concepts
|
||||
- [ ] Relationships to other fields are documented
|
||||
- [ ] Important limitations or caveats are noted
|
||||
- [ ] Blockchain-specific context is included
|
||||
|
||||
### Documentation Files
|
||||
- [ ] All columns have corresponding `.md` files
|
||||
- [ ] Documentation files contain rich, detailed descriptions
|
||||
- [ ] Examples use appropriate blockchain conventions
|
||||
- [ ] Technical accuracy is verified against official documentation
|
||||
|
||||
### YAML Files
|
||||
- [ ] Column names are CAPITALIZED
|
||||
- [ ] All columns reference documentation using `{{ doc('column_name') }}`
|
||||
- [ ] Appropriate tests are included
|
||||
- [ ] YAML structure is valid
|
||||
|
||||
## Implementation Guidelines
|
||||
|
||||
### Documentation Writing Tips
|
||||
- Start with a clear definition of what the field represents
|
||||
- Provide context about why the field exists and its importance
|
||||
- Include examples for complex concepts, especially blockchain-specific ones
|
||||
- Explain relationships to other fields when relevant
|
||||
- Mention any important limitations or considerations
|
||||
- Use consistent terminology throughout the project
|
||||
|
||||
### Blockchain-Specific Considerations
|
||||
- Reference official protocol documentation for technical concepts
|
||||
- Explain blockchain-specific concepts (gas, consensus, etc.)
|
||||
- Provide examples using the specific blockchain's conventions
|
||||
- Clarify differences from other blockchains when relevant
|
||||
- Include information about data freshness and update mechanisms
|
||||
|
||||
### LLM Optimization
|
||||
- Write descriptions that are complete and self-contained
|
||||
- Use clear, structured language that supports automated understanding
|
||||
- Include context that helps LLMs understand the data's purpose
|
||||
- Provide examples that illustrate common use cases
|
||||
- Ensure descriptions support common analytics workflows
|
||||
|
||||
## Post-Review Actions
|
||||
|
||||
### Validation
|
||||
- Verify all documentation is technically accurate
|
||||
- Check that descriptions are complete and self-contained
|
||||
- Ensure consistency across related models
|
||||
- Validate that documentation supports common analytics use cases
|
||||
|
||||
### Testing
|
||||
- Test documentation by having an LLM attempt to understand the data
|
||||
- Verify that descriptions enable effective query generation
|
||||
- Check that examples are clear and helpful
|
||||
- Ensure documentation supports the intended analytics workflows
|
||||
|
||||
### Maintenance
|
||||
- Update documentation when models change
|
||||
- Review and refresh documentation periodically
|
||||
- Maintain consistency as new models are added
|
||||
- Keep documentation aligned with blockchain protocol updates
|
||||