Data Quality & Validation

Check for anomalies, validate completeness, and reconcile data across sources.

Bad data causes bad decisions. Your AI can run quality checks across tables and sources in minutes, catching issues that would take hours to find manually.

Running quality checks

Run data quality checks on the ORDERS table: null rates, duplicate ORDER_IDs, and invalid statuses.

AI calls get_schema("snowflake-demo-datasource", database="ANALYTICS", table="ORDERS")
   -> 8 columns, identifies key fields to validate

AI writes queries/snowflake-demo-datasource/orders_quality.sql
AI calls query("snowflake-demo-datasource", "queries/snowflake-demo-datasource/orders_quality.sql")
   -> Results cached in DuckDB table: orders_quality

AI: Quality report for ORDERS (17 rows):

    No duplicate ORDER_IDs
    No NULL values in any column
    4 valid statuses: DELIVERED, SHIPPED, PENDING, CANCELLED

    1 cancelled order (ORDER_ID 1005, Ethan Hunt, $123).
    Ethan Hunt is also the only INACTIVE customer in the system,
    which is worth documenting in RULES.md.

The checks are saved as query files in your workspace so you can re-run them later.
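The page does not show what orders_quality.sql contains. The Python below is a rough sketch of the three checks in the prompt. It runs comparable SQL against an in-memory SQLite table, which is used only because `sqlite3` ships with Python; the real file targets Snowflake. Apart from ORDER_ID and STATUS, the column names are hypothetical, and so are the demo rows.

```python
import sqlite3

# Hypothetical columns: the page only names ORDER_ID and STATUS.
COLUMNS = ["ORDER_ID", "CUSTOMER_NAME", "STATUS", "AMOUNT"]
# Allowed statuses, taken from the quality report above.
VALID_STATUSES = ("DELIVERED", "SHIPPED", "PENDING", "CANCELLED")


def run_quality_checks(conn: sqlite3.Connection) -> dict:
    """Null rate per column, duplicate ORDER_IDs, and statuses outside the allowed set."""
    # COUNT(col) skips NULLs, so COUNT(*) - COUNT(col) is the number of NULLs.
    # Column names are trusted constants, so building SQL with an f-string is safe here.
    null_exprs = ", ".join(f"COUNT(*) - COUNT({c})" for c in COLUMNS)
    total, *null_counts = conn.execute(
        f"SELECT COUNT(*), {null_exprs} FROM ORDERS"
    ).fetchone()

    # Any ORDER_ID that appears more than once.
    duplicates = [
        row[0]
        for row in conn.execute(
            "SELECT ORDER_ID FROM ORDERS GROUP BY ORDER_ID HAVING COUNT(*) > 1"
        )
    ]

    # NULL statuses are already counted by the null check, so exclude them here.
    placeholders = ", ".join("?" for _ in VALID_STATUSES)
    invalid = [
        row[0]
        for row in conn.execute(
            "SELECT DISTINCT STATUS FROM ORDERS "
            f"WHERE STATUS IS NOT NULL AND STATUS NOT IN ({placeholders})",
            VALID_STATUSES,
        )
    ]

    return {
        "row_count": total,
        "null_rates": {
            c: (n / total if total else 0.0) for c, n in zip(COLUMNS, null_counts)
        },
        "duplicate_order_ids": duplicates,
        "invalid_statuses": invalid,
    }


# Made-up demo rows: one duplicate ID, one NULL amount, one unknown status.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ORDERS (ORDER_ID INTEGER, CUSTOMER_NAME TEXT, STATUS TEXT, AMOUNT REAL)"
)
conn.executemany(
    "INSERT INTO ORDERS VALUES (?, ?, ?, ?)",
    [
        (1001, "Customer A", "DELIVERED", 10.0),
        (1002, "Customer B", "SHIPPED", None),
        (1002, "Customer C", "RETURNED", 5.0),
    ],
)
report = run_quality_checks(conn)
print(report)
```

On the real ORDERS table, the report above would correspond to every null rate being 0 and both lists coming back empty.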

Cross-source reconciliation

Some of the most valuable quality checks compare data across systems, because mismatches between sources rarely show up when you inspect each one on its own:

Compare customer counts between Snowflake and Salesforce. Are there customers in one system but not the other?

AI calls query("snowflake-demo-datasource", ...) -> 10 customers cached
AI calls query("salesforce-demo-datasource", ...) -> 20 accounts cached

AI writes queries/DUCKDB/reconciliation.sql
AI calls query("DUCKDB", "queries/DUCKDB/reconciliation.sql")

AI: Snowflake has 10 individual customer records.
    Salesforce has 20 company account records.
    No overlap by name - these are different entity types
    (individuals vs. companies). There's no shared key.

    If these should be linked, you'll need a mapping table
    or a shared identifier like email or account ID.
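The page does not show what reconciliation.sql contains. A common way to answer "who is in one system but not the other" is an anti-join in each direction, meaning a LEFT JOIN that keeps only the rows with no match. The sketch below does this in SQLite as a stand-in for the DuckDB cache. The table names `snowflake_customers` and `salesforce_accounts`, the `NAME` column, and the demo rows are all hypothetical. There is no shared key, so the sketch matches on a normalized name, which is exactly the weak link the AI points out.

```python
import sqlite3

# Hypothetical join key: case- and whitespace-insensitive name. With a shared
# email or account ID column you would join on that instead.
KEY_S = "LOWER(TRIM(s.NAME))"
KEY_A = "LOWER(TRIM(a.NAME))"


def reconcile(conn: sqlite3.Connection) -> dict:
    """Anti-join in both directions, plus a count of names found in both tables."""
    # Snowflake rows with no Salesforce match.
    only_snowflake = [row[0] for row in conn.execute(
        "SELECT s.NAME FROM snowflake_customers s "
        f"LEFT JOIN salesforce_accounts a ON {KEY_S} = {KEY_A} "
        "WHERE a.NAME IS NULL ORDER BY s.NAME"
    )]
    # Salesforce rows with no Snowflake match.
    only_salesforce = [row[0] for row in conn.execute(
        "SELECT a.NAME FROM salesforce_accounts a "
        f"LEFT JOIN snowflake_customers s ON {KEY_A} = {KEY_S} "
        "WHERE s.NAME IS NULL ORDER BY a.NAME"
    )]
    # Distinct normalized names present on both sides.
    matched = conn.execute(
        f"SELECT COUNT(DISTINCT {KEY_S}) FROM snowflake_customers s "
        f"JOIN salesforce_accounts a ON {KEY_S} = {KEY_A}"
    ).fetchone()[0]
    return {
        "only_in_snowflake": only_snowflake,
        "only_in_salesforce": only_salesforce,
        "matched": matched,
    }


# Made-up demo rows standing in for the two cached tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE snowflake_customers (NAME TEXT)")
conn.execute("CREATE TABLE salesforce_accounts (NAME TEXT)")
conn.executemany("INSERT INTO snowflake_customers VALUES (?)",
                 [("Ethan Hunt",), ("Ada Lovelace",)])
conn.executemany("INSERT INTO salesforce_accounts VALUES (?)",
                 [("Acme Corp",), ("ada lovelace ",)])
result = reconcile(conn)
print(result)
```

With the page's data, the transcript implies `matched` would be 0. That result is the sign that the two tables hold different kinds of entities and need a mapping table or a shared identifier.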

Trend-based checks

Point-in-time checks catch current issues. Trend checks catch pipeline breaks and data gaps:

Check whether the value distribution of any column in the ORDERS table has changed over time.

AI writes queries/snowflake-demo-datasource/orders_trends.sql
AI calls query("snowflake-demo-datasource", "queries/snowflake-demo-datasource/orders_trends.sql")

AI: Orders by month:
    Oct 2024: 5 orders, $3,421
    Nov 2024: 7 orders, $5,862
    Dec 2024: 5 orders, $4,454

    No anomalies in volume. The cancelled order is in November.
    All December orders are DELIVERED.
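The sketch below shows the kind of volume check described above. It assumes a hypothetical ORDER_DATE column, since the page does not name the date column. Orders are aggregated by month in SQL. The code then flags two things: months whose order count strays from the median month by more than a chosen fraction, and calendar months that are missing entirely, which are the "data gaps" a pipeline break would leave. SQLite's `strftime` stands in for whatever date function the real orders_trends.sql uses. The 50% tolerance is an arbitrary example value.

```python
import sqlite3
from statistics import median


def monthly_volume(conn: sqlite3.Connection) -> list[tuple[str, int, float]]:
    """(month, order count, revenue) per calendar month, oldest first."""
    return conn.execute(
        "SELECT strftime('%Y-%m', ORDER_DATE) AS month, COUNT(*), SUM(AMOUNT) "
        "FROM ORDERS GROUP BY month ORDER BY month"
    ).fetchall()


def volume_outliers(rows, tolerance: float = 0.5) -> list[str]:
    """Months whose order count differs from the median month by more than `tolerance` (a fraction)."""
    counts = {month: n for month, n, _ in rows}
    if not counts:
        return []
    # The median is less sensitive than the mean to a single bad month.
    baseline = median(counts.values())
    return [m for m, n in counts.items() if abs(n - baseline) > tolerance * baseline]


def missing_months(months: list[str]) -> list[str]:
    """Calendar months (YYYY-MM) between the first and last month that have no rows."""
    if not months:
        return []
    present, end = set(months), max(months)
    year, month = map(int, min(months).split("-"))
    gaps = []
    label = f"{year:04d}-{month:02d}"
    # Zero-padded YYYY-MM strings sort chronologically, so string comparison works.
    while label < end:
        if label not in present:
            gaps.append(label)
        month += 1
        if month == 13:
            year, month = year + 1, 1
        label = f"{year:04d}-{month:02d}"
    return gaps


# Monthly summary as reported above for Oct-Dec 2024.
summary = [("2024-10", 5, 3421), ("2024-11", 7, 5862), ("2024-12", 5, 4454)]
outliers = volume_outliers(summary)                # [] -> no volume anomalies
gaps = missing_months([m for m, _, _ in summary])  # [] -> no missing months
print(outliers, gaps)
```

With only three months of data, any threshold is a judgment call. A longer history gives the median a more meaningful baseline.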

Getting started

Focus on the tables that drive critical decisions first. Ask your AI to save validation queries so you can re-run them or schedule them. When you find a known issue (e.g., "Ethan Hunt is the only INACTIVE customer and the only cancellation"), document it in RULES.md so your AI accounts for it automatically in future queries.
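When a saved check runs again, documented exceptions should not raise fresh alarms. The page does not describe how RULES.md is read. One way to sketch the idea is to mirror the RULES.md note as a hypothetical allowlist in code, subtract those known exceptions from whatever a check returns, and report only what is new. The table layout and order 1042 are made up. Order 1005 is the documented cancellation from the report above.

```python
import sqlite3

# Hypothetical allowlist mirroring a RULES.md note such as
# "Ethan Hunt is the only INACTIVE customer and the only cancellation" (ORDER_ID 1005).
KNOWN_CANCELLATIONS = {1005}

# A saved check returns the IDs of offending rows; in practice this SQL would
# live in a query file under queries/ rather than in a string literal.
CANCELLED_CHECK = "SELECT ORDER_ID FROM ORDERS WHERE STATUS = 'CANCELLED'"


def new_issues(conn: sqlite3.Connection, check_sql: str, known: set[int]) -> list[int]:
    """Run a check and return only the offending IDs that are not already documented."""
    found = {row[0] for row in conn.execute(check_sql)}
    return sorted(found - known)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ORDERS (ORDER_ID INTEGER, STATUS TEXT)")
conn.executemany("INSERT INTO ORDERS VALUES (?, ?)",
                 [(1005, "CANCELLED"), (1006, "DELIVERED")])
first_run = new_issues(conn, CANCELLED_CHECK, KNOWN_CANCELLATIONS)   # [] -> only the known exception

# A later load brings a new cancellation that nobody has documented yet.
conn.execute("INSERT INTO ORDERS VALUES (?, ?)", (1042, "CANCELLED"))
second_run = new_issues(conn, CANCELLED_CHECK, KNOWN_CANCELLATIONS)  # [1042]
print(first_run, second_run)
```

A script like this could run on a schedule. An empty result means the data still matches what RULES.md says.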
