Use Cases

Data Quality & Validation

Check for anomalies, validate completeness, and reconcile data across sources.

Bad data causes bad decisions. Your AI can run quality checks across tables and connections in minutes, catching issues that would take hours to find manually.

Running quality checks

Install the Snowflake demo, then:

Run data quality checks on the ORDERS table: null rates, duplicate ORDER_IDs, and invalid statuses.

AI runs workspace_shell("connection describe snowflake-demo --database ANALYTICS --table ORDERS --json")
   → 8 columns, identifies key fields to validate

AI writes connections/snowflake-demo/queries/orders_quality.sql
AI runs workspace_shell("connection query snowflake-demo --file connections/snowflake-demo/queries/orders_quality.sql --json")
   → DuckDB relation: snowflake_demo_orders_quality

AI: Quality report for ORDERS (17 rows):

    No duplicate ORDER_IDs
    No NULL values in any column
    4 valid statuses: DELIVERED, SHIPPED, PENDING, CANCELLED

    1 cancelled order (ORDER_ID 1005, Ethan Hunt, $123).
    Only INACTIVE customer in the system - worth documenting
    in connections/snowflake-demo/RULES.md.

The check is saved as a query file in the workspace, so re-running it later — manually or on a cron schedule — is one command.

Cross-source reconciliation

The most valuable quality checks compare data across systems:

Compare customer counts between Snowflake and Salesforce. Are there customers in one system but not the other?

AI runs workspace_shell("connection query snowflake-demo --file ... --json")
   → 10 customers materialized into DuckDB
AI runs workspace_shell("connection query salesforce-demo --file ... --json")
   → 20 accounts materialized into DuckDB

AI writes connections/DUCKDB/queries/reconciliation.sql
AI runs workspace_shell("connection query DUCKDB --file connections/DUCKDB/queries/reconciliation.sql --json")

AI: Snowflake has 10 individual customer records.
    Salesforce has 20 company account records.
    No overlap by name - these are different entity types
    (individuals vs. companies). There's no shared key.

    If these should be linked, you'll need a mapping table
    or a shared identifier like email or account ID.

Trend-based checks

Point-in-time checks catch current issues. Trend checks catch pipeline breaks and data gaps:

Check if any columns in the ORDERS table have changes in value distribution over time.

AI writes connections/snowflake-demo/queries/orders_trends.sql
AI runs workspace_shell("connection query snowflake-demo --file connections/snowflake-demo/queries/orders_trends.sql --json")

AI: Orders by month:
    Oct 2024: 5 orders, $3,421
    Nov 2024: 7 orders, $5,862
    Dec 2024: 5 orders, $4,454

    No anomalies in volume. The cancelled order is in November.
    All December orders are DELIVERED.

Getting started

Focus on the tables that drive critical decisions first. Ask your AI to save validation queries so you can re-run them or schedule them through cron. When you find a known issue (e.g., "Ethan Hunt is the only INACTIVE customer and the only cancellation"), document it in the per-connection RULES.md so your AI accounts for it automatically in future queries.

On this page