Cross-Source Analysis

Correlate data across databases, warehouses, CRMs, and storage in one conversation.

Your revenue data is in Snowflake. Customer details are in Salesforce. Support tickets are in Jira. Getting a complete picture usually means exporting from each system and pasting into a spreadsheet, or filing a ticket with the data team and waiting.

You can do this in a single conversation instead.

How it works

Each connection query materializes its full result into DuckDB inside your workspace. Once you have data from multiple connections, your AI joins them through connections/DUCKDB/. No data leaves your workspace. No expensive warehouse cross-join. No CSV exports.

Example: Snowflake + Salesforce against the demos

Install both demos first:

You: "Install the Snowflake and Salesforce demos."

Then run the cross-source flow:

Pull all customers from Snowflake and all accounts from Salesforce.

AI writes connections/snowflake-demo/queries/all_customers.sql
AI runs workspace_shell("connection query snowflake-demo --file connections/snowflake-demo/queries/all_customers.sql --json")
   → 10 customers → DuckDB relation: snowflake_demo_all_customers

AI writes connections/salesforce-demo/queries/accounts.sql
AI runs workspace_shell("connection query salesforce-demo --file connections/salesforce-demo/queries/accounts.sql --json")
   → 20 accounts → DuckDB relation: salesforce_demo_accounts

Join them in DuckDB. Do any customers appear in both systems?

AI writes connections/DUCKDB/queries/cross_source_join.sql
AI runs workspace_shell("connection query DUCKDB --file connections/DUCKDB/queries/cross_source_join.sql --json")

AI: No matches by name. The Snowflake CUSTOMERS table uses individual
    names (Alice Johnson, Charlie Brown) while Salesforce Account uses
    company names (The Virtual Viper, Crystal Fortress).

    These datasets don't share a direct join key. To link them, you'd
    need a mapping table or a shared identifier like email or account ID.
    I'd recommend documenting this in connections/snowflake-demo/RULES.md
    so future queries don't attempt this join.

This is a realistic outcome. Cross-source joins depend on knowing which fields match between systems — and sometimes they don't. Discovering that early saves time.

How real teams use this

RevOps across Salesforce + Athena + ClickUp. A RevOps team combines account ARR and opportunity data from Salesforce, usage metrics from Athena, and project tasks from ClickUp. One conversation gives them a view that would otherwise require three dashboards and a spreadsheet. Their power user ran 3,286 commands across 21 active days, primarily doing account-level analysis joining CRM and warehouse data.

Restaurant analytics across ClickHouse + Athena + HubSpot. An operations team queries kitchen scores and order flow from ClickHouse, event data from Athena, and deal pipeline from HubSpot. They use DuckDB to join location-level performance with deal stages to understand which restaurant locations drive the most revenue.

Spend management on Snowflake. A finance team queries invoices and orders from Snowflake, then reconciles against procurement data. The DuckDB cache lets them iterate on the reconciliation without re-querying the warehouse.

Best practices

Document join keys in RULES.md. Cross-source joins depend on knowing which fields match between systems. Write them down: salesforce.accounts.external_id = postgres.companies.crm_id. See Context (RULES.md).

Cache reference data early. If you'll join against a customer list or product catalog repeatedly, ask your AI to run that connection query at the start of the session — it materializes once into DuckDB and is reusable from there.

Build incrementally. Start with two connections, get the join working, then add a third. Debugging a three-way join is harder than building up step by step.

Cross-Source Analysis

How it works

Example: Snowflake + Salesforce against the demos

How real teams use this

Best practices

On this page