Cloud Warehouses
Working with Snowflake, BigQuery, Redshift, and Athena in MarcoPolo.
MarcoPolo connects to Snowflake, BigQuery, Redshift, and Athena, the platforms where your company's analytical data typically lives.
Connecting
Each warehouse has a different connection flow:
Snowflake. Account identifier, warehouse, database, schema, and credentials. Your AI can query across schemas and databases within the same account.
BigQuery. Project ID and dataset. Authenticates via service account credentials. Handles BigQuery SQL dialect including UNNEST, STRUCT, and partitioned table syntax.
Redshift. Host, port, database, and credentials. Same connection pattern as PostgreSQL (Redshift is PostgreSQL-based) with Redshift-specific optimizations.
Athena. Data catalog and S3 output location. Writes Presto/Trino-compatible SQL and handles Athena's async query execution.
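To make the differences concrete, here is a hedged sketch of the parameters each warehouse connection needs. Field names and values are illustrative, not MarcoPolo's actual configuration schema:

```python
# Hypothetical per-warehouse connection parameters.
# Field names are illustrative; MarcoPolo's setup flow may differ.
CONNECTION_PARAMS = {
    "snowflake": {
        "account": "acme-xy12345",       # account identifier
        "warehouse": "ANALYTICS_WH",
        "database": "PROD",
        "schema": "FINANCE",
        "user": "analyst",
        "password": "***",
    },
    "bigquery": {
        "project_id": "acme-analytics",
        "dataset": "events",
        "credentials": "service-account.json",  # service account key file
    },
    "redshift": {
        "host": "cluster.abc123.us-east-1.redshift.amazonaws.com",
        "port": 5439,                    # Redshift's default port
        "database": "prod",
        "user": "analyst",
        "password": "***",
    },
    "athena": {
        "catalog": "AwsDataCatalog",
        "s3_output_location": "s3://acme-athena-results/",
    },
}

def missing_fields(kind: str, supplied: dict) -> set:
    """Return required fields absent from a supplied config."""
    return set(CONNECTION_PARAMS[kind]) - set(supplied)
```

A validation helper like `missing_fields` is hypothetical, but shows why the flows differ: Athena needs only a catalog and an S3 output path, while Snowflake needs a full account/warehouse/database/schema tuple.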
Generate a connection link from your AI or set up directly in the web UI.
Working with warehouses
Cloud warehouses often hold your most complex and valuable data: revenue tables, event streams, aggregated metrics, historical snapshots. The MarcoPolo workflow is designed for this:
Explore first. Warehouses can have hundreds of tables. Ask your AI to survey the landscape before querying: "What schemas and tables are in our Snowflake warehouse? Which ones look like they contain revenue data?"
Query and cache. Results load into DuckDB in your workspace. This matters especially for warehouses: you iterate locally instead of repeatedly running expensive warehouse queries.
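The pattern can be sketched in a few lines. This uses Python's stdlib sqlite3 as a stand-in for the workspace DuckDB cache (the table and rows are made up; the idea is one warehouse pull, then local iteration):

```python
import sqlite3

# Pretend these rows came back from one (expensive) warehouse query.
warehouse_rows = [
    ("2024-01", "EMEA", 120_000),
    ("2024-01", "APAC",  80_000),
    ("2024-02", "EMEA", 135_000),
]

# Cache the result locally. MarcoPolo uses DuckDB for this;
# sqlite3 is a stdlib stand-in for the sketch.
cache = sqlite3.connect(":memory:")
cache.execute("CREATE TABLE monthly_revenue (month TEXT, region TEXT, revenue INT)")
cache.executemany("INSERT INTO monthly_revenue VALUES (?, ?, ?)", warehouse_rows)

# Follow-up questions run against the local cache, not the warehouse.
total = cache.execute(
    "SELECT SUM(revenue) FROM monthly_revenue WHERE region = 'EMEA'"
).fetchone()[0]
print(total)  # 255000
```

Every follow-up aggregation, filter, or pivot after the initial load costs nothing at the warehouse.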
Correlate across sources. The most common pattern: pull summary data from your warehouse, then correlate it with CRM data from Salesforce or support data from Jira. All combined locally in DuckDB.
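A minimal sketch of the cross-source join, again with stdlib sqlite3 standing in for DuckDB and entirely illustrative table names and rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Summary pulled from the warehouse (illustrative).
con.execute("CREATE TABLE revenue (account_id TEXT, amount INT)")
con.executemany("INSERT INTO revenue VALUES (?, ?)",
                [("a1", 50_000), ("a2", 12_000)])

# Rows pulled from a CRM such as Salesforce (illustrative).
con.execute("CREATE TABLE crm_accounts (account_id TEXT, owner TEXT)")
con.executemany("INSERT INTO crm_accounts VALUES (?, ?)",
                [("a1", "Dana"), ("a2", "Lee")])

# The join happens locally, in one place, across both sources.
rows = con.execute("""
    SELECT c.owner, SUM(r.amount) AS total
    FROM revenue r JOIN crm_accounts c USING (account_id)
    GROUP BY c.owner ORDER BY total DESC
""").fetchall()
```

Neither source system has to know about the other; the local engine is the meeting point.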
What teams are doing
- Spend management analysis on Snowflake: quarterly invoice summaries and orders-not-invoiced reconciliation
- Grid and utility forecasting on BigQuery: substation load analysis, feeder summaries, demand forecasting
- Event pipeline debugging on Athena: tracing data through bronze/silver/gold layers, finding partition anomalies
- Restaurant operations across ClickHouse + Athena + PostgreSQL: kitchen performance, order flow analysis, device event filtering
Best practices
Be specific about scope. Warehouse tables can be huge. Guide your AI to filter early: "Show me last month's data only" is much better than "Show me everything."
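The difference in what reaches the warehouse looks roughly like this (table and column names hypothetical; the scoped query lets the engine prune partitions and scan far less data):

```python
# "Show me everything": scans the whole, possibly multi-TB, table.
unscoped = "SELECT * FROM events"

# "Show me last month's data only": filters early, so the warehouse
# prunes partitions and returns a small, cacheable result.
scoped = """
SELECT event_type, COUNT(*) AS n
FROM events
WHERE event_date >= DATE '2024-05-01'
  AND event_date <  DATE '2024-06-01'
GROUP BY event_type
"""
```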
Map the warehouse in RULES.md. Document which schemas and tables matter, how they relate, and what the naming conventions mean. Especially valuable for warehouses with hundreds of tables. See Context (RULES.md).
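A hypothetical RULES.md entry for a warehouse might look like the following; the structure, tables, and conventions are invented for illustration, so use whatever your team actually relies on:

```markdown
## Snowflake: PROD database

- `FINANCE.INVOICES` — one row per invoice line; join to `FINANCE.ORDERS` on `order_id`
- `EVENTS.RAW_*` — landing tables, do not query directly; use `EVENTS.CLEAN_*` instead
- Naming: `_V` suffix means view, `_HIST` means daily snapshot table
```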
Cache aggressively. Once results are in DuckDB, follow-up analysis is fast and free. Ask your AI to load reference tables (customer lists, product catalogs) into DuckDB early in a session.