Triage & Break-Fix
Investigate production issues by checking logs, querying databases, and tracing data.
Something's broken. The API is slow, the dashboard numbers look wrong, a customer is reporting missing data. You need to check logs, query databases, and trace data through systems, fast.
This is where the MarcoPolo plugin is most valuable. Triage requires chaining multiple tools across multiple data sources in a single investigation. The plugin's query workflow skill and workspace navigation keep Claude on the right path instead of fumbling with tool discovery.
Investigation: cancelled order anomaly
Using Claude Code with the plugin and demo data sources:
I'm seeing a cancelled order in the Snowflake data. Investigate - how many are there, which customers, and check if there are related support tickets in S3.
AI reads docs/snowflake-demo-datasource/RULES.md and SYNTAX.md
AI writes queries/snowflake-demo-datasource/cancelled_orders.sql
AI calls query("snowflake-demo-datasource", "queries/snowflake-demo-datasource/cancelled_orders.sql")
-> 1 cancelled order: ORDER_ID 1005, CUSTOMER_ID 5 (Ethan Hunt),
$123.00, 2024-11-01. Only INACTIVE customer in the system.
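The query file Claude writes isn't shown in the transcript, but its shape is easy to sketch. Here is a runnable stand-in using Python's built-in SQLite in place of Snowflake; the ORDERS table, STATUS column, and 'CANCELLED' value are assumptions inferred from the result, and the DELIVERED row is purely illustrative:

```python
import sqlite3

# In-memory stand-in for the Snowflake demo data source. Only the 1005 row
# reflects values from the transcript; the rest is illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ORDERS (
        ORDER_ID INTEGER, CUSTOMER_ID INTEGER,
        AMOUNT REAL, ORDER_DATE TEXT, STATUS TEXT
    );
    INSERT INTO ORDERS VALUES
        (1001, 1, 450.00, '2024-10-12', 'DELIVERED'),
        (1005, 5, 123.00, '2024-11-01', 'CANCELLED');
""")

# The likely shape of queries/snowflake-demo-datasource/cancelled_orders.sql:
rows = conn.execute("""
    SELECT ORDER_ID, CUSTOMER_ID, AMOUNT, ORDER_DATE
    FROM ORDERS
    WHERE STATUS = 'CANCELLED'
    ORDER BY ORDER_DATE
""").fetchall()

print(rows)  # [(1005, 5, 123.0, '2024-11-01')]
```

The point of the workflow is that this SQL lives as a file in the workspace, so later queries can iterate on it rather than starting over.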
AI calls browse("aws-s3-demo-datasource", "opsrus-lakehouse-demo/bronze/support_tickets/2025/")
-> Partitions: 06/, 07/
AI calls browse("aws-s3-demo-datasource", "opsrus-lakehouse-demo/bronze/events/2025/07/")
-> Daily partitions: 01/ through 15/
AI writes queries/DUCKDB/customer_order_status.sql
AI calls query("DUCKDB", "queries/DUCKDB/customer_order_status.sql")
-> Joins cached customer and order data:
Ethan Hunt is the only INACTIVE customer and the only cancellation.
All other customers are ACTIVE with DELIVERED/SHIPPED/PENDING orders.
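The DuckDB cross-reference is a join over the cached customer and order data. A minimal sketch, again using SQLite as a local stand-in; the CUSTOMERS schema and the 'Ada Lovelace' row are hypothetical, with only the Ethan Hunt values taken from the transcript:

```python
import sqlite3

# Stand-in for the DuckDB cross-reference over cached customer/order data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMERS (
        CUSTOMER_ID INTEGER, NAME TEXT, COUNTRY TEXT, STATUS TEXT
    );
    CREATE TABLE ORDERS (ORDER_ID INTEGER, CUSTOMER_ID INTEGER, STATUS TEXT);
    INSERT INTO CUSTOMERS VALUES
        (1, 'Ada Lovelace', 'UK', 'ACTIVE'),        -- illustrative row
        (5, 'Ethan Hunt', 'Australia', 'INACTIVE');  -- from the transcript
    INSERT INTO ORDERS VALUES
        (1001, 1, 'DELIVERED'),
        (1005, 5, 'CANCELLED');
""")

# The likely shape of queries/DUCKDB/customer_order_status.sql: line up
# customer status against order status and surface the anomalies.
rows = conn.execute("""
    SELECT c.NAME, c.STATUS AS customer_status, o.STATUS AS order_status
    FROM CUSTOMERS c
    JOIN ORDERS o ON o.CUSTOMER_ID = c.CUSTOMER_ID
    WHERE o.STATUS = 'CANCELLED' OR c.STATUS = 'INACTIVE'
""").fetchall()

print(rows)  # [('Ethan Hunt', 'INACTIVE', 'CANCELLED')]
```

The join is what turns two isolated facts (one cancelled order, one inactive customer) into the finding that they are the same account.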
AI: One cancelled order, tied to the only inactive customer (Ethan Hunt,
Australia, $123). The S3 support tickets have data for Jun-Jul 2025
but the order is from Nov 2024 - the timelines don't overlap.
This looks like a clean account deactivation, not a system issue.
The cancellation amount is the minimum in the dataset, suggesting
it may have been a test order.

Three data sources checked (Snowflake, S3, DuckDB cross-reference), root cause identified, all in one conversation.
How real teams use this
ClickHouse performance investigation. A restaurant analytics team debugged slow queries by profiling snapshot distributions, event filtering patterns, and materialized view behavior across ClickHouse, Athena, and PostgreSQL - all in one Claude Code session. The investigation involved 352 commands over 11 active days, iterating on the same query files as they narrowed down the issue.
Missing customer data. A CS lead reports that a customer's dashboard shows zero revenue last month. Claude checks the warehouse - revenue data exists in the raw tables but was excluded during aggregation because a billing sync process marked the account as inactive. Root cause is upstream in the billing sync, not the data pipeline.
Pipeline failure. A data engineer notices the nightly aggregation hasn't run since Monday. Claude checks the job config table in Postgres, finds a cron schedule change from a deployment, traces the impact through the S3 data lake partitions to confirm which days are affected, and identifies the specific config value to revert.
Best practices
Start broad, narrow quickly. Before diving deep, ask your AI to check the obvious things first: is the data there? Are there error logs? Is the service up?
Use the workspace as evidence. Your AI saves queries and results to the workspace. This creates an audit trail of the investigation that you can share with the team or reference later.
Save the pattern. If you find yourself investigating the same category of issue repeatedly, ask your AI to write a diagnostic script and save it to the workspace.
Install the plugin for triage. Triage requires chaining tools across data sources without wrong turns. The plugin's skills keep Claude oriented on which workspace to use and which tools to call.
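A saved diagnostic script can be as small as a freshness check that answers "is the data there?" before anyone digs deeper. This sketch is hypothetical: the table and column names are placeholders, and a real version would run against your warehouse through the plugin's query tool rather than the in-memory SQLite used here for the demo.

```python
import sqlite3
from datetime import date

def data_freshness(conn, table, date_column):
    """Return the most recent date in a table, or None if it is empty.
    Note: table/column names are interpolated directly, so only pass
    trusted identifiers (fine for a personal diagnostic script)."""
    row = conn.execute(f"SELECT MAX({date_column}) FROM {table}").fetchone()
    return row[0]

# Demo against an in-memory stand-in with two days of events:
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EVENTS (EVENT_DATE TEXT);
    INSERT INTO EVENTS VALUES ('2025-07-14'), ('2025-07-15');
""")
latest = data_freshness(conn, "EVENTS", "EVENT_DATE")
print(latest)  # 2025-07-15

# Flag staleness: ISO date strings compare correctly as text.
stale = latest < date.today().isoformat()
```

Saved to the workspace, a script like this turns a recurring investigation into a one-command check.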