Tools

The MCP tools your AI uses to work with data in MarcoPolo.

Your AI uses these tools to interact with MarcoPolo. You don't call them by name. Your AI picks the right one based on what you ask. But understanding them helps when you want more control or need to debug unexpected behavior.

Data discovery

list_datasources

Lists all data sources available in your workspace: databases, warehouses, storage buckets, and SaaS apps.

You: "What data sources do I have?"

Returns the name, type, and capabilities for each source. This is usually the first thing your AI calls.

get_schema

Retrieves schema metadata. Supports progressive drill-down: databases, then tables, then columns and types. For SaaS apps, it returns the available objects and fields.

You: "Show me the schema for our Snowflake warehouse."

Your AI uses this to understand structure before writing queries. Pair it with context artifacts for best results.
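
To make the drill-down concrete, here is a small sketch. The nested structure and all names in it (analytics_db, orders, customers) are hypothetical, not MarcoPolo's actual response format; it only illustrates the three levels get_schema exposes:

```python
# Hypothetical shape of progressive drill-down: databases -> tables -> columns.
schema = {
    "analytics_db": {
        "orders": {"id": "INTEGER", "region": "VARCHAR", "amount": "DECIMAL"},
        "customers": {"id": "INTEGER", "name": "VARCHAR"},
    }
}

# Level 1: list databases
databases = list(schema)
# Level 2: list tables in one database
tables = list(schema["analytics_db"])
# Level 3: columns and types for one table
columns = schema["analytics_db"]["orders"]

print(databases, tables, columns["region"])
```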

Data retrieval

query

Executes a query against a data source. Your AI writes SQL tailored to the target system (using the SYNTAX.md guides MarcoPolo provides), or API calls for SaaS apps. The query is written to a file first (queries/{datasource}/report.sql), then executed by file path.

You: "What were our top 10 highest-value deals last quarter?"

Results auto-load into a named DuckDB table for follow-up analysis. The query file is saved to your workspace for reuse.

Follow-up queries in DuckDB. After results are cached, your AI can query DuckDB directly for further analysis without hitting your production data source:

You: "Filter those to just the enterprise segment and show quarter-over-quarter trend."

Your AI writes a DuckDB query against the cached table. Fast, free, and doesn't touch your source system.
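
The caching pattern looks roughly like this. Python's built-in sqlite3 stands in for DuckDB here, and the table name, columns, and rows are invented for illustration:

```python
import sqlite3

# sqlite3 as a stand-in for DuckDB; table and data are hypothetical.
con = sqlite3.connect(":memory:")

# Step 1: query results land in a named local table (MarcoPolo does this
# automatically after a `query` call).
con.execute("CREATE TABLE top_deals (deal TEXT, segment TEXT, value REAL)")
con.executemany("INSERT INTO top_deals VALUES (?, ?, ?)", [
    ("Acme renewal", "enterprise", 120000.0),
    ("Globex upsell", "mid-market", 45000.0),
    ("Initech new", "enterprise", 98000.0),
])

# Step 2: follow-up filtering runs against the cached table,
# never against the production source.
rows = con.execute(
    "SELECT deal, value FROM top_deals "
    "WHERE segment = 'enterprise' ORDER BY value DESC"
).fetchall()
print(rows)
```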

Parameterized queries. Your AI can use Jinja2 templates in SQL files for reusable, parameterized queries:

-- queries/ATHENA/orders_by_region.sql
SELECT * FROM orders WHERE region = '{{ region }}' LIMIT {{ limit }}
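
To show what rendering that template produces, here is a minimal sketch. MarcoPolo uses Jinja2 for the real substitution; plain str.replace stands in here, and the parameter values (EMEA, 100) are made up:

```python
# Stand-in for Jinja2 rendering: substitute {{ name }} placeholders.
template = "SELECT * FROM orders WHERE region = '{{ region }}' LIMIT {{ limit }}"
params = {"region": "EMEA", "limit": 100}  # hypothetical values

sql = template
for name, value in params.items():
    sql = sql.replace("{{ %s }}" % name, str(value))

print(sql)
```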

Write operations. Some data sources support writing data back. For example, your AI can push analysis results to a Google Sheet or update records in a database. Check the SYNTAX.md for each data source to see what write operations are supported.

browse

Lists files and directories in object storage (S3, GCS, Azure Blob).

You: "Show me what's in the raw-logs bucket."

Returns file names, sizes, and timestamps.

download

Pulls a file from object storage into your workspace's downloads/ directory.

You: "Download the latest sales report from S3."

The file lands in your workspace where your AI can read, parse, or load it into DuckDB.
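
The parse-and-load step after a download might look like this sketch. The CSV content and table name are invented, and sqlite3 again stands in for DuckDB:

```python
import csv, io, sqlite3

# Pretend this string is the contents of a downloaded CSV file.
raw = "order_id,amount\n1,120.50\n2,75.00\n"

rows = [(int(r["order_id"]), float(r["amount"]))
        for r in csv.DictReader(io.StringIO(raw))]

# Load the parsed rows into a local table for analysis.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (order_id INTEGER, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)", rows)

total = con.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)
```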

upload

Sends a file from your workspace to an object storage data source.

You: "Upload the cleaned dataset back to S3."

Useful for exporting analysis results, processed files, or generated reports to your cloud storage.

Workspace operations

execute_command

Runs a shell command in your workspace: Python scripts, file operations, data processing, git, anything you'd do in a terminal.

You: "Run the cleanup script on the exported data."

Full Linux command-line access within your isolated workspace. This is also how your AI reads context files (cat docs/RULES.md), creates query files, and manages the workspace filesystem.
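
A loose analogue of what execute_command does, sketched with Python's subprocess module; the command itself is a trivial placeholder, not a real workspace script:

```python
import subprocess

# Run a shell command and capture its output, the way execute_command does
# inside the workspace. The echo command is a placeholder.
result = subprocess.run(
    "echo 'cleanup complete'", shell=True, capture_output=True, text=True
)
print(result.stdout.strip())
```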

create_data_view

Generates an interactive visualization or dashboard from data in your workspace.

You: "Build a bar chart showing revenue by product line."

The output is viewable in both the conversation and the web app. Artifacts have shareable URLs.

generate_connector_url

Generates a secure link to set up a new data source connection.

You: "I want to add a new PostgreSQL connection."

Opens the connector setup flow in your browser.

Scheduling with dv-schedule

Your AI can create and manage recurring automated tasks using the dv-schedule command via execute_command. Schedule daily data syncs, weekly reports, or any repeating workflow.

You: "Schedule a daily sync that pulls new Salesforce opportunities into a summary table."
You: "Set up a weekly data quality check on the customers table."

Your AI runs dv-schedule create with the appropriate parameters. Scheduled tasks live in your workspace's schedules/ directory and run automatically. Use dv-schedule list to see what's scheduled, or dv-schedule --help for all options.

How tools chain together

A typical data question involves multiple tools in sequence. When you ask "What's our MRR?", your AI might:

  1. Call list_datasources to find the billing database
  2. Call get_schema to locate the subscriptions table
  3. Call query to pull the data
  4. Call execute_command to run a Python script for aggregation
  5. Call create_data_view to generate a chart
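
The five steps above can be sketched as a stub pipeline. Every function below is a stand-in for the real MCP tool of the same name, and all return values are invented for illustration:

```python
# Stub versions of the tools; real calls go through MCP.
def list_datasources():
    return [{"name": "billing_db", "type": "postgres"}]

def get_schema(source):
    return {"subscriptions": ["customer_id", "plan", "monthly_price"]}

def query(source, sql):
    # pretend result set: (plan, monthly_price)
    return [("pro", 99.0), ("team", 49.0), ("pro", 99.0)]

def aggregate(rows):
    # stands in for an execute_command Python script
    return sum(price for _, price in rows)

def create_data_view(value):
    return f"MRR chart: ${value:,.2f}"

source = list_datasources()[0]["name"]          # 1. find the billing database
schema = get_schema(source)                     # 2. locate subscriptions
rows = query(source, "SELECT plan, monthly_price FROM subscriptions")  # 3. pull data
mrr = aggregate(rows)                           # 4. aggregate
print(create_data_view(mrr))                    # 5. chart
```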

With the MarcoPolo plugin installed, your AI follows proven patterns for this tool chaining: reading context first, caching results in DuckDB, and saving queries for reuse.
