SQL Tools

sql_execute

Run SQL queries against your connected warehouse.

> sql_execute "SELECT department, COUNT(*) as headcount FROM employees GROUP BY 1 ORDER BY 2 DESC LIMIT 10" --warehouse prod-snowflake

┌────────────┬───────────┐
│ department │ headcount │
├────────────┼───────────┤
│ Engineering│ 342       │
│ Sales      │ 218       │
│ Marketing  │ 156       │
└────────────┴───────────┘
3 rows returned (limit: 100)

Parameters: - query (required): SQL to execute - warehouse (optional): Connection name from config. Uses default if omitted - limit (optional, default: 100): Max rows returned

sql_analyze

Static analysis that detects 19 SQL anti-patterns before you run anything.

> sql_analyze "SELECT * FROM events e JOIN users u ON e.user_id = u.id WHERE LOWER(u.email) LIKE '%@gmail.com'"

Issues found (3):

  ⚠ SELECT_STAR (warning, confidence: high)
    Line 1: Use explicit column list instead of SELECT *
    → Reduces data scanned and improves lineage tracking

  ⚠ FUNCTION_IN_FILTER (warning, confidence: high)
    Line 1: LOWER(u.email) prevents index usage
    → Pre-compute or use ILIKE: u.email ILIKE '%@gmail.com'

  ⚠ LIKE_LEADING_WILDCARD (info, confidence: high)
    Line 1: LIKE '%@gmail.com' cannot use indexes
    → Consider a computed column or reverse index if this is a hot path

All 19 rules

Rule	Severity	What it catches
`SELECT_STAR`	warning	`SELECT *` instead of explicit columns
`SELECT_STAR_IN_SUBQUERY`	warning	`SELECT *` in subqueries (breaks lineage)
`CARTESIAN_PRODUCT`	critical	Missing JOIN condition
`IMPLICIT_CARTESIAN`	critical	Cross join without explicit CROSS JOIN
`CORRELATED_SUBQUERY`	warning	Subquery referencing outer query (performance)
`MISSING_LIMIT`	info	No LIMIT on potentially large result sets
`ORDER_BY_WITHOUT_LIMIT`	warning	ORDER BY without LIMIT (sorts entire result)
`ORDER_BY_IN_SUBQUERY`	info	Unnecessary ORDER BY in subquery
`FUNCTION_IN_FILTER`	warning	Functions on columns in WHERE (non-sargable)
`FUNCTION_IN_JOIN`	warning	Functions on columns in JOIN conditions
`NON_EQUI_JOIN`	info	Non-equality joins (`<`, `>`, `!=`)
`OR_IN_JOIN`	warning	OR in JOIN conditions (cross join risk)
`LIKE_LEADING_WILDCARD`	info	`LIKE '%pattern'` prevents index usage
`LARGE_IN_LIST`	info	IN clause with many values
`NOT_IN_WITH_SUBQUERY`	warning	NOT IN with subquery (NULL handling issues)
`UNION_INSTEAD_OF_UNION_ALL`	info	UNION vs UNION ALL (unnecessary dedup)
`UNUSED_CTE`	info	CTE defined but never referenced
`GROUP_BY_PRIMARY_KEY`	info	GROUP BY on primary key (redundant)
`WINDOW_WITHOUT_PARTITION`	warning	Window function without PARTITION BY

Each rule includes a confidence score (high/medium/low) based on AST complexity: - Wildcards, EXISTS clauses, correlated subqueries, multi-joins with OR, and non-equi joins reduce confidence - High confidence = definite anti-pattern. Medium/low = review recommended

sql_optimize

Get optimization suggestions with rewritten SQL.

> sql_optimize "SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id WHERE YEAR(o.order_date) = 2026"

Suggestions:
  1. Replace SELECT * with explicit columns (saves ~70% scan)
  2. Replace YEAR(o.order_date) = 2026 with:
     o.order_date >= '2026-01-01' AND o.order_date < '2027-01-01'
     (enables partition pruning)

Optimized SQL:
  SELECT o.order_id, o.amount, o.order_date, c.name, c.email
  FROM orders o
  JOIN customers c ON o.customer_id = c.id
  WHERE o.order_date >= '2026-01-01'
    AND o.order_date < '2027-01-01'

Estimated improvement: 70-80% reduction in bytes scanned

sql_translate

Translate SQL between warehouse dialects.

> sql_translate \
    --source snowflake \
    --target bigquery \
    "SELECT
        DATEADD(day, -30, CURRENT_TIMESTAMP()) AS thirty_days_ago,
        IFF(status = 'active', 1, 0) AS is_active,
        TRY_TO_NUMBER(amount_str) AS amount,
        ARRAY_AGG(tag) WITHIN GROUP (ORDER BY tag) AS tags
     FROM my_table
     QUALIFY ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) = 1"

Translated (BigQuery):
  SELECT
      DATE_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY) AS thirty_days_ago,
      IF(status = 'active', 1, 0) AS is_active,
      SAFE_CAST(amount_str AS NUMERIC) AS amount,
      ARRAY_AGG(tag ORDER BY tag) AS tags
  FROM `my_table`
  QUALIFY ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) = 1

Warnings:
  - TRY_TO_NUMBER → SAFE_CAST: returns NULL on failure (same behavior)
  - ARRAY_AGG: WITHIN GROUP syntax removed, ORDER BY moved inline

Supported dialects

snowflake, bigquery, databricks, redshift, postgres, mysql, sqlserver, duckdb

sql_format

Format SQL for readability.

> sql_format "select a.id,b.name,count(*) as cnt from table_a a join table_b b on a.id=b.a_id where a.status='active' group by 1,2 order by 3 desc"

SELECT
  a.id,
  b.name,
  COUNT(*) AS cnt
FROM table_a a
JOIN table_b b
  ON a.id = b.a_id
WHERE a.status = 'active'
GROUP BY 1, 2
ORDER BY 3 DESC

sql_fix

Diagnose and auto-fix SQL errors.

> sql_fix \
    --error "SQL compilation error: Object 'ANALYTICS.PUBLIC.USERSS' does not exist" \
    "SELECT * FROM analytics.public.userss"

Diagnosis: Typo in table name. 'userss' should be 'users'

Fixed SQL:
  SELECT * FROM analytics.public.users

Additional suggestions:
  - Use schema_search to find the correct table name
  - Replace SELECT * with explicit columns

sql_diff

Compare two versions of a SQL query.

> sql_diff \
    --original "SELECT id, name FROM users WHERE active = true" \
    --modified "SELECT id, name, email FROM users WHERE active = true AND created_at > '2026-01-01'"

Diff:
  - SELECT id, name FROM users WHERE active = true
  + SELECT id, name, email FROM users WHERE active = true AND created_at > '2026-01-01'

Changes:
  + Added column: email
  + Added filter: created_at > '2026-01-01'
  Additions: 2 | Deletions: 0 | Modifications: 0

sql_rewrite

Apply automatic SQL rewrites for optimization.

> sql_rewrite "SELECT * FROM orders WHERE YEAR(order_date) = 2026 AND status IN ('shipped', 'delivered') ORDER BY order_date"

Rewrites applied:
  1. Predicate pushdown: YEAR(order_date) = 2026 → order_date >= '2026-01-01' AND order_date < '2027-01-01'
  2. SELECT pruning: SELECT * → explicit columns (when schema context available)

Rewritten SQL:
  SELECT *
  FROM orders
  WHERE order_date >= '2026-01-01'
    AND order_date < '2027-01-01'
    AND status IN ('shipped', 'delivered')
  ORDER BY order_date

Rewrite strategies

Predicate pushdown. Move filters closer to data source
SELECT pruning. Replace * with explicit columns
Function elimination. Replace non-sargable functions with range predicates
JOIN reordering. Smaller tables first
Subquery flattening. Convert to JOINs where possible
UNION ALL promotion. Replace UNION with UNION ALL when safe

sql_explain

Generate execution plans.

> sql_explain "SELECT * FROM orders JOIN customers ON orders.customer_id = customers.id" --warehouse prod-snowflake

EXPLAIN output:
  GlobalStats:
    partitionsTotal=1024
    partitionsAssigned=1024
    bytesAssigned=4.2GB
  Operations:
    1. TableScan: orders (1024 partitions, 4.2GB)
    2. TableScan: customers (8 partitions, 12MB)
    3. Join: HASH (orders.customer_id = customers.id)
    4. Result: 15 columns

sql_autocomplete

Schema-aware SQL completion.

> sql_autocomplete --prefix "SELECT o.order_id, o.amo" --table_context ["orders"]

Suggestions:
  1. o.amount (DECIMAL), from orders.amount
  2. o.amount_usd (DECIMAL), from orders.amount_usd