Files
supermarket/MCP_EXAMPLES.md
Mc Smog 5a763a4e13 feat: add Postgres MCP integration for database testing
- Add postgres-mcp service to docker-compose.yml (SSE mode on port 8000)
- Add .mcp.json.example with SSE configuration template
- Add .gitignore entries for .claude/settings.local.json and .mcp.json
- Add MCP_EXAMPLES.md with query examples for testing scraping results
- Add analysis scripts: analyze-category-nulls.ts, check-product-details.ts,
  inspect-api-response.ts

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-21 23:29:02 +05:00

8.2 KiB

MCP Examples for Supermarket Scraper

This document contains example queries and prompts you can use with the Postgres MCP server to test and analyze your scraping results.

Setup

1. Install Docker (if not already installed)

2. Pull the Postgres MCP image

docker pull crystaldba/postgres-mcp

3. Start your database

docker-compose up -d postgres

4. Configure Claude Code with MCP

Copy the configuration from .mcp.json.example and add it to your Claude config:

OS Config File Location
Windows %APPDATA%\Claude\claude_desktop_config.json
macOS ~/Library/Application Support/Claude/claude_desktop_config.json
Linux ~/.config/Claude/claude_desktop_config.json

Or via VSCode: SettingsMCPConfiguration File


Natural Language Prompts

You can ask the AI questions in natural language, and it will use Postgres MCP to query your database:

Database Overview

  • "What tables exist in the database?"
  • "Show me the schema of the Product table"
  • "What are the relationships between tables?"
  • "Analyze the database health"

Scraping Results

  • "How many products are in the database?"
  • "Show me products with the highest discounts"
  • "Find products without categories"
  • "What is the price distribution of products?"
  • "Which stores have the most products?"

Performance

  • "Are there any slow queries?"
  • "What indexes should I add to improve performance?"
  • "Show me the database health report"

SQL Query Examples

You can also ask the AI to execute specific SQL queries using the MCP tools.

1. Basic Scraping Validation

-- Total products count
SELECT COUNT(*) as total_products FROM "Product";

-- Products by store
SELECT s.name, COUNT(p.id) as product_count
FROM "Store" s
LEFT JOIN "Product" p ON s.id = p."storeId"
GROUP BY s.id, s.name;

-- Latest scraping session
SELECT * FROM "ScrapingSession"
ORDER BY "startedAt" DESC LIMIT 1;

-- All scraping sessions with status
SELECT
  id,
  "sourceType",
  status,
  "startedAt",
  "finishedAt",
  CASE
    WHEN "finishedAt" IS NOT NULL
    THEN EXTRACT(EPOCH FROM ("finishedAt" - "startedAt"))
    ELSE NULL
  END as duration_seconds
FROM "ScrapingSession"
ORDER BY "startedAt" DESC;

2. Category Analysis

-- Products without categories
SELECT COUNT(*) FROM "Product" WHERE "categoryId" IS NULL;

-- Categories by product count
SELECT c.name, COUNT(p.id) as product_count
FROM "Category" c
LEFT JOIN "Product" p ON p."categoryId" = c.id
GROUP BY c.id, c.name
ORDER BY product_count DESC NULLS LAST
LIMIT 20;

-- Category hierarchy with counts
SELECT
  c1.name as category,
  c2.name as parent_category,
  COUNT(p.id) as product_count
FROM "Category" c1
LEFT JOIN "Category" c2 ON c1."parentId" = c2.id
LEFT JOIN "Product" p ON p."categoryId" = c1.id
GROUP BY c1.id, c1.name, c2.name
ORDER BY product_count DESC;

-- Top-level categories (no parent)
SELECT c.name, COUNT(p.id) as product_count
FROM "Category" c
LEFT JOIN "Product" p ON p."categoryId" = c.id
WHERE c."parentId" IS NULL
GROUP BY c.id, c.name
ORDER BY product_count DESC;

3. Price and Promotion Analysis

-- Products with active discounts
SELECT
  name,
  "currentPrice",
  "oldPrice",
  "discountPercent",
  "promotionEndDate"
FROM "Product"
WHERE "oldPrice" IS NOT NULL
  AND ("promotionEndDate" IS NULL OR "promotionEndDate" > NOW())
ORDER BY "discountPercent" DESC
LIMIT 20;

-- Expired promotions
SELECT
  name,
  "currentPrice",
  "oldPrice",
  "discountPercent",
  "promotionEndDate"
FROM "Product"
WHERE "oldPrice" IS NOT NULL
  AND "promotionEndDate" IS NOT NULL
  AND "promotionEndDate" < NOW()
ORDER BY "promotionEndDate" DESC
LIMIT 20;

-- Price distribution
SELECT
  CASE
    WHEN "currentPrice" < 100 THEN '0-100'
    WHEN "currentPrice" < 500 THEN '100-500'
    WHEN "currentPrice" < 1000 THEN '500-1000'
    ELSE '1000+'
  END as price_range,
  COUNT(*) as count
FROM "Product"
GROUP BY price_range
ORDER BY price_range;

-- Most expensive products
SELECT name, "currentPrice", brand, unit
FROM "Product"
ORDER BY "currentPrice" DESC
LIMIT 20;

-- Cheapest products
SELECT name, "currentPrice", brand, unit
FROM "Product"
WHERE "currentPrice" > 0
ORDER BY "currentPrice" ASC
LIMIT 20;

4. Data Quality Checks

-- Products missing critical fields
SELECT
  COUNT(*) FILTER (WHERE name IS NULL OR name = '') as missing_name,
  COUNT(*) FILTER (WHERE "categoryId" IS NULL) as missing_category,
  COUNT(*) FILTER (WHERE brand IS NULL OR brand = '') as missing_brand,
  COUNT(*) FILTER (WHERE "imageUrl" IS NULL OR "imageUrl" = '') as missing_image,
  COUNT(*) FILTER (WHERE url IS NULL OR url = '') as missing_url,
  COUNT(*) as total_products
FROM "Product";

-- Duplicate products check (same externalId for different stores)
SELECT "externalId", COUNT(*) as count
FROM "Product"
GROUP BY "externalId"
HAVING COUNT(*) > 1;

-- Products with strange prices (0 or negative)
SELECT name, "currentPrice", "oldPrice"
FROM "Product"
WHERE "currentPrice" <= 0 OR ("oldPrice" IS NOT NULL AND "oldPrice" <= 0)
LIMIT 20;

-- Products with impossible discounts
SELECT name, "currentPrice", "oldPrice", "discountPercent"
FROM "Product"
WHERE "discountPercent" < 0 OR "discountPercent" > 100
LIMIT 20;

5. Rating Analysis

-- Top rated products
SELECT
  name,
  rating,
  "scoresCount",
  "commentsCount",
  brand
FROM "Product"
WHERE rating IS NOT NULL
ORDER BY rating DESC, "scoresCount" DESC
LIMIT 20;

-- Most reviewed products
SELECT
  name,
  rating,
  "scoresCount",
  "commentsCount",
  brand
FROM "Product"
WHERE "commentsCount" IS NOT NULL
ORDER BY "commentsCount" DESC
LIMIT 20;

-- Products without ratings
SELECT COUNT(*) FROM "Product" WHERE rating IS NULL;

6. Brand Analysis

-- Top brands by product count
SELECT brand, COUNT(*) as product_count
FROM "Product"
WHERE brand IS NOT NULL AND brand != ''
GROUP BY brand
ORDER BY product_count DESC
LIMIT 20;

-- Average price by brand (for brands with 10+ products)
SELECT
  brand,
  COUNT(*) as product_count,
  AVG("currentPrice") as avg_price,
  MIN("currentPrice") as min_price,
  MAX("currentPrice") as max_price
FROM "Product"
WHERE brand IS NOT NULL AND brand != ''
GROUP BY brand
HAVING COUNT(*) >= 10
ORDER BY product_count DESC
LIMIT 20;

7. Health Check Queries

-- Table sizes
SELECT
  schemaname,
  tablename,
  pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

-- Index usage
SELECT
  schemaname,
  tablename,
  indexname,
  idx_scan,
  idx_tup_read,
  idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;

-- Table row counts
SELECT
  'Store' as table_name,
  COUNT(*) as row_count
FROM "Store"
UNION ALL
SELECT 'Category', COUNT(*) FROM "Category"
UNION ALL
SELECT 'Product', COUNT(*) FROM "Product"
UNION ALL
SELECT 'ScrapingSession', COUNT(*) FROM "ScrapingSession";

MCP Tools Reference

Postgres MCP provides these tools that the AI can use:

Tool Description
list_schemas Lists all database schemas
list_objects Lists tables, views, sequences in a schema
get_object_details Gets table/column details
execute_sql Executes SQL queries
explain_query Shows query execution plan
get_top_queries Reports slowest queries
analyze_workload_indexes Recommends indexes for workload
analyze_db_health Performs comprehensive health checks

Example Workflow

Here's a typical workflow for testing scraping results:

  1. Start the database:

    docker-compose up -d postgres
    
  2. Run the scraper:

    pnpm dev
    
  3. Ask the AI to verify:

    • "Check the database health"
    • "How many products were scraped?"
    • "Are there any products without categories?"
    • "Show me the top 20 products by discount"
    • "Find any data quality issues"
  4. Analyze performance:

    • "Are there any slow queries?"
    • "Should I add any indexes?"