- Add postgres-mcp service to docker-compose.yml (SSE mode on port 8000) - Add .mcp.json.example with SSE configuration template - Add .gitignore entries for .claude/settings.local.json and .mcp.json - Add MCP_EXAMPLES.md with query examples for testing scraping results - Add analysis scripts: analyze-category-nulls.ts, check-product-details.ts, inspect-api-response.ts Co-Authored-By: Claude <noreply@anthropic.com>
8.2 KiB
8.2 KiB
MCP Examples for Supermarket Scraper
This document contains example queries and prompts you can use with the Postgres MCP server to test and analyze your scraping results.
Setup
1. Install Docker (if not already installed)
- Windows: Docker Desktop
- macOS: Docker Desktop for Mac
- Linux:
sudo apt-get install docker.io
2. Pull the Postgres MCP image
docker pull crystaldba/postgres-mcp
3. Start your database
docker-compose up -d postgres
4. Configure Claude Code with MCP
Copy the configuration from .mcp.json.example and add it to your Claude config:
| OS | Config File Location |
|---|---|
| Windows | %APPDATA%\Claude\claude_desktop_config.json |
| macOS | ~/Library/Application Support/Claude/claude_desktop_config.json |
| Linux | ~/.config/Claude/claude_desktop_config.json |
Or via VSCode: Settings → MCP → Configuration File
Natural Language Prompts
You can ask the AI questions in natural language, and it will use Postgres MCP to query your database:
Database Overview
- "What tables exist in the database?"
- "Show me the schema of the Product table"
- "What are the relationships between tables?"
- "Analyze the database health"
Scraping Results
- "How many products are in the database?"
- "Show me products with the highest discounts"
- "Find products without categories"
- "What is the price distribution of products?"
- "Which stores have the most products?"
Performance
- "Are there any slow queries?"
- "What indexes should I add to improve performance?"
- "Show me the database health report"
SQL Query Examples
You can also ask the AI to execute specific SQL queries using the MCP tools.
1. Basic Scraping Validation
-- Total products count
SELECT COUNT(*) as total_products FROM "Product";
-- Products by store
SELECT s.name, COUNT(p.id) as product_count
FROM "Store" s
LEFT JOIN "Product" p ON s.id = p."storeId"
GROUP BY s.id, s.name;
-- Latest scraping session
SELECT * FROM "ScrapingSession"
ORDER BY "startedAt" DESC LIMIT 1;
-- All scraping sessions with status
SELECT
id,
"sourceType",
status,
"startedAt",
"finishedAt",
CASE
WHEN "finishedAt" IS NOT NULL
THEN EXTRACT(EPOCH FROM ("finishedAt" - "startedAt"))
ELSE NULL
END as duration_seconds
FROM "ScrapingSession"
ORDER BY "startedAt" DESC;
2. Category Analysis
-- Products without categories
SELECT COUNT(*) FROM "Product" WHERE "categoryId" IS NULL;
-- Categories by product count
SELECT c.name, COUNT(p.id) as product_count
FROM "Category" c
LEFT JOIN "Product" p ON p."categoryId" = c.id
GROUP BY c.id, c.name
ORDER BY product_count DESC NULLS LAST
LIMIT 20;
-- Category hierarchy with counts
SELECT
c1.name as category,
c2.name as parent_category,
COUNT(p.id) as product_count
FROM "Category" c1
LEFT JOIN "Category" c2 ON c1."parentId" = c2.id
LEFT JOIN "Product" p ON p."categoryId" = c1.id
GROUP BY c1.id, c1.name, c2.name
ORDER BY product_count DESC;
-- Top-level categories (no parent)
SELECT c.name, COUNT(p.id) as product_count
FROM "Category" c
LEFT JOIN "Product" p ON p."categoryId" = c.id
WHERE c."parentId" IS NULL
GROUP BY c.id, c.name
ORDER BY product_count DESC;
3. Price and Promotion Analysis
-- Products with active discounts
SELECT
name,
"currentPrice",
"oldPrice",
"discountPercent",
"promotionEndDate"
FROM "Product"
WHERE "oldPrice" IS NOT NULL
AND ("promotionEndDate" IS NULL OR "promotionEndDate" > NOW())
ORDER BY "discountPercent" DESC
LIMIT 20;
-- Expired promotions
SELECT
name,
"currentPrice",
"oldPrice",
"discountPercent",
"promotionEndDate"
FROM "Product"
WHERE "oldPrice" IS NOT NULL
AND "promotionEndDate" IS NOT NULL
AND "promotionEndDate" < NOW()
ORDER BY "promotionEndDate" DESC
LIMIT 20;
-- Price distribution
SELECT
CASE
WHEN "currentPrice" < 100 THEN '0-100'
WHEN "currentPrice" < 500 THEN '100-500'
WHEN "currentPrice" < 1000 THEN '500-1000'
ELSE '1000+'
END as price_range,
COUNT(*) as count
FROM "Product"
GROUP BY price_range
ORDER BY price_range;
-- Most expensive products
SELECT name, "currentPrice", brand, unit
FROM "Product"
ORDER BY "currentPrice" DESC
LIMIT 20;
-- Cheapest products
SELECT name, "currentPrice", brand, unit
FROM "Product"
WHERE "currentPrice" > 0
ORDER BY "currentPrice" ASC
LIMIT 20;
4. Data Quality Checks
-- Products missing critical fields
SELECT
COUNT(*) FILTER (WHERE name IS NULL OR name = '') as missing_name,
COUNT(*) FILTER (WHERE "categoryId" IS NULL) as missing_category,
COUNT(*) FILTER (WHERE brand IS NULL OR brand = '') as missing_brand,
COUNT(*) FILTER (WHERE "imageUrl" IS NULL OR "imageUrl" = '') as missing_image,
COUNT(*) FILTER (WHERE url IS NULL OR url = '') as missing_url,
COUNT(*) as total_products
FROM "Product";
-- Duplicate products check (same externalId for different stores)
SELECT "externalId", COUNT(*) as count
FROM "Product"
GROUP BY "externalId"
HAVING COUNT(*) > 1;
-- Products with strange prices (0 or negative)
SELECT name, "currentPrice", "oldPrice"
FROM "Product"
WHERE "currentPrice" <= 0 OR ("oldPrice" IS NOT NULL AND "oldPrice" <= 0)
LIMIT 20;
-- Products with impossible discounts
SELECT name, "currentPrice", "oldPrice", "discountPercent"
FROM "Product"
WHERE "discountPercent" < 0 OR "discountPercent" > 100
LIMIT 20;
5. Rating Analysis
-- Top rated products
SELECT
name,
rating,
"scoresCount",
"commentsCount",
brand
FROM "Product"
WHERE rating IS NOT NULL
ORDER BY rating DESC, "scoresCount" DESC
LIMIT 20;
-- Most reviewed products
SELECT
name,
rating,
"scoresCount",
"commentsCount",
brand
FROM "Product"
WHERE "commentsCount" IS NOT NULL
ORDER BY "commentsCount" DESC
LIMIT 20;
-- Products without ratings
SELECT COUNT(*) FROM "Product" WHERE rating IS NULL;
6. Brand Analysis
-- Top brands by product count
SELECT brand, COUNT(*) as product_count
FROM "Product"
WHERE brand IS NOT NULL AND brand != ''
GROUP BY brand
ORDER BY product_count DESC
LIMIT 20;
-- Average price by brand (for brands with 10+ products)
SELECT
brand,
COUNT(*) as product_count,
AVG("currentPrice") as avg_price,
MIN("currentPrice") as min_price,
MAX("currentPrice") as max_price
FROM "Product"
WHERE brand IS NOT NULL AND brand != ''
GROUP BY brand
HAVING COUNT(*) >= 10
ORDER BY product_count DESC
LIMIT 20;
7. Health Check Queries
-- Table sizes
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
-- Index usage
SELECT
schemaname,
tablename,
indexname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;
-- Table row counts
SELECT
'Store' as table_name,
COUNT(*) as row_count
FROM "Store"
UNION ALL
SELECT 'Category', COUNT(*) FROM "Category"
UNION ALL
SELECT 'Product', COUNT(*) FROM "Product"
UNION ALL
SELECT 'ScrapingSession', COUNT(*) FROM "ScrapingSession";
MCP Tools Reference
Postgres MCP provides these tools that the AI can use:
| Tool | Description |
|---|---|
list_schemas |
Lists all database schemas |
list_objects |
Lists tables, views, sequences in a schema |
get_object_details |
Gets table/column details |
execute_sql |
Executes SQL queries |
explain_query |
Shows query execution plan |
get_top_queries |
Reports slowest queries |
analyze_workload_indexes |
Recommends indexes for workload |
analyze_db_health |
Performs comprehensive health checks |
Example Workflow
Here's a typical workflow for testing scraping results:
-
Start the database:
docker-compose up -d postgres -
Run the scraper:
pnpm dev -
Ask the AI to verify:
- "Check the database health"
- "How many products were scraped?"
- "Are there any products without categories?"
- "Show me the top 20 products by discount"
- "Find any data quality issues"
-
Analyze performance:
- "Are there any slow queries?"
- "Should I add any indexes?"