# MCP Examples for Supermarket Scraper This document contains example queries and prompts you can use with the Postgres MCP server to test and analyze your scraping results. ## Setup ### 1. Install Docker (if not already installed) - Windows: [Docker Desktop](https://www.docker.com/products/docker-desktop/) - macOS: [Docker Desktop for Mac](https://www.docker.com/products/docker-desktop/) - Linux: `sudo apt-get install docker.io` ### 2. Pull the Postgres MCP image ```bash docker pull crystaldba/postgres-mcp ``` ### 3. Start your database ```bash docker-compose up -d postgres ``` ### 4. Configure Claude Code with MCP Copy the configuration from `.mcp.json.example` and add it to your Claude config: | OS | Config File Location | |----|---------------------| | Windows | `%APPDATA%\Claude\claude_desktop_config.json` | | macOS | `~/Library/Application Support/Claude/claude_desktop_config.json` | | Linux | `~/.config/Claude/claude_desktop_config.json` | Or via VSCode: `Settings` → `MCP` → `Configuration File` --- ## Natural Language Prompts You can ask the AI questions in natural language, and it will use Postgres MCP to query your database: ### Database Overview - "What tables exist in the database?" - "Show me the schema of the Product table" - "What are the relationships between tables?" - "Analyze the database health" ### Scraping Results - "How many products are in the database?" - "Show me products with the highest discounts" - "Find products without categories" - "What is the price distribution of products?" - "Which stores have the most products?" ### Performance - "Are there any slow queries?" - "What indexes should I add to improve performance?" - "Show me the database health report" --- ## SQL Query Examples You can also ask the AI to execute specific SQL queries using the MCP tools. ### 1. Basic Scraping Validation ```sql -- Total products count SELECT COUNT(*) as total_products FROM "Product"; -- Products by store SELECT s.name, COUNT(p.id) as product_count FROM "Store" s LEFT JOIN "Product" p ON s.id = p."storeId" GROUP BY s.id, s.name; -- Latest scraping session SELECT * FROM "ScrapingSession" ORDER BY "startedAt" DESC LIMIT 1; -- All scraping sessions with status SELECT id, "sourceType", status, "startedAt", "finishedAt", CASE WHEN "finishedAt" IS NOT NULL THEN EXTRACT(EPOCH FROM ("finishedAt" - "startedAt")) ELSE NULL END as duration_seconds FROM "ScrapingSession" ORDER BY "startedAt" DESC; ``` ### 2. Category Analysis ```sql -- Products without categories SELECT COUNT(*) FROM "Product" WHERE "categoryId" IS NULL; -- Categories by product count SELECT c.name, COUNT(p.id) as product_count FROM "Category" c LEFT JOIN "Product" p ON p."categoryId" = c.id GROUP BY c.id, c.name ORDER BY product_count DESC NULLS LAST LIMIT 20; -- Category hierarchy with counts SELECT c1.name as category, c2.name as parent_category, COUNT(p.id) as product_count FROM "Category" c1 LEFT JOIN "Category" c2 ON c1."parentId" = c2.id LEFT JOIN "Product" p ON p."categoryId" = c1.id GROUP BY c1.id, c1.name, c2.name ORDER BY product_count DESC; -- Top-level categories (no parent) SELECT c.name, COUNT(p.id) as product_count FROM "Category" c LEFT JOIN "Product" p ON p."categoryId" = c.id WHERE c."parentId" IS NULL GROUP BY c.id, c.name ORDER BY product_count DESC; ``` ### 3. Price and Promotion Analysis ```sql -- Products with active discounts SELECT name, "currentPrice", "oldPrice", "discountPercent", "promotionEndDate" FROM "Product" WHERE "oldPrice" IS NOT NULL AND ("promotionEndDate" IS NULL OR "promotionEndDate" > NOW()) ORDER BY "discountPercent" DESC LIMIT 20; -- Expired promotions SELECT name, "currentPrice", "oldPrice", "discountPercent", "promotionEndDate" FROM "Product" WHERE "oldPrice" IS NOT NULL AND "promotionEndDate" IS NOT NULL AND "promotionEndDate" < NOW() ORDER BY "promotionEndDate" DESC LIMIT 20; -- Price distribution SELECT CASE WHEN "currentPrice" < 100 THEN '0-100' WHEN "currentPrice" < 500 THEN '100-500' WHEN "currentPrice" < 1000 THEN '500-1000' ELSE '1000+' END as price_range, COUNT(*) as count FROM "Product" GROUP BY price_range ORDER BY price_range; -- Most expensive products SELECT name, "currentPrice", brand, unit FROM "Product" ORDER BY "currentPrice" DESC LIMIT 20; -- Cheapest products SELECT name, "currentPrice", brand, unit FROM "Product" WHERE "currentPrice" > 0 ORDER BY "currentPrice" ASC LIMIT 20; ``` ### 4. Data Quality Checks ```sql -- Products missing critical fields SELECT COUNT(*) FILTER (WHERE name IS NULL OR name = '') as missing_name, COUNT(*) FILTER (WHERE "categoryId" IS NULL) as missing_category, COUNT(*) FILTER (WHERE brand IS NULL OR brand = '') as missing_brand, COUNT(*) FILTER (WHERE "imageUrl" IS NULL OR "imageUrl" = '') as missing_image, COUNT(*) FILTER (WHERE url IS NULL OR url = '') as missing_url, COUNT(*) as total_products FROM "Product"; -- Duplicate products check (same externalId for different stores) SELECT "externalId", COUNT(*) as count FROM "Product" GROUP BY "externalId" HAVING COUNT(*) > 1; -- Products with strange prices (0 or negative) SELECT name, "currentPrice", "oldPrice" FROM "Product" WHERE "currentPrice" <= 0 OR ("oldPrice" IS NOT NULL AND "oldPrice" <= 0) LIMIT 20; -- Products with impossible discounts SELECT name, "currentPrice", "oldPrice", "discountPercent" FROM "Product" WHERE "discountPercent" < 0 OR "discountPercent" > 100 LIMIT 20; ``` ### 5. Rating Analysis ```sql -- Top rated products SELECT name, rating, "scoresCount", "commentsCount", brand FROM "Product" WHERE rating IS NOT NULL ORDER BY rating DESC, "scoresCount" DESC LIMIT 20; -- Most reviewed products SELECT name, rating, "scoresCount", "commentsCount", brand FROM "Product" WHERE "commentsCount" IS NOT NULL ORDER BY "commentsCount" DESC LIMIT 20; -- Products without ratings SELECT COUNT(*) FROM "Product" WHERE rating IS NULL; ``` ### 6. Brand Analysis ```sql -- Top brands by product count SELECT brand, COUNT(*) as product_count FROM "Product" WHERE brand IS NOT NULL AND brand != '' GROUP BY brand ORDER BY product_count DESC LIMIT 20; -- Average price by brand (for brands with 10+ products) SELECT brand, COUNT(*) as product_count, AVG("currentPrice") as avg_price, MIN("currentPrice") as min_price, MAX("currentPrice") as max_price FROM "Product" WHERE brand IS NOT NULL AND brand != '' GROUP BY brand HAVING COUNT(*) >= 10 ORDER BY product_count DESC LIMIT 20; ``` ### 7. Health Check Queries ```sql -- Table sizes SELECT schemaname, tablename, pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size FROM pg_tables WHERE schemaname = 'public' ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC; -- Index usage SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read, idx_tup_fetch FROM pg_stat_user_indexes ORDER BY idx_scan DESC; -- Table row counts SELECT 'Store' as table_name, COUNT(*) as row_count FROM "Store" UNION ALL SELECT 'Category', COUNT(*) FROM "Category" UNION ALL SELECT 'Product', COUNT(*) FROM "Product" UNION ALL SELECT 'ScrapingSession', COUNT(*) FROM "ScrapingSession"; ``` --- ## MCP Tools Reference Postgres MCP provides these tools that the AI can use: | Tool | Description | |------|-------------| | `list_schemas` | Lists all database schemas | | `list_objects` | Lists tables, views, sequences in a schema | | `get_object_details` | Gets table/column details | | `execute_sql` | Executes SQL queries | | `explain_query` | Shows query execution plan | | `get_top_queries` | Reports slowest queries | | `analyze_workload_indexes` | Recommends indexes for workload | | `analyze_db_health` | Performs comprehensive health checks | --- ## Example Workflow Here's a typical workflow for testing scraping results: 1. **Start the database**: ```bash docker-compose up -d postgres ``` 2. **Run the scraper**: ```bash pnpm dev ``` 3. **Ask the AI to verify**: - "Check the database health" - "How many products were scraped?" - "Are there any products without categories?" - "Show me the top 20 products by discount" - "Find any data quality issues" 4. **Analyze performance**: - "Are there any slow queries?" - "Should I add any indexes?"