AGENTS.md
Guidelines for AI coding agents working on this repository.
Project Overview
TypeScript-based scraper for Russian supermarkets (Magnit). Uses Playwright for sessions, Axios for API, PostgreSQL with Drizzle ORM.
Build & Run Commands
Package Manager: Use pnpm (not npm/yarn)
pnpm install # Install dependencies
pnpm exec playwright install chromium # Install browsers (once)
pnpm type-check # Type checking (validation)
pnpm build # Build TypeScript to dist/
pnpm dev # Run main scraper
pnpm enrich # Run product enrichment
pnpm test-db # Test database connection
Drizzle Commands
pnpm db:generate # Generate migration files
pnpm db:migrate # Apply migrations
pnpm db:push # Push schema changes directly (dev only)
pnpm db:studio # Open database GUI
Running Scripts Directly
tsx src/scripts/scrape-magnit-products.ts
MAGNIT_STORE_CODE=992301 tsx src/scripts/scrape-magnit-products.ts
Testing
No test framework is configured. Test manually via pnpm test-db, pnpm dev, and Drizzle Studio (pnpm db:studio).
Code Style
Imports
- External packages first, then internal modules
- Always include the .js extension for local imports (ESM)
- Use named imports from the Drizzle schema
// External packages first
import { chromium, Browser } from 'playwright';
import axios from 'axios';
import { eq, and, asc } from 'drizzle-orm';
// Internal modules second, with the .js extension
import { Logger } from '../../../utils/logger.js';
import { db } from '../../../config/database.js';
import { products, stores, categories } from '../../../db/schema.js';
Naming Conventions
| Type | Convention | Example |
|---|---|---|
| Classes/Interfaces | PascalCase | MagnitApiScraper, CreateProductData |
| Functions/variables | camelCase | scrapeAllProducts, deviceId |
| Constants | UPPER_SNAKE_CASE | ACTUAL_API_PAGE_SIZE |
| Class files | PascalCase | MagnitApiScraper.ts |
| Util files | camelCase | logger.ts, errors.ts |
TypeScript Patterns
- Strict mode - all types explicit
- Interfaces for data, optional props with ?, readonly for constants
export interface MagnitScraperConfig {
  storeCode: string;
  headless?: boolean;
}
Error Handling
Use custom error classes from src/utils/errors.ts:
- ScraperError - scraping failures
- DatabaseError - database operations
- APIError - HTTP/API failures (includes statusCode)
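try {
  // operation
} catch (error) {
  // Russian: "Operation failed:"
  Logger.error('Ошибка операции:', error);
  throw new APIError(
    // Russian: "Failed: <reason>"
    `Не удалось: ${error instanceof Error ? error.message : String(error)}`,
    statusCode // HTTP status of the failed response
  );
}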
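For orientation, the three error classes might look roughly like the sketch below. The contents of src/utils/errors.ts are not shown in this guide, so the constructor signatures are assumptions, and describeError is a hypothetical helper that only factors out the message expression used in the catch block above.

```typescript
// Hypothetical shape of src/utils/errors.ts; the real file may differ.
class ScraperError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'ScraperError';
  }
}

class DatabaseError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'DatabaseError';
  }
}

class APIError extends Error {
  // HTTP status code of the failed request, when one was received.
  readonly statusCode?: number;

  constructor(message: string, statusCode?: number) {
    super(message);
    this.name = 'APIError';
    this.statusCode = statusCode;
  }
}

// Hypothetical helper: turn an unknown caught value into a message string.
function describeError(error: unknown): string {
  return error instanceof Error ? error.message : String(error);
}
```

Keeping statusCode optional lets callers raise an APIError for network failures where no HTTP response exists.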
Logging
Use static Logger class from src/utils/logger.ts:
Logger.info('Message'); // Always shown
Logger.error('Error:', error); // Always shown
Logger.debug('Debug'); // Only when DEBUG=true
Async/Class Patterns
- All async methods return Promise<T> with explicit return types
- Class order: private props -> constructor -> public methods -> private methods
- Lifecycle: initialize() -> operations -> close()
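The class order and lifecycle above can be sketched as follows. ExampleScraper, fetchItems, isReady, and runScraper are hypothetical names used only for illustration; a real scraper would hold a Playwright session rather than a boolean flag.

```typescript
class ExampleScraper {
  // 1. Private properties
  private ready: boolean = false;
  private readonly storeCode: string;

  // 2. Constructor
  constructor(storeCode: string) {
    this.storeCode = storeCode;
  }

  // 3. Public methods
  async initialize(): Promise<void> {
    this.ready = true; // real code would open a browser session here
  }

  async fetchItems(): Promise<string[]> {
    this.assertReady();
    return [`item-for-${this.storeCode}`];
  }

  async close(): Promise<void> {
    this.ready = false; // real code would release the session here
  }

  isReady(): boolean {
    return this.ready;
  }

  // 4. Private methods
  private assertReady(): void {
    if (!this.ready) {
      throw new Error('initialize() must be called first');
    }
  }
}

// initialize() -> operations -> close(), with close() guaranteed by finally.
async function runScraper(scraper: ExampleScraper): Promise<string[]> {
  await scraper.initialize();
  try {
    return await scraper.fetchItems();
  } finally {
    await scraper.close();
  }
}
```

Wrapping the operations in try/finally means close() still runs when an operation throws, so a failed run does not leave a session open.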
Services Pattern
- Services receive db (Drizzle instance) via constructor (DI)
- Use getOrCreate for idempotent operations
- Never call Drizzle directly from scrapers
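A minimal sketch of the service pattern, assuming a store lookup keyed by store code. To stay self-contained, the hypothetical StoreRepository interface and InMemoryStoreRepository stand in for the injected Drizzle instance; StoreService and its method names are likewise illustrative rather than taken from this repository.

```typescript
interface Store {
  id: number;
  code: string;
}

// Stand-in for the injected Drizzle instance (an assumption of this sketch).
interface StoreRepository {
  findByCode(code: string): Promise<Store | undefined>;
  insert(code: string): Promise<Store>;
}

class InMemoryStoreRepository implements StoreRepository {
  private rows: Store[] = [];

  async findByCode(code: string): Promise<Store | undefined> {
    return this.rows.find((row) => row.code === code);
  }

  async insert(code: string): Promise<Store> {
    const store: Store = { id: this.rows.length + 1, code };
    this.rows.push(store);
    return store;
  }
}

class StoreService {
  private readonly repo: StoreRepository;

  // The dependency arrives via the constructor (DI), not via a global import.
  constructor(repo: StoreRepository) {
    this.repo = repo;
  }

  // Idempotent: sequential calls with the same code return the same row.
  async getOrCreate(code: string): Promise<Store> {
    const existing = await this.repo.findByCode(code);
    return existing ?? (await this.repo.insert(code));
  }
}
```

A scraper would call storeService.getOrCreate(...) and never touch the database layer itself. Note that find-then-insert is not safe under concurrent callers; against a real database, a unique constraint would have to back it up.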
Database Patterns
- Upsert via composite unique constraint on (externalId, storeId)
- Batch processing: 50 items per batch
- Prices: Decimal (rubles), stored as decimal type
- Use .select().from().where() for queries
- Use .insert().values() for inserts
- Use .update().set().where() for updates
- Use .delete().where() for deletes
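The batching and upsert rules can be illustrated without a database. In this sketch chunk, upsertAll, ProductRow, and the Map keyed by externalId:storeId are hypothetical stand-ins; in the repository the upsert itself is resolved by PostgreSQL through the composite unique constraint.

```typescript
const BATCH_SIZE = 50;

interface ProductRow {
  externalId: string;
  storeId: number;
  price: string; // decimal rubles kept as a string, e.g. "249.99"
}

// Split items into consecutive batches of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// In-memory stand-in for a table with a unique (externalId, storeId) key.
// Returns the number of batches processed.
function upsertAll(table: Map<string, ProductRow>, rows: ProductRow[]): number {
  let batchCount = 0;
  for (const batch of chunk(rows, BATCH_SIZE)) {
    for (const row of batch) {
      // Same composite key: overwrite (update). New key: insert.
      table.set(`${row.externalId}:${row.storeId}`, row);
    }
    batchCount += 1;
  }
  return batchCount;
}
```

With 120 rows, upsertAll processes three batches (50, 50, 20); writing a row whose key already exists replaces it instead of adding a duplicate.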
Comments
- JSDoc for public methods; inline comments are written in Russian. The example reads "Initialize the session via Playwright".
/** Инициализация сессии через Playwright */
async initialize(): Promise<void> { }
Cursor Rules
Requestly API Tests (.requestly-supermarket/**/*.json)
- Use rq.test() for tests, rq.expect() for assertions
- Access response via rq.response.body (parse as JSON)
- Prices in kopecks (24999 = 249.99 rubles)
See .cursor/rules/requestly-test-rules.mdc for full docs.
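Because API responses carry prices in kopecks while the database stores decimal rubles, a conversion step is needed somewhere in between. A minimal sketch, assuming non-negative integer input; kopecksToRubles is a hypothetical helper, not a function from this repository.

```typescript
// Convert integer kopecks to a decimal ruble string using integer math only,
// which avoids floating-point rounding errors.
function kopecksToRubles(kopecks: number): string {
  if (!Number.isInteger(kopecks) || kopecks < 0) {
    throw new Error(`Expected a non-negative integer, got ${kopecks}`);
  }
  const rubles = Math.floor(kopecks / 100);
  const remainder = kopecks % 100;
  // Left-pad the fractional part to two digits.
  return `${rubles}.${remainder < 10 ? '0' : ''}${remainder}`;
}
```

For example, kopecksToRubles(24999) returns "249.99" and kopecksToRubles(5) returns "0.05". Returning a string rather than a number keeps the value exact when it is handed to a decimal column.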
Environment Variables
DATABASE_URL=postgresql://user:password@localhost:5432/supermarket
MAGNIT_STORE_CODE=992301
DEBUG=true
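A minimal sketch of reading these variables into a typed object. AppEnv and loadEnv are hypothetical names, and treating DATABASE_URL as required and MAGNIT_STORE_CODE as optional is an assumption based on the commands shown earlier. The environment is passed in as a plain record so the function can be checked without touching the real process environment.

```typescript
interface AppEnv {
  readonly databaseUrl: string;
  readonly storeCode?: string;
  readonly debug: boolean;
}

function loadEnv(env: Record<string, string | undefined>): AppEnv {
  const databaseUrl = env['DATABASE_URL'];
  if (!databaseUrl) {
    // Assumption: nothing useful can run without a database connection.
    throw new Error('DATABASE_URL is required');
  }
  return {
    databaseUrl,
    storeCode: env['MAGNIT_STORE_CODE'],
    // Debug output is enabled only for the exact string "true".
    debug: env['DEBUG'] === 'true',
  };
}
```

In a Node.js entry point this would typically be called once as loadEnv(process.env), so a missing variable fails at startup rather than in the middle of a scrape.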