feat: enhance Magnit scraper with streaming mode and retry logic

- Add streaming mode for memory-efficient large catalog scraping
- Implement retry logic with exponential backoff
- Add auto session reinitialization on 403 errors
- Add configurable options (pageSize, maxProducts, rateLimitDelay)
- Add maxIterations protection against infinite loops
- Add retry.ts utility module with withRetry and withRetryAndReinit
- Update .env.example with new scraping options
- Add pgAdmin and CloudBeaver to docker-compose
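
For readers without access to the diff of retry.ts, the retry behaviour listed above could look roughly like the sketch below. Only the names `withRetry` and `withRetryAndReinit` come from this commit message. The signatures, the `RetryOptions` shape, the `isForbidden` helper, and the assumption that a 403 surfaces as an error object with a numeric `status` field are all hypothetical, so the real module may differ.

```typescript
// Hypothetical sketch of retry with exponential backoff; not the actual retry.ts.
interface RetryOptions {
  attempts: number;    // total attempts, including the first call
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number;  // upper bound for any single delay
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Delay before retry number `retryIndex` (0-based): base * 2^retryIndex, capped.
function backoffDelay(retryIndex: number, opts: RetryOptions): number {
  return Math.min(opts.baseDelayMs * Math.pow(2, retryIndex), opts.maxDelayMs);
}

// Calls `fn` until it succeeds or the attempts are exhausted,
// then rethrows the last error.
async function withRetry<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const attempts = Math.max(1, opts.attempts);
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < attempts - 1) {
        await sleep(backoffDelay(attempt, opts));
      }
    }
  }
  throw lastError;
}

// Assumption: a 403 response is reported as an error carrying `status === 403`.
function isForbidden(err: unknown): boolean {
  return typeof err === "object" && err !== null &&
    (err as { status?: unknown }).status === 403;
}

// Same as withRetry, but runs `reinit` (e.g. a session refresh) after a 403
// before handing the error back to the retry loop.
async function withRetryAndReinit<T>(
  fn: () => Promise<T>,
  reinit: () => Promise<void>,
  opts: RetryOptions,
): Promise<T> {
  return withRetry(async () => {
    try {
      return await fn();
    } catch (err) {
      if (isForbidden(err)) {
        await reinit();
      }
      throw err;
    }
  }, opts);
}
```

With `baseDelayMs: 300` and `maxDelayMs: 1000`, the waits before the first three retries would be 300 ms, 600 ms, and 1000 ms (the third is capped).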

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-21 22:14:04 +05:00
parent 19c0426cdc
commit 9164527f58
5 changed files with 585 additions and 74 deletions


@@ -6,3 +6,14 @@ MAGNIT_STORE_CODE="992301"
MAGNIT_STORE_TYPE="6"
MAGNIT_CATALOG_TYPE="1"
# Scraping Options
MAGNIT_USE_STREAMING=true # true = streaming mode (recommended), false = legacy
MAGNIT_PAGE_SIZE=50 # API page size (max 50)
MAGNIT_MAX_PRODUCTS= # Product limit (empty = no limit)
MAGNIT_RATE_LIMIT_DELAY=300 # Delay between requests (ms)
MAGNIT_MAX_ITERATIONS=10000 # Guard against infinite loops
MAGNIT_HEADLESS=true # Run the browser in headless mode
# Resilience Options
MAGNIT_RETRY_ATTEMPTS=3 # Number of retry attempts
MAGNIT_REQUEST_TIMEOUT=30000 # Request timeout (ms)
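
One way these variables might be turned into a typed configuration object is sketched below. The variable names come from the `.env.example` hunk above. The `ScraperConfig` shape, the helper names, and the choice to use the example values as fallbacks are assumptions made for illustration, not the scraper's actual loading code.

```typescript
// Hypothetical config loader for the options in .env.example.
interface ScraperConfig {
  useStreaming: boolean;
  pageSize: number;
  maxProducts: number | null; // null = no limit
  rateLimitDelayMs: number;
  maxIterations: number;
  headless: boolean;
  retryAttempts: number;
  requestTimeoutMs: number;
}

type Env = Record<string, string | undefined>;

const isBlank = (v: string | undefined): v is undefined | "" =>
  v === undefined || v.trim() === "";

function parseBool(value: string | undefined, fallback: boolean): boolean {
  if (isBlank(value)) return fallback;
  return (value as string).trim().toLowerCase() === "true";
}

function parseIntOr(value: string | undefined, fallback: number): number {
  if (isBlank(value)) return fallback;
  const n = parseInt(value as string, 10);
  return isNaN(n) ? fallback : n;
}

function loadScraperConfig(env: Env): ScraperConfig {
  const rawMax = env.MAGNIT_MAX_PRODUCTS;
  const parsedMax = isBlank(rawMax) ? NaN : parseInt(rawMax as string, 10);
  return {
    useStreaming: parseBool(env.MAGNIT_USE_STREAMING, true),
    // The example file documents 50 as the maximum page size, so clamp to it.
    pageSize: Math.min(parseIntOr(env.MAGNIT_PAGE_SIZE, 50), 50),
    // Empty or unparsable value means "no limit".
    maxProducts: isNaN(parsedMax) ? null : parsedMax,
    rateLimitDelayMs: parseIntOr(env.MAGNIT_RATE_LIMIT_DELAY, 300),
    maxIterations: parseIntOr(env.MAGNIT_MAX_ITERATIONS, 10000),
    headless: parseBool(env.MAGNIT_HEADLESS, true),
    retryAttempts: parseIntOr(env.MAGNIT_RETRY_ATTEMPTS, 3),
    requestTimeoutMs: parseIntOr(env.MAGNIT_REQUEST_TIMEOUT, 30000),
  };
}
```

In a Node.js process this would typically be called as `loadScraperConfig(process.env)`. Taking the environment as a parameter keeps the function easy to test with plain objects.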