YepAPI

Data Pipelines Agent Skill

ETL patterns, cron scheduling, queue-based processing, and batch jobs with Inngest/BullMQ.

Tags: etl · pipelines · queues · batch

The Skill

Full content, every format. Copy it, download it, or install with one command.

SKILL.md
---
description: ETL patterns, cron scheduling, queue-based processing, and batch jobs with Inngest/BullMQ.
homepage: https://yepapi.com/skills/data-pipelines
metadata:
  tags: [etl, pipelines, queues, batch]
---

# Data Pipelines

## Rules

- Inngest for serverless event-driven pipelines: define functions that trigger on events, built-in retries and scheduling
- BullMQ for self-hosted queues: Redis-backed, supports delayed jobs, rate limiting, priorities, and concurrency control
- Cron scheduling: use Inngest `cron` trigger or `node-cron` — store schedule in config, not hardcoded
- ETL pattern: Extract (fetch from source) -> Transform (validate, clean, reshape) -> Load (write to destination) — each step is a separate function
- Idempotency: every pipeline step must be safe to retry — use unique job IDs and upserts, not inserts
- Batch processing: chunk large datasets (100-1000 items per batch), process in parallel with concurrency limits
- Error handling: dead letter queue for failed jobs — log the payload and error, alert after N failures
- Backpressure: limit concurrent workers — don't let a burst of events overwhelm downstream services
- Observability: log `jobId`, `step`, `duration`, `status` for every pipeline run — track success rate and p95 latency
- Graceful shutdown: handle `SIGTERM` — finish current job before exiting, don't accept new jobs
- Data validation: validate input schema at Extract, validate output schema at Load — fail fast on bad data
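
The batch-processing and backpressure rules above can be sketched framework-agnostically; `chunk` and `mapWithConcurrency` are illustrative helpers, not Inngest or BullMQ APIs:

```typescript
// Split a large dataset into fixed-size batches (the 100-1000 items rule).
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Process items with a concurrency cap so a burst of work cannot
// overwhelm downstream services (the backpressure rule).
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++; // index claimed synchronously, so no two workers share one
      results[i] = await fn(items[i]);
    }
  }
  const workers = Array.from(
    { length: Math.min(limit, items.length) },
    () => worker(),
  );
  await Promise.all(workers);
  return results;
}
```

In a real pipeline, `fn` would be one Load call per batch, and `limit` would be tuned to what the destination can absorb.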

## Avoid

- Long-running HTTP requests as pipeline steps — use queues for anything over 30 seconds
- Processing unbounded datasets without pagination — always paginate source data with cursor or offset
- Storing job payloads in the queue — store a reference ID, fetch data in the worker
- Cron jobs without distributed locks — use Redis locks or Inngest to prevent duplicate execution


Why Use the Data Pipelines Skill?

Without this skill, your AI guesses at data-pipeline patterns. It might hallucinate deprecated APIs, use outdated conventions, or miss best practices entirely. With it, your AI follows a proven ruleset — every suggestion aligns with current standards.

Drop this skill into your project and your AI instantly knows the rules. Better code suggestions, fewer errors, faster shipping.

Try These Prompts

These prompts work better with the Data Pipelines skill installed. Your AI knows the context and writes code that fits.

"Build an ETL pipeline with cron scheduling, queue-based processing, and error recovery"

"Create a data import system that handles CSV uploads, validation, and batch processing"

"Set up an Inngest-based pipeline for transforming and loading data from external APIs"

Data Pipelines skill — FAQ

What does this skill cover?

It covers ETL patterns, cron-based scheduling, queue processing, and batch jobs with Inngest or BullMQ. Your AI builds reliable data pipelines with proper error handling and monitoring.

How do I install it?

Run `npx skills add YepAPI/skills --skill data-pipelines` in your project root. This copies the skill file into your repo where your AI coding tool can read it automatically.

When should I use cron vs. event-driven processing?

Use cron for scheduled batch processing (daily reports, nightly imports). Use event-driven processing with Inngest or BullMQ for real-time work triggered by user actions or webhooks.
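
As a rough sketch, the two styles differ mainly in the trigger that starts the pipeline; the object shapes below are modeled on Inngest's function triggers, and the exact option names should be verified against current docs:

```typescript
// Cron trigger: the scheduler invokes the pipeline on a fixed schedule,
// e.g. a nightly import at 02:00. (Shape modeled on Inngest's
// createFunction triggers; treat the option names as assumptions.)
const nightlyImportTrigger = { cron: "0 2 * * *" };

// Event trigger: the pipeline runs each time this event is sent,
// e.g. from a webhook handler after a user uploads a file.
// "app/csv.uploaded" is a hypothetical event name.
const csvUploadedTrigger = { event: "app/csv.uploaded" };

// Standard 5-field cron expression:
// minute, hour, day-of-month, month, day-of-week.
const cronFields = nightlyImportTrigger.cron.split(" ");
```

Either way, keep the schedule string in config rather than hardcoded, per the rules above.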

Want more skills?

Browse all 110 free skills for builders.
