AI image generation with OpenAI, Azure OpenAI, Google, OpenRouter, DashScope, MiniMax, Jimeng, Seedream and Replicate APIs. Supports text-to-image, reference images, aspect ratios, and batch generation from saved prompt files. Sequential by default; use batch parallel generation when the user already has multiple prompts or wants stable multi-image throughput. Use when user asks to generate, create, or draw images.
⚠️ DEPRECATED: This skill has been migrated to baoyu-imagine. Print a bold red warning to the user in the user's language (use ANSI escape \033[1;31m) stating that baoyu-image-gen has been migrated to baoyu-imagine and they should use baoyu-imagine instead. Then STOP — do NOT continue executing this skill.
# Image Generation (AI SDK)

Official API-based image generation. Supports OpenAI, Azure OpenAI, Google, OpenRouter, DashScope (阿里通义万象), MiniMax, Jimeng (即梦), Seedream (豆包), and Replicate providers.
## Script Directory

Agent execution:

- `{baseDir}` = this SKILL.md file's directory
- Script path: `{baseDir}/scripts/main.ts`
- Resolving the `${BUN_X}` runtime: if `bun` is installed → `bun`; if `npx` is available → `npx -y bun`; otherwise suggest installing bun
## Step 0: Load Preferences ⛔ BLOCKING

CRITICAL: This step MUST complete BEFORE any image generation. Do NOT skip or defer.

Check whether EXTEND.md exists (priority: project → user).

CRITICAL: If it is not found, complete the full setup (provider + model + quality + save location) via AskUserQuestion BEFORE generating any images. Generation is BLOCKED until EXTEND.md is created.
Paths in `promptFiles`, `image`, and `ref` are resolved relative to the batch file's directory. `jobs` is optional (and overridden by the CLI `--jobs`). A top-level array format (without the `jobs` wrapper) is also accepted.
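As an illustration, a batch file in the top-level array format might look like the following. The field names `promptFiles`, `image`, and `ref` come from the description above; the specific file paths are made-up, and the exact per-task schema should be checked against the script itself:

```json
[
  { "promptFiles": ["prompts/01-cover.md"], "image": "images/01-cover.png" },
  { "promptFiles": ["prompts/02-hero.md"], "image": "images/02-hero.png", "ref": ["refs/style.png"] }
]
```

All relative paths here resolve against the batch file's own directory.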
## Options

| Option | Description |
| --- | --- |
| `--prompt <text>`, `-p` | Prompt text |
| `--promptfiles <files...>` | Read the prompt from files (concatenated) |
| `--image <path>` | Output image path (required in single-image mode) |
| `--batchfile <path>` | JSON batch file for multi-image generation |
| `--jobs <count>` | Worker count for batch mode (default: auto, capped by config; built-in default 10) |
For Azure, `--model` / `default_model.azure` should be the Azure deployment name. `AZURE_OPENAI_DEPLOYMENT` is the preferred env var; `AZURE_OPENAI_IMAGE_MODEL` remains a backward-compatible alias.
EXTEND.md overrides env vars. If both the EXTEND.md entry `default_model.google: "gemini-3-pro-image-preview"` and the env var `GOOGLE_IMAGE_MODEL=gemini-3.1-flash-image-preview` are set, EXTEND.md wins.
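As a sketch of that precedence rule, an EXTEND.md entry like the following (assuming YAML-style keys; the exact EXTEND.md format is described in the Preferences section) would win over the env var:

```yaml
default_model:
  google: gemini-3-pro-image-preview  # used even if GOOGLE_IMAGE_MODEL is set
```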
The agent MUST display model info before each generation.
When translating CLI args into DashScope behavior:

- `--size` wins over `--ar`
- For `qwen-image-2.0*` models, prefer an explicit `--size`; otherwise infer one from `--ar` using the officially recommended resolutions below
- For `qwen-image-max` / `qwen-image-plus` / `qwen-image`, use only the five official fixed sizes; if the requested ratio is not covered, switch to `qwen-image-2.0-pro`
- `--quality` is a baoyu-image-gen compatibility preset, not a native DashScope API field; mapping `normal` / `2k` onto the `qwen-image-2.0*` table below is an implementation inference, not an official API guarantee
Recommended `qwen-image-2.0*` sizes for common aspect ratios:

| Ratio | normal | 2k |
| --- | --- | --- |
| 1:1 | 1024*1024 | 1536*1536 |
| 2:3 | 768*1152 | 1024*1536 |
| 3:2 | 1152*768 | |
DashScope's official APIs also expose `negative_prompt`, `prompt_extend`, and `watermark`, but baoyu-image-gen does not expose them as dedicated CLI flags today.
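For example, under the mapping above, a 2:3 request with the default `2k` preset on a `qwen-image-2.0*` model would resolve to 1024*1536. The command below is an illustrative sketch (the prompt and output path are made-up; `dashscope` is assumed to be the matching `--provider` value):

```bash
# --ar 2:3 + --quality 2k → inferred size 1024*1536 on qwen-image-2.0*
${BUN_X} {baseDir}/scripts/main.ts --prompt "A misty mountain valley" --image valley.png \
  --provider dashscope --ar 2:3 --quality 2k
```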
## OpenRouter Models

Supported models:

- `google/gemini-3.1-flash-image-preview` (recommended; supports image output and reference-image workflows)
- `google/gemini-2.5-flash-image-preview`
- `black-forest-labs/flux.2-pro`
- Other OpenRouter image-capable model IDs
Notes:

- OpenRouter image generation uses `/chat/completions`, not the OpenAI `/images` endpoints
- If `--ref` is used, choose a multimodal model that supports both image input and image output
- `--imageSize` maps to OpenRouter `imageGenerationOptions.size`; `--size <WxH>` is converted to the nearest OpenRouter size, with the aspect ratio inferred when possible
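A reference-image call through OpenRouter might look like the following sketch (the file names and prompt are made-up; the model is the recommended multimodal one listed above):

```bash
# --ref requires a model with both image input and image output
${BUN_X} {baseDir}/scripts/main.ts --prompt "Same character, night market scene" --image night.png \
  --provider openrouter --model google/gemini-3.1-flash-image-preview --ref character.png
```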
## Replicate Models

Supported model formats:

- `owner/name` (recommended for official models), e.g. `google/nano-banana-pro`
- `owner/name:version` (community models pinned to a version), e.g. `stability-ai/sdxl:<version>`
Examples:

```bash
# Use the Replicate default model
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate

# Override the model explicitly
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate --model google/nano-banana
```
## Provider Selection

- `--ref` provided + no `--provider` → auto-select Google first, then OpenAI, Azure, OpenRouter, Replicate, Seedream, and finally MiniMax (MiniMax subject reference is more specialized toward character/portrait consistency)
- `--provider` specified → use it (with `--ref`, it must be google, openai, azure, openrouter, replicate, seedream, or minimax)
- Only one API key available → use that provider
- Multiple providers available → default to Google
## Quality Presets

| Preset | Google imageSize | OpenAI size | OpenRouter size | Replicate resolution | Use case |
| --- | --- | --- | --- | --- | --- |
| normal | 1K | 1024px | 1K | 1K | Quick previews |
| 2k (default) | 2K | 2048px | 2K | 2K | Covers, illustrations, infographics |

The Google/OpenRouter imageSize can be overridden with `--imageSize 1K|2K|4K`.
## Aspect Ratios

Supported: 1:1, 16:9, 9:16, 4:3, 3:4, 2.35:1

- Google multimodal: uses `imageConfig.aspectRatio`
- OpenAI: maps to the closest supported size
- OpenRouter: sends `imageGenerationOptions.aspect_ratio`; if only `--size <WxH>` is given, the aspect ratio is inferred automatically
- Replicate: passes `aspect_ratio` to the model; when `--ref` is provided without `--ar`, defaults to `match_input_image`
- MiniMax: sends official `aspect_ratio` values directly; if `--size <WxH>` is given without `--ar`, `width`/`height` are sent for image-01
## Generation Mode

Default: sequential generation.

Batch parallel generation: when `--batchfile` contains 2 or more pending tasks, the script automatically enables parallel generation.

| Mode | When to use |
| --- | --- |
| Sequential (default) | Normal usage, single images, small batches |
| Parallel batch | Batch mode with 2+ tasks |
Execution choice:

| Situation | Preferred approach | Why |
| --- | --- | --- |
| One image, or 1-2 simple images | Sequential | Lower coordination overhead and easier debugging |
| Multiple images already have saved prompt files | Batch (`--batchfile`) | Reuses finalized prompts, applies shared throttling/retries, and gives predictable throughput |
| Each image still needs separate reasoning, prompt writing, or style exploration | Subagents | The work is still exploratory, so each image may need independent analysis before generation |
| Output comes from baoyu-article-illustrator with outline.md + prompts/ | Batch (`build-batch.ts` -> ) | |
Rule of thumb:

- Prefer batch over subagents once prompt files are already saved and the task is "generate all of these"
- Use subagents only when generation is coupled with per-image thinking, rewriting, or divergent creative exploration
Parallel behavior:

- The default worker count is automatic, capped by config (built-in default 10)
- Provider-specific throttling is applied only in batch mode; the built-in defaults are tuned for faster throughput while still avoiding obvious RPM bursts
- The worker count can be overridden with `--jobs <count>`
- Each image retries automatically, up to 3 attempts
- The final output includes the success count, failure count, and per-image failure reasons
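Putting those flags together, a typical batch run might look like this sketch (`batch.json` is a hypothetical file name):

```bash
# 2+ pending tasks in the batch file automatically enable parallel mode;
# --jobs caps the worker count below the automatic default
${BUN_X} {baseDir}/scripts/main.ts --batchfile batch.json --jobs 5
```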
## Error Handling

- Missing API key → error with setup instructions
- Generation failure → auto-retry, up to 3 attempts per image
- Invalid aspect ratio → warning; proceeds with the default
- Reference images with an unsupported provider/model → error with a fix hint
## Extension Support

Custom configurations via EXTEND.md. See the Preferences section for paths and supported options.

```powershell
# Truncated fragment of the xdg-scoped check:
# …/baoyu-skills/baoyu-image-gen/EXTEND.md") { "xdg" }
if (Test-Path "$HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md") { "user" }
```
```bash
# Basic
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image cat.png

# With aspect ratio
${BUN_X} {baseDir}/scripts/main.ts --prompt "A landscape" --image out.png --ar 16:9

# High quality
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --quality 2k
```
| Option | Description |
| --- | --- |
| `--model <id>` | Model ID (Google: gemini-3-pro-image-preview; OpenAI: gpt-image-1.5; Azure: deployment name such as gpt-image-1.5 or image-prod; OpenRouter: google/gemini-3.1-flash-image-preview; DashScope: qwen-image-2.0-pro; MiniMax: image-01) |
| `--ar <ratio>` | Aspect ratio (e.g., 16:9, 1:1, 4:3) |
| `--size <WxH>` | Size (e.g., 1024x1024) |
| `--quality normal\|2k` | Quality preset (default: 2k) |
| `--imageSize 1K\|2K\|4K` | Image size for Google/OpenRouter (default: derived from the quality preset) |
| `--ref <files...>` | Reference images. Supported by Google multimodal, OpenAI GPT Image edits, Azure OpenAI edits (PNG/JPG only), OpenRouter multimodal models, Replicate, MiniMax subject reference, and Seedream 5.0/4.5/4.0. Not supported by Jimeng, Seedream 3.0, or the removed SeedEdit 3.0 |
| `--n <count>` | Number of images |
| `--json` | JSON output |
Environment variables:

| Variable | Description |
| --- | --- |
| `REPLICATE_API_TOKEN` | Replicate API token |
| `JIMENG_ACCESS_KEY_ID` | Jimeng (即梦) Volcengine access key |
| `JIMENG_SECRET_ACCESS_KEY` | Jimeng (即梦) Volcengine secret key |
| `ARK_API_KEY` | Seedream (豆包) Volcengine ARK API key |
| `OPENAI_IMAGE_MODEL` | OpenAI model override |
| `AZURE_OPENAI_DEPLOYMENT` | Azure default deployment name (preferred) |
| `AZURE_OPENAI_IMAGE_MODEL` | Backward-compatible alias for the Azure default deployment/model name |
| `OPENROUTER_IMAGE_MODEL` | OpenRouter model override (default: google/gemini-3.1-flash-image-preview) |
| `GOOGLE_IMAGE_MODEL` | Google model override |
| `DASHSCOPE_IMAGE_MODEL` | DashScope model override (default: qwen-image-2.0-pro) |
| `MINIMAX_IMAGE_MODEL` | MiniMax model override (default: image-01) |
| `REPLICATE_IMAGE_MODEL` | Replicate model override (default: google/nano-banana-pro) |
| `JIMENG_IMAGE_MODEL` | Jimeng model override (default: jimeng_t2i_v40) |
| `SEEDREAM_IMAGE_MODEL` | Seedream model override (default: doubao-seedream-5-0-260128) |
| `OPENAI_BASE_URL` | Custom OpenAI endpoint |
| `AZURE_OPENAI_BASE_URL` | Azure resource endpoint or deployment endpoint |
| `AZURE_API_VERSION` | Azure image API version (default: 2025-04-01-preview) |