Optimize Exa API performance with search type selection, caching, and parallelization. Use when experiencing slow responses, implementing caching strategies, or optimizing request throughput for Exa integrations. Trigger with phrases like "exa performance", "optimize exa", "exa latency", "exa caching", "exa slow", "exa fast".
Use the skills CLI to install this skill with one command. Auto-detects all installed AI assistants.
Method 1 - skills CLI
npx skills i jeremylongshore/claude-code-plugins-plus-skills/plugins/saas-packs/exa-pack/skills/exa-performance-tuning
Method 2 - openskills (supports sync & update)
npx openskills install jeremylongshore/claude-code-plugins-plus-skills
Auto-detects Claude Code, Cursor, Codex CLI, Gemini CLI, and more. One install, works everywhere.
Optimize Exa search API response times for production workloads. Key levers: search type selection (instant < fast < auto < neural < deep), result count reduction, content scope control, result caching, and parallel query execution.
| Type | Typical Latency | Use Case |
|---|---|---|
| instant | < 150ms | Real-time autocomplete, typeahead |
| fast | p50 < 425ms | Speed-critical user-facing search |
| auto | 300-1500ms | General purpose (default) |
| neural | 500-2000ms | Best semantic quality |
| deep | 2-5s | Maximum coverage, light deep search |
| deep-reasoning | 5-15s | Complex research questions |
import Exa from "exa-js";
const exa = new Exa(process.env.EXA_API_KEY);
function selectSearchType(latencyBudgetMs: number) {
  if (latencyBudgetMs < 200) return "instant";
  if (latencyBudgetMs
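The selector above breaks off in the source after its first threshold. As a rough sketch of how such a function might continue, the version below picks the highest-quality type whose upper latency bound in the table still fits the budget. Only the 200 ms cutoff comes from the original; the other cutoffs, the `ExaSearchType` alias, and the function name are assumptions for illustration.

```typescript
// Search types as listed in the latency table (deep-reasoning left out here).
type ExaSearchType = "instant" | "fast" | "auto" | "neural" | "deep";

// Hypothetical cutoffs: a type is chosen only when the budget covers the upper
// end of its "Typical Latency" range. Tune these against your own measurements.
function selectSearchTypeSketch(latencyBudgetMs: number): ExaSearchType {
  if (latencyBudgetMs < 200) return "instant"; // cutoff from the original snippet
  if (latencyBudgetMs < 1500) return "fast";   // assumed: auto can take up to 1500ms
  if (latencyBudgetMs < 2000) return "auto";   // assumed: neural can take up to 2000ms
  if (latencyBudgetMs < 5000) return "neural"; // assumed: deep can take up to 5s
  return "deep";
}
```

A call such as `exa.search(query, { type: selectSearchTypeSketch(800) })` would then request the `fast` type under these assumed cutoffs.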
// Each content option adds latency. Only request what you need.
// Fastest: metadata only (no content retrieval)
const metadataOnly = await exa.search("query", { numResults: 5 });
// Medium: highlights only (much smaller than full text)
const highlightsOnly = await exa.searchAndContents("query", {
  numResults: 5,
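The content example above is cut off in the source. One way to keep the "only request what you need" rule in a single place is a small helper that builds the options object for each content level. This is a sketch under assumptions: `ContentLevel`, `ContentOptions`, `buildContentOptions`, and the default of 2000 characters are hypothetical, and the `highlights` and `text.maxCharacters` fields are meant to mirror the exa-js content options, which should be checked against the SDK version in use.

```typescript
type ContentLevel = "metadata" | "highlights" | "text";

// Narrowed, illustrative shape; the real SDK accepts more fields.
interface ContentOptions {
  numResults: number;
  highlights?: true;
  text?: { maxCharacters: number };
}

function buildContentOptions(
  level: ContentLevel,
  numResults: number = 5,
  maxCharacters: number = 2000, // assumed cap to bound payload size
): ContentOptions {
  switch (level) {
    case "metadata":
      return { numResults }; // no content retrieval at all
    case "highlights":
      return { numResults, highlights: true }; // short excerpts only
    case "text":
      return { numResults, text: { maxCharacters } }; // capped full text
  }
}
```

The `metadata` level would be passed to `exa.search`, the other two to `exa.searchAndContents`.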
import { LRUCache } from "lru-cache";
const searchCache = new LRUCache<string, any>({
  max: 5000,
  ttl: 2 * 3600 * 1000, // 2-hour TTL
});
async function cachedSearch(query
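The `cachedSearch` function is truncated in the source, so its exact body is unknown. The sketch below shows the general pattern it appears to follow: build a cache key from the query and options, return a fresh entry if one exists, otherwise call the API and store the result. To stay self-contained it uses a plain `Map` with a TTL instead of `lru-cache`, so it has no size-based eviction; `TtlSearchCache`, `SearchFn`, and the injectable clock are hypothetical names introduced for illustration.

```typescript
type SearchFn<T> = (query: string, options: Record<string, unknown>) => Promise<T>;

interface CacheEntry<T> {
  value: T;
  expiresAt: number;
}

class TtlSearchCache<T> {
  hits = 0;
  misses = 0;
  private entries = new Map<string, CacheEntry<T>>();
  private searchFn: SearchFn<T>;
  private ttlMs: number;
  private now: () => number;

  constructor(
    searchFn: SearchFn<T>,
    ttlMs: number = 2 * 3600 * 1000, // 2-hour TTL, as in the original config
    now: () => number = () => Date.now(), // injectable clock for tests
  ) {
    this.searchFn = searchFn;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // The key is sensitive to option key order; keep call sites consistent.
  static keyFor(query: string, options: Record<string, unknown>): string {
    return JSON.stringify([query.trim().toLowerCase(), options]);
  }

  async search(query: string, options: Record<string, unknown> = {}): Promise<T> {
    const key = TtlSearchCache.keyFor(query, options);
    const entry = this.entries.get(key);
    if (entry && entry.expiresAt > this.now()) {
      this.hits++;
      return entry.value; // cache hit: no API call
    }
    this.misses++;
    const value = await this.searchFn(query, options);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}
```

With the real client, the search function could be something like `(q, opts) => exa.search(q, opts)`. For long-running processes, a bounded cache such as `lru-cache` is the safer choice because this sketch never evicts expired keys on its own.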
// Run independent queries concurrently instead of sequentially
async function parallelSearch(queries: string[]) {
  const searches = queries.map(q =>
    cachedSearch(q, { type: "auto", numResults: 3 })
  );
  return Promise.all(searches);
}
// 3 parallel searches: ~600ms total (limited by slowest)
// 3 sequential searches: ~1800ms total
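`Promise.all` starts every request at once, which is fine for three queries but can run into rate limits (429) for larger batches. A minimal sketch of a concurrency-limited variant follows; `mapWithConcurrency` is a hypothetical helper, not part of exa-js, and the right limit depends on your plan's rate limits.

```typescript
// Runs `worker` over `items` with at most `limit` calls in flight,
// returning results in the same order as the input.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each lane pulls the next unclaimed item until none remain.
  async function runLane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => runLane()));
  return results;
}
```

For example, `mapWithConcurrency(queries, 3, q => cachedSearch(q, { type: "auto", numResults: 3 }))` would keep at most three searches in flight. Like `Promise.all`, it rejects as soon as any single call fails.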
// Phase 1: Fast search for URLs only
// Phase 2: Selective content retrieval for top results only
async function twoPhaseSearch(query: string) {
  // Phase 1: metadata only (fast)
  const results = await exa.search(query, { type: "auto", numResults: 10 });
  // Phase 2: get content only for top 3 results
  const topUrls = results.results.
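The two-phase function stops mid-expression in the source. A sketch of how the pattern could be finished is shown below: take the URLs of the top results, then fetch content for those only. The `TwoPhaseClient` interface is a hypothetical, narrowed stand-in modeled on the exa-js `search` and `getContents` methods so the example runs without the SDK; the `topN` parameter and the 2000-character cap are also assumptions.

```typescript
interface SearchHit {
  url: string;
  title?: string | null;
}

// Narrowed, illustrative view of the client; check real signatures in the SDK.
interface TwoPhaseClient {
  search(
    query: string,
    options: { type: string; numResults: number },
  ): Promise<{ results: SearchHit[] }>;
  getContents(
    urls: string[],
    options: { text: { maxCharacters: number } },
  ): Promise<{ results: unknown[] }>;
}

async function twoPhaseSearchSketch(
  client: TwoPhaseClient,
  query: string,
  topN: number = 3,
): Promise<{ candidates: SearchHit[]; contents: unknown[] }> {
  // Phase 1: metadata only (fast)
  const found = await client.search(query, { type: "auto", numResults: 10 });

  // Phase 2: fetch content only for the top N results
  const topUrls = found.results.slice(0, topN).map((hit) => hit.url);
  if (topUrls.length === 0) {
    return { candidates: found.results, contents: [] }; // skip the second call
  }
  const fetched = await client.getContents(topUrls, {
    text: { maxCharacters: 2000 }, // assumed cap
  });
  return { candidates: found.results, contents: fetched.results };
}
```

The saving comes from retrieving content for 3 pages instead of 10, at the cost of a second round trip.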
function normalizeQuery(query: string): string {
  return query
    .toLowerCase()
    .trim()
    .replace(/\s+/g, " ") // collapse whitespace
    .replace(/
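The last `replace` call in the normalizer is cut off in the source, so what it strips is unknown. As an illustration of the idea (mapping trivially different queries to the same cache key), the sketch below keeps the original steps and adds one assumed step that drops trailing punctuation. The function name and that final pattern are hypothetical.

```typescript
function normalizeQuerySketch(query: string): string {
  return query
    .toLowerCase()
    .trim()
    .replace(/\s+/g, " ")       // collapse whitespace (from the original)
    .replace(/[?!.,;:]+$/g, "") // assumed step: drop trailing punctuation
    .trim();
}
```

With this version, `"  What is   RAG?  "` and `"what is rag"` produce the same key. Lowercasing can change results for case-sensitive queries such as acronyms or code identifiers, so it may be safer to use the normalized string for the cache key only and send the original query to the API.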
| Strategy | Latency Savings | Implementation |
|---|---|---|
| instant type | 5-10x faster than neural | One-line change |
| Reduce numResults (10 -> 3) | ~200-500ms saved | One-line change |
| Highlights instead of text | ~100-300ms saved | Replace text with highlights |
| LRU cache | 100% for cache hits | ~20 lines |
| Parallel queries | 2-3x throughput | Promise.all wrapper |
| Two-phase search | ~30-50% for large result sets | ~15 lines |

| Issue | Cause | Solution |
|---|---|---|
| Search taking 3s+ | Neural search on complex query | Switch to fast or auto type |
| Timeout on content | Large pages, slow sources | Set maxCharacters limit |
| Cache miss rate high | Unique queries each time | Normalize queries before caching |
| Rate limit (429) | Too many concurrent searches | Add request queue with concurrency limit |

For cost optimization, see exa-cost-tuning. For reliability, see exa-reliability-patterns.