Document Preparation

Organize Your Files

Structure documents logically for better retrieval:
./documentation/
├── getting-started/
│   ├── installation.md
│   └── quickstart.md
├── guides/
│   ├── configuration.md
│   └── deployment.md
└── api/
    ├── authentication.md
    └── endpoints.md
Benefits:
  • Easier to maintain
  • Better source citations
  • Logical grouping improves context

Use Descriptive Filenames

// Good
data: [
  "./user-authentication-guide.md",
  "./api-rate-limiting.md",
  "./troubleshooting-common-errors.md"
]

// Avoid
data: [
  "./doc1.md",
  "./file2.md",
  "./temp.md"
]
Descriptive names appear in source citations and help users verify information.

Clean Your Documents

Remove unnecessary content:
  • Page numbers
  • Headers/footers
  • Navigation elements
  • Duplicate content
Keep content focused:
  • One topic per document
  • Clear sections
  • Consistent formatting
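
A first cleaning pass over extracted text can be automated. The sketch below removes bare page-number lines and lines that repeat often enough to look like running headers or footers; the regex and the repetition threshold are illustrative and should be tuned to your documents:

```typescript
function cleanDocument(text: string): string {
  const lines = text.split("\n");

  // Count how often each non-empty line occurs; a line repeated on
  // many pages is likely a running header or footer.
  const counts = new Map<string, number>();
  for (const line of lines) {
    const key = line.trim();
    if (key) counts.set(key, (counts.get(key) ?? 0) + 1);
  }

  const cleaned = lines.filter(line => {
    const t = line.trim();
    if (/^(page\s+)?\d+$/i.test(t)) return false;      // bare page numbers
    if (t && (counts.get(t) ?? 0) >= 5) return false;  // repeated header/footer
    return true;
  });

  // Collapse runs of blank lines left behind by the removals.
  return cleaned.join("\n").replace(/\n{3,}/g, "\n\n");
}
```

Run it once over each file before ingestion; duplicate-content detection across files is a separate (harder) problem.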

Chunk Size Selection

Choosing the Right Size

The chunk size directly affects answer quality. Sizes fall into three broad ranges: Small Chunks (500-800 characters), Medium Chunks (1000-1500), and Large Chunks (1500-2000).

Small Chunks (500-800)

Best for:
  • FAQ documents
  • Glossaries
  • Quick facts
  • Definition lookups
Example:
chunk: { size: 600, overlap: 100 }
Pros:
  • Precise answers
  • Less noise
  • Good for specific questions
Cons:
  • May miss broader context
  • More chunks = more vectors = higher cost
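
The vector-count cost of a chunk setting can be estimated up front. This sketch assumes a simple sliding window where each step advances size − overlap characters; real chunkers that split on sentence or section boundaries will produce somewhat different counts:

```typescript
// Rough number of chunks (and thus vectors) produced for a corpus.
function estimateChunkCount(totalChars: number, size: number, overlap: number): number {
  const step = size - overlap;
  if (step <= 0) throw new Error("overlap must be smaller than size");
  return Math.max(1, Math.ceil((totalChars - overlap) / step));
}

// A 100k-character corpus:
estimateChunkCount(100_000, 600, 100);   // ~200 chunks
estimateChunkCount(100_000, 2000, 300);  // ~59 chunks
```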

Overlap Guidelines

Set overlap to 10-20% of chunk size:
// Good ratios
chunk: { size: 1000, overlap: 150 }  // 15%
chunk: { size: 1500, overlap: 250 }  // 16%
chunk: { size: 2000, overlap: 300 }  // 15%

// Too little overlap (may lose context)
chunk: { size: 1000, overlap: 50 }   // 5%

// Too much overlap (wasteful)
chunk: { size: 1000, overlap: 500 }  // 50%
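
The good ratios above can be derived mechanically. A small helper (illustrative, not part of the library) that applies a 15% rule and rounds to a tidy number reproduces them:

```typescript
// Derive overlap from chunk size per the 10-20% guideline (15% default),
// rounded to the nearest 50 characters.
function overlapFor(size: number, ratio = 0.15): number {
  return Math.round((size * ratio) / 50) * 50;
}

overlapFor(1000); // 150
overlapFor(1500); // 250
overlapFor(2000); // 300
```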

Retrieval Optimization

topK Configuration

Choose based on your needs:
// Precise, focused answers
limits: { topK: 3 }

// Balanced (default)
limits: { topK: 8 }

// Comprehensive coverage
limits: { topK: 15 }
Guidelines:
  • Start with 8, adjust based on results
  • Increase if answers seem incomplete
  • Decrease for faster responses and lower costs

Score Threshold

Filter low-quality matches:
// No filtering (default) - include all matches
limits: { scoreThreshold: 0 }

// Moderate confidence
limits: { scoreThreshold: 0.6 }

// High confidence only
limits: { scoreThreshold: 0.75 }

// Very strict (may miss relevant content)
limits: { scoreThreshold: 0.9 }
When to use:
  • 0: General knowledge bases, comprehensive coverage
  • 0.6-0.7: Most applications, balanced approach
  • 0.75-0.85: Legal, medical, compliance - high accuracy required
  • 0.9+: Only when extreme precision is critical

Context Size

Balance between context and cost:
// Minimal context (faster, cheaper)
limits: { maxContextChars: 5000 }

// Balanced (default)
limits: { maxContextChars: 8000 }

// Rich context (slower, more expensive)
limits: { maxContextChars: 12000 }
Impact:
  • More context = better answers but higher costs
  • Less context = faster responses but may miss information
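
The cost side of this trade-off is easy to ballpark. This sketch assumes roughly 4 characters per token and uses the gpt-4o-mini input price quoted under Model Selection below; treat both numbers as approximations:

```typescript
// Approximate input cost (USD) of the retrieved context per question.
function contextCostUSD(maxContextChars: number, pricePerMTok = 0.15): number {
  const tokens = maxContextChars / 4; // rough chars-per-token heuristic
  return (tokens / 1_000_000) * pricePerMTok;
}

contextCostUSD(8000);  // ~2000 tokens ≈ $0.0003 per question
contextCostUSD(12000); // ~3000 tokens ≈ $0.00045 per question
```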

Model Selection

Choose the Right Model

gpt-4o-mini

Best for:
  • High-volume applications
  • Simple Q&A
  • Cost-sensitive projects
  • Fast responses needed
Cost: ~$0.15/1M input tokens
ai: {
  provider: "openai",
  model: "gpt-4o-mini"
}

gpt-4o

Best for:
  • Complex reasoning
  • Legal/medical applications
  • High-accuracy requirements
  • Nuanced questions
Cost: ~$5.00/1M input tokens
ai: {
  provider: "openai",
  model: "gpt-4o"
}

Cost vs. Quality Trade-offs

// Cost-optimized configuration
await createCtxWindow({
  namespace: "budget-docs",
  data: ["./docs"],
  chunk: { size: 2000, overlap: 100 },     // Fewer chunks
  limits: {
    topK: 5,                                 // Fewer retrievals
    maxContextChars: 5000,                   // Less context
    scoreThreshold: 0.6                      // Filter low matches
  },
  ai: { provider: "openai", model: "gpt-4o-mini" }
});

// Quality-optimized configuration
await createCtxWindow({
  namespace: "premium-docs",
  data: ["./docs"],
  chunk: { size: 1500, overlap: 250 },     // Balanced chunks
  limits: {
    topK: 12,                                // More retrievals
    maxContextChars: 12000,                  // Rich context
    scoreThreshold: 0                        // No filtering
  },
  ai: { provider: "openai", model: "gpt-4o" }
});

Performance Optimization

Initialize Early

Create context windows during application startup, not on-demand:
// Good: Initialize once at startup
async function startup() {
  await createCtxWindow({
    namespace: "docs",
    data: ["./documentation"],
    ai: { provider: "openai" },
    vectorStore: { provider: "pinecone" }
  });

  await startServer();
}

// Bad: Creating on every request
app.get("/ask", async (req, res) => {
  // This re-ingests documents every time!
  await createCtxWindow({ /* ... */ });
  // ...
});

Use Registry Pattern

For applications with multiple context windows:
// Good: Create once, use many times
await createCtxWindow({
  namespace: "user-docs",
  data: ["./docs/users"],
  ai: { provider: "openai" },
  vectorStore: { provider: "pinecone" }
});

// Use anywhere
function handleUserQuestion(q: string) {
  const cw = getCtxWindow("user-docs");
  return cw.ask(q);
}

// Bad: Passing instances around
const cw = await createCtxWindow({ /* ... */ });
handleQuestion(cw, q);  // Coupling, harder to maintain

Implement Caching

Cache frequently asked questions:
const cache = new Map<string, { result: AskResult; timestamp: number }>();
const CACHE_TTL = 1000 * 60 * 60; // 1 hour

async function cachedAsk(cw: ContextWindow, question: string) {
  const cached = cache.get(question);

  if (cached && Date.now() - cached.timestamp < CACHE_TTL) {
    return cached.result;
  }

  const result = await cw.ask(question);
  cache.set(question, { result, timestamp: Date.now() });

  return result;
}

Batch Similar Operations

Process multiple questions in parallel:
// Good: Parallel processing
const results = await Promise.all([
  cw.ask("Question 1"),
  cw.ask("Question 2"),
  cw.ask("Question 3")
]);

// Bad: Sequential processing
const result1 = await cw.ask("Question 1");
const result2 = await cw.ask("Question 2");
const result3 = await cw.ask("Question 3");
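
For large batches, Promise.all fires every request at once and can trip provider rate limits. A bounded-concurrency variant preserves result order while capping parallelism (a sketch; the AskResult and ContextWindow interfaces below are minimal stand-ins for the library types, and the default limit of 3 is arbitrary):

```typescript
// Minimal stand-ins for the library types used elsewhere on this page,
// so the helper is self-contained:
interface AskResult { text: string; sources: unknown[] }
interface ContextWindow { ask(question: string): Promise<AskResult> }

async function askAll(
  cw: ContextWindow,
  questions: string[],
  limit = 3
): Promise<AskResult[]> {
  const results: AskResult[] = new Array(questions.length);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed index.
  async function worker() {
    while (next < questions.length) {
      const i = next++;
      results[i] = await cw.ask(questions[i]);
    }
  }

  await Promise.all(
    Array.from({ length: Math.min(limit, questions.length) }, worker)
  );
  return results;
}
```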

Error Handling

Validate Input

function validateQuestion(question: string) {
  if (!question || question.trim().length === 0) {
    throw new Error("Question cannot be empty");
  }

  if (question.length > 500) {
    throw new Error("Question too long (max 500 characters)");
  }

  return question.trim();
}

async function safeAsk(cw: ContextWindow, question: string) {
  try {
    const validated = validateQuestion(question);
    return await cw.ask(validated);
  } catch (error) {
    console.error("Validation error:", error);
    throw error;
  }
}

Handle API Failures

async function resilientAsk(
  cw: ContextWindow,
  question: string,
  maxRetries = 3
) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await cw.ask(question);
    } catch (error) {
      if (attempt === maxRetries - 1) throw error;

      const delay = Math.pow(2, attempt) * 1000;
      console.log(`Retry ${attempt + 1} after ${delay}ms`);
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }

  // The loop always returns or rethrows on the last attempt, but
  // TypeScript's control-flow analysis cannot see that.
  throw new Error("unreachable");
}

Provide Fallbacks

async function askWithFallback(cw: ContextWindow, question: string) {
  try {
    const result = await cw.ask(question);

    if (result.text.includes("I don't know")) {
      return {
        ...result,
        text: "I couldn't find an answer in the documentation. Would you like to contact support?"
      };
    }

    return result;
  } catch (error) {
    return {
      text: "I'm experiencing technical difficulties. Please try again later.",
      sources: []
    };
  }
}

Security Best Practices

Protect API Keys

// Good: Environment variables
const apiKey = process.env.OPENAI_API_KEY;

// Bad: Hardcoded
const apiKey = "sk-...";  // Never do this!

// Good: Validation
if (!process.env.OPENAI_API_KEY) {
  throw new Error("OPENAI_API_KEY not set");
}

Input Sanitization

function sanitizeQuestion(question: string): string {
  // Remove potential injection attempts
  return question
    .replace(/<script>/gi, "")
    .replace(/javascript:/gi, "")
    .trim()
    .slice(0, 500); // Max length
}

Rate Limiting

import rateLimit from "express-rate-limit";

const limiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 100 // limit each IP to 100 requests per windowMs
});

app.use("/api/ask", limiter);

Monitoring & Logging

Track Performance

async function monitoredAsk(cw: ContextWindow, question: string) {
  const startTime = Date.now();

  try {
    const result = await cw.ask(question);
    const duration = Date.now() - startTime;

    console.log({
      type: "success",
      question,
      duration,
      sourceCount: result.sources.length
    });

    return result;
  } catch (error) {
    const duration = Date.now() - startTime;

    console.error({
      type: "error",
      question,
      duration,
      error: error instanceof Error ? error.message : "Unknown"
    });

    throw error;
  }
}

Log Important Events

// Initialization
console.log("Creating context window:", namespace);
await createCtxWindow({ /* ... */ });
console.log("Context window ready:", namespace);

// Queries
console.log("Question received:", question);
const result = await cw.ask(question);
console.log("Answer generated:", {
  sourceCount: result.sources.length,
  hasAnswer: !result.text.includes("I don't know")
});

// Errors
console.error("Failed to answer question:", {
  question,
  error: error.message,
  stack: error.stack
});

Testing Strategies

Unit Tests

describe("Context Window", () => {
  let cw: ContextWindow;

  beforeAll(async () => {
    cw = await createCtxWindow({
      namespace: "test",
      data: ["./test-fixtures"],
      ai: { provider: "openai" },
      vectorStore: { provider: "pinecone" }
    });
  });

  it("should answer known questions", async () => {
    const result = await cw.ask("What is the test topic?");
    expect(result.text).not.toContain("I don't know");
    expect(result.sources.length).toBeGreaterThan(0);
  });

  it("should handle unknown questions", async () => {
    const result = await cw.ask("Completely unrelated question");
    expect(result.text).toContain("I don't know");
  });
});

Integration Tests

describe("API Integration", () => {
  it("should process questions end-to-end", async () => {
    const response = await request(app)
      .post("/api/ask")
      .send({ question: "How do I get started?" })
      .expect(200);

    expect(response.body).toHaveProperty("answer");
    expect(response.body).toHaveProperty("sources");
  });
});

Production Checklist

Before deploying to production:
  • API keys stored securely (environment variables, secrets manager)
  • Error handling implemented for all failure modes
  • Rate limiting configured
  • Input validation in place
  • Logging and monitoring set up
  • Tested with realistic data volumes
  • Costs estimated and budgeted
  • Backup and recovery plan
  • Documentation for maintenance
  • Health check endpoint implemented
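
The last item on the checklist can be a single route. A minimal sketch, assuming the Express app and getCtxWindow registry shown earlier, and assuming getCtxWindow throws when the namespace was never created (the "/health" path and "docs" namespace are conventions, not part of the library):

```typescript
app.get("/health", (_req, res) => {
  try {
    getCtxWindow("docs"); // throws if the window is not initialized
    res.status(200).json({ status: "ok" });
  } catch {
    res.status(503).json({ status: "unavailable" });
  }
});
```

A 503 here lets load balancers hold traffic until startup ingestion has finished.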