Document Preparation

Organize Your Files

Structure documents logically for better retrieval:
./documentation/
├── getting-started/
│   ├── installation.md
│   └── quickstart.md
├── guides/
│   ├── configuration.md
│   └── deployment.md
└── api/
    ├── authentication.md
    └── endpoints.md
Benefits:
  • Easier to maintain
  • Better source citations
  • Logical grouping improves context

Use Descriptive Filenames

// Good
data: [
  "./user-authentication-guide.md",
  "./api-rate-limiting.md",
  "./troubleshooting-common-errors.md"
]

// Avoid
data: [
  "./doc1.md",
  "./file2.md",
  "./temp.md"
]
Descriptive names appear in source citations and help users verify information.

Clean Your Documents

Remove unnecessary content:
  • Page numbers
  • Headers/footers
  • Navigation elements
  • Duplicate content
Keep content focused:
  • One topic per document
  • Clear sections
  • Consistent formatting
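
A first cleaning pass over extracted text can be automated. The sketch below removes bare page-number lines and lines that repeat often enough to look like running headers or footers; the regex and the repetition threshold are illustrative and should be tuned to your documents:

```typescript
function cleanDocument(text: string): string {
  const lines = text.split("\n");

  // Count how often each non-empty line occurs; a line repeated on
  // many pages is likely a running header or footer.
  const counts = new Map<string, number>();
  for (const line of lines) {
    const key = line.trim();
    if (key) counts.set(key, (counts.get(key) ?? 0) + 1);
  }

  const cleaned = lines.filter(line => {
    const t = line.trim();
    if (/^(page\s+)?\d+$/i.test(t)) return false;      // bare page numbers
    if (t && (counts.get(t) ?? 0) >= 5) return false;  // repeated header/footer
    return true;
  });

  // Collapse runs of blank lines left behind by the removals.
  return cleaned.join("\n").replace(/\n{3,}/g, "\n\n");
}
```

Run it once over each file before ingestion; duplicate-content detection across files is a separate (harder) problem.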

Chunk Size Selection

Choosing the Right Size

The chunk size directly affects answer quality. Sizes fall into three broad ranges: Small Chunks (500-800 characters), Medium Chunks (1000-1500), and Large Chunks (1500-2000).

Small Chunks (500-800)

Best for:
  • FAQ documents
  • Glossaries
  • Quick facts
  • Definition lookups
Example:
chunk: { size: 600, overlap: 100 }
Pros:
  • Precise answers
  • Less noise
  • Good for specific questions
Cons:
  • May miss broader context
  • More chunks = more vectors = higher cost
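
The vector-count cost of a chunk setting can be estimated up front. This sketch assumes a simple sliding window where each step advances size − overlap characters; real chunkers that split on sentence or section boundaries will produce somewhat different counts:

```typescript
// Rough number of chunks (and thus vectors) produced for a corpus.
function estimateChunkCount(totalChars: number, size: number, overlap: number): number {
  const step = size - overlap;
  if (step <= 0) throw new Error("overlap must be smaller than size");
  return Math.max(1, Math.ceil((totalChars - overlap) / step));
}

// A 100k-character corpus:
estimateChunkCount(100_000, 600, 100);   // ~200 chunks
estimateChunkCount(100_000, 2000, 300);  // ~59 chunks
```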

Overlap Guidelines

Set overlap to 10-20% of chunk size:
// Good ratios
chunk: { size: 1000, overlap: 150 }  // 15%
chunk: { size: 1500, overlap: 250 }  // 16%
chunk: { size: 2000, overlap: 300 }  // 15%

// Too little overlap (may lose context)
chunk: { size: 1000, overlap: 50 }   // 5%

// Too much overlap (wasteful)
chunk: { size: 1000, overlap: 500 }  // 50%
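
The good ratios above can be derived mechanically. A small helper (illustrative, not part of the library) that applies a 15% rule and rounds to a tidy number reproduces them:

```typescript
// Derive overlap from chunk size per the 10-20% guideline (15% default),
// rounded to the nearest 50 characters.
function overlapFor(size: number, ratio = 0.15): number {
  return Math.round((size * ratio) / 50) * 50;
}

overlapFor(1000); // 150
overlapFor(1500); // 250
overlapFor(2000); // 300
```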

Retrieval Optimization

topK Configuration

Choose based on your needs:
// Precise, focused answers
limits: { topK: 3 }

// Balanced (default)
limits: { topK: 8 }

// Comprehensive coverage
limits: { topK: 15 }
Guidelines:
  • Start with 8, adjust based on results
  • Increase if answers seem incomplete
  • Decrease for faster responses and lower costs

Score Threshold

Filter low-quality matches:
// No filtering (default) - include all matches
limits: { scoreThreshold: 0 }

// Moderate confidence
limits: { scoreThreshold: 0.6 }

// High confidence only
limits: { scoreThreshold: 0.75 }

// Very strict (may miss relevant content)
limits: { scoreThreshold: 0.9 }
When to use:
  • 0: General knowledge bases, comprehensive coverage
  • 0.6-0.7: Most applications, balanced approach
  • 0.75-0.85: Legal, medical, compliance - high accuracy required
  • 0.9+: Only when extreme precision is critical

Context Size

Balance between context and cost:
// Minimal context (faster, cheaper)
limits: { maxContextChars: 5000 }

// Balanced (default)
limits: { maxContextChars: 8000 }

// Rich context (slower, more expensive)
limits: { maxContextChars: 12000 }
Impact:
  • More context = better answers but higher costs
  • Less context = faster responses but may miss information
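
The cost side of this trade-off is easy to ballpark. This sketch assumes roughly 4 characters per token and uses the gpt-4o-mini input price quoted under Model Selection below; treat both numbers as approximations:

```typescript
// Approximate input cost (USD) of the retrieved context per question.
function contextCostUSD(maxContextChars: number, pricePerMTok = 0.15): number {
  const tokens = maxContextChars / 4; // rough chars-per-token heuristic
  return (tokens / 1_000_000) * pricePerMTok;
}

contextCostUSD(8000);  // ~2000 tokens ≈ $0.0003 per question
contextCostUSD(12000); // ~3000 tokens ≈ $0.00045 per question
```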

Model Selection

Choose the Right Model

gpt-4o-mini

Best for:
  • High-volume applications
  • Simple Q&A
  • Cost-sensitive projects
  • Fast responses needed
Cost: ~$0.15/1M input tokens
ai: {
  provider: "openai",
  model: "gpt-4o-mini"
}

gpt-4o

Best for:
  • Complex reasoning
  • Legal/medical applications
  • High-accuracy requirements
  • Nuanced questions
Cost: ~$5.00/1M input tokens
ai: {
  provider: "openai",
  model: "gpt-4o"
}

Cost vs. Quality Trade-offs

// Cost-optimized configuration
await createCtxWindow({
  namespace: "budget-docs",
  data: ["./docs"],
  chunk: { size: 2000, overlap: 100 },     // Fewer chunks
  limits: {
    topK: 5,                                 // Fewer retrievals
    maxContextChars: 5000,                   // Less context
    scoreThreshold: 0.6                      // Filter low matches
  },
  ai: { provider: "openai", model: "gpt-4o-mini" }
});

// Quality-optimized configuration
await createCtxWindow({
  namespace: "premium-docs",
  data: ["./docs"],
  chunk: { size: 1500, overlap: 250 },     // Balanced chunks
  limits: {
    topK: 12,                                // More retrievals
    maxContextChars: 12000,                  // Rich context
    scoreThreshold: 0                        // No filtering
  },
  ai: { provider: "openai", model: "gpt-4o" }
});

Performance Optimization

Initialize Early

Create context windows during application startup, not on-demand:
// Good: Initialize once at startup
async function startup() {
  await createCtxWindow({
    namespace: "docs",
    data: ["./documentation"],
    ai: { provider: "openai" },
    vectorStore: { provider: "pinecone" }
  });

  await startServer();
}

// Bad: Creating on every request
app.get("/ask", async (req, res) => {
  // This re-ingests documents every time!
  await createCtxWindow({ /* ... */ });
  // ...
});

Use Registry Pattern

For applications with multiple context windows:
// Good: Create once, use many times
await createCtxWindow({
  namespace: "user-docs",
  data: ["./docs/users"],
  ai: { provider: "openai" },
  vectorStore: { provider: "pinecone" }
});

// Use anywhere
function handleUserQuestion(q: string) {
  const cw = getCtxWindow("user-docs");
  return cw.ask(q);
}

// Bad: Passing instances around
const cw = await createCtxWindow({ /* ... */ });
handleQuestion(cw, q);  // Coupling, harder to maintain

Implement Caching

Cache frequently asked questions:
const cache = new Map<string, { result: AskResult; timestamp: number }>();
const CACHE_TTL = 1000 * 60 * 60; // 1 hour

async function cachedAsk(cw: ContextWindow, question: string) {
  const cached = cache.get(question);

  if (cached && Date.now() - cached.timestamp < CACHE_TTL) {
    return cached.result;
  }

  const result = await cw.ask(question);
  cache.set(question, { result, timestamp: Date.now() });

  return result;
}

Batch Similar Operations

Process multiple questions in parallel:
// Good: Parallel processing
const results = await Promise.all([
  cw.ask("Question 1"),
  cw.ask("Question 2"),
  cw.ask("Question 3")
]);

// Bad: Sequential processing
const result1 = await cw.ask("Question 1");
const result2 = await cw.ask("Question 2");
const result3 = await cw.ask("Question 3");
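
For large batches, Promise.all fires every request at once and can trip provider rate limits. A bounded-concurrency variant preserves result order while capping parallelism (a sketch; the AskResult and ContextWindow interfaces below are minimal stand-ins for the library types, and the default limit of 3 is arbitrary):

```typescript
// Minimal stand-ins for the library types used elsewhere on this page,
// so the helper is self-contained:
interface AskResult { text: string; sources: unknown[] }
interface ContextWindow { ask(question: string): Promise<AskResult> }

async function askAll(
  cw: ContextWindow,
  questions: string[],
  limit = 3
): Promise<AskResult[]> {
  const results: AskResult[] = new Array(questions.length);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed index.
  async function worker() {
    while (next < questions.length) {
      const i = next++;
      results[i] = await cw.ask(questions[i]);
    }
  }

  await Promise.all(
    Array.from({ length: Math.min(limit, questions.length) }, worker)
  );
  return results;
}
```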

Error Handling

Validate Input

function validateQuestion(question: string) {
  if (!question || question.trim().length === 0) {
    throw new Error("Question cannot be empty");
  }

  if (question.length > 500) {
    throw new Error("Question too long (max 500 characters)");
  }

  return question.trim();
}

async function safeAsk(cw: ContextWindow, question: string) {
  try {
    const validated = validateQuestion(question);
    return await cw.ask(validated);
  } catch (error) {
    console.error("Validation error:", error);
    throw error;
  }
}

Handle API Failures

async function resilientAsk(
  cw: ContextWindow,
  question: string,
  maxRetries = 3
) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await cw.ask(question);
    } catch (error) {
      if (attempt === maxRetries - 1) throw error;

      const delay = Math.pow(2, attempt) * 1000;
      console.log(`Retry ${attempt + 1} after ${delay}ms`);
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }

  // The loop always returns or rethrows on the last attempt, but
  // TypeScript's control-flow analysis cannot see that.
  throw new Error("unreachable");
}

Provide Fallbacks

async function askWithFallback(cw: ContextWindow, question: string) {
  try {
    const result = await cw.ask(question);

    if (result.text.includes("I don't know")) {
      return {
        ...result,
        text: "I couldn't find an answer in the documentation. Would you like to contact support?"
      };
    }

    return result;
  } catch (error) {
    return {
      text: "I'm experiencing technical difficulties. Please try again later.",
      sources: []
    };
  }
}

Security Best Practices

Protect API Keys

// Good: Environment variables
const apiKey = process.env.OPENAI_API_KEY;

// Bad: Hardcoded
const apiKey = "sk-...";  // Never do this!

// Good: Validation
if (!process.env.OPENAI_API_KEY) {
  throw new Error("OPENAI_API_KEY not set");
}

Input Sanitization

function sanitizeQuestion(question: string): string {
  // Remove potential injection attempts
  return question
    .replace(/<script>/gi, "")
    .replace(/javascript:/gi, "")
    .trim()
    .slice(0, 500); // Max length
}

Rate Limiting

import rateLimit from "express-rate-limit";

const limiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 100 // limit each IP to 100 requests per windowMs
});

app.use("/api/ask", limiter);

Monitoring & Logging

Track Performance

async function monitoredAsk(cw: ContextWindow, question: string) {
  const startTime = Date.now();

  try {
    const result = await cw.ask(question);
    const duration = Date.now() - startTime;

    console.log({
      type: "success",
      question,
      duration,
      sourceCount: result.sources.length
    });

    return result;
  } catch (error) {
    const duration = Date.now() - startTime;

    console.error({
      type: "error",
      question,
      duration,
      error: error instanceof Error ? error.message : "Unknown"
    });

    throw error;
  }
}

Log Important Events

// Initialization
console.log("Creating context window:", namespace);
await createCtxWindow({ /* ... */ });
console.log("Context window ready:", namespace);

// Queries
console.log("Question received:", question);
const result = await cw.ask(question);
console.log("Answer generated:", {
  sourceCount: result.sources.length,
  hasAnswer: !result.text.includes("I don't know")
});

// Errors
console.error("Failed to answer question:", {
  question,
  error: error.message,
  stack: error.stack
});

Testing Strategies

Unit Tests

describe("Context Window", () => {
  let cw: ContextWindow;

  beforeAll(async () => {
    cw = await createCtxWindow({
      namespace: "test",
      data: ["./test-fixtures"],
      ai: { provider: "openai" },
      vectorStore: { provider: "pinecone" }
    });
  });

  it("should answer known questions", async () => {
    const result = await cw.ask("What is the test topic?");
    expect(result.text).not.toContain("I don't know");
    expect(result.sources.length).toBeGreaterThan(0);
  });

  it("should handle unknown questions", async () => {
    const result = await cw.ask("Completely unrelated question");
    expect(result.text).toContain("I don't know");
  });
});

Integration Tests

describe("API Integration", () => {
  it("should process questions end-to-end", async () => {
    const response = await request(app)
      .post("/api/ask")
      .send({ question: "How do I get started?" })
      .expect(200);

    expect(response.body).toHaveProperty("answer");
    expect(response.body).toHaveProperty("sources");
  });
});

Production Checklist

Before deploying to production:
  • API keys stored securely (environment variables, secrets manager)
  • Error handling implemented for all failure modes
  • Rate limiting configured
  • Input validation in place
  • Logging and monitoring set up
  • Tested with realistic data volumes
  • Costs estimated and budgeted
  • Backup and recovery plan
  • Documentation for maintenance
  • Health check endpoint implemented
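
The last item on the checklist can be a single route. A minimal sketch, assuming the Express app and getCtxWindow registry shown earlier, and assuming getCtxWindow throws when the namespace was never created (the "/health" path and "docs" namespace are conventions, not part of the library):

```typescript
app.get("/health", (_req, res) => {
  try {
    getCtxWindow("docs"); // throws if the window is not initialized
    res.status(200).json({ status: "ok" });
  } catch {
    res.status(503).json({ status: "unavailable" });
  }
});
```

A 503 here lets load balancers hold traffic until startup ingestion has finished.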