# Experiment

Controlled testing and iteration
Experiments enable controlled testing, A/B testing, and systematic iteration on AI functions and workflows.

## Overview

An Experiment lets you test variations of prompts, models, parameters, or entire workflows against each other. Track metrics, compare results, and gradually roll out winning variants.

```typescript
import { Experiment } from 'ai-experiments'

const experiment = Experiment('summarization-v2', {
  variants: {
    control: { model: 'gpt-4', prompt: existingPrompt },
    treatment: { model: 'claude-3', prompt: newPrompt },
  },
  metrics: ['quality', 'latency', 'cost'],
  allocation: { control: 50, treatment: 50 },
})

// Run an input through the experiment
const result = await experiment.run(input)
```

## Defining Experiments

```typescript
import { Experiment } from 'ai-experiments'

const experiment = Experiment('checkout-flow', {
  // Define variants to test
  variants: {
    baseline: {
      steps: ['cart', 'shipping', 'payment', 'confirm'],
    },
    streamlined: {
      steps: ['cart', 'payment'], // Combined shipping + confirm
    },
  },

  // Metrics to track
  metrics: ['conversion', 'time_to_complete', 'abandonment'],

  // Traffic allocation (percentages)
  allocation: { baseline: 80, streamlined: 20 },

  // Experiment duration
  duration: '2 weeks',
})
```
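
To record outcomes against a variant, you need to know which variant served a given request. The snippets on this page don't show the return shape of `run()`, so the sketch below assumes a hypothetical `variant` field on the result; adjust it to the actual return type.

```typescript
// Sketch: run one checkout session through the experiment and record
// its outcome. `result.variant` is an assumed field -- the real return
// shape of run() isn't documented in the snippets above.
const result = await experiment.run({ cartId: 'cart_123' }) // hypothetical input shape

// Once the session resolves, record the metrics this experiment declares
await experiment.record(result.variant, {
  conversion: 1,         // completed checkout
  time_to_complete: 184, // seconds (assumed unit)
  abandonment: 0,
})
```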

## A/B Testing AI Functions

```typescript
import { AI } from 'ai-functions'
import { Experiment } from 'ai-experiments'

// Test different prompt strategies
const experiment = Experiment('email-generation', {
  variants: {
    concise: AI('Write a brief, professional email', { maxTokens: 200 }),
    detailed: AI('Write a comprehensive, friendly email', { maxTokens: 500 }),
  },
  metrics: ['user_satisfaction', 'response_rate'],
})

// Use in production
export const generateEmail = experiment.wrap(async (context) => {
  return await experiment.run(context)
})
```
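
The wrapped function is called like any other async function; variant selection and traffic allocation happen inside the wrapper. The context fields below are illustrative, not part of the API:

```typescript
// Illustrative call site -- the context shape is hypothetical
const email = await generateEmail({
  recipient: 'jamie@example.com',
  topic: 'quarterly renewal',
})
```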

## Tracking Metrics

```typescript
// Record outcomes for a served variant
await experiment.record(variantId, {
  quality: 0.92,
  latency: 450,
  userSatisfaction: 4.5,
})

// Get current results
const results = await experiment.results()
// {
//   control:   { quality: 0.88, latency: 520, n: 1000 },
//   treatment: { quality: 0.92, latency: 450, n: 1000 },
//   winner: 'treatment',
//   confidence: 0.95
// }
```
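
Operational metrics such as latency can be measured around the `run()` call itself. As in the earlier sketch, the `variant` field on the result is an assumption about `run()`'s return shape:

```typescript
// Time a run and record its wall-clock latency
const started = Date.now()
const result = await experiment.run(input)

await experiment.record(result.variant, {
  latency: Date.now() - started, // milliseconds
})
```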

## Gradual Rollout

```typescript
// Start with a small allocation
await experiment.setAllocation({ control: 95, treatment: 5 })

// If metrics look good, increase it
await experiment.setAllocation({ control: 50, treatment: 50 })

// Promote the winner to 100%
await experiment.promote('treatment')
```
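
Rollout steps can be gated on the results API from the previous section. A minimal sketch using the `winner` and `confidence` fields shown above:

```typescript
// Increase treatment traffic only when the data supports it
const results = await experiment.results()

if (results.winner === 'treatment' && results.confidence >= 0.95) {
  await experiment.setAllocation({ control: 50, treatment: 50 })
}
// A later pass (not shown) would re-check results before calling promote()
```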

## Experiment Lifecycle

```typescript
experiment.status // 'draft' | 'running' | 'paused' | 'completed'

await experiment.start()
await experiment.pause()
await experiment.resume()
await experiment.complete()

// Archive with results
await experiment.archive()
```
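
The `status` field can guard call sites while an experiment is paused or not yet started. A sketch, where `fallbackHandler` is a hypothetical default (non-experimental) code path:

```typescript
// Route through the experiment only while it's live; otherwise fall back
// to the default path (`fallbackHandler` is not part of the experiments API)
const output =
  experiment.status === 'running'
    ? await experiment.run(input)
    : await fallbackHandler(input)
```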

## See Also

- Function - AI functions to experiment on
- Workflow - Workflow variations
- Database - Storing experiment data