- Platform Experiments - Configure the experiment in LangWatch, then trigger it from CI/CD with a single line
- Experiments via SDK - Define the entire experiment in code and run it in CI/CD
| Approach | Best For |
|---|---|
| Platform Experiments | Non-technical team members can modify experiments; configuration lives in LangWatch |
| Experiments via SDK | Version control your experiment config; full flexibility in code |
Option 1: Platform Experiments
Configure your experiment once in the LangWatch UI, then trigger it from CI/CD.
Setup
- Create your experiment in the Experiments via UI:
  - Add your dataset
  - Configure targets (prompts, models, or API endpoints)
  - Select evaluators
  - Run it once to verify it works
- Get your experiment slug from the URL, or click the CI/CD button in the experiment toolbar.
- Run it from CI/CD (see the sketch below):
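Both the Python and TypeScript SDKs can trigger a platform experiment; the snippet below is a minimal Python sketch. The `langwatch.experiments.run` helper and the slug value are placeholders for illustration, so check the SDK reference for the exact call; `print_summary()` is the documented summary method.

```python
# Minimal sketch, not the confirmed SDK API: the trigger helper name is an
# assumption. LANGWATCH_API_KEY is expected in the environment (set it from
# your CI secrets).
import langwatch

# Slug copied from the experiment URL or from the CI/CD button in the toolbar.
results = langwatch.experiments.run("my-experiment-slug")  # hypothetical helper

# Prints a CI-friendly summary and exits non-zero if any evaluation failed.
results.print_summary()
```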
GitHub Actions Example
Options
Option 2: Experiments via SDK
Define your entire experiment in code. This gives you full control and version control over your experiment configuration.
Basic Example
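A minimal Python sketch of an SDK-defined experiment (a TypeScript equivalent exists as well). The `langwatch.experiment.init` / `experiment.evaluate` names and the `exact_match` evaluator are illustrative assumptions; the Experiments via SDK guide documents the real API.

```python
# Illustrative sketch only -- the initializer and evaluate() call below are
# assumptions, not the confirmed LangWatch SDK surface.
import langwatch

def my_pipeline(question: str) -> str:
    """Stand-in for your actual LLM application code."""
    return "Paris" if "France" in question else "4"

dataset = [
    {"input": "What is the capital of France?", "expected_output": "Paris"},
    {"input": "What is 2 + 2?", "expected_output": "4"},
]

experiment = langwatch.experiment.init("ci-regression-suite")  # assumed initializer

for row in dataset:
    answer = my_pipeline(row["input"])
    experiment.evaluate(                      # assumed evaluator call
        "exact_match",                        # example evaluator name
        input=row["input"],
        output=answer,
        expected_output=row["expected_output"],
    )

# Structured CI summary; exits with code 1 if any evaluation failed.
experiment.print_summary()
```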
GitHub Actions Example
`scripts/run_evaluation.py` contains your full experiment code, such as the Basic Example above.
Comparing Multiple Configurations
SDK experiments shine when comparing different configurations:
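As a hedged illustration using the same assumed API as the Basic Example, you can run one dataset through several configurations and let each one report separately:

```python
# Sketch comparing two configurations -- SDK call names remain assumptions.
import langwatch

def run_pipeline(question: str, model: str, temperature: float) -> str:
    """Stand-in for your application, parameterised by the config under test."""
    return f"(answer from {model} at temperature {temperature})"

configs = {
    "baseline-gpt-4o": {"model": "gpt-4o", "temperature": 0.0},
    "candidate-gpt-4o-mini": {"model": "gpt-4o-mini", "temperature": 0.0},
}

dataset = [{"input": "What is 2 + 2?", "expected_output": "4"}]

for name, config in configs.items():
    experiment = langwatch.experiment.init(f"ci-comparison-{name}")  # assumed API
    for row in dataset:
        answer = run_pipeline(row["input"], **config)
        experiment.evaluate(
            "exact_match",
            input=row["input"],
            output=answer,
            expected_output=row["expected_output"],
        )
    # exit_on_failure=False so every configuration reports before CI decides.
    experiment.print_summary(exit_on_failure=False)
```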
Results Summary
Both approaches output a CI-friendly summary. The print_summary() method:
- Outputs results in a structured format
- Returns exit code 1 if any evaluations failed (unless exit_on_failure=False)
- Provides a link to view detailed results in LangWatch
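For example, assuming an `experiment` handle like the ones in the sketches above:

```python
# Default: fail the CI step when any evaluation fails.
experiment.print_summary()

# Report without failing the pipeline, e.g. for informational nightly runs.
experiment.print_summary(exit_on_failure=False)
```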
CI Platform Examples
GitLab CI
- Platform experiment: trigger the experiment configured in LangWatch from a pipeline job
- Via SDK: run your experiment script (e.g. `scripts/run_evaluation.py`) as a pipeline job
CircleCI
- Platform experiment: trigger the experiment from a CircleCI job step
- Via SDK: run your experiment script as a CircleCI job step
Error Handling
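A Python sketch of how a CI step might separate infrastructure errors from evaluation failures; the trigger helper is the same assumed placeholder as above, and a TypeScript version would follow the same pattern.

```python
import sys

import langwatch

try:
    # Hypothetical trigger helper, as in the platform example above.
    results = langwatch.experiments.run("my-experiment-slug")
except Exception as exc:
    # Missing API key, unknown slug, network problems, etc.
    print(f"Could not run the experiment: {exc}", file=sys.stderr)
    sys.exit(2)  # distinguish setup errors from evaluation failures

# Evaluation failures are reported by print_summary(), which exits with code 1.
results.print_summary()
```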
REST API (Platform Experiments)
For custom integrations, you can use the REST API directly.
Start a Run
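The exact endpoint and payload are not reproduced here; the sketch below only shows the general shape using Python's requests library, with an assumed path, auth header, and response field. Use the LangWatch API reference for the real contract.

```python
# The path, auth header name, and response fields below are assumptions.
import os

import requests

BASE_URL = os.environ.get("LANGWATCH_ENDPOINT", "https://app.langwatch.ai")
headers = {"X-Auth-Token": os.environ["LANGWATCH_API_KEY"]}  # header name may differ

response = requests.post(
    f"{BASE_URL}/api/experiments/my-experiment-slug/runs",  # assumed path
    headers=headers,
    timeout=30,
)
response.raise_for_status()
run_id = response.json()["run_id"]  # assumed response field
```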
Poll for Status
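Continuing the sketch above, polling could look like this (the path and status values are assumptions):

```python
import time

# Poll until the run reaches a terminal state.
while True:
    status_response = requests.get(
        f"{BASE_URL}/api/experiments/my-experiment-slug/runs/{run_id}",
        headers=headers,
        timeout=30,
    )
    status_response.raise_for_status()
    status = status_response.json()["status"]  # assumed field and values
    if status in ("completed", "failed"):
        break
    time.sleep(10)

print(f"Run {run_id} finished with status: {status}")
```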
Next Steps
- Experiments via UI: Create experiments in the platform UI
- Experiments via SDK: Full guide to SDK experiments
- Evaluators: Browse available evaluators
- Datasets: Manage your test datasets