Commit Graph

2 Commits

Author SHA1 Message Date
Abhijit Balaji 6c94c4d9ca feat(prompt-optimization): implement multi-objective evaluation metrics
Established a Pareto-ready evaluation foundation for the Genetic-Pareto (GEPA)
optimizer, supporting simultaneous optimization of accuracy and density.

Key improvements:
- Core Architecture: Defined standardized `MetricResult` and `OptimizationDirection`
  types in `packages/core/src/evals/types.ts` to support multi-objective fitness.
- Centralized Config: Implemented `packages/core/src/evals/config.ts` with tunable
  weights and detailed documentation for scoring gradients.
- Tool Alignment Metric: Created `metrics/toolAlignment.ts` to measure functional
  accuracy, argument precision, and explicit shell avoidance.
- Token Frugality Metric: Created `metrics/tokenFrugality.ts` to measure and
  penalize conversational noise ("chatter") using a configurable threshold.
- Verification Suite: Added comprehensive unit tests for all metrics, achieving
  100% coverage of scoring logic and gradient steps.
- Project Integration: Relocated `schema.ts` to the core package for build safety,
  updated the data validator, and extended project-wide lint/format scripts.
2026-03-04 10:08:14 -08:00
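
For readers without access to `packages/core/src/evals/types.ts`, the multi-objective idea in the commit above can be sketched as follows. This is a minimal illustration and not the committed code: the fields of `MetricResult`, the linear decay rule in `tokenFrugality`, and the `dominates` helper are all assumptions. The commit only names the two types and says that chatter is penalized using a configurable threshold.

```typescript
// Hypothetical shapes: the commit names these types but does not show their fields.
type OptimizationDirection = "maximize" | "minimize";

interface MetricResult {
  name: string;
  score: number;
  direction: OptimizationDirection;
}

// Assumed scoring rule (not from the commit): full score up to the threshold,
// then a linear decay that reaches 0 at twice the threshold.
function tokenFrugality(chatterTokens: number, threshold: number): MetricResult {
  if (threshold <= 0) {
    throw new Error("threshold must be positive");
  }
  const excess = Math.max(0, chatterTokens - threshold);
  const score = Math.max(0, 1 - excess / threshold);
  return { name: "tokenFrugality", score, direction: "maximize" };
}

// Pareto dominance: `a` dominates `b` when it is no worse on every objective
// and strictly better on at least one. Both arrays must list the same
// metrics in the same order.
function dominates(a: MetricResult[], b: MetricResult[]): boolean {
  if (a.length !== b.length) {
    throw new Error("metric vectors must have the same length");
  }
  let strictlyBetter = false;
  for (let i = 0; i < a.length; i++) {
    const ma = a[i];
    const mb = b[i];
    if (!ma || !mb) {
      throw new Error("missing metric");
    }
    // Flip the sign for "minimize" metrics so that larger is always better.
    const sign = ma.direction === "maximize" ? 1 : -1;
    const diff = sign * (ma.score - mb.score);
    if (diff < 0) return false;
    if (diff > 0) strictlyBetter = true;
  }
  return strictlyBetter;
}
```

Under these assumptions, a candidate prompt that matches another on tool alignment and scores higher on token frugality dominates it, and two candidates with identical scores do not dominate each other.
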
Abhijit Balaji c0b463dbcf feat(prompt-optimization): implement Data Layer MVP and Tool Alignment dataset
Established the core of the Prompt Optimization Pipeline by building a robust,
extensible data infrastructure and a high-fidelity golden dataset.

Key improvements:
- Core Schema: Defined the `Scenario` interface in `data/schema.ts` supporting
  multiple negative failure modes, platform-specific shell contexts (Unix/Win32),
  and strict tool-call typing.
- Optimization Manifest: Created `data/manifest.json` to define "No-Fly Zones"
  for the optimizer, protecting literal tool names and template variables, while
  providing descriptive context for validation.
- Tool Alignment Dataset: Authored 113 scenarios in `data/tool_alignment.jsonl`
  across 20 tools, focusing on "Built-in over Shell" preference. Heavily weighted
  `replace` (12) and `write_file` (10) to enforce surgical editing.
- Extensible Validator: Implemented `scripts/validate-data.ts` to provide
  real-time integrity checks and purpose-driven coverage reports.
- Project Integration: Added `data:validate`, `data:format`, and `data:lint`
  scripts to package.json and updated ESLint config to cover the data directory.
2026-03-04 10:08:13 -08:00
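
The data layer described in the commit above can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of `data/schema.ts` or `scripts/validate-data.ts`: every field name on `Scenario` and `ToolCall`, the `validateJsonl` function, and the per-tool coverage count are hypothetical, chosen only to reflect the features the commit lists (multiple negative failure modes, Unix/Win32 contexts, typed tool calls, and coverage reporting).

```typescript
// Hypothetical field names: the commit describes the `Scenario` interface
// but does not list its members.
type Platform = "unix" | "win32";

interface ToolCall {
  tool: string;
  args: Record<string, unknown>;
}

interface Scenario {
  id: string;
  platform: Platform;
  prompt: string;
  expected: ToolCall;
  negatives: ToolCall[]; // one entry per failure mode
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function isToolCall(v: unknown): v is ToolCall {
  return isRecord(v) && typeof v["tool"] === "string" && isRecord(v["args"]);
}

function isScenario(v: unknown): v is Scenario {
  if (!isRecord(v)) return false;
  const negatives = v["negatives"];
  return (
    typeof v["id"] === "string" &&
    typeof v["prompt"] === "string" &&
    (v["platform"] === "unix" || v["platform"] === "win32") &&
    isToolCall(v["expected"]) &&
    Array.isArray(negatives) &&
    negatives.every(isToolCall)
  );
}

interface ValidationReport {
  errors: string[];
  coverage: Map<string, number>; // scenarios per expected tool
}

// Validates JSONL text one line at a time and counts scenarios per tool.
function validateJsonl(text: string): ValidationReport {
  const errors: string[] = [];
  const coverage = new Map<string, number>();
  text.split("\n").forEach((line, index) => {
    if (line.trim() === "") return; // tolerate blank lines
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      errors.push(`line ${index + 1}: invalid JSON`);
      return;
    }
    if (!isScenario(parsed)) {
      errors.push(`line ${index + 1}: not a valid Scenario`);
      return;
    }
    const tool = parsed.expected.tool;
    coverage.set(tool, (coverage.get(tool) ?? 0) + 1);
  });
  return { errors, coverage };
}
```

A coverage map of this kind is one way to check the weighting the commit mentions, for example that `replace` accounts for 12 scenarios and `write_file` for 10.
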