- Implement `extract.ts` with robust character-aware parsing for snippets and tools.
- Consolidate research dependencies by moving `@ax-llm/ax` to root `optionalDependencies`.
- Relocate evaluation logic from `packages/core` to `scripts/optimization/lib/evals` to keep the production core lean.
- Add `optimization_targets` to `data/manifest.json` as the single source of truth for the pipeline.
- Implement comprehensive unit tests for extraction and variable masking with 100% pass rate.
- Update global config and linting rules to support the new optimization infrastructure.
Established a Pareto-ready evaluation foundation for the Genetic-Pareto (GEPA)
optimizer, supporting simultaneous optimization of accuracy and density.
Key improvements:
- Core Architecture: Defined standardized `MetricResult` and `OptimizationDirection`
types in `packages/core/src/evals/types.ts` to support multi-objective fitness.
- Centralized Config: Implemented `packages/core/src/evals/config.ts` with tunable
weights and detailed documentation for scoring gradients.
- Tool Alignment Metric: Created `metrics/toolAlignment.ts` to measure functional
accuracy, argument precision, and explicit shell avoidance.
- Token Frugality Metric: Created `metrics/tokenFrugality.ts` to measure and
penalize conversational noise ("chatter") using a configurable threshold.
- Verification Suite: Added comprehensive unit tests for all metrics, achieving
100% coverage of scoring logic and gradient steps.
- Project Integration: Relocated `schema.ts` to the core package for build safety,
updated the data validator, and extended project-wide lint/format scripts.
Established the "Heart" of the Prompt Optimization Pipeline by building a robust,
extensible data infrastructure and a high-fidelity golden dataset.
Key improvements:
- Core Schema: Defined the `Scenario` interface in `data/schema.ts` supporting
multiple negative failure modes, platform-specific shell contexts (Unix/Win32),
and strict tool-call typing.
- Optimization Manifest: Created `data/manifest.json` to define "No-Fly Zones"
for the optimizer, protecting literal tool names and template variables, while
providing descriptive context for validation.
- Tool Alignment Dataset: Authored 113 scenarios in `data/tool_alignment.jsonl`
across 20 tools, focusing on "Built-in over Shell" preference. Heavily weighted
`replace` (12) and `write_file` (10) to enforce surgical editing.
- Extensible Validator: Implemented `scripts/validate-data.ts` to provide
real-time integrity checks and purpose-driven coverage reports.
- Project Integration: Added `data:validate`, `data:format`, and `data:lint`
scripts to package.json and updated ESLint config to cover the data directory.