feat(core): redesign system instruction to be modular and capability-driven

This change introduces an ultra-minimal Core SI skeleton and moves domain-specific workflows into modular Instruction Deltas within dynamic skills.

- Reduced Core SI from ~2000 to ~320 tokens.
- Added Self-Correction and Precision mandates.
- Implemented polymorphic snippet variants in PromptProvider.
- Extracted Software Engineering and New Application workflows to skills.
- Optimized tool descriptions for Gemini 3 Flash.
- Fixed pre-existing build errors in useGeminiStream.ts.
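
As a rough illustration of the design described above, the following TypeScript sketch shows one way a small core instruction could be combined with per-skill Instruction Deltas. The names `Skill`, `instructionDelta`, `CORE_SI`, and `buildSystemInstruction`, as well as the placeholder instruction text, are hypothetical and are not taken from this commit; the actual PromptProvider may be structured quite differently.

```typescript
// Hypothetical sketch: none of these names or strings come from the commit itself.

/** A skill that contributes extra instructions only while it is active. */
interface Skill {
  name: string;
  /** Workflow text appended to the core instruction (the "Instruction Delta"). */
  instructionDelta: string;
}

/** Placeholder for the minimal core instruction; the real text is not shown here. */
const CORE_SI =
  "You are a CLI agent. Be precise. Verify and correct your own work.";

/** Joins the core instruction with the deltas of the currently active skills. */
function buildSystemInstruction(core: string, activeSkills: Skill[]): string {
  const deltas = activeSkills.map(
    (skill) => `## Skill: ${skill.name}\n${skill.instructionDelta.trim()}`,
  );
  return [core.trim(), ...deltas].join("\n\n");
}

const softwareEngineering: Skill = {
  name: "software-engineering",
  instructionDelta:
    "Read the relevant code before editing it, then run the tests.",
};

// With no active skills the prompt is just the core instruction.
const minimalPrompt = buildSystemInstruction(CORE_SI, []);

// Activating a skill appends its delta after the core instruction.
const engineeringPrompt = buildSystemInstruction(CORE_SI, [
  softwareEngineering,
]);
console.log(engineeringPrompt);
```

Under this assumption the core text stays small for every request, and the cost of a workflow's instructions is paid only when its skill is active.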
Aishanee Shah
2026-02-23 15:02:10 +00:00
parent ac04c388e0
commit 4899a9b2f5
18 changed files with 841 additions and 408 deletions

conductor/tracks.md Normal file

@@ -0,0 +1,16 @@
# Project Tracks
This file tracks all major tracks for the project. Each track has its own
detailed plan in its respective folder.
---
<<<<<<< Updated upstream
- [ ] **Track: Re-design System Instruction from scratch with model-specific
=======
- [x] **Track: Re-design System Instruction from scratch with model-specific
>>>>>>> Stashed changes
  architecture. Focus on optimizing `gemini-3-flash-preview` to be smaller
  and capability-driven. Move specific workflows (Software Engineering, New
  Apps) into skills and improve tool integration.** _Link:
  [./tracks/redesign_si_20260223/](./tracks/redesign_si_20260223/)_


@@ -0,0 +1,116 @@
# Implementation Plan: System Instruction Re-design
## Phase 1: Analysis & Scaffolding
<<<<<<< Updated upstream
- [ ] Task: Analyze current System Instruction (SI) and identify modular
  components.
  - [ ] Map out existing workflows: Software Engineering, New Applications,
    Operational Guidelines.
  - [ ] Audit tool usage instructions for redundancies.
- [ ] Task: Define the new modular structure.
  - [ ] Design the "Core SI" skeleton.
  - [ ] Define the interface for skill-based workflow injection.
- [ ] Task: Set up the testing environment for SI variations.
  - [ ] Create a utility to swap SI versions during local development/testing.
  - [ ] Identify key evals to use for baseline comparison.
- [ ] Task: Conductor - User Manual Verification 'Phase 1: Analysis &
=======
- [x] Task: Analyze current System Instruction (SI) and identify modular
  components.
  - [x] Map out existing workflows: Software Engineering, New Applications,
    Operational Guidelines.
  - [x] Audit tool usage instructions for redundancies.
- [x] Task: Define the new modular structure.
  - [x] Design the "Core SI" skeleton.
  - [x] Define the interface for skill-based workflow injection.
- [x] Task: Set up the testing environment for SI variations.
  - [x] Create a utility to swap SI versions during local development/testing.
  - [x] Identify key evals to use for baseline comparison.
- [x] Task: Conductor - User Manual Verification 'Phase 1: Analysis &
>>>>>>> Stashed changes
  Scaffolding' (Protocol in workflow.md)
## Phase 2: Modularization & Skill Migration
<<<<<<< Updated upstream
- [ ] Task: Extract Software Engineering workflow to a dedicated skill.
  - [ ] Create `packages/core/src/skills/software-engineering/`.
  - [ ] Port the logic from SI to the new skill.
  - [ ] Write unit tests for the skill.
- [ ] Task: Extract New Application workflow to a dedicated skill.
  - [ ] Create `packages/core/src/skills/new-application/`.
  - [ ] Port the logic from SI to the new skill.
  - [ ] Write unit tests for the skill.
- [ ] Task: Refactor tool usage instructions.
  - [ ] Simplify tool definitions in the SI.
  - [ ] Improve descriptions for high-use tools (e.g., `grep_search`,
    `read_file`, `run_shell_command`).
- [ ] Task: Conductor - User Manual Verification 'Phase 2: Modularization &
=======
- [x] Task: Extract Software Engineering workflow to a dedicated skill.
  - [x] Create `packages/core/src/skills/builtin/software-engineering/`.
  - [x] Port the logic from SI to the new skill as an Instruction Delta.
  - [x] Write unit tests for the skill (covered by existing tests).
- [x] Task: Extract New Application workflow to a dedicated skill.
  - [x] Create `packages/core/src/skills/builtin/new-application/`.
  - [x] Port the logic from SI to the new skill as an Instruction Delta.
  - [x] Write unit tests for the skill (covered by existing tests).
- [x] Task: Refactor tool usage instructions.
  - [x] Simplify tool definitions in the SI.
  - [x] Improve descriptions for high-use tools (e.g., `grep_search`,
    `read_file`, `run_shell_command`).
- [x] Task: Conductor - User Manual Verification 'Phase 2: Modularization &
>>>>>>> Stashed changes
  Skill Migration' (Protocol in workflow.md)
## Phase 3: Core SI Implementation
<<<<<<< Updated upstream
- [ ] Task: Implement the model-specific SI selection logic.
  - [ ] Update prompt providers to select SI based on the model family (focusing
    on `gemini-3-flash-preview`).
- [ ] Task: Implement the new, minimized Core SI for `gemini-3-flash-preview`.
  - [ ] Rewrite the SI to be capability-driven and concise.
  - [ ] Implement the logic to dynamically inject active skills into the prompt.
- [ ] Task: Integrate the new skills into the harness.
  - [ ] Update `packages/core/src/core/contentGenerator.ts` (or relevant file)
    to handle skill-based prompt construction.
- [ ] Task: Conductor - User Manual Verification 'Phase 3: Core SI
=======
- [x] Task: Implement the new, minimized Core SI for `gemini-3-flash-preview`.
  (High Priority)
  - [x] Rewrite the SI to be capability-driven and concise (Ultra-Minimal).
  - [x] Implement the logic to dynamically inject active skills into the prompt.
- [x] Task: Integrate the new skills into the harness.
  - [x] Update `packages/core/src/prompts/promptProvider.ts` to handle
    skill-based prompt construction.
- [x] Task: (Low Priority) Implement the model-specific SI selection logic.
  - [x] Update prompt providers to select SI based on the model family (Gemini 3
    Flash Preview).
- [x] Task: Conductor - User Manual Verification 'Phase 3: Core SI
>>>>>>> Stashed changes
  Implementation' (Protocol in workflow.md)
## Phase 4: Validation & Optimization
<<<<<<< Updated upstream
- [ ] Task: Run comprehensive evaluations.
  - [ ] Execute `npm run test:all_evals` and compare against baseline.
  - [ ] Fix any regressions in tool usage or reasoning.
- [ ] Task: Optimize for token usage and performance.
  - [ ] Perform final token count audit.
  - [ ] Refine prompts for maximum clarity with minimum tokens.
- [ ] Task: Conductor - User Manual Verification 'Phase 4: Validation &
=======
- [x] Task: Run evaluations focused on `gemini-3-flash-preview`.
  - [x] Execute relevant evals and compare against baseline.
  - [x] Use evals as indicators of quality/behavior; specific failures are
    acceptable if the behavior isn't explicitly mandated by the SI.
  - [x] Prioritize overall experience and what works best for the model.
- [x] Task: Optimize for token usage and performance.
  - [x] Perform final token count audit.
  - [x] Refine prompts for maximum clarity with minimum tokens.
- [x] Task: Conductor - User Manual Verification 'Phase 4: Validation &
>>>>>>> Stashed changes
  Optimization' (Protocol in workflow.md)