---
name: "AgentDB Learning Plugins"
description: "Create and train AI learning plugins with AgentDB's 9 reinforcement learning algorithms. Includes Decision Transformer, Q-Learning, SARSA, Actor-Critic, and more. Use when building self-learning agents, implementing RL, or optimizing agent behavior through experience."
---

# AgentDB Learning Plugins

## What This Skill Does

Provides access to 9 reinforcement learning algorithms via AgentDB's plugin system. Create, train, and deploy learning plugins for autonomous agents that improve through experience. Includes offline RL (Decision Transformer), value-based learning (Q-Learning), policy gradients (Actor-Critic), and advanced techniques such as curriculum, federated, and multi-task learning.

**Performance**: Train models 10-100x faster with WASM-accelerated neural inference.

## Prerequisites

- Node.js 18+
- AgentDB v1.0.7+ (via agentic-flow)
- Basic understanding of reinforcement learning (recommended)

---

## Quick Start with CLI

### Create Learning Plugin

```bash
# Interactive wizard
npx agentdb@latest create-plugin

# Use a specific template
npx agentdb@latest create-plugin -t decision-transformer -n my-agent

# Preview without creating
npx agentdb@latest create-plugin -t q-learning --dry-run

# Custom output directory
npx agentdb@latest create-plugin -t actor-critic -o ./plugins
```

### List Available Templates

```bash
# Show all plugin templates
npx agentdb@latest list-templates

# Available templates:
# - decision-transformer (sequence modeling RL - recommended)
# - q-learning (value-based learning)
# - sarsa (on-policy TD learning)
# - actor-critic (policy gradient with baseline)
# - curiosity-driven (exploration-based)
```

### Manage Plugins

```bash
# List installed plugins
npx agentdb@latest list-plugins

# Get plugin information
npx agentdb@latest plugin-info my-agent

# Shows: algorithm, configuration, training status
```

---

## Quick Start with API

```typescript
import { createAgentDBAdapter } from 'agentic-flow/reasoningbank';

// Initialize with learning enabled
const adapter = await createAgentDBAdapter({
  dbPath: '.agentdb/learning.db',
  enableLearning: true,  // Enable learning plugins
  enableReasoning: true,
  cacheSize: 1000,
});

// Store training experience
await adapter.insertPattern({
  id: '',
  type: 'experience',
  domain: 'game-playing',
  pattern_data: JSON.stringify({
    embedding: await computeEmbedding('state-action-reward'),
    pattern: {
      state: [0.1, 0.2, 0.3],
      action: 2,
      reward: 1.0,
      next_state: [0.15, 0.25, 0.35],
      done: false
    }
  }),
  confidence: 0.9,
  usage_count: 1,
  success_count: 1,
  created_at: Date.now(),
  last_used: Date.now(),
});

// Train learning model
const metrics = await adapter.train({
  epochs: 50,
  batchSize: 32,
});

console.log('Training Loss:', metrics.loss);
console.log('Duration:', metrics.duration, 'ms');
```
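
The examples in this skill call `computeEmbedding`, which is not part of the adapter API shown here. A minimal stand-in so the snippets run end-to-end (a hypothetical helper, not AgentDB's embedding pipeline; swap in a real embedding model in practice):

```typescript
// Hypothetical placeholder: maps text to a fixed-size vector so the
// examples above and below are self-contained.
async function computeEmbedding(text: string, dim = 128): Promise<number[]> {
  const vec = new Array(dim).fill(0);
  for (let i = 0; i < text.length; i++) {
    vec[i % dim] += text.charCodeAt(i) / 255;
  }
  // L2-normalize so cosine similarity behaves sensibly
  const norm = Math.sqrt(vec.reduce((s, x) => s + x * x, 0)) || 1;
  return vec.map((x) => x / norm);
}
```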

---

## Available Learning Algorithms (9 Total)

### 1. Decision Transformer (Recommended)

**Type**: Offline Reinforcement Learning
**Best For**: Learning from logged experiences, imitation learning
**Strengths**: No online interaction needed, stable training

```bash
npx agentdb@latest create-plugin -t decision-transformer -n dt-agent
```

**Use Cases**:
- Learn from historical data
- Imitation learning from expert demonstrations
- Safe learning without environment interaction
- Sequence modeling tasks

**Configuration**:
```json
{
  "algorithm": "decision-transformer",
  "model_size": "base",
  "context_length": 20,
  "embed_dim": 128,
  "n_heads": 8,
  "n_layers": 6
}
```
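
To make the configuration concrete: a Decision Transformer frames RL as sequence modeling, treating a trajectory as (return-to-go, state, action) steps and predicting the next action. A conceptual sketch of how `context_length` bounds the window the model sees (illustrative only; the plugin's internals may differ):

```typescript
// Illustrative only: context_length = 20 means the model attends to at
// most the last 20 (return-to-go, state, action) steps.
interface DTStep { returnToGo: number; state: number[]; action: number }

function buildContext(trajectory: DTStep[], contextLength = 20): DTStep[] {
  return trajectory.slice(-contextLength);  // keep the most recent window
}

// At inference time you condition on a target return: set returnToGo high
// to ask the model for actions consistent with high-reward trajectories.
const trajectory: DTStep[] = [
  { returnToGo: 10.0, state: [0.1, 0.2], action: 1 },
  { returnToGo: 9.0, state: [0.15, 0.25], action: 2 },
];
const context = buildContext(trajectory);  // fed to the transformer
```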

### 2. Q-Learning

**Type**: Value-Based RL (Off-Policy)
**Best For**: Discrete action spaces, sample efficiency
**Strengths**: Proven, simple, works well for small/medium problems

```bash
npx agentdb@latest create-plugin -t q-learning -n q-agent
```

**Use Cases**:
- Grid worlds, board games
- Navigation tasks
- Resource allocation
- Discrete decision-making

**Configuration**:
```json
{
  "algorithm": "q-learning",
  "learning_rate": 0.001,
  "gamma": 0.99,
  "epsilon": 0.1,
  "epsilon_decay": 0.995
}
```
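
These keys map directly onto the classic tabular update. A minimal sketch of what they control (textbook Q-Learning, shown to explain the configuration, not taken from the plugin's source):

```typescript
// Standard tabular Q-Learning with epsilon-greedy exploration.
const learningRate = 0.001, gamma = 0.99, epsilonDecay = 0.995;
let epsilon = 0.1;

function selectAction(q: number[][], state: number, nActions: number): number {
  // Explore with probability epsilon, otherwise exploit the best known action
  if (Math.random() < epsilon) return Math.floor(Math.random() * nActions);
  return q[state].indexOf(Math.max(...q[state]));
}

function qUpdate(q: number[][], s: number, a: number, r: number, s2: number) {
  // Off-policy target: bootstrap from the best next action, regardless of
  // what the behavior policy actually does next
  const target = r + gamma * Math.max(...q[s2]);
  q[s][a] += learningRate * (target - q[s][a]);
  epsilon *= epsilonDecay;  // decay exploration over time
}
```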

### 3. SARSA

**Type**: Value-Based RL (On-Policy)
**Best For**: Safe exploration, risk-sensitive tasks
**Strengths**: More conservative than Q-Learning, better for safety

```bash
npx agentdb@latest create-plugin -t sarsa -n sarsa-agent
```

**Use Cases**:
- Safety-critical applications
- Risk-sensitive decision-making
- Online learning with exploration

**Configuration**:
```json
{
  "algorithm": "sarsa",
  "learning_rate": 0.001,
  "gamma": 0.99,
  "epsilon": 0.1
}
```
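
The difference from Q-Learning is one line in the update: SARSA bootstraps from the action the policy actually takes next (on-policy), so exploration risk is priced into the values, which is why it behaves more conservatively. For comparison (textbook SARSA, not the plugin's source):

```typescript
// On-policy TD target: uses the next action a2 actually selected by the
// epsilon-greedy policy, not the greedy maximum.
function sarsaUpdate(
  q: number[][], s: number, a: number, r: number, s2: number, a2: number,
  learningRate = 0.001, gamma = 0.99,
) {
  const target = r + gamma * q[s2][a2];  // SARSA: Q(s', a')
  // Q-Learning would instead use: r + gamma * Math.max(...q[s2])
  q[s][a] += learningRate * (target - q[s][a]);
}
```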

### 4. Actor-Critic

**Type**: Policy Gradient with Value Baseline
**Best For**: Continuous actions, variance reduction
**Strengths**: Stable, works for continuous/discrete actions

```bash
npx agentdb@latest create-plugin -t actor-critic -n ac-agent
```

**Use Cases**:
- Continuous control (robotics, simulations)
- Complex action spaces
- Multi-agent coordination

**Configuration**:
```json
{
  "algorithm": "actor-critic",
  "actor_lr": 0.001,
  "critic_lr": 0.002,
  "gamma": 0.99,
  "entropy_coef": 0.01
}
```
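
Two learning rates exist because the actor and critic optimize different objectives: the critic regresses its value estimate toward the TD target, while the actor follows the policy gradient weighted by the advantage, with `entropy_coef` encouraging exploration. Conceptually (an illustrative sketch of the standard losses, not the plugin's implementation):

```typescript
// One conceptual actor-critic step. logProb, value, and entropy would come
// from the policy and value networks; plain numbers here just show how the
// config keys enter the losses.
function actorCriticLosses(
  reward: number, value: number, nextValue: number,
  logProb: number, entropy: number,
  gamma = 0.99, entropyCoef = 0.01,
) {
  const tdTarget = reward + gamma * nextValue;
  const advantage = tdTarget - value;        // how much better than expected
  const criticLoss = advantage ** 2;         // regress value toward target
  const actorLoss = -logProb * advantage     // reinforce good actions
                  - entropyCoef * entropy;   // while keeping exploration
  return { actorLoss, criticLoss };
}
```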

### 5. Active Learning

**Type**: Query-Based Learning
**Best For**: Label-efficient learning, human-in-the-loop
**Strengths**: Minimizes labeling cost, focuses on uncertain samples

**Use Cases**:
- Human feedback incorporation
- Label-efficient training
- Uncertainty sampling (see the sketch after this list)
- Annotation cost reduction
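
For instance, uncertainty sampling picks the examples the current model is least sure about and routes only those to a human labeler. A minimal sketch of the generic technique (not an AgentDB API):

```typescript
// Rank unlabeled items by predictive uncertainty (here: entropy of the
// model's class probabilities) and take the top-k for human annotation.
function entropy(probs: number[]): number {
  return -probs.reduce((s, p) => s + (p > 0 ? p * Math.log(p) : 0), 0);
}

function selectForLabeling<T>(
  items: T[], predict: (item: T) => number[], k: number,
): T[] {
  return [...items]
    .sort((a, b) => entropy(predict(b)) - entropy(predict(a)))
    .slice(0, k);  // most uncertain first
}
```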

### 6. Adversarial Training

**Type**: Robustness Enhancement
**Best For**: Safety, robustness to perturbations
**Strengths**: Improves model robustness, adversarial defense

**Use Cases**:
- Security applications
- Robust decision-making
- Adversarial defense
- Safety testing

### 7. Curriculum Learning

**Type**: Progressive Difficulty Training
**Best For**: Complex tasks, faster convergence
**Strengths**: Stable learning, faster convergence on hard tasks

**Use Cases**:
- Complex multi-stage tasks
- Hard exploration problems
- Skill composition
- Transfer learning

### 8. Federated Learning

**Type**: Distributed Learning
**Best For**: Privacy, distributed data
**Strengths**: Privacy-preserving, scalable

**Use Cases**:
- Multi-agent systems
- Privacy-sensitive data
- Distributed training
- Collaborative learning

### 9. Multi-Task Learning

**Type**: Transfer Learning
**Best For**: Related tasks, knowledge sharing
**Strengths**: Faster learning on new tasks, better generalization

**Use Cases**:
- Task families
- Transfer learning
- Domain adaptation
- Meta-learning

---

## Training Workflow

### 1. Collect Experiences

```typescript
// Store experiences during agent execution
for (let i = 0; i < numEpisodes; i++) {
  const episode = runEpisode();

  for (const step of episode.steps) {
    await adapter.insertPattern({
      id: '',
      type: 'experience',
      domain: 'task-domain',
      pattern_data: JSON.stringify({
        embedding: await computeEmbedding(JSON.stringify(step)),
        pattern: {
          state: step.state,
          action: step.action,
          reward: step.reward,
          next_state: step.next_state,
          done: step.done
        }
      }),
      confidence: step.reward > 0 ? 0.9 : 0.5,
      usage_count: 1,
      success_count: step.reward > 0 ? 1 : 0,
      created_at: Date.now(),
      last_used: Date.now(),
    });
  }
}
```
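
`runEpisode` and `numEpisodes` are left to the surrounding application. A toy stand-in that makes the loop above self-contained (hypothetical shapes, chosen to match the `step` fields used above):

```typescript
// Hypothetical episode generator with a random policy, for illustration.
interface EpisodeStep {
  state: number[]; action: number; reward: number;
  next_state: number[]; done: boolean;
}

const numEpisodes = 10;

function runEpisode(): { steps: EpisodeStep[] } {
  const steps: EpisodeStep[] = [];
  let state = [Math.random(), Math.random()];
  for (let t = 0; t < 50; t++) {
    const action = Math.floor(Math.random() * 4);            // random policy
    const next_state = state.map((x) => x + 0.01 * action);  // toy dynamics
    steps.push({
      state, action, reward: Math.random() - 0.5, next_state, done: t === 49,
    });
    state = next_state;
  }
  return { steps };
}
```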

### 2. Train Model

```typescript
// Train on collected experiences
const trainingMetrics = await adapter.train({
  epochs: 100,
  batchSize: 64,
  learningRate: 0.001,
  validationSplit: 0.2,
});

console.log('Training Metrics:', trainingMetrics);
// {
//   loss: 0.023,
//   valLoss: 0.028,
//   duration: 1523,
//   epochs: 100
// }
```

### 3. Evaluate Performance

```typescript
// Retrieve similar successful experiences
const testQuery = await computeEmbedding(JSON.stringify(testState));
const result = await adapter.retrieveWithReasoning(testQuery, {
  domain: 'task-domain',
  k: 10,
  synthesizeContext: true,
});

// Evaluate action quality
const suggestedAction = result.memories[0].pattern.action;
const confidence = result.memories[0].similarity;

console.log('Suggested Action:', suggestedAction);
console.log('Confidence:', confidence);
```

---

## Advanced Training Techniques

### Experience Replay

```typescript
// Store experiences in a replay buffer
const replayBuffer = [];

// Sample a random batch for training
const batch = sampleRandomBatch(replayBuffer, 32);

// Train on the sampled batch
await adapter.train({
  data: batch,
  epochs: 1,
  batchSize: 32,
});
```
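
`sampleRandomBatch` is not defined by the adapter; a straightforward implementation (uniform sampling without replacement, an assumption on our part) might look like:

```typescript
// Uniformly sample up to `size` distinct experiences from the buffer.
function sampleRandomBatch<T>(buffer: T[], size: number): T[] {
  const shuffled = [...buffer];
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));  // Fisher-Yates shuffle
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  return shuffled.slice(0, Math.min(size, shuffled.length));
}
```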

### Prioritized Experience Replay

```typescript
// Store experiences with priority (TD error)
await adapter.insertPattern({
  // ... standard fields
  confidence: tdError,  // Use TD error as confidence/priority
                        // (assumes tdError is normalized to [0, 1])
  // ...
});

// Retrieve high-priority experiences
const highPriority = await adapter.retrieveWithReasoning(queryEmbedding, {
  domain: 'task-domain',
  k: 32,
  minConfidence: 0.7,  // Only high TD-error experiences
});
```

### Multi-Agent Training

```typescript
// Collect experiences from multiple agents
for (const agent of agents) {
  const experience = await agent.step();

  await adapter.insertPattern({
    // ... store experience with agent ID
    domain: `multi-agent/${agent.id}`,
  });
}

// Train shared model
await adapter.train({
  epochs: 50,
  batchSize: 64,
});
```

---

## Performance Optimization

### Batch Training

```typescript
// Collect a batch of experiences (collectBatch is application-defined)
const experiences = collectBatch(1000);

// Insert the collected experiences (bulk insertion is ~500x faster than
// inserting one experience at a time during collection)
for (const exp of experiences) {
  await adapter.insertPattern({ /* ... */ });
}

// Train on the accumulated batch
await adapter.train({
  epochs: 10,
  batchSize: 128,  // Larger batch for efficiency
});
```

### Incremental Learning

```typescript
// Train incrementally as new data arrives
setInterval(async () => {
  const newExperiences = getNewExperiences();

  if (newExperiences.length > 100) {
    await adapter.train({
      epochs: 5,
      batchSize: 32,
    });
  }
}, 60000);  // Every minute
```

---

## Integration with Reasoning Agents

Combine learning with reasoning for better performance:

```typescript
// Train learning model
await adapter.train({ epochs: 50, batchSize: 32 });

// Use reasoning agents for inference
const result = await adapter.retrieveWithReasoning(queryEmbedding, {
  domain: 'decision-making',
  k: 10,
  useMMR: true,             // Diverse experiences
  synthesizeContext: true,  // Rich context
  optimizeMemory: true,     // Consolidate patterns
});

// Make a decision based on learned experiences + reasoning
const decision = result.context.suggestedAction;
const confidence = result.memories[0].similarity;
```

---

## CLI Operations

```bash
# Create plugin
npx agentdb@latest create-plugin -t decision-transformer -n my-plugin

# List plugins
npx agentdb@latest list-plugins

# Get plugin info
npx agentdb@latest plugin-info my-plugin

# List templates
npx agentdb@latest list-templates
```

---

## Troubleshooting

### Issue: Training not converging

```typescript
// Reduce the learning rate
await adapter.train({
  epochs: 100,
  batchSize: 32,
  learningRate: 0.0001,  // Lower learning rate
});
```

### Issue: Overfitting

```typescript
// Use a validation split
await adapter.train({
  epochs: 50,
  batchSize: 64,
  validationSplit: 0.2,  // 20% validation
});

// Enable memory optimization
await adapter.retrieveWithReasoning(queryEmbedding, {
  optimizeMemory: true,  // Consolidate patterns, reduce overfitting
});
```

### Issue: Slow training

Enable quantization for faster inference; binary quantization runs up to 32x faster.

---

## Learn More

- **Algorithm Papers**: See docs/algorithms/ for detailed papers
- **GitHub**: https://github.com/ruvnet/agentic-flow/tree/main/packages/agentdb
- **MCP Integration**: `npx agentdb@latest mcp`
- **Website**: https://agentdb.ruv.io

---

**Category**: Machine Learning / Reinforcement Learning
**Difficulty**: Intermediate to Advanced
**Estimated Time**: 30-60 minutes