---
name: docs-discovery
description: Load all .md documentation relevant to the user's current coding task BEFORE searching source code or making modifications. Scans `docs/` folders via Glob using topic keywords extracted from the user's message, loads paired main/ISSUES/TODO .md sets as one unit, and respects the no-re-read rule (skips already-loaded files). Use when the user's request mentions any domain concept, class name, file area, or feature behavior — invoke BEFORE the first `code_search`/`get_file`/`Grep` on source files. Typical triggers: any coding/refactoring task, "let's fix X", "review Y", "how does Z work", or any question that would otherwise lead to source-code exploration.
compatibility: Designed for Claude Code and GitHub Copilot (VS). Uses the host agent's Glob/Read tools.
metadata:
author: Fullepi
version: "1.0"
---
# docs-discovery
Ensure relevant `.md` documentation is loaded **before** any source-code search or modification, so the LLM has the documented behaviour, known issues, and planned work in context. Saves many `Grep` / `get_file` rounds — reading a handful of .md files upfront is cheaper than rediscovering information via code search.
## Before you start
This skill READS `.md` files and updates the LLM's `[LOADED_DOCS: ...]` state. It MUST NOT modify any file. Follow the **no-re-read** and **explicit-consent-for-modifications** rules from the active repo's `copilot-instructions.md` (rule numbers may differ per repo — refer to the rule NAMES).
## Step 1 — Extract topic keywords
Parse the user's most recent message (and the wider conversation tail if relevant) for concrete concepts. Examples:
- Class / type names: `AcLoggerBase`, `SegmentBufferReader`, `AcBinaryHubProtocol`, `FruitBankSignalRClient`
- Feature areas: "logger", "log writer", "serializer", "SignalR", "hub protocol", "chunked framing", "connection builder", "options"
- File hints: `Program.cs`, `AcLoggerBase.cs`, `SIGNALR.md`
- Patterns / idioms: "DI factory", "appsettings", "mode negotiation"
Derive **root topic tokens** from these — singular, lowercase, domain-defining words:
- `"AcLoggerBase"` → `logger`, `logging`
- `"SignalR client"` → `signalr`
- `"AcBinarySerializer"` → `binary`, `serializer`
- `"AcBinaryHubProtocol"` → `protocol`, `signalr_binary_protocol`, `binary`
- `"chunked"` → `signalr_binary_protocol`
Keep the set small (usually 1-3 root tokens). If the request genuinely spans multiple domains, include tokens for every domain involved.
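A mechanical first pass over identifiers and file hints can be sketched in Python. This is an illustration only, assuming PascalCase names; `split_identifier`, `candidate_tokens`, and the `NOISE` set are hypothetical helpers, not part of any host tool. The final mapping to root tokens stays a judgment call.

```python
import re

# Hypothetical noise words: "ac" looks like a project prefix (AcLoggerBase),
# "base" a generic suffix. Adjust per repo.
NOISE = {"ac", "base"}

def split_identifier(name: str) -> list[str]:
    """Split a PascalCase identifier or file name into lowercase words."""
    stem = re.sub(r"\.\w+$", "", name)  # drop a file extension such as .cs or .md
    # Order matters: all-caps runs not followed by lowercase (SIGNALR),
    # then capitalized words (Logger), then lowercase runs, then digits.
    words = re.findall(r"[A-Z]+(?![a-z])|[A-Z][a-z]*|[a-z]+|\d+", stem)
    return [w.lower() for w in words]

def candidate_tokens(name: str) -> list[str]:
    """Words from split_identifier with hypothetical noise words removed."""
    return [w for w in split_identifier(name) if w not in NOISE]
```

For `AcLoggerBase` this yields `logger`, and the LLM then adds variants such as `logging`. `SignalR` splits into `signal` + `r`, which is why the output should be normalized by judgment (here to `signalr`) rather than used verbatim.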
## Step 2 — Map tokens to glob patterns (semantic, not hardcoded)
For each root token, synthesize `.md` filename patterns using common conventions:

| Token example | Glob patterns to try |
|---|---|
| `logger`, `log`, `logging` | `**/docs/LOGGING*.md` |
| `binary`, `serializer` | `**/docs/BINARY*.md` |
| `signalr`, `hub` | `**/docs/SIGNALR*.md` |
| `protocol`, `wire`, `chunked` | `**/docs/*PROTOCOL*.md` |
| `grid`, `mggrid` | `**/docs/MGGRID*.md` |
Do NOT require the tokens to match a pre-baked list; construct patterns from the token itself, uppercased (e.g., `logger` → `**/docs/LOGGER*.md` + `**/docs/LOGGING*.md`). Natural-language variants (logger/logging, serialize/serializer, binary/binaries) should all be attempted.
Also consider suffix patterns:
- `**/docs/*{TOKEN}*.md` (substring match)
- `**/docs/*{TOKEN}_ISSUES.md`, `**/docs/*{TOKEN}_TODO.md` (paired docs)
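Pattern synthesis for one token might look like the sketch below. The `VARIANTS` table is a hypothetical stand-in for the natural-language variants the LLM would generate itself; unknown tokens fall back to the token alone, so nothing depends on a pre-baked list. The resulting strings would be passed to the host's Glob tool unchanged.

```python
# Hypothetical variant table; the LLM normally produces these variants itself.
VARIANTS = {
    "logger": ["logger", "logging"],
    "serializer": ["serializer", "serialize"],
    "binary": ["binary", "binaries"],
}

def glob_patterns(token: str) -> list[str]:
    """Build docs/ glob patterns for one root token, uppercased per the naming convention."""
    patterns: list[str] = []
    for form in VARIANTS.get(token, [token]):
        up = form.upper()
        for p in (
            f"**/docs/{up}*.md",         # prefix match, e.g. LOGGING.md
            f"**/docs/*{up}*.md",        # substring match
            f"**/docs/*{up}_ISSUES.md",  # paired known-issues doc
            f"**/docs/*{up}_TODO.md",    # paired planned-work doc
        ):
            if p not in patterns:        # keep order, skip duplicates
                patterns.append(p)
    return patterns
```

The substring pattern already covers the `_ISSUES`/`_TODO` names; they are emitted separately only to keep the paired-docs intent visible.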
## Step 3 — Execute the Glob and dedupe against already-loaded docs
Run each glob pattern via the host agent's Glob tool. Collect all matching absolute paths.
**Dedupe against `[LOADED_DOCS: ...]` prefix:**
- If a match is already in LOADED_DOCS → skip it (**no-re-read** rule)
- If a match is under `bin/`, `obj/`, `node_modules/`, `Test_Benchmark_Results/`, or a worktree-backup path → skip it (not framework docs)
If the total match count exceeds 10, narrow the glob pattern (e.g., require the domain token at the start of the filename rather than anywhere in it). LLM context is finite.
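The Step 3 filters can be sketched as a pure function over the glob results. Assumptions: paths use `/` separators, `loaded` holds the basenames already listed in `[LOADED_DOCS: ...]`, and the worktree-backup exclusion is omitted because its path is repo-specific. `filter_matches`, `EXCLUDED_DIRS`, and `MAX_DOCS` are hypothetical names.

```python
from pathlib import PurePosixPath

# Path segments whose .md files are not framework docs (from the skip list above).
EXCLUDED_DIRS = {"bin", "obj", "node_modules", "Test_Benchmark_Results"}
MAX_DOCS = 10

def filter_matches(matches: list[str], loaded: set[str]) -> list[str]:
    """Drop already-loaded, excluded, and duplicate paths; fail if too many remain."""
    kept: list[str] = []
    seen: set[str] = set()
    for path in matches:
        p = PurePosixPath(path)
        if p.name in loaded or path in seen:
            continue  # no-re-read: already in [LOADED_DOCS: ...], or a repeat glob hit
        if EXCLUDED_DIRS.intersection(p.parts):
            continue  # build output, dependencies, or benchmark results
        seen.add(path)
        kept.append(path)
    if len(kept) > MAX_DOCS:
        raise ValueError(f"{len(kept)} matches: narrow the glob pattern")
    return kept
```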
## Step 4 — Load the filtered set
Read all remaining matches in parallel (batch the Read calls in one tool-use block). The newly loaded basenames will appear in your next response's `[LOADED_DOCS: ...]` prefix under the `+K this turn: <basenames>` delta, in the format defined by the active repo's `[LOADED_DOCS: ...]` prefix rule.
## Step 5 — Respect the paired-docs convention
If any `{DOMAIN}.md` is loaded (e.g., `LOGGING.md`), ALSO glob and load its companions:
- `{DOMAIN}_ISSUES.md` — known issues / limitations / workarounds
- `{DOMAIN}_TODO.md` — planned work / open tickets
These are **paired docs** and must be loaded as a set. Skipping ISSUES/TODO risks reintroducing fixed bugs or conflicting with ongoing refactors.
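Companion paths follow directly from the main doc's name. A minimal sketch, assuming the companions live in the same folder as the main doc and that `existing` is the set of paths the glob actually returned; `companions` and `paired_set` are hypothetical helpers.

```python
from pathlib import PurePosixPath

PAIRED_SUFFIXES = ("_ISSUES", "_TODO")

def companions(main_doc: str) -> list[str]:
    """Return the expected ISSUES/TODO companion paths for a main {DOMAIN}.md doc."""
    p = PurePosixPath(main_doc)
    if p.stem.endswith(PAIRED_SUFFIXES):
        return []  # already a companion, not a main doc
    return [str(p.with_name(f"{p.stem}{s}.md")) for s in PAIRED_SUFFIXES]

def paired_set(main_doc: str, existing: set[str]) -> list[str]:
    """The main doc plus whichever companions actually exist, loaded as one unit."""
    return [main_doc] + [c for c in companions(main_doc) if c in existing]
```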
## Step 6 — Proceed to the user's task
The response's `[LOADED_DOCS: N files (+K this turn: <basenames>)]` prefix (defined by the active repo's `[LOADED_DOCS: ...]` prefix rule) already surfaces the newly loaded filenames and the cumulative count. **No separate confirmation line is needed**: the prefix itself is the confirmation. Continue directly to the user's actual request.
If any relevant docs were skipped as already loaded (the no-re-read dedupe in Step 3), you MAY mention them inline where relevant (e.g., "I already have LOGGING.md from earlier"). Do not repeat the full loaded list.
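For reference, the prefix can be assembled from the cumulative and per-turn basenames as sketched below. The form with the `+K this turn` delta mirrors the format above; the bare form used when nothing new was loaded is an assumption, since the authoritative wording lives in the active repo's rule. `loaded_docs_prefix` is a hypothetical helper.

```python
def loaded_docs_prefix(loaded_before: list[str], new_this_turn: list[str]) -> str:
    """Format the [LOADED_DOCS: ...] response prefix from basenames."""
    total = len(set(loaded_before) | set(new_this_turn))  # cumulative, no double counting
    if not new_this_turn:
        # Assumed form when nothing new was loaded this turn.
        return f"[LOADED_DOCS: {total} files]"
    delta = ", ".join(new_this_turn)
    return f"[LOADED_DOCS: {total} files (+{len(new_this_turn)} this turn: {delta})]"
```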
## Do NOT
- **Re-read** any `.md` file already in `[LOADED_DOCS: ...]`. The **no-re-read** rule is absolute (check the active repo's `copilot-instructions.md` for the authoritative phrasing; the rule number may differ per repo). The only exception is when the user explicitly states that the file has changed on disk through external means.
- **Load unrelated domains** — if the user asks about the Logger, don't load SignalR docs "just in case".
- **Load more than ~10 files** in a single invocation — if the glob matches more, refine the pattern. If the request truly spans many domains, split into multiple sequential invocations with narrower scope each.
- **Skip folder `README.md`** — if the active repo's conventions include a **folder-navigation / folder-README-first** rule, honour it. `README.md` in a loaded `docs/` folder is always in scope.
## Tool usage
This skill is tool-neutral. Map these capabilities to the host agent's tools (per the active repo's `CLAUDE.md`):
- Globbing file paths: `Glob` (Claude Code), `file_search` (Copilot), `Get-ChildItem -Recurse -Filter` (PowerShell)
- Reading files: `Read` (Claude Code), `get_file` (Copilot)
- Parallelizing reads: issue multiple tool calls in a single response where the host supports it
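On a host without a Glob tool, Python's standard-library `pathlib` can resolve the same `**` patterns. This is an assumed fallback, not one of the tools the skill names; `glob_docs` is a hypothetical helper. Note that `Path.glob` matches case-sensitively on POSIX platforms by default, so uppercase doc names must be spelled exactly there.

```python
from pathlib import Path

def glob_docs(root: str, patterns: list[str]) -> list[str]:
    """Resolve glob patterns under root to sorted, de-duplicated absolute file paths."""
    hits: set[str] = set()
    for pattern in patterns:
        # "**" matches root itself and every subdirectory, so "**/docs/X*.md"
        # also finds root/docs/X.md.
        for p in Path(root).glob(pattern):
            if p.is_file():
                hits.add(str(p.resolve()))
    return sorted(hits)
```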
## Edge cases
- **No matching docs found:** Emit `> docs-discovery: no .md matches for tokens [list]. Proceeding with code-search only.` This is informational — the task may be in a domain without documentation, which is itself a signal to be careful.
- **Token extraction is ambiguous:** Prefer a SUPERSET, loading a few extra .md files rather than missing relevant ones. Loading 3 extra docs is cheap; missing an ISSUES.md and reintroducing a fixed bug is expensive.
- **User says "don't load docs" / "just search the code":** Respect it. Skip this skill entirely for that turn.
- **Recursive trigger:** If loaded docs cross-reference other `.md` files, do NOT auto-follow those references unless the user's request explicitly extends to them. Cross-refs can cascade; the relevance-bounded glob is the primary discovery mechanism.