---
name: docs-discovery
description: Load all .md documentation relevant to the user's current coding task BEFORE searching source code or making modifications. Scans `docs/` folders via Glob using topic keywords extracted from the user's message, loads paired main/ISSUES/TODO .md sets as one unit, and respects the no-re-read rule (skips already-loaded files). Use when the user's request mentions any domain concept, class name, file area, or feature behavior — invoke BEFORE the first `code_search`/`get_file`/`Grep` on source files. Typical triggers: any coding/refactoring task, "let's fix X", "review Y", "how does Z work", or any question that would otherwise lead to source-code exploration.
compatibility: Designed for Claude Code and GitHub Copilot (VS). Uses the host agent's Glob/Read tools.
metadata:
author: Fullepi
version: "1.0"
---
# docs-discovery
Ensure relevant `.md` documentation is loaded **before** any source-code search or modification, so the LLM has the documented behaviour, known issues, and planned work in context. This saves many `Grep` / `get_file` rounds: reading a handful of `.md` files upfront is cheaper than rediscovering the same information via code search.
## Before you start
This skill READS `.md` files and updates the LLM's `[LOADED_DOCS: ...]` state. It MUST NOT modify any file. Follow the **no-re-read** and **explicit-consent-for-modifications** rules from the active repo's `copilot-instructions.md` (rule numbers may differ per repo — refer to the rule NAMES).
## Step 1 — Extract topic keywords
Parse the user's most recent message (and the wider conversation tail if relevant) for concrete concepts. Examples:
- Class / type names: `AcLoggerBase`, `SegmentBufferReader`, `AcBinaryHubProtocol`, `<Consumer>SignalRClient` (any derived/consumer-specific type)
- Feature areas: "logger", "log writer", "serializer", "SignalR", "hub protocol", "chunked framing", "connection builder", "options"
- File hints: `Program.cs`, `AcLoggerBase.cs`, `SIGNALR.md`
- Patterns / idioms: "DI factory", "appsettings", "mode negotiation"
Derive **root topic tokens** from these — singular, lowercase, domain-defining words:
- `"AcLoggerBase"` → `logger`, `logging`
- `"SignalR client"` → `signalr`
- `"AcBinarySerializer"` → `binary`, `serializer`
- `"AcBinaryHubProtocol"` → `protocol`, `signalr_binary_protocol`, `binary`
- `"chunked"` → `signalr_binary_protocol`
Keep the set small (usually 1-3 root tokens). If the request genuinely spans multiple domains, include all.
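
The derivation above is semantic, but a rough keyword lookup illustrates the idea. This is a minimal sketch, not the skill's actual mechanism: the `KEYWORD_TO_TOKENS` table and the `root_tokens` function are hypothetical and only mirror the examples listed above.

```python
# Hypothetical keyword table: it only mirrors the examples listed above.
KEYWORD_TO_TOKENS = {
    "logger": ["logger", "logging"],
    "signalr": ["signalr"],
    "binary": ["binary"],
    "serializer": ["serializer"],
    "protocol": ["protocol", "signalr_binary_protocol"],
    "chunked": ["signalr_binary_protocol"],
}


def root_tokens(text: str) -> list[str]:
    """Return de-duplicated root tokens whose keyword occurs in the text."""
    lowered = text.lower()
    tokens: list[str] = []
    for keyword, mapped in KEYWORD_TO_TOKENS.items():
        if keyword in lowered:
            for token in mapped:
                if token not in tokens:
                    tokens.append(token)
    return tokens


print(root_tokens("AcLoggerBase"))         # ['logger', 'logging']
print(root_tokens("AcBinaryHubProtocol"))  # ['binary', 'protocol', 'signalr_binary_protocol']
```

A substring match on the lowercased text is used instead of splitting PascalCase, because names such as `SignalR` do not split cleanly on capital letters.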
## Step 2 — Map tokens to glob patterns (semantic, not hardcoded)
### ⚠️ CRITICAL — the recursive `**/` wildcard is MANDATORY in every glob
The `**/` is NOT cosmetic. It matches `docs/` at **any depth** in the workspace:
- repo-root: `<Repo>/docs/TOPIC/`
- project-level: `<Repo>/<Project>/docs/TOPIC/` (**very common for Pattern B layouts**)
- nested: `<Repo>/<Project>/<SubProject>/docs/TOPIC/`
**Correct form — always**: `<OptionalRepoPrefix>/**/docs/{TOKEN}/**/*.md`
**Wrong form — never**: `<OptionalRepoPrefix>/docs/{TOKEN}/**/*.md` (missing the leading `**/`)
**Failure mode** (this happens often with Pattern B projects):
- You know the target repo (e.g. via `own-dep-repos`) — say `<Repo> = AyCode.Core`.
- You synthesize `<Repo>/docs/{TOKEN}/...` because "that's where docs usually live".
- Glob returns 0 matches (repo-root `docs/` doesn't contain topic folders — only flat reference docs).
- You conclude "no docs exist" and fall through to code-search.
- Meanwhile the actual docs sit at `<Repo>/<Project>/docs/{TOKEN}/` — one level deeper.
**The rule is absolute**: NEVER drop the leading `**/`, even when you "know" the repo. Let the recursive glob find the actual depth. Relative-path guesses based on "usual" layouts are a reliable source of false-empty conclusions.
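
The failure mode can be reproduced in a throwaway directory. The sketch below uses Python's `pathlib` globbing as a stand-in for the host agent's Glob tool, on the assumption that both treat `**` as "zero or more directories"; the `Repo/Project` layout and the `find_docs` helper are hypothetical.

```python
import tempfile
from pathlib import Path


def find_docs(root: Path, pattern: str) -> list[str]:
    """Glob under root and return unique matches as sorted relative POSIX paths."""
    return sorted({p.relative_to(root).as_posix() for p in root.glob(pattern)})


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "Repo"
    # Hypothetical Pattern B layout: the topic folder sits one level below the repo root.
    topic = repo / "Project" / "docs" / "LOGGING"
    topic.mkdir(parents=True)
    (topic / "README.md").write_text("# Logging\n")
    # The repo-root docs/ folder holds only flat reference docs.
    (repo / "docs").mkdir()
    (repo / "docs" / "GLOSSARY.md").write_text("# Glossary\n")

    wrong = find_docs(repo, "docs/LOGGING/**/*.md")     # missing the leading **/
    right = find_docs(repo, "**/docs/LOGGING/**/*.md")  # recursive form

print(wrong)  # []
print(right)  # ['Project/docs/LOGGING/README.md']
```

The first pattern returns nothing even though the documentation exists, which is exactly the false-empty conclusion described above.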
### File layout convention
(See `LLM_PROTOCOL_DECISIONS.md` entry "Docs migrated to folder+README pattern".)
Topics with multiple files live in named folders: `docs/TOPIC/README.md` + `docs/TOPIC/TOPIC_ISSUES.md` + `docs/TOPIC/TOPIC_TODO.md` (or other `TOPIC_*.md` companions). Single-file reference docs remain flat at the `docs/` root (e.g., `docs/ARCHITECTURE.md`, `docs/GLOSSARY.md`).
For each root token, synthesize glob patterns targeting BOTH layouts:
| Token example | Primary glob (folder) | Companion glob (flat + variants) |
|---|---|---|
| `logger`, `log`, `logging` | `**/docs/LOGGING/**/*.md` | `**/docs/LOGGING_*.md` (legacy/variants) |
| `binary`, `serializer` | `**/docs/BINARY/**/*.md` | `**/docs/BINARY_*.md` |
| `signalr`, `hub` | `**/docs/SIGNALR*/**/*.md` | — (covers SIGNALR + SIGNALR_BINARY_PROTOCOL folders) |
| `protocol`, `wire`, `chunked` | `**/docs/*PROTOCOL*/**/*.md` | — |
| `grid`, `mggrid` | `**/docs/MGGRID/**/*.md` | — |
| `architecture`, `conventions`, `glossary` | — (flat, single-file) | `**/docs/ARCHITECTURE.md`, `**/docs/CONVENTIONS.md`, `**/docs/GLOSSARY.md` |
Do NOT require tokens to match a pre-baked list — construct patterns from the token itself uppercased:
- Primary: `**/docs/{TOKEN}/**/*.md` (matches everything inside the topic folder)
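- Companion/variant: `**/docs/{TOKEN}_*.md` (matches flat files with the topic prefix; for variant-prefix folders like `SIGNALR_BINARY_PROTOCOL`, use the folder form `**/docs/{TOKEN}*/**/*.md`, as in the `SIGNALR` row of the table)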
Natural language variants (logger/logging, serialize/serializer, binary/binaries) should all be attempted against both the primary and companion patterns.
**For README.md discovery** (folder-navigation rule): if a topic folder match is found, the `README.md` in that folder is the entry point and MUST be included in the load set (not just sibling `_ISSUES` / `_TODO` files).
(See the CRITICAL section at the top of this Step 2 for the full explanation of why the leading `**/` is mandatory — this is the most common cause of false-empty docs conclusions.)
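
Pattern construction from a token can be sketched as a small helper. The function name `glob_patterns` and its `repo_prefix` parameter are illustrative assumptions; only the two pattern shapes come from the rules above.

```python
def glob_patterns(token: str, repo_prefix: str = "") -> list[str]:
    """Build the primary (folder) and companion (flat) globs for one root token."""
    topic = token.upper()
    # The optional repo prefix never replaces the leading **/ wildcard.
    prefix = f"{repo_prefix.rstrip('/')}/" if repo_prefix else ""
    return [
        f"{prefix}**/docs/{topic}/**/*.md",  # everything inside the topic folder
        f"{prefix}**/docs/{topic}_*.md",     # flat files carrying the topic prefix
    ]


# Try each natural-language variant of the topic against both patterns.
for variant in ("logger", "logging"):
    print(glob_patterns(variant))
# ['**/docs/LOGGER/**/*.md', '**/docs/LOGGER_*.md']
# ['**/docs/LOGGING/**/*.md', '**/docs/LOGGING_*.md']

print(glob_patterns("binary", "AyCode.Core")[0])  # AyCode.Core/**/docs/BINARY/**/*.md
```

Because the leading `**/` is baked into the template, a caller cannot accidentally produce the repo-root-only form.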
## Step 3 — Execute the Glob and dedupe against already-loaded docs
Run each glob pattern via the host agent's Glob tool. Collect all matching absolute paths.
**Dedupe against `[LOADED_DOCS: ...]` prefix:**
- If a match is already in LOADED_DOCS → skip it (Rule #3)
- If a match is under `bin/`, `obj/`, `node_modules/`, `Test_Benchmark_Results/`, or a worktree-backup path → skip it (not framework docs)
If the total match count exceeds 10, narrow the glob pattern (e.g., require domain token near the filename start, not just substring). LLM context is finite.
**False-empty guardrail:** if the glob returns 0 matches OR all matched files are 0-byte, do NOT conclude "docs are empty" — first re-validate the glob (typo? literal path substituted?) and retry once with the same token under a corrected `**/docs/...` pattern (NEVER with an ad-hoc path guess). Only after the validated retry also fails should you fall through to code-search.
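
The dedupe and exclusion rules can be sketched as a filter over the glob results. This assumes the loaded-docs state is available as a set of full paths and that paths are POSIX-style; both are assumptions made for illustration, as are the function names. The worktree-backup exclusion is omitted because its path naming is not specified here.

```python
from pathlib import PurePosixPath

# Directories whose contents are not framework docs (from the rule above).
EXCLUDED_DIRS = {"bin", "obj", "node_modules", "Test_Benchmark_Results"}
MAX_FILES = 10


def filter_matches(matches: list[str], loaded: set[str]) -> list[str]:
    """Drop already-loaded docs, excluded folders, and duplicate matches."""
    kept: list[str] = []
    for path in matches:
        if path in loaded or path in kept:
            continue  # no-re-read rule, plus overlap between glob patterns
        directories = PurePosixPath(path).parts[:-1]
        if EXCLUDED_DIRS.intersection(directories):
            continue  # build output or dependency folder
        kept.append(path)
    return kept


def needs_narrowing(kept: list[str]) -> bool:
    """True when the load set exceeds the limit and the glob should be refined."""
    return len(kept) > MAX_FILES


matches = [
    "/repo/Project/docs/LOGGING/README.md",
    "/repo/Project/docs/LOGGING/LOGGING_ISSUES.md",
    "/repo/Project/bin/Debug/docs/LOGGING/README.md",
]
loaded = {"/repo/Project/docs/LOGGING/README.md"}
print(filter_matches(matches, loaded))
# ['/repo/Project/docs/LOGGING/LOGGING_ISSUES.md']
```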
## Step 4 — Load the filtered set
Read all remaining matches in parallel (batch the Read calls in one tool-use block). The newly-loaded files will appear in your next response's `[LOADED_DOCS: ...]` prefix under the `+K this turn: <short names>` delta, per the active repo's Rule #1 format (basename by default; `TOPIC/README.md` for topic-folder READMEs to disambiguate across the many `README.md` files the Pattern-B docs layout introduces).
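
The short-name convention can be illustrated as follows. The authoritative format is whatever the active repo's Rule #1 specifies; the comma separator and both function names in this sketch are assumptions.

```python
from pathlib import PurePosixPath


def short_name(path: str) -> str:
    """Basename by default; TOPIC/README.md for READMEs, to disambiguate them."""
    p = PurePosixPath(path)
    if p.name == "README.md":
        return f"{p.parent.name}/{p.name}"
    return p.name


def loaded_docs_prefix(total: int, new_paths: list[str]) -> str:
    """Format the response prefix with the cumulative count and this turn's delta."""
    names = ", ".join(short_name(p) for p in new_paths)
    return f"[LOADED_DOCS: {total} files (+{len(new_paths)} this turn: {names})]"


print(loaded_docs_prefix(5, [
    "/repo/Project/docs/LOGGING/README.md",
    "/repo/Project/docs/LOGGING/LOGGING_ISSUES.md",
]))
# [LOADED_DOCS: 5 files (+2 this turn: LOGGING/README.md, LOGGING_ISSUES.md)]
```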
## Step 5 — Respect the paired-docs convention
If any `{DOMAIN}.md` is loaded (e.g., `LOGGING.md`), ALSO glob and load its companions:
- `{DOMAIN}_ISSUES.md` — known issues / limitations / workarounds
- `{DOMAIN}_TODO.md` — planned work / open tickets
These are **paired docs** and must be loaded as a set. Skipping ISSUES/TODO risks reintroducing fixed bugs or conflicting with ongoing refactors.
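
Companion lookup can be sketched as a name derivation from the loaded file. The helper `companion_names` is hypothetical; treating a topic-folder `README.md` as the domain's main doc follows the file layout convention described in Step 2.

```python
from pathlib import PurePosixPath


def companion_names(doc_path: str) -> list[str]:
    """Return the ISSUES/TODO file names that pair with a main domain doc."""
    p = PurePosixPath(doc_path)
    # A topic-folder README stands in for the main doc: docs/LOGGING/README.md -> LOGGING.
    domain = p.parent.name if p.name == "README.md" else p.stem
    if domain.endswith(("_ISSUES", "_TODO")):
        return []  # already a companion, nothing further to pair
    return [f"{domain}_ISSUES.md", f"{domain}_TODO.md"]


print(companion_names("/repo/docs/LOGGING.md"))
# ['LOGGING_ISSUES.md', 'LOGGING_TODO.md']
print(companion_names("/repo/Project/docs/LOGGING/README.md"))
# ['LOGGING_ISSUES.md', 'LOGGING_TODO.md']
```

The returned names would then be globbed and passed through the same dedupe step as any other match, so a companion that is already loaded is not read again.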
## Step 6 — Proceed to the user's task
The response's `[LOADED_DOCS: N files (+K this turn: <basenames>)]` prefix (per the active repo's Rule #1) already surfaces the newly-loaded filenames and the cumulative count. **No separate confirmation line is needed** — the prefix itself is the confirmation. Continue directly to the user's actual request.
If any relevant docs were skipped as already-loaded (Rule #3 dedupe), you MAY optionally mention them inline where relevant (e.g., "I already have LOGGING.md from earlier"). Do not reiterate the full loaded list.
## Do NOT
- **Re-read** any `.md` file already in `[LOADED_DOCS: ...]` — the **no-re-read** rule is absolute (check the active repo's `copilot-instructions.md` for the authoritative phrasing; rule number may differ per repo). The only exception: user explicitly states the file has changed on disk via external means.
- **Load unrelated domains** — if the user asks about the Logger, don't load SignalR docs "just in case".
- **Load more than ~10 files** in a single invocation — if the glob matches more, refine the pattern. If the request truly spans many domains, split into multiple sequential invocations with narrower scope each.
- **Skip folder `README.md`** — if the active repo's conventions include a **folder-navigation / folder-README-first** rule, honour it. `README.md` in a loaded `docs/` folder is always in scope.
## Tool usage
This skill is tool-neutral. Map these capabilities to the host agent's tools (per the active repo's `CLAUDE.md`):
- Globbing file paths: `Glob` (Claude Code), `file_search` (Copilot), `Get-ChildItem -Filter` (PowerShell)
- Reading files: `Read` (Claude Code), `get_file` (Copilot)
- Parallelizing reads: issue multiple tool calls in a single response where the host supports it
## Edge cases
- **No matching docs found:** Emit `> docs-discovery: no .md matches for tokens [list]. Proceeding with code-search only.` This is informational — the task may be in a domain without documentation, which is itself a signal to be careful.
- **Token extraction is ambiguous:** Prefer SUPERSET — load a few extra .md files rather than missing relevant ones. Loading 3 extra docs is cheap; missing ISSUES.md and reintroducing a fixed bug is expensive.
- **User says "don't load docs" / "just search the code":** Respect it. Skip this skill entirely for that turn.
- **Recursive trigger:** If loaded docs reference other `.md` files via cross-reference, do NOT auto-follow unless the user's request explicitly extends to them. Cross-refs can cascade; relevance-bounded glob is the primary mechanism.