AyCode.Core/.github/skills/docs-discovery/SKILL.md

name: docs-discovery
description: Load all .md documentation relevant to the user's current coding task BEFORE searching source code or making modifications. Scans docs/ folders via Glob using topic keywords extracted from the user's message, loads paired main/ISSUES/TODO .md sets as one unit, and respects the no-re-read rule (skips already-loaded files). Use when the user's request mentions any domain concept, class name, file area, or feature behavior; invoke BEFORE the first code_search/get_file/Grep on source files. Typical triggers: any coding/refactoring task, "let's fix X", "review Y", "how does Z work", or any question that would otherwise lead to source-code exploration.
compatibility: Designed for Claude Code and GitHub Copilot (VS). Uses the host agent's Glob/Read tools.
metadata:
  author: Fullepi
  version: "1.0"

docs-discovery

Ensure relevant .md documentation is loaded before any source-code search or modification, so the LLM has the documented behaviour, known issues, and planned work in context. This saves many Grep / get_file rounds: reading a handful of .md files upfront is cheaper than rediscovering the same information via code search.

Before you start

This skill READS .md files and updates the LLM's [LOADED_DOCS: ...] state. It MUST NOT modify any file. Follow the no-re-read and explicit-consent-for-modifications rules from the active repo's copilot-instructions.md (rule numbers may differ per repo — refer to the rule NAMES).

Step 1 — Extract topic keywords

Parse the user's most recent message (and the wider conversation tail if relevant) for concrete concepts. Examples:

  • Class / type names: AcLoggerBase, SegmentBufferReader, AcBinaryHubProtocol, <Consumer>SignalRClient (any derived/consumer-specific type)
  • Feature areas: "logger", "log writer", "serializer", "SignalR", "hub protocol", "chunked framing", "connection builder", "options"
  • File hints: Program.cs, AcLoggerBase.cs, SIGNALR.md
  • Patterns / idioms: "DI factory", "appsettings", "mode negotiation"

Derive root topic tokens from these — singular, lowercase, domain-defining words:

  • "AcLoggerBase" → logger, logging
  • "SignalR client" → signalr
  • "AcBinarySerializer" → binary, serializer
  • "AcBinaryHubProtocol" → protocol, signalr_binary_protocol, binary
  • "chunked" → signalr_binary_protocol

Keep the set small (usually 1-3 root tokens). If the request genuinely spans multiple domains, include all.
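The derivation from names to root tokens can be sketched in Python. This is a minimal illustration, not the skill's actual mechanism (the skill expects the LLM to do this semantically): `ALIASES`, `split_identifier`, and `root_tokens` are hypothetical names, and the alias table only mirrors the examples listed above.

```python
import re

# Hypothetical alias table mirroring the examples above; a real agent
# would infer these mappings semantically rather than from a fixed dict.
ALIASES = {
    "logger": ["logger", "logging"],
    "signalr": ["signalr"],
    "serializer": ["serializer"],
    "binary": ["binary"],
    "protocol": ["protocol", "signalr_binary_protocol"],
    "chunked": ["signalr_binary_protocol"],
}


def split_identifier(name: str) -> list[str]:
    """Split a PascalCase or snake_case identifier into lowercase words."""
    return [w.lower() for w in re.findall(r"[A-Z][a-z0-9]*|[a-z0-9]+", name)]


def root_tokens(message: str) -> list[str]:
    """Collect root tokens for every known word in the message, in first-seen order."""
    tokens: list[str] = []
    for word in re.findall(r"[A-Za-z0-9_]+", message):
        # Try the whole word first ("SignalR" -> "signalr"), then its parts.
        for candidate in [word.lower(), *split_identifier(word)]:
            for token in ALIASES.get(candidate, []):
                if token not in tokens:
                    tokens.append(token)
    return tokens
```

Under these assumptions, `root_tokens("let's fix AcLoggerBase")` yields the logger/logging pair, and a message naming `AcBinaryHubProtocol` yields the three tokens listed for it above (in a different order).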

Step 2 — Map tokens to glob patterns (semantic, not hardcoded)

⚠️ CRITICAL — the recursive **/ wildcard is MANDATORY in every glob

The **/ is NOT cosmetic. It matches docs/ at any depth in the workspace:

  • repo-root: <Repo>/docs/TOPIC/
  • project-level: <Repo>/<Project>/docs/TOPIC/ (very common for Pattern B layouts)
  • nested: <Repo>/<Project>/<SubProject>/docs/TOPIC/

Correct form (always): <OptionalRepoPrefix>/**/docs/{TOKEN}/**/*.md

Wrong form (never): <OptionalRepoPrefix>/docs/{TOKEN}/**/*.md (missing the leading **/)

Failure mode (this happens often with Pattern B projects):

  • You know the target repo (e.g. via own-dep-repos) — say <Repo> = AyCode.Core.
  • You synthesize <Repo>/docs/{TOKEN}/... because "that's where docs usually live".
  • Glob returns 0 matches (repo-root docs/ doesn't contain topic folders — only flat reference docs).
  • You conclude "no docs exist" and fall through to code-search.
  • Meanwhile the actual docs sit at <Repo>/<Project>/docs/{TOKEN}/ — one level deeper.

The rule is absolute: NEVER drop the leading **/, even when you "know" the repo. Let the recursive glob find the actual depth. Relative-path guesses based on "usual" layouts are a reliable source of false-empty conclusions.
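The failure mode can be reproduced with a small Python sketch. It uses `pathlib.Path.glob` as a stand-in for the host agent's Glob tool (whose exact matching semantics may differ), and the folder name `SomeProject` and the function `demo_recursive_glob` are hypothetical.

```python
import tempfile
from pathlib import Path


def demo_recursive_glob() -> tuple[list[str], list[str]]:
    """Build a Pattern-B-like layout and compare the wrong and correct globs."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)

        # Repo-root docs/ holds only a flat reference doc, no topic folders.
        (repo / "docs").mkdir()
        (repo / "docs" / "ARCHITECTURE.md").write_text("flat reference doc")

        # The topic folder actually lives one level deeper, under a project.
        topic = repo / "SomeProject" / "docs" / "LOGGING"
        topic.mkdir(parents=True)
        (topic / "README.md").write_text("entry point")
        (topic / "LOGGING_ISSUES.md").write_text("known issues")

        def run(pattern: str) -> list[str]:
            return sorted(p.relative_to(repo).as_posix() for p in repo.glob(pattern))

        wrong = run("docs/LOGGING/**/*.md")      # missing the leading **/
        right = run("**/docs/LOGGING/**/*.md")   # finds docs/ at any depth
        return wrong, right
```

In this layout the pattern without the leading `**/` matches nothing, while the recursive pattern finds both files in the project-level topic folder.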

File layout convention

(See LLM_PROTOCOL_DECISIONS.md entry "Docs migrated to folder+README pattern".)

Topics with multiple files live in named folders: docs/TOPIC/README.md + docs/TOPIC/TOPIC_ISSUES.md + docs/TOPIC/TOPIC_TODO.md (or other TOPIC_*.md companions). Single-file reference docs remain flat at the docs/ root (e.g., docs/ARCHITECTURE.md, docs/GLOSSARY.md).

For each root token, synthesize glob patterns targeting BOTH layouts:

| Token example | Primary glob (folder) | Companion glob (flat + variants) |
|---|---|---|
| logger, log, logging | **/docs/LOGGING/**/*.md | **/docs/LOGGING_*.md (legacy/variants) |
| binary, serializer | **/docs/BINARY/**/*.md | **/docs/BINARY_*.md |
| signalr, hub | **/docs/SIGNALR*/**/*.md | - (covers SIGNALR + SIGNALR_BINARY_PROTOCOL folders) |
| protocol, wire, chunked | **/docs/*PROTOCOL*/**/*.md | |
| grid, mggrid | **/docs/MGGRID/**/*.md | |
| architecture, conventions, glossary | - (flat, single-file) | **/docs/ARCHITECTURE.md, **/docs/CONVENTIONS.md, **/docs/GLOSSARY.md |

Do NOT require tokens to match a pre-baked list — construct patterns from the token itself uppercased:

  • Primary: **/docs/{TOKEN}/**/*.md (matches everything inside the topic folder)
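  • Companion/variant: **/docs/{TOKEN}_*.md (matches flat files with the token as a prefix). Variant-prefix folders like SIGNALR_BINARY_PROTOCOL are not matched by this pattern, because a single * does not cross a path separator in typical glob implementations; cover them with a folder wildcard such as **/docs/{TOKEN}*/**/*.md, as the signalr row in the table does.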

Natural language variants (logger/logging, serialize/serializer, binary/binaries) should all be attempted against both the primary and companion patterns.
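A minimal Python sketch of this pattern construction follows. The function names `glob_patterns` and `patterns_for` are hypothetical; the two pattern shapes are the primary and companion forms given above.

```python
def glob_patterns(token: str) -> list[str]:
    """Build the primary (folder) and companion (flat) globs for one root token."""
    name = token.upper()
    return [
        f"**/docs/{name}/**/*.md",  # everything inside the topic folder
        f"**/docs/{name}_*.md",     # flat files prefixed with the token
    ]


def patterns_for(variants: list[str]) -> list[str]:
    """Expand several natural-language variants into a de-duplicated pattern list."""
    patterns: list[str] = []
    for variant in variants:
        for pattern in glob_patterns(variant):
            if pattern not in patterns:
                patterns.append(pattern)
    return patterns
```

For example, `patterns_for(["logger", "logging"])` produces four patterns, two per variant, each keeping the mandatory leading `**/`.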

For README.md discovery (folder-navigation rule): if a topic folder match is found, the README.md in that folder is the entry point and MUST be included in the load set (not just sibling _ISSUES / _TODO files).

(See the CRITICAL section at the top of this Step 2 for the full explanation of why the leading **/ is mandatory — this is the most common cause of false-empty docs conclusions.)

Step 3 — Execute the Glob and dedupe against already-loaded docs

Run each glob pattern via the host agent's Glob tool. Collect all matching absolute paths.

Dedupe against [LOADED_DOCS: ...] prefix:

  • If a match is already in LOADED_DOCS → skip it (Rule #3)
  • If a match is under bin/, obj/, node_modules/, Test_Benchmark_Results/, or a worktree-backup path → skip it (not framework docs)

If the total match count exceeds 10, narrow the glob pattern (e.g., require domain token near the filename start, not just substring). LLM context is finite.

False-empty guardrail: if the glob returns 0 matches OR all matched files are 0-byte, do NOT conclude "docs are empty" — first re-validate the glob (typo? literal path substituted?) and retry once with the same token under a corrected **/docs/... pattern (NEVER with an ad-hoc path guess). Only after the validated retry also fails should you fall through to code-search.
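The dedupe and exclusion rules in this step might look like the following Python sketch. `filter_matches` and `needs_narrowing` are hypothetical helpers, and the worktree-backup check is left out because the naming of those paths is not specified here.

```python
from pathlib import PurePosixPath

# Directory names from the skip rule above; worktree-backup paths are
# omitted because their naming convention is not given in this document.
EXCLUDED_DIRS = {"bin", "obj", "node_modules", "Test_Benchmark_Results"}
MAX_FILES = 10


def filter_matches(matches: list[str], loaded: set[str]) -> list[str]:
    """Drop already-loaded docs and matches under build/output directories."""
    kept: list[str] = []
    for path in matches:
        if path in loaded:
            continue  # no-re-read rule: never load a file twice
        parts = PurePosixPath(path.replace("\\", "/")).parts
        if any(part in EXCLUDED_DIRS for part in parts[:-1]):
            continue  # not framework docs
        if path not in kept:
            kept.append(path)  # different globs may return the same file
    return kept


def needs_narrowing(kept: list[str]) -> bool:
    """True when the load set exceeds the limit and the glob should be refined."""
    return len(kept) > MAX_FILES
```

An empty result from `filter_matches` is not by itself proof that no docs exist; the false-empty guardrail above still applies before falling through to code search.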

Step 4 — Load the filtered set

Read all remaining matches in parallel (batch the Read calls in one tool-use block). The newly loaded files appear in your next response's [LOADED_DOCS: ...] prefix under the "+K this turn: <short names>" delta, per the active repo's Rule #1 format. Short names are basenames by default; topic-folder READMEs use TOPIC/README.md to disambiguate the many README.md files that the Pattern-B docs layout introduces.
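As an illustration of the short-name rule, here is a hedged Python sketch. `short_name` and `loaded_docs_prefix` are hypothetical helpers, the prefix shape follows the `[LOADED_DOCS: N files (+K this turn: <basenames>)]` form used by this skill, and the comma separator is an assumption; the active repo's Rule #1 remains the authoritative format.

```python
from pathlib import PurePosixPath


def short_name(path: str) -> str:
    """Basename by default; TOPIC/README.md for READMEs inside a folder."""
    p = PurePosixPath(path.replace("\\", "/"))
    if p.name.lower() == "readme.md" and p.parent.name:
        return f"{p.parent.name}/{p.name}"
    return p.name


def loaded_docs_prefix(already_loaded: list[str], new_paths: list[str]) -> str:
    """Format the cumulative count plus this turn's delta (separator is assumed)."""
    total = len(already_loaded) + len(new_paths)
    names = ", ".join(short_name(p) for p in new_paths)
    return f"[LOADED_DOCS: {total} files (+{len(new_paths)} this turn: {names})]"
```

With one doc loaded earlier and two new LOGGING files, this produces `[LOADED_DOCS: 3 files (+2 this turn: LOGGING/README.md, LOGGING_TODO.md)]`.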

Step 5 — Respect the paired-docs convention

If any {DOMAIN}.md is loaded (e.g., LOGGING.md), ALSO glob and load its companions:

  • {DOMAIN}_ISSUES.md — known issues / limitations / workarounds
  • {DOMAIN}_TODO.md — planned work / open tickets

These are paired docs and must be loaded as a set. Skipping ISSUES/TODO risks reintroducing fixed bugs or conflicting with ongoing refactors.
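One way to derive the companion globs from a loaded file is sketched below in Python. `companion_globs` is a hypothetical helper; treating a topic-folder README.md as the main doc of its folder follows the file layout convention in Step 2, and the `**/docs/**/` shape is an assumption chosen to cover both the flat and the folder layout.

```python
from pathlib import PurePosixPath

COMPANION_SUFFIXES = ("_ISSUES", "_TODO")


def companion_globs(loaded_path: str) -> list[str]:
    """Return globs for the ISSUES/TODO companions of a loaded main doc."""
    p = PurePosixPath(loaded_path.replace("\\", "/"))
    # A topic-folder README stands for its folder: docs/LOGGING/README.md -> LOGGING.
    domain = p.parent.name if p.stem.upper() == "README" else p.stem
    if any(domain.endswith(suffix) for suffix in COMPANION_SUFFIXES):
        return []  # the file is itself a companion, nothing more to derive
    # Keep the leading **/ so docs/ is found at any depth (assumed pattern shape).
    return [f"**/docs/**/{domain}{suffix}.md" for suffix in COMPANION_SUFFIXES]
```

The resulting patterns still go through the Step 3 dedupe, so companions that are already loaded are not read again.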

Step 6 — Proceed to the user's task

The response's [LOADED_DOCS: N files (+K this turn: <basenames>)] prefix (per the active repo's Rule #1) already surfaces the newly-loaded filenames and the cumulative count. No separate confirmation line is needed — the prefix itself is the confirmation. Continue directly to the user's actual request.

If any relevant docs were skipped as already-loaded (Rule #3 dedupe), you MAY optionally mention them inline where relevant (e.g., "I already have LOGGING.md from earlier"). Do not reiterate the full loaded list.

Do NOT

  • Re-read any .md file already in [LOADED_DOCS: ...] — the no-re-read rule is absolute (check the active repo's copilot-instructions.md for the authoritative phrasing; rule number may differ per repo). The only exception: user explicitly states the file has changed on disk via external means.
  • Load unrelated domains — if the user asks about the Logger, don't load SignalR docs "just in case".
  • Load more than ~10 files in a single invocation — if the glob matches more, refine the pattern. If the request truly spans many domains, split into multiple sequential invocations with narrower scope each.
  • Skip folder README.md — if the active repo's conventions include a folder-navigation / folder-README-first rule, honour it. README.md in a loaded docs/ folder is always in scope.

Tool usage

This skill is tool-neutral. Map these capabilities to the host agent's tools (per the active repo's CLAUDE.md):

  • Globbing file paths: Glob (Claude Code), file_search (Copilot), or Get-ChildItem -Filter (PowerShell) when no glob tool is available
  • Reading files: Read (Claude Code), get_file (Copilot)
  • Parallelizing reads: issue multiple tool calls in a single response where the host supports it

Edge cases

  • No matching docs found: Emit the line "> docs-discovery: no .md matches for tokens [list]. Proceeding with code-search only." This is informational: the task may be in a domain without documentation, which is itself a signal to be careful.
  • Token extraction is ambiguous: Prefer SUPERSET — load a few extra .md files rather than missing relevant ones. Loading 3 extra docs is cheap; missing ISSUES.md and reintroducing a fixed bug is expensive.
  • User says "don't load docs" / "just search the code": Respect it. Skip this skill entirely for that turn.
  • Recursive trigger: If loaded docs reference other .md files via cross-reference, do NOT auto-follow unless the user's request explicitly extends to them. Cross-refs can cascade; relevance-bounded glob is the primary mechanism.