docs/ folders via Glob using topic keywords extracted from the user's message, loads paired main/ISSUES/TODO .md sets as one unit, and respects the no-re-read rule (skips already-loaded files). Use when the user's request mentions any domain concept, class name, file area, or feature behavior — invoke BEFORE the first code_search/get_file/Grep on source files. Typical triggers: any coding/refactoring task, "let's fix X", "review Y", "how does Z work", or any question that would otherwise lead to source-code exploration.
compatibility: Designed for Claude Code and GitHub Copilot (VS). Uses the host agent's Glob/Read tools.
metadata:
author: Fullepi
docs-discovery
Ensure relevant .md documentation is loaded before any source-code search or modification, so the LLM has the documented behaviour, known issues, and planned work in context. Saves many Grep / get_file rounds — reading a handful of .md files upfront is cheaper than rediscovering information via code search.
Before you start
This skill READS .md files and updates the LLM's [LOADED_DOCS: ...] state. It MUST NOT modify any file. Follow the no-re-read and explicit-consent-for-modifications rules from the active repo's copilot-instructions.md (rule numbers may differ per repo — refer to the rule NAMES).
Step 1 — Extract topic keywords
Parse the user's most recent message (and the wider conversation tail if relevant) for concrete concepts. Examples:
- Class / type names: `AcLoggerBase`, `AsyncPipeReaderInput`, `AcBinaryHubProtocol`, `<Consumer>SignalRClient` (any derived/consumer-specific type)
- Feature areas: "logger", "log writer", "serializer", "SignalR", "hub protocol", "chunked framing", "connection builder", "options"
- File hints: `Program.cs`, `AcLoggerBase.cs`, `SIGNALR.md`
- Patterns / idioms: "DI factory", "appsettings", "mode negotiation"
Derive root topic tokens from these: singular, lowercase, domain-defining words.
- "AcLoggerBase" → `logger`, `logging`
- "SignalR client" → `signalr`
- "AcBinarySerializer" → `binary`, `serializer`
- "AcBinaryHubProtocol" → `protocol`, `signalr_binary_protocol`, `binary`
- "chunked" → `signalr_binary_protocol`
Keep the set small (usually 1-3 root tokens). If the request genuinely spans multiple domains, include all.
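The mapping above can be sketched as a small lookup. This is only an illustration: the `ROOT_TOKENS` table and the `extract_root_tokens` helper are hypothetical names, and plain substring matching is an assumption about how keywords might be detected, not something the skill prescribes.

```python
# Hypothetical keyword-to-token table. The entries mirror the examples above,
# but the table itself is an assumption, not part of the skill.
ROOT_TOKENS = {
    "logger": ["logger", "logging"],
    "signalr": ["signalr"],
    "serializer": ["binary", "serializer"],
    "hubprotocol": ["protocol", "signalr_binary_protocol", "binary"],
    "chunked": ["signalr_binary_protocol"],
}


def extract_root_tokens(message: str) -> list[str]:
    """Return root tokens for every keyword found in the message, without duplicates."""
    text = message.lower()
    tokens: list[str] = []
    for keyword, roots in ROOT_TOKENS.items():
        if keyword in text:
            for root in roots:
                if root not in tokens:
                    tokens.append(root)
    return tokens


print(extract_root_tokens("let's fix AcLoggerBase"))
```

A message that mentions no known keyword yields an empty list, which is the cue to fall back on judgment rather than on the table.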
Step 2 — Map tokens to glob patterns (semantic, not hardcoded)
⚠️ CRITICAL — the recursive **/ wildcard is MANDATORY in every glob
The **/ is NOT cosmetic. It matches docs/ at any depth in the workspace:
- repo-root: `<Repo>/docs/TOPIC/`
- project-level: `<Repo>/<Project>/docs/TOPIC/` ← very common for Pattern B layouts
- nested: `<Repo>/<Project>/<SubProject>/docs/TOPIC/`
Correct form — always: <OptionalRepoPrefix>/**/docs/{TOKEN}/**/*.md
Wrong form — never: <OptionalRepoPrefix>/docs/{TOKEN}/**/*.md (missing the leading **/)
Failure mode (this happens often with Pattern B projects):
- You know the target repo (e.g. via `own-dep-repos`), say `<Repo> = AyCode.Core`.
- You synthesize `<Repo>/docs/{TOKEN}/...` because "that's where docs usually live".
- Glob returns 0 matches (repo-root `docs/` doesn't contain topic folders, only flat reference docs).
- You conclude "no docs exist" and fall through to code-search.
- Meanwhile the actual docs sit at `<Repo>/<Project>/docs/{TOKEN}/`, one level deeper.
The rule is absolute: NEVER drop the leading **/, even when you "know" the repo. Let the recursive glob find the actual depth. Relative-path guesses based on "usual" layouts are a reliable source of false-empty conclusions.
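The failure mode can be reproduced in a throwaway directory. The sketch below uses Python's `pathlib` as a stand-in for the host agent's Glob tool (an assumption, since the real tool may behave differently in details); the `SomeProject` folder and the `find_topic_docs` helper are hypothetical.

```python
import tempfile
from pathlib import Path


def find_topic_docs(repo_root: Path, token: str, recursive: bool = True) -> list[str]:
    """Glob for topic docs under repo_root; recursive=False reproduces the wrong form."""
    prefix = "**/" if recursive else ""
    pattern = f"{prefix}docs/{token.upper()}/**/*.md"
    found = {p.relative_to(repo_root).as_posix() for p in repo_root.glob(pattern)}
    return sorted(found)


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    # Hypothetical Pattern B layout: topic docs live one level below the repo root.
    topic_dir = repo / "SomeProject" / "docs" / "LOGGING"
    topic_dir.mkdir(parents=True)
    (topic_dir / "README.md").write_text("# Logging\n")
    # The repo-root docs/ folder holds only flat reference docs.
    (repo / "docs").mkdir()
    (repo / "docs" / "ARCHITECTURE.md").write_text("# Architecture\n")

    wrong = find_topic_docs(repo, "logging", recursive=False)
    right = find_topic_docs(repo, "logging", recursive=True)

print("without leading **/:", wrong)
print("with leading **/:   ", right)
```

In this layout the non-recursive form comes back empty while the recursive form finds `SomeProject/docs/LOGGING/README.md`, which is exactly the false-empty conclusion described above.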
⚠️ Path separators — forward slashes ONLY in glob patterns
Glob engines (ripgrep, Microsoft.Extensions.FileSystemGlobbing, Node glob, Python pathlib, etc.) all accept forward slashes / in patterns, regardless of host OS, whereas backslash handling varies by engine (some treat \ as an escape character rather than a separator). The filesystem layer handles OS-level path normalization; the pattern stays POSIX.
Correct (always): H:/Applications/<Repo>/**/docs/{TOKEN}/**/*.md
Wrong (Windows trap): H:\Applications\<Repo>\**\docs\{TOKEN}\**\*.md
Failure mode: a Windows-backslash glob typically returns 0 matches even when files exist — silent failure. The agent sees "no docs", falls through to code-search, and never realizes the search was malformed.
Rule is absolute: write / in every glob pattern. If your tool/runtime is Windows and you're tempted to "match the OS path style" — don't. The glob engine will not normalize backslashes for you in pattern position.
(Combined with the **/ mandate above: every glob in this skill uses <optional-prefix>/**/<pattern> with forward slashes throughout — both rules together cover the most common false-empty conclusions.)
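Both rules can be enforced in one place if patterns are always produced by a helper rather than typed by hand. A minimal sketch, assuming a hypothetical `build_docs_glob` function and a hypothetical `MyRepo` prefix:

```python
def build_docs_glob(token: str, prefix: str = "") -> str:
    """Build '<prefix>/**/docs/{TOKEN}/**/*.md' using forward slashes only."""
    # Convert any Windows-style separators in the optional prefix.
    clean_prefix = prefix.replace("\\", "/").rstrip("/")
    # The leading **/ is always present, so docs/ is found at any depth.
    pattern = f"**/docs/{token.upper()}/**/*.md"
    return f"{clean_prefix}/{pattern}" if clean_prefix else pattern


print(build_docs_glob("logging", r"H:\Applications\MyRepo"))
print(build_docs_glob("binary"))
```

Because the recursive segment and the separator style are fixed inside the helper, a caller cannot drop the leading `**/` or leak a backslash into the pattern by accident.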
File layout convention
(See LLM_PROTOCOL_DECISIONS.md entry "Docs migrated to folder+README pattern".)
Topics with multiple files live in named folders: docs/TOPIC/README.md + docs/TOPIC/TOPIC_ISSUES.md + docs/TOPIC/TOPIC_TODO.md (or other TOPIC_*.md companions). Single-file reference docs remain flat at the docs/ root (e.g., docs/ARCHITECTURE.md, docs/GLOSSARY.md).
For each root token, synthesize glob patterns targeting BOTH layouts:
| Token example | Primary glob (folder) | Companion glob (flat + variants) |
|---|---|---|
| logger, log, logging | `**/docs/LOGGING/**/*.md` | `**/docs/LOGGING_*.md` (legacy/variants) |
| binary, serializer | `**/docs/BINARY/**/*.md` | `**/docs/BINARY_*.md` |
| signalr, hub | `**/docs/SIGNALR*/**/*.md` | none (primary glob covers SIGNALR + SIGNALR_BINARY_PROTOCOL folders) |
| protocol, wire, chunked | `**/docs/*PROTOCOL*/**/*.md` | none |
| grid, mggrid | `**/docs/MGGRID/**/*.md` | none |
| architecture, conventions, glossary | none (flat, single-file) | `**/docs/ARCHITECTURE.md`, `**/docs/CONVENTIONS.md`, `**/docs/GLOSSARY.md` |
Do NOT require tokens to match a pre-baked list — construct patterns from the token itself uppercased:
- Primary: `**/docs/{TOKEN}/**/*.md` (matches everything inside the topic folder)
- Companion/variant: `**/docs/{TOKEN}_*.md` (matches flat files or variant prefix folders like `SIGNALR_BINARY_PROTOCOL`)
Natural language variants (logger/logging, serialize/serializer, binary/binaries) should all be attempted against both the primary and companion patterns.
For README.md discovery (folder-navigation rule): if a topic folder match is found, the README.md in that folder is the entry point and MUST be included in the load set (not just sibling _ISSUES / _TODO files).
(See the CRITICAL section at the top of this Step 2 for the full explanation of why the leading **/ is mandatory — this is the most common cause of false-empty docs conclusions.)
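Constructing the patterns from the token itself, rather than from a pre-baked list, might look like the sketch below. The `patterns_for_tokens` name is hypothetical, and the choice to emit both patterns for every natural-language variant follows the rule stated above.

```python
def patterns_for_tokens(variants: list[str]) -> list[str]:
    """Return the primary and companion glob for each variant of a token."""
    patterns: list[str] = []
    for variant in variants:
        token = variant.upper()
        primary = f"**/docs/{token}/**/*.md"   # everything inside the topic folder
        companion = f"**/docs/{token}_*.md"    # flat files and variant prefixes
        for pattern in (primary, companion):
            if pattern not in patterns:
                patterns.append(pattern)
    return patterns


for pattern in patterns_for_tokens(["logger", "logging"]):
    print(pattern)
```

Trying every variant costs a few extra globs, but it avoids guessing whether a folder is named after the noun (`LOGGER`) or the activity (`LOGGING`).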
Step 3 — Execute the Glob and dedupe against already-loaded docs
Run each glob pattern via the host agent's Glob tool. Collect all matching absolute paths.
Dedupe against [LOADED_DOCS: ...] prefix:
- If a match is already in LOADED_DOCS → skip it (Rule #3)
- If a match is under `bin/`, `obj/`, `node_modules/`, `Test_Benchmark_Results/`, or a worktree-backup path → skip it (not framework docs)
If the total match count exceeds 10, narrow the glob pattern (e.g., require domain token near the filename start, not just substring). LLM context is finite.
False-empty guardrail: if the glob returns 0 matches OR all matched files are 0-byte, do NOT conclude "docs are empty" — first re-validate the glob (typo? literal path substituted?) and retry once with the same token under a corrected **/docs/... pattern (NEVER with an ad-hoc path guess). Only after the validated retry also fails should you fall through to code-search.
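The dedupe and exclusion rules of this step can be sketched as a pure filter over the glob results. The function names and sample paths are hypothetical; worktree-backup paths are left out because the text does not say how such paths are named.

```python
from pathlib import PurePosixPath

# Directory names taken from the exclusion rule above.
EXCLUDED_DIRS = {"bin", "obj", "node_modules", "Test_Benchmark_Results"}
MAX_FILES = 10  # threshold above which the glob should be narrowed


def filter_matches(matches: list[str], loaded: set[str]) -> list[str]:
    """Drop already-loaded paths, duplicates, and paths under excluded directories."""
    kept: list[str] = []
    for path in matches:
        directories = PurePosixPath(path).parts[:-1]
        if path in loaded or EXCLUDED_DIRS.intersection(directories):
            continue
        if path not in kept:
            kept.append(path)
    return kept


def needs_narrowing(filtered: list[str]) -> bool:
    """True when the filtered set is too large to load in one invocation."""
    return len(filtered) > MAX_FILES


matches = [
    "/repo/Proj/docs/LOGGING/README.md",
    "/repo/Proj/bin/docs/LOGGING/README.md",
    "/repo/Proj/docs/LOGGING/LOGGING_TODO.md",
]
already_loaded = {"/repo/Proj/docs/LOGGING/LOGGING_TODO.md"}
print(filter_matches(matches, already_loaded))
```

An empty result from this filter is not the same as an empty glob: the false-empty guardrail applies to the raw match list, before deduplication.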
Step 4 — Load the filtered set
Read all remaining matches in parallel (batch the Read calls in one tool-use block). The newly-loaded files will appear in your next response's [LOADED_DOCS: ...] prefix under the +K this turn: <short names> delta, per the active repo's Rule #1 format (basename by default; TOPIC/README.md for topic-folder READMEs to disambiguate across the many README.md files the Pattern-B docs layout introduces).
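The short-name convention for the delta can be illustrated with a small helper. This is a sketch of the naming rule only; `short_name` is a hypothetical function, and the authoritative format remains whatever the active repo's Rule #1 specifies.

```python
from pathlib import PurePosixPath


def short_name(path: str) -> str:
    """Basename by default; 'TOPIC/README.md' for READMEs so they stay distinguishable."""
    p = PurePosixPath(path)
    if p.name == "README.md" and p.parent.name:
        return f"{p.parent.name}/{p.name}"
    return p.name


print(short_name("/repo/Proj/docs/LOGGING/README.md"))
print(short_name("/repo/Proj/docs/LOGGING/LOGGING_TODO.md"))
```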
Step 5 — Respect the paired-docs convention
If any {DOMAIN}.md is loaded (e.g., LOGGING.md), ALSO glob and load its companions:
- `{DOMAIN}_ISSUES.md`: known issues / limitations / workarounds
- `{DOMAIN}_TODO.md`: planned work / open tickets
These are paired docs and must be loaded as a set. Skipping ISSUES/TODO risks reintroducing fixed bugs or conflicting with ongoing refactors.
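Deriving the companion globs from a loaded main doc is mechanical. In the sketch below, `companion_globs` is a hypothetical helper, and the `**/docs/**/` shape is an assumption chosen so that both the flat and the folder layout are covered.

```python
def companion_globs(main_doc: str) -> list[str]:
    """For a main doc such as 'LOGGING.md', return globs for its ISSUES/TODO companions."""
    domain = main_doc.removesuffix(".md")
    return [
        f"**/docs/**/{domain}_ISSUES.md",  # known issues / limitations / workarounds
        f"**/docs/**/{domain}_TODO.md",    # planned work / open tickets
    ]


print(companion_globs("LOGGING.md"))
```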
Archive files (*_<year>.md)
Closed entries from _ISSUES.md / _TODO.md / LLM_PROTOCOL_DECISIONS.md may be rotated into year-bucketed archive files by the docs-archive skill. Examples:
- `LOGGING_ISSUES_2025.md`
- `BINARY_TODO_2026.md`
- `LLM_PROTOCOL_DECISIONS_2026.md`
Default behaviour: NOT auto-loaded
The Step 2 glob patterns target active companions only — unsuffixed names. Year-suffixed variants are excluded by default. Practically:
- `**/docs/{TOPIC}/{TOPIC}_ISSUES.md` matches; `**/docs/{TOPIC}/{TOPIC}_ISSUES_2025.md` does NOT.
- If a generic `{TOPIC}_*.md` pattern inadvertently matches year-suffixed files, filter them out before passing to Step 4 (Load).
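Filtering year-suffixed files out of a generic match set can be done with one regular expression. A minimal sketch, assuming the archive suffix is always an underscore followed by exactly four digits; the function name is hypothetical.

```python
import re

# Assumption: archive files end in _<4-digit year>.md, as in LOGGING_ISSUES_2025.md.
ARCHIVE_RE = re.compile(r"_\d{4}\.md$")


def split_active_and_archive(paths: list[str]) -> tuple[list[str], list[str]]:
    """Separate active docs from year-bucketed archive files."""
    active = [p for p in paths if not ARCHIVE_RE.search(p)]
    archive = [p for p in paths if ARCHIVE_RE.search(p)]
    return active, archive


active, archive = split_active_and_archive([
    "docs/LOGGING/README.md",
    "docs/LOGGING/LOGGING_ISSUES.md",
    "docs/LOGGING/LOGGING_ISSUES_2025.md",
])
print("load now:", active)
print("on demand:", archive)
```

Only the `active` list goes on to Step 4; the `archive` list is kept aside for the on-demand signals.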
On-demand read (no user-confirm needed — read-only operation)
Read an archive file when ANY of these signals appears:
- A loaded entry references an archived ID (e.g., `Superseded by ACCORE-LOG-I-K7M2` where the random-suffixed ID resolves only to a `_<year>.md` archive)
- A code comment or other doc references an ID resolving only to an archive file
- The user's request describes a behaviour pattern matching an archived `Fixed` entry's Description (regression suspicion)
- The investigation feels like "this was solved before": read the topic's archive(s) before re-deriving
- The user explicitly asks about historical context
When read: include in [LOADED_DOCS] like any other .md. Rule #3 (no-re-read) applies. Cite from it like the active file.
This is a read — Rule #5 (consent for modifications) is not engaged. The "don't pre-load" rule is about token economy, not access control.
Glob recap
Active-only (default for topic discovery):
- `**/docs/{TOKEN}/{TOKEN}_ISSUES.md`
- `**/docs/{TOKEN}/{TOKEN}_TODO.md`
- `**/docs/{TOKEN}/README.md`
On-demand archive lookup:
- `**/docs/{TOKEN}/{TOKEN}_ISSUES_*.md` (where `*` matches a 4-digit year)
- `**/docs/{TOKEN}/{TOKEN}_TODO_*.md`
- `**/LLM_PROTOCOL_DECISIONS_*.md`
Cross-repo ID search
When a request asks about all entries of a topic across the workspace (e.g., "show me all logger issues" — potentially spanning multiple repos, each possibly having its own logger-related topic under its own prefix), use prefix-wildcard globs against the entry IDs:
- `**/*-LOG-I-*`: all logger-issue IDs, any repo prefix
- `**/*-LOG-*`: broader, all logger entries (issues + TODOs + bugs across any prefix)
- `**/*-XCUT-I-*`: all cross-cutting issue IDs across any prefix
The wildcard pattern is repo-agnostic — no central prefix list is consulted; the LLM filters by prefix after retrieval.
Active topic files are scanned by default; archive files follow the on-demand rules above. Filter results by the <PREFIX> dimension after retrieval to narrow to a specific repo scope.
The skill maintains no cross-repo index — globs do the work. This complements topic-folder discovery (Step 2): folder-discovery loads files by token-to-folder mapping; cross-repo ID search resolves specific entries by <PREFIX>-<TOPIC>-<TYPE>-<RAND> pattern.
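Retrieving by wildcard and then filtering by prefix could be sketched as follows. Several things here are assumptions: the IDs are taken to have already been collected into a list, the leading `**/` is dropped because the match runs against bare IDs rather than paths, and the segment shapes in `ENTRY_ID_RE` are inferred from the single example `ACCORE-LOG-I-K7M2`. All sample IDs other than that one are made up for illustration.

```python
import re
from fnmatch import fnmatchcase
from typing import Optional

# Assumed shape: <PREFIX>-<TOPIC>-<TYPE>-<RAND>, uppercase alphanumeric segments.
ENTRY_ID_RE = re.compile(
    r"^(?P<prefix>[A-Z0-9]+)-(?P<topic>[A-Z0-9]+)-(?P<type>[A-Z]+)-(?P<rand>[A-Z0-9]+)$"
)


def find_entries(ids: list[str], id_glob: str, prefix: Optional[str] = None) -> list[str]:
    """Match IDs against a wildcard, then optionally narrow to one repo prefix."""
    hits = [entry for entry in ids if fnmatchcase(entry, id_glob)]
    if prefix is None:
        return hits
    narrowed = []
    for entry in hits:
        parsed = ENTRY_ID_RE.match(entry)
        if parsed and parsed.group("prefix") == prefix:
            narrowed.append(entry)
    return narrowed


# ACCORE-LOG-I-K7M2 comes from the text above; the other IDs are hypothetical.
sample_ids = ["ACCORE-LOG-I-K7M2", "OTHER-LOG-T-AB12", "ACCORE-XCUT-I-ZZ99"]
print(find_entries(sample_ids, "*-LOG-I-*"))
print(find_entries(sample_ids, "*-LOG-*", prefix="OTHER"))
```

The wildcard stage stays repo-agnostic, and the prefix is applied only afterwards, which matches the retrieve-then-filter order described above.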
Step 6 — Proceed to the user's task
The response's [LOADED_DOCS: N files (+K this turn: <basenames>)] prefix (per the active repo's Rule #1) already surfaces the newly-loaded filenames and the cumulative count. No separate confirmation line is needed — the prefix itself is the confirmation. Continue directly to the user's actual request.
If any relevant docs were skipped as already-loaded (Rule #3 dedupe), you MAY mention them inline where relevant (e.g., "I already have LOGGING.md from earlier"). Do not reiterate the full loaded list.
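For illustration, the prefix could be assembled like this. The sketch follows the `[LOADED_DOCS: N files (+K this turn: <basenames>)]` shape quoted above; the comma separator and the omission of the delta when nothing new was loaded are assumptions, since the authoritative format lives in the active repo's Rule #1. The `loaded_docs_prefix` name is hypothetical.

```python
def loaded_docs_prefix(total: int, new_names: list[str]) -> str:
    """Format the response prefix from the cumulative count and this turn's short names."""
    delta = ""
    if new_names:
        # Assumption: short names are comma-separated inside the delta.
        delta = f" (+{len(new_names)} this turn: {', '.join(new_names)})"
    return f"[LOADED_DOCS: {total} files{delta}]"


print(loaded_docs_prefix(5, ["LOGGING/README.md", "LOGGING_TODO.md"]))
```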
Do NOT
- Re-read any `.md` file already in `[LOADED_DOCS: ...]`: the no-re-read rule is absolute (check the active repo's `copilot-instructions.md` for the authoritative phrasing; rule number may differ per repo). The only exception: user explicitly states the file has changed on disk via external means.
- Load unrelated domains: if the user asks about the Logger, don't load SignalR docs "just in case".
- Load more than ~10 files in a single invocation: if the glob matches more, refine the pattern. If the request truly spans many domains, split into multiple sequential invocations with narrower scope each.
- Skip folder `README.md`: if the active repo's conventions include a folder-navigation / folder-README-first rule, honour it. `README.md` in a loaded `docs/` folder is always in scope.
Tool usage
This skill is tool-neutral. Map these capabilities to the host agent's tools (per the active repo's CLAUDE.md):
- Globbing file paths: `Glob` (Claude Code), `file_search` (Copilot), `Get-ChildItem -Filter`
- Reading files: `Read` (Claude Code), `get_file` (Copilot)
- Parallelizing reads: issue multiple tool calls in a single response where the host supports it
Edge cases
- No matching docs found: Emit `> docs-discovery: no .md matches for tokens [list]. Proceeding with code-search only.` This is informational: the task may be in a domain without documentation, which is itself a signal to be careful.
- Token extraction is ambiguous: Prefer SUPERSET. Load a few extra .md files rather than missing relevant ones. Loading 3 extra docs is cheap; missing ISSUES.md and reintroducing a fixed bug is expensive.
- User says "don't load docs" / "just search the code": Respect it. Skip this skill entirely for that turn.
- Recursive trigger: If loaded docs reference other `.md` files via cross-reference, do NOT auto-follow unless the user's request explicitly extends to them. Cross-refs can cascade; relevance-bounded glob is the primary mechanism.