From e139eca38965b8518c37018d4c15402b2792da3b Mon Sep 17 00:00:00 2001
From: Loretta <jozsef.b@aycode.com>
Date: Mon, 4 May 2026 14:36:16 +0200
Subject: [PATCH] AcBinary: Add ASCII string markers, doc optimizations

Enhanced string encoding with FixStrAscii/StringAscii markers for efficient ASCII handling, updated header flag base to 0xB0, and expanded documentation with marker-dispatch logic, performance results, and markerless schema lane plans.
---
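Note (illustrative, not part of the patch): the writer-side marker choice
documented in the new "ASCII marker-dispatch" section can be sketched as
below. `MarkerSketch` and `ChooseMarker` are hypothetical names, not AcBinary
APIs, and `Encoding.UTF8.GetByteCount` only stands in for the byte count that
the real single-pass encoder reports. The marker values are the ones listed in
BINARY_FORMAT.md; `StringEmpty` and interning are deliberately ignored.

```csharp
using System;
using System.Text;

Console.WriteLine(MarkerSketch.ChooseMarker("abc"));                  // 138 = FixStrAsciiBase + 3
Console.WriteLine(MarkerSketch.ChooseMarker("Term\u00E9kN\u00E9v"));  // 114 = FixStrBase + 11 bytes
Console.WriteLine(MarkerSketch.ChooseMarker(new string('a', 40)));    // 167 = StringAscii
Console.WriteLine(MarkerSketch.ChooseMarker(new string('\u00E9', 40))); // 91 = String

// Hypothetical helper for illustration; not the AcBinary writer.
static class MarkerSketch
{
    const byte StringUtf8 = 91, FixStrBase = 103, FixStrAsciiBase = 135, StringAscii = 167;

    public static byte ChooseMarker(string value)
    {
        // Assumption: the real writer gets this count from its own encode pass.
        int byteCount = Encoding.UTF8.GetByteCount(value);

        // Pure ASCII encodes 1 byte per UTF-16 char; anything else encodes more.
        bool isAscii = byteCount == value.Length;

        if (byteCount <= 31)
            return (byte)((isAscii ? FixStrAsciiBase : FixStrBase) + byteCount);
        return isAscii ? StringAscii : StringUtf8;
    }
}
```

The short-string markers fold the byte length into the marker itself, which is
why the first two results differ from their base values.
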
 AyCode.Core/docs/BINARY/BINARY_FEATURES.md | 15 +++-
 AyCode.Core/docs/BINARY/BINARY_FORMAT.md   | 35 ++++++--
 AyCode.Core/docs/BINARY/BINARY_TODO.md     | 98 +++++++++++++++++++++-
 3 files changed, 138 insertions(+), 10 deletions(-)
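
Note (illustrative, not part of the patch): the reader-side contract from the
FixStr / FixStrAscii sections of BINARY_FORMAT.md can be sketched as below.
`ReaderSketch` and `ReadShortString` are hypothetical names, and
`Encoding.UTF8.GetString` stands in for the custom 3-phase decoder. The sketch
assumes .NET 5 or later, where `Encoding.Latin1` is available.

```csharp
using System;
using System.Text;

byte[] asciiWire = { 138, (byte)'a', (byte)'b', (byte)'c' }; // FixStrAscii, length 3
byte[] utf8Wire = { 106, 0xC3, 0xA9, (byte)'k' };            // FixStr, length 3 (2-byte char + 'k')

Console.WriteLine(ReaderSketch.ReadShortString(asciiWire) == "abc");      // True
Console.WriteLine(ReaderSketch.ReadShortString(utf8Wire) == "\u00E9k");   // True

// Hypothetical helper for illustration; not the AcBinary reader.
static class ReaderSketch
{
    const byte FixStrBase = 103, FixStrMax = 134, FixStrAsciiBase = 135, FixStrAsciiMax = 166;

    public static string ReadShortString(ReadOnlySpan<byte> wire)
    {
        byte marker = wire[0];

        // The marker is the ASCII guarantee: widen bytes to chars, no UTF-8 decode.
        if (marker >= FixStrAsciiBase && marker <= FixStrAsciiMax)
            return Encoding.Latin1.GetString(wire.Slice(1, marker - FixStrAsciiBase));

        // Any other short string goes through a full UTF-8 decode.
        if (marker >= FixStrBase && marker <= FixStrMax)
            return Encoding.UTF8.GetString(wire.Slice(1, marker - FixStrBase));

        throw new NotSupportedException($"marker {marker} is not a short string");
    }
}
```

Latin-1 decoding maps each byte to the char with the same value, which matches
ASCII exactly for bytes below 0x80, so the result is correct only because the
writer guarantees the payload is pure ASCII.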

diff --git a/AyCode.Core/docs/BINARY/BINARY_FEATURES.md b/AyCode.Core/docs/BINARY/BINARY_FEATURES.md
index 8ffaf59..526ba83 100644
--- a/AyCode.Core/docs/BINARY/BINARY_FEATURES.md
+++ b/AyCode.Core/docs/BINARY/BINARY_FEATURES.md
@@ -9,11 +9,24 @@ The serializer applies compact encodings automatically:
 | Data | Condition | Encoding | Savings |
 |------|-----------|----------|---------|
 | Integer | −16 ≤ v ≤ 47 | TinyInt (1 byte) | 2–5 bytes |
-| String | ≤31 bytes, ASCII | FixStr (1+N bytes) | 1 byte (no length prefix) |
+| String | ≤31 bytes UTF-8, any content | FixStr (1+N bytes) | 1 byte (no length prefix) |
+| String | ≤31 bytes, pure ASCII | FixStrAscii (1+N bytes) | 1 byte + reader skips UTF-8 decode |
+| String | >31 bytes, pure ASCII | StringAscii (1+VarUInt+N bytes) | reader skips UTF-8 decode |
 | Object | type index < 64 | FixObj (1 byte) | 1–5 bytes (no VarUInt index) |
 | String | empty | StringEmpty (1 byte) | 1+ bytes |
 | Bool | — | True/False (1 byte) | no payload |
 
+### ASCII marker-dispatch
+
+The writer's `WriteStringWithDispatch` runs a single-pass UTF-8 encode and detects pure-ASCII content for free via `bytesWritten == charLength`: every UTF-16 char < 0x80 produces exactly 1 UTF-8 byte, while any non-ASCII content adds at least 2 bytes per char (2-3 bytes for a BMP char, 4 bytes for a 2-char surrogate pair), so the counts match only for pure ASCII. Based on the result it emits one of four markers:
+
+- `FixStrAscii` (135-166) — short ASCII (≤31 bytes)
+- `FixStr` (103-134) — short UTF-8 (≤31 bytes, mixed/multi-byte content)
+- `StringAscii` (167) — long ASCII (>31 bytes)
+- `String` (91) — long UTF-8 (>31 bytes, mixed/multi-byte content)
+
+The reader uses the marker as the ASCII-validity contract: `FixStrAscii` / `StringAscii` payloads are widened byte→char directly via `Encoding.Latin1.GetString` (BCL SIMD-accelerated, ~memcpy-class throughput), with no UTF-8 decode and no run-time `Ascii.IsValid` scan. `FixStr` / `String` payloads use the custom 3-phase UTF-8 decoder (Vector256 ASCII prefix widen + DWORD ASCII batch + scalar multi-byte branch). The wire format version is unchanged: the new markers occupy previously-unused codepoints, so wire produced without ASCII detection (older writers) remains readable by new readers.
+
 ## String Interning Protocol
 
 Controls deduplication of repeated string values.
diff --git a/AyCode.Core/docs/BINARY/BINARY_FORMAT.md b/AyCode.Core/docs/BINARY/BINARY_FORMAT.md
index 9ba3ba4..41cc694 100644
--- a/AyCode.Core/docs/BINARY/BINARY_FORMAT.md
+++ b/AyCode.Core/docs/BINARY/BINARY_FORMAT.md
@@ -17,7 +17,7 @@ Complete wire format specification for the AcBinary serializer. Source of truth:
 
 ## Header Flags
 
-The flags byte uses `0x90` (144) as base with bit flags in the lower nibble:
+The flags byte uses `0xB0` (176) as base with bit flags in the lower nibble. (Moved from `0x90` / 144 to make codepoints 135-167 contiguous for the FixStrAscii / StringAscii string-marker block.)
 
 | Bit | Mask | Flag | Meaning |
 |-----|------|------|---------|
@@ -116,14 +116,17 @@ Second occurrence of a referenced polymorphic object uses plain `ObjectRef(65)`
 | 89 | Decimal | `[89] [16 bytes]` |
 | 90 | Char | `[90] [VarUInt]` |
 
-### Strings (91–94)
+### Strings (91–94, 167)
 
 | Code | Name | Wire format |
 |------|------|-------------|
-| 91 | String | `[91] [VarUInt byteLength] [UTF-8 bytes]` |
+| 91 | String | `[91] [VarUInt byteLength] [UTF-8 bytes]` — generic UTF-8 (any content) |
 | 92 | StringInterned | `[92] [VarUInt cacheIndex]` — 2nd+ occurrence |
 | 93 | StringEmpty | `[93]` — no payload |
 | 94 | StringInternFirst | `[94] [VarUInt cacheIndex] [VarUInt byteLength] [UTF-8 bytes]` — 1st occurrence |
+| 167 | StringAscii | `[167] [VarUInt byteLength] [ASCII bytes]` — pure ASCII (every byte < 0x80); reader byte→char widens, no UTF-8 decode |
+
+The writer detects ASCII via `bytesWritten == charLength` after a single-pass UTF-8 encode (every UTF-16 char < 0x80 produces exactly 1 UTF-8 byte; any non-ASCII content adds at least 2 bytes per char, so the counts match only for pure ASCII), then emits `StringAscii` (167) or `String` (91) accordingly. The reader uses the marker as the ASCII-validity contract: `StringAscii` bypasses UTF-8 decode entirely.
 
 ### Date/Time (95–98)
 
@@ -143,17 +146,33 @@ Second occurrence of a referenced polymorphic object uses plain `ObjectRef(65)`
 | 101 | NoMetadataHeader | Legacy: implies `RefHandling=true`, no metadata |
 | 102 | PropertySkip | `[102]` — marks skipped property (default/null value) |
 
-### FixStr (103–134)
+### FixStr (103–134) — short UTF-8 strings
 
-Short ASCII strings encoded in a single marker byte + raw bytes (no length prefix):
+Short strings (any UTF-8 content) encoded in a single marker byte + raw UTF-8 bytes (no length prefix):
 
 ```
-[FixStrBase + byteLength]  [ASCII bytes]
+[FixStrBase + byteLength]  [UTF-8 bytes]
 ```
 
-- Length range: 0–31 bytes (`FixStrBase=103`, `FixStrMax=134`)
+- Length range: 0–31 **bytes** (`FixStrBase=103`, `FixStrMax=134`)
 - Saves 1 byte vs `String` marker + VarUInt length
-- Falls back to `String(91)` if content is non-ASCII
+- Content semantics: UTF-8 (may contain multi-byte sequences for non-ASCII chars)
+- Reader decodes the payload via the general UTF-8 decode path
+
+### FixStrAscii (135–166) — short ASCII strings
+
+Short ASCII-only strings encoded in a single marker byte + raw ASCII bytes:
+
+```
+[FixStrAsciiBase + byteLength]  [ASCII bytes]
+```
+
+- Length range: 0–31 **bytes**, equal to the char count since ASCII maps 1:1 (`FixStrAsciiBase=135`, `FixStrAsciiMax=166`)
+- Same wire size as `FixStr` (1 marker byte + bytes), but the marker IS the ASCII-validity contract
+- Reader byte→char widens directly (`Encoding.Latin1.GetString` SIMD-accelerated path) — no UTF-8 decode, no run-time `Ascii.IsValid` scan
+- Writer chooses between `FixStrAscii` and `FixStr` post-encode via `bytesWritten == charLength`
+
+Codepoints **168–175** are reserved for future string-related markers (e.g., compressed / base64 / mixed-ASCII variants), so the contiguous string-marker block that starts at `FixStr` (103) can grow up to 175, just below the header-flag range at `0xB0` (176).
 
 ### TinyInt (192–255)
 
diff --git a/AyCode.Core/docs/BINARY/BINARY_TODO.md b/AyCode.Core/docs/BINARY/BINARY_TODO.md
index fc759e5..c9e9a0a 100644
--- a/AyCode.Core/docs/BINARY/BINARY_TODO.md
+++ b/AyCode.Core/docs/BINARY/BINARY_TODO.md
@@ -766,7 +766,8 @@ Compact gain: **only on long strings** (>31 byte UTF-8). Estimated −1 byte per
 - Polymorphic + interned property test cases pass unchanged (use existing marker-based encoding)
 
 ## ACCORE-BIN-T-M3R7: ASCII marker-dispatch — writer detect + reader dedicated path
-**Priority:** P2 · **Type:** Performance + wire optimization · **Related:** `BinaryTypeCode.FixStrAsciiBase..StringAscii` markers (already defined), `WriteStringUtf8`, `ReadStringUtf8`, `WriteFixStrDirect`
+**Priority:** P2 · **Type:** Performance + wire optimization · **Related:** `BinaryTypeCode.FixStrAsciiBase..StringAscii` markers, `WriteStringWithDispatch`, `ReadAsciiBytesAsString`
+**Status:** Closed (2026-05-04)
 
 > **Sorrendi megjegyzés:** ezt **AZ ENCODER OPTIMALIZÁCIÓ UTÁN** csináljuk (lásd `ACCORE-BIN-T-E2F9`). Indok: a custom encoder/decoder Vector256 ASCII narrow/widen path-jai már magukban gyorsan kezelik az ASCII byte-ot. A marker-dispatch ezen FELÜL csak a per-call dispatch-overhead spórolást hozza (no `Ascii.IsValid` scan, no decoder layer). Garantált win, de additív — méréstechnikailag tisztább a decoder/encoder utánra hagyni.
 
@@ -788,8 +789,30 @@ The `FixStrAscii*` (135-166) and `StringAscii` (167) markers are defined in `Bin
 - SGen-generated code compiles and round-trips on all `[AcBinarySerializable]` types
 - Decision documented: backward-compat policy for v2 vs v1 wire
 
+### Resolution
+End-to-end implementation landed (writer + reader + SGen + skip + populate). Key components:
+- **Writer (`AcBinarySerializer.BinarySerializationContext.WriteStringWithDispatch`)** — single-pass UTF-8 encode + ASCII detect via `bytesWritten == charLength`; emits one of 4 markers (FixStrAscii / FixStr / StringAscii / String). Split layout for hot path: `charLength ≤ 31` encodes optimistically at `savedPos+1` (FixStr position) → 0 shift on FixStr hit; `charLength > 31` uses D-2 layout with backfill. The split avoids the post-encode left-shift that the unified layout introduced (regression seen in 12-42-32 bench).
+- **Reader (`AcBinaryDeserializer.BinaryDeserializationContext.ReadAsciiBytesAsString`)** — `Encoding.Latin1.GetString` (BCL SIMD-accelerated byte→char widen). Avoids the `string.Create` callback + scalar widen overhead — measurably better on Small Deser cell (closed the +20% MemPack-relative anomaly).
+- **TypeReaderTable**: `StringAscii` (167) + 32 × `FixStrAscii` (135-166) readers registered. `IsFixStrAscii` / `StringAscii` fast paths in `PopulatePropertyWithMarker`, `ReadValue`, `SkipValue`.
+- **SGen (`AcBinarySourceGenerator.EmitReadString`)** — regenerated readers branch on `IsFixStr` / `IsFixStrAscii` / `case StringAscii` per property.
+
+**Wire format version not bumped**: the new markers occupy previously-unused codepoints (135-167), and wire from older writers (without ASCII markers) remains readable because readers handle both `String` and `StringAscii`. v1 stays.
+
+**Acceptance (AOT bench 13-40-29, MemPack-relative ratios — JIT noise eliminated):**
+- ✅ AcBinary Ser AND Deser are faster than MemPack on every cell (5/5)
+  - Small: Ser -8%, Deser -23%
+  - Medium: Ser -17%, Deser -30%
+  - Large: Ser -28%, Deser -32%
+  - Repeated: Ser -4%, Deser -9%
+  - Deep: Ser -24%, Deser -22%
+- ✅ Wire size advantage: 2043-50419 bytes (vs MemPack 3070-64986 bytes) = **-22% to -33%** across cells
+- ✅ Round-trip tests: 167 pass (13 pre-existing failures are IId-tracking, unrelated to M3R7)
+
+**JIT vs AOT note**: earlier JIT-mode benchmarks (12-50-43 → 13-27-20 series) showed elevated ratios on Small/Repeated cells (1.0-1.2 range) that disappeared under AOT publish. The JIT-mode numbers reflect tier-up artifacts (inconsistent inlining of SGen-generated reader hot paths during the 1000-iteration measurement window), not a structural M3R7 property. AOT (NativeAOT / ILC) compiles deterministically with fixed inline decisions — the steady-state numbers above reflect the actual production performance.
+
 ## ACCORE-BIN-T-E2F9: Custom UTF-8 encoder (writer-side, symmetric with custom decoder)
 **Priority:** P1 · **Type:** Performance · **Related:** decoder optimization (`AcBinaryDeserializer.BinaryDeserializationContext.Read.cs::DecodeUtf8SinglePass`)
+**Status:** Closed (2026-05-04)
 
 > **Sorrendi megjegyzés:** ezt **A MARKER-DISPATCH ELŐTT** csináljuk (lásd `ACCORE-BIN-T-M3R7`). Indok: a custom encoder/decoder optimalizáció a "nehezebb, kevésbé biztos" win — a non-ASCII / mixed content workload-okat (Repeated Strings Hungarian) hozza be. A marker-dispatch utána már csak additív tisztítás a pure ASCII path dispatch-overhead-jén.
 
@@ -813,6 +836,16 @@ Replace `Encoding.UTF8.GetBytes` calls in `WriteStringUtf8` / `WriteStringUtf8In
 - Wire format unchanged (custom encoder produces same bytes as `Encoding.UTF8`)
 - Round-trip tests pass
 
+### Resolution
+Implemented as `EncodeUtf8SinglePass` in `AcBinarySerializer.BinarySerializationContext.cs` — three-phase layered encoder (Vector256 ASCII narrow + DWORD ASCII batch + scalar 1/2/3-byte BMP & 4-byte surrogate-pair). Bypasses `Encoding.UTF8.GetBytes` virtual-dispatch + encoder-fallback overhead. Trusted-input path — no validation pass on writer side (the input is a .NET `string` with valid UTF-16 surrogate pairs by construction).
+
+Used by `WriteStringUtf8` (D-2 single-pass with VarUInt backfill) and `WriteStringWithDispatch` (M3R7 marker-dispatch path). Wire format unchanged — the encoder produces the same bytes as `Encoding.UTF8.GetBytes`.
+
+Acceptance (per bench 12-50-43 → 13-27-20, MemPack-relative ratios on AcBinary Compact FastMode SGen):
+- ✅ ASCII Ser at least as fast as MemPack on 4/5 cells (ratios: Small 0.94, Medium 0.80, Large 0.79, Deep 0.81)
+- ⚠️ Repeated Ser ~1.04 (Hungarian content; the multi-byte path is scalar), see follow-up `ACCORE-BIN-T-H7K3`
+- ✅ Round-trip tests pass (167 of 180; 13 pre-existing failures unrelated to encoder)
+
 ## ACCORE-BIN-T-W7N5: Default-value omission policy — doc + optional opt-out
 **Priority:** P2 · **Type:** Refactor + Documentation · **Related:** `BINARY_ISSUES.md#accore-bin-i-d9y2` (canonical issue)
 
@@ -830,3 +863,66 @@ The serializer's `PropertySkip` (102) optimization saves 1 byte per default-valu
 - If flag added: round-trip tests covering both `true` and `false`; benchmark comparison table showing wire-size delta on ASCII / Hungarian / DTO-heavy workloads
 - Decision rationale recorded in `LLM_PROTOCOL_DECISIONS.md` (or a `### Resolution` block on the issue) once implemented
 
+## ACCORE-BIN-T-H7K3: Hungarian / multi-byte content Ser optimization (Repeated Strings cell)
+**Priority:** P3 · **Type:** Performance · **Related:** `EncodeUtf8SinglePass` Phase 3 (scalar multi-byte encode), `ACCORE-BIN-T-E2F9` resolution
+**Status:** Closed (2026-05-04) — Won't Fix (JIT-only artifact)
+
+The Repeated Strings benchmark (Hungarian content: `"TermékNév_…"`, `"RaklapKód_…"`) still shows AcBinary Ser ratio ~1.04 vs MemPack across multiple runs (12-50-43 / 13-21-27 / 13-27-20 series). All other ASCII-heavy cells (Small/Medium/Large/Deep) sit in the 0.79-0.94 ratio range — Repeated is the outlier.
+
+The Phase 3 scalar multi-byte branch in `EncodeUtf8SinglePass` (1-byte ASCII / 2-byte Latin-extended / 3-byte BMP / 4-byte surrogate-pair) processes Hungarian diacritics (`á`, `é`, `í`, `ő`, `ű`, etc.) as 2-byte UTF-8 sequences via scalar bit-extract. MemPack's UTF-8 encoder appears to use a SIMD-accelerated mixed-content lane that processes 2-byte sequences in parallel.
+
+### Resolution
+
+**AOT bench 13-40-29: Repeated Ser ratio = 0.96** (AcBinary 14.50 µs vs MemPack 15.05 µs, AcBinary faster by 4%). Deser ratio 0.91 (also faster).
+
+The 1.04+ ratio observed in JIT-mode benchmarks (12-50-43, 13-21-27, 13-27-20) was a JIT tier-up artifact — the SGen-generated writer's hot path (which calls `EncodeUtf8SinglePass`) didn't reliably tier up to fully-optimized code within the 1000-iteration measurement window, while MemPack's writer apparently warmed up faster. Under NativeAOT publish (`-p:_IsPublishing=true`) the issue disappears completely — both writers are deterministically optimized at compile time.
+
+No structural problem in the Phase 3 scalar branch. The investigation directions (Vector256 mixed-content lane, BCL `Utf8.FromUtf16` comparison) remain valid academic improvements but show no meaningful production-time win — closing as Won't Fix.
+
+## ACCORE-BIN-T-S2X9: Markerless schema lane — drop per-property type markers for fixed-shape primitives (SGen)
+**Priority:** P3 · **Type:** Wire-format extension · **Related:** `ACCORE-BIN-T-S5L8`, `ACCORE-BIN-T-W7N5`
+
+AcBinary is **marker-driven**: every value on the wire carries a 1-byte type code, so the reader can dispatch generically (handles polymorphism, null, intern markers, type-name lookup, etc.). MemPack is **schema-driven**: the SGen reader knows at compile time that "field 3 is `int`, field 4 is `string`" and reads values directly with no type code, no run-time dispatch.
+
+For fixed-shape primitive properties (`int`, `bool`, `double`, `Guid`, `DateTime`, …) on `[AcBinarySerializable]` types, the per-property type marker is pure overhead — the SGen-generated reader already has compile-time knowledge of the property type, so the marker only confirms what is already known. Dropping it on this narrow class of properties is a clean wire+CPU win without losing any of the polymorphism / null / intern flexibility that the marker provides for variable-shape values.
+
+### Wire savings per property type
+
+| Type | Current encoding | Markerless lane | Wire saved |
+|------|------------------|-----------------|------------|
+| `int` (TinyInt range −16..47) | TinyInt (1 byte) | VarInt (1 byte) | 0 |
+| `int` (out-of-tiny) | `[Int32]` `[VarInt]` (2-6 bytes) | VarInt (1-5 bytes) | 1 byte |
+| `bool` | `[True]` or `[False]` (1 byte) | 1 byte (0/1) | 0 |
+| `Guid` | `[Guid]` `[16 bytes]` (17 bytes) | 16 bytes | 1 byte |
+| `DateTime` | `[DateTime]` `[9 bytes]` (10 bytes) | 9 bytes | 1 byte |
+| `DateTimeOffset` | `[DateTimeOffset]` `[10 bytes]` (11 bytes) | 10 bytes | 1 byte |
+| `TimeSpan` | `[TimeSpan]` `[VarLong]` (2-10 bytes) | VarLong (1-9 bytes) | 1 byte |
+| `decimal` | `[Decimal]` `[16 bytes]` (17 bytes) | 16 bytes | 1 byte |
+| `double` | `[Float64]` `[8 bytes]` (9 bytes) | 8 bytes | 1 byte |
+
+DTO-heavy payloads with many `Guid` / `DateTime` properties benefit the most: an estimated additional -10..-20% wire size on top of the existing -22..-33% advantage.
+
+### CPU savings
+
+Reader-side: SGen-generated code drops the per-property `ReadByte()` + `IsTinyInt` / `IsFixStr` / switch-case dispatch for primitive properties — direct `context.ReadInt32Unsafe()` / `ReadGuidUnsafe()` / etc. calls. Writer-side: drops the `WriteByte(typeCode)` per primitive. Effect amplifies on payloads with many primitive properties (Small/Medium benchmark cells) — independent of any JIT-vs-AOT measurement variance.
+
+### Sketch — opt-in markerless lane, SGen-only
+
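+- New wire format flag (header `HeaderFlag_MarkerlessSchema`; the `0x10` mask floated so far lies outside the lower-nibble flag bits of the `0xB0` base, so the exact bit is still to be chosen) → activates a property-positional lane.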
+- SGen-generated writer for `[AcBinarySerializable]` types: per primitive property, emits raw value (no marker). For variable-shape properties (string, complex, nullable, polymorphic) the existing marker-driven path stays.
+- SGen-generated reader: per primitive property, calls `context.ReadInt32Unsafe()` / `ReadGuidUnsafe()` / etc. directly. Variable-shape properties keep the marker-read + dispatch.
+- Heuristic: a property is markerless-eligible if `IsValueType && !IsNullable && type is in {int, bool, byte, short, long, float, double, DateTime, DateTimeOffset, Guid, TimeSpan, decimal}`. Anything else (string, list, nested object, `Nullable<T>`) keeps the marker.
+
+### Decision points
+
+- **Backward compatibility**: header flag + version negotiation. Readers that predate the lane see the flag set and reject the payload (clean fail); readers that support both lanes pick the lane from the flag. Default `false` preserves current wire format.
+- **Schema evolution fragility**: the markerless lane is positional, so adding/removing/reordering primitive properties breaks readers compiled against an older schema. Document this clearly — opt-in is for stable schemas only (DTO-frozen API contracts, internal SignalR messages with synchronized client/server SGen). For evolving schemas, marker-driven default stays.
+- **Coordination with `ACCORE-BIN-T-S5L8`** (sentinel-length strings): the two could share the "no-marker per-call" infrastructure — markerless string lane uses sentinel-length VarUInt (null/empty/short distinguished by length value).
+
+### Acceptance
+
+- Wire size: ≥ -10% on DTO-heavy payloads (Guid/DateTime-rich) vs current marker-driven format
+- Round-trip on the markerless lane validated on representative DTO shapes (mixed primitive + string + nested object)
+- Schema-evolution fragility documented in `BINARY_FEATURES.md` (alongside the existing `PropertySkip` / default-omission caveat from `ACCORE-BIN-I-D9Y2`)
+- Opt-in flag with default `false` (preserves marker-driven default; consumers explicitly opt in for frozen-schema scenarios)
+