From e139eca38965b8518c37018d4c15402b2792da3b Mon Sep 17 00:00:00 2001
From: Loretta
Date: Mon, 4 May 2026 14:36:16 +0200
Subject: [PATCH] [LOADED_DOCS: 2 files, no new loads] AcBinary: Add ASCII string markers, doc optimizations

Enhanced string encoding with FixStrAscii/StringAscii markers for efficient ASCII handling, updated header flag base to 0xB0, and expanded documentation with marker-dispatch logic, performance results, and markerless schema lane plans.
---
 AyCode.Core/docs/BINARY/BINARY_FEATURES.md | 15 +++-
 AyCode.Core/docs/BINARY/BINARY_FORMAT.md   | 35 ++++++--
 AyCode.Core/docs/BINARY/BINARY_TODO.md     | 98 +++++++++++++++++++++-
 3 files changed, 138 insertions(+), 10 deletions(-)

diff --git a/AyCode.Core/docs/BINARY/BINARY_FEATURES.md b/AyCode.Core/docs/BINARY/BINARY_FEATURES.md
index 8ffaf59..526ba83 100644
--- a/AyCode.Core/docs/BINARY/BINARY_FEATURES.md
+++ b/AyCode.Core/docs/BINARY/BINARY_FEATURES.md
@@ -9,11 +9,24 @@ The serializer applies compact encodings automatically:
 | Data | Condition | Encoding | Savings |
 |------|-----------|----------|---------|
 | Integer | −16 ≤ v ≤ 47 | TinyInt (1 byte) | 2–5 bytes |
-| String | ≤31 bytes, ASCII | FixStr (1+N bytes) | 1 byte (no length prefix) |
+| String | ≤31 bytes UTF-8, any content | FixStr (1+N bytes) | 1 byte (no length prefix) |
+| String | ≤31 bytes, pure ASCII | FixStrAscii (1+N bytes) | 1 byte + reader skips UTF-8 decode |
+| String | >31 bytes, pure ASCII | StringAscii (1+VarUInt+N bytes) | reader skips UTF-8 decode |
 | Object | type index < 64 | FixObj (1 byte) | 1–5 bytes (no VarUInt index) |
 | String | empty | StringEmpty (1 byte) | 1+ bytes |
 | Bool | — | True/False (1 byte) | no payload |

+### ASCII marker-dispatch
+
+The writer's `WriteStringWithDispatch` runs a single-pass UTF-8 encode and detects pure-ASCII content for free via `bytesWritten == charLength` (every UTF-16 char < 0x80 produces exactly 1 UTF-8 byte; non-ASCII chars always produce 2-4 bytes). Based on the result it emits one of four markers:
+
+- `FixStrAscii` (135-166) — short ASCII (≤31 bytes)
+- `FixStr` (103-134) — short UTF-8 (≤31 bytes, mixed/multi-byte content)
+- `StringAscii` (167) — long ASCII (>31 bytes)
+- `String` (91) — long UTF-8 (>31 bytes, mixed/multi-byte content)
+
+The reader uses the marker as the ASCII-validity contract — `FixStrAscii` / `StringAscii` payloads byte→char widen directly via `Encoding.Latin1.GetString` (BCL SIMD-accelerated, ~memcpy class throughput), no UTF-8 decode, no run-time `Ascii.IsValid` scan. `FixStr` / `String` payloads use the custom 3-phase UTF-8 decoder (Vector256 ASCII prefix widen + DWORD ASCII batch + scalar multi-byte branch). Wire format unchanged across format versions — the new markers occupy previously-unused codepoints, so wire produced without ASCII detection (older writers) is forward-compatible.
+
 ## String Interning Protocol

 Controls deduplication of repeated string values.
diff --git a/AyCode.Core/docs/BINARY/BINARY_FORMAT.md b/AyCode.Core/docs/BINARY/BINARY_FORMAT.md
index 9ba3ba4..41cc694 100644
--- a/AyCode.Core/docs/BINARY/BINARY_FORMAT.md
+++ b/AyCode.Core/docs/BINARY/BINARY_FORMAT.md
@@ -17,7 +17,7 @@ Complete wire format specification for the AcBinary serializer. Source of truth:

 ## Header Flags

-The flags byte uses `0x90` (144) as base with bit flags in the lower nibble:
+The flags byte uses `0xB0` (176) as base with bit flags in the lower nibble. (Moved from `0x90` / 144 to make codepoints 135-167 contiguous for the FixStrAscii / StringAscii string-marker block.)

 | Bit | Mask | Flag | Meaning |
 |-----|------|------|---------|
@@ -116,14 +116,17 @@ Second occurrence of a referenced polymorphic object uses plain `ObjectRef(65)`
 | 89 | Decimal | `[89] [16 bytes]` |
 | 90 | Char | `[90] [VarUInt]` |

-### Strings (91–94)
+### Strings (91–94, 167)

 | Code | Name | Wire format |
 |------|------|-------------|
-| 91 | String | `[91] [VarUInt byteLength] [UTF-8 bytes]` |
+| 91 | String | `[91] [VarUInt byteLength] [UTF-8 bytes]` — generic UTF-8 (any content) |
 | 92 | StringInterned | `[92] [VarUInt cacheIndex]` — 2nd+ occurrence |
 | 93 | StringEmpty | `[93]` — no payload |
 | 94 | StringInternFirst | `[94] [VarUInt cacheIndex] [VarUInt byteLength] [UTF-8 bytes]` — 1st occurrence |
+| 167 | StringAscii | `[167] [VarUInt byteLength] [ASCII bytes]` — pure ASCII (every byte < 0x80); reader byte→char widens, no UTF-8 decode |
+
+The writer detects ASCII via `bytesWritten == charLength` after a single-pass UTF-8 encode (every UTF-16 char < 0x80 produces exactly 1 UTF-8 byte; non-ASCII chars always produce 2-4 bytes), then emits `StringAscii` (167) or `String` (91) accordingly. The reader uses the marker as the ASCII-validity contract — `StringAscii` bypasses UTF-8 decode entirely.

 ### Date/Time (95–98)

@@ -143,17 +146,33 @@ Second occurrence of a referenced polymorphic object uses plain `ObjectRef(65)`
 | 101 | NoMetadataHeader | Legacy: implies `RefHandling=true`, no metadata |
 | 102 | PropertySkip | `[102]` — marks skipped property (default/null value) |

-### FixStr (103–134)
+### FixStr (103–134) — short UTF-8 strings

-Short ASCII strings encoded in a single marker byte + raw bytes (no length prefix):
+Short strings (any UTF-8 content) encoded in a single marker byte + raw UTF-8 bytes (no length prefix):

 ```
-[FixStrBase + byteLength] [ASCII bytes]
+[FixStrBase + byteLength] [UTF-8 bytes]
 ```

-- Length range: 0–31 bytes (`FixStrBase=103`, `FixStrMax=134`)
+- Length range: 0–31 **bytes** (`FixStrBase=103`, `FixStrMax=134`)
 - Saves 1 byte vs `String` marker + VarUInt length
-- Falls back to `String(91)` if content is non-ASCII
+- Content semantics: UTF-8 (may contain multi-byte sequences for non-ASCII chars)
+- Reader dispatches via the (universal-)UTF-8 decode path
+
+### FixStrAscii (135–166) — short ASCII strings
+
+Short ASCII-only strings encoded in a single marker byte + raw ASCII bytes:
+
+```
+[FixStrAsciiBase + byteLength] [ASCII bytes]
+```
+
+- Length range: 0–31 **bytes** = chars (1:1 for ASCII) (`FixStrAsciiBase=135`, `FixStrAsciiMax=166`)
+- Same wire size as `FixStr` (1 marker byte + bytes), but the marker IS the ASCII-validity contract
+- Reader byte→char widens directly (`Encoding.Latin1.GetString` SIMD-accelerated path) — no UTF-8 decode, no run-time `Ascii.IsValid` scan
+- Writer chooses between `FixStrAscii` and `FixStr` post-encode via `bytesWritten == charLength`
+
+Codepoints **168–175** are reserved for future string-related markers (e.g., compressed / base64 / mixed-ASCII variants), keeping the 91–167 range a single contiguous string-marker block.
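
The four-way string marker dispatch described in the hunks above can be sketched as follows. This is an illustrative Python model, not the library's C# code: the marker values (91, 93, 103, 135, 167) and the 31-byte limit are taken from the tables, while the function names, the LEB128-style `VarUInt` layout, and the choice to emit `StringEmpty` for empty input are assumptions made for the example.

```python
# Marker codes quoted from the format tables; helper names are hypothetical.
STRING = 91
STRING_EMPTY = 93
FIXSTR_BASE = 103
FIXSTR_ASCII_BASE = 135
STRING_ASCII = 167
FIX_MAX_LEN = 31


def encode_varuint(n: int) -> bytes:
    """ASSUMPTION: LEB128-style VarUInt (7 payload bits per byte, high bit = continue)."""
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)
        else:
            out.append(low)
            return bytes(out)


def decode_varuint(buf: bytes, pos: int) -> tuple[int, int]:
    """Inverse of encode_varuint; returns (value, next position)."""
    value = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        value |= (b & 0x7F) << shift
        if not b & 0x80:
            return value, pos
        shift += 7


def write_string(s: str) -> bytes:
    if not s:
        return bytes([STRING_EMPTY])  # assumed: empty strings use StringEmpty
    payload = s.encode("utf-8")
    # .NET string length counts UTF-16 code units, so emulate that here.
    char_length = len(s.encode("utf-16-le")) // 2
    is_ascii = len(payload) == char_length  # bytesWritten == charLength
    if len(payload) <= FIX_MAX_LEN:
        base = FIXSTR_ASCII_BASE if is_ascii else FIXSTR_BASE
        return bytes([base + len(payload)]) + payload
    marker = STRING_ASCII if is_ascii else STRING
    return bytes([marker]) + encode_varuint(len(payload)) + payload


def read_string(buf: bytes) -> str:
    marker = buf[0]
    if marker == STRING_EMPTY:
        return ""
    if FIXSTR_ASCII_BASE <= marker <= FIXSTR_ASCII_BASE + FIX_MAX_LEN:
        n = marker - FIXSTR_ASCII_BASE
        return buf[1:1 + n].decode("latin-1")  # byte -> char widen, no UTF-8 decode
    if FIXSTR_BASE <= marker <= FIXSTR_BASE + FIX_MAX_LEN:
        n = marker - FIXSTR_BASE
        return buf[1:1 + n].decode("utf-8")
    if marker in (STRING, STRING_ASCII):
        n, pos = decode_varuint(buf, 1)
        codec = "latin-1" if marker == STRING_ASCII else "utf-8"
        return buf[pos:pos + n].decode(codec)
    raise ValueError(f"not a string marker: {marker}")
```

Note that the short/long split is decided on the encoded byte length, so a 20-character string of two-byte characters (40 bytes) falls into the `String` (91) lane even though its character count is below 32.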

 ### TinyInt (192–255)

diff --git a/AyCode.Core/docs/BINARY/BINARY_TODO.md b/AyCode.Core/docs/BINARY/BINARY_TODO.md
index fc759e5..c9e9a0a 100644
--- a/AyCode.Core/docs/BINARY/BINARY_TODO.md
+++ b/AyCode.Core/docs/BINARY/BINARY_TODO.md
@@ -766,7 +766,8 @@ Compact gain: **only on long strings** (>31 byte UTF-8). Estimated −1 byte per
 - Polymorphic + interned property test cases pass unchanged (use existing marker-based encoding)

 ## ACCORE-BIN-T-M3R7: ASCII marker-dispatch — writer detect + reader dedicated path
-**Priority:** P2 · **Type:** Performance + wire optimization · **Related:** `BinaryTypeCode.FixStrAsciiBase..StringAscii` markers (already defined), `WriteStringUtf8`, `ReadStringUtf8`, `WriteFixStrDirect`
+**Priority:** P2 · **Type:** Performance + wire optimization · **Related:** `BinaryTypeCode.FixStrAsciiBase..StringAscii` markers, `WriteStringWithDispatch`, `ReadAsciiBytesAsString`
+**Status:** Closed (2026-05-04)

 > **Ordering note:** do this **AFTER THE ENCODER OPTIMIZATION** (see `ACCORE-BIN-T-E2F9`). Rationale: the custom encoder/decoder's Vector256 ASCII narrow/widen paths already handle ASCII bytes quickly on their own. On top of that, marker-dispatch only saves the per-call dispatch overhead (no `Ascii.IsValid` scan, no decoder layer). It is a guaranteed win, but an additive one; for clean measurement it is better left until after the decoder/encoder work.

@@ -788,8 +789,30 @@ The `FixStrAscii*` (135-166) and `StringAscii` (167) markers are defined in `Bin
 - SGen-generated code compiles and round-trips on all `[AcBinarySerializable]` types
 - Decision documented: backward-compat policy for v2 vs v1 wire

+### Resolution
+End-to-end implementation landed (writer + reader + SGen + skip + populate). Key components:
+- **Writer (`AcBinarySerializer.BinarySerializationContext.WriteStringWithDispatch`)** — single-pass UTF-8 encode + ASCII detect via `bytesWritten == charLength`; emits one of 4 markers (FixStrAscii / FixStr / StringAscii / String). Split layout for hot path: `charLength ≤ 31` encodes optimistically at `savedPos+1` (FixStr position) → 0 shift on FixStr hit; `charLength > 31` uses D-2 layout with backfill. The split avoids the post-encode left-shift that the unified layout introduced (regression seen in 12-42-32 bench).
+- **Reader (`AcBinaryDeserializer.BinaryDeserializationContext.ReadAsciiBytesAsString`)** — `Encoding.Latin1.GetString` (BCL SIMD-accelerated byte→char widen). Avoids the `string.Create` callback + scalar widen overhead — measurably better on Small Deser cell (closed the +20% MemPack-relative anomaly).
+- **TypeReaderTable**: `StringAscii` (167) + 32 × `FixStrAscii` (135-166) readers registered. `IsFixStrAscii` / `StringAscii` fast paths in `PopulatePropertyWithMarker`, `ReadValue`, `SkipValue`.
+- **SGen (`AcBinarySourceGenerator.EmitReadString`)** — regenerated readers branch on `IsFixStr` / `IsFixStrAscii` / `case StringAscii` per property.
+
+**Wire format version not bumped** — the new markers occupy previously-unused codepoints (135-167); old wire (without ASCII markers) is forward-compatible (readers handle both `String` and `StringAscii`). v1 stays.
+
+**Acceptance (AOT bench 13-40-29, MemPack-relative ratios — JIT noise eliminated):**
+- ✅ AcBinary Ser AND Deser FASTER than MemPack on EVERY cell (5/5)
+  - Small: Ser -8%, Deser -23%
+  - Medium: Ser -17%, Deser -30%
+  - Large: Ser -28%, Deser -32%
+  - Repeated: Ser -4%, Deser -9%
+  - Deep: Ser -24%, Deser -22%
+- ✅ Wire size advantage: 2043-50419 byte (vs MemPack 3070-64986) = **-22% to -33%** across cells
+- ✅ Round-trip tests: 167 pass (13 pre-existing failures are IId-tracking, unrelated to M3R7)
+
+**JIT vs AOT note**: earlier JIT-mode benchmarks (12-50-43 → 13-27-20 series) showed elevated ratios on Small/Repeated cells (1.0-1.2 range) that disappeared under AOT publish. The JIT-mode numbers reflect tier-up artifacts (inconsistent inlining of SGen-generated reader hot paths during the 1000-iteration measurement window), not a structural M3R7 property. AOT (NativeAOT / ILC) compiles deterministically with fixed inline decisions — the steady-state numbers above reflect the actual production performance.
+
 ## ACCORE-BIN-T-E2F9: Custom UTF-8 encoder (writer-side, symmetric with custom decoder)
 **Priority:** P1 · **Type:** Performance · **Related:** decoder optimization (`AcBinaryDeserializer.BinaryDeserializationContext.Read.cs::DecodeUtf8SinglePass`)
+**Status:** Closed (2026-05-04)

 > **Ordering note:** do this **BEFORE THE MARKER-DISPATCH** (see `ACCORE-BIN-T-M3R7`). Rationale: the custom encoder/decoder optimization is the "harder, less certain" win; it is what brings the non-ASCII / mixed-content workloads (Repeated Strings, Hungarian) up to par. Marker-dispatch afterwards is only an additive cleanup of the dispatch overhead on the pure-ASCII path.

@@ -813,6 +836,16 @@ Replace `Encoding.UTF8.GetBytes` calls in `WriteStringUtf8` / `WriteStringUtf8In
 - Wire format unchanged (custom encoder produces same bytes as `Encoding.UTF8`)
 - Round-trip tests pass

+### Resolution
+Implemented as `EncodeUtf8SinglePass` in `AcBinarySerializer.BinarySerializationContext.cs` — three-phase layered encoder (Vector256 ASCII narrow + DWORD ASCII batch + scalar 1/2/3-byte BMP & 4-byte surrogate-pair). Bypasses `Encoding.UTF8.GetBytes` virtual-dispatch + encoder-fallback overhead. Trusted-input path — no validation pass on writer side (the input is a .NET `string` with valid UTF-16 surrogate pairs by construction).
+
+Used by `WriteStringUtf8` (D-2 single-pass with VarUInt backfill) and `WriteStringWithDispatch` (M3R7 marker-dispatch path). Wire format unchanged — the encoder produces the same bytes as `Encoding.UTF8.GetBytes`.
+
+Acceptance (per bench 12-50-43 → 13-27-20, MemPack-relative ratios on AcBinary Compact FastMode SGen):
+- ✅ ASCII Ser ≥ MemPack on 4/5 cells (Small 0.94, Medium 0.80, Large 0.79, Deep 0.81)
+- ⚠️ Repeated Ser ~1.04 (Hungarian, multi-byte path scalar) — see follow-up `ACCORE-BIN-T-H7K3`
+- ✅ Round-trip tests pass (167 of 180; 13 pre-existing failures unrelated to encoder)
+
 ## ACCORE-BIN-T-W7N5: Default-value omission policy — doc + optional opt-out
 **Priority:** P2 · **Type:** Refactor + Documentation · **Related:** `BINARY_ISSUES.md#accore-bin-i-d9y2` (canonical issue)

@@ -830,3 +863,66 @@ The serializer's `PropertySkip` (102) optimization saves 1 byte per default-valu
 - If flag added: round-trip tests covering both `true` and `false`; benchmark comparison table showing wire-size delta on ASCII / Hungarian / DTO-heavy workloads
 - Decision rationale recorded in `LLM_PROTOCOL_DECISIONS.md` (or a `### Resolution` block on the issue) once implemented

+## ACCORE-BIN-T-H7K3: Hungarian / multi-byte content Ser optimization (Repeated Strings cell)
+**Priority:** P3 · **Type:** Performance · **Related:** `EncodeUtf8SinglePass` Phase 3 (scalar multi-byte encode), `ACCORE-BIN-T-E2F9` resolution
+**Status:** Closed (2026-05-04) — Won't Fix (JIT-only artifact)
+
+The Repeated Strings benchmark (Hungarian content: `"TermékNév_…"`, `"RaklapKód_…"`) still shows AcBinary Ser ratio ~1.04 vs MemPack across multiple runs (12-50-43 / 13-21-27 / 13-27-20 series). All other ASCII-heavy cells (Small/Medium/Large/Deep) sit in the 0.79-0.94 ratio range — Repeated is the outlier.
+
+The Phase 3 scalar multi-byte branch in `EncodeUtf8SinglePass` (1-byte ASCII / 2-byte Latin-extended / 3-byte BMP / 4-byte surrogate-pair) processes Hungarian diacritics (`á`, `é`, `í`, `ő`, `ű`, etc.) as 2-byte UTF-8 sequences via scalar bit-extract. MemPack's UTF-8 encoder appears to use a SIMD-accelerated mixed-content lane that processes 2-byte sequences in parallel.
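
For readers unfamiliar with the scalar bit-extract mentioned for Phase 3, the following Python sketch shows the standard UTF-8 layout it refers to. It is not the `EncodeUtf8SinglePass` source: the function name is hypothetical, it works on code points rather than UTF-16 code units (so the surrogate-pair recombination step is not shown), and it assumes well-formed input with no lone surrogates.

```python
def encode_utf8_scalar(text: str) -> bytes:
    """Scalar UTF-8 encode, one code point at a time (illustrative only)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:            # 1 byte: ASCII
            out.append(cp)
        elif cp < 0x800:         # 2 bytes: covers Hungarian diacritics
            out.append(0xC0 | (cp >> 6))
            out.append(0x80 | (cp & 0x3F))
        elif cp < 0x10000:       # 3 bytes: rest of the BMP
            out.append(0xE0 | (cp >> 12))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
        else:                    # 4 bytes: supplementary planes
            out.append(0xF0 | (cp >> 18))
            out.append(0x80 | ((cp >> 12) & 0x3F))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
    return bytes(out)
```

Each diacritic costs one extra byte and one extra branch compared with ASCII, which is also why such strings fail the `bytesWritten == charLength` test and stay on the `FixStr` / `String` markers.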
+
+### Resolution
+
+**AOT bench 13-40-29: Repeated Ser ratio = 0.96** (AcBinary 14.50 µs vs MemPack 15.05 µs, AcBinary faster by 4%). Deser ratio 0.91 (also faster).
+
+The 1.04+ ratio observed in JIT-mode benchmarks (12-50-43, 13-21-27, 13-27-20) was a JIT tier-up artifact — the SGen-generated writer's hot path (which calls `EncodeUtf8SinglePass`) didn't reliably tier up to fully-optimized code within the 1000-iteration measurement window, while MemPack's writer apparently warmed up faster. Under NativeAOT publish (`-p:_IsPublishing=true`) the issue disappears completely — both writers are deterministically optimized at compile time.
+
+No structural problem in the Phase 3 scalar branch. The investigation directions (Vector256 mixed-content lane, BCL `Utf8.FromUtf16` comparison) remain valid academic improvements but show no meaningful production-time win — closing as Won't Fix.
+
+## ACCORE-BIN-T-S2X9: Markerless schema lane — drop per-property type markers for fixed-shape primitives (SGen)
+**Priority:** P3 · **Type:** Wire-format extension · **Related:** `ACCORE-BIN-T-S5L8`, `ACCORE-BIN-T-W7N5`
+
+AcBinary is **marker-driven**: every value on the wire carries a 1-byte type code, so the reader can dispatch generically (handles polymorphism, null, intern markers, type-name lookup, etc.). MemPack is **schema-driven**: the SGen reader knows at compile time that "field 3 is `int`, field 4 is `string`" and reads values directly with no type code, no run-time dispatch.
+
+For fixed-shape primitive properties (`int`, `bool`, `double`, `Guid`, `DateTime`, …) on `[AcBinarySerializable]` types, the per-property type marker is pure overhead — the SGen-generated reader already has compile-time knowledge of the property type, so the marker only confirms what is already known. Dropping it on this narrow class of properties is a clean wire+CPU win without losing any of the polymorphism / null / intern flexibility that the marker provides for variable-shape values.
+
+### Wire savings per property type
+
+| Type | Current encoding | Markerless lane | Wire saved |
+|------|------------------|-----------------|------------|
+| `int` (TinyInt range −16..47) | TinyInt (1 byte) | VarInt (1 byte) | 0 |
+| `int` (out-of-tiny) | `[Int32]` `[VarInt]` (2-6 bytes) | VarInt (1-5 bytes) | 1 byte |
+| `bool` | `[True]` or `[False]` (1 byte) | 1 byte (0/1) | 0 |
+| `Guid` | `[Guid]` `[16 bytes]` (17 bytes) | 16 bytes | 1 byte |
+| `DateTime` | `[DateTime]` `[9 bytes]` (10 bytes) | 9 bytes | 1 byte |
+| `DateTimeOffset` | `[DateTimeOffset]` `[10 bytes]` (11 bytes) | 10 bytes | 1 byte |
+| `TimeSpan` | `[TimeSpan]` `[VarLong]` (2-9 bytes) | VarLong (1-9 bytes) | 1 byte |
+| `decimal` | `[Decimal]` `[16 bytes]` (17 bytes) | 16 bytes | 1 byte |
+| `double` | `[Float64]` `[8 bytes]` (9 bytes) | 8 bytes | 1 byte |
+
+DTO-heavy payloads with many `Guid` / `DateTime` properties benefit the most — easily -10..-20% wire size on top of the existing -22..-33% advantage.
+
+### CPU savings
+
+Reader-side: SGen-generated code drops the per-property `ReadByte()` + `IsTinyInt` / `IsFixStr` / switch-case dispatch for primitive properties — direct `context.ReadInt32Unsafe()` / `ReadGuidUnsafe()` / etc. calls. Writer-side: drops the `WriteByte(typeCode)` per primitive. Effect amplifies on payloads with many primitive properties (Small/Medium benchmark cells) — independent of any JIT-vs-AOT measurement variance.
+
+### Sketch — opt-in markerless lane, SGen-only
+
+- New wire format flag (header `HeaderFlag_MarkerlessSchema = 0x10` or similar) → activates a property-positional lane.
+- SGen-generated writer for `[AcBinarySerializable]` types: per primitive property, emits raw value (no marker). For variable-shape properties (string, complex, nullable, polymorphic) the existing marker-driven path stays.
+- SGen-generated reader: per primitive property, calls `context.ReadInt32Unsafe()` / `ReadGuidUnsafe()` / etc. directly. Variable-shape properties keep the marker-read + dispatch.
+- Heuristic: a property is markerless-eligible if `IsValueType && !IsNullable && type is in {int, bool, byte, short, long, float, double, DateTime, DateTimeOffset, Guid, TimeSpan, decimal}`. Anything else (string, list, nested object, nullable) keeps the marker.
+
+### Decision points
+
+- **Backward compatibility**: header flag + version negotiation. Old readers see the flag set and either reject (clean fail) or fall back to marker-driven (if they support both lanes). Default `false` preserves current wire format.
+- **Schema evolution fragility**: the markerless lane is positional, so adding/removing/reordering primitive properties breaks readers compiled against an older schema. Document this clearly — opt-in is for stable schemas only (DTO-frozen API contracts, internal SignalR messages with synchronized client/server SGen). For evolving schemas, marker-driven default stays.
+- **Coordination with `ACCORE-BIN-T-S5L8`** (sentinel-length strings): the two could share the "no-marker per-call" infrastructure — markerless string lane uses sentinel-length VarUInt (null/empty/short distinguished by length value).
+
+### Acceptance
+
+- Wire size: ≥ -10% on DTO-heavy payloads (Guid/DateTime-rich) vs current marker-driven format
+- Round-trip on the markerless lane validated on representative DTO shapes (mixed primitive + string + nested object)
+- Schema-evolution fragility documented in `BINARY_FEATURES.md` (alongside the existing `PropertySkip` / default-omission caveat from `ACCORE-BIN-I-D9Y2`)
+- Opt-in flag with default `false` (preserves marker-driven default; consumers explicitly opt in for frozen-schema scenarios)
+
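
The wire-size acceptance criterion can be estimated up front from the per-type table. The Python sketch below does the arithmetic for a hypothetical DTO with two `Guid`, two `DateTime`, and one `double` property; the byte counts are the fixed-size rows of the table, while the function name and the DTO shape are made up for illustration. Variable-length types, strings, and any per-object header bytes are left out, so the result is only the saving on the primitive portion of the payload.

```python
# (marker-driven bytes, markerless bytes) for the fixed-size rows of the table.
FIXED_SIZES = {
    "Guid": (17, 16),
    "DateTime": (10, 9),
    "DateTimeOffset": (11, 10),
    "decimal": (17, 16),
    "double": (9, 8),
}


def primitive_wire_sizes(property_types: list[str]) -> tuple[int, int]:
    """Return (marker-driven, markerless) byte totals for the given properties."""
    marked = sum(FIXED_SIZES[t][0] for t in property_types)
    markerless = sum(FIXED_SIZES[t][1] for t in property_types)
    return marked, markerless


# Hypothetical DTO shape, chosen only to make the arithmetic concrete.
dto = ["Guid", "Guid", "DateTime", "DateTime", "double"]
marked, markerless = primitive_wire_sizes(dto)
saving_pct = 100 * (marked - markerless) / marked
print(f"{marked} -> {markerless} bytes ({saving_pct:.1f}% saved on primitives)")
```

For this shape the totals are 63 and 58 bytes, so the saving is one byte per property, about 7.9% of the primitive bytes. The relative saving grows as the average property gets smaller, which is worth keeping in mind when picking the representative DTO shapes for the acceptance run.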