AcBinary: Hot/cold marker split for string deserialization

Refactored string property deserialization to separate hot (common, no-feature) and cold (feature-engaged) marker handling, improving JIT inlining and cold-start performance. Introduced `TryReadStringProperty` (hot, inlined) and `TryReadStringColdPath` (cold, optimized) methods. Updated method attributes for better JIT control and clarified WASM string-cache dead code. Added `BINARY_BYTECODE_OPTIMIZATION.md` and updated related docs. Removed AutoMapper, updated logging package versions, and adjusted project files and settings accordingly.
Loretta 2026-05-18 15:20:56 +02:00
parent f68b797a9f
commit f631fd4b78
12 changed files with 436 additions and 27 deletions

View File

@ -104,7 +104,12 @@
"Bash(awk 'NR == 68 { print; print \"\"; print \" // Writer-side emit pass \\(GenWriter + GenScanProperties + EmitProp + EmitScan* + EmitDirect*Write +\"; print \" // EmitSkip + EmitVal + EmitMarkerless + helpers\\) moved to AcBinarySourceGenerator.GenWriter.cs.\"; next } { print }' tmp.cs)",
"Bash(awk 'NR < 73 || NR > 930 { print }' AcBinarySourceGenerator.cs)",
"Bash(awk 'NR == 72 { print; print \" // Reader-side emit pass \\(GenReader + EmitReadProp + EmitRead* helpers\\) moved to\"; print \" // AcBinarySourceGenerator.GenReader.cs.\"; next } { print }' tmp.cs)",
"Bash(rm -f \"AyCode.Core.Serializers.Console/Benchmarks/\"*.cs && rmdir \"AyCode.Core.Serializers.Console/Benchmarks\" && rm -f \"AyCode.Core.Serializers.Console/BenchmarkResult.cs\" && echo \"Deleted Console-side moved files.\")"
"Bash(rm -f \"AyCode.Core.Serializers.Console/Benchmarks/\"*.cs && rmdir \"AyCode.Core.Serializers.Console/Benchmarks\" && rm -f \"AyCode.Core.Serializers.Console/BenchmarkResult.cs\" && echo \"Deleted Console-side moved files.\")",
"Bash(ls \"H:\\\\Applications\\\\Aycode\\\\Source\\\\AyCode.Blazor\\\\\" 2>&1 | head -20)",
"Bash(stat -c '%y %s %n' \\\\ *)",
"Bash(xargs stat -c '%y %s %n')",
"Bash(xargs -I {} stat -c '%y %s %n' {})",
"Bash(xargs -I {} stat -c '%y %n' {})"
]
}
}

View File

@ -99,9 +99,15 @@ public partial class AcBinarySourceGenerator
return;
}
// String FastWire markerless fast-path: int32 sentinel header (-1 = null, 0 = empty, N > 0 = content).
// Wire-symmetric with `WriteStringGenerated` (SGen) and `WriteStringUtf16Markerless` (Runtime).
// Skips the typeCode-read entirely in FastWire mode; falls through to markered dispatch in Compact.
// ACCORE-BIN-T-K9M3 Ötlet A (refined) — caller-driven hot/cold split. SGen-emit reads the marker
// byte locally + dispatches FastWire/PropertySkip checks at the call site; the shared
// BinaryDeserializationContext.TryReadStringProperty handles only the hot marker switch
// (small body → high inline confidence). Cold markers go through TryReadStringColdPath
// (AggressiveOptimization, Tier-1 direct). The || short-circuit ensures cold is called only
// when hot didn't match — common case has zero method-call overhead beyond the inlined Try body.
// PropertySkip lands in the cold path's "return false" sink, so the property is left at default
// (don't-touch contract preserved). enableInternString stays a no-op at the emit site (StringInterned
// sits inside the cold path body now — writer-side feature gating handles non-emission).
if (p.TypeKind == PropertyTypeKind.String)
{
sb.AppendLine($"{i}if (context.FastWire)");
@ -111,9 +117,10 @@ public partial class AcBinarySourceGenerator
sb.AppendLine($"{i}else");
sb.AppendLine($"{i}{{");
sb.AppendLine($"{i} var tc_{p.Name} = context.ReadByte();");
sb.AppendLine($"{i} if (tc_{p.Name} != BinaryTypeCode.PropertySkip)");
sb.AppendLine($"{i} string? v_{p.Name};");
sb.AppendLine($"{i} if (context.TryReadStringProperty(tc_{p.Name}, out v_{p.Name}) || context.TryReadStringColdPath(tc_{p.Name}, out v_{p.Name}))");
sb.AppendLine($"{i} {{");
EmitReadString(sb, a, $"tc_{p.Name}", i + " ", enableInternString);
sb.AppendLine($"{i} {a} = v_{p.Name}!;");
sb.AppendLine($"{i} }}");
sb.AppendLine($"{i}}}");
return;

View File

@ -7,7 +7,7 @@
<ItemGroup>
<PackageReference Include="MessagePack" Version="3.1.4" />
<PackageReference Include="Microsoft.Extensions.Logging.Abstractions" Version="9.0.5" />
<PackageReference Include="Microsoft.Extensions.Logging.Abstractions" Version="9.0.11" />
<PackageReference Include="Newtonsoft.Json" Version="13.0.3" />
</ItemGroup>

View File

@ -12,11 +12,11 @@
</ItemGroup>
<ItemGroup>
<PackageReference Include="AutoMapper" Version="15.0.1" />
<PackageReference Include="MessagePack" Version="3.1.4" />
<PackageReference Include="Microsoft.Extensions.Configuration.EnvironmentVariables" Version="9.0.11" />
<PackageReference Include="Microsoft.Extensions.Configuration.Json" Version="9.0.11" />
<PackageReference Include="Microsoft.Extensions.Logging.Abstractions" Version="9.0.5" />
<PackageReference Include="Microsoft.Extensions.Logging" Version="9.0.11" />
<PackageReference Include="Microsoft.Extensions.Logging.Abstractions" Version="9.0.11" />
<PackageReference Include="Microsoft.Extensions.Options" Version="9.0.11" />
<PackageReference Include="Newtonsoft.Json" Version="13.0.3" />
</ItemGroup>

View File

@ -38,7 +38,6 @@ Core library for the AyCode platform. Targets .NET 9 (set in `AyCode.Core.target
|---|---|
| `AyCode.Utils` | Shared utilities (project reference) |
| `AyCode.Core.Serializers.SourceGenerator` | Binary serializer source generation (analyzer) |
| `AutoMapper` | Object mapping |
| `MessagePack` | MessagePack serialization |
| `Newtonsoft.Json` | JSON serialization (legacy, alongside System.Text.Json) |
| `Microsoft.Extensions.Configuration.*` | appsettings.json + environment variable support |

View File

@ -437,7 +437,7 @@ public static partial class AcBinaryDeserializer
return null; // len < 0 (sentinel -1)
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
[MethodImpl(MethodImplOptions.NoInlining)]
public string ReadStringUtf8(int length)
{
if (length == 0)
@ -447,8 +447,10 @@ public static partial class AcBinaryDeserializer
EnsureAvailable(length);
// WASM optimization: cache short strings to reduce allocations
if (_useStringCaching && length <= _maxCachedStringLength)
// WASM optimization: cache short strings to reduce allocations.
// `false &&` prefix dead-codes the branch — currently unused workload-wide;
// remove the prefix to re-enable when WASM string-cache benchmarking resumes.
if (false && _useStringCaching && length <= _maxCachedStringLength)
{
return ReadStringUtf8Cached(length);
}
@ -516,7 +518,8 @@ public static partial class AcBinaryDeserializer
// Cached short-string path (WASM optimization) — leverages full-content hash + Ascii.Equals
// verification (which is a no-op fast path on ASCII content).
if (_useStringCaching && byteLength <= _maxCachedStringLength)
// `false &&` prefix dead-codes the branch — currently unused workload-wide.
if (false && _useStringCaching && byteLength <= _maxCachedStringLength)
{
return ReadStringUtf8Cached(byteLength);
}
@ -584,8 +587,9 @@ public static partial class AcBinaryDeserializer
EnsureAvailable(byteLength);
// WASM string-cache fast path — if cached, byte-cmp validates and returns the canonical instance
if (_useStringCaching && byteLength <= _maxCachedStringLength)
// WASM string-cache fast path — if cached, byte-cmp validates and returns the canonical instance.
// `false &&` prefix dead-codes the branch — currently unused workload-wide.
if (false && _useStringCaching && byteLength <= _maxCachedStringLength)
{
return ReadStringUtf8Cached(byteLength);
}
@ -599,6 +603,7 @@ public static partial class AcBinaryDeserializer
});
}
[MethodImpl(MethodImplOptions.NoInlining)]
private string ReadStringUtf8Cached(int length)
{
var slice = _buffer.AsSpan(_position, length);
@ -649,7 +654,7 @@ public static partial class AcBinaryDeserializer
/// has been consumed. 1-pass decode. Header read in a single uint load (vs 2 ushort loads). Shared
/// by runtime dispatch + SGen-emit.
/// </summary>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
[MethodImpl(MethodImplOptions.AggressiveOptimization)]
internal string ReadStringMedium()
{
var packed = ReadUInt32Unsafe();
@ -664,7 +669,7 @@ public static partial class AcBinaryDeserializer
/// a corrupted-wire guard for negative casts from uint values > <c>Int32.MaxValue</c>. Shared by
/// runtime dispatch + SGen-emit.
/// </summary>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
[MethodImpl(MethodImplOptions.AggressiveOptimization)]
internal string ReadStringBig()
{
var packed = ReadUInt64Unsafe();
@ -752,6 +757,88 @@ public static partial class AcBinaryDeserializer
return str;
}
/// <summary>
/// ACCORE-BIN-T-K9M3 Ötlet A (refined) — property-level string **hot**-marker dispatch.
/// The caller is responsible for reading the marker byte and handling FastWire; this method
/// dispatches the hot markers only (FixStrAscii, StringSmall, Null, StringEmpty) inline.
/// <para><b>Caller protocol (from SGen-emit):</b></para>
/// <code>
/// if (context.FastWire) {
/// obj.X = context.ReadStringUtf16Markerless()!;
/// } else {
/// var tc = context.ReadByte();
/// string? v;
/// if (context.TryReadStringProperty(tc, out v) || context.TryReadStringColdPath(tc, out v)) {
/// obj.X = v!;
/// }
/// // else: PropertySkip / unknown marker → property left at default (don't-touch contract)
/// }
/// </code>
/// <para><b>Returns:</b> <c>true</c> if a hot marker matched (<paramref name="value"/> set —
/// includes deliberate <c>null</c> on <see cref="BinaryTypeCode.Null"/>); <c>false</c> if the
/// marker is not in the hot set — caller short-circuits via <c>||</c> to
/// <see cref="TryReadStringColdPath"/>.</para>
/// <para><b>Body kept minimal</b> so AggressiveInlining stays effective: only the marker dispatch
/// (3 explicit hot cases + FixStrAscii range check in default). FastWire short-circuit, ReadByte,
/// PropertySkip and cold-marker dispatch are all the caller's responsibility — splitting these
/// out of the body keeps the inliner's complexity-budget calculation favourable. ACCORE-BIN-T-K9M3
/// Ötlet A v1 (which kept FastWire/PropertySkip/cold-call inside the body) regressed by ~3-5% on
/// the Des side because the JIT bailed on inlining; this refined split aims to fit the inline budget.</para>
/// </summary>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
internal bool TryReadStringProperty(byte tc, out string? value)
{
value = null;
switch (tc)
{
case BinaryTypeCode.StringSmall: value = ReadStringSmall(); return true;
case BinaryTypeCode.Null: return true;
case BinaryTypeCode.StringEmpty: value = string.Empty; return true;
default:
// Hot path: FixStrAscii (short ASCII string values — property codes, IDs, names).
if (BinaryTypeCode.IsFixStrAscii(tc))
{
var falen = BinaryTypeCode.DecodeFixStrAsciiLength(tc);
value = falen == 0 ? string.Empty : ReadAsciiBytesAsString(falen);
return true;
}
break;
}
// Cold marker, PropertySkip, or unknown — caller continues via short-circuit ||
// to <see cref="TryReadStringColdPath"/>; value left at null.
return false;
}
/// <summary>
/// Cold-path companion to <see cref="TryReadStringProperty"/>. Dispatches the **cold** markers
/// (StringMedium / StringBig / StringAscii long / StringInterned / InternFirst*). Returns
/// <c>true</c> if a cold marker matched (caller assigns <paramref name="value"/> to the
/// property); <c>false</c> if the marker is <see cref="BinaryTypeCode.PropertySkip"/> or an
/// unknown / corrupted value (caller leaves the property untouched — the safer behaviour for
/// wire corruption).
/// <para><see cref="MethodImplOptions.AggressiveOptimization"/> forces Tier-1 direct compilation
/// — the body is too large for AggressiveInlining (6 marker cases + decode-helpers), but the
/// compile-once Tier-1 quality makes the rare-marker dispatch path predictable and tight. The
/// caller pays one method-call cost only when the wire actually carries a cold marker.</para>
/// </summary>
[MethodImpl(MethodImplOptions.AggressiveOptimization)]
internal bool TryReadStringColdPath(byte tc, out string? value)
{
switch (tc)
{
case BinaryTypeCode.StringMedium: value = ReadStringMedium(); return true;
case BinaryTypeCode.StringBig: value = ReadStringBig(); return true;
case BinaryTypeCode.StringAscii: value = ReadPlainStringAscii(); return true;
case BinaryTypeCode.StringInterned: value = GetInternedString((int)ReadVarUInt()); return true;
case BinaryTypeCode.StringInternFirstSmall: value = ReadAndRegisterInternedStringSmall(); return true;
case BinaryTypeCode.StringInternFirstMedium: value = ReadAndRegisterInternedStringMedium(); return true;
}
// PropertySkip OR unknown marker — caller leaves the property at default value
// (safer than the previous silent null-assignment on unknown).
value = null;
return false;
}
/// <summary>
/// Full-content hash for string caching.
/// CRITICAL: DO NOT SIMPLIFY. This prevents hash collisions for similar property names.

View File

@ -258,14 +258,13 @@ public static partial class AcBinaryDeserializer
public string GetInternedString(int cacheIndex)
{
var result = _internCache![cacheIndex];
if (result == null)
{
throw new AcBinaryDeserializationException(
$"Interned string at cache index '{cacheIndex}' was not populated.",
_position);
}
return (string)result;
#if DEBUG
if (result == null)
throw new AcBinaryDeserializationException($"Interned string at cache index '{cacheIndex}' was not populated.", _position);
#endif
return (string)result!;
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]

View File

@ -1139,7 +1139,7 @@ public static partial class AcBinaryDeserializer
/// outer marker has been read; symmetric to <see cref="BinaryDeserializationContext{T}.WriteStringUtf8"/>
/// on the writer side.
/// </summary>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
[MethodImpl(MethodImplOptions.NoInlining)]
private static string ReadPlainString<TInput>(BinaryDeserializationContext<TInput> context)
where TInput : struct, IBinaryInputBase
{

View File

@ -0,0 +1,248 @@
# AcBinary Wire — Hot/Cold Bytecode Layout
Working notes for a wire-format reorganization where the BinaryTypeCode marker space is partitioned into a **hot range** (no-feature common path) and a **cold range** (feature-engaged paths). Single-branch dispatch on the marker byte (`(tc & 0x80) == 0` or `tc < ColdStart`) selects between an inline hot switch and a per-category cold method.
> Not a TODO entry. LLM-only working notes. Sibling to `BINARY_SGEN_OPTIMIZATION.md` (per-property emit condensation companion).
## Core concept
The marker-byte tier reflects the wire-format's perf-architecture: hot markers serve the no-feature workload at maximum speed (small inline switch in SGen-emit), cold markers carry the feature-engaged variants behind a single method call.
**Boundary aligns with feature flags:**
- `EnableRefHandling=false` → `ObjectRef*`, ref-aware element/dict wrappers never on wire
- `EnableInternString=false` → `StringInterned`, `StringInternFirst*` never on wire
- `EnableMetadata=false` → `ObjectWithMetadata*` never on wire
- `EnablePolymorphDetect=false` → `ObjectWithTypeName*`, FixObj slot fallback never on wire
A type with all features off emits **only hot markers**; the reader's cold branch is provably dead code for that type — JIT-foldable when SGen-emit can prove it at compile time.
### Strict no-feature collapse — zero regression baseline
When SGen proves all four feature flags off for a type (`EnableRefHandling = EnableInternString = EnableMetadata = EnablePolymorphDetect = false`), the cold branch AND the `IsHot` check itself **both** drop from the generated emit:
```csharp
// Strict no-feature emit — no IsHot check, no cold fallback
switch (tc) {
case BinaryTypeCode.Object: ... ; break;
case BinaryTypeCode.Null: ... ; break;
}
```
The strict no-feature SGen-emit is **structurally identical to the original pre-refactor inline emit** — every dispatch the same shape, same instructions. **Zero regression** on the no-feature hot path. The hot/cold infrastructure adds nothing the type doesn't need.
Feature-engaged types pay the IsHot check (3-4 cmp) plus the cold method call (when a cold marker hits) — both costs covered many times over by the feature's own value (see cost-benefit below).
## Hot vs cold markers across all categories
| Category | HOT markers (no-feature) | COLD markers (feature-engaged) |
|---|---|---|
| String | `FixStrAscii (range)`, `StringSmall`, `Null`, `StringEmpty` | `StringMedium`, `StringBig`, `StringAscii (long)`, `StringInterned`, `StringInternFirstSmall/Medium` |
| Complex | `Object`, `Null` | `ObjectRefFirst`, `ObjectRef`, `ObjectWithMetadata`, `ObjectWithMetadataRefFirst`, `ObjectWithTypeName*`, `FixObj slot (0-63)` |
| Collection element | `Array`, `EmptyArray`, `Null` | Ref-aware element wrappers, ChunkedArray variants |
| Dictionary entry | `Dict`, `EmptyDict`, `Null` | Ref-aware key/value variants |
| Primitive | `Int32`, `Int64`, `Double`, `Decimal`, `Enum`, `TinyInt (range)`, `Boolean`, `Guid`, etc. | Large-variant primitive markers (rare) |
## Two implementation paths
### B — `IsHotXxx` helper, wire-format intact (preferred for incremental)
Per-category small helpers on `BinaryTypeCode`:
```csharp
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool IsHotStringMarker(byte tc) =>
IsFixStrAscii(tc) || tc == StringSmall || tc == Null || tc == StringEmpty;
```
SGen-emit:
```csharp
if (BinaryTypeCode.IsHotStringMarker(tc)) {
// inline hot switch (3-5 case)
} else {
obj.X = context.ReadStringCold(tc);
}
```
- 3-4 cmp instructions inlined per dispatch site (branch predictor learns common-case bias)
- Zero wire-format change; persisted blobs unaffected
- Per-category helper introduced incrementally, switch-by-switch validation
- No global atomic refactor required
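As a rough, self-contained illustration of the B path, the sketch below pairs an inlined `IsHot` check with a separate cold method. The marker values, the `DemoMarkers` / `DemoReader` names and the stand-in payloads are assumptions made up for this example; they are not the real `BinaryTypeCode` layout or the real context API.

```csharp
#nullable enable
using System;
using System.Runtime.CompilerServices;

// Hypothetical marker values, for illustration only.
public static class DemoMarkers
{
    public const byte Null = 1;
    public const byte StringEmpty = 2;
    public const byte StringSmall = 3;
    public const byte StringMedium = 40;      // treated as cold in this sketch
    public const byte FixStrAsciiBase = 0x60; // 0x60..0x7F carries a length of 0..31

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static bool IsFixStrAscii(byte tc) => (tc & 0xE0) == FixStrAsciiBase;

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static bool IsHotStringMarker(byte tc) =>
        IsFixStrAscii(tc) || tc == StringSmall || tc == Null || tc == StringEmpty;
}

public static class DemoReader
{
    public static int ColdCalls; // counts how often the cold method was entered

    // Dispatch in the shape of the B path: a few inlined compares, then hot or cold.
    public static string? ReadString(byte tc)
    {
        if (DemoMarkers.IsHotStringMarker(tc))
        {
            if (tc == DemoMarkers.Null) return null;
            if (tc == DemoMarkers.StringEmpty) return string.Empty;
            if (tc == DemoMarkers.StringSmall) return "small"; // stand-in payload
            return new string('a', tc & 0x1F);                 // FixStrAscii: length in the low 5 bits
        }
        return ReadStringCold(tc);
    }

    // Cold markers sit behind one method call, compiled straight to optimized code.
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    private static string? ReadStringCold(byte tc)
    {
        ColdCalls++;
        return tc == DemoMarkers.StringMedium ? "medium" : null; // stand-in payload
    }
}
```

Hot markers never touch `ReadStringCold`, so a workload that only emits hot markers leaves `ColdCalls` at zero.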
### A — Renumber BinaryTypeCode, hot range first
Hot codes contiguous (e.g. 0x00-0x7F), cold (0x80-0xFF). Single dispatch:
```csharp
if ((tc & 0x80) == 0) {
// hot inline switch
} else {
obj.X = context.ReadStringCold(tc);
}
```
- 1 `test+jne` instruction per dispatch site (minimum overhead)
- Hot switch contiguous range → JIT may emit jump table
- Wire-format breaking change — version bump, persisted blob migration required
- Architecturally cleanest but disruptive
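A minimal sketch of the single-bit classification the A path would allow. The renumbered values below are hypothetical and exist only to show the test; no such renumbering is defined in these notes.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical renumbering: hot codes in 0x00-0x7F, cold codes in 0x80-0xFF.
public static class RenumberedMarkers
{
    public const byte Null = 0x00;
    public const byte StringEmpty = 0x01;
    public const byte StringSmall = 0x02;
    public const byte StringMedium = 0x80;
    public const byte StringInterned = 0x81;

    // One bit test replaces the per-category compare chain of the B path.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static bool IsHot(byte tc) => (tc & 0x80) == 0;
}
```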
**Recommended sequence:** start with B (incremental, reversible), graduate to A when a wire-format-breaking change is incurred for another reason anyway.
## Per-category cold dispatch methods
Each on the `BinaryDeserializationContext<TInput>`, `[MethodImpl(MethodImplOptions.AggressiveOptimization)]` (Tier-1 direct, skip QuickJIT):
| Method | Category | Markers handled |
|---|---|---|
| `ReadStringCold(byte tc)` | String | Medium, Big, Ascii-long, Interned, InternFirstSmall, InternFirstMedium |
| `ReadComplexCold<T>(byte tc)` | Complex | ObjectRefFirst, ObjectRef, ObjectWithMetadata*, FixObj-slot fallback, ObjectWithTypeName* |
| `ReadCollectionElementCold<T>(byte tc)` | Collection element | Ref-aware element wrappers, large-array variants |
| `ReadDictEntryCold<TK, TV>(byte tc)` | Dictionary entry | Ref-aware key/value wrappers |
| `ReadEnumCold(byte tc)` | Enum | Rare numeric variants |
| `ReadPrimitiveCold[T](byte tc)` | Primitive | Large-variant primitives (rare) |
Generic specialization on the complex/collection/dict cold methods multiplies native bodies per `<T>` — acceptable cost given Tier-1 optimization and rare hot-path engagement. Alternative: erase to `Type` parameter on cold methods to keep one native body — Tier-1 amortizes reflection cost.
## Single source of truth — category-bounded
The hot/cold method-set ideal — "SGen-emit and runtime path call identical methods" — applies **fully** for categories without type-specific dispatch, **partially** for those with it:
| Category | Marker dispatch shared? | Type-specific work shared? | Notes |
|---|:---:|:---:|---|
| String | ✅ | ✅ (no type-specific) | One `ReadStringHotMarker(tc)` / `ReadStringColdMarker(tc)` pair, both paths call same |
| Primitive (Int32/Int64/Double/Decimal/Guid/...) | ✅ | ✅ (no type-specific) | Same |
| Enum | ✅ | ✅ (cast-only, generic) | Same |
| Complex | ✅ | ⚠️ Diverges | SGen: generic-static-abstract direct dispatch. Runtime: TypeReaderTable + `GetWrapper(Type)` metadata-driven (current architecture preserved) |
| Collection / Dictionary | ✅ | ⚠️ Diverges | Same as Complex — element/key/value reader is type-specific |
### Type-specific dispatch divergence
For Complex (and Collection-element / Dict-entry), the type-specific reader call diverges between paths because C# generic methods cannot accept a runtime `Type` as a generic parameter without reflection (`MethodInfo.MakeGenericMethod(...).Invoke(...)`) — not viable on the hot path (slow, AOT-trim-hostile).
**SGen path** — both type and reader known at compile time, uses generic + static abstract:
```csharp
// Existing precedent: IBinaryInputBase.IsTrustedSingleSegment (static abstract bool) — production-used,
// NativeAOT-verified via the AyCode.Core.Serializers.Console.csproj <PublishAot>true</PublishAot> path.
internal interface IGeneratedBinaryReader<TSelf>
where TSelf : IGeneratedBinaryReader<TSelf>
{
static abstract void ReadProperties<TInput>(object value, BinaryDeserializationContext<TInput> context)
where TInput : struct, IBinaryInputBase;
}
// Context method:
[MethodImpl(MethodImplOptions.AggressiveInlining)]
internal T ReadHotComplexMarker<T, TReader>(byte tc)
where T : class, new()
where TReader : IGeneratedBinaryReader<TReader>
{
var rc = new T();
TReader.ReadProperties(rc, this); // static abstract — JIT direct call per-spec
return rc;
}
// SGen-emit:
obj.Tag = ctx.ReadHotComplexMarker<SharedTag_All_True, SharedTag_All_True_GeneratedReader>(tc);
```
ILC sees concrete `<T, TReader>` pair at the SGen-emit call site → generates specialization → direct call in native code. No reflection, no DAM-attribute required on the hot path.
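The static-abstract pattern can be tried in isolation with a miniature like the one below. All names here (`IDemoReader`, `DemoContext`, `Tag`, `TagReader`) are hypothetical stand-ins, and the "wire" is just an array of strings; the sketch only demonstrates that the reader is selected by a type argument.

```csharp
using System;

// Hypothetical stand-ins; the real reader interface and context type differ.
public interface IDemoReader<TSelf> where TSelf : IDemoReader<TSelf>
{
    static abstract void ReadProperties(object value, DemoContext context);
}

public sealed class Tag
{
    public string Name = "";
}

public sealed class TagReader : IDemoReader<TagReader>
{
    public static void ReadProperties(object value, DemoContext context)
        => ((Tag)value).Name = context.NextString();
}

public sealed class DemoContext
{
    private readonly string[] _values;
    private int _position;

    public DemoContext(params string[] values) => _values = values;

    public string NextString() => _values[_position++];

    // The reader is chosen by the type argument, so no delegate,
    // table lookup or reflection is involved at the call site.
    public T ReadComplex<T, TReader>()
        where T : class, new()
        where TReader : IDemoReader<TReader>
    {
        var instance = new T();
        TReader.ReadProperties(instance, this);
        return instance;
    }
}
```

A call such as `ctx.ReadComplex<Tag, TagReader>()` names both the target type and its reader at compile time, which is the property the SGen path relies on. Static abstract interface members require C# 11 or later.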
**Runtime path** — type known only at runtime, keeps current architecture:
- `TypeReaderTable<TInput>.Readers[tc]` for marker dispatch (could be migrated to direct switch-call pattern but a separate refactor)
- `GetWrapper(Type)` for type-specific Complex dispatch (metadata + compiled-expression property accessors)
- The existing `IGeneratedBinaryReader` (non-static-abstract) interface stays — used by runtime cross-type populate path
### Net wins per path
- **SGen path:** full hot/cold method-set with generic-static-abstract for Complex/Collection/Dict. Maximum perf, single dispatch model, JIT-inline-friendly.
- **Runtime path:** marker dispatch shared with SGen (String / Primitive / Enum hot+cold methods). Type-specific dispatch unchanged (TypeReaderTable + metadata). Gains the marker-dispatch hot path, not the type-specific.
## Cost-benefit — why this is win-win, not trade-off
Feature-cold path cost vs feature-engaged value:
| Feature | Cold-path overhead (per use) | Feature benefit (per use) | Net |
|---|---:|---:|---:|
| String interning | ~5 ns (cold method call) | ~50 B/repeat (50-char strings) + ~50 ns UTF-8 encode skipped + intern-cache O(1) dedup | **10× positive** |
| Id-tracking (IId ref) | ~5 ns | Whole subgraph re-serialization skipped: ~100-1000 ns + N×wire-bytes saved | **20-200× positive** |
| Ref-handling (hash-based) | ~5 ns | Non-IId reference dedup: ~50-200 ns + N×wire-bytes saved | **10-40× positive** |
| Metadata mode | ~5 ns | Schema-evolution support, mismatched-property tolerance | Correctness, not perf |
| Polymorphism | ~5 ns | Runtime type dispatch — no alternative on the wire | Correctness, not perf |
**Conclusion:** the cold-path method-call cost is paid back many times over by the feature's own value. No-feature workloads pay zero. Feature-engaged workloads pay ~5 ns to save 50-1000 ns. Both ends benefit.
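The Net column appears consistent with dividing the time part of each benefit by the ~5 ns cold-call overhead, ignoring the wire-byte savings. Under that reading (an interpretation of the table, not a stated formula), the arithmetic is:

```latex
\text{net} \approx \frac{t_{\text{benefit}}}{t_{\text{cold}}}
\qquad
\frac{50}{5} = 10\times,
\qquad
\frac{100}{5} \text{ to } \frac{1000}{5} = 20\text{ to }200\times,
\qquad
\frac{50}{5} \text{ to } \frac{200}{5} = 10\text{ to }40\times
```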
## Implementation sequence
**The whole approach is gated on an empirical JIT-inlining check.** The `TryReadStringProperty` (Ötlet A, "Idea A") attempt failed because the inliner bailed on the body. The proposed hot/cold split aims to fix this by shrinking the hot body, but **JIT inliner decisions are opaque** and can only be confirmed empirically. The sequence below is therefore **falsifiable**: Phase 0 is a kill-switch.
### Phase 0 — Disasm prototype (kill-switch gate)
Test the **worst realistic candidate**, not the easiest. If String — the largest hot body in the proposed architecture — inlines successfully, every other category (Primitive, Enum, Complex, Collection, Dict) is smaller and will inline too. If String fails, the architecture is not viable on the current JIT regardless of how the smaller categories would fare. (Testing Enum first proves nothing — its ~10-15 IL body is far below any plausible inline threshold.)
Build the hot string-marker method (3 marker cases + FixStrAscii range check):
```csharp
[MethodImpl(MethodImplOptions.AggressiveInlining)]
internal string? ReadHotStringMarker(byte tc) // caller already read byte + checked PropertySkip
{
if (BinaryTypeCode.IsFixStrAscii(tc))
{
var len = BinaryTypeCode.DecodeFixStrAsciiLength(tc);
return len == 0 ? string.Empty : ReadAsciiBytesAsString(len);
}
return tc switch {
BinaryTypeCode.StringSmall => ReadStringSmall(),
BinaryTypeCode.Null => null,
BinaryTypeCode.StringEmpty => string.Empty,
_ => null // unreachable — caller guaranteed hot range
};
}
```
**Source-level if-chain vs switch is stylistic for this case count.** Roslyn converts small (≤3-4) and sparse-value switches to cmp-chain IL (`bne`/`beq` sequence), not the `switch` IL instruction. Our marker values (91/92/93 non-contiguous) force cmp-chain regardless of source form. The inliner's complexity-scoring sees identical IL either way — no inline-decision difference between forms.
Add a single SGen-emit call site, build, run:
```
DOTNET_JitDisasm=*ReadProperties* DOTNET_TC_QuickJit=0 dotnet run ...
```
Inspect the caller body in the emitted disasm:
- **Inline success:** `ReadHotStringMarker` body fully inlined — `IsFixStrAscii` check + marker dispatch + the inner `ReadStringSmall` / `ReadAsciiBytesAsString` calls appear directly in `ReadProperties`. → architecture VIABLE for String → other categories almost certainly inline too → proceed to Phase 1.
- **Inline failure:** caller body contains `call ReadHotStringMarker` — JIT bailed. → architecture NOT VIABLE on this JIT for the String hot body. Document outcome as Tanulság (lesson-learned). Keep current inline-emit for String; consider proceeding for smaller-body categories (Primitive/Enum) where success is more likely.
**Cost:** ~2-3 hours. Decision is per-category (String fail doesn't automatically kill Primitive/Enum).
### Phase 1 — Build out viable categories (only if Phase 0 ✓)
1. **Audit BinaryTypeCode** — current values per category, hot vs cold classification
2. **B path: per-category `IsHot*` helpers**: `BinaryTypeCode.IsHotStringMarker`, `IsHotComplexMarker`, etc.
3. **Per-category hot+cold method pairs**: `BinaryDeserializationContext.ReadHotXxxMarker(tc)` (`[AggressiveInlining]`) + `ReadColdXxxMarker(tc)` (`[AggressiveOptimization]`)
4. **SGen-emit conversion, one category at a time** with per-category disasm-recheck:
- **Primitive** first (smallest blast radius, simplest body)
- Disasm-verify hot method inlines at caller site → BDN/Console F median-of-3 → confirm no regression
- **String** next (slightly larger body — riskier; if Primitive succeeded but String fails, fall back to inline emit just for String)
- **Enum** (similar to Primitive)
- **Complex** — also requires `IGeneratedBinaryReader<TSelf>` generic-static-abstract interface refactor; bigger touch surface but follows same inline-verification protocol
- **Collection / Dictionary** last (element/entry dispatch reuses Complex pattern)
5. **Strict-mode SGen-emit collapse** — when feature flags prove cold dead, emit just the hot switch (no IsHot check, no cold fallback). K9M3 Phase C provides the metadata.
6. **A path (later, gated)** — BinaryTypeCode renumber when a wire-format-breaking change is independently scheduled
### Per-category abort gates
Each category's Phase 1 step has its own disasm-verify checkpoint. If a category fails its inline gate (body too complex despite shrinking attempts), that category falls back to the inline-emit architecture — the others continue. No all-or-nothing commitment.
## Open questions
- Per-category cold method generic specialization vs `Type`-parameter erasure: bin-size vs Tier-1 reflection cost trade-off — measure before committing
- Hot-range size budget: how many markers fit in the proposed hot range (e.g., 0x00-0x7F = 128 codes), accounting for FixStrAscii's 32-byte range consumption + TinyInt range
- ChunkedArray and other large-collection variants: which side of the boundary? (Hot if EnableRefHandling-independent, cold if feature-engaged)
- Cost-benefit on the writer side: does the writer-side cold path (intern-cache hash + lookup, IdentityMap probe) carry the same +/- profile as the reader side?
## Cross-references
- `BINARY_SGEN_OPTIMIZATION.md` — per-property emit condensation companion (sister optimization, same K9M3 ecosystem)
- `BINARY_TODO.md#accore-bin-t-k9m3` — Phase C feature-conditional emit (the layer this builds on)
- `BINARY_FORMAT.md` — current wire-format spec (will need update post-renumber if A path taken)
- `BINARY_SGEN.md` — SGen architecture (the SGen-emit refactor target)

View File

@ -206,6 +206,69 @@ Smaller IL → faster cold-start JIT, smaller assembly, smaller i-cache footprin
- F.4 if-else order: hot path is `Object` (no ref-handling, fresh instance) — should be the first branch checked, not `PropertySkip`. Current emit checks `PropertySkip` first (early exit), correct for sparse-property streams (schema evolution), suboptimal for dense streams. Profile both orderings.
- Disasm-baseline project (`AyCode.Core.Benchmarks.Disasm`): structure-only or also a perf companion to BDN? Decide post-baseline.
## Future review — context shape + attribute strategy
LLM-only notes. Not a TODO entry — capture for a later audit pass when refactoring stabilizes.
### Context thinning
- Audit field layout of `BinarySerializationContext<TOutput>` + `BinaryDeserializationContext<TInput>`: hot fields (`_buffer` / `_position` / `_bufferLength`) first cache line; cold (`_stringCache`, `_internCache`, intern maps) separate.
- Generic specialization multiplier: every field × `<TInput>` / `<TOutput>` specialization count → field-add cost scales.
- Feature-gated fields removable when matching `Enable*Feature` consistently false across consumers (cross-ref ACCORE-BIN-T-K9M3 Phase C).
### Attribute strategy
| Pattern | When | Rationale |
|---|---|---|
| `[AggressiveInlining]` | ≤ ~5 IL body, branch-free, single-statement | High inline confidence |
| `[AggressiveOptimization]` | Warmup-sensitive hot loop, needs Tier-1 direct | Skip Tier-0 latency |
| `[NoInlining]` | Cold throw-helpers, rare-marker decoders, slow paths | Prevent caller-bloat |
| (no attribute) | All else | Let JIT decide |
- Audit drift: `[AggressiveInlining]` bodies may have grown past inline-threshold after later edits.
- Explicit `[NoInlining]` on `Throw*` / `*Slow` / exception formatters / coldpath helpers.
- Hint ≠ guarantee. Ötlet A regression confirms: complex bodies bail despite hint; the only reliable inlining is small bodies (≤ ~5 IL) OR inline emit (no method boundary).
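The `[AggressiveInlining]` plus `[NoInlining]` pairing from the table can be sketched as follows. `DemoBuffer`, its method names and the exception type are illustrative assumptions, not the project's actual helpers.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class DemoBuffer
{
    // Hot: tiny body, one range check plus a call that is almost never taken.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static void EnsureAvailable(int position, int length, int bufferLength)
    {
        if (length < 0 || length > bufferLength - position)
            ThrowEndOfBuffer(position, length, bufferLength);
    }

    // Cold: message formatting and the throw stay out of every caller's body.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void ThrowEndOfBuffer(int position, int length, int bufferLength) =>
        throw new InvalidOperationException(
            $"Need {length} bytes at {position}, buffer holds {bufferLength}.");
}
```

Keeping the string interpolation in the throw helper means the inlined check contributes only a compare and a call to each caller.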
### Context split — Strict vs Hybrid (larger refactor)
Single `BinaryDeserializationContext<TInput>` currently carries 100% hybrid-mode code (FixObj wrapper slot table, `GetWrapper(Type, byte)`, `_wrapperSlots[]`, `_nextRuntimeSlot`, TypeReaderTable bridge). Strict-mode SGen paths drag this dead weight.
**Target hierarchy** (inheritance without virtuals — derived adds fields/methods, never overrides):
```
BinaryDeserializationContextBase<TInput> // common: buffer, ReadByte/Var*, ReadString*, intern, ScanInternString, FastWire
BinaryDeserializationContextStrict<TInput> // empty initially; future strict-only optimizations
BinaryDeserializationContextHybrid<TInput> // wrapperSlots[], GetWrapper(Type,byte), _nextRuntimeSlot, TypeReaderTable bridge
```
**Dispatch:** SGen type-analysis decides strict-or-hybrid at compile time (all reachable types `[AcBinarySerializable]` → strict; any non-marked → hybrid). `Deserialize<T>` creates the matching context type; `IGeneratedBinaryReader<TInput, TContext>` generic over context type.
**Win on strict path:**
- Complex emit per-property: -3 lines (FixObj slot fallback dead-coded out)
- Heap alloc per Deserialize: ~30-40 B smaller context
- Native code per `<TInput>` strict-spec: less branch, less i-cache
- Cross-type populate fallback paths physically absent → wire-corruption surfaces as missing-method instead of silent fallback
**Refactor sequence:**
1. Extract `Base<TInput>` (everything strict-needed: buffer state, primitives, intern cache, FastWire flag)
2. Move hybrid-only state/methods to `Hybrid<TInput>` derived
3. `Strict<TInput>` = empty derived initially (=== Base)
4. SGen emit: type-analysis picks Strict-emit or Hybrid-emit path; emit references typed concrete context
5. Reader interface gains a generic context parameter: `IGeneratedBinaryReader<TInput, TContext>`
6. Test reorganization: existing tests run on hybrid-context; add strict-graph fixtures
**Effort:** ~1-2 days. Symmetric work on `BinarySerializationContext` writer side. Gated on stable baseline (Ötlet A regression resolved first).
### Static-name ASCII-cache + UTF-16-raw fallback
Type-names (polymorphism path) and property-names (UseMetadata=true) are **static** — known at metadata-construction time. `Ascii.IsValid` scan once → cache as `bool` on metadata; SGen W9F1 emits compile-time const.
Wire path bypasses UTF-8 entirely:
- ASCII (common): marker + length + 1 B/char → byte-widen on read (`ReadAsciiBytesAsString`-like)
- Non-ASCII (rare): marker + length + 2 B/char UTF-16 raw memcpy → `MemoryMarshal.Cast<byte,char>` on read
Eliminates `ReadPlainString` / `ReadStringUtf8(int length)` / `DecodeUtf8(int byteLength)` (legacy 2-pass BCL chain, currently used only by polymorphism). Wire-format change scoped to `ObjectWithTypeName` / `ObjectWithTypeNameRefFirst` markers; user-string-property values (H2Q6 markers, runtime-arrived) unaffected.
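A minimal sketch of the two encodings described above, assuming a hypothetical one-byte flag in place of the real markers and omitting the length field. `StaticNameCodec` is an invented name, and the raw UTF-16 copy assumes writer and reader share the same endianness.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

public static class StaticNameCodec
{
    // Hypothetical one-byte flags; the real wire format would use its own markers.
    private const byte AsciiFlag = 0;
    private const byte Utf16Flag = 1;

    public static byte[] Encode(string name)
    {
        // In the design above this scan would run once and be cached on the metadata.
        bool ascii = Ascii.IsValid(name.AsSpan());
        ReadOnlySpan<byte> raw = MemoryMarshal.AsBytes(name.AsSpan());
        var wire = new byte[1 + (ascii ? name.Length : raw.Length)];
        wire[0] = ascii ? AsciiFlag : Utf16Flag;
        if (ascii)
            Encoding.ASCII.GetBytes(name.AsSpan(), wire.AsSpan(1)); // 1 byte per char
        else
            raw.CopyTo(wire.AsSpan(1));                             // 2 bytes per char, raw copy
        return wire;
    }

    public static string Decode(ReadOnlySpan<byte> wire)
    {
        ReadOnlySpan<byte> payload = wire.Slice(1);
        return wire[0] == AsciiFlag
            ? Encoding.ASCII.GetString(payload)                     // widen bytes to chars
            : new string(MemoryMarshal.Cast<byte, char>(payload));  // reinterpret as UTF-16
    }
}
```

Neither branch goes through a UTF-8 encoder or decoder, which is the point of the proposal; `Ascii.IsValid` is available from .NET 8 onward.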
## Cross-references
- `BINARY_TODO.md#accore-bin-t-k9m3` — wire-codec hoist + Phase C feature-conditional emit (sister work).

View File

@ -12,6 +12,7 @@ AcBinary serialization system. Primary goal: **speed** (two-phase scan+serialize
- [`BINARY_WRITERS.md`](BINARY_WRITERS.md) — Writer internals (streaming, buffering)
- [`BINARY_SGEN.md`](BINARY_SGEN.md) — Source generator (`AyCode.Core.Serializers.SourceGenerator`)
- [`BINARY_SGEN_OPTIMIZATION.md`](BINARY_SGEN_OPTIMIZATION.md) — SGen per-property emit micro-optimization brainstorming / methodology notes (working doc, not a TODO)
- [`BINARY_BYTECODE_OPTIMIZATION.md`](BINARY_BYTECODE_OPTIMIZATION.md) — Wire-format hot/cold marker layout reorganization (sibling working doc, feature-flag-aligned bytecode space partition)
- [`BINARY_ISSUES.md`](BINARY_ISSUES.md) — Known issues and limitations (binary serializer core)
- [`BINARY_TODO.md`](BINARY_TODO.md) — Planned work / open tickets (binary serializer core)
- [`BINARY_ASYNCPIPE_ISSUES.md`](BINARY_ASYNCPIPE_ISSUES.md) — Known issues and limitations (streaming I/O layer: `AsyncPipeReaderInput` + `AsyncPipeWriterOutput`)

View File

@ -14,7 +14,7 @@
The `<clear />` element explicitly drops all parent NuGet configurations
(machine-level + user-level + any walk-up Directory-level config) for this repo's
scope. Only `nuget.org` remains, matching the public packages declared in the
csproj files (AutoMapper, MessagePack, MemoryPack, Newtonsoft.Json, Microsoft.*).
csproj files (MessagePack, MemoryPack, Newtonsoft.Json, Microsoft.*).
Determinism: every dev machine + CI agent resolves the same source set. Side
effect-free with respect to other repos sharing the same developer machine — their