AyCode.Core/AyCode.Core/docs/BINARY/BINARY_TODO.md

AcBinarySerializer — TODO

This page covers planned work for the binary serializer core (format, SGen, options, deserialization context, buffer writer). Work specific to the streaming I/O layer (AsyncPipeReaderInput + AsyncPipeWriterOutput, multi-message wire framing, sliding-window buffer, producer-consumer synchronization) is tracked separately in BINARY_ASYNCPIPE_TODO.md.

Priority legend

  • P0 blocker · P1 important · P2 nice-to-have · P3 idea

ACCORE-BIN-T-S8P4: Replace JSON-in-Binary request parameters

Priority: P1 · Type: Refactor · Status: Closed (2026-04-26, landed in commits cdd54d3 2026-04-05 + 3b70070 2026-04-06) · Related: ../XCUT/XCUT_ISSUES.md#accore-xcut-i-x8q1 (canonical), AyCode.Services/docs/SIGNALR/SIGNALR_TODO.md

Migrate client→server request parameters from JSON-in-Binary envelope to direct Binary serialization (matching response path). Coordinated change across client, server, and all consuming projects. Do NOT attempt as side-effect of unrelated work.

Acceptance: SignalPostJsonDataMessage<T> replaced by a SignalPostBinaryDataMessage<T> (or equivalent); no JSON round-trip on the wire for request params; benchmarks confirm no regression.

Resolution

  • What: Length-prefixed, per-parameter binary format introduced via SignalRSerializationHelper.SerializeParametersToBinary / DeserializeParametersFromBinary; further unified into SignalParams (single byte[] carrying packed method parameters with SetParameterValues / GetParameterValues).
  • Where: AyCode.Services/SignalRs/AcSignalRClientBase.cs, AcWebSignalRHubBase.cs, ISignalParams.cs (server + client dispatch); IAcSignalRHubClient.cs (legacy wrappers).
  • Equivalent (not literal SignalPostBinaryDataMessage<T>): SignalParams was chosen over a 1:1 binary wrapper class — fewer indirections on the hot path, type-safe pack/unpack, and DataSerializerType field on SignalReceiveParams for response format indication.
  • Wire impact: No JSON round-trip on the wire for request params; this is a breaking change vs. previous JSON-in-Binary clients/servers (see commit message).
  • Legacy types: SignalPostJsonMessage, SignalPostJsonDataMessage<T>, SignalPostMessage<T>, ISignalPostMessage<T> all marked [Obsolete] in IAcSignalRHubClient.cs; deletion tracked separately in AyCode.Services/docs/SIGNALR/SIGNALR_TODO.md#accore-sig-t-s3n8 (gated on consumer migration).

ACCORE-BIN-T-Q2N7: Re-evaluate DiscountProductMapping SGen exclusion

Priority: P3 · Type: Investigation · Related: BINARY_ISSUES.md#accore-bin-i-f1w8

Investigate whether the new int Id shadowing pattern can be handled by SGen (via base-class introspection, property-setter lookup on the base) to eliminate the runtime compiled-expression fallback for this entity class.

ACCORE-BIN-T-W9F1: Generate BinarySerializeTypeMetadata / BinaryDeserializeTypeMetadata at compile time

Priority: P1 · Type: Performance · Related: BINARY_ISSUES.md#accore-bin-i-n6q3

Eliminate the dominant first-call cost (reflection + Expression.Compile in metadata ctor) for SGen types by emitting pre-built metadata from the source generator.

Design outline:

  • TypeMetadataBase / BinarySerializeTypeMetadata / BinaryDeserializeTypeMetadata get a second constructor that accepts pre-computed values (hashes, MinWriteSize, ComplexPropertyCount, flags, IsIId, IdAccessorType, etc.). No reflection executes in this ctor.
  • Source generator keeps its existing s_typeNameHash / s_propertyHashes static fields (hot-path access stays static, zero indirection) and passes the same references to the metadata — single source of truth, no duplicate computation.
  • ModuleInit registers both the writer/reader and the pre-built metadata into a GeneratedMetadataRegistry. GetWrapperSlow consults this registry first, falling back to the reflection-based MetadataFactory for runtime-only types.
  • Lazy RuntimeInit() pattern for Expression.Compile property accessors:
    • TypeMetadataBase gets volatile bool _runtimeInitialized + internal void RuntimeInit() (idempotent, no lock needed).
    • GetWrapperSlow calls metadata.RuntimeInit() only when wrapper.GeneratedWriter == null || !Options.UseGeneratedCode — SGen types skip it entirely (they never touch runtime accessors on their own metadata; non-SGen child types have their own metadata and run the factory path normally).
    • Hybrid mode stays correct: an SGen type on the SGen path never uses its own property accessors; a non-SGen child type's metadata runs the reflection ctor as today.
  • volatile guards the flag; multiple contexts may race into RuntimeInit, second run is a no-op.

Thread safety: GlobalMetadataCache is ConcurrentDictionary; generated metadata is registered once at ModuleInit; wrapper construction is per-context and unchanged.
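
The lazy RuntimeInit() shape described above could look roughly like the following. This is a minimal sketch, not the real TypeMetadataBase: LazyMetadataSketch, PointMetadataSketch, BuildAccessors and BuildCount are hypothetical names, and the accessor delegate type is an assumption chosen only to keep the example self-contained.

```csharp
#nullable enable
using System;
using System.Threading;

// Hypothetical stand-in for TypeMetadataBase; only the lazy-init shape is shown.
public abstract class LazyMetadataSketch
{
    private volatile bool _runtimeInitialized;
    private Func<object, object?>[]? _accessors;

    // Counts accessor builds; exists only so the sketch can be observed in a test.
    public int BuildCount;

    // Stands in for the reflection + Expression.Compile work done by the runtime path.
    protected abstract Func<object, object?>[] BuildAccessors();

    public Func<object, object?>[] Accessors
    {
        get { RuntimeInit(); return _accessors!; }
    }

    public void RuntimeInit()
    {
        if (_runtimeInitialized) return;      // fast path after the first run
        var built = BuildAccessors();         // racing callers may each build once; results are equivalent
        Interlocked.Increment(ref BuildCount);
        _accessors = built;
        _runtimeInitialized = true;           // volatile write publishes _accessors to other threads
    }
}

// Hypothetical metadata for a (int, int) tuple, used only to exercise the sketch.
public sealed class PointMetadataSketch : LazyMetadataSketch
{
    protected override Func<object, object?>[] BuildAccessors() =>
        new Func<object, object?>[] { o => ((ValueTuple<int, int>)o).Item1 };
}
```

An SGen-path caller would simply never touch Accessors, so BuildAccessors never runs for it; only the runtime fallback pays the cost.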

Acceptance:

  • Cold benchmark: first Serialize<T> of a fresh SGen type shows no reflection / Expression.Compile on the call stack.
  • Runtime fallback (UseGeneratedCode=false) still produces identical wire output and uses the full metadata accessors.
  • Deserialize side has parity (same approach for BinaryDeserializeTypeMetadata).
  • Existing tests pass; wire format unchanged.

ACCORE-BIN-T-T5J8: JIT Tier 1 warmup for generated hot methods

Priority: P2 · Type: Performance · Related: BINARY_ISSUES.md#accore-bin-i-n6q3

After ACCORE-BIN-T-W9F1 lands, JIT of generated WriteProperties / ScanObject / ScanForDuplicates becomes the dominant residual first-call cost for SGen types. Options to evaluate (benchmark before committing):

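  • [MethodImpl(MethodImplOptions.AggressiveOptimization)] on the generated hot methods: skips Tier 0 and compiles fully optimized code on first call. Simple generator change. Trade-off: larger one-time JIT cost in exchange for eliminating the Tier 0→1 recompile step; such methods also bypass tiered compilation, so they receive no dynamic PGO instrumentation. Benchmark steady-state throughput as well as first-call time.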
  • Background prewarm from ModuleInit: Task.Run(() => RuntimeHelpers.PrepareMethod(handle)) for each registered writer/reader method. Parallelizes JIT with app startup. Keep it opt-in (option flag) to avoid surprising consumers with extra startup threads.
  • ReadyToRun (R2R) in consuming projects' publish config — pre-compiles IL to native at publish time. External to SGen, complementary. Document as a recommended publish setting.
  • Code chunking (split generated methods exceeding a property threshold into sub-methods, e.g. WriteProperties_Part1 / _Part2) — measure first. Only beneficial for unusually large types (20+ properties / nested collections). Call overhead can offset gains; JIT inliner may already handle reasonably-sized methods well.
  • try / finally audit on hot path — On .NET 9 (project's minimum target), JIT silently refuses to inline any method containing an EH region (AggressiveInlining is ignored). [.NET 10 partially lifts this for same-module try-finally — see dotnet/runtime#112998, merged 2025-03-20 — but catch, cross-module, and P/Invoke-stub cases stay blocked. Until project's minimum runtime moves to .NET 10, treat EH as an absolute inlining barrier; even after the upgrade, several sub-cases keep the rule.] Audit scope:
    • Hand-written bridges: WriteValueGenerated / WriteObjectGenerated / WriteStringGenerated / ScanValueGenerated and any helper called from generated WriteProperties for accidental try/finally / using blocks.
    • SGen output template (AcBinarySourceGenerator.cs): generated WriteProperties / ScanObject / ScanForDuplicates / ReadObject / ReadProperties MUST stay straight-line. Future feature additions ([CustomSerializer] / [CustomDeserializer] hooks, OnSerializing / OnDeserialized callbacks, validation attributes, rented-buffer using blocks) are tempting candidates for try/catch/finally — emit them in separate cold helpers, never inline into the generated hot method. A single accidental try block in WriteProperties makes the whole generated method non-inlinable, killing the SGen Root Fast Path benefit.
    • Resource cleanup (Pool/ArrayPool/Dispose) belongs in Serialize<T> entry-frame only, not in per-property helpers or generated hot methods. See BINARY_IMPLEMENTATION.md Rule #3 (Inlining barriers) and BINARY_SGEN.md (SGen Output Constraints).
  • stackalloc size discipline on hot path — On .NET 9, methods containing localloc (any C# stackalloc) historically blocked inlining. Modern .NET allows inlining only for fixed-size stackalloc ≤ 32 bytes outside loops (see dotnet/runtime#7113) — anything larger or loop-nested still blocks. Our typical scratch-buffer patterns (UTF-8 encoding scratch, ArrayPool fallbacks) sit far above 32 bytes (256+), so any helper containing such a stackalloc is non-inlinable. Combined with try/finally for ArrayPool.Return cleanup, the method is doubly non-inlinable on .NET 9. Plan accordingly: keep stackalloc-using helpers as deliberate cold call-frames, not as AggressiveInlining candidates.
  • Native AOT — out of scope for this TODO; separate architectural decision with deployment-model implications.
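
A rough illustration of the hot/cold split and the opt-in prewarm discussed in the list above. This is a sketch under stated assumptions: HotColdSplitSketch and its members are hypothetical names, the two-field payload and little-endian layout are invented for the example, and none of it reflects actual SGen output.

```csharp
using System;
using System.Buffers;
using System.Buffers.Binary;
using System.Runtime.CompilerServices;

public static class HotColdSplitSketch
{
    // Hot method: straight-line, no try/finally, no using, no stackalloc.
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    public static int WriteProperties(Span<byte> destination, int id, short flags)
    {
        BinaryPrimitives.WriteInt32LittleEndian(destination, id);
        BinaryPrimitives.WriteInt16LittleEndian(destination.Slice(4), flags);
        return 6; // bytes written
    }

    // Cold entry frame: owns the pooled buffer and the only EH region.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static byte[] SerializeEntry(int id, short flags)
    {
        byte[] rented = ArrayPool<byte>.Shared.Rent(16);
        try
        {
            int written = WriteProperties(rented, id, flags);
            return rented.AsSpan(0, written).ToArray();
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(rented);
        }
    }

    // Opt-in prewarm: asks the runtime to JIT the hot method ahead of first use.
    public static void Prewarm()
    {
        var method = typeof(HotColdSplitSketch).GetMethod(nameof(WriteProperties))!;
        RuntimeHelpers.PrepareMethod(method.MethodHandle);
    }
}
```

Whether Prewarm runs inline or on a background task is the opt-in decision described in the prewarm bullet; the sketch leaves that to the caller.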

Acceptance:

  • Benchmark a realistic entity graph (≥ 3 referenced child types) and show first-call time within ~10% of steady-state after ACCORE-BIN-T-W9F1 + chosen mitigation(s).
  • Document which combination is recommended for SignalR hot-path workloads vs. batch serialization.

ACCORE-BIN-T-Z3K8: Replace IId<T> interface dependency with convention/attribute-based Id detection

Priority: P1 · Type: Refactor

The binary serializer currently detects Id-tracking properties via the IId<T> interface (AyCode.Interfaces). This couples the serializer to a framework-specific abstraction and forces consumer types to implement the interface for tracking participation. Move to a POCO-friendly detection scheme:

  • IdDetectionMode.Convention (default) — convention-based; any property named Id is treated as the tracking key. Zero-friction onboarding.
  • IdDetectionMode.Attribute — explicit; only properties marked with a serializer-native [Id] (or similar) attribute are tracked.
  • [IgnoreId] attribute — escape hatch in Convention mode to exclude an Id-named property from tracking when the developer wants explicit opt-out.

Implicit contract for Convention mode: within a single type, Id values must be unique across instances. Whether Id semantically represents a primary key or a sequence number is irrelevant; the tracker keys by (Type, Id), so per-type uniqueness is the only requirement. Violating this invariant typically signals a domain-modelling problem, not a serializer bug. Design rationale discussed in conversation 2026-04-27.
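
The two detection modes and the opt-out attribute could be prototyped with plain reflection along these lines. This is a sketch only: the attribute names, IdDetectionSketch.FindIdProperty, and the sample POCOs are hypothetical, and a real implementation would presumably cache the result in type metadata rather than reflect per call.

```csharp
#nullable enable
using System;
using System.Reflection;

public enum IdDetectionMode { Convention, Attribute }

[AttributeUsage(AttributeTargets.Property)]
public sealed class IdAttribute : Attribute { }

[AttributeUsage(AttributeTargets.Property)]
public sealed class IgnoreIdAttribute : Attribute { }

public static class IdDetectionSketch
{
    // Returns the tracking-key property for the given mode, or null when none qualifies.
    // Note: a type that shadows Id with "new" can expose two Id properties via reflection;
    // this sketch returns the first match and does not resolve that case.
    public static PropertyInfo? FindIdProperty(Type type, IdDetectionMode mode)
    {
        foreach (var property in type.GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            if (mode == IdDetectionMode.Attribute)
            {
                if (property.IsDefined(typeof(IdAttribute), inherit: true)) return property;
            }
            else if (property.Name == "Id" &&
                     !property.IsDefined(typeof(IgnoreIdAttribute), inherit: true))
            {
                return property;
            }
        }
        return null;
    }
}

// Sample POCOs with no AyCode.Interfaces dependency.
public sealed class PlainOrder { public int Id { get; set; } public string? Note { get; set; } }
public sealed class KeyedOrder { [Id] public Guid Key { get; set; } public int Id { get; set; } }
public sealed class OptedOut { [IgnoreId] public int Id { get; set; } }
```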

Acceptance:

  • Binary serializer no longer references IId<T> in any execution path (no interface checks, no where T : IId<TKey> constraints in the serializer surface).
  • Wire format unchanged.
  • Existing consumers using IId<T>-implementing types still work transparently in Convention mode (their Id property is detected via convention).
  • New consumers can use plain POCOs with no AyCode.Interfaces dependency.
  • IdDetectionMode exposed on AcBinaryOptions (or successor options class post-rebrand).
  • Default mode = Convention.

ACCORE-BIN-T-N7V1: Replace [JsonIgnore] dependency with serializer-native ignore attribute

Priority: P2 · Type: Refactor

Property exclusion from binary serialization currently relies on [JsonIgnore] (Newtonsoft.Json). This couples the binary serializer to a third-party JSON library's attribute and is conceptually wrong — a binary serializer should not consult a JSON-specific marker for its exclusion semantics.

Define a serializer-native ignore attribute (working name [BinaryIgnore]; final name TBD pending broader rebrand). For backward compatibility during transition, also continue recognizing [JsonIgnore] with a deprecation note.

Possible cross-cutting consideration: if Toon and other future serializers also need property-exclusion, a single shared attribute (e.g., [SerializerIgnore] in a common abstractions package) may be cleaner than per-serializer attributes. Decide before naming finalizes — this may belong in XCUT_TODO.md rather than purely BINARY scope.

Acceptance:

  • Native ignore attribute defined in the binary serializer's namespace (or shared abstractions package, pending the cross-cutting decision above).
  • Both native attribute and [JsonIgnore] recognized during a transitional period; native attribute takes precedence on conflict.
  • [JsonIgnore] recognition flagged for removal in a future major version (track in a follow-up cleanup TODO once consumer projects have migrated).
  • No new code dependency on Newtonsoft.Json for property-exclusion logic.
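
One way to honour the "no new Newtonsoft.Json dependency" criterion during the transition is to match the legacy attribute by its full type name instead of referencing the type. The sketch below assumes the working name [BinaryIgnore]; IgnoreDetectionSketch, IsExcluded and SampleEntity are hypothetical.

```csharp
using System;
using System.Reflection;

[AttributeUsage(AttributeTargets.Property)]
public sealed class BinaryIgnoreAttribute : Attribute { }

public static class IgnoreDetectionSketch
{
    // Matched by name so this assembly needs no compile-time reference to Newtonsoft.Json.
    private const string LegacyAttributeName = "Newtonsoft.Json.JsonIgnoreAttribute";

    public static bool IsExcluded(PropertyInfo property)
    {
        // The native attribute is checked first, so it wins when both are present.
        if (property.IsDefined(typeof(BinaryIgnoreAttribute), inherit: true)) return true;

        // Transitional path: recognise the legacy marker without loading its type.
        foreach (CustomAttributeData data in property.GetCustomAttributesData())
        {
            if (data.AttributeType.FullName == LegacyAttributeName) return true;
        }
        return false;
    }
}

public sealed class SampleEntity
{
    public int Id { get; set; }
    [BinaryIgnore] public string Cache { get; set; } = "";
}
```

Removing the legacy branch later is then a one-method change, which fits the follow-up cleanup TODO mentioned above.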

ACCORE-BIN-T-Y6R2: Implement projection serialization phase 1 (runtime path)

Priority: P1 · Type: Feature · Related: ../adr/0001-binary-projection-serialization.md (canonical)

Implement the phase 1 runtime path of source→target projection serialization per ADR 0001. See the ADR for full context, decision rationale, alternatives, consequences, and acceptance criteria.

Sibling rebrand-prep TODOs: ACCORE-BIN-T-Z3K8 (IId migration), ACCORE-BIN-T-N7V1 (JsonIgnore replacement).

ACCORE-BIN-T-K3W7: Rename BufferWriterChunkSize to reflect actual semantics

Priority: P3 · Type: Refactor · Breaking: Yes (public option API) · Streaming impact: see BINARY_ASYNCPIPE_TODO.md for the streaming-side companion considerations (chunk-on-wire vs internal-buffer semantics)

The property name BufferWriterChunkSize is misleading: across the three output paths it does NOT consistently represent a "chunk".

| Output path | What BufferWriterChunkSize actually controls | Wire-format chunk? |
|---|---|---|
| ArrayBinaryOutput (byte[] API) | Initial buffer capacity of the internal byte[] | No |
| BufferWriterBinaryOutput (IBufferWriter overload) | Internal buffer size: how much data accumulates before Advance() + new GetMemory() on the underlying writer | No |
| AsyncPipeWriterOutput (streaming) | Both internal buffer and wire-format chunk frame size for chunked framing | Yes (only here) |
| Receive side (AsyncPipeReaderInput) | Initial receive buffer = BufferWriterChunkSize × 2 | No (just sizing hint) |

Only the streaming AsyncPipeWriterOutput path has a wire-format "chunk" concept (chunked framing for length-prefixed segments). On the other 75% of paths the property name reads as if the serializer were segmenting the payload, which is not what happens.

Possible directions (decide before implementing):

  1. Single rename, semantic-neutral: BufferWriterChunkSize → BufferWriterBufferSize or BufferWriterPageSize. Minimal API surface change, single-property semantics preserved. Downside: still slightly off for the streaming path where there IS chunked framing.
  2. Two-property split: InternalBufferSize (universal: how much data accumulates before Advance/Grow) + StreamingChunkSize (only meaningful for AsyncPipeWriterOutput; separate knob, defaults to InternalBufferSize). Cleanest semantics, most ceremony, slightly more options to document.
  3. No rename, streaming-honest docs: keep the name BufferWriterChunkSize but document explicitly that on non-streaming paths the value is repurposed as buffer size. Cheapest change (docs only). Downside: doesn't fix the underlying confusion the field name causes.

Pick one before touching code. Option 2 is the most correct but adds API surface; Option 1 is the pragmatic middle.
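
For Option 2, the "defaults to InternalBufferSize" behaviour could be expressed with a nullable backing field, as in this minimal sketch. BufferSizingOptionsSketch is a hypothetical class used only for illustration; the real options type and its thread-safety treatment are out of scope here.

```csharp
public sealed class BufferSizingOptionsSketch
{
    private int? _streamingChunkSize;

    // Universal knob: how much data accumulates before Advance/Grow.
    public int InternalBufferSize { get; set; } = 65535;

    // Streaming-only knob: follows InternalBufferSize until it is set explicitly.
    public int StreamingChunkSize
    {
        get => _streamingChunkSize ?? InternalBufferSize;
        set => _streamingChunkSize = value;
    }
}
```

With this shape, existing callers that only set one value keep today's behaviour on every path, while streaming consumers can diverge deliberately.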

Affected callers / docs to update on rename:

  • AcBinarySerializerOptions.cs (definition)
  • AcBinarySerializer.cs × 3 sites (ArrayBinaryOutput ctor, BufferWriterBinaryOutput ctor, AsyncPipeWriterOutput ctor)
  • AcBinaryDeserializer.cs × 1 site (receive-side initial capacity derivation)
  • AsyncPipeReaderInput.cs — XML doc cross-refs
  • BINARY_WRITERS.md, BINARY_TODO.md (this entry), BINARY_ISSUES.md (line 151 — already lists BufferWriterChunkSize among the struct-mutation issue's affected setters)
  • Consumer-side: AyCode.Services/SignalRs/AcBinaryHubProtocol.cs ctor mutates _options.BufferWriterChunkSize = options.BufferSize; — see BINARY_ISSUES.md#accore-bin-i-... (struct-mutation context). Coordinate the rename with the struct-mutation fix to avoid two cross-cutting churn waves on the same property.

Acceptance:

  • Property renamed (or split) per the chosen direction; all internal references updated.
  • XML docs reflect the actual semantics on each output path (initial capacity / advance threshold / chunk frame size — whichever applies).
  • Consumer-side usage in AcBinaryHubProtocol updated; if Option 2 is chosen, the protocol uses StreamingChunkSize (the streaming knob), not the universal one.
  • Wire format unchanged. Default values unchanged (65535 / equivalent).
  • Migration note in CHANGELOG / release notes since this is a breaking change to AcBinarySerializerOptions.

ACCORE-BIN-T-M4D2: Add ReadOnlyMemory<byte> / Memory<byte> deserialize overloads

Priority: P3 · Type: Feature

The public AcBinaryDeserializer.Deserialize surface accepts byte[] (with optional offset/length) and ReadOnlySequence<byte>, but not ReadOnlyMemory<byte> / Memory<byte>. Consumers that hold a ReadOnlyMemory<byte> (cached payloads, message-broker frames, in-memory pipe slices) must call .ToArray() to round-trip through byte[] — unnecessary copy + GC alloc.

Implementation:

  • Deserialize<T>(ReadOnlyMemory<byte> data, AcBinarySerializerOptions options) and the non-generic Type-based variant.
  • Body: MemoryMarshal.TryGetArray(data, out var seg) → array-backed path delegates to Deserialize<T>(seg.Array!, seg.Offset, seg.Count, options) (zero-copy). Non-array-backed fallback (rare — custom MemoryManager<T> with native memory) copies into a pooled byte[].
  • Memory<byte> overload trivially delegates to the ReadOnlyMemory<byte> one (Memory<byte> is implicitly convertible).
  • No new input-strategy struct needed — reuses existing ArrayBinaryInput.
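
The delegation described above might be sketched as follows. The ArrayDeserializer<T> delegate is a hypothetical stand-in for the existing Deserialize<T>(byte[], int, int, options) overload so the example stays self-contained; MemoryOverloadSketch is likewise an illustrative name.

```csharp
using System;
using System.Buffers;
using System.Runtime.InteropServices;

public static class MemoryOverloadSketch
{
    // Stand-in for the existing array-based deserialize entry point.
    public delegate T ArrayDeserializer<T>(byte[] buffer, int offset, int count);

    public static T Deserialize<T>(ReadOnlyMemory<byte> data, ArrayDeserializer<T> core)
    {
        // Array-backed memory: hand the underlying array straight to the core (zero-copy).
        if (MemoryMarshal.TryGetArray(data, out ArraySegment<byte> segment))
            return core(segment.Array!, segment.Offset, segment.Count);

        // Rare fallback (for example native memory behind a custom MemoryManager<T>):
        // copy into a pooled array and return it afterwards.
        byte[] rented = ArrayPool<byte>.Shared.Rent(data.Length);
        try
        {
            data.Span.CopyTo(rented);
            return core(rented, 0, data.Length);
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(rented);
        }
    }

    // Memory<byte> converts implicitly to ReadOnlyMemory<byte>, so this only forwards.
    public static T Deserialize<T>(Memory<byte> data, ArrayDeserializer<T> core) =>
        Deserialize((ReadOnlyMemory<byte>)data, core);
}
```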

Acceptance:

  • Both overloads compile and pass round-trip tests against byte[]-equivalent input.
  • Array-backed path measurably zero-alloc (BenchmarkDotNet allocation diagnoser).
  • Non-array-backed path documented as fallback (separate using var pooled = MemoryPool<byte>.Shared.Rent(...) style copy).
  • API doc-strings cross-reference the existing byte[] and ReadOnlySequence<byte> overloads.

ACCORE-BIN-T-S7X3: Add ReadOnlySpan<byte> deserialize overload

Priority: P2 · Type: Feature · Related: ACCORE-BIN-T-M4D2

The MemoryPack-style Deserialize<T>(ReadOnlySpan<byte>) API enables direct deserialization from stack-allocated buffers (stackalloc byte[256]), pinned native memory (fixed blocks), and ReadOnlyMemory<byte>.Span slices without round-tripping through a heap-allocated byte[]. The current AcBinary surface lacks this entry point.

Design tension: the existing IBinaryInputBase.Initialize(out byte[] buffer, ...) contract returns a byte[], whereas a ReadOnlySpan<byte> cannot be stored in a regular struct field, only in a ref struct field. Three implementation paths to evaluate:

  1. ref struct SpanBinaryInput + interface bump to support ref byte buffer / int length fields. Pure zero-copy from any span. Cost: BinaryDeserializationContext<TInput> and IBinaryInputBase need a parallel ref-struct-friendly track (the existing pooled context cannot hold a ref struct). Major surgery on the deser core.
  2. MemoryMarshal.CreateReadOnlySpanFromNullTerminated-style hack — accept ReadOnlySpan<byte>, use Unsafe.AsRef/MemoryMarshal.GetReference to obtain a ref byte, then copy into a pooled byte[] before deserialization. Not zero-copy, defeats the purpose. Reject.
  3. Pinned-buffer trampoline — accept ReadOnlySpan<byte>, allocate a Memory<byte> view via a MemoryManager<byte>-like wrapper, delegate to ReadOnlyMemory<byte> overload. Awkward, allocations per call. Reject.

Recommendation: option (1) is the only correct path, but it's a substantial refactor — measure first whether real consumer demand justifies the surgery. The current byte[]-based pool-pattern outperforms MemoryPack on the dominant use-cases per existing benchmarks; this overload addresses an API-surface gap, not a perf gap.
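
To make option (1) concrete, a ref struct input over a span could look roughly like this. It is a sketch only: SpanBinaryInputSketch and SpanInputDemo are hypothetical, the little-endian Int32 read is an assumed encoding rather than the AcBinary wire format, and the interface and context changes the option requires are not shown.

```csharp
using System;
using System.Buffers.Binary;

// A ref struct may hold a ReadOnlySpan<byte> field; a regular struct or class may not.
public ref struct SpanBinaryInputSketch
{
    private readonly ReadOnlySpan<byte> _data;
    private int _position;

    public SpanBinaryInputSketch(ReadOnlySpan<byte> data)
    {
        _data = data;
        _position = 0;
    }

    public int Remaining => _data.Length - _position;

    public byte ReadByte() => _data[_position++];

    public int ReadInt32()
    {
        int value = BinaryPrimitives.ReadInt32LittleEndian(_data.Slice(_position, 4));
        _position += 4;
        return value;
    }
}

public static class SpanInputDemo
{
    // Reads from a stack-allocated buffer without any heap byte[].
    public static int SumHeaderAndValue()
    {
        Span<byte> scratch = stackalloc byte[5];
        scratch[0] = 7;
        BinaryPrimitives.WriteInt32LittleEndian(scratch.Slice(1), 35);

        var input = new SpanBinaryInputSketch(scratch);
        return input.ReadByte() + input.ReadInt32();
    }
}
```

Because the input is a ref struct, it cannot be boxed, captured by a lambda, or stored in the pooled context, which is exactly the surgery the recommendation warns about.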

Acceptance:

  • Deserialize<T>(ReadOnlySpan<byte> data, AcBinarySerializerOptions options) compiles and round-trips against byte[]-equivalent input.
  • Zero-alloc path verified for stackalloc-source spans (BenchmarkDotNet allocation diagnoser).
  • IBinaryInputBase (or successor interface) refactor preserves backward compatibility for existing ArrayBinaryInput / SequenceBinaryInput / AsyncPipeReaderInputAdapter consumers.
  • Doc-strings cross-reference the byte[] / ReadOnlyMemory<byte> (ACCORE-BIN-T-M4D2) / ReadOnlySequence<byte> overloads with use-case guidance.

ACCORE-BIN-T-T8K3: Add SerializeAsync(Stream, T) async overloads with mode-driven output strategy

Priority: P1 · Type: Feature · Related: ACCORE-BIN-T-N9G6 (Type-based coordination)

The mainstream serializer ecosystem (System.Text.Json, MessagePack, MemoryPack) exposes SerializeAsync(Stream, T) as a primary entry point for async file I/O, network response bodies, and log streaming. AcBinary's public API surface MUST include this overload regardless of what we do internally; consumers expect a Stream parameter and should not have to reach for PipeWriter.Create(stream) workarounds. Market-entry-blocking otherwise.

Mode-driven output strategy — three lanes for three workload shapes

AcBinary already models the three output strategies in BinaryProtocolMode (AyCode.Services/SignalRs/BinaryProtocolMode.cs) for the SignalR side. The same three-lane shape applies to the public SerializeAsync(Stream) API. Promote the concept to AcBinary core scope (e.g. AcBinaryOutputMode in AyCode.Core/Serializers/Binaries/) and let the SignalR BinaryProtocolMode either alias it or migrate to it. Migration timing: the existing BinaryProtocolMode keeps shipping until the new public API is stabilized; both names live for one major version, then BinaryProtocolMode becomes a using-alias.

| Mode | Output strategy | Peak memory | Pipeline parallelism | Use when |
|---|---|---|---|---|
| Bytes (default) | Serialize(T) → byte[] + stream.WriteAsync(bytes) | Full payload in byte[] (pooled) | No | Typical payloads (<10 MB), throughput-focus |
| Segment | BufferWriterBinaryOutput → PipeWriter, single closing flush | PipeWriter pause-threshold-bounded (~64 KB Kestrel default) | No | Mid-size payloads, zero-copy desired |
| AsyncSegment | SerializeChunked(PipeWriter), per-chunk async flush | Chunk-size-bounded (~8 KB at default BufferWriterChunkSize) | Yes (on parallel-capable PipeWriter: Kestrel / Pipe) | Very large payloads (>10 MB), memory-tight hosts, parallel-capable transport |

Honest performance positioning vs. MemoryPack — three real axes

MemoryPack's SerializeAsync(Stream) is pseudo-streaming — serializes the entire payload into a pool-allocated linked-list buffer first (ReusableLinkedArrayBufferWriter), then writes the completed buffer to the stream in a single closing fence. Peak memory ≈ payload size; no pipeline parallelism. AcBinary's Bytes mode is architecturally similar (single pooled contiguous byte[] vs. MemoryPack's linked-list) — comparable peak-memory cost, often faster on the wire due to one contiguous WriteAsync call.

AcBinary's AsyncSegment mode is architecturally different in three real ways MemoryPack cannot match:

| Axis | Bytes mode (default) | AsyncSegment mode | MemoryPack SerializeAsync |
|---|---|---|---|
| Heap allocation per call | Pooled byte[] rent (peak ≈ payload size) | Truly zero: ArrayPool + pooled context + MemoryMarshal.TryGetArray direct-buffer-write into the transport's own byte[] | Pool-allocated linked-list buffer per call (peak ≈ payload size) |
| Peak managed memory | ≈ payload size | ≈ chunk size (BufferWriterChunkSize, e.g. 4-8 KB) | ≈ payload size |
| GC pressure | Touches GC pool on every call | Never touches GC for the serialize itself | Touches GC pool on every call |
| Pipeline parallelism | No | Yes on parallel-capable PipeWriter (Kestrel transport, new Pipe()) | No |
| GB-scale payload | OOM risk on memory-tight hosts | Works | OOM risk |

The AsyncSegment zero-alloc claim is literal, not "almost zero": AsyncPipeWriterOutput.AcquireChunk calls _pipeWriter.GetMemory(chunkSize) and uses MemoryMarshal.TryGetArray(memory, out segment) to obtain the transport's own internal byte[] — the serializer writes directly into it. With chunkSize aligned to the transport's internal buffer (e.g. NamedPipe-server pipe-buffer-size), one chunk is one kernel-level transfer; no managed-side double-fragmentation.
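
The GetMemory + TryGetArray step can be illustrated in isolation. The sketch below is not AsyncPipeWriterOutput.AcquireChunk: DirectChunkWriteSketch.WriteChunk is a hypothetical helper, and it copies from a source span only so the example is self-contained, whereas the serializer would write its output directly into the acquired buffer.

```csharp
using System;
using System.IO.Pipelines;
using System.Runtime.InteropServices;

public static class DirectChunkWriteSketch
{
    // Returns true when the chunk landed directly in the writer's backing array.
    public static bool WriteChunk(PipeWriter writer, ReadOnlySpan<byte> source, int chunkSize)
    {
        if (source.Length > chunkSize)
            throw new ArgumentException("source must fit in one chunk", nameof(source));

        // Ask the transport for a buffer of at least chunkSize bytes.
        Memory<byte> memory = writer.GetMemory(chunkSize);

        // When the memory is array-backed, expose the transport's own byte[] for direct writes.
        bool arrayBacked = MemoryMarshal.TryGetArray<byte>(memory, out ArraySegment<byte> segment);
        if (arrayBacked)
            source.CopyTo(segment.AsSpan());
        else
            source.CopyTo(memory.Span);   // fallback for non-array-backed memory

        // Tell the writer how many bytes of the acquired buffer are now valid.
        writer.Advance(source.Length);
        return arrayBacked;
    }
}
```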

Throughput nuance — AsyncSegment cost on Stream-backed transports

AsyncSegment IS slightly slower than Bytes on StreamPipeWriter-backed transports (NamedPipe / FileStream / NetworkStream), but not for the reason that initially seems obvious:

  • The cost is NOT "managed-side double-fragmentation on top of OS-level fragmentation" — that's not what happens. MemoryMarshal.TryGetArray zero-copy direct-buffer-writes mean the managed chunking is the same chunking the kernel does anyway, not redundant.
  • The cost IS the per-chunk async-await round-trip (SyncAwaitFlush(_lastFlush) blocks until the kernel acknowledges the write), forced sequential by the StreamPipeWriter._tailMemory reset race (ACCORE-BIN-I-...). N async cycles vs 1 in Bytes mode.
  • Empirically the gap is roughly 1.2-1.5x on NamedPipe — not 2-5x. The dominant cost on these transports is the transport itself (Windows IRP / Linux FIFO syscall overhead), independent of the serializer mode.

When AsyncSegment wins outright:

  • GC-sensitive hot-paths (server hubs, real-time game tick loops, mobile UI thread, embedded targets): zero-alloc + zero-GC-pressure beats a 1.2x throughput edge every time.
  • Memory-tight hosts (mobile, WASM, container-trimmed, embedded): chunk-bounded peak memory is the only option.
  • GB-scale payloads: Bytes OOMs; AsyncSegment works.
  • Kestrel transport / parallel-capable Pipe: pipeline parallelism makes AsyncSegment faster than Bytes for medium-to-large payloads.

When Bytes wins outright:

  • Typical NuGet workload (small-to-medium payload, throughput priority, GC-tolerant): one async cycle vs N is the simpler, faster path.
  • MemoryStream (in-memory): one large byte[] copy decisively beats N managed chunks.

Marketing claim — three-way honest comparison

"AcBinary offers a real choice. Bytes mode for typical throughput-priority workloads (matches MemoryPack's pseudo-streaming, often faster on the wire). AsyncSegment mode for the workloads MemoryPack cannot serve: zero-alloc serialize for GC-sensitive hot-paths, chunk-bounded peak memory for tight-budget hosts, GB-scale payloads, and pipeline parallelism on parallel-capable transports. You pick the mode; MemoryPack picks for you."

This is honest — does not overclaim universal speed, does not hide the small AsyncSegment cost on Stream-backed transports, AND clearly surfaces the three differentiator axes (alloc / memory / parallelism) where AcBinary architecturally beats MemoryPack.

Implementation outline:

  • New enum AcBinaryOutputMode { Bytes = 0, Segment = 1, AsyncSegment = 2 } in AyCode.Core/Serializers/Binaries/. Default Bytes.
  • New mode field on AcBinarySerializerOptions: AcBinaryOutputMode OutputMode { get; set; } = AcBinaryOutputMode.Bytes;. (Note: subject to ACCORE-BIN-I-L8N5 thread-safety treatment — defensive copy / immutable refactor coordination.)
  • public static ValueTask SerializeAsync<T>(T value, Stream stream, AcBinarySerializerOptions? options = null, bool leaveOpen = false, CancellationToken ct = default):
    • Switch on options.OutputMode:
      • Bytes → var bytes = Serialize(value, options); await stream.WriteAsync(bytes, ct); ArrayPool<byte>.Shared.Return(bytes);
      • Segment → var pw = PipeWriter.Create(stream, new(leaveOpen: leaveOpen)); Serialize(value, pw, options); await pw.CompleteAsync();
      • AsyncSegment → var pw = PipeWriter.Create(stream, new(leaveOpen: leaveOpen)); SerializeChunked(value, pw, options); await pw.CompleteAsync();
  • public static ValueTask SerializeAsync(object? value, Type type, Stream stream, ...) — non-generic, same dispatch (coordinated with ACCORE-BIN-T-N9G6).
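  • leaveOpen follows the BCL stream-wrapper convention (StreamWriter, StreamPipeWriterOptions). Note that System.Text.Json's SerializeAsync takes no such parameter and always leaves the stream open, so the default of false here is a behavioural difference to document.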
  • The Bytes mode uses a pooled byte[] from ArrayBinaryOutput to keep alloc cost amortized.
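
The three-lane dispatch and the leaveOpen handling could be prototyped as below. This is a sketch under a loud assumption: it takes an already-serialized payload as a stand-in for Serialize / SerializeChunked, so it shows only the stream-side shape of each lane, not the memory characteristics. StreamModeDispatchSketch and the chunkSize default are hypothetical; the enum mirrors the proposed AcBinaryOutputMode.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.IO.Pipelines;
using System.Threading;
using System.Threading.Tasks;

public enum AcBinaryOutputMode { Bytes = 0, Segment = 1, AsyncSegment = 2 }

public static class StreamModeDispatchSketch
{
    public static async ValueTask WriteAsync(
        ReadOnlyMemory<byte> payload, Stream stream, AcBinaryOutputMode mode,
        int chunkSize = 8192, bool leaveOpen = false, CancellationToken ct = default)
    {
        if (mode == AcBinaryOutputMode.Bytes)
        {
            // One contiguous write; the stream is disposed here unless leaveOpen is set.
            await stream.WriteAsync(payload, ct);
            if (!leaveOpen) await stream.DisposeAsync();
            return;
        }

        // Both pipe-based lanes delegate leaveOpen to the StreamPipeWriter.
        PipeWriter writer = PipeWriter.Create(stream, new StreamPipeWriterOptions(leaveOpen: leaveOpen));

        if (mode == AcBinaryOutputMode.Segment)
        {
            // Buffer everything in the writer, flushed once by CompleteAsync.
            writer.Write(payload.Span);
        }
        else
        {
            // AsyncSegment: one awaited write + flush per chunk.
            for (int offset = 0; offset < payload.Length; offset += chunkSize)
            {
                int length = Math.Min(chunkSize, payload.Length - offset);
                await writer.WriteAsync(payload.Slice(offset, length), ct);
            }
        }

        await writer.CompleteAsync();
    }
}
```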

SignalR migration coordination: the existing BinaryProtocolMode enum (in AyCode.Services) keeps shipping unchanged until the new public API is stabilized. After stabilization, BinaryProtocolMode becomes a deprecated alias of AcBinaryOutputMode, eventually removed in a major-bump. No SignalR-side churn during this TODO's implementation.

Acceptance:

  • SerializeAsync<T> round-trips against Deserialize<T>(byte[]) via MemoryStream in all three modes.
  • Cancellation propagates correctly (OperationCanceledException on cancelled token mid-stream).
  • Throughput matrix benchmark: 4 transports (MemoryStream, FileStream, NamedPipeStream, NetworkStream) × 3 modes × 3 payload sizes (small ~1 KB / medium ~100 KB / large ~10 MB). Results documented in Test_Benchmark_Results/Benchmark/SerializeAsync_Stream_Modes.LLM (or similar) and surfaced as a doc-string table for consumer guidance.
  • Memory-bounded benchmark: 100 MB payload to FileStream in AsyncSegment mode → peak managed-heap delta ≤ 1 MB throughout. Same payload in Bytes mode → peak ~100 MB (expected, documented).
  • API doc-string contains a "When to use which mode?" decision matrix; explicitly compares with MemoryPack's pseudo-streaming.
  • leaveOpen behaves consistently across all three modes: the stream stays open when true and is disposed after the final write when false.

ACCORE-BIN-T-D7K4: Add DeserializeAsync(Stream, T) async overloads with mode-driven input strategy

Priority: P1 · Type: Feature · Related: ACCORE-BIN-T-T8K3 (companion write-side overload), ACCORE-BIN-T-N9G6 (non-generic Type-based dispatch)

Companion to T8K3 on the receive side. The mainstream serializer ecosystem (System.Text.Json, MessagePack, MemoryPack) exposes DeserializeAsync<T>(Stream), the symmetric counterpart of SerializeAsync(Stream, T). AcBinary's public API surface MUST include this overload for parity; consumers expect a Stream parameter for receive paths (file load, HTTP response body, network stream) and should not have to reach for PipeReader.Create(stream) workarounds. Market-entry-blocking otherwise.

Implementation: zero new IBinaryInputBase impl needed

The existing receive-side primitives cover the full strategy space via BCL PipeReader.Create(stream):

| Mode | Input strategy | Peak memory | Pipeline parallelism | Use when |
|---|---|---|---|---|
| Bytes (default) | await stream.CopyToAsync(MemoryStream) → Deserialize<T>(byte[]) (existing overload) | Full payload as byte[] (pooled) | No | Typical payloads (<10 MB), throughput-focus |
| Segment | await PipeReader.Create(stream).ReadAsync() → Deserialize<T>(ReadOnlySequence<byte>) (existing overload) | PipeReader pause-threshold-bounded (~64 KB) | No | Mid-size payloads, no full byte[] alloc desired |
| AsyncSegment | AsyncPipeReaderInput + DrainFromAsync(PipeReader.Create(stream)) + Deserialize<T>(input) (existing overload) | Chunk-size-bounded (~8 KB) | Yes (producer drain Task in parallel with deser Task) | Very large payloads (>10 MB), memory-tight hosts |

The AcBinaryOutputMode enum (introduced by T8K3) is symmetric — it controls deser-input strategy as well. The same enum value picks the matching read path. No new IBinaryInputBase implementation needed — the trio of existing inputs (ArrayBinaryInput, SequenceBinaryInput, AsyncPipeReaderInput) already cover all three modes; the new overload is a thin shim that wraps the Stream and routes to the right existing overload.

Public API shape

public static ValueTask<T?> DeserializeAsync<T>(
    Stream stream,
    AcBinarySerializerOptions? options = null,
    bool leaveOpen = false,
    CancellationToken ct = default);

// Non-generic Type-based variant (coordinated with N9G6):
public static ValueTask<object?> DeserializeAsync(
    Stream stream,
    Type targetType,
    AcBinarySerializerOptions? options = null,
    bool leaveOpen = false,
    CancellationToken ct = default);

Implementation outline (per mode)

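// Bytes mode (default: simplest path, sub-LOH-friendly fast path):
public static async ValueTask<T?> DeserializeAsync_Bytes<T>(Stream stream, ..., CancellationToken ct)
{
    // Size the first rent from the remaining length when the stream is seekable.
    var initial = stream.CanSeek ? Math.Max(stream.Length - stream.Position, 1) : 4096;
    var rented = ArrayPool<byte>.Shared.Rent((int)Math.Min(initial, Array.MaxLength));
    try
    {
        var totalRead = 0;
        int read;
        while ((read = await stream.ReadAsync(rented.AsMemory(totalRead), ct)) > 0)
        {
            totalRead += read;
            if (totalRead == rented.Length)
            {
                // Buffer full: move to a larger rented array so the next read has room.
                var bigger = ArrayPool<byte>.Shared.Rent(checked(rented.Length * 2));
                rented.AsSpan(0, totalRead).CopyTo(bigger);
                ArrayPool<byte>.Shared.Return(rented);
                rented = bigger;
            }
        }
        return Deserialize<T>(rented, 0, totalRead, options);
    }
    finally { ArrayPool<byte>.Shared.Return(rented); }
}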

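// Segment mode (PipeReader.Create wrapping, then drain to ReadOnlySequence):
public static async ValueTask<T?> DeserializeAsync_Segment<T>(
    Stream stream, AcBinarySerializerOptions? options, bool leaveOpen, CancellationToken ct)
{
    var pipeReader = PipeReader.Create(stream, new StreamPipeReaderOptions(leaveOpen: leaveOpen));
    try
    {
        // Drain the whole stream: mark everything examined but nothing consumed, so the
        // reader keeps buffering until the stream completes. This avoids handing
        // ReadAtLeastAsync an int.MaxValue minimum, which may be used to size a buffer.
        ReadResult result;
        while (!(result = await pipeReader.ReadAsync(ct)).IsCompleted)
            pipeReader.AdvanceTo(result.Buffer.Start, result.Buffer.End);
        var seq = result.Buffer;
        var obj = Deserialize<T>(seq, options);
        pipeReader.AdvanceTo(seq.End);
        return obj;
    }
    finally { await pipeReader.CompleteAsync(); }
}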

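// AsyncSegment mode (chunked streaming pipeline, parallel drain + deser):
public static async ValueTask<T?> DeserializeAsync_AsyncSegment<T>(
    Stream stream, AcBinarySerializerOptions? options, bool leaveOpen, CancellationToken ct)
{
    // Assumption: a default-constructed options instance is the intended fallback when none is supplied.
    options ??= new AcBinarySerializerOptions();
    using var input = new AsyncPipeReaderInput(options.BufferWriterChunkSize * 2, multiMessage: false);
    var pipeReader = PipeReader.Create(stream, new StreamPipeReaderOptions(leaveOpen: leaveOpen));
    // The consumer runs on the thread pool while this method drains the stream into the input.
    var deserTask = Task.Run(() => Deserialize<T>(input, options), ct);
    try
    {
        await input.DrainFromAsync(pipeReader, ct);
    }
    finally
    {
        await pipeReader.CompleteAsync();
    }
    return await deserTask;
}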

Honest performance positioning

Symmetric to T8K3's analysis:

  • Bytes mode: simplest, single contiguous byte[] (pooled) → Deserialize<T>(byte[]). Comparable to MemoryPack's DeserializeAsync (which does similar full-buffer-then-deser). Best for typical workloads.
  • Segment mode: zero-copy from PipeReader's natural ReadOnlySequence<byte> — no extra byte[] allocation. Best for mid-size payloads where allocation matters but pipeline overlap doesn't.
  • AsyncSegment mode: producer-drain Task and consumer-deser Task in parallel via AsyncPipeReaderInput. Wall-clock = max(network-drain, deser-CPU) + small overlap-cost. Best for large payloads + slow transports (network, mobile, satellite — where transit dominates and overlap pays).

Acceptance

  • DeserializeAsync<T> round-trips against SerializeAsync(Stream, T) (T8K3) via MemoryStream in all three modes.
  • Cancellation propagates correctly (OperationCanceledException on cancelled token mid-stream); partial-buffer state cleaned up; pooled byte[] returned even on cancellation.
  • Throughput matrix benchmark (mirror of T8K3): 4 transports (MemoryStream, FileStream, NamedPipeStream, NetworkStream) × 3 modes × 3 payload sizes. Results documented in Test_Benchmark_Results/Benchmark/DeserializeAsync_Stream_Modes.LLM.
  • Memory-bounded benchmark: 100 MB payload from FileStream in AsyncSegment mode → peak managed-heap delta ≤ 1 MB throughout. Same payload in Bytes mode → peak ~100 MB (expected, documented).
  • API doc-string contains a "When to use which mode?" decision matrix; cross-references T8K3's symmetric write-side guidance.
  • leaveOpen parameter behaves per the System.Text.Json / MessagePack convention across all three modes.

ACCORE-BIN-T-N9G6: Add non-generic Type-based Serialize(object, Type, ...) overloads

Priority: P2 · Type: Feature · Status: Closed (2026-05-04) · Related: ACCORE-BIN-T-T8K3

Resolution

Added in AcBinarySerializer.cs:

  • Serialize(object?, Type, opts) → byte[]
  • Serialize(object?, Type, IBufferWriter<byte>, opts) → int
  • SerializeChunked(object?, Type, PipeWriter, opts) → int
  • SerializeChunkedFramed(object?, Type, PipeWriter, opts) → int

Added in AcBinaryDeserializer.cs:

  • DeserializeFromPipeReaderAsync<T>(PipeReader, opts, ct) → Task<T?>
  • DeserializeFromPipeReaderAsync(PipeReader, Type, opts, ct) → Task<object?>

The Deserialize(byte[], Type, opts) / Deserialize(ReadOnlySequence<byte>, Type, opts) / Deserialize(AsyncPipeReaderInput, Type, opts) overloads already existed.

Consumed by ASP.NET Core MVC formatter package (AyCode.Services/Mvc/) — AcBinaryInputFormatter, AcBinaryOutputFormatter, AddAcBinaryFormatters extension. Media type: application/vnd.acbinary.

Plugin frameworks, ASP.NET ModelBinding, DI middleware, and DataContractSerializer-style "generic-API container" use-cases need to serialize an object whose type is known only at runtime. Current AcBinary surface forces a reflection trampoline through the generic Serialize<T>:

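// Today's workaround (slow + noisy; needs System.Linq and System.Reflection).
// Assumes the target is the two-parameter generic Serialize<T>(T, options) overload:
typeof(AcBinarySerializer).GetMethods(BindingFlags.Public | BindingFlags.Static)
    .First(m => m.Name == "Serialize" && m.IsGenericMethodDefinition && m.GetParameters().Length == 2)
    .MakeGenericMethod(type).Invoke(null, new[] { value, options });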

Implementation outline:

  • public static byte[] Serialize(object? value, Type type, AcBinarySerializerOptions? options = null)
  • public static int Serialize(object? value, Type type, IBufferWriter<byte> writer, AcBinarySerializerOptions? options = null)
  • public static int SerializeChunked(object? value, Type type, PipeWriter writer, AcBinarySerializerOptions? options = null) and Pipe overload
  • public static int SerializeChunkedFramed(object? value, Type type, PipeWriter writer, AcBinarySerializerOptions? options = null) and Pipe overload
  • public static ValueTask SerializeAsync(object? value, Type type, Stream stream, ...) — coordinated with ACCORE-BIN-T-T8K3
  • Internal dispatch: value.GetType() is the runtime type; the Type type parameter constrains the declared type for polymorphism handling (ObjectWithTypeName write decision).
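
The dispatch rule in the last bullet can be sketched as a small decision helper. This is a minimal illustration rather than AcBinary's actual code: `PolymorphismDecision` and `NeedsTypeName` are hypothetical names, and the real write path may apply further rules (for example for sealed or primitive types).

```csharp
using System;

// Hypothetical helper, not part of AcBinary: illustrates the declared-vs-runtime type decision.
public static class PolymorphismDecision
{
    // True when the wire would need a type name (ObjectWithTypeName-style) so the reader
    // can reconstruct the concrete type; false when the declared type is sufficient.
    public static bool NeedsTypeName(object? value, Type declaredType)
    {
        if (value is null) return false;          // null carries no runtime type
        return value.GetType() != declaredType;   // runtime type differs from the declared type
    }
}
```

Under this sketch, a `string` declared as `string` needs no type name, while the same value declared as `object` does.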

Acceptance:

  • All non-generic overloads round-trip via the deserializer's non-generic Deserialize(byte[], Type) overload.
  • Plugin-style scenario: serialize IList<dynamic> of mixed-type elements → all elements correctly typed in the wire output.
  • API doc-strings call out the performance characteristics (slightly slower than generic due to runtime Type lookup but without the reflection trampoline cost).

ACCORE-BIN-T-R4P2: Expose low-level ref Writer-style API for custom formatters

Priority: P3 · Type: Feature

The MemoryPack-style Serialize<T>(ref MemoryPackWriter writer, in T value) low-level API enables:

  • Custom formatters that compose write primitives without the full Serialize entry-point overhead.
  • Nested-into-existing-stream scenarios where the caller already owns a writer-style cursor.
  • Test harnesses that exercise specific wire-format paths in isolation.

Today's BufferWriterBinaryOutput standalone-mode partly fills this gap — exposing WriteByte, WriteVarUInt, WriteStringUtf8, etc. — but it is not a ref struct, not a documented low-level public API for external custom formatters, and the relationship with BinarySerializationContext<TOutput> is unclear from the consumer's perspective.

Design tension (decide before implementing):

  1. Promote BufferWriterBinaryOutput to documented public surface — add doc, examples, supported usage patterns. Cheapest, but the standalone-mode is currently a side-feature, not a primary API; documenting it commits to its current shape.
  2. New ref struct AcBinaryWriter wrapper around BufferWriterBinaryOutput (or a dedicated impl) — explicit "this is the low-level writer" signal. More API surface but clearer mental model. Aesthetic alignment with MemoryPack.
  3. Skip entirely — the IBufferWriter<byte> overload is already lower-level than most consumers need; custom formatters can write to an ArrayBufferWriter<byte> and use IBufferWriter-style primitives. This is what BufferWriterBinaryOutput already does internally.

Recommendation: option 3 is honest — the existing IBufferWriter<byte> overload covers the use case, and adding a ref struct AcBinaryWriter is mostly aesthetic alignment with MemoryPack. Re-evaluate when there's a concrete custom-formatter request that the current API can't accommodate.

Acceptance (if implemented):

  • AcBinaryWriter ref struct (or equivalent) compiles, supports the same write primitives as BufferWriterBinaryOutput standalone-mode.
  • At least one example custom formatter ships in tests (e.g., a Vector3 struct formatter).
  • Doc-string clearly distinguishes when to use the low-level writer vs. the high-level Serialize<T> entry-point.

ACCORE-BIN-T-U6Y8: Attribute-driven polymorphism via [AcBinaryUnion] + SGen (opt-in, AOT-friendly)

Priority: P1 (if AOT target required) / P2 (non-AOT only) · Type: Feature

Design philosophy alignment: AcBinary's market positioning is "JSON-style flexibility with MessagePack-class speed" — attributes are opt-in optimization, never required. The runtime polymorphism path (AQN-based, today's default) stays the default and continues to work for arbitrary unattributed types. This TODO adds a fast/AOT path alongside it, never replaces it.

AcBinary today handles polymorphism at runtime: the wire writes ObjectWithTypeName(72) + AQN string, and the deserializer calls Type.GetType(aqn) to resolve. This is flexible (no upfront declaration), but has three significant drawbacks for some consumers:

  • AOT-incompatible: Type.GetType(AQN) requires reflection metadata that the Native AOT trimmer strips by default. The runtime polymorphism path does not work at all under Native AOT. Hard blocker for AOT-targeting consumers (Blazor WASM, MAUI mobile, container-trimmed deployments).
  • Slower: AQN string parse + reflection lookup vs. a closed switch (tag) in code-gen.
  • Larger wire format: full AQN string (often 100+ bytes) vs. a single-byte tag.

Design — three coordinated pieces:

1. New 5th bool parameter on [AcBinarySerializable]: EnablePolymorphismFeature

Mirrors the existing EnableMetadataFeature / EnableIdTrackingFeature / EnableRefHandlingFeature / EnableInternStringFeature pattern. Per-type opt-out / opt-in via attribute parameter.

public AcBinarySerializableAttribute(
    bool enableMetadataFeature,
    bool enableIdTrackingFeature,
    bool enableRefHandlingFeature,
    bool enableInternStringFeature,
    bool enablePolymorphismFeature)   // ← NEW, default: true

Three behavior modes per type:

  • EnablePolymorphismFeature = false → disabled. SGen never emits polymorphism dispatch for this type, and the runtime path also short-circuits: a runtime type that differs from the declared type is silently treated as the declared type (or throws, decision TBD). Use for hot-path closed types where polymorphism is impossible by design and the perf/AOT cost is unwanted.
  • EnablePolymorphismFeature = true (default), no [AcBinaryUnion] → runtime options control. Behaves per AcBinarySerializerOptions.PolymorphismMode (Runtime/AQN today). This preserves the JSON-style flexibility for unattributed bases.
  • EnablePolymorphismFeature = true + [AcBinaryUnion(...)] declared → union-switch dispatch. SGen emits a closed switch (tag) dispatch using the declared subtype set. Fast + AOT-friendly. Overrides the options-level default for this type.

2. New [AcBinaryUnion(byte tag, Type subtype)] attribute

Multiple instances per base class / interface declare the closed polymorphism set:

[AcBinarySerializable]   // EnablePolymorphismFeature defaults to true
[AcBinaryUnion(0, typeof(Cat))]
[AcBinaryUnion(1, typeof(Dog))]
public abstract partial class Animal { ... }

SGen detects [AcBinaryUnion] on abstract / base type → emits the switch-based write/read dispatch instead of falling through to runtime AQN.
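
To make the switch-based dispatch concrete, here is a hand-written stand-in for what a generator could emit for the `Animal` union above. It is a sketch under stated assumptions: `AnimalUnionSketch`, the `Name` field, and the toy body encoding (one length byte plus UTF-8) are all hypothetical, and the real marker byte and inner object format are not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public abstract class Animal { public string Name = ""; }
public sealed class Cat : Animal { }
public sealed class Dog : Animal { }

// Hypothetical stand-in for generated code; tags mirror [AcBinaryUnion(0, typeof(Cat))] / [AcBinaryUnion(1, typeof(Dog))].
public static class AnimalUnionSketch
{
    public static void WriteUnion(Animal value, List<byte> output)
    {
        // Closed type switch: only a one-byte tag is written, no assembly-qualified name.
        byte tag = value switch
        {
            Cat => 0,
            Dog => 1,
            _ => throw new NotSupportedException($"{value.GetType()} is not in the declared union.")
        };
        output.Add(tag);

        // Toy body: single length byte + UTF-8 name (throws if the name exceeds 255 bytes).
        byte[] nameBytes = Encoding.UTF8.GetBytes(value.Name);
        output.Add(checked((byte)nameBytes.Length));
        output.AddRange(nameBytes);
    }

    public static Animal ReadUnion(ReadOnlySpan<byte> input)
    {
        // Closed tag switch: no Type.GetType call on this path.
        Animal result = input[0] switch
        {
            0 => new Cat(),
            1 => new Dog(),
            var unknown => throw new InvalidOperationException($"Unknown union tag {unknown}.")
        };
        int length = input[1];
        result.Name = Encoding.UTF8.GetString(input.Slice(2, length));
        return result;
    }
}
```

Because both switches are closed over the declared subtype set, adding a new subtype without a matching tag fails loudly at write time instead of silently falling back to a name-based lookup.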

3. New PolymorphismMode enum on AcBinarySerializerOptions

Options-level default for unattributed polymorphism (i.e. the case where EnablePolymorphismFeature = true but no [AcBinaryUnion] is declared):

  • Runtime (today's default) — AQN-based. Flexible, AOT-incompatible.
  • Throw — fail fast on any polymorphic write that lacks a [AcBinaryUnion] attribute. AOT-friendly diagnostic mode for migration scenarios.

Note: there is no UnionAttribute-only mode — declaration is per-type via the attribute, not options-global. The options-level mode only governs the fallback when no [AcBinaryUnion] is present.

Wire-format addition:

New marker (e.g. UnionTagBase = <TBD>) + [byte tag][inner Object], parallel to existing ObjectWithTypeName(72). Slot number to be assigned avoiding clashes with the existing 64–134 / 192–255 ranges.

Implementation outline:

  • AcBinarySerializableAttribute — new ctor parameter enablePolymorphismFeature, all existing ctors default it to true (backward compatible).
  • AcBinaryUnionAttribute: new attribute, AttributeUsage(AttributeTargets.Class | AttributeTargets.Interface, AllowMultiple = true).
  • Source generator — emit WriteUnion<TBase>(value, ctx, depth) and ReadUnion<TBase>(ctx, depth) static methods on the union-base type's generated writer/reader. Skipped entirely when EnablePolymorphismFeature = false.
  • Wire-format new marker + [byte tag][inner Object] body.
  • Runtime path: WriteValueNonPrimitive checks the wrapper's PolymorphismFeatureEnabled flag; when false, skips the value.GetType() != declaredType polymorphism branch entirely.

Acceptance:

  • EnablePolymorphismFeature = false: SGen-emitted dispatch contains zero is-typeof / GetType branches; runtime path also short-circuits. Verify in JIT disassembly.
  • EnablePolymorphismFeature = true, no union: runtime AQN polymorphism works as today (full backward compat); preserved JSON-style flexibility for unattributed bases.
  • EnablePolymorphismFeature = true + [AcBinaryUnion]: AOT-test (Native AOT publish) compiles and round-trips a polymorphic graph — Type.GetType() is never invoked on this path.
  • Benchmark: union-switch polymorphism measurably faster than AQN polymorphism on deser side (typed switch vs. reflection lookup).
  • Wire format documented in BINARY_FORMAT.md; BINARY_FEATURES.md cross-references the attribute pattern; BINARY_OPTIONS.md documents PolymorphismMode. AcBinarySerializableAttribute doc-string explains all three behavior modes.

ACCORE-BIN-T-B7H4: Implement AcBinarySerializerOptions thread-safety fix

Priority: P2 · Type: Refactor · Related: BINARY_ISSUES.md#accore-bin-i-l8n5 (canonical issue)

The latent thread-safety problem documented in ACCORE-BIN-I-L8N5 — mutable set; properties on AcBinarySerializerOptions shared across concurrent serialize/deserialize calls — needs a fix before AcBinary ships as a NuGet package. The package cannot constrain how consumers scope their options instances; defensive contract is needed in the serializer itself.

Three candidate fix directions (decide before implementing):

  1. Defensive copy on ingress — add AcBinarySerializerOptions Clone() method (member-wise copy). Every API entry point that retains an options instance clones it on entry. External mutation to the original becomes invisible to the holder.

    • Pro: non-breaking. Existing consumer code unchanged. No major version bump required.
    • Pro: API surface change limited to one new Clone() method.
    • Con: per-call clone overhead (small, but non-zero). Cache keyed on options-identity becomes invalid for downstream code using reference equality.
    • Con: doesn't fix the underlying mutability — internal code can still race-mutate the cloned snapshot if a method retains both the snapshot and modifies it concurrently.
  2. Immutable record refactor: set; → init; on all configuration properties. Mutation requires a with-expression, which produces a new instance.

    • Pro: type-system-strong guarantee. Race becomes a compile error, not a runtime corruption risk.
    • Pro: zero runtime overhead (init-only is compile-time check; record class semantics are unchanged at runtime).
    • Con: breaking change for any consumer doing opts.UseGeneratedCode = false after construction. Major version bump.
    • Con: source-generator coordination needed if SGen emits options-builder code that mutates properties.
  3. Read-only flag pattern (à la JsonSerializerOptions.MakeReadOnly()) — mutable by default, holder calls MakeReadOnly() on entry; subsequent property setters throw InvalidOperationException.

    • Pro: BCL precedent. Microsoft shipped JsonSerializerOptions.MakeReadOnly() in .NET 8 (dotnet/runtime#74431) for exactly this problem. Familiar pattern for consumers.
    • Pro: minimal API surface change (one new method + IsReadOnly flag property).
    • Pro: per-call overhead = single bool check per setter call. Negligible.
    • Con: opt-in by the holder — if a custom consumer-side wrapper forgets to call MakeReadOnly(), the safety hole stays open for that wrapper's clients. Documentation-driven safety, not type-system-driven.
    • Con: bypasses static-analysis tooling (the setter signature stays public; the throw is runtime). IDE doesn't surface "this property is currently read-only" in autocomplete.

Recommendation: Option 3 (MakeReadOnly pattern) is the BCL-precedent, lowest-friction migration path. Microsoft shipped it for JsonSerializerOptions in .NET 8 to solve the same problem; AcBinary should follow the same pattern for consistency with consumers' mental model and zero migration cost.

Coordination with the existing AcBinaryHubProtocol setter side-effect (the second risk surface in ACCORE-BIN-I-L8N5): the protocol ctor currently mutates the caller-provided options reference (_options.BufferWriterChunkSize = options.BufferSize). After the fix:

  • Option 1 (Clone): ctor mutates the cloned snapshot → no side-channel to the caller. Fix transparent.
  • Option 2 (Immutable): ctor cannot mutate; needs to construct a new options via with-expression. Breaking change in the ctor's options-handling.
  • Option 3 (MakeReadOnly): ctor mutates before calling MakeReadOnly() — same as today, but explicit "frozen" point afterwards. Caller-side mutation post-ctor is now a runtime throw.

Implementation outline (Option 3 path):

  1. AcBinarySerializerOptions.IsReadOnly { get; } — public bool property.
  2. AcBinarySerializerOptions.MakeReadOnly() — sets the flag; idempotent (no-op if already set).
  3. All set; accessors guard: if (IsReadOnly) throw new InvalidOperationException("AcBinarySerializerOptions has been made read-only and can no longer be mutated. Construct a new options instance instead.");.
  4. AcBinarySerializer.Serialize<T> entry (and all sibling entries — Deserialize<T>, SerializeChunked, etc.): options.MakeReadOnly() before any property read.
  5. AcBinaryHubProtocol ctor: complete the BufferWriterChunkSize mutation before calling options.MakeReadOnly(). After ctor returns, the options instance is frozen for that protocol's lifetime.
  6. Doc-string update on AcBinarySerializerOptions class header: explicit "thread-safety contract" section explaining the freeze-on-first-use semantics.
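
Steps 1 to 3 of the outline can be sketched as follows. This is a minimal illustration on a stand-in class: `FreezableOptionsSketch` is a hypothetical name, and the default chunk size is a placeholder rather than AcBinary's actual value.

```csharp
using System;

// Hypothetical stand-in for AcBinarySerializerOptions; only the freeze mechanics are shown.
public sealed class FreezableOptionsSketch
{
    private volatile bool _isReadOnly;
    private int _bufferWriterChunkSize = 4096;   // placeholder default

    public bool IsReadOnly => _isReadOnly;

    public int BufferWriterChunkSize
    {
        get => _bufferWriterChunkSize;
        set { ThrowIfReadOnly(); _bufferWriterChunkSize = value; }
    }

    // Idempotent: calling it again on a frozen instance changes nothing.
    public void MakeReadOnly() => _isReadOnly = true;

    private void ThrowIfReadOnly()
    {
        if (_isReadOnly)
            throw new InvalidOperationException(
                "Options have been made read-only and can no longer be mutated. Construct a new options instance instead.");
    }
}
```

Note that the guard is a check-then-write, so a setter racing with the very first `MakeReadOnly()` call could still land; the contract this sketch gives is that every mutation attempted after the freeze has completed throws.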

Acceptance:

  • Concurrent stress test (16 threads × 1000 iterations) on a shared AcBinarySerializerOptions instance with property-mutation-attempts mid-iteration — all mutations after MakeReadOnly() throw InvalidOperationException; no silent corruption observed.
  • Existing tests pass unchanged (the MakeReadOnly is opt-in for the serializer entries; tests that build options + use them once continue to work transparently).
  • BINARY_ISSUES.md#accore-bin-i-l8n5 Status updated to Closed (YYYY-MM-DD) with a ### Resolution sub-section pointing to this TODO + the implementing commit.
  • Doc-string on AcBinarySerializerOptions documents the freeze-on-first-use contract; BINARY_FEATURES.md or BINARY_OPTIONS.md cross-references the BCL-precedent (JsonSerializerOptions.MakeReadOnly).

ACCORE-BIN-T-F8N3: Switch source-generator type-name hashing from simple-name to fully-qualified-name

Priority: P3 · Type: Refactor · Related: ACCORE-BIN-T-I3P8 (override mechanism for residual collisions)

The source generator's ComputeFnvHash(typeSymbol.Name) uses the simple name only (e.g. "User", not "MyApp.A.User"). Cross-namespace types with the same simple name silently collide on s_typeNameHash. The hash is currently only consumed by the WireMode=Metadata inline metadata-write path (cross-version property compat) — the framework explicitly does NOT add wire-format type-id (per CLAUDE.md Rule #7: type-dispatch is consumer responsibility, see BINARY_ASYNCPIPE_ISSUES.md#accore-bin-i-t6v2). Within UseMetadata, the simple-name collision can still cause silent property-set mismatches between two types with the same short name in different namespaces — this TODO fixes that.

Change scope (AcBinarySourceGenerator.cs), 4 call sites: ComputeFnvHash(typeSymbol.Name) → ComputeFnvHash(typeSymbol.ToDisplayString()):

  • Self type-name hash (~line 358)
  • Child type-name hash (~line 157)
  • Element type-name hash (~line 254)
  • Dict-value type-name hash (~line 311)

No runtime code changes; output regenerates with new constants on next build.
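
The collision can be reproduced with a plain FNV-1a implementation. The sketch below assumes standard 32-bit FNV-1a over UTF-8 bytes; the generator's `ComputeFnvHash` may use a different width or encoding, and `MyApp.B.User` is a hypothetical second type added for illustration.

```csharp
using System;
using System.Text;

public static class FnvSketch
{
    // Standard 32-bit FNV-1a: XOR each byte into the hash, then multiply by the FNV prime.
    public static uint Fnv1a32(string text)
    {
        const uint offsetBasis = 2166136261;
        const uint prime = 16777619;
        uint hash = offsetBasis;
        foreach (byte b in Encoding.UTF8.GetBytes(text))
        {
            hash ^= b;
            hash = unchecked(hash * prime);
        }
        return hash;
    }

    // Simple names: MyApp.A.User and MyApp.B.User both hash the string "User", so they always collide.
    public static bool SimpleNamesCollide() => Fnv1a32("User") == Fnv1a32("User");

    // Fully qualified names differ in one byte, which keeps the two hash states distinct.
    public static bool QualifiedNamesCollide() => Fnv1a32("MyApp.A.User") == Fnv1a32("MyApp.B.User");
}
```

With simple names the two types are indistinguishable by construction; with fully qualified names a collision is still possible in principle, but only by chance across the 32-bit space.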

Breaking change scope: any saved binary stream that uses WireMode=Metadata and was produced by an older version embeds the old simple-name hash; a reader built with the new FQN-based hash would see a mismatch and throw. Pre-1.0: acceptable. Post-1.0 this would require a WireMode=Metadata format-version bump.

Acceptance:

  • All *_GeneratedWriter.g.cs files regenerate with FQN-based s_typeNameHash values.
  • Existing tests pass (auto-regen propagates; no manual hash literals in tests).
  • Wire format identical for WireMode=Compact (no metadata embedded).
  • UseMetadata=true paths produce different hashes — explicitly tested via round-trip.

ACCORE-BIN-T-I3P8: [AcBinaryTypeId(...)] attribute — explicit type-id override

Priority: P3 · Type: Feature · Related: ACCORE-BIN-T-F8N3 (FQN base hash being overridden)

Once ACCORE-BIN-T-F8N3 reduces collision frequency by switching to FQN, residual FQN-hash collisions are still possible (32-bit hash space, birthday paradox). Currently the only consumer of s_typeNameHash is the WireMode=Metadata inline metadata-write path — a residual collision there causes a silent property-set mismatch.

[AcBinaryTypeId(0x12345)] attribute on a class:

  • Source generator emits s_typeNameHash = 0x12345 instead of computing FNV.
  • Two types with the same [AcBinaryTypeId(...)] value → compile-time / first-use error.
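
One possible shape for the attribute and the first-use duplicate check is sketched below. The constructor signature, the `uint` id type, and `TypeIdRegistrySketch` are assumptions made for illustration; the sample classes exist only to exercise the check.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical shape for the proposed attribute; the real signature is not decided here.
[AttributeUsage(AttributeTargets.Class, AllowMultiple = false, Inherited = false)]
public sealed class AcBinaryTypeIdAttribute : Attribute
{
    public AcBinaryTypeIdAttribute(uint typeId) => TypeId = typeId;
    public uint TypeId { get; }
}

// Sample types for the sketch: the second and third deliberately share an id.
[AcBinaryTypeId(0x12345)] public sealed class PinnedUser { }
[AcBinaryTypeId(0x777)] public sealed class ClashLeft { }
[AcBinaryTypeId(0x777)] public sealed class ClashRight { }

public static class TypeIdRegistrySketch
{
    // First-use style validation: throws when two types claim the same explicit id.
    public static Dictionary<uint, Type> Build(IEnumerable<Type> types)
    {
        var map = new Dictionary<uint, Type>();
        foreach (var type in types)
        {
            var attr = type.GetCustomAttribute<AcBinaryTypeIdAttribute>();
            if (attr is null) continue;   // no override: the computed hash would apply instead
            if (map.TryGetValue(attr.TypeId, out var existing))
                throw new InvalidOperationException(
                    $"Type id 0x{attr.TypeId:X} is declared on both {existing} and {type}.");
            map.Add(attr.TypeId, type);
        }
        return map;
    }
}
```

A source generator could report the same clash as a compile-time diagnostic for types in one compilation; a runtime check like this one would still be needed for ids that clash across assemblies.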

Useful for:

  • Resolving rare FQN-hash collisions deterministically (within WireMode=Metadata).
  • Pinning a stable type-id across class renames (wire-compat across versions in Metadata mode).
  • Future-proofing: if a Layer 1 consumer (hypothetically) builds a type-dispatch above AcBinary using s_typeNameHash, the same override mechanism applies.

Acceptance:

  • New attribute class shipped alongside [AcBinarySerializable].
  • Generator honours the override (emits explicit constant instead of FNV result).
  • Tests: rename a class with [AcBinaryTypeId] → s_typeNameHash unchanged.

ACCORE-BIN-T-X2M5: Evaluate xxHash3 vs FNV-1a for type-name hashes

Priority: P3 · Type: Investigation · Related: ACCORE-BIN-T-F8N3

FNV-1a is currently used for both s_typeNameHash and s_propertyHashes. For compile-time hashing, performance is irrelevant. For collision resistance:

  • FNV-1a 32-bit: ~50% collision at ~77K types (birthday paradox). Adequate for small/medium projects, marginal for large ones with many auto-generated types.
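  • xxHash3 32-bit: comparable mathematical properties to FNV-1a (both non-cryptographic). Note that XXH3 itself is defined for 64- and 128-bit outputs, so a 32-bit variant means XXH32 or a truncated XXH3 value.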
  • xxHash3 64-bit: dramatically better collision resistance (~50% at ~5B entries), at the cost of 8 wire bytes instead of 4.
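
The 50% figures above follow from the usual birthday approximation n ≈ sqrt(2 · ln 2 · 2^bits). The helper below is a small sketch for re-deriving them; it assumes an ideal, uniformly distributed hash, which real hash functions only approximate, and `BirthdayBoundSketch` is a hypothetical name.

```csharp
using System;

public static class BirthdayBoundSketch
{
    // Approximate number of distinct inputs at which the chance of at least one collision
    // reaches 50% for an ideal hash with the given output width in bits.
    public static double ItemsForHalfCollisionChance(int bits)
        => Math.Sqrt(2.0 * Math.Log(2.0) * Math.Pow(2.0, bits));
}
```

For 32 bits this lands at roughly 77 thousand inputs and for 64 bits at roughly 5 billion, in line with the bullets above.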

Trigger: real collisions observed (1000+ types per assembly + cross-assembly aggregation), or community feedback indicating collision pain.

Investigation questions (no code change without a triggering pain signal):

  1. Switch to xxHash3 32-bit (incremental improvement) — but doubles the change scope (touch property hashes too if uniformity desired).
  2. Switch to xxHash3 64-bit (8 wire bytes instead of 4) — meaningful collision resistance, modest wire cost.
  3. Stay on FNV-1a + force [AcBinaryTypeId] for collisions — minimal change, devops burden.

Investigation only — defer until pain signal arrives.

ACCORE-BIN-T-K9E4: [RequiresDynamicCode] + [RequiresUnreferencedCode] on Runtime-only methods

Priority: P3 · Type: Refactor · Related: BINARY_FEATURES.md#nativeaot-compatibility

The Runtime path (factories in AcSerializerCommon + wrapper-based deserialize fallback in AcBinaryDeserializer) currently works under NativeAOT thanks to DAMs propagation + RuntimeFeature.IsDynamicCodeSupported guards, but the trimmer still emits warnings for the well-known blind spots (polymorphism via obj.GetType(), nested-type chain via generic argument extraction). The library suppresses these with [UnconditionalSuppressMessage] and documented justification.

A complementary signal would be to mark the Runtime entry points (or the factories themselves) with [RequiresDynamicCode("AcBinary Runtime path uses Reflection.Emit / closed-generic instantiation; use [AcBinarySerializable] + SGen for NativeAOT.")] and [RequiresUnreferencedCode("...")]. Effect:

  • AOT publish in consumer's project surfaces a warning at the call site → consumer chooses SGen or accepts the Runtime cost
  • Mirrors the System.Text.Json reflection-mode pattern ([RequiresDynamicCode] on JsonSerializer.Serialize<T> overloads)
  • One-codebase, no NuGet split needed
  • Cheap implementation — attribute placement only

Coordination: [RequiresDynamicCode] is contagious; every caller must either propagate it or suppress with [UnconditionalSuppressMessage]. Scope:

  • Public Serialize<T> / Deserialize<T> entry points stay attribute-free (consumer-facing)
  • Runtime fallback methods get the attribute (contained inside the library)
  • The DAMs annotations we already have stay — they're orthogonal (one prevents trim, the other warns about JIT-only behavior)
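
The scope split in the bullets above can be sketched like this. All names are hypothetical and the method bodies are placeholders; only the attribute placement is the point: the fallback carries the `Requires*` attributes, and the attribute-free public entry suppresses the resulting warnings at the propagation barrier.

```csharp
using System;
using System.Diagnostics.CodeAnalysis;

public static class AotAnnotationSketch
{
    // Public entry point stays attribute-free; the warnings are suppressed here, at the barrier.
    [UnconditionalSuppressMessage("AOT", "IL3050",
        Justification = "Sketch: the runtime fallback is only reached when generated code is not used.")]
    [UnconditionalSuppressMessage("Trimming", "IL2026",
        Justification = "Sketch: the runtime fallback is only reached when generated code is not used.")]
    public static string Describe<T>(bool useGeneratedCode)
        => useGeneratedCode ? "sgen" : RuntimeFallback(typeof(T));

    // Runtime-only path: annotated so that any unsuppressed caller gets IL3050 / IL2026.
    [RequiresDynamicCode("Runtime path may need dynamic code; use the source-generated path for NativeAOT.")]
    [RequiresUnreferencedCode("Runtime path reflects over members that trimming may remove.")]
    private static string RuntimeFallback(Type type) => "runtime:" + type.Name;
}
```

Where exactly the barrier sits decides who sees the warning: suppressing at the public entry keeps consumer builds quiet, while moving the attributes outward surfaces the warning at the consumer's call site.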

Acceptance:

  • Consumer's AOT publish surfaces an IL2026/IL3050 warning when UseGeneratedCode=false is set or an unattributed type is deserialized
  • SGen path is warning-free
  • Library compiles 0 warnings (suppressions added at the propagation barrier)
  • BINARY_FEATURES.md NativeAOT Compatibility section updated to mention the explicit warning signal

ACCORE-BIN-T-A2J7: Optional AyCode.Core.Aot NuGet variant (SGen-only build)

Priority: P3 · Type: Feature · Related: BINARY_FEATURES.md#nativeaot-compatibility, ACCORE-BIN-T-K9E4

Binary-size-sensitive AOT consumers (Blazor WASM, MAUI mobile, embedded, container-trimmed) benefit from a smaller library variant that strips the Runtime fallback path entirely. Estimated savings: ~80-150 KB of native code (~25-60 KB compressed wire size for WASM publish).

Strippable code in the .Aot variant:

| Component | LOC | Purpose | Removable in Aot? |
|---|---|---|---|
| AcSerializerCommon.Create* (7 factory methods + Expression-tree code) | ~150 | Runtime delegate compilation | Yes |
| TypeMetadataBase runtime metadata path (CompiledConstructor, IdGetters via Expression.Compile) | ~300 | Reflection-based metadata | Yes |
| AcBinaryDeserializer wrapper-based runtime fallback (PopulateObjectPropertiesIndexed, ReadObjectCoreWithWrapper non-SGen branches, CreateInstance(type) Activator-fallback) | ~500 | Runtime polymorphic dispatch | Yes |
| Property accessor runtime delegate fields (_dynamicGetter, typed getter/setter caches outside SGen) | ~150 | Boxed property access | Yes |
| System.Linq.Expressions transitive dependency | | Expression-tree IL emission | Yes (when nothing else in graph uses it) |

Implementation sketch (avoid an #if forest via file-level split):

AyCode.Core/Serializers/
  AcSerializerCommon.cs              // SGen-safe shared parts
  AcSerializerCommon.Runtime.cs      // 7 Create* factory methods only here
  AcBinaryDeserializer.cs            // SGen path
  AcBinaryDeserializer.Runtime.cs    // wrapper-based runtime fallback path
  TypeMetadataBase.cs                // SGen-safe metadata
  TypeMetadataBase.Runtime.cs        // Expression.Compile-based ctor + accessor wiring

Two .csproj files:

  • AyCode.Core.csproj: full package (current); includes all files
  • AyCode.Core.Aot.csproj: <Compile Remove="**/*.Runtime.cs" />; sets <PackageId>AyCode.Core.Aot</PackageId>; same version as full

Trade-offs:

  • No #if directives in business code — physically separate file groups
  • Source mostly shared via SDK include/exclude semantics
  • DAMs annotations and trim-suppressions only land in the full package; .Aot variant is genuinely trim-clean by construction
  • "Strict SGen" semantics in .Aot: a non-SGen type at deser time throws clearly instead of silently falling back. Marketing positioning: "guaranteed SGen path, no hidden slow lane".
  • ⚠️ Two NuGet IDs, two changelogs, version sync (CI-automatable)
  • ⚠️ Consumer must pick the right package — wrong choice = breaking switch later

Coordination:

  • Land ACCORE-BIN-T-K9E4 first ([RequiresDynamicCode] attributes) — if that pattern handles the consumer-side scenarios well, .Aot may not be needed
  • The current Runtime fallback code is already well-isolated (mostly in AcSerializerCommon factories + AcBinaryDeserializer wrapper-based methods), so the file-split refactor is mechanically straightforward
  • Marketing decision: is binary size a central pillar? If yes, .Aot is a NuGet differentiator; if not, K9E4 alone is enough

Acceptance:

  • AyCode.Core.Aot.csproj produces a NuGet ~25-60 KB smaller than AyCode.Core after compression
  • .Aot build emits zero IL/AOT trim warnings (no suppressions needed because the Runtime path code is physically removed)
  • Round-trip tests pass on .Aot for all SGen types
  • .Aot throws a clear InvalidOperationException (not MissingMethodException) when a non-[AcBinarySerializable] type is encountered at deser time
  • BINARY_FEATURES.md NativeAOT Compatibility section documents both packages and when to choose which

ACCORE-BIN-T-V4N2: .NET 11 SIMD-specialized UTF-8 decoder via multi-targeting

Priority: P3 · Type: Performance · Related: AcBinaryDeserializer.BinaryDeserializationContext.Read.cs::DecodeUtf8

The custom UTF-8 → UTF-16 decoder in DecodeUtf8 / CountUtf8Chars / DecodeUtf8ToChars currently targets .NET 9 — scalar two-pass with optional Vector256 ASCII prefix widen + DWORD ASCII batch (per Phase 1 optimization). .NET 11 (planned ~Nov 2026) exposes additional SIMD intrinsics that can meaningfully accelerate the decoder on AVX-512-capable hosts, particularly the vpcompressb-style mask-driven byte compression that simdutf relies on for its 64-byte AVX-512 transcoder.

Why .NET 11 specifically (and not .NET 10)

  • .NET 10: incremental SIMD improvements, but the changes that affect us are mostly inside the BCL (Encoding.UTF8.GetString internal SIMD widening). Our custom decoder bypasses the BCL — we don't benefit unless we hand-roll the same SIMD ourselves with .NET 9 intrinsics, which already work today. Multi-targeting net9.0;net10.0 adds CI/test overhead with marginal payoff. Skip.
  • .NET 11: PR #120628 (Vector512/Vector256 SIMD for UTF-8 utilities) was closed without merge but signals upcoming work in this area. Future iterations are expected to expose Avx512Vbmi-style mask-compress intrinsics that today require unsafe / Vector128-emulation paths. Target this once the framework lands.

Implementation outline (when triggered)

  • Multi-target <TargetFrameworks>net9.0;net11.0</TargetFrameworks> on AyCode.Core.csproj
  • #if NET11_0_OR_GREATER block in DecodeUtf8 selects an AVX-512-aware path: process 64-byte blocks via Vector512<byte> + vpcompressb for byte-stream extraction, fall back to the .NET 9 scalar+Vector256 path on non-AVX-512 hardware (Avx512Vbmi.IsSupported runtime check; note that vpcompressb belongs to the AVX-512 VBMI2 extension, so the check must cover VBMI2 rather than VBMI alone)
  • Reuse the .NET 9 scalar path for short strings (<64 bytes) — SIMD setup cost dominates
  • New benchmark cells comparing .NET 9 vs .NET 11 builds on the same hardware

Acceptance

  • dotnet test passes on both target frameworks
  • Benchmark on AVX-512 hardware (Sapphire Rapids / Zen 4+) shows ≥1.5x non-ASCII deser speedup vs .NET 9 build for strings ≥256 bytes
  • Short-string perf (≤64 bytes) within ±5% of .NET 9 build (no regression from multi-target setup)
  • BINARY_FEATURES.md documents the SIMD path selection logic

Trigger

  • Wait for .NET 11 release (or RC)
  • Re-evaluate once dotnet/runtime UTF-8 SIMD utilities re-land (post-PR #120628 follow-up)
  • Skip entirely if .NET 11 BCL Encoding.UTF8.GetString becomes fast enough that hybrid (≥256 bytes → BCL, <256 → custom) wins without hand-rolled SIMD

ACCORE-BIN-T-S5L8: Sentinel-length encoding for strings (wire-size optimization, both modes)

Priority: P3 · Type: Wire-format optimization · Related: AcBinarySerializer.WriteString, AcBinaryDeserializer.ReadValue string dispatch

The leading string-marker byte (String / StringEmpty / Null) exists primarily to distinguish null vs empty vs non-empty before dispatching. For non-polymorphic, non-interned string properties the marker can be replaced by a single sentinel-length VarUInt:

[VarUInt sentinelLength] [content bytes if applicable]
   sentinelLength == 0    → null
   sentinelLength == 1    → empty string
   sentinelLength == N+1  → string of N bytes/chars, content follows

MemoryPack-style encoding pattern. Applies to both Compact (UTF-8) and FastWire (UTF-16 raw) modes; the content following the sentinel differs by mode.
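
A minimal sketch of the sentinel-length layout above, written in Python purely as a language-neutral illustration (the serializer itself is C#). It assumes that VarUInt is a LEB128-style 7-bit varint, which the text does not confirm, and all function names are hypothetical rather than part of AcBinarySerializer.

```python
from typing import Optional, Tuple


def write_varuint(value: int) -> bytes:
    """ASSUMPTION: LEB128-style varint, 7 payload bits per byte, high bit = 'more'."""
    if value < 0:
        raise ValueError("VarUInt cannot encode negative values")
    out = bytearray()
    while True:
        low = value & 0x7F
        value >>= 7
        if value:
            out.append(low | 0x80)
        else:
            out.append(low)
            return bytes(out)


def read_varuint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Return (value, position after the varint)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def write_sentinel_string(s: Optional[str], fast_wire: bool = False) -> bytes:
    """0 = null, 1 = empty, N+1 = N bytes (Compact) or N UTF-16 chars (FastWire)."""
    if s is None:
        return write_varuint(0)
    if s == "":
        return write_varuint(1)
    if fast_wire:
        content = s.encode("utf-16-le")
        count = len(content) // 2  # UTF-16 code units
    else:
        content = s.encode("utf-8")
        count = len(content)       # UTF-8 bytes
    return write_varuint(count + 1) + content


def read_sentinel_string(buf: bytes, pos: int = 0,
                         fast_wire: bool = False) -> Tuple[Optional[str], int]:
    """Return (string or None, position after the string)."""
    sentinel, pos = read_varuint(buf, pos)
    if sentinel == 0:
        return None, pos
    if sentinel == 1:
        return "", pos
    count = sentinel - 1
    byte_count = count * 2 if fast_wire else count
    raw = buf[pos:pos + byte_count]
    text = raw.decode("utf-16-le" if fast_wire else "utf-8")
    return text, pos + byte_count
```

The reader never sees a marker byte: null, empty and non-empty are all recovered from the single length value, which is why this only works where the property type is statically known to be a string.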

Per-mode impact

FastWire mode — wire layout today: [String marker][VarUInt charCount][UTF-16 raw bytes]. Sentinel saves 1 byte per non-null string.

| TestData | Current FastWire wire | Estimated with sentinel | Δ |
|---|---|---|---|
| Small | 3122 B | ~3050 B | -2% |
| Medium | 10905 B | ~10500 B | -4% |
| Large | 68603 B | ~67000 B | -2% |
| Repeated | 16244 B | ~15700 B | -3% |
| Deep | 15514 B | ~14900 B | -4% |

Closes the +1.7-8.1% FastWire wire gap vs MemoryPack to near zero or favorable while keeping AcBinary FastWire's +9-20% speed advantage.

Compact mode — wire layout today varies by length:

  • Short (≤31 byte): [FixStr+length][UTF-8 bytes] — already 1-byte marker, ties sentinel.
  • Long (>31 byte): [String marker][VarUInt byteCount][UTF-8 bytes] — sentinel saves 1 byte (the marker).

Compact gain: only on long strings (>31 bytes of UTF-8), an estimated 1 byte per long string. The gain is workload-dependent: if most strings are short or interned, it is small; if many strings are long with mixed content, the saving is meaningful.
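
The Compact-mode tie/gain rule can be checked with a small size model. This is only a sketch, again under the unconfirmed assumption that VarUInt is a LEB128-style varint; the function names are hypothetical. One thing the model makes visible: because the sentinel stores length + 1, a string whose byte count sits right below a VarUInt width boundary (127 bytes under this assumption) needs a wider sentinel, so the 1-byte saving vanishes for that exact length.

```python
def varuint_len(value: int) -> int:
    """Encoded width of a value. ASSUMPTION: LEB128-style, 7 payload bits per byte."""
    width = 1
    while value >= 0x80:
        value >>= 7
        width += 1
    return width


def compact_current_size(utf8_len: int) -> int:
    """Marker-based Compact layout for a non-null string, as described in the text."""
    if utf8_len <= 31:
        return 1 + utf8_len                      # FixStr: one byte holds marker + length
    return 1 + varuint_len(utf8_len) + utf8_len  # String marker + VarUInt byteCount


def compact_sentinel_size(utf8_len: int) -> int:
    """Sentinel layout: VarUInt(length + 1) followed by the content."""
    return varuint_len(utf8_len + 1) + utf8_len


def saving(utf8_len: int) -> int:
    """Bytes saved by the sentinel layout for one string of the given length."""
    return compact_current_size(utf8_len) - compact_sentinel_size(utf8_len)
```

Under these assumptions `saving(n)` is 0 up to 31 bytes, 1 for most longer strings, and 0 again at the boundary lengths, which matches the "ties on short, wins on long" summary with one small exception.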

Limitations (both modes)

  • Polymorphic object properties: marker needed for type discrimination. Sentinel encoding only applies when the property type is statically string or string?.
  • Interning incompatible: sentinel cannot express StringInternFirst / StringInterned markers (those carry cache-index semantics). Interned properties keep marker-based encoding. FastWire mode already disables interning by design (consistent); Compact mode needs per-property dispatch (interned → marker, non-interned → sentinel).
  • Compact-mode FixStr ties: short strings (≤31 byte UTF-8) gain nothing in Compact (FixStr is already 1-byte marker+length). The optimization wins only on long strings in Compact.

Implementation outline (rough — refine when implementing)

  1. Writer: branch in WriteString on property metadata flags (IsString, IsNotInterned, IsNotPolymorphic). If sentinel-eligible, emit VarUInt sentinelLength + content. Else fall through to existing marker-based encoding.
  2. Reader: matching branch in property reader. If sentinel-eligible (per property metadata), read VarUInt sentinelLength, dispatch on 0/1/N+1.
  3. SGen: emit sentinel-encoding variant for non-polymorphic non-interned string typed properties; emit existing marker-encoding for the rest.
  4. Wire format version bump OR header flag indicating sentinel-encoding-active. (Cross-version compat policy decided when implementing.)
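
The per-property branch in steps 1 and 2 boils down to one eligibility predicate that writer, reader and SGen must agree on. A minimal sketch of that predicate, combining the outline with the limitations listed earlier; the metadata class and its field names are hypothetical stand-ins for the real property metadata flags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StringPropertyMeta:
    """HYPOTHETICAL metadata shape; the real flags are IsString, IsNotInterned, IsNotPolymorphic."""
    declared_type_is_string: bool  # statically string or string?
    is_interned: bool
    is_polymorphic: bool           # e.g. declared as object


def uses_sentinel_encoding(meta: StringPropertyMeta, fast_wire: bool) -> bool:
    """True if the property may drop the marker byte and use the sentinel length."""
    if meta.is_polymorphic or not meta.declared_type_is_string:
        return False  # the marker is still needed for type discrimination
    if fast_wire:
        return True   # per the text, FastWire disables interning by design
    return not meta.is_interned  # Compact: interned strings keep their markers


plain = StringPropertyMeta(declared_type_is_string=True, is_interned=False, is_polymorphic=False)
interned = StringPropertyMeta(declared_type_is_string=True, is_interned=True, is_polymorphic=False)
boxed = StringPropertyMeta(declared_type_is_string=False, is_interned=False, is_polymorphic=True)
```

Because the decision depends only on static metadata and the mode, the reader can reach the same answer without any hint on the wire, which is what makes the marker byte removable in the first place.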

Trigger

  • After D-2 / decoder optimization / marker-dispatch land (compact-mode focus completes)
  • When wire-size positioning becomes a primary pillar for NuGet release
  • Re-evaluate scope at implementation time — exact gain in Compact depends on consumer workload (long-string ratio, interning patterns)

Acceptance

  • FastWire mode: AcBinary wire ≤ MemoryPack on at least 4 of 5 test cells
  • Compact mode: long-string wire bytes -1 each, no regression on short or interned strings
  • Speed benchmark: no regression vs current encoding (essentially zero CPU cost — sentinel is shifted bookkeeping)
  • Cross-version compat: documented format version bump + clean fail on old reader / new wire mismatch
  • Polymorphic + interned property test cases pass unchanged (use existing marker-based encoding)

ACCORE-BIN-T-M3R7: ASCII marker-dispatch — writer detect + reader dedicated path

Priority: P2 · Type: Performance + wire optimization · Related: BinaryTypeCode.FixStrAsciiBase..StringAscii markers, WriteStringWithDispatch, ReadAsciiBytesAsString · Status: Closed (2026-05-04)

Ordering note: do this AFTER the encoder optimization (see ACCORE-BIN-T-E2F9). Rationale: the Vector256 ASCII narrow/widen paths of the custom encoder/decoder already handle ASCII bytes quickly on their own. On top of that, marker-dispatch only saves the per-call dispatch overhead (no Ascii.IsValid scan, no decoder layer). It is a guaranteed win, but an additive one, so measurement is cleaner if it lands after the decoder/encoder work.

The FixStrAscii* (135-166) and StringAscii (167) markers are defined in BinaryTypeCode.cs with helper methods (IsAsciiString, IsFixStrAscii, EncodeFixStrAscii, DecodeFixStrAsciiLength). Encoding/decoding logic NOT yet implemented — currently both writer and reader use the universal String / FixStr markers.

Implementation

  • Writer: in WriteStringUtf8 / WriteFixStrDirect, after UTF-8 encoding (D-2 path), check bytesWritten == charLength (= ASCII iff equal). If ASCII, emit FixStrAscii (≤31 byte) or StringAscii (>31 byte). Else emit existing FixStr / String. Free detect — both numbers already computed by D-2.
  • Reader: in ReadStringUtf8 (or upstream marker dispatch), branch on marker. ASCII markers → dedicated byte→char widening path (no UTF-8 decode, no Ascii.IsValid scan, no decoder dispatch). Non-ASCII markers → existing custom UTF-8 decoder.
  • SGen: regenerate readers/writers to dispatch on the new markers.
  • Re-enable ASCII fast paths: uncomment writer FixStr dispatch in AcBinarySerializer.cs and reader Ascii.IsValid block in ReadStringUtf8 — these temporarily disabled blocks become the marker-aware paths (no IsValid scan needed since the marker is the contract).
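
The "free detect" in the writer bullet rests on a simple property: a UTF-16 string encodes to exactly as many UTF-8 bytes as it has chars if and only if every char is ASCII. A minimal Python sketch of the resulting four-way marker choice follows. The codes 135 and 167 come from the text; the rule that a FixStrAscii code is base + length, and the use of the byte count for the ≤31 test, are assumptions, and the numeric codes of FixStr / String are not given here so they are left as None.

```python
from typing import Optional, Tuple

FIXSTR_ASCII_BASE = 135  # from the text: FixStrAscii markers occupy 135..166
STRING_ASCII = 167       # from the text
FIXSTR_MAX_LEN = 31      # from the text: FixStr variants cover up to 31 bytes


def utf16_length(s: str) -> int:
    """Number of UTF-16 code units, i.e. what C# string.Length reports."""
    return len(s.encode("utf-16-le")) // 2


def choose_marker(s: str) -> Tuple[str, Optional[int]]:
    """Pick the marker for a non-null, non-empty string.

    Null and empty strings have their own markers and are out of scope here.
    """
    bytes_written = len(s.encode("utf-8"))
    char_length = utf16_length(s)
    is_ascii = bytes_written == char_length   # equal iff every char is < 0x80
    is_short = bytes_written <= FIXSTR_MAX_LEN
    if is_ascii and is_short:
        # ASSUMPTION: the length is folded into the marker as base + length.
        return "FixStrAscii", FIXSTR_ASCII_BASE + bytes_written
    if is_ascii:
        return "StringAscii", STRING_ASCII
    if is_short:
        return "FixStr", None   # numeric code not given in this section
    return "String", None       # numeric code not given in this section
```

The equality test works because every non-ASCII char costs at least two UTF-8 bytes per UTF-16 code unit, so the byte count can only match the char count when no such char is present.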

Wire format change

  • Format version bump (1 → 2). Old readers fail clean on new wire (version mismatch). New readers must reject old wire OR support backward read.

Acceptance

  • Repeated Strings (Hungarian content) Deser: AcBinary closes the ~10% gap vs MemoryPack
  • Pure ASCII tests (Small/Medium/Large/Deep): AcBinary Ser AND Deser ≥ MemoryPack
  • Wire size: minimum -25% vs MemoryPack across all test cells
  • SGen-generated code compiles and round-trips on all [AcBinarySerializable] types
  • Decision documented: backward-compat policy for v2 vs v1 wire

Resolution

End-to-end implementation landed (writer + reader + SGen + skip + populate). Key components:

  • Writer (AcBinarySerializer.BinarySerializationContext.WriteStringWithDispatch) — single-pass UTF-8 encode + ASCII detect via bytesWritten == charLength; emits one of 4 markers (FixStrAscii / FixStr / StringAscii / String). Split layout for hot path: charLength ≤ 31 encodes optimistically at savedPos+1 (FixStr position) → 0 shift on FixStr hit; charLength > 31 uses D-2 layout with backfill. The split avoids the post-encode left-shift that the unified layout introduced (regression seen in 12-42-32 bench).
  • Reader (AcBinaryDeserializer.BinaryDeserializationContext.ReadAsciiBytesAsString): Encoding.Latin1.GetString (BCL SIMD-accelerated byte→char widen). Avoids the string.Create callback + scalar widen overhead; measurably better on the Small Deser cell (closed the +20% MemPack-relative anomaly).
  • TypeReaderTable: StringAscii (167) + 32 × FixStrAscii (135-166) readers registered. IsFixStrAscii / StringAscii fast paths in PopulatePropertyWithMarker, ReadValue, SkipValue.
  • SGen (AcBinarySourceGenerator.EmitReadString) — regenerated readers branch on IsFixStr / IsFixStrAscii / case StringAscii per property.
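
A minimal Python sketch of the reader-side idea: once the marker guarantees ASCII, decoding is a plain byte→char widening, which for bytes below 0x80 is exactly what a Latin-1 decode does. The marker codes come from the text; the length rule (marker minus 135), the VarUInt byte count after StringAscii, and the LEB128 varint are assumptions, and the function names are hypothetical.

```python
from typing import Tuple

FIXSTR_ASCII_BASE = 135  # from the text
FIXSTR_ASCII_LAST = 166  # from the text
STRING_ASCII = 167       # from the text


def read_varuint(buf: bytes, pos: int) -> Tuple[int, int]:
    """ASSUMPTION: LEB128-style varint. Returns (value, position after it)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def read_ascii_string(buf: bytes, pos: int) -> Tuple[str, int]:
    """Decode one ASCII-marked string; returns (text, position after it)."""
    marker = buf[pos]
    pos += 1
    if FIXSTR_ASCII_BASE <= marker <= FIXSTR_ASCII_LAST:
        length = marker - FIXSTR_ASCII_BASE     # ASSUMPTION: length folded into the marker
    elif marker == STRING_ASCII:
        length, pos = read_varuint(buf, pos)    # ASSUMPTION: VarUInt byte count follows
    else:
        raise ValueError(f"not an ASCII string marker: {marker}")
    raw = buf[pos:pos + length]
    if len(raw) != length:
        raise ValueError("truncated string payload")
    # The marker is the contract: no validity scan, just widen each byte to a char.
    return raw.decode("latin-1"), pos + length
```

For pure-ASCII payloads the Latin-1, ASCII and UTF-8 decodings all agree, so the dedicated path changes cost, not results.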

Wire format version not bumped: the new markers occupy previously unused code points (135-167), and wire produced before the change (without ASCII markers) is still readable by new readers, which handle both String and StringAscii. The change is therefore backward-compatible for existing data; v1 stays.

Acceptance (AOT bench 13-40-29, MemPack-relative ratios — JIT noise eliminated):

  • AcBinary Ser AND Deser are faster than MemPack on every cell (5/5)
    • Small: Ser -8%, Deser -23%
    • Medium: Ser -17%, Deser -30%
    • Large: Ser -28%, Deser -32%
    • Repeated: Ser -4%, Deser -9%
    • Deep: Ser -24%, Deser -22%
  • Wire size advantage: 2043-50419 bytes (vs MemPack 3070-64986 bytes), i.e. -22% to -33% across cells
  • Round-trip tests: 167 pass (13 pre-existing failures are IId-tracking, unrelated to M3R7)

JIT vs AOT note: earlier JIT-mode benchmarks (12-50-43 → 13-27-20 series) showed elevated ratios on Small/Repeated cells (1.0-1.2 range) that disappeared under AOT publish. The JIT-mode numbers reflect tier-up artifacts (inconsistent inlining of SGen-generated reader hot paths during the 1000-iteration measurement window), not a structural M3R7 property. AOT (NativeAOT / ILC) compiles deterministically with fixed inline decisions — the steady-state numbers above reflect the actual production performance.

ACCORE-BIN-T-E2F9: Custom UTF-8 encoder (writer-side, symmetric with custom decoder)

Priority: P1 · Type: Performance · Related: decoder optimization (AcBinaryDeserializer.BinaryDeserializationContext.Read.cs::DecodeUtf8SinglePass) · Status: Closed (2026-05-04)

Ordering note: do this BEFORE marker-dispatch (see ACCORE-BIN-T-M3R7). Rationale: the custom encoder/decoder optimization is the "harder, less certain" win; it is what brings in the non-ASCII / mixed-content workloads (Repeated Strings, Hungarian). Marker-dispatch afterwards is only an additive cleanup of the dispatch overhead on the pure-ASCII path.

Replace Encoding.UTF8.GetBytes calls in WriteStringUtf8 / WriteStringUtf8Internal / WriteFixStrDirect (collectively the writer's UTF-8 encode path, post-D-2) with a hand-rolled SIMD encoder. Symmetric to the decoder optimization (V4N2 / Read.cs::DecodeUtf8SinglePass).

Layered structure (mirrors decoder)

  • Phase 1: Vector256 ASCII narrow. 16 chars (Vector256<ushort>) → 16 bytes (Vector128<byte>), e.g. via Vector128.Narrow on the lower and upper halves (Vector256.Narrow itself takes two Vector256<ushort> inputs and covers 32 chars). The ASCII detect must test every bit above the low 7 of each UTF-16 char, e.g. (v & 0xFF80) == Vector256<ushort>.Zero; calling ExtractMostSignificantBits() on the masked vector inspects only bit 15 of each lane and would miss chars in 0x0080-0x7FFF. Break on first non-ASCII char.
  • Phase 2: DWORD ASCII batch. 4 chars at a time, OR-mask test, 4 bytes per iteration when ASCII.
  • Phase 3: Scalar multi-byte encode. 1-byte (ASCII) / 2-byte (Latin extended) / 3-byte (BMP) / 4-byte (surrogate pair → supplementary plane) UTF-8 encoding via direct bit-extract. No fallback dispatch; input is trusted UTF-16 (string).
  • Use System.Text.Unicode.Utf8.FromUtf16 as fallback target for scalar correctness, or skip the BCL entirely with manual bit-pack.
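
The scalar part of the layered structure can be sketched without SIMD. The Python below is an illustration only, not the C# EncodeUtf8SinglePass: it models the 4-char OR-mask ASCII batch (Phase 2) and the bit-extract encoding of 1/2/3/4-byte sequences (Phase 3) over UTF-16 code units; Phase 1 is omitted because it needs vector intrinsics. The function names are hypothetical.

```python
import struct
from typing import List


def utf16_units(s: str) -> List[int]:
    """UTF-16 code units of a string, i.e. the chars a C# string holds."""
    raw = s.encode("utf-16-le", "surrogatepass")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


def encode_utf8_from_utf16(units: List[int]) -> bytes:
    out = bytearray()
    i = 0
    n = len(units)
    while i < n:
        # Phase 2 analogue: OR four chars together; if no bit above the low 7
        # is set, all four are ASCII and can be copied as single bytes.
        if i + 4 <= n and ((units[i] | units[i + 1] | units[i + 2] | units[i + 3]) & 0xFF80) == 0:
            out.extend(units[i:i + 4])
            i += 4
            continue
        u = units[i]
        if u < 0x80:                               # 1 byte: ASCII
            out.append(u)
            i += 1
        elif u < 0x800:                            # 2 bytes: e.g. Hungarian diacritics
            out.append(0xC0 | (u >> 6))
            out.append(0x80 | (u & 0x3F))
            i += 1
        elif 0xD800 <= u <= 0xDBFF and i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # 4 bytes: combine the surrogate pair into one code point first.
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            out.append(0xF0 | (cp >> 18))
            out.append(0x80 | ((cp >> 12) & 0x3F))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
            i += 2
        else:
            # 3 bytes: rest of the BMP. Trusted-input sketch: an unpaired
            # surrogate would also land here and produce invalid UTF-8.
            out.append(0xE0 | (u >> 12))
            out.append(0x80 | ((u >> 6) & 0x3F))
            out.append(0x80 | (u & 0x3F))
            i += 1
    return bytes(out)
```

For well-formed input the output can be compared byte for byte against a reference encoder, which is the same check the "wire format unchanged" acceptance criterion asks for.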

Why

Encoding.UTF8.GetBytes carries virtual-dispatch + encoder-fallback overhead even with its internal SIMD ASCII fast path. A custom encoder skips this. Expected gain: ~15-30% Ser improvement on ASCII content, ~5-10% on non-ASCII (the multi-byte path stays scalar).

Trigger

  • NEXT — implementation order P1 before marker-dispatch (M3R7)
  • Re-evaluate if .NET 11 BCL UTF-8 GetBytes becomes faster (PR #120628 follow-up)

Acceptance

  • Writer-side benchmark: ≥15% Ser speedup on ASCII content (Small/Medium/Large/Deep), ≥5% on non-ASCII (Repeated)
  • Wire format unchanged (custom encoder produces same bytes as Encoding.UTF8)
  • Round-trip tests pass

Resolution

Implemented as EncodeUtf8SinglePass in AcBinarySerializer.BinarySerializationContext.cs: a three-phase layered encoder (Vector256 ASCII narrow + DWORD ASCII batch + scalar 1/2/3-byte BMP & 4-byte surrogate-pair). Bypasses Encoding.UTF8.GetBytes virtual-dispatch + encoder-fallback overhead. Trusted-input path: no validation pass on the writer side; the input is a .NET string and is assumed to hold well-formed UTF-16. (A .NET string can technically contain unpaired surrogates, which Encoding.UTF8.GetBytes replaces with U+FFFD, so byte-for-byte parity with the BCL holds for well-formed input.)

Used by WriteStringUtf8 (D-2 single-pass with VarUInt backfill) and WriteStringWithDispatch (M3R7 marker-dispatch path). Wire format unchanged — the encoder produces the same bytes as Encoding.UTF8.GetBytes.

Acceptance (per bench 12-50-43 → 13-27-20, MemPack-relative ratios on AcBinary Compact FastMode SGen):

  • ASCII Ser ≥ MemPack on 4/5 cells (Small 0.94, Medium 0.80, Large 0.79, Deep 0.81)
  • ⚠️ Repeated Ser ~1.04 (Hungarian, multi-byte path scalar) — see follow-up ACCORE-BIN-T-H7K3
  • Round-trip tests pass (167 of 180; 13 pre-existing failures unrelated to encoder)

ACCORE-BIN-T-W7N5: Default-value omission policy — doc + optional opt-out

Priority: P2 · Type: Refactor + Documentation · Related: BINARY_ISSUES.md#accore-bin-i-d9y2 (canonical issue)

The serializer's PropertySkip (102) optimization saves 1 byte per default-valued property by omitting the full value from the wire — relying on the consumer-side type definition to have the same default(T). This is a latent correctness risk documented in ACCORE-BIN-I-D9Y2. This entry tracks the mitigation plan; full failure-mode analysis lives in the issue.

Decision tree (TBD when implementing)

  1. Doc-only: position as a deliberate protobuf-style feature; consumer keeps type defaults stable across versions. Lowest cost, maximum benchmark wire-size advantage retained.
  2. Option flag: AcBinarySerializerOptions.OmitDefaults boolean. Default true (preserves current behavior + benchmark numbers). false writes every property in full — opt-out for fragile-class-evolution scenarios.
  3. Both: ship doc + flag. Default behavior unchanged; consumers who hit silent-corruption have an explicit opt-out.
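
To make the tradeoff behind option 2 concrete, here is a toy model of default-value omission in Python. It is not the real wire format: the "wire" is a plain list, only the PropertySkip code 102 is taken from the text, and the failure shown (writer and reader disagreeing on a property's default) is one plausible reading of the risk; the authoritative analysis is in ACCORE-BIN-I-D9Y2. The `omit_defaults` parameter stands in for the proposed OmitDefaults flag.

```python
PROPERTY_SKIP = 102  # marker value quoted in the text


def serialize(values: dict, writer_defaults: dict, omit_defaults: bool = True) -> list:
    """Toy writer: emit PROPERTY_SKIP instead of a value equal to the writer's default."""
    wire = []
    for name, default in writer_defaults.items():
        value = values[name]
        if omit_defaults and value == default:
            wire.append(PROPERTY_SKIP)
        else:
            wire.append(("value", value))
    return wire


def deserialize(wire: list, reader_defaults: dict) -> dict:
    """Toy reader: a skipped property takes whatever default the reader's type has."""
    result = {}
    for (name, default), item in zip(reader_defaults.items(), wire):
        result[name] = default if item == PROPERTY_SKIP else item[1]
    return result


# Writer and reader were built against type versions with different defaults.
writer_defaults = {"retries": 0, "name": ""}
reader_defaults = {"retries": 3, "name": ""}
sent = {"retries": 0, "name": "x"}

silently_wrong = deserialize(serialize(sent, writer_defaults), reader_defaults)
round_tripped = deserialize(serialize(sent, writer_defaults, omit_defaults=False), reader_defaults)
```

With omission on, `retries` comes back as the reader's default rather than the value that was sent, and nothing fails loudly; with omission off the value survives at the cost of writing every property in full.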

Acceptance (when implementing)

  • BINARY_FEATURES.md adds a "Default-Value Omission" section documenting the semantic and the tradeoff (with cross-ref to ACCORE-BIN-I-D9Y2)
  • If flag added: round-trip tests covering both true and false; benchmark comparison table showing wire-size delta on ASCII / Hungarian / DTO-heavy workloads
  • Decision rationale recorded in LLM_PROTOCOL_DECISIONS.md (or a ### Resolution block on the issue) once implemented

ACCORE-BIN-T-H7K3: Hungarian / multi-byte content Ser optimization (Repeated Strings cell)

Priority: P3 · Type: Performance · Related: EncodeUtf8SinglePass Phase 3 (scalar multi-byte encode), ACCORE-BIN-T-E2F9 resolution · Status: Closed (2026-05-04), Won't Fix (JIT-only artifact)

The Repeated Strings benchmark (Hungarian content: "TermékNév_…", "RaklapKód_…") still shows AcBinary Ser ratio ~1.04 vs MemPack across multiple runs (12-50-43 / 13-21-27 / 13-27-20 series). All other ASCII-heavy cells (Small/Medium/Large/Deep) sit in the 0.79-0.94 ratio range — Repeated is the outlier.

The Phase 3 scalar multi-byte branch in EncodeUtf8SinglePass (1-byte ASCII / 2-byte Latin-extended / 3-byte BMP / 4-byte surrogate-pair) processes Hungarian diacritics (á, é, í, ő, ű, etc.) as 2-byte UTF-8 sequences via scalar bit-extract. MemPack's UTF-8 encoder appears to use a SIMD-accelerated mixed-content lane that processes 2-byte sequences in parallel.

Resolution

AOT bench 13-40-29: Repeated Ser ratio = 0.96 (AcBinary 14.50 µs vs MemPack 15.05 µs, AcBinary faster by 4%). Deser ratio 0.91 (also faster).

The 1.04+ ratio observed in JIT-mode benchmarks (12-50-43, 13-21-27, 13-27-20) was a JIT tier-up artifact — the SGen-generated writer's hot path (which calls EncodeUtf8SinglePass) didn't reliably tier up to fully-optimized code within the 1000-iteration measurement window, while MemPack's writer apparently warmed up faster. Under NativeAOT publish (-p:_IsPublishing=true) the issue disappears completely — both writers are deterministically optimized at compile time.

No structural problem in the Phase 3 scalar branch. The investigation directions (Vector256 mixed-content lane, BCL Utf8.FromUtf16 comparison) remain valid academic improvements but show no meaningful production-time win — closing as Won't Fix.

ACCORE-BIN-T-S2X9: Markerless schema lane — drop per-property type markers for fixed-shape primitives (SGen)

Priority: P3 · Type: Wire-format extension · Related: ACCORE-BIN-T-S5L8, ACCORE-BIN-T-W7N5

AcBinary is marker-driven: every value on the wire carries a 1-byte type code, so the reader can dispatch generically (handles polymorphism, null, intern markers, type-name lookup, etc.). MemPack is schema-driven: the SGen reader knows at compile time that "field 3 is int, field 4 is string" and reads values directly with no type code, no run-time dispatch.

For fixed-shape primitive properties (int, bool, double, Guid, DateTime, …) on [AcBinarySerializable] types, the per-property type marker is pure overhead — the SGen-generated reader already has compile-time knowledge of the property type, so the marker only confirms what is already known. Dropping it on this narrow class of properties is a clean wire+CPU win without losing any of the polymorphism / null / intern flexibility that the marker provides for variable-shape values.

Wire savings per property type

| Type | Current encoding | Markerless lane | Wire saved |
|---|---|---|---|
| int (TinyInt range 16..47) | TinyInt (1 byte) | VarInt (1 byte) | 0 |
| int (out-of-tiny) | [Int32] [VarInt] (2-6 bytes) | VarInt (1-5 bytes) | 1 byte |
| bool | [True] or [False] (1 byte) | 1 byte (0/1) | 0 |
| Guid | [Guid] [16 bytes] (17 bytes) | 16 bytes | 1 byte |
| DateTime | [DateTime] [9 bytes] (10 bytes) | 9 bytes | 1 byte |
| DateTimeOffset | [DateTimeOffset] [10 bytes] (11 bytes) | 10 bytes | 1 byte |
| TimeSpan | [TimeSpan] [VarLong] (2-9 bytes) | VarLong (1-9 bytes) | 1 byte |
| decimal | [Decimal] [16 bytes] (17 bytes) | 16 bytes | 1 byte |
| double | [Float64] [8 bytes] (9 bytes) | 8 bytes | 1 byte |

DTO-heavy payloads with many Guid / DateTime properties benefit the most — easily -10..-20% wire size on top of the existing -22..-33% advantage.
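
A small calculator makes the table's arithmetic easy to apply to a given DTO shape. This is a sketch that counts only the fixed-width primitive fields from the table; property headers, strings, nested objects and the variable-width int / TimeSpan rows are not modelled, so real payloads will differ. The names are hypothetical.

```python
from typing import Dict, Tuple

# (bytes with marker, bytes on the markerless lane), copied from the table above.
ENCODED_SIZE: Dict[str, Tuple[int, int]] = {
    "Guid": (17, 16),
    "DateTime": (10, 9),
    "DateTimeOffset": (11, 10),
    "decimal": (17, 16),
    "double": (9, 8),
    "bool": (1, 1),
}


def wire_sizes(field_counts: Dict[str, int]) -> Tuple[int, int]:
    """Return (current bytes, markerless bytes) for the given number of fields per type."""
    current = sum(ENCODED_SIZE[kind][0] * count for kind, count in field_counts.items())
    markerless = sum(ENCODED_SIZE[kind][1] * count for kind, count in field_counts.items())
    return current, markerless


def saving_percent(field_counts: Dict[str, int]) -> float:
    """Relative wire saving of the markerless lane, in percent of the current size."""
    current, markerless = wire_sizes(field_counts)
    return 100.0 * (current - markerless) / current


example = {"Guid": 2, "DateTime": 3, "double": 4}
```

Since every eligible field saves exactly one byte, the relative effect is driven by field width: one byte off a 17-byte Guid is about 6%, one byte off a 9-byte double about 11%, so the achievable percentage depends on the mix of primitive types in the payload.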

CPU savings

Reader-side: SGen-generated code drops the per-property ReadByte() + IsTinyInt / IsFixStr / switch-case dispatch for primitive properties — direct context.ReadInt32Unsafe() / ReadGuidUnsafe() / etc. calls. Writer-side: drops the WriteByte(typeCode) per primitive. Effect amplifies on payloads with many primitive properties (Small/Medium benchmark cells) — independent of any JIT-vs-AOT measurement variance.

Sketch — opt-in markerless lane, SGen-only

  • New wire format flag (header HeaderFlag_MarkerlessSchema = 0x10 or similar) → activates a property-positional lane.
  • SGen-generated writer for [AcBinarySerializable] types: per primitive property, emits raw value (no marker). For variable-shape properties (string, complex, nullable, polymorphic) the existing marker-driven path stays.
  • SGen-generated reader: per primitive property, calls context.ReadInt32Unsafe() / ReadGuidUnsafe() / etc. directly. Variable-shape properties keep the marker-read + dispatch.
  • Heuristic: a property is markerless-eligible if IsValueType && !IsNullable && type is in {int, bool, byte, short, long, float, double, DateTime, DateTimeOffset, Guid, TimeSpan, decimal}. Anything else (string, list, nested object, nullable) keeps the marker.
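
A minimal Python sketch of what a property-positional lane looks like for three eligible primitives. It is an illustration under assumptions, not the SGen output: the schema list stands in for compile-time knowledge, the Guid byte order is assumed to match `Guid.ToByteArray()` (Python's `bytes_le`), doubles are assumed little-endian, and all names are hypothetical.

```python
import struct
import uuid
from typing import Dict, List, Tuple

Schema = List[Tuple[str, str]]  # (property name, kind) in declaration order


def write_markerless(obj: Dict[str, object], schema: Schema) -> bytes:
    """Emit raw values back to back, with no per-property type marker."""
    out = bytearray()
    for name, kind in schema:
        value = obj[name]
        if kind == "guid":
            out += value.bytes_le               # 16 bytes
        elif kind == "double":
            out += struct.pack("<d", value)     # 8 bytes
        elif kind == "bool":
            out.append(1 if value else 0)       # 1 byte
        else:
            raise ValueError(f"not markerless-eligible: {kind}")
    return bytes(out)


def read_markerless(buf: bytes, schema: Schema) -> Dict[str, object]:
    """Read values purely by position; the schema is the only source of type info."""
    obj: Dict[str, object] = {}
    pos = 0
    for name, kind in schema:
        if kind == "guid":
            obj[name] = uuid.UUID(bytes_le=bytes(buf[pos:pos + 16]))
            pos += 16
        elif kind == "double":
            (obj[name],) = struct.unpack_from("<d", buf, pos)
            pos += 8
        elif kind == "bool":
            obj[name] = buf[pos] != 0
            pos += 1
        else:
            raise ValueError(f"not markerless-eligible: {kind}")
    return obj


schema_v1: Schema = [("id", "guid"), ("price", "double"), ("active", "bool")]
dto = {"id": uuid.UUID("12345678-1234-5678-1234-567812345678"), "price": 9.5, "active": True}
wire = write_markerless(dto, schema_v1)
```

The same 25 bytes read with a reordered schema decode without any error but yield different values, because nothing on the wire says which bytes belong to which type. That is the price of dropping the markers and the reason the lane is sketched as opt-in.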

Decision points

  • Backward compatibility: header flag + version negotiation. Old readers see the flag set and either reject (clean fail) or fall back to marker-driven (if they support both lanes). Default false preserves current wire format.
  • Schema evolution fragility: the markerless lane is positional, so adding/removing/reordering primitive properties breaks readers compiled against an older schema. Document this clearly — opt-in is for stable schemas only (DTO-frozen API contracts, internal SignalR messages with synchronized client/server SGen). For evolving schemas, marker-driven default stays.
  • Coordination with ACCORE-BIN-T-S5L8 (sentinel-length strings): the two could share the "no-marker per-call" infrastructure — markerless string lane uses sentinel-length VarUInt (null/empty/short distinguished by length value).

Acceptance

  • Wire size: ≥ -10% on DTO-heavy payloads (Guid/DateTime-rich) vs current marker-driven format
  • Round-trip on the markerless lane validated on representative DTO shapes (mixed primitive + string + nested object)
  • Schema-evolution fragility documented in BINARY_FEATURES.md (alongside the existing PropertySkip / default-omission caveat from ACCORE-BIN-I-D9Y2)
  • Opt-in flag with default false (preserves marker-driven default; consumers explicitly opt in for frozen-schema scenarios)