Binary Serializer — Known Issues & Limitations

Deserialization

BIN-I-1: Non-array-backed memory — per-segment copy

Status: Open Affects: SequenceBinaryInput Path: ExtractArray() fallback when MemoryMarshal.TryGetArray fails

When ReadOnlySequence<byte> segments are backed by native memory (not managed byte[]), each segment is copied into a new byte[]. This is unavoidable — the context requires byte[] for Unsafe.ReadUnaligned, AsSpan, and Encoding.GetString.

Impact: Negligible. Non-array-backed ReadOnlyMemory is extremely rare (custom MemoryManager<T> with native memory, memory-mapped files). All standard .NET pools (ArrayPool, MemoryPool.Shared, Kestrel pipe) are array-backed.
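The fast path and the copying fallback can be sketched as follows. This is a minimal illustration, not the actual `SequenceBinaryInput` code: the class name `SegmentArrays` and the signature of `ExtractArray` are assumptions made for the example.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper; the real ExtractArray() may have a different shape.
public static class SegmentArrays
{
    // Returns a byte[] holding the segment's bytes and the offset of the first byte in it.
    public static byte[] ExtractArray(ReadOnlyMemory<byte> segment, out int offset)
    {
        // Fast path: array-backed memory exposes its underlying array without copying.
        if (MemoryMarshal.TryGetArray(segment, out ArraySegment<byte> backing) && backing.Array is not null)
        {
            offset = backing.Offset;
            return backing.Array;
        }

        // Fallback: native or otherwise non-array-backed memory is copied once per segment.
        offset = 0;
        return segment.ToArray();
    }
}
```

On the fast path the returned array is the caller's own buffer, so the offset matters; on the fallback path the copy always starts at offset 0.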

BIN-I-2: Cross-boundary scratch buffer is not pooled across calls

Status: Open Affects: SequenceBinaryInput._scratchBuffer

The scratch buffer is ArrayPool.Rent-ed on first cross-boundary read and reused within a single deserialization. It is Return-ed in Release() after deserialization completes. However, the next deserialization will rent again.

Impact: Minimal. ArrayPool.Shared reuses buffers efficiently. The scratch is typically small (4-16 bytes for fixed-width boundary reads). Large scratch (>4KB) only occurs when a string or byte[] straddles a segment boundary.

Possible optimization: Store the scratch buffer on the pooled BinaryDeserializationContext and reuse across deserializations. Low priority — ArrayPool overhead is negligible.
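A minimal sketch of that optimization, assuming the context is itself pooled and long-lived. `ScratchOwner` is a hypothetical stand-in for the relevant part of `BinaryDeserializationContext`, not its real layout.

```csharp
#nullable enable
using System;
using System.Buffers;

// Hypothetical stand-in for a pooled context that keeps its scratch buffer between deserializations.
public sealed class ScratchOwner : IDisposable
{
    private byte[]? _scratch;

    // Returns a buffer of at least minLength bytes, renting only when the kept one is too small.
    public byte[] GetScratch(int minLength)
    {
        if (_scratch is null || _scratch.Length < minLength)
        {
            if (_scratch is not null) ArrayPool<byte>.Shared.Return(_scratch);
            _scratch = ArrayPool<byte>.Shared.Rent(minLength);
        }
        return _scratch;
    }

    // Called when the context itself is retired, not after every deserialization.
    public void Dispose()
    {
        if (_scratch is not null)
        {
            ArrayPool<byte>.Shared.Return(_scratch);
            _scratch = null;
        }
    }
}
```

The trade-off is that a large scratch buffer (from one oversized cross-boundary string) stays attached to the context until it is disposed, instead of going back to the shared pool right away.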

BIN-I-3: ReadBytes always copies

Status: Open Affects: BinaryDeserializationContext.ReadBytes(int length)

ReadBytes allocates a new byte[] and copies from the buffer. This is unavoidable because the caller owns the returned array, and the source buffer (pipe segment or serialized data) may be recycled.

BIN-I-4: ReadStringUtf8 requires contiguous buffer

Status: Open Affects: BinaryDeserializationContext.ReadStringUtf8(int length)

Encoding.GetString and Ascii.IsValid require contiguous memory. For multi-segment reads, EnsureAvailable copies cross-boundary bytes into the scratch buffer first. This is the same approach SequenceReader<byte> uses internally.

Possible optimization: Span-by-span UTF-8 decode for cross-boundary strings (like MessagePack). Low priority — most strings are shorter than a segment (4KB).
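A span-by-span decode can be sketched with a stateful `System.Text.Decoder`, which carries a partial multi-byte sequence from one segment to the next. The names `SegmentedUtf8` and `Chunk` are hypothetical and exist only for this example; on .NET 5 and later, `EncodingExtensions.GetString(Encoding, in ReadOnlySequence<byte>)` should cover the same case out of the box.

```csharp
using System;
using System.Buffers;
using System.Text;

public static class SegmentedUtf8
{
    // Decodes UTF-8 that may span several segments without first copying the bytes into one buffer.
    public static string Decode(in ReadOnlySequence<byte> utf8)
    {
        if (utf8.IsSingleSegment) return Encoding.UTF8.GetString(utf8.FirstSpan);

        Decoder decoder = Encoding.UTF8.GetDecoder();
        int maxChars = Encoding.UTF8.GetMaxCharCount(checked((int)utf8.Length));
        char[] chars = ArrayPool<char>.Shared.Rent(maxChars);
        try
        {
            int written = 0;
            foreach (ReadOnlyMemory<byte> segment in utf8)
            {
                // flush: false keeps an incomplete multi-byte sequence buffered for the next segment.
                written += decoder.GetChars(segment.Span, chars.AsSpan(written), flush: false);
            }

            // Final flush resolves any sequence still pending at the end of the input.
            written += decoder.GetChars(ReadOnlySpan<byte>.Empty, chars.AsSpan(written), flush: true);
            return new string(chars, 0, written);
        }
        finally
        {
            ArrayPool<char>.Shared.Return(chars);
        }
    }
}

// Minimal linked segment, only needed to build a multi-segment sequence for experiments.
public sealed class Chunk : ReadOnlySequenceSegment<byte>
{
    public Chunk(byte[] data) => Memory = data;

    public Chunk Append(byte[] data)
    {
        var next = new Chunk(data) { RunningIndex = RunningIndex + Memory.Length };
        Next = next;
        return next;
    }
}
```

This variant still rents a `char[]` sized for the whole string, so it removes the byte copy into the scratch buffer but not the intermediate character buffer.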

Serialization

BIN-I-5: BufferWriterBinaryOutput fallback path allocates per-chunk

Status: Open Affects: BufferWriterBinaryOutput.AcquireChunk fallback

When MemoryMarshal.TryGetArray fails on IBufferWriter.GetMemory() (native memory-backed writer), a byte[] is rented from ArrayPool per chunk and copied to the writer on Grow/Flush. Same as BIN-I-1 — non-array-backed writers are extremely rare.

BIN-I-6: AsyncPipeWriterOutput uses sync GetResult() for backpressure

Status: Open Affects: AsyncPipeWriterOutput.Grow() → _lastFlush.GetAwaiter().GetResult()

When the previous PipeWriter.FlushAsync() hasn't completed by the next Grow() call, the serializer blocks the thread until the flush completes. This is necessary because IHubProtocol.WriteMessage is void (synchronous by design).

Impact: Minimal under normal conditions. PipeWriter.FlushAsync() writes to an in-memory Kestrel pipe (not directly to the network) and typically completes synchronously. Only blocks when the pipe's internal buffer hits its pause threshold (~1MB), which requires an extremely slow client + large payload. The Bytes mode (default) has the same blocking characteristic — it blocks the thread for the entire serialization + single flush.

Possible optimization: AsyncSegment mode (future) with a custom async WriteMessageAsync protocol interface, enabling await on flush instead of GetResult().
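One detail worth checking while the blocking wait stays in place: `PipeWriter.FlushAsync()` returns a `ValueTask<FlushResult>`, and calling `GetAwaiter().GetResult()` on a `ValueTask` that has not completed yet is outside the documented usage of `ValueTask`. The sketch below assumes `_lastFlush` is stored as a `ValueTask<FlushResult>` (the field's type is not stated here); if it is already a `Task`, this does not apply. `SyncWait.Block` is a hypothetical helper name.

```csharp
using System;
using System.Threading.Tasks;

public static class SyncWait
{
    // Blocks the calling thread until the pending operation finishes and returns its result.
    public static T Block<T>(ValueTask<T> pending)
    {
        // Already finished: reading Result once is allowed and does not allocate.
        if (pending.IsCompleted) return pending.Result;

        // Still running: convert to a Task first, because blocking directly on an
        // incomplete ValueTask is not a supported pattern.
        return pending.AsTask().GetAwaiter().GetResult();
    }
}
```

A call site would then look like `SyncWait.Block(_lastFlush)` in place of `_lastFlush.GetAwaiter().GetResult()`. The thread still blocks under backpressure; only the way the wait is performed changes.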

BIN-I-7: AsyncPipeWriterOutput fallback path — same as BIN-I-5

Status: Open Affects: AsyncPipeWriterOutput.AcquireChunk fallback

Same TryGetArray fallback as BufferWriterBinaryOutput (BIN-I-5). Kestrel PipeWriter.GetMemory() always returns array-backed memory — fallback is for non-standard PipeWriter implementations only.

Deserialization (PipeReader)

BIN-I-8: PipeReaderBinaryInput uses sync ReadAsync().GetResult()

Status: Open Affects: PipeReaderBinaryInput.Initialize() and TryAdvanceSegment()

Same constraint as BIN-I-6: the IBinaryInputBase interface is synchronous, so ReadAsync().GetAwaiter().GetResult() blocks while waiting for more data from the pipe. Currently not used in production (SignalR delivers complete messages via TryParseMessage). Reserved for future direct-pipe deserialization scenarios.

Source Generator (SGen)

BIN-I-9: CS8625 warnings for non-nullable reference types

Status: Open Affects: Generated reader code

The source generator emits null assignments for non-nullable reference type properties during deserialization (before the value is read from the stream). This produces CS8625 warnings. Functionally harmless — the property is always assigned before use.
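For illustration, the pattern and one possible way to silence it look roughly like this. `Customer` and `GeneratedReaderSketch` are hypothetical; the actual generated code is not shown in this document and may differ.

```csharp
#nullable enable
using System;

public sealed class Customer
{
    public string Name { get; set; } = "";
}

// Hypothetical shape of a generated reader, reduced to a single property.
public static class GeneratedReaderSketch
{
    public static Customer Read(Func<string> readString)
    {
        var result = new Customer();

        // result.Name = null;  // would raise CS8625: null literal assigned to a non-nullable reference type
        result.Name = null!;    // null-forgiving operator: same runtime behavior, no warning

        // The real value arrives once it has been read from the stream.
        result.Name = readString();
        return result;
    }
}
```

Other options with the same effect would be emitting `#nullable disable` or `#pragma warning disable CS8625` at the top of each generated file.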

BIN-I-10: First-run cold-start overhead

Status: Open Affects: First Serialize<T>/Deserialize<T> per [AcBinarySerializable] type, per process

Cold-start cost chain on first use of an SGen type (before BIN-T-3 lands):

  1. BinarySerializeTypeMetadata ctor — reflection property enumeration + GetCustomAttribute scans
  2. Expression.Compile per property accessor (dynamic getter + typed getters) — dominant cost
  3. TypeMetadataWrapper ctor — GeneratedWriterRegistry + GeneratedReaderRegistry lookups, tracking state init
  4. JIT of WriteObject / WriteObjectProperties / scan pass
  5. JIT of generated WriteProperties / ScanObject / ScanForDuplicates (size scales with property count)
  6. Cascade: each referenced child type repeats steps 1–5

Subsequent calls hit cached metadata/wrappers → only Tier 0→1 JIT transition remains (background, async).

Dominant cost today: #1–#2 (reflection + Expression.Compile). After BIN-T-3, the dominant residual cost shifts to #4–#5 (JIT), addressed by BIN-T-4.

Impact: Measurable first-call latency — larger for types with many properties or deep graphs. For SignalR workloads the first message per entity type pays this tax.
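To make step 2 concrete, here is a minimal sketch of a compiled property getter that is built once and then served from a cache. `AccessorCache` is a hypothetical name; the real metadata classes build more than one accessor per property.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Linq.Expressions;
using System.Reflection;

public static class AccessorCache
{
    private static readonly ConcurrentDictionary<PropertyInfo, Func<object, object?>> Getters = new();

    // The first call per property pays for Expression.Compile; later calls only hit the dictionary.
    public static Func<object, object?> GetGetter(PropertyInfo property) =>
        Getters.GetOrAdd(property, static p =>
        {
            ParameterExpression instance = Expression.Parameter(typeof(object), "instance");
            Expression typed = Expression.Convert(instance, p.DeclaringType!);
            Expression read = Expression.Property(typed, p);
            Expression boxed = Expression.Convert(read, typeof(object));
            return Expression.Lambda<Func<object, object?>>(boxed, instance).Compile();
        });
}
```

Each `Compile()` call generates and JITs a new dynamic method, which is why a type with many properties, or a deep graph of child types, multiplies the first-call cost.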

BIN-I-11: Consumer entity with new Id shadowing — excluded from SGen

Status: Open Affects: Any consumer entity whose base class hides BaseEntity.Id with a read-only new int Id { get; } pattern (e.g. DiscountProductMapping in Mango.Nop.Core)

When the base class shadows Id with a setter-less new int Id { get; }, SGen cannot emit an assignment to Id without triggering CS0200 (the property is read-only). The runtime falls back to compiled-expression serialization for these types. Low priority: this affects a small number of consumer entities.

Related TODO: BINARY_TODO.md#bin-t-2
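The compiler behavior behind this issue can be reproduced in a few lines. The class names below are hypothetical and only mirror the shape of the pattern; they are not the consumer's actual types.

```csharp
public class BaseEntity
{
    public int Id { get; set; }
}

// Hides BaseEntity.Id with a getter-only property, as in the pattern described above.
public class ReadOnlyIdBase : BaseEntity
{
    public new int Id { get; }
}

public class SampleMapping : ReadOnlyIdBase
{
}

public static class ShadowedIdDemo
{
    public static void Assign(SampleMapping target, int id)
    {
        // target.Id = id;            // CS0200: binds to ReadOnlyIdBase.Id, which has no setter
        ((BaseEntity)target).Id = id; // compiles: binds to the hidden BaseEntity.Id instead
    }
}
```

Whether generated code should write through the base property is a design question for BIN-T-2; the cast only shows that the two `Id` members are distinct and that the hidden one is still assignable.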

Buffer Writer (BWO)

BIN-I-12: Struct copy semantics

Status: Open Affects: BufferWriterBinaryOutput value-type assignment

Assigning a BufferWriterBinaryOutput value creates an independent copy. State changes (e.g. _committedBytes via Grow/Flush) are not reflected in the original. Copy back after use if needed.
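The effect is ordinary C# value-type behavior and can be shown with a stand-in struct. `CountingWriter` is hypothetical and tracks only a byte counter; it is not the real `BufferWriterBinaryOutput`.

```csharp
// Hypothetical stand-in for a struct-based writer with internal tracking state.
public struct CountingWriter
{
    private int _committedBytes;

    public int CommittedBytes => _committedBytes;

    public void Commit(int count) => _committedBytes += count;
}

public static class StructCopyDemo
{
    // Receives a copy: the caller's instance never sees the update.
    public static void WriteByValue(CountingWriter writer) => writer.Commit(16);

    // Receives a reference: the caller's instance is updated in place.
    public static void WriteByRef(ref CountingWriter writer) => writer.Commit(16);
}
```

Passing the writer by `ref` (or returning the modified copy and reassigning it) keeps the caller's state in sync without a manual copy-back step.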

BIN-I-13: Initialize resets tracking

Status: Open Affects: BufferWriterBinaryOutput.Initialize (context mode)

Initialize sets _committedBytes = 0. Standalone bytes written before are lost if the BWO is then passed to a context. Call FlushAndReset() first, or track standalone bytes separately.

BIN-I-14: Constructor acquires chunk

Status: Open Affects: BufferWriterBinaryOutput ctor

AcquireChunk runs in the ctor so that a standalone instance is ready to write immediately. This is redundant if only context mode is used (the context's Initialize acquires its own chunk). Not a leak: consecutive GetMemory calls without an Advance in between return overlapping memory.

BIN-I-15: No mode mixing

Status: Open Affects: BufferWriterBinaryOutput — context vs standalone mode

A single instance must not use context + standalone modes simultaneously — buffer states desynchronize. One mode per lifecycle phase; FlushAndReset() as boundary between modes.

Cross-cutting (canonical home: ../XCUT/)

XCUT-I-1: JSON-in-Binary request parameters — cross-ref

Canonical entry: ../XCUT/XCUT_ISSUES.md#xcut-i-1. Summary: client→server request parameters currently use JSON inside a Binary envelope (SignalPostJsonDataMessage<T>); response path is already pure Binary. Planned migration is tracked in BINARY_TODO.md#bin-t-1 but requires coordinated client+server+consumer changes. Do NOT attempt as a side-effect.