Refactor SequenceBinaryInput: zero-copy, docs, issues

- Rewrote SequenceBinaryInput for lazy TryGet iteration (no segment array allocation), zero-copy access to segment backing arrays, and efficient cross-boundary reads using a reusable ArrayPool scratch buffer.
- Added Release() to IBinaryInputBase; now always called after deserialization to return scratch buffer.
- BufferWriterChunkSize is now mutable; set to 4096 for SignalR protocol for better pipe alignment.
- Added and updated documentation: detailed input buffer lifecycle, cross-boundary handling, and new BINARY_ISSUES.md and SIGNALR_ISSUES.md for known limitations and planned optimizations.
- No breaking changes for callers (custom IBinaryInputBase implementations must add Release()); improves performance, memory usage, and diagnostics for multi-segment binary deserialization.
Loretta 2026-04-07 10:33:38 +02:00
parent 91194fcfa3
commit 9f909f6380
12 changed files with 271 additions and 89 deletions

@ -297,7 +297,11 @@ public static partial class AcBinaryDeserializer
var context = DeserializationContextPool<TInput>.Get(options);
context.InitInput(input);
try { return (T?)DeserializeCore(context, targetType); }
finally { DeserializationContextPool<TInput>.Return(context); }
finally
{
context.Input.Release();
DeserializationContextPool<TInput>.Return(context);
}
}
/// <summary>
@ -315,7 +319,11 @@ public static partial class AcBinaryDeserializer
var context = DeserializationContextPool<TInput>.Get(options);
context.InitInput(input);
try { return DeserializeCore(context, targetType); }
finally { DeserializationContextPool<TInput>.Return(context); }
finally
{
context.Input.Release();
DeserializationContextPool<TInput>.Return(context);
}
}
/// <summary>

@ -155,7 +155,7 @@ public sealed class AcBinarySerializerOptions : AcSerializerOptions
///
/// Default: 65536 (64 KB)
/// </summary>
public int BufferWriterChunkSize { get; init; } = 65536;
public int BufferWriterChunkSize { get; set; } = 65536;
/// <summary>
/// Optional property-level filter invoked before metadata registration and serialization.

@ -49,4 +49,10 @@ public struct ArrayBinaryInput : IBinaryInputBase
[MethodImpl(MethodImplOptions.NoInlining)]
public bool TryAdvanceSegment(ref byte[] buffer, ref int position, ref int bufferLength, int needed)
=> false;
/// <summary>
/// No-op — ArrayBinaryInput has no rented buffers to release.
/// </summary>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public void Release() { }
}

@ -31,4 +31,11 @@ public interface IBinaryInputBase
/// </summary>
[MethodImpl(MethodImplOptions.NoInlining)]
bool TryAdvanceSegment(ref byte[] buffer, ref int position, ref int bufferLength, int needed);
/// <summary>
/// Releases any rented buffers (e.g. ArrayPool scratch in SequenceBinaryInput).
/// Must be called after deserialization completes.
/// For ArrayBinaryInput: no-op.
/// </summary>
void Release();
}

@ -7,57 +7,42 @@ namespace AyCode.Core.Serializers.Binaries;
/// <summary>
/// Binary input that reads directly from a ReadOnlySequence (e.g. SignalR pipe, network stream).
/// Processes segments one-by-one without linearizing the entire payload.
/// Iterates segments lazily via TryGet — no upfront ArraySegment[] allocation.
///
/// For values that span segment boundaries (e.g. a 4-byte int split across 2 segments),
/// copies the overlapping bytes into a scratch buffer and reads from there.
/// The context's _buffer always points to the current segment's backing byte[] (zero-copy).
/// Cross-boundary values (straddling two+ segments) are copied into a small ArrayPool scratch buffer.
/// After the scratch read, _afterCrossBoundary restores the context to the segment's backing array.
///
/// Mirrors BufferWriterBinaryOutput pattern from the serializer side.
/// Typical overhead for a 225KB payload with 4096-byte segments:
/// ~224.5KB zero-copy reads, ~500 bytes scratch copies (at ~55 segment boundaries).
/// </summary>
public struct SequenceBinaryInput : IBinaryInputBase
{
// Pre-extracted segments from the ReadOnlySequence.
// Using ArraySegment avoids holding onto ReadOnlyMemory (which can't get byte[] without TryGetArray).
private readonly ArraySegment<byte>[] _segments;
private int _currentSegment;
private ReadOnlySequence<byte> _sequence;
private SequencePosition _nextPosition;
// Scratch buffer for cross-boundary reads — dynamically sized for large reads (strings, byte arrays)
// ArrayPool scratch for cross-boundary reads — lazy rent, reused across boundaries
private byte[]? _scratchBuffer;
// After a cross-boundary read, the next TryAdvanceSegment must load
// the remainder of _currentSegment (already adjusted) without incrementing.
private bool _afterCrossBoundary;
// After cross-boundary: saved state of the last touched segment for restore
private byte[]? _savedBuffer;
private int _savedPosition;
private int _savedBufferLength;
/// <summary>
/// Creates a SequenceBinaryInput from a multi-segment ReadOnlySequence.
/// Pre-extracts all segments as ArraySegment for fast iteration.
/// Creates a SequenceBinaryInput from a ReadOnlySequence.
/// Does NOT pre-extract segments — iterates lazily via TryGet.
/// </summary>
public SequenceBinaryInput(ReadOnlySequence<byte> sequence)
{
var segmentCount = 0;
foreach (var _ in sequence)
segmentCount++;
_segments = new ArraySegment<byte>[segmentCount];
var i = 0;
foreach (var memory in sequence)
{
if (MemoryMarshal.TryGetArray(memory, out var segment))
{
_segments[i++] = segment;
}
else
{
// Non-array-backed memory: copy to a temp array
var temp = new byte[memory.Length];
memory.Span.CopyTo(temp);
_segments[i++] = new ArraySegment<byte>(temp, 0, temp.Length);
}
}
_currentSegment = 0;
_sequence = sequence;
_nextPosition = sequence.Start;
_scratchBuffer = null;
_afterCrossBoundary = false;
_savedBuffer = null;
_savedPosition = 0;
_savedBufferLength = 0;
}
/// <summary>
@ -66,13 +51,10 @@ public struct SequenceBinaryInput : IBinaryInputBase
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public void Initialize(out byte[] buffer, out int position, out int bufferLength)
{
if (_segments.Length == 0)
if (!_sequence.TryGet(ref _nextPosition, out var memory))
throw new AcBinaryDeserializationException("Empty sequence — no segments to read.");
var seg = _segments[0];
buffer = seg.Array!;
position = seg.Offset;
bufferLength = seg.Offset + seg.Count;
ExtractArray(memory, out buffer, out position, out bufferLength);
}
/// <summary>
@ -82,78 +64,121 @@ public struct SequenceBinaryInput : IBinaryInputBase
[MethodImpl(MethodImplOptions.NoInlining)]
public bool TryAdvanceSegment(ref byte[] buffer, ref int position, ref int bufferLength, int needed)
{
// After cross-boundary scratch read: load the remainder of the current segment
// (already adjusted in TryReadCrossBoundary) without incrementing.
// After cross-boundary scratch read: restore to the last touched segment's backing array
if (_afterCrossBoundary)
{
_afterCrossBoundary = false;
var seg = _segments[_currentSegment];
buffer = seg.Array!;
position = seg.Offset;
bufferLength = seg.Offset + seg.Count;
return seg.Count > 0;
buffer = _savedBuffer!;
position = _savedPosition;
bufferLength = _savedBufferLength;
return position < bufferLength;
}
// Calculate remaining bytes in current segment
var remaining = bufferLength - position;
if (remaining > 0 && remaining < needed)
{
// Cross-boundary read: value spans two segments
// Cross-boundary: value spans segment boundary
return TryReadCrossBoundary(ref buffer, ref position, ref bufferLength, needed, remaining);
}
// Current segment fully consumed — advance to next
_currentSegment++;
if (_currentSegment >= _segments.Length)
return TryLoadNextSegment(ref buffer, ref position, ref bufferLength);
}
/// <summary>
/// Returns the ArrayPool scratch buffer if one was rented.
/// Must be called after deserialization completes.
/// </summary>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public void Release()
{
if (_scratchBuffer != null)
{
ArrayPool<byte>.Shared.Return(_scratchBuffer);
_scratchBuffer = null;
}
}
/// <summary>
/// Loads the next segment from the sequence via TryGet.
/// Extracts the backing byte[] for zero-copy access.
/// </summary>
private bool TryLoadNextSegment(ref byte[] buffer, ref int position, ref int bufferLength)
{
if (!_sequence.TryGet(ref _nextPosition, out var memory) || memory.Length == 0)
return false;
var seg2 = _segments[_currentSegment];
buffer = seg2.Array!;
position = seg2.Offset;
bufferLength = seg2.Offset + seg2.Count;
ExtractArray(memory, out buffer, out position, out bufferLength);
return true;
}
/// <summary>
/// Handles a read that spans two segments by copying the overlapping bytes
/// into a scratch buffer, then setting up the context to read from it.
/// After this read, the next EnsureAvailable will advance to the remainder of the new segment.
/// Handles a read that spans N segments by copying the overlapping bytes
/// into an ArrayPool scratch buffer. After this read, the next TryAdvanceSegment
/// restores the context to the last touched segment's backing array.
/// </summary>
private bool TryReadCrossBoundary(ref byte[] buffer, ref int position, ref int bufferLength, int needed, int remaining)
{
_currentSegment++;
if (_currentSegment >= _segments.Length)
return false;
// Rent scratch (or reuse if large enough)
if (_scratchBuffer == null || _scratchBuffer.Length < needed)
{
if (_scratchBuffer != null)
ArrayPool<byte>.Shared.Return(_scratchBuffer);
_scratchBuffer = ArrayPool<byte>.Shared.Rent(needed);
}
var nextSeg = _segments[_currentSegment];
var fromNext = Math.Min(needed - remaining, nextSeg.Count);
var scratchNeeded = remaining + fromNext;
// Dynamically size scratch buffer — handles large reads (strings, byte arrays)
if (_scratchBuffer == null || _scratchBuffer.Length < scratchNeeded)
_scratchBuffer = new byte[Math.Max(32, scratchNeeded)];
// Copy tail of current segment
// 1) Copy tail of current segment
Buffer.BlockCopy(buffer, position, _scratchBuffer, 0, remaining);
var filled = remaining;
// Copy head of next segment
Buffer.BlockCopy(nextSeg.Array!, nextSeg.Offset, _scratchBuffer, remaining, fromNext);
// 2) Copy from subsequent segments until we have enough
while (filled < needed)
{
if (!_sequence.TryGet(ref _nextPosition, out var memory) || memory.Length == 0)
return false;
// Set up context to read from scratch buffer
ExtractArray(memory, out var segArray, out var segOffset, out var segBufferLength);
var segCount = segBufferLength - segOffset;
var take = Math.Min(needed - filled, segCount);
Buffer.BlockCopy(segArray, segOffset, _scratchBuffer, filled, take);
filled += take;
// Save last touched segment for _afterCrossBoundary restore
_savedBuffer = segArray;
_savedPosition = segOffset + take;
_savedBufferLength = segBufferLength;
}
// Context reads from scratch buffer
buffer = _scratchBuffer;
position = 0;
bufferLength = scratchNeeded;
// Adjust the current segment to skip the bytes we already copied.
// The _afterCrossBoundary flag ensures the next TryAdvanceSegment
// loads this remainder without incrementing _currentSegment.
_segments[_currentSegment] = new ArraySegment<byte>(
nextSeg.Array!,
nextSeg.Offset + fromNext,
nextSeg.Count - fromNext);
bufferLength = filled;
_afterCrossBoundary = true;
return true;
}
/// <summary>
/// Extracts the backing byte[] from a ReadOnlyMemory segment.
/// Array-backed (99.9%): zero-copy reference to backing array.
/// Non-array-backed (native memory): copies to a managed byte[].
/// </summary>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static void ExtractArray(ReadOnlyMemory<byte> memory, out byte[] buffer, out int position, out int bufferLength)
{
if (MemoryMarshal.TryGetArray(memory, out var segment))
{
buffer = segment.Array!;
position = segment.Offset;
bufferLength = segment.Offset + segment.Count;
}
else
{
var temp = new byte[memory.Length];
memory.Span.CopyTo(temp);
buffer = temp;
position = 0;
bufferLength = temp.Length;
}
}
}

@ -0,0 +1,58 @@
# Binary Serializer — Known Issues & Limitations
## Deserialization
### DESER-1: Non-array-backed memory — per-segment copy
**Status:** By design
**Affects:** `SequenceBinaryInput`
**Path:** `ExtractArray()` fallback when `MemoryMarshal.TryGetArray` fails
When `ReadOnlySequence<byte>` segments are backed by native memory (not managed `byte[]`), each segment is copied into a `new byte[]`. This is unavoidable — the context requires `byte[]` for `Unsafe.ReadUnaligned`, `AsSpan`, and `Encoding.GetString`.
**Impact:** Negligible. Non-array-backed `ReadOnlyMemory` is extremely rare (custom `MemoryManager<T>` with native memory, memory-mapped files). All standard .NET pools (`ArrayPool`, `MemoryPool.Shared`, Kestrel pipe) are array-backed.
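The fallback can be exercised without native memory. The sketch below is illustrative only: `OpaqueMemoryManager` and `SegmentArrays.GetArray` are hypothetical names, not part of AyCode, and the library's `ExtractArray` may differ in detail. It relies on the fact that `MemoryMarshal.TryGetArray` returns `false` for a `MemoryManager<T>` that does not override `TryGetArray`.

```csharp
using System;
using System.Buffers;
using System.Runtime.InteropServices;

// Hypothetical MemoryManager that hides its backing array,
// standing in for native or memory-mapped memory.
public sealed class OpaqueMemoryManager : MemoryManager<byte>
{
    private readonly byte[] _data;

    public OpaqueMemoryManager(byte[] data) => _data = data;

    public override Span<byte> GetSpan() => _data;

    public override MemoryHandle Pin(int elementIndex = 0)
        => throw new NotSupportedException("Pinning is not needed for this sketch.");

    public override void Unpin() { }

    protected override void Dispose(bool disposing) { }

    // TryGetArray is intentionally not overridden: the base implementation returns false.
}

public static class SegmentArrays
{
    // Hypothetical helper: returns a byte[]-backed view of the segment.
    // copied == false means the result aliases the caller's array (zero-copy).
    public static ArraySegment<byte> GetArray(ReadOnlyMemory<byte> memory, out bool copied)
    {
        if (MemoryMarshal.TryGetArray(memory, out ArraySegment<byte> segment))
        {
            copied = false;
            return segment;
        }

        // Not array-backed: copy the bytes into a fresh managed array.
        copied = true;
        return new ArraySegment<byte>(memory.ToArray());
    }
}
```

Passing `new OpaqueMemoryManager(bytes).Memory` takes the copy path, while a plain `new ReadOnlyMemory<byte>(bytes, 1, 3)` returns the original array with offset 1 and count 3.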
### DESER-2: Cross-boundary scratch buffer is not pooled across calls
**Status:** Acceptable
**Affects:** `SequenceBinaryInput._scratchBuffer`
The scratch buffer is `ArrayPool.Rent`-ed on first cross-boundary read and reused within a single deserialization. It is `Return`-ed in `Release()` after deserialization completes. However, the next deserialization will rent again.
**Impact:** Minimal. `ArrayPool.Shared` reuses buffers efficiently. The scratch is typically small (4-16 bytes for fixed-width boundary reads). Large scratch (>4KB) only occurs when a string or byte[] straddles a segment boundary.
**Possible optimization:** Store the scratch buffer on the pooled `BinaryDeserializationContext` and reuse across deserializations. Low priority — `ArrayPool` overhead is negligible.
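The rent-once, grow-on-demand, return-on-release pattern described above can be sketched in isolation. `ScratchOwner` and `RentCount` are hypothetical names introduced for illustration; this is not the library's code, only a model of the lifecycle that a pooled context could host if the optimization were pursued.

```csharp
#nullable enable
using System;
using System.Buffers;

// Hypothetical holder for a reusable ArrayPool scratch buffer.
public sealed class ScratchOwner
{
    private byte[]? _scratch;

    // Number of times a buffer was actually rented (for diagnostics only).
    public int RentCount { get; private set; }

    // Returns a buffer of at least 'needed' bytes, reusing the current one when it is large enough.
    public byte[] Ensure(int needed)
    {
        if (_scratch == null || _scratch.Length < needed)
        {
            if (_scratch != null)
                ArrayPool<byte>.Shared.Return(_scratch);

            _scratch = ArrayPool<byte>.Shared.Rent(needed);
            RentCount++;
        }

        return _scratch;
    }

    // Hands the buffer back to the pool; safe to call when nothing was rented.
    public void Release()
    {
        if (_scratch != null)
        {
            ArrayPool<byte>.Shared.Return(_scratch);
            _scratch = null;
        }
    }
}
```

Because `ArrayPool<byte>.Shared.Rent` may return a larger array than requested, several small boundary reads usually share one rental. Whether keeping the buffer across deserializations is worth the extra state is the open question this issue raises.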
### DESER-3: ReadBytes always copies
**Status:** By design
**Affects:** `BinaryDeserializationContext.ReadBytes(int length)`
`ReadBytes` allocates a new `byte[]` and copies from the buffer. This is unavoidable because the caller owns the returned array, and the source buffer (pipe segment or serialized data) may be recycled.
### DESER-4: ReadStringUtf8 requires contiguous buffer
**Status:** By design
**Affects:** `BinaryDeserializationContext.ReadStringUtf8(int length)`
`Encoding.GetString` and `Ascii.IsValid` require contiguous memory. For multi-segment reads, `EnsureAvailable` copies cross-boundary bytes into the scratch buffer first. This is the same approach `SequenceReader<byte>` uses internally.
**Possible optimization:** Span-by-span UTF-8 decode for cross-boundary strings (like MessagePack). Low priority — most strings are shorter than a segment (4KB).
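A span-by-span decode can be sketched with the stateful `System.Text.Decoder`, which carries an incomplete multi-byte sequence over from one call to the next. `Utf8SegmentDecoder` is a hypothetical helper, not part of AyCode, and the `StringBuilder` it uses adds allocations that a production version would likely avoid.

```csharp
using System;
using System.Buffers;
using System.Text;

// Hypothetical helper: decodes UTF-8 that arrives one segment at a time.
public sealed class Utf8SegmentDecoder
{
    private readonly Decoder _decoder = Encoding.UTF8.GetDecoder();
    private readonly StringBuilder _builder = new StringBuilder();

    // Decodes one segment; a character split across segments is completed by a later call.
    public void Append(ReadOnlySpan<byte> segment)
    {
        char[] chars = ArrayPool<char>.Shared.Rent(Encoding.UTF8.GetMaxCharCount(segment.Length));
        try
        {
            int written = _decoder.GetChars(segment, chars, flush: false);
            _builder.Append(chars, 0, written);
        }
        finally
        {
            ArrayPool<char>.Shared.Return(chars);
        }
    }

    // Flushes any dangling bytes (emitted as a replacement character) and returns the string.
    public string Finish()
    {
        char[] tail = new char[4];
        int written = _decoder.GetChars(ReadOnlySpan<byte>.Empty, tail, flush: true);
        _builder.Append(tail, 0, written);
        return _builder.ToString();
    }
}
```

A caller would invoke `Append` once per segment span obtained from the sequence and `Finish` after the last one, so no cross-boundary copy of the raw bytes is required.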
## Serialization
### SER-1: BufferWriterBinaryOutput fallback path allocates per-chunk
**Status:** Acceptable
**Affects:** `BufferWriterBinaryOutput.AcquireChunk` fallback
When `MemoryMarshal.TryGetArray` fails on `IBufferWriter.GetMemory()` (native memory-backed writer), a `byte[]` is rented from `ArrayPool` per chunk and copied to the writer on `Grow`/`Flush`. Same as DESER-1 — non-array-backed writers are extremely rare.
## Source Generator (SGen)
### SGEN-1: CS8625 warnings for non-nullable reference types
**Status:** Known
**Affects:** Generated reader code
The source generator emits `null` assignments for non-nullable reference type properties during deserialization (before the value is read from the stream). This produces CS8625 warnings. Functionally harmless — the property is always assigned before use.

@ -86,3 +86,18 @@ Most important architectural decision in the output layer.
**Current:** writes on `BinarySerializationContext<TOutput>` (sealed class, hot path). Output struct handles only `Initialize`/`Grow`/`Flush` (cold path).
**Rule:** Do NOT move write methods to output. Measure with full benchmark suite before proposing changes.
## IBinaryInputBase (Read Side Mirror)
Deserialization mirrors the output pattern. `IBinaryInputBase` provides buffer lifecycle; all read methods live on `BinaryDeserializationContext<TInput>`.
```csharp
void Initialize(out byte[] buffer, out int position, out int bufferLength);
bool TryAdvanceSegment(ref byte[] buffer, ref int position, ref int bufferLength, int needed);
void Release();
```
- **ArrayBinaryInput:** single `byte[]`, `TryAdvanceSegment => false` (JIT-eliminated), `Release` no-op.
- **SequenceBinaryInput:** lazy `TryGet` iteration over `ReadOnlySequence<byte>`. Context `_buffer` points to segment backing `byte[]` (zero-copy). Cross-boundary: `ArrayPool` scratch, N-segment loop. `Release` returns scratch to pool.
> Known issues and limitations: `BINARY_ISSUES.md`
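The two read paths can be modelled in a few lines: lazy `TryGet` iteration, a zero-copy read when the value fits in the current segment, and a scratch copy when it straddles a boundary. This is a sketch under stated assumptions, not the library's implementation: `CrossBoundaryDemo` and its nested `Chunk` are hypothetical, and a stack-allocated scratch stands in for the `ArrayPool` buffer to keep the example short.

```csharp
using System;
using System.Buffers;
using System.Buffers.Binary;

public static class CrossBoundaryDemo
{
    // Reads a little-endian Int32 from the start of the sequence.
    public static int ReadInt32(ReadOnlySequence<byte> sequence)
    {
        Span<byte> scratch = stackalloc byte[sizeof(int)];
        int filled = 0;
        SequencePosition next = sequence.Start;

        // Segments are pulled lazily; nothing is materialized up front.
        while (filled < sizeof(int) && sequence.TryGet(ref next, out ReadOnlyMemory<byte> memory))
        {
            ReadOnlySpan<byte> span = memory.Span;

            // Fast path: the whole value lives in the first segment, read it in place.
            if (filled == 0 && span.Length >= sizeof(int))
                return BinaryPrimitives.ReadInt32LittleEndian(span);

            // Cross-boundary path: copy only the bytes that belong to this value.
            int take = Math.Min(sizeof(int) - filled, span.Length);
            span.Slice(0, take).CopyTo(scratch.Slice(filled));
            filled += take;
        }

        if (filled < sizeof(int))
            throw new InvalidOperationException("Sequence ended before 4 bytes were available.");

        return BinaryPrimitives.ReadInt32LittleEndian(scratch);
    }

    // Hypothetical helper: links arrays into a multi-segment sequence for experiments.
    public static ReadOnlySequence<byte> Build(params byte[][] parts)
    {
        if (parts.Length == 0)
            return ReadOnlySequence<byte>.Empty;

        var first = new Chunk(parts[0], 0);
        Chunk last = first;
        for (int i = 1; i < parts.Length; i++)
            last = last.Append(parts[i]);

        return new ReadOnlySequence<byte>(first, 0, last, last.Memory.Length);
    }

    private sealed class Chunk : ReadOnlySequenceSegment<byte>
    {
        public Chunk(byte[] data, long runningIndex)
        {
            Memory = data;
            RunningIndex = runningIndex;
        }

        public Chunk Append(byte[] data)
        {
            var next = new Chunk(data, RunningIndex + Memory.Length);
            Next = next;
            return next;
        }
    }
}
```

For example, `CrossBoundaryDemo.Build(new byte[] { 0x78, 0x56 }, new byte[] { 0x34 }, new byte[] { 0x12, 0xFF })` spreads one value over three segments, which is the N-segment case the loop handles.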

@ -60,6 +60,7 @@ public class AcBinaryHubProtocol : IHubProtocol
public AcBinaryHubProtocol(AcBinarySerializerOptions options)
{
_options = options;
_options.BufferWriterChunkSize = 4096;
}
/// <summary>

@ -4,6 +4,7 @@ Custom binary SignalR protocol, client infrastructure, message tagging, and seri
> **Architecture:** For full dispatch flow, tag system, and tech debt documentation see `AyCode.Services/docs/SIGNALR.md`.
> **Binary protocol:** For wire format, zero-copy pipeline, and three-path read logic see `AyCode.Services/docs/SIGNALR_BINARY_PROTOCOL.md`.
> **Known issues:** `AyCode.Services/docs/SIGNALR_ISSUES.md`
## Key Files

@ -148,7 +148,11 @@ Zero-copy when possible: if single-segment and backing array matches exactly →
`struct SequenceBinaryInput : IBinaryInputBase` — reads from `ReadOnlySequence<byte>` without linearizing. Lazy iteration via `ReadOnlySequence.TryGet` — zero constructor allocation, no pre-extracted segment array.
Cross-boundary reads (e.g. 4-byte int split across 2 segments) use a small scratch buffer (32 bytes). Remainder tracking via `_remainderArray/Offset/Count` — no segment array mutation.
The context's `_buffer` always points directly to the current segment's backing `byte[]` (zero-copy). Cross-boundary reads (a value straddling a segment boundary) copy only the affected bytes into a small `ArrayPool`-rented scratch buffer. After the scratch read, the `_afterCrossBoundary` flag restores the context to the last touched segment's backing array.
Typical overhead for 225KB payload with 4096-byte segments: ~224.5KB zero-copy, ~500 bytes scratch copy at ~55 boundaries. The scratch buffer is rented once (lazy, on first boundary) and reused across all boundaries. `Release()` returns it to `ArrayPool` after deserialization.
> Known issues: `AyCode.Core/docs/BINARY_ISSUES.md`
## Config

@ -0,0 +1,57 @@
# SignalR — Known Issues & Limitations
## Protocol
### PROTO-1: Server-side IsRawBytesData pre-serialize
**Status:** Planned removal
**Affects:** `AcWebSignalRHubBase.SendMessageToClient`
The server forwards the client's `IsRawBytesData` flag in the response `SignalParams`. This causes the protocol to return raw `byte[]` instead of deserializing. The original design pre-serialized on the server side, but with the zero-copy typed deserialization path (`SignalDataType`), this is redundant.
**Plan:** Remove `IsRawBytesData` forwarding from server response path. The client should use `SignalDataType` for typed deserialization and explicit `byte[]` type for raw data.
### PROTO-2: Parameter serialization is per-parameter
**Status:** Known performance concern
**Affects:** `SignalParams.SetParameterValues` / `GetParameterValues`
Each parameter is individually serialized via `ToBinary()` / `BinaryTo(Type)` — N context pool acquire/release cycles. For many small primitives (int, bool, string) the per-call overhead may exceed a single bulk serialization.
**Possible optimization:** Batch fast-path — single serialization context for all parameters. Benchmark first.
### PROTO-3: Parameter serialization is AcBinary only
**Status:** Limitation
**Affects:** `SignalParams.SetParameterValues` / `GetParameterValues`
Uses `ToBinary()` / `BinaryTo()` exclusively. JSON parameter support would require dispatching on `DataSerializerType` + `AcJsonSerializer` reference. Low priority — binary is the primary transport.
## Transport
### TRANS-1: BufferWriterChunkSize defaults to 64KB for SignalR
**Status:** Optimization opportunity
**Affects:** `AcBinaryHubProtocol` default constructor, write path
The default `BufferWriterChunkSize` is 65536 (from `AcBinarySerializerOptions.Default`). For SignalR/Kestrel, 4096 aligns better with the transport's internal segment size, reducing latency-to-first-byte.
**Plan:** Set `BufferWriterChunkSize = 4096` in `AcBinaryHubProtocol` default constructor. The options property already exists (`AcBinarySerializerOptions.BufferWriterChunkSize`). Non-SignalR paths keep 64KB default.
### TRANS-2: WebSocket buffer sizes are hardcoded
**Status:** Acceptable
**Affects:** `AcSignalRClientBase` connection setup
Transport max message size (30MB) and application buffer (30MB) are hardcoded. Sufficient for current payloads but not configurable per-deployment.
## DataSource
### DS-1: GetAll returns raw byte[] for populate/merge
**Status:** By design
**Affects:** `AcSignalRDataSource.LoadDataSourceAsync`
The `GetAll` path uses `IsRawBytesData = true` to receive raw `byte[]` from the protocol, then deserializes into the existing list via `PopulateMerge`. This avoids allocating a temporary `List<T>` for merge. The extra copy (pipe → byte[]) is the trade-off.
**Possible optimization:** Direct typed deserialization with merge support in the deserializer (PopulateMerge from `ReadOnlySequence<byte>`). Requires deserializer API changes.

@ -32,7 +32,7 @@ For full specification see `AyCode.Core/docs/BINARY_FORMAT.md`.
| **FixStr** | Compact string marker (103–134). Encodes type + length in one byte for ASCII strings ≤31 bytes. |
| **TinyInt** | Compact integer marker (192–255). Encodes small integers (-16 to 47) in a single byte. |
| **VarInt / VarUInt** | Variable-length integer encoding. LEB128 for unsigned, ZigZag + LEB128 for signed. |
| **SequenceBinaryInput** | `struct : IBinaryInputBase` for reading from `ReadOnlySequence<byte>` (multi-segment pipe data). Lazy iteration via `TryGet` — zero constructor allocation. Cross-boundary reads use scratch buffer. Used by `AcBinaryDeserializer.Deserialize(ReadOnlySequence)` for multi-segment data. |
| **SequenceBinaryInput** | `struct : IBinaryInputBase` for reading from `ReadOnlySequence<byte>` (multi-segment pipe data). Lazy iteration via `TryGet` — zero constructor allocation. Context `_buffer` points directly to segment backing `byte[]` (zero-copy). Cross-boundary values use `ArrayPool`-rented scratch buffer (rent once, reuse, `Release()` returns). N-segment loop handles values spanning any number of segments. |
| **ArrayBinaryInput** | `struct : IBinaryInputBase` for reading from contiguous `byte[]`. Zero-copy when pipe is single-segment. Default fast-path for deserialization. |
| **HeaderFlags** | Byte at stream position 1 encoding serialization options: metadata, reference handling mode, cache count presence. Base `0x90`. |
| **Two-Phase Serialization** | Scan pass detects multi-referenced objects, serialize pass writes output using reference table. Required for `ReferenceHandling.All`. |