# Binaries Implementation Details

This document covers the low-level technical decisions, memory management strategies, and internal structure of the `AcBinary` serializer. These details are intended for framework developers modifying or extending the serialization pipeline.

For format specifications, see `BINARY_FORMAT.md`. For options and presets, see `BINARY_OPTIONS.md`. For features, see `BINARY_FEATURES.md`.
## Zero-Allocation Buffer Management

The core design philosophy of `AcBinarySerializer` is **Zero Virtual Dispatch** and **Zero Direct Allocation** on the hot path.
### Context-Owned Buffer State
Instead of passing an `IBinaryWriter` interface down the object graph (which forces a virtual method call for every single byte or integer written), the `BinarySerializationContext<TOutput>` strictly owns the buffer state:

```csharp
internal byte[] _buffer = null!;
internal int _position;
internal int _bufferEnd;
```

All write methods (`WriteByte`, `WriteVarUInt`, `WriteStringUtf8`, etc.) are declared directly on the sealed context class and aggressively inlined.
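As a rough illustration of why this layout avoids dispatch, the sketch below puts a `WriteByte` directly on a sealed class that owns the three fields. `SketchContext` and its `Grow` helper are hypothetical stand-ins, not the real context, and growth uses `Array.Resize` purely to keep the example short.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical, simplified stand-in for the real context class.
internal sealed class SketchContext
{
    internal byte[] _buffer = new byte[16];
    internal int _position;
    internal int _bufferEnd = 16;

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    internal void WriteByte(byte value)
    {
        // Hot path: one comparison and one store, no interface call.
        if (_position >= _bufferEnd)
        {
            Grow(1);
        }
        _buffer[_position++] = value;
    }

    // Cold path, kept out of line so the hot path stays small.
    // Assumption: Array.Resize is used here only for illustration.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private void Grow(int needed)
    {
        int newSize = Math.Max(_buffer.Length * 2, _position + needed);
        Array.Resize(ref _buffer, newSize);
        _bufferEnd = _buffer.Length;
    }
}
```

Because the class is sealed and the method is non-virtual, the JIT is free to inline `WriteByte` at each call site.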
### No Temporary Buffers for Strings
When writing strings, the serializer **never** allocates an intermediate `byte[]` to perform UTF-8 encoding. It encodes directly into the context's pinned `_buffer` through a `Span<byte>`, using `Ascii.FromUtf16` / `Utf8NoBom.GetBytes`.

**Speculative ASCII Fast Path:**
1. Assume the string is purely ASCII (byte length = char length).
2. Ensure capacity for `value.Length`.
3. Try `Ascii.FromUtf16` directly into the final buffer.
4. If `Ascii.FromUtf16` hits a non-ASCII character, it aborts safely. The serializer then rewinds `_position`, calculates the true UTF-8 byte length, and uses `Utf8.GetBytes`.
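The steps above can be sketched roughly as follows. This is a minimal illustration, not the serializer's code: `StringWriteSketch` is a hypothetical helper, the caller is assumed to have already ensured enough capacity, and `Encoding.UTF8` stands in for the `Utf8NoBom` instance mentioned above. `Ascii.FromUtf16` requires .NET 8 or later.

```csharp
using System;
using System.Buffers;
using System.Text;

internal static class StringWriteSketch
{
    // Encodes `value` into `buffer` at `position` and returns the new position.
    // Assumption: the caller has ensured room for the worst-case UTF-8 length.
    internal static int WriteStringUtf8(string value, byte[] buffer, int position)
    {
        int start = position;

        // Speculative path: treat the string as ASCII (1 char == 1 byte).
        OperationStatus status = Ascii.FromUtf16(
            value.AsSpan(), buffer.AsSpan(position, value.Length), out int written);
        if (status == OperationStatus.Done)
        {
            return position + written;
        }

        // A non-ASCII char was found: rewind and encode as real UTF-8.
        position = start;
        int byteCount = Encoding.UTF8.GetByteCount(value);
        Encoding.UTF8.GetBytes(value.AsSpan(), buffer.AsSpan(position, byteCount));
        return position + byteCount;
    }
}
```

Any bytes the speculative attempt wrote before aborting are simply overwritten by the fallback, so no temporary array is needed in either case.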
### Abstracting the Output (The `TOutput` Strategy)
To support both `byte[]` returns and streaming models (via `IBufferWriter<byte>`), the context is generic over `TOutput`, constrained as `where TOutput : struct, IBinaryOutputBase`.

The `TOutput` struct is **only** invoked on the cold path (when `_position >= _bufferEnd`).
- `ArrayBinaryOutput`: Rents a newly doubled array from `ArrayPool`, copies existing data across, and returns the old array to the pool. When serialization finishes, a final buffer slice is returned (often avoiding `ToArray()` allocations altogether if wrapped in a `BinarySerializationResult`).
- `BufferWriterBinaryOutput`: Attempts to acquire a new chunk directly from the underlying `IBufferWriter` using `MemoryMarshal.TryGetArray`. If the chunk is array-backed, it writes directly into the `IBufferWriter`'s backing array without extra copying. If it is not (e.g. native memory), it falls back to renting a temporary buffer from `ArrayPool`, writing to it, and later copying the chunk to the `IBufferWriter` on `Flush()` or `Grow()`.

Because `TOutput` is constrained to a struct, the JIT generates specialized code for each concrete `TOutput` and can fully devirtualize the `Grow()` calls.
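A minimal sketch of the struct-constrained pattern is shown below. The names `IGrowableOutput`, `PooledArrayOutput`, and `GenericContext<TOutput>` are hypothetical, as is the `Grow` signature; the real `IBinaryOutputBase` members are not shown in this document.

```csharp
using System;
using System.Buffers;
using System.Runtime.CompilerServices;

// Hypothetical interface shape, for illustration only.
internal interface IGrowableOutput
{
    // Returns a buffer of at least minimumSize bytes holding the first `used` bytes.
    byte[] Grow(byte[] current, int used, int minimumSize);
}

internal struct PooledArrayOutput : IGrowableOutput
{
    public byte[] Grow(byte[] current, int used, int minimumSize)
    {
        byte[] next = ArrayPool<byte>.Shared.Rent(Math.Max(current.Length * 2, minimumSize));
        current.AsSpan(0, used).CopyTo(next);
        ArrayPool<byte>.Shared.Return(current);
        return next;
    }
}

internal sealed class GenericContext<TOutput> where TOutput : struct, IGrowableOutput
{
    private TOutput _output;
    internal byte[] _buffer;
    internal int _position;
    internal int _bufferEnd;

    internal GenericContext(TOutput output, int initialSize)
    {
        _output = output;
        _buffer = ArrayPool<byte>.Shared.Rent(initialSize);
        _bufferEnd = _buffer.Length;
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    internal void WriteByte(byte value)
    {
        if (_position >= _bufferEnd) GrowSlow(1);
        _buffer[_position++] = value;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private void GrowSlow(int needed)
    {
        // Call on a struct type argument: no boxing and no interface dispatch.
        _buffer = _output.Grow(_buffer, _position, _position + needed);
        _bufferEnd = _buffer.Length;
    }
}
```

Swapping in a different output strategy means instantiating `GenericContext<T>` with another struct; the hot-path code is unchanged.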
## Direct Object Write (IsDirectObjectWrite)

When `UseMetadata = false`, there is no need to track inline property name hashes. The Source Generator (SGen) can completely bypass the generic `WriteObject` loop.

When SGen code executes, it checks `IsDirectObjectWrite`. If true, it writes the `Object(64)` marker and immediately outputs properties sequentially without any framework-level `foreach` or reflection. This reduces the overhead of an object write to roughly the time it takes to write raw bytes to an array.
## Property State Buffering

During the `UseMetadata = true` cross-type deserialization phase, properties might arrive in a different order than expected. The deserializer maintains a temporary state buffer (rented from `ArrayPool`) to track which properties have been seen and which have been populated, allowing it to efficiently map wire hashes to local property setters.
## High-Performance Coding Guidelines (LLM / Contributor Rules)

The following patterns are strictly enforced within the serialization pipeline. AI agents and developers modifying this layer MUST adhere to these rules to maintain zero-allocation and high-throughput characteristics.
### 1. The "Write Plan" (O(1) Reference Tracking)
**Rule:** Never use `Dictionary.ContainsKey`, `HashSet.Contains`, or similar lookups during the serialization hot path.

**Implementation:**
Reference tracking and string interning use a two-phase approach:
1. **Scan Pass:** Walks the object graph and populates an array-based `WriteDuplicateEntry[]` "Write Plan" representing occurrences of duplicates.
2. **Serialize Pass:** Iterates sequentially. The framework advances an integer cursor (`WriteVisitIndex`) and compares it to a pre-calculated index via `TryConsumeWritePlanEntry()`. This provides `O(1)` duplicate detection without any hashing overhead during the actual byte-writing phase.
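The two passes can be sketched as below. This is an illustration under assumptions: `PlanEntry`, `WritePlanSketch`, and the flat `visitOrder` list are hypothetical simplifications, and the real `WriteDuplicateEntry` layout is not shown in this document. Hashing is confined to the scan pass; the serialize pass only compares integers.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entry: "the object at VisitIndex repeats the one first seen at FirstVisitIndex".
internal readonly record struct PlanEntry(int VisitIndex, int FirstVisitIndex);

internal sealed class WritePlanSketch
{
    private PlanEntry[] _plan = Array.Empty<PlanEntry>();
    private int _planCursor;
    private int _visitIndex;

    // Scan pass: hashing is acceptable here because no bytes are written yet.
    internal void Scan(IReadOnlyList<object> visitOrder)
    {
        var firstSeen = new Dictionary<object, int>(ReferenceEqualityComparer.Instance);
        var plan = new List<PlanEntry>();
        for (int i = 0; i < visitOrder.Count; i++)
        {
            if (firstSeen.TryGetValue(visitOrder[i], out int first))
                plan.Add(new PlanEntry(i, first));   // entries are appended in visit order
            else
                firstSeen.Add(visitOrder[i], i);
        }
        _plan = plan.ToArray();
        _planCursor = 0;
        _visitIndex = 0;
    }

    // Serialize pass: called once per visited object, in the same order as the scan.
    internal bool TryConsumeWritePlanEntry(out int firstVisitIndex)
    {
        int current = _visitIndex++;
        if (_planCursor < _plan.Length && _plan[_planCursor].VisitIndex == current)
        {
            firstVisitIndex = _plan[_planCursor++].FirstVisitIndex;
            return true;
        }
        firstVisitIndex = -1;
        return false;
    }
}
```

The scheme relies on both passes visiting objects in exactly the same order; if the orders diverge, the cursor comparison no longer identifies the right objects.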
### 2. Unsafe and SIMD Memory Access
**Rule:** Do not use `BitConverter` or manual bit-shifting for primitive unmanaged types or block copying.

**Implementation:**
- For structs and unmanaged types (e.g., `Guid`, `DateTime`, `decimal`), use `Unsafe.WriteUnaligned<T>` and `MemoryMarshal.AsBytes()` to write directly to the buffer.
- For contiguous block memory copies (e.g., `byte[]`), use `System.Span<T>.CopyTo` or SIMD hardware acceleration (e.g., `Vector<byte>` and `Vector.IsHardwareAccelerated` as seen in `WriteBytesSimd`).
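A minimal sketch of both bullets follows. `UnmanagedWriteSketch` and its method names are hypothetical; the SIMD variant (`WriteBytesSimd`) is not reproduced here. Note that `Unsafe.WriteUnaligned` stores the value in the machine's native byte order, so this sketch assumes writer and reader agree on endianness.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

internal static class UnmanagedWriteSketch
{
    // Writes the raw in-memory representation of `value` and returns the new position.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    internal static int WriteUnmanaged<T>(byte[] buffer, int position, T value) where T : unmanaged
    {
        int size = Unsafe.SizeOf<T>();
        // Slicing performs the bounds check once for the whole value.
        Span<byte> target = buffer.AsSpan(position, size);
        Unsafe.WriteUnaligned(ref MemoryMarshal.GetReference(target), value);
        return position + size;
    }

    // Block copy via Span<T>.CopyTo, which the runtime already vectorizes internally.
    internal static int WriteBytes(byte[] buffer, int position, ReadOnlySpan<byte> source)
    {
        source.CopyTo(buffer.AsSpan(position));
        return position + source.Length;
    }
}
```

The same generic method covers `Guid`, `DateTime`, `decimal`, and any other `unmanaged` struct without per-type shifting code.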
### 3. Hot-Path vs. Cold-Path Inlining (VarInt/VarUInt)
**Rule:** Ensure the JIT can inline the common cases by explicitly isolating the slow paths.

**Implementation:**
Methods called in tight loops are small and decorated with `[MethodImpl(MethodImplOptions.AggressiveInlining)]`. Heavy branching or loop logic (which can prevent inlining) is extracted into separate methods:
- **Hot path:** Checks if the value fits in a single byte (e.g., `value < 0x80`). Aggressively inlined.
- **Cold path:** Multi-byte encoding logic is placed in a secondary method (e.g., `WriteVarUIntMultiByteUnsafe`) decorated with `[MethodImpl(MethodImplOptions.NoInlining)]`. This keeps the inlined hot-path method's IL extremely small, so the code emitted at each call site stays compact and friendly to the instruction cache.
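The split can be sketched as follows. This assumes a conventional 7-bits-per-byte, little-endian varint with a continuation bit; the actual wire encoding is defined in `BINARY_FORMAT.md` and may differ. `VarUIntSketch` is a hypothetical class, and capacity checks are omitted for brevity.

```csharp
using System;
using System.Runtime.CompilerServices;

internal sealed class VarUIntSketch
{
    // Assumption: the buffer is large enough; real code would ensure capacity first.
    internal byte[] _buffer = new byte[64];
    internal int _position;

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    internal void WriteVarUInt(uint value)
    {
        // Hot path: single-byte values need one compare and one store.
        if (value < 0x80)
        {
            _buffer[_position++] = (byte)value;
            return;
        }
        WriteVarUIntMultiByte(value);
    }

    // Cold path: the loop lives here so it never bloats the inlined caller.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private void WriteVarUIntMultiByte(uint value)
    {
        while (value >= 0x80)
        {
            // Low 7 bits plus the continuation flag.
            _buffer[_position++] = (byte)((value & 0x7F) | 0x80);
            value >>= 7;
        }
        _buffer[_position++] = (byte)value;
    }
}
```

With this encoding, `5` occupies one byte (`0x05`) and `300` occupies two (`0xAC 0x02`).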