# AcBinary Wire Format

Complete wire format specification for the AcBinary serializer. Source of truth: [`AyCode.Core/Serializers/Binaries/BinaryTypeCode.cs`](../AyCode.Core/Serializers/Binaries/BinaryTypeCode.cs).

## Stream Layout

```
[version : 1 byte] [flags : 1 byte] [cacheCount : VarUInt?] [payload...]
```

- **version**: `FormatVersion = 1` (current).
- **flags**: see [Header Flags](#header-flags).
- **cacheCount**: present only when `HeaderFlag_HasCacheCount` is set. Number of type wrapper slots used by the serializer.

## Header Flags

The flags byte uses `0x90` (144) as its base, with bit flags in the lower nibble:

| Bit | Mask | Flag | Meaning |
|-----|------|------|---------|
| 0 | `0x01` | Metadata | Property hash metadata included (cross-type deserialization) |
| 1 | `0x02` | RefHandling_OnlyId | Reference tracking for `IId` objects only |
| 2 | `0x04` | RefHandling_All | Reference tracking for all objects (always combined with bit 1) |
| 3 | `0x08` | HasCacheCount | VarUInt cache count follows the flags byte |

**Reference handling modes** (lower-nibble values): None = `0x00`, OnlyId = `0x02`, All = `0x06` (bits 1+2).

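As a rough illustration of the stream layout and flag bits above, the following Python sketch reads the two fixed header bytes and the optional cache count. The names `parse_header`, `Header`, and `read_var_uint` are hypothetical. Rejecting a flags byte whose upper nibble is not `0x9`, and rejecting bit 2 without bit 1, are assumptions about reader behavior, not something this document specifies.

```python
from typing import NamedTuple, Optional

FLAGS_BASE = 0x90            # upper nibble of the flags byte
FLAG_METADATA = 0x01
FLAG_REF_ONLY_ID = 0x02
FLAG_REF_ALL = 0x04
FLAG_HAS_CACHE_COUNT = 0x08


class Header(NamedTuple):
    version: int
    metadata: bool
    ref_mode: str
    cache_count: Optional[int]
    payload_offset: int


def read_var_uint(buf: bytes, pos: int):
    """Read one LEB128 value; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_header(buf: bytes) -> Header:
    version, flags = buf[0], buf[1]
    # Assumption: a reader validates the 0x90 base before trusting the flags.
    if flags & 0xF0 != FLAGS_BASE:
        raise ValueError(f"unexpected flags byte 0x{flags:02X}")
    ref_bits = flags & (FLAG_REF_ONLY_ID | FLAG_REF_ALL)
    ref_mode = {0x00: "None", 0x02: "OnlyId", 0x06: "All"}.get(ref_bits)
    if ref_mode is None:
        # Bit 2 is documented as always combined with bit 1.
        raise ValueError("RefHandling_All set without RefHandling_OnlyId")
    pos = 2
    cache_count = None
    if flags & FLAG_HAS_CACHE_COUNT:
        cache_count, pos = read_var_uint(buf, pos)
    return Header(version, bool(flags & FLAG_METADATA), ref_mode, cache_count, pos)
```

For example, the bytes `01 9E 03` would decode as version 1, reference mode `All`, and a cache count of 3, with the payload starting at offset 3.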
## Variable-Length Encoding

### VarUInt (unsigned)

LEB128: 7 data bits per byte, MSB = continuation flag.

```
value < 128      → 1 byte   [0xxxxxxx]
value < 16384    → 2 bytes  [1xxxxxxx] [0xxxxxxx]
value < 2097152  → 3 bytes  ...
(max 5 bytes for uint32)
```

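The byte counts above follow directly from the 7-bits-per-byte rule. A minimal Python sketch of standard unsigned LEB128 (low-order group first) is shown below; the function names are illustrative and are not taken from the AcBinary source.

```python
def encode_var_uint(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("VarUInt cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F          # take the low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow: set the MSB
        else:
            out.append(byte)         # last byte: MSB clear
            return bytes(out)


def decode_var_uint(buf: bytes, pos: int = 0):
    """Decode one value starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
```

With this sketch, `encode_var_uint(300)` yields the two bytes `AC 02`, and the largest `uint32` value takes five bytes.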
### VarInt (signed)

ZigZag encoding maps signed values to unsigned ones, then applies LEB128. For a 32-bit value (with `>>` as an arithmetic shift on encode):

```
encode: (value << 1) ^ (value >> 31)
decode: (raw >> 1) ^ -(raw & 1)
```

Maps: `0 → 0`, `-1 → 1`, `1 → 2`, `-2 → 3`, etc.

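The two formulas can be checked with a short Python sketch. Python integers are unbounded, so the encode side masks the result to 32 bits to mimic fixed-width arithmetic; the function names are illustrative only.

```python
def zigzag_encode32(value: int) -> int:
    """Map a signed 32-bit integer to an unsigned one (small magnitudes stay small)."""
    if not -(1 << 31) <= value < (1 << 31):
        raise ValueError("value does not fit in int32")
    # value >> 31 is 0 for non-negative values and -1 (all ones) for negative ones.
    return ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF


def zigzag_decode32(raw: int) -> int:
    """Inverse of zigzag_encode32 for raw in the uint32 range."""
    return (raw >> 1) ^ -(raw & 1)
```

The extremes map as expected: `2147483647` encodes to `0xFFFFFFFE` and `-2147483648` to `0xFFFFFFFF`.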
### VarULong (unsigned 64-bit)

Same LEB128 encoding, max 10 bytes for uint64.

## Type Markers

All markers are defined in `BinaryTypeCode.cs`. `SlotCount = 64`.

### FixObj (0–63)

Single-byte object type. The marker byte **is** the type slot index, so no additional type identifier is needed.

```
[FixObj(N)] [properties...]
```

**Slot allocation:** Slots 0–63 are reserved for runtime polymorphic types, assigned dynamically on first encounter during serialization. Source-generated (SGen) types receive slots starting at 64+ via `AllocateWrapperSlot()` (sequential, `Interlocked.Increment`). SGen slots are compile-time stable; runtime slots depend on serialization order.

### Complex Types (64–71)

| Code | Name | Wire format |
|------|------|-------------|
| 64 | Object | `[64] [VarUInt typeIndex] [properties...]` |
| 65 | ObjectRef | `[65] [VarUInt refCacheIndex]` |
| 66 | Array | `[66] [VarUInt count] [elements...]` |
| 67 | Dictionary | `[67] [VarUInt count] [key, value pairs...]` |
| 68 | ByteArray | `[68] [VarUInt length] [raw bytes]` |
| 69 | ObjectWithMetadata | `[69] [VarUInt typeIndex] [VarUInt hashCount] [hashes...] [properties...]` |
| 70 | ObjectRefFirst | `[70] [VarUInt refCacheIndex] [object body...]` |
| 71 | ObjectWithMetadataRefFirst | `[71] [VarUInt refCacheIndex] [metadata + properties...]` |

### Polymorphic Types (72–75)

Used when the runtime type differs from the declared property type and `UseMetadata=false`.

| Code | Name | Wire format |
|------|------|-------------|
| 72 | ObjectWithTypeName | `[72] [UTF8 typeName] [inner marker] [body...]` (prefix; inner Object/Array/Dict follows) |
| 73 | ObjectWithTypeNameRefFirst | `[73] [UTF8 typeName] [VarUInt refCacheIndex] [properties...]` (combined; no inner marker) |
| 74 | ObjectWithTypeIndex | `[74] [VarUInt typeIndex] [inner marker] [body...]` (prefix) |
| 75 | ObjectWithTypeIndexRefFirst | `[75] [VarUInt typeIndex] [VarUInt refCacheIndex] [properties...]` (combined) |

The second occurrence of a referenced polymorphic object uses plain `ObjectRef(65)`; no polymorphic prefix is needed.

### Primitives (76–90)

| Code | Name | Wire format |
|------|------|-------------|
| 76 | Null | `[76]` (no payload) |
| 77 | True | `[77]` (no payload) |
| 78 | False | `[78]` (no payload) |
| 79 | Int8 | `[79] [1 byte]` |
| 80 | UInt8 | `[80] [1 byte]` |
| 81 | Int16 | `[81] [VarInt]` |
| 82 | UInt16 | `[82] [VarUInt]` |
| 83 | Int32 | `[83] [VarInt]` |
| 84 | UInt32 | `[84] [VarUInt]` |
| 85 | Int64 | `[85] [VarLong]` |
| 86 | UInt64 | `[86] [VarULong]` |
| 87 | Float32 | `[87] [4 bytes IEEE 754]` |
| 88 | Float64 | `[88] [8 bytes IEEE 754]` |
| 89 | Decimal | `[89] [16 bytes]` |
| 90 | Char | `[90] [VarUInt]` |

### Strings (91–94)

| Code | Name | Wire format |
|------|------|-------------|
| 91 | String | `[91] [VarUInt byteLength] [UTF-8 bytes]` |
| 92 | StringInterned | `[92] [VarUInt cacheIndex]` (2nd+ occurrence) |
| 93 | StringEmpty | `[93]` (no payload) |
| 94 | StringInternFirst | `[94] [VarUInt cacheIndex] [VarUInt byteLength] [UTF-8 bytes]` (1st occurrence) |

### Date/Time (95–98)

| Code | Name | Wire format |
|------|------|-------------|
| 95 | DateTime | `[95] [8 bytes ticks]` |
| 96 | DateTimeOffset | `[96] [8 bytes ticks] [VarInt offsetMinutes]` |
| 97 | TimeSpan | `[97] [VarLong ticks]` |
| 98 | Guid | `[98] [16 bytes]` |

### Other Markers

| Code | Name | Wire format |
|------|------|-------------|
| 99 | Enum | `[99] [VarInt underlyingValue]` |
| 100 | MetadataHeader | Legacy: implies `RefHandling=true` + metadata present |
| 101 | NoMetadataHeader | Legacy: implies `RefHandling=true`, no metadata |
| 102 | PropertySkip | `[102]`: marks a skipped property (default/null value) |

### FixStr (103–134)

Short ASCII strings encoded as a single marker byte + raw bytes (no length prefix):

```
[FixStrBase + byteLength] [ASCII bytes]
```

- Length range: 0–31 bytes (`FixStrBase=103`, `FixStrMax=134`)
- Saves 1 byte vs `String` marker + VarUInt length
- Falls back to `String(91)` if content is non-ASCII

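A sketch of how a writer might choose between `FixStr`, `String(91)`, and `StringEmpty(93)` in Compact mode follows. The function names are hypothetical, and giving `StringEmpty` precedence over a zero-length `FixStr` is an assumption; the document lists both encodings but does not say which one wins for an empty string.

```python
STRING = 91
STRING_EMPTY = 93
FIX_STR_BASE = 103
FIX_STR_MAX_LEN = 31  # FixStrMax (134) - FixStrBase (103)


def encode_var_uint(value: int) -> bytes:
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_string(text: str) -> bytes:
    """Pick the shortest documented non-interned string form."""
    if text == "":
        # Assumption: empty strings use the dedicated marker.
        return bytes([STRING_EMPTY])
    data = text.encode("utf-8")
    if text.isascii() and len(data) <= FIX_STR_MAX_LEN:
        # The length is folded into the marker byte, so no VarUInt prefix.
        return bytes([FIX_STR_BASE + len(data)]) + data
    # Non-ASCII or too long: regular String with a length prefix.
    return bytes([STRING]) + encode_var_uint(len(data)) + data
```

Under these assumptions `"abc"` becomes marker `106` followed by three ASCII bytes, while a 32-character ASCII string falls back to `String(91)` with a one-byte length.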
### TinyInt (192–255)

Single-byte integer encoding for small values:

```
value  = marker - 192 - 16    (range: -16 to 47)
marker = value + 16 + 192     (64 values total)
```

Replaces the 2+ byte `Int32(83)` + VarInt encoding with a single byte for frequently occurring small integers.

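The marker arithmetic can be exercised with a small Python sketch. It assumes Compact mode, where the fallback is `Int32(83)` followed by a ZigZag VarInt as described earlier; the function names are illustrative, not the serializer's own.

```python
TINY_INT_BASE = 192
TINY_INT_BIAS = 16
INT32 = 83


def encode_var_uint(value: int) -> bytes:
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def zigzag_encode32(value: int) -> int:
    return ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF


def encode_int32(value: int) -> bytes:
    """TinyInt when the value fits in -16..47, otherwise Int32 + VarInt."""
    if -TINY_INT_BIAS <= value <= 47:
        return bytes([value + TINY_INT_BIAS + TINY_INT_BASE])
    return bytes([INT32]) + encode_var_uint(zigzag_encode32(value))


def decode_tiny_int(marker: int) -> int:
    if not TINY_INT_BASE <= marker <= 255:
        raise ValueError("not a TinyInt marker")
    return marker - TINY_INT_BASE - TINY_INT_BIAS
```

Here `0` is written as the single byte `208`, while `48`, the first value outside the range, needs the two bytes `83 96`.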
## Compact Encoding Selection

The serializer applies compact encodings automatically:

| Data | Condition | Encoding | Savings |
|------|-----------|----------|---------|
| Integer | −16 ≤ v ≤ 47 | TinyInt (1 byte) | 2–5 bytes |
| String | ≤31 bytes, ASCII | FixStr (1+N bytes) | 1 byte (no length prefix) |
| Object | type index < 64 | FixObj (1 byte) | 1–5 bytes (no VarUInt index) |
| String | empty | StringEmpty (1 byte) | 1+ bytes |
| Bool | - | True/False (1 byte) | no payload |

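Because each compact form owns a dedicated marker range, a reader can tell which encoding the serializer selected from the marker byte alone. The lookup below is a sketch built only from the ranges listed under Type Markers; the group labels and the `"Unassigned"` result for 135–191 (a range the tables above do not describe) are naming choices made for this example.

```python
# (first marker, last marker, group label), taken from the section headings.
MARKER_RANGES = [
    (0, 63, "FixObj"),
    (64, 71, "Complex"),
    (72, 75, "Polymorphic"),
    (76, 90, "Primitive"),
    (91, 94, "String"),
    (95, 98, "DateTime"),
    (99, 102, "Other"),
    (103, 134, "FixStr"),
    (192, 255, "TinyInt"),
]


def classify_marker(marker: int) -> str:
    """Return the marker group for a single byte value."""
    if not 0 <= marker <= 255:
        raise ValueError("marker must be a single byte")
    for first, last, label in MARKER_RANGES:
        if first <= marker <= last:
            return label
    return "Unassigned"  # 135-191: not covered by the tables in this document
```

A real reader would more likely dispatch through a 256-entry table or a `switch`, but the range boundaries would be the same.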
## String Interning Protocol

Controls deduplication of repeated string values.

**Modes** (`StringInterningMode`):
- `None`: all strings inline, no overhead
- `Attribute`: only `[AcStringIntern]` properties interned (default)
- `All`: all strings within length limits interned

**Length limits:** `MinStringInternLength=4`, `MaxStringInternLength=64` (configurable).

**Wire protocol:**
1. Serializer pre-scans all eligible strings to build a plan (which strings repeat)
2. First occurrence: `[StringInternFirst(94)] [VarUInt cacheIndex] [VarUInt byteLength] [UTF-8 bytes]`
3. Subsequent: `[StringInterned(92)] [VarUInt cacheIndex]`
4. Single-occurrence strings: written as normal `String`/`FixStr` (no interning overhead)

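The four protocol steps can be sketched at the token level as follows. This is a simplified model of `All` mode, not the actual implementation: it emits symbolic tuples instead of bytes, measures the length limits in characters, and hands out cache indices in first-occurrence order, all of which are assumptions.

```python
from collections import Counter

MIN_INTERN_LEN = 4    # MinStringInternLength default
MAX_INTERN_LEN = 64   # MaxStringInternLength default


def plan_interning(strings):
    """Pre-scan: return the set of eligible strings that occur more than once."""
    counts = Counter(
        s for s in strings if MIN_INTERN_LEN <= len(s) <= MAX_INTERN_LEN
    )
    return {s for s, n in counts.items() if n > 1}


def write_strings(strings):
    """Write pass: emit one symbolic token per string."""
    repeated = plan_interning(strings)
    cache = {}  # string -> cache index
    tokens = []
    for s in strings:
        if s not in repeated:
            # Unique or out-of-range strings carry no interning overhead.
            tokens.append(("String", s))
        elif s in cache:
            tokens.append(("StringInterned", cache[s]))
        else:
            cache[s] = len(cache)
            tokens.append(("StringInternFirst", cache[s], s))
    return tokens
```

For the input `["alpha", "id", "alpha", "beta", "id"]`, only `"alpha"` is interned: `"id"` repeats but is shorter than the minimum length, and `"beta"` occurs once.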
## Reference Tracking

Prevents infinite loops and preserves object identity for repeated references.

**Modes** (`ReferenceHandlingMode`):
- `None`: no tracking (fastest, use when the graph is a tree)
- `OnlyId`: track only `IId` objects (matched by ID value)
- `All`: track all reference types (two-phase scan required)

**Two-phase process:**
1. **Scan pass** (`ScanPass.cs`): walks the object graph, detects multi-referenced objects and repeated strings. Builds a `WriteDuplicateEntry[]` array (the "write plan") containing `VisitIndex`, `CacheMapIndex`, `IsFirst`, and `Value` for each duplicate.
2. **Sort**: write plan entries are sorted by `VisitIndex` to match the write pass traversal order.
3. **Serialize pass**: consumes the sorted write plan via `TryConsumeWritePlanEntry()`. A cursor (`_nextWritePlanVisitIndex`) advances through the plan in O(1), with no dictionary lookups during serialization.

**Wire protocol:**
- First occurrence: `[ObjectRefFirst(70)] [VarUInt refCacheIndex] [object body...]`
- Subsequent: `[ObjectRef(65)] [VarUInt refCacheIndex]`

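To make the scan/sort/serialize split concrete, here is a simplified Python model of `All` mode. It is a sketch under several assumptions: objects are compared by identity, the plan is a list of `(visit_index, cache_index, is_first)` tuples instead of `WriteDuplicateEntry` structs, strings are ignored, and the output is symbolic tokens. `Node`, `scan`, and `write` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    name: str
    children: list = field(default_factory=list)


def scan(root):
    """Scan pass: record every visit, keep only multi-referenced objects."""
    visits = {}  # id(node) -> visit indices, in first-visit order
    counter = 0

    def walk(node):
        nonlocal counter
        index = counter
        counter += 1
        seen = visits.setdefault(id(node), [])
        seen.append(index)
        if len(seen) == 1:          # descend only on the first visit
            for child in node.children:
                walk(child)

    walk(root)
    plan = []
    cache_index = 0
    for indices in visits.values():
        if len(indices) > 1:
            for n, index in enumerate(indices):
                plan.append((index, cache_index, n == 0))
            cache_index += 1
    plan.sort()                     # order by visit index
    return plan


def write(root, plan):
    """Serialize pass: a cursor walks the sorted plan, no lookups needed."""
    tokens = []
    cursor = 0
    counter = 0

    def walk(node):
        nonlocal cursor, counter
        index = counter
        counter += 1
        if cursor < len(plan) and plan[cursor][0] == index:
            _, cache_index, is_first = plan[cursor]
            cursor += 1
            if not is_first:
                tokens.append(("ObjectRef", cache_index))
                return              # body was already written
            tokens.append(("ObjectRefFirst", cache_index, node.name))
        else:
            tokens.append(("Object", node.name))
        for child in node.children:
            walk(child)

    walk(root)
    return tokens
```

Both passes must visit objects in the same order and stop descending at the same points; otherwise the visit indices drift and the cursor never matches. The same mechanism also terminates on cycles, since a back-reference becomes an `ObjectRef` token.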
## Property Ordering

Properties are serialized in a deterministic order defined by `TypeMetadataBase.GetUnfilteredProperties()`:

1. Walk the inheritance chain from **derived → base** (`currentType.BaseType` loop)
2. At each level, collect declared public instance properties
3. Sort **alphabetically** (`StringComparer.Ordinal`) within each level
4. Result: **base properties first, then derived, alphabetical within each level**

This order is stable between serializer and deserializer as long as the type hierarchy doesn't change.

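The ordering rule can be modeled without reflection. In the sketch below the type hierarchy is a plain dictionary (`HIERARCHY`, a made-up example), and Python's default string sort stands in for `StringComparer.Ordinal`; the two agree for ASCII names, where uppercase letters sort before lowercase ones. Emitting the collected levels in reverse is inferred from steps 1 and 4, which walk derived → base but produce base-first output.

```python
# Hypothetical hierarchy: type name -> (base type name, declared properties)
HIERARCHY = {
    "Entity": (None, ["Id", "Created"]),
    "Person": ("Entity", ["Name", "Age"]),
    "Employee": ("Person", ["salary", "Title"]),
}


def ordered_properties(type_name: str) -> list:
    levels = []
    current = type_name
    while current is not None:            # derived -> base walk
        base, declared = HIERARCHY[current]
        levels.append(sorted(declared))   # code-point order within the level
        current = base
    # Base levels were collected last, so reverse to put them first.
    return [name for level in reversed(levels) for name in level]
```

For `Employee` this yields `Created, Id, Age, Name, Title, salary`; note that `Title` precedes `salary` because the comparison is ordinal, not case-insensitive.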
### Cross-Type Deserialization (UseMetadata)

When `UseMetadata=true`, property name hashes (FNV-1a via `FnvHash.ComputeString`) are written per type, enabling schema evolution:

- **Serializer** writes property hashes in the metadata section (`ObjectWithMetadata(69)`)
- **Deserializer** builds an index mapping array (`GetIndexMapping()`) that maps source property indices to destination indices by matching FNV-1a hashes
- This allows deserialization even when source and destination types have different property sets or ordering

When `UseMetadata=false`, properties are matched by **positional index only**, so source and destination must have identical property layouts.

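The hash-based matching can be sketched as below. This uses the standard 32-bit FNV-1a parameters over UTF-8 bytes; whether `FnvHash.ComputeString` hashes UTF-8 bytes or UTF-16 code units is not stated here, so the byte encoding is an assumption. `build_index_mapping` and the `-1` sentinel for "no matching destination property" are likewise illustrative, not the signature of `GetIndexMapping()`.

```python
FNV_OFFSET_BASIS_32 = 0x811C9DC5
FNV_PRIME_32 = 0x01000193


def fnv1a_32(text: str) -> int:
    """Standard 32-bit FNV-1a; hashing UTF-8 bytes is an assumption."""
    h = FNV_OFFSET_BASIS_32
    for byte in text.encode("utf-8"):
        h ^= byte
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF
    return h


def build_index_mapping(source_hashes, dest_names):
    """Map each source property index to a destination index, or -1."""
    dest_by_hash = {fnv1a_32(name): i for i, name in enumerate(dest_names)}
    return [dest_by_hash.get(h, -1) for h in source_hashes]


# Source type wrote Age, Id, Name; destination declares Id, Name, Email.
source_hashes = [fnv1a_32(n) for n in ("Age", "Id", "Name")]
mapping = build_index_mapping(source_hashes, ["Id", "Name", "Email"])
```

In this example `Age` has no counterpart and would be skipped on read, while `Id` and `Name` land in destination slots 0 and 1 despite their different positions in the source.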
## Configuration Options

Options are defined in `AcBinarySerializerOptions` (inherits `AcSerializerOptions`). Each option controls which code paths execute and how the wire format changes.

### WireMode

| Value | Integers | Strings | Output size | Speed |
|-------|----------|---------|-------------|-------|
| `Compact` (default) | VarInt/VarUInt (1–5 bytes) | UTF-8 with speculative ASCII fast path | Smaller | Slightly slower |
| `Fast` | Fixed-width raw bytes (4/8 bytes) | UTF-16 memcpy (`charCount * 2` bytes) | Larger | Fastest encode/decode |

**Format difference for strings:**
- Compact: `[VarUInt byteLength] [UTF-8 bytes]`, written speculatively as ASCII (1 pass if all ASCII, rewind + UTF-8 fallback otherwise)
- Fast: `[VarUInt charCount] [raw UTF-16 bytes]`, a zero-encoding memcpy

**Code branch:** `context.FastWire` flag set at `context.Reset()`. Checked in `WriteStringUtf8()` and integer write methods. FixStr optimization is skipped in Fast mode (UTF-8 specific).

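The difference between the two string payloads is easiest to see side by side. The sketch below writes only the length prefix and the character data, omitting the marker byte; it assumes the Fast-mode bytes are little-endian UTF-16 (what a memcpy of a .NET string would give on a little-endian host) and that `charCount` counts UTF-16 code units. The function name is hypothetical.

```python
def encode_var_uint(value: int) -> bytes:
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_string_payload(text: str, fast: bool) -> bytes:
    """Length prefix + character data, without the leading marker byte."""
    if fast:
        raw = text.encode("utf-16-le")        # assumption: little-endian
        return encode_var_uint(len(raw) // 2) + raw   # prefix = code units
    raw = text.encode("utf-8")
    return encode_var_uint(len(raw)) + raw            # prefix = byte count


compact = encode_string_payload("h\u00e9", fast=False)
fast = encode_string_payload("h\u00e9", fast=True)
```

For the two-character string `hé`, the Compact payload is 4 bytes (prefix 3, then three UTF-8 bytes) and the Fast payload is 5 bytes (prefix 2, then four UTF-16 bytes).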
### ReferenceHandling

| Value | Tracked objects | Scan pass | Header flags | Wire markers |
|-------|----------------|-----------|--------------|-------------|
| `None` | Nothing | Skipped | `0x00` | Standard object markers only |
| `OnlyId` | `IId` objects only (by ID value) | Partial | `0x02` | `ObjectRefFirst(70)` + `ObjectRef(65)` |
| `All` (default) | All reference types | Full graph walk | `0x06` | `ObjectRefFirst(70)` + `ObjectRef(65)` |

**Format impact:** When enabled, multi-referenced objects are written once with `ObjectRefFirst(70) + VarUInt(refCacheIndex)` on first encounter, then replaced by `ObjectRef(65) + VarUInt(refCacheIndex)` on subsequent encounters. The header `HasCacheCount` flag is set and the cache count is written.

**Interaction with `ThrowOnCircularReference` (default: `true`):**
- `true` + ref handling enabled: all objects tracked for cycle detection, throws `InvalidOperationException` on circular reference
- `false` + ref handling enabled: only IId types tracked for deduplication, non-IId circular refs silently truncated at `MaxDepth`

### UseMetadata

| Value | Wire markers | Property matching | Overhead |
|-------|-------------|-------------------|----------|
| `false` (default) | `FixObj`/`Object` | Positional index only (types must match) | None |
| `true` | `ObjectWithMetadata(69)` / `ObjectWithMetadataRefFirst(71)` | FNV-1a property name hashes | 4 bytes per property per type |

**Format impact:** When enabled, each type's first occurrence writes `[VarUInt hashCount] [FNV-1a hash × N]` before properties. The deserializer uses the hashes to build a source→destination index mapping, enabling cross-type deserialization (different property sets/ordering).

**Code branch:** `context.UseMetadata` controls whether `ObjectWithMetadata(69)` or plain `Object(64)` markers are used. When `false`, `IsDirectObjectWrite=true` allows source-generated writers to bypass `WriteObject` entirely and inline property writes.

**Related:** `CheckDuplicatePropName` (default: `true`) throws if an FNV-1a hash collision is detected between property names of the same type. Disable in production for performance.

### UseStringInterning

| Value | Eligible strings | Scan overhead | Wire markers |
|-------|-----------------|---------------|-------------|
| `None` | Nothing | None | `String(91)` / `FixStr` only |
| `Attribute` (default) | Properties with `[AcStringIntern(true)]` | Scans marked properties | `StringInternFirst(94)` + `StringInterned(92)` |
| `All` | All strings within length limits | Scans all strings | `StringInternFirst(94)` + `StringInterned(92)` |

**Length limits:** `MinStringInternLength` (default: 4) and `MaxStringInternLength` (default: 64, 0=unlimited). Strings outside this range are always written inline.

**Format impact:** Interned strings on first occurrence: `[StringInternFirst(94)] [VarUInt cacheIndex] [string data]`. Subsequent: `[StringInterned(92)] [VarUInt cacheIndex]` (1–2 bytes vs full string). Single-occurrence strings are never interned, so unique strings carry no overhead.

**Code branch:** `context.StringInternEligible` flag set per-property before `WriteString`. Scan pass builds a `WriteDuplicateEntry[]` plan; write pass consumes it via cursor.

### MaxDepth

| Value | Behavior |
|-------|----------|
| `255` (default) | Effectively unlimited nesting |
| `0` | Root level only: nested objects/collections written as `Null(76)` |
| `N` | Objects deeper than N levels written as `Null(76)` |

**Format impact:** Depth-exceeded values appear as `Null(76)` in the stream and are indistinguishable from actual null values. There is no special marker.

**Code branch:** Checked at entry of every object/collection write: `if (depth > MaxDepth) { WriteByte(Null); return; }`.

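The effect of the depth check can be previewed on plain nested data. The sketch below mirrors the `depth > MaxDepth` test, replacing over-deep containers with `None` in place of `Null(76)`; it assumes the root sits at depth 0 (consistent with the `MaxDepth=0` row) and uses dictionaries and lists as stand-ins for objects and collections. `prune` is a hypothetical helper.

```python
def prune(value, max_depth: int, depth: int = 0):
    """Return a copy of value with containers beyond max_depth set to None."""
    if isinstance(value, (dict, list)):
        if depth > max_depth:
            return None                       # would be written as Null(76)
        if isinstance(value, dict):
            return {k: prune(v, max_depth, depth + 1) for k, v in value.items()}
        return [prune(v, max_depth, depth + 1) for v in value]
    return value                              # primitives are always written


data = {"a": 1, "child": {"b": 2, "grand": {"c": 3}}, "items": [1, 2]}
shallow = prune(data, 0)
one_level = prune(data, 1)
```

With `max_depth=0` only the root's primitive properties survive, which is why a reader cannot tell a truncated child from one that was genuinely null.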
### UseCompression

| Value | Method | Granularity | Memory |
|-------|--------|-------------|--------|
| `None` (default) | No compression | - | - |
| `Block` | LZ4 single block | Entire payload | Full buffer in memory |
| `BlockArray` | LZ4 chunked | 64KB chunks | Streaming-friendly, lower peak memory |

**Format impact:** Compression is applied **post-serialization** as a transparent wrapper; the inner wire format is unchanged. Both modes are pure managed C# (WASM-compatible, no native dependencies).

**Code branch:** Applied in `AcBinarySerializer.Serialize()` after the serialization context produces the raw buffer: `if (UseCompression != None) Lz4.Compress(buffer, mode)`. Decompression is automatic on deserialize.

### PropertyFilter

Optional delegate `BinaryPropertyFilter?` (default: `null`). When set, it is invoked for each property to decide inclusion.

```
delegate bool BinaryPropertyFilter(in BinaryPropertyFilterContext context);
```

**BinaryPropertyFilterContext fields:** `DeclaringType`, `PropertyName`, `PropertyType`, `Instance` (null during metadata phase), `IsMetadataPhase`, `GetValue()` (lazy).

**Format impact:** Excluded properties are completely absent from the stream: no marker, no placeholder. The deserializer must use `UseMetadata=true` or an identical filter to match property indices correctly.

**Code branch:** `context.HasPropertyFilter` checked in `ShouldSerializeProperty()`. The filter is called twice: once during metadata registration (`Instance=null`) and once during the write phase.

### PropertyMapper

Optional delegate `PropertyMapperDelegate?` (default: `null`) for cross-type deserialization property remapping.

```
delegate PropertyInfo? PropertyMapperDelegate(PropertyInfo sourceProperty, Type destinationType);
```

**Purpose:** Maps properties between different class hierarchies (renamed properties, external DTOs). The result is cached, so same-type operations (`Deserialize<T>`) incur zero overhead.

### WASM Options

| Option | Default | Purpose |
|--------|---------|---------|
| `IsWasm` | `OperatingSystem.IsBrowser()` | Auto-detect WASM environment |
| `UseStringCaching` | follows `IsWasm` | Cache short strings during deserialization to reduce GC pressure |
| `MaxCachedStringLength` | 64 | Max string length to cache |

**Format impact:** None; these are deserialization-only optimizations. When `UseStringCaching=true`, the deserializer maintains an intern cache for strings ≤ `MaxCachedStringLength` chars. Caching is disabled automatically when a `StringInternFirst` marker is encountered (interning takes precedence).

### Other Options

| Option | Type | Default | Purpose |
|--------|------|---------|---------|
| `UseGeneratedCode` | bool | `true` | Use source-generated writers/readers when available |
| `InitialBufferCapacity` | int | 4096 | Starting buffer size (bytes) for serialization output |
| `RemoveOrphanedItems` | bool | `false` | During `PopulateMerge`: remove destination collection items with no matching source ID |
| `UseAsync` | bool | `false` | Async context pool return via ThreadPool. Auto-disabled in WASM and when `ReferenceHandling=None` |
| `MaxContextPoolSize` | int | 8 | Max serialization contexts kept in pool |

## Presets

| Preset | WireMode | Metadata | StringInterning | RefHandling | MaxDepth | Compression | Other |
|--------|----------|----------|-----------------|-------------|----------|-------------|-------|
| `Default` | Compact | false | Attribute | All | 255 | None | - |
| `FastMode` | Compact | false | None | None | 255 | None | No scan pass |
| `ShallowCopy` | Compact | false | None | None | **0** | None | Root level only |
| `WasmOptimized` | Compact | false | Attribute | All | 255 | None | +StringCaching |
| `WithoutReferenceHandling` | Compact | false | Attribute | **None** | 255 | None | No scan pass |
| `WithoutMetadata` | Compact | **false** | Attribute | All | 255 | None | - |

**Performance implication of presets:**
- `Default` / `WasmOptimized`: two-phase (scan + serialize) due to `ReferenceHandling=All`
- `FastMode` / `ShallowCopy`: single-phase (no scan pass) since both interning and refs are disabled
- The scan pass adds ~20–30% overhead; disable it when the object graph is a simple tree

## Option Interactions

Key interdependencies that affect which code branches execute:

| Combination | Effect |
|-------------|--------|
| `ReferenceHandling=None` + `UseStringInterning=None` | **No scan pass**: fastest path, single-phase serialization |
| `ReferenceHandling=All` + `UseMetadata=true` | Uses the `ObjectWithMetadataRefFirst(71)` marker (combined ref + metadata) |
| `UseMetadata=false` + `UseGeneratedCode=true` | `IsDirectObjectWrite=true`: generated code inlines property writes, bypasses `WriteObject` |
| `UseMetadata=true` + `PropertyFilter` set | Filter invoked twice (metadata phase + write phase); filter results must be stable |
| `WireMode=Fast` + `UseStringInterning!=None` | Interned strings still use the fast string path (UTF-16 for first occurrence, VarUInt index for subsequent) |
| `UseCompression!=None` + any other option | Compression is orthogonal: applied post-serialization, inner format unchanged |