AyCode.Core/docs/BINARY_FORMAT.md

AcBinary Wire Format

Complete wire format specification for the AcBinary serializer. Source of truth: AyCode.Core/Serializers/Binaries/BinaryTypeCode.cs.

Stream Layout

[version : 1 byte]  [flags : 1 byte]  [cacheCount : VarUInt?]  [payload...]
  • version: FormatVersion = 1 (current).
  • flags: see Header Flags.
  • cacheCount: present only when HeaderFlag_HasCacheCount is set. Number of type wrapper slots used by the serializer.

Header Flags

The flags byte uses 0x90 (144) as base with bit flags in the lower nibble:

| Bit | Mask | Flag | Meaning |
|---|---|---|---|
| 0 | 0x01 | Metadata | Property hash metadata included (cross-type deserialization) |
| 1 | 0x02 | RefHandling_OnlyId | Reference tracking for IId objects only |
| 2 | 0x04 | RefHandling_All | Reference tracking for all objects (always combined with bit 1) |
| 3 | 0x08 | HasCacheCount | VarUInt cache count follows the flags byte |

Reference handling modes: None = 0x00, OnlyId = 0x02, All = 0x06 (bits 1+2).
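
As a rough illustration of the stream layout and flag bits above, the following Python sketch reads a header. It is not the library's C# code: read_header and decode_varuint are hypothetical helper names, and rejecting a flags byte whose upper nibble is not 0x9 is an assumption (the legacy header markers 100/101 are not handled here).

```python
FLAGS_BASE = 0x90            # upper nibble of the flags byte
FLAG_METADATA = 0x01
FLAG_REF_ONLY_ID = 0x02
FLAG_REF_ALL = 0x04          # always combined with FLAG_REF_ONLY_ID
FLAG_HAS_CACHE_COUNT = 0x08

REF_MODES = {0x00: "None", 0x02: "OnlyId", 0x06: "All"}


def decode_varuint(buf, pos):
    """Read one LEB128 value; return (value, next position)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def read_header(buf):
    """Parse [version][flags][cacheCount?] (hypothetical helper)."""
    version, flags = buf[0], buf[1]
    if flags & 0xF0 != FLAGS_BASE:
        raise ValueError("unexpected flags base")  # assumption: strict check
    ref_bits = flags & (FLAG_REF_ONLY_ID | FLAG_REF_ALL)
    if ref_bits not in REF_MODES:
        raise ValueError("bit 2 set without bit 1")
    pos, cache_count = 2, None
    if flags & FLAG_HAS_CACHE_COUNT:
        cache_count, pos = decode_varuint(buf, pos)
    return {
        "version": version,
        "metadata": bool(flags & FLAG_METADATA),
        "ref_handling": REF_MODES[ref_bits],
        "cache_count": cache_count,
        "payload_offset": pos,
    }
```

For example, read_header(bytes([1, 0x9E, 1])) reports ref_handling "All", cache_count 1, and a payload starting at offset 3.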

Variable-Length Encoding

VarUInt (unsigned)

LEB128: 7 data bits per byte, MSB = continuation flag.

value < 128       → 1 byte   [0xxxxxxx]
value < 16384     → 2 bytes  [1xxxxxxx] [0xxxxxxx]
value < 2097152   → 3 bytes  ...
(max 5 bytes for uint32)
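
The byte layout above can be sketched in a few lines of Python. This is a generic LEB128 implementation for illustration, not the serializer's own code; the function names are made up for this example.

```python
def encode_varuint(value):
    """Encode a non-negative integer as LEB128 (low 7 bits first)."""
    if value < 0:
        raise ValueError("VarUInt cannot encode negative values")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # set the continuation bit
        value >>= 7
    out.append(value)                      # final byte, MSB clear
    return bytes(out)


def decode_varuint(buf, pos=0):
    """Decode one LEB128 value; return (value, next position)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7
```

With this sketch, 127 fits in one byte, 128 becomes the two bytes 0x80 0x01, and the largest uint32 value takes five bytes.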

VarInt (signed)

ZigZag encoding maps signed to unsigned, then LEB128:

encode: (value << 1) ^ (value >> 31)
decode: (raw >> 1) ^ -(raw & 1)

Maps: 0 → 0, -1 → 1, 1 → 2, -2 → 3, etc.
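
A small Python sketch of the 32-bit ZigZag mapping, for illustration only. Python integers are unbounded, so the encoder masks the result to 32 bits to mimic fixed-width arithmetic; the function names are hypothetical.

```python
def zigzag_encode32(value):
    """Map a signed 32-bit integer to an unsigned one (small magnitudes stay small)."""
    # Python's >> on negative ints is arithmetic, like a signed shift in C#.
    return ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF


def zigzag_decode32(raw):
    """Inverse mapping: unsigned 32-bit back to signed."""
    return (raw >> 1) ^ -(raw & 1)
```

The encoded value is then written with the VarUInt rules, so any value in the range -64 to 63 occupies a single byte.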

VarULong (unsigned 64-bit)

Same LEB128 encoding, max 10 bytes for uint64.

Type Markers

All markers defined in BinaryTypeCode.cs. SlotCount = 64.

FixObj (0–63)

Single-byte object type. The marker byte is the type slot index — no additional type identifier needed.

[FixObj(N)]  [properties...]

Slot allocation: Slots 0–63 are reserved for runtime polymorphic types, assigned dynamically on first encounter during serialization. Source-generated (SGen) types receive slots starting at 64+ via AllocateWrapperSlot() (sequential, Interlocked.Increment). SGen slots are compile-time stable; runtime slots depend on serialization order.
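
To make the size difference concrete, here is a hedged Python sketch of how an object header could be chosen from a type index, based on the FixObj rule above and the Object(64) marker described in the next section. The helper names are invented for this example and do not mirror the serializer's internals.

```python
OBJECT = 64        # marker for [64] [VarUInt typeIndex]
SLOT_COUNT = 64    # indices 0..63 fit in the marker byte itself


def encode_varuint(value):
    """LEB128 encoding, as in the VarUInt section."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def object_header(type_index):
    """Return the bytes that introduce an object of the given type index."""
    if 0 <= type_index < SLOT_COUNT:
        return bytes([type_index])                       # FixObj: 1 byte
    return bytes([OBJECT]) + encode_varuint(type_index)  # Object + index
```

Under these assumptions a type in slot 5 costs one byte, while type index 200 costs three (the marker plus a two-byte VarUInt).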

Complex Types (64–71)

| Code | Name | Wire format |
|---|---|---|
| 64 | Object | [64] [VarUInt typeIndex] [properties...] |
| 65 | ObjectRef | [65] [VarUInt refCacheIndex] |
| 66 | Array | [66] [VarUInt count] [elements...] |
| 67 | Dictionary | [67] [VarUInt count] [key, value pairs...] |
| 68 | ByteArray | [68] [VarUInt length] [raw bytes] |
| 69 | ObjectWithMetadata | [69] [VarUInt typeIndex] [VarUInt hashCount] [hashes...] [properties...] |
| 70 | ObjectRefFirst | [70] [VarUInt refCacheIndex] [object body...] |
| 71 | ObjectWithMetadataRefFirst | [71] [VarUInt refCacheIndex] [metadata + properties...] |

Polymorphic Types (72–75)

Used when runtime type differs from declared property type and UseMetadata=false.

| Code | Name | Wire format | Notes |
|---|---|---|---|
| 72 | ObjectWithTypeName | [72] [UTF8 typeName] [inner marker] [body...] | Prefix; inner Object/Array/Dict follows |
| 73 | ObjectWithTypeNameRefFirst | [73] [UTF8 typeName] [VarUInt refCacheIndex] [properties...] | Combined; no inner marker |
| 74 | ObjectWithTypeIndex | [74] [VarUInt typeIndex] [inner marker] [body...] | Prefix |
| 75 | ObjectWithTypeIndexRefFirst | [75] [VarUInt typeIndex] [VarUInt refCacheIndex] [properties...] | Combined |

Second occurrence of a referenced polymorphic object uses plain ObjectRef(65) — no polymorphic prefix needed.

Primitives (76–90)

| Code | Name | Wire format |
|---|---|---|
| 76 | Null | [76] (no payload) |
| 77 | True | [77] (no payload) |
| 78 | False | [78] (no payload) |
| 79 | Int8 | [79] [1 byte] |
| 80 | UInt8 | [80] [1 byte] |
| 81 | Int16 | [81] [VarInt] |
| 82 | UInt16 | [82] [VarUInt] |
| 83 | Int32 | [83] [VarInt] |
| 84 | UInt32 | [84] [VarUInt] |
| 85 | Int64 | [85] [VarLong] |
| 86 | UInt64 | [86] [VarULong] |
| 87 | Float32 | [87] [4 bytes IEEE 754] |
| 88 | Float64 | [88] [8 bytes IEEE 754] |
| 89 | Decimal | [89] [16 bytes] |
| 90 | Char | [90] [VarUInt] |

Strings (91–94)

| Code | Name | Wire format | Notes |
|---|---|---|---|
| 91 | String | [91] [VarUInt byteLength] [UTF-8 bytes] | |
| 92 | StringInterned | [92] [VarUInt cacheIndex] | 2nd+ occurrence |
| 93 | StringEmpty | [93] | No payload |
| 94 | StringInternFirst | [94] [VarUInt cacheIndex] [VarUInt byteLength] [UTF-8 bytes] | 1st occurrence |

Date/Time (95–98)

| Code | Name | Wire format |
|---|---|---|
| 95 | DateTime | [95] [8 bytes ticks] |
| 96 | DateTimeOffset | [96] [8 bytes ticks] [VarInt offsetMinutes] |
| 97 | TimeSpan | [97] [VarLong ticks] |
| 98 | Guid | [98] [16 bytes] |

Other Markers

| Code | Name | Wire format |
|---|---|---|
| 99 | Enum | [99] [VarInt underlyingValue] |
| 100 | MetadataHeader | Legacy: implies RefHandling=true + metadata present |
| 101 | NoMetadataHeader | Legacy: implies RefHandling=true, no metadata |
| 102 | PropertySkip | [102] (marks a skipped property: default/null value) |

FixStr (103–134)

Short ASCII strings encoded in a single marker byte + raw bytes (no length prefix):

[FixStrBase + byteLength]  [ASCII bytes]
  • Length range: 0–31 bytes (FixStrBase=103, FixStrMax=134)
  • Saves 1 byte vs String marker + VarUInt length
  • Falls back to String(91) if content is non-ASCII
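
The selection rule in the list above can be sketched as follows. This Python example is only an approximation under stated assumptions: it ignores StringEmpty(93), string interning, and Fast wire mode, and the function names are hypothetical.

```python
FIX_STR_BASE = 103
FIX_STR_MAX = 134
STRING = 91


def encode_varuint(value):
    """LEB128 encoding, as in the VarUInt section."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_string(text):
    """Pick FixStr for short ASCII text, otherwise String(91)."""
    data = text.encode("utf-8")
    if text.isascii() and len(data) <= FIX_STR_MAX - FIX_STR_BASE:
        # The length is folded into the marker byte, so no prefix is needed.
        return bytes([FIX_STR_BASE + len(data)]) + data
    return bytes([STRING]) + encode_varuint(len(data)) + data
```

Here "abc" becomes the marker 106 followed by three ASCII bytes, whereas a 32-character ASCII string or any non-ASCII string falls back to the String(91) form.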

TinyInt (192–255)

Single-byte integer encoding for small values:

value = marker - 192 - 16    (range: -16 to 47)
marker = value + 16 + 192    (64 values total)

Saves 2+ bytes vs Int32(83) + VarInt for frequently occurring small integers.
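
The two formulas translate directly into code. The Python sketch below uses hypothetical helper names and returns None when a value falls outside the TinyInt range, leaving the fallback encoding to the caller.

```python
TINY_INT_BASE = 192   # first TinyInt marker
TINY_INT_BIAS = 16    # shifts the range so that -16 maps to marker 192


def try_encode_tiny_int(value):
    """Return the single marker byte for value, or None if out of range."""
    if -16 <= value <= 47:
        return value + TINY_INT_BIAS + TINY_INT_BASE
    return None


def decode_tiny_int(marker):
    """Recover the integer from a TinyInt marker byte."""
    if not 192 <= marker <= 255:
        raise ValueError("not a TinyInt marker")
    return marker - TINY_INT_BASE - TINY_INT_BIAS
```

For instance, 0 is written as marker 208, and the range endpoints -16 and 47 map to 192 and 255.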

Compact Encoding Selection

The serializer applies compact encodings automatically:

| Data | Condition | Encoding | Savings |
|---|---|---|---|
| Integer | -16 ≤ v ≤ 47 | TinyInt (1 byte) | 2–5 bytes |
| String | ≤31 bytes, ASCII | FixStr (1+N bytes) | 1 byte (no length prefix) |
| Object | type index < 64 | FixObj (1 byte) | 1–5 bytes (no VarUInt index) |
| String | empty | StringEmpty (1 byte) | 1+ bytes |
| Bool | | True/False (1 byte) | no payload |

String Interning Protocol

Controls deduplication of repeated string values.

Modes (StringInterningMode):

  • None — all strings inline, no overhead
  • Attribute — only [AcStringIntern] properties interned (default)
  • All — all strings within length limits interned

Length limits: MinStringInternLength=4, MaxStringInternLength=64 (configurable).

Wire protocol:

  1. Serializer pre-scans all eligible strings to build a plan (which strings repeat)
  2. First occurrence: [StringInternFirst(94)] [VarUInt cacheIndex] [VarUInt byteLength] [UTF-8 bytes]
  3. Subsequent: [StringInterned(92)] [VarUInt cacheIndex]
  4. Single-occurrence strings: written as normal String/FixStr (no interning overhead)
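
The four steps above can be sketched with symbolic tokens instead of bytes. This Python example rests on assumptions not stated in the spec: cache indices are handed out in order of first occurrence, and the length limits are measured in characters. The function name is hypothetical.

```python
from collections import Counter


def plan_string_writes(strings, min_len=4, max_len=64):
    """Return one token per input string, in write order."""

    def eligible(s):
        return min_len <= len(s) <= max_len

    # Step 1: pre-scan to find which eligible strings repeat.
    counts = Counter(s for s in strings if eligible(s))

    cache = {}   # string -> cacheIndex (assumed: order of first occurrence)
    tokens = []
    for s in strings:
        if not eligible(s) or counts[s] < 2:
            tokens.append(("String", s))                        # step 4
        elif s not in cache:
            cache[s] = len(cache)
            tokens.append(("StringInternFirst", cache[s], s))   # step 2
        else:
            tokens.append(("StringInterned", cache[s]))         # step 3
    return tokens
```

Given ["alpha", "beta", "alpha", "ab", "ab"], only "alpha" is interned: "beta" occurs once and "ab" is shorter than the minimum length, so both stay inline.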

Reference Tracking

Prevents infinite loops and preserves object identity for repeated references.

Modes (ReferenceHandlingMode):

  • None — no tracking (fastest, use when graph is a tree)
  • OnlyId — track only IId objects (matched by ID value)
  • All — track all reference types (two-phase scan required)

Two-phase process (a scan pass and a serialize pass, with a sort in between):

  1. Scan pass (ScanPass.cs): walks the object graph and detects multi-referenced objects and repeated strings. Builds a WriteDuplicateEntry[] array (the "write plan") containing VisitIndex, CacheMapIndex, IsFirst, and Value for each duplicate.
  2. Sort: write plan entries are sorted by VisitIndex to match the traversal order of the write pass.
  3. Serialize pass: consumes the sorted write plan via TryConsumeWritePlanEntry(). A cursor (_nextWritePlanVisitIndex) advances through the plan, so each check is O(1) and no dictionary lookups are needed during serialization.

Wire protocol:

  • First occurrence: [ObjectRefFirst(70)] [VarUInt refCacheIndex] [object body...]
  • Subsequent: [ObjectRef(65)] [VarUInt refCacheIndex]

Example — same object referenced twice:

Input:  { Users: [userA, userA] }   (same instance)

Scan pass → WritePlan:
  [{VisitIndex:2, CacheMapIndex:0, IsFirst:true},
   {VisitIndex:3, CacheMapIndex:0, IsFirst:false}]

Wire output (Compact mode, ReferenceHandling=All):
  [version=1] [flags=0x9E]  [VarUInt cacheCount=1]     ← header (0x90 | 0x06 All | 0x08 HasCacheCount)
  [FixObj(0)]                                           ← root object
    [Array(66)] [VarUInt(2)]                            ← Users array, 2 elements
      [ObjectRefFirst(70)] [VarUInt(0)] [props...]      ← userA, 1st occurrence
      [ObjectRef(65)] [VarUInt(0)]                      ← userA, 2nd (2 bytes only)
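
The scan/sort/serialize flow can be modelled in miniature. The Python sketch below is an illustration, not a port of ScanPass.cs: dicts stand in for objects, lists for arrays, identity is tracked with id(), and the output is a list of symbolic tokens. All function names are hypothetical.

```python
def scan(root):
    """Scan pass: return the write plan as (visit, cache_index, is_first)."""
    first_visit, cache_index, plan = {}, {}, []
    counter = 0

    def walk(node):
        nonlocal counter
        if not isinstance(node, (dict, list)):
            return
        visit = counter
        counter += 1
        if isinstance(node, list):
            for child in node:
                walk(child)
            return
        key = id(node)
        if key in first_visit:                 # seen before: a duplicate
            if key not in cache_index:
                cache_index[key] = len(cache_index)
                plan.append((first_visit[key], cache_index[key], True))
            plan.append((visit, cache_index[key], False))
            return                             # do not descend again
        first_visit[key] = visit
        for child in node.values():
            walk(child)

    walk(root)
    plan.sort()                                # order by visit index
    return plan


def write(root, plan):
    """Serialize pass: consume the sorted plan with a single cursor."""
    out, cursor, counter = [], 0, 0

    def walk(node):
        nonlocal cursor, counter
        if not isinstance(node, (dict, list)):
            out.append(("Value", node))
            return
        visit = counter
        counter += 1
        entry = None
        if cursor < len(plan) and plan[cursor][0] == visit:
            entry = plan[cursor]               # O(1): compare, then advance
            cursor += 1
        if isinstance(node, list):
            out.append(("Array", len(node)))
            for child in node:
                walk(child)
            return
        if entry is None:
            out.append(("Object",))
        elif entry[2]:
            out.append(("ObjectRefFirst", entry[1]))
        else:
            out.append(("ObjectRef", entry[1]))
            return                             # body was already written
        for child in node.values():
            walk(child)

    walk(root)
    return out
```

With user_a = {"name": "A"} and root = {"Users": [user_a, user_a]}, scan(root) yields [(2, 0, True), (3, 0, False)], matching the write plan in the example, and write(root, plan) emits ObjectRefFirst for the first userA and a bare ObjectRef for the second.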

Property Ordering

Properties are serialized in a deterministic order defined by TypeMetadataBase.GetUnfilteredProperties():

  1. Walk the inheritance chain from derived → base (currentType.BaseType loop)
  2. At each level, collect declared public instance properties
  3. Sort alphabetically (StringComparer.Ordinal) within each level
  4. Result: base properties first, then derived, alphabetical within each level

This order is stable across serializer/deserializer as long as the type hierarchy doesn't change.
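
A minimal Python model of the ordering rule, assuming the type hierarchy is given as two plain dictionaries (both hypothetical inputs invented for this example). Python's sorted() compares by code point, which agrees with ordinal comparison for ASCII property names.

```python
def ordered_properties(type_name, base_of, declared):
    """Return property names in serialization order for type_name.

    base_of:  maps a type name to its base type name (or None at the root)
    declared: maps a type name to the properties declared at that level
    """
    levels = []
    current = type_name
    while current is not None:                 # walk derived -> base
        levels.append(sorted(declared.get(current, [])))
        current = base_of.get(current)
    ordered = []
    for level in reversed(levels):             # emit base level first
        ordered.extend(level)
    return ordered
```

For a Person base type declaring Name and Id, and a derived Employee declaring Salary and Department, the resulting order is Id, Name, Department, Salary.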

Cross-Type Deserialization (UseMetadata)

When UseMetadata=true, property name hashes (FNV-1a via FnvHash.ComputeString) are written per type, enabling schema evolution:

  • Serializer writes property hashes in the metadata section (ObjectWithMetadata(69))
  • Deserializer builds an index mapping array (GetIndexMapping()) that maps source property indices to destination indices by matching FNV-1a hashes
  • This allows deserialization even when source and destination types have different property sets or ordering

When UseMetadata=false, properties are matched by positional index only — source and destination must have identical property layouts.

Edge cases:

  • Hash collision (CheckDuplicatePropName=true, default): throws InvalidOperationException. When false: collision silently ignored — risk of data corruption.
  • Source has unknown property (not in destination): silently skipped via SkipValue(), no error.
  • Destination has extra property (not in source): left at default value (new instance) or unchanged (populate mode).
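
The hash-based matching can be illustrated with the standard 32-bit FNV-1a function. Two points in this Python sketch are assumptions: that FnvHash.ComputeString hashes the UTF-8 bytes of the name, and that an unmatched source property is represented by -1 in the mapping. build_index_mapping is a hypothetical stand-in for GetIndexMapping().

```python
FNV_OFFSET_BASIS = 0x811C9DC5
FNV_PRIME = 0x01000193


def fnv1a_32(text):
    """Standard 32-bit FNV-1a over the UTF-8 bytes of text (assumed encoding)."""
    h = FNV_OFFSET_BASIS
    for byte in text.encode("utf-8"):
        h ^= byte
        h = (h * FNV_PRIME) & 0xFFFFFFFF
    return h


def build_index_mapping(source_hashes, dest_names):
    """Map each source property index to a destination index, or -1 to skip."""
    dest_by_hash = {}
    for index, name in enumerate(dest_names):
        h = fnv1a_32(name)
        if h in dest_by_hash:
            # Mirrors the CheckDuplicatePropName=true behaviour described above.
            raise ValueError("hash collision between property names")
        dest_by_hash[h] = index
    return [dest_by_hash.get(h, -1) for h in source_hashes]
```

If the source wrote Id, Legacy, Name and the destination declares Name, Id, Extra, the mapping is [1, -1, 0]: Legacy is skipped and Extra keeps its default value.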

Configuration Options

Options defined in AcBinarySerializerOptions (inherits AcSerializerOptions). Each option controls which code paths execute and how the wire format changes.

WireMode

| Value | Integers | Strings | Output size | Speed |
|---|---|---|---|---|
| Compact (default) | VarInt/VarUInt (1–5 bytes) | UTF-8 with speculative ASCII fast path | Smaller | Slightly slower |
| Fast | Fixed-width raw bytes (4/8 bytes) | UTF-16 memcpy (charCount * 2 bytes) | Larger | Fastest encode/decode |

Format difference for strings:

  • Compact: [VarUInt byteLength] [UTF-8 bytes] — speculative ASCII (1 pass if all ASCII, rewind+UTF-8 fallback otherwise)
  • Fast: [VarUInt charCount] [raw UTF-16 bytes] — zero-encoding memcpy
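
The two string payloads can be compared side by side. In this Python sketch the marker byte is omitted, little-endian UTF-16 is assumed for Fast mode (the spec only says "raw UTF-16 bytes"), and the function names are hypothetical.

```python
def encode_varuint(value):
    """LEB128 encoding, as in the VarUInt section."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def compact_string_payload(text):
    """Compact mode: length in bytes, then UTF-8 data."""
    data = text.encode("utf-8")
    return encode_varuint(len(data)) + data


def fast_string_payload(text):
    """Fast mode: length in UTF-16 code units, then the raw code units."""
    data = text.encode("utf-16-le")      # assumption: little-endian host
    return encode_varuint(len(data) // 2) + data
```

For the five-character string "héllo", the Compact payload is 7 bytes (length 6 plus six UTF-8 bytes) and the Fast payload is 11 bytes (length 5 plus ten UTF-16 bytes).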

Code branch: context.FastWire flag set at context.Reset(). Checked in WriteStringUtf8() and integer write methods. FixStr optimization is skipped in Fast mode (UTF-8 specific).

ReferenceHandling

| Value | Tracked objects | Scan pass | Header flags | Wire markers |
|---|---|---|---|---|
| None | Nothing | Skipped | 0x00 | Standard object markers only |
| OnlyId | IId objects only (by ID value) | Partial | 0x02 | ObjectRefFirst(70) + ObjectRef(65) |
| All (default) | All reference types | Full graph walk | 0x06 | ObjectRefFirst(70) + ObjectRef(65) |

Format impact: When enabled, multi-referenced objects are written once with ObjectRefFirst(70) + VarUInt(refCacheIndex) on first encounter, then replaced by ObjectRef(65) + VarUInt(refCacheIndex) on subsequent encounters. Header HasCacheCount flag is set and cache count written.

Interaction with ThrowOnCircularReference (default: true):

  • true + ref handling enabled: all objects tracked for cycle detection, throws InvalidOperationException on circular reference
  • false + ref handling enabled: only IId types tracked for deduplication, non-IId circular refs silently truncated at MaxDepth

UseMetadata

| Value | Wire markers | Property matching | Overhead |
|---|---|---|---|
| false (default) | FixObj/Object | Positional index only (types must match) | None |
| true | ObjectWithMetadata(69) / ObjectWithMetadataRefFirst(71) | FNV-1a property name hashes | 4 bytes per property per type |

Format impact: When enabled, each type's first occurrence writes [VarUInt hashCount] [FNV-1a hash × N] before properties. Deserializer uses hashes to build source→destination index mapping, enabling cross-type deserialization (different property sets/ordering).

Code branch: context.UseMetadata controls whether ObjectWithMetadata(69) or plain Object(64) markers are used. When false, IsDirectObjectWrite=true allows source-generated writers to bypass WriteObject entirely and inline property writes.

Related: CheckDuplicatePropName (default: true) — throws if FNV-1a hash collision detected between property names of the same type. Disable in production for performance.

UseStringInterning

| Value | Eligible strings | Scan overhead | Wire markers |
|---|---|---|---|
| None | Nothing | None | String(91) / FixStr only |
| Attribute (default) | Properties with [AcStringIntern(true)] | Scans marked properties | StringInternFirst(94) + StringInterned(92) |
| All | All strings within length limits | Scans all strings | StringInternFirst(94) + StringInterned(92) |

Length limits: MinStringInternLength (default: 4) and MaxStringInternLength (default: 64, 0=unlimited). Strings outside this range are always written inline.

Format impact: Interned strings on first occurrence: [StringInternFirst(94)] [VarUInt cacheIndex] [string data]. Subsequent: [StringInterned(92)] [VarUInt cacheIndex] (1–2 bytes vs full string). Single-occurrence strings are never interned, so unique strings carry no overhead.

Code branch: context.StringInternEligible flag set per-property before WriteString. Scan pass builds a WriteDuplicateEntry[] plan; write pass consumes it via cursor.

MaxDepth

| Value | Behavior |
|---|---|
| 255 (default) | Effectively unlimited nesting |
| 0 | Root level only; nested objects/collections written as Null(76) |
| N | Objects deeper than N levels written as Null(76) |

Format impact: Depth-exceeded values appear as Null(76) in the stream — indistinguishable from actual null values. No special marker.

Code branch: Checked at entry of every object/collection write: if (depth > MaxDepth) { WriteByte(Null); return; }.
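
The depth check can be modelled on plain Python containers. This is a sketch under assumptions: the root is at depth 0, both objects (dicts) and collections (lists) increase the depth by one, and the string "Null" stands in for the Null(76) marker. The function name is hypothetical.

```python
NULL = "Null"   # stand-in for the Null(76) marker


def write_value(node, max_depth, depth=0):
    """Return a copy of node with over-deep containers replaced by NULL."""
    if isinstance(node, (dict, list)):
        if depth > max_depth:          # same test as the code branch above
            return NULL
        if isinstance(node, dict):
            return {k: write_value(v, max_depth, depth + 1) for k, v in node.items()}
        return [write_value(v, max_depth, depth + 1) for v in node]
    return node                        # primitives are never truncated
```

With max_depth=0, {"a": 1, "b": {"c": 2}} comes out as {"a": 1, "b": "Null"}: the root's primitive properties survive, but the nested object is cut off and cannot be told apart from a genuine null.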

UseCompression

| Value | Method | Granularity | Memory |
|---|---|---|---|
| None (default) | No compression | | |
| Block | LZ4 single block | Entire payload | Full buffer in memory |
| BlockArray | LZ4 chunked | 64KB chunks | Streaming-friendly, lower peak memory |

Format impact: Compression is applied post-serialization as a transparent wrapper — the inner wire format is unchanged. Both modes are pure managed C# (WASM-compatible, no native dependencies).

Code branch: Applied in AcBinarySerializer.Serialize() after the serialization context produces the raw buffer: if (UseCompression != None) Lz4.Compress(buffer, mode). Decompression is automatic on deserialize.

PropertyFilter

Optional delegate BinaryPropertyFilter? (default: null). When set, invoked for each property to decide inclusion.

delegate bool BinaryPropertyFilter(in BinaryPropertyFilterContext context);

BinaryPropertyFilterContext fields: DeclaringType, PropertyName, PropertyType, Instance (null during metadata phase), IsMetadataPhase, GetValue() (lazy).

Format impact: Excluded properties are completely absent from the stream — no marker, no placeholder. The deserializer must use UseMetadata=true or identical filter to correctly match property indices.

Code branch: context.HasPropertyFilter checked in ShouldSerializeProperty(). Called twice: once during metadata registration (Instance=null), once during write phase.

PropertyMapper

Optional delegate PropertyMapperDelegate? (default: null) for cross-type deserialization property remapping.

delegate PropertyInfo? PropertyMapperDelegate(PropertyInfo sourceProperty, Type destinationType);

Purpose: Maps properties between different class hierarchies (renamed properties, external DTOs). Result is cached — zero overhead on same-type operations (Deserialize<T>).

WASM Options

| Option | Default | Purpose |
|---|---|---|
| IsWasm | OperatingSystem.IsBrowser() | Auto-detect WASM environment |
| UseStringCaching | follows IsWasm | Cache short strings during deserialization to reduce GC pressure |
| MaxCachedStringLength | 64 | Max string length to cache |

Format impact: None — these are deserialization-only optimizations. When UseStringCaching=true, the deserializer maintains an intern cache for strings ≤ MaxCachedStringLength chars. Disabled automatically when StringInternFirst marker is encountered (interning takes precedence).

Other Options

| Option | Type | Default | Purpose |
|---|---|---|---|
| UseGeneratedCode | bool | true | Use source-generated writers/readers when available |
| InitialBufferCapacity | int | 4096 | Starting buffer size (bytes) for serialization output |
| RemoveOrphanedItems | bool | false | During PopulateMerge: remove destination collection items with no matching source ID |
| UseAsync | bool | false | Async context pool return via ThreadPool. Auto-disabled in WASM and when ReferenceHandling=None |
| MaxContextPoolSize | int | 8 | Max serialization contexts kept in pool |

Presets

| Preset | WireMode | Metadata | StringInterning | RefHandling | MaxDepth | Compression | Other |
|---|---|---|---|---|---|---|---|
| Default | Compact | false | Attribute | All | 255 | None | |
| FastMode | Compact | false | None | None | 255 | None | No scan pass |
| ShallowCopy | Compact | false | None | None | 0 | None | Root level only |
| WasmOptimized | Compact | false | Attribute | All | 255 | None | +StringCaching |
| WithoutReferenceHandling | Compact | false | Attribute | None | 255 | None | No scan pass |
| WithoutMetadata | Compact | false | Attribute | All | 255 | None | |

Performance implication of presets:

  • Default / WasmOptimized — two-phase (scan + serialize) due to ReferenceHandling=All
  • FastMode / ShallowCopy — single-phase (no scan pass) since both interning and refs are disabled
  • The scan pass adds ~20-30% overhead; disable it when the object graph is a simple tree

Option Interactions

Key interdependencies that affect which code branches execute:

| Combination | Effect |
|---|---|
| ReferenceHandling=None + UseStringInterning=None | No scan pass: fastest path, single-phase serialization |
| ReferenceHandling=All + UseMetadata=true | Uses the ObjectWithMetadataRefFirst(71) marker (combined ref + metadata) |
| UseMetadata=false + UseGeneratedCode=true | IsDirectObjectWrite=true: generated code inlines property writes, bypasses WriteObject |
| UseMetadata=true + PropertyFilter set | Filter invoked twice (metadata phase + write phase); filter results must be stable |
| WireMode=Fast + UseStringInterning!=None | Interned strings still use the fast string path (UTF-16 for first occurrence, VarUInt index for subsequent) |
| UseCompression!=None + any other option | Compression is orthogonal: applied post-serialization, inner format unchanged |