AcBinary Wire Format

Complete wire format specification for the AcBinary serializer. Source of truth: Serializers/Binaries/BinaryTypeCode.cs.

Features (interning, ref tracking, property ordering): BINARY_FEATURES.md | Options/presets: BINARY_OPTIONS.md | Implementation (zero-alloc, buffer management): BINARY_IMPLEMENTATION.md | SGen architecture: BINARY_SGEN.md

Stream Layout

[version : 1 byte]  [flags : 1 byte]  [cacheCount : VarUInt?]  [payload...]
  • version: FormatVersion = 1 (current).
  • flags: see Header Flags.
  • cacheCount: present only when HeaderFlag_HasCacheCount is set. Number of type wrapper slots used by the serializer.

Header Flags

The flags byte uses 0xB0 (176) as base with bit flags in the lower nibble. (Moved from 0x90 / 144 to make codepoints 135-167 contiguous for the FixStrAscii / StringAscii string-marker block.)

| Bit | Mask | Flag | Meaning |
|---|---|---|---|
| 0 | 0x01 | Metadata | Property hash metadata included (cross-type deserialization) |
| 1 | 0x02 | RefHandling_OnlyId | Reference tracking for IId objects only |
| 2 | 0x04 | RefHandling_All | Reference tracking for all objects (always combined with bit 1) |
| 3 | 0x08 | HasCacheCount | VarUInt cache count follows the flags byte |

Reference handling modes: None = 0x00, OnlyId = 0x02, All = 0x06 (bits 1+2).
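
As a rough illustration of the flag layout above, the following Python sketch decodes a flags byte. The function name `describe_flags`, the shape of the returned dictionary, and the decision to reject bytes that do not match the documented layout are assumptions made for this example; they are not taken from the AcBinary source.

```python
FLAGS_BASE = 0xB0             # documented base value of the flags byte (176)
FLAG_METADATA = 0x01          # bit 0
FLAG_REF_ONLY_ID = 0x02       # bit 1
FLAG_REF_ALL = 0x04           # bit 2 (documented as always combined with bit 1)
FLAG_HAS_CACHE_COUNT = 0x08   # bit 3


def describe_flags(flags: int) -> dict:
    """Split a flags byte into its documented parts (hypothetical helper)."""
    if flags & 0xF0 != FLAGS_BASE:
        # Assumption: a reader would reject an unexpected upper nibble.
        raise ValueError(f"unexpected flags base: {flags:#04x}")
    ref_bits = flags & (FLAG_REF_ONLY_ID | FLAG_REF_ALL)
    if ref_bits == 0x00:
        ref_handling = "None"
    elif ref_bits == 0x02:
        ref_handling = "OnlyId"
    elif ref_bits == 0x06:
        ref_handling = "All"
    else:
        # Bit 2 without bit 1 is not a documented mode.
        raise ValueError("RefHandling_All set without RefHandling_OnlyId")
    return {
        "metadata": bool(flags & FLAG_METADATA),
        "ref_handling": ref_handling,
        "has_cache_count": bool(flags & FLAG_HAS_CACHE_COUNT),
    }
```

For example, `describe_flags(0xB6)` reports reference handling `All` with no metadata and no cache count.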

Variable-Length Encoding

VarUInt (unsigned)

LEB128: 7 data bits per byte, MSB = continuation flag.

value < 128       → 1 byte   [0xxxxxxx]
value < 16384     → 2 bytes  [1xxxxxxx] [0xxxxxxx]
value < 2097152   → 3 bytes  ...
(max 5 bytes for uint32)
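
The size thresholds above follow directly from standard unsigned LEB128, sketched below in Python. The helper names `encode_varuint` and `decode_varuint` are hypothetical, and the sketch assumes the usual LEB128 ordering (least significant 7-bit group first); it is not the serializer's actual code.

```python
from typing import Tuple


def encode_varuint(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("VarUInt cannot encode negative values")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # 7 data bits, continuation bit set
        value >>= 7
    out.append(value)                      # final byte, continuation bit clear
    return bytes(out)


def decode_varuint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode one VarUInt; return (value, offset just past the last byte)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, offset
        shift += 7
```

Because Python integers are unbounded, the same two functions also cover the 64-bit case; a fixed-width implementation would additionally cap the byte count (5 for uint32, 10 for uint64).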

VarInt (signed)

ZigZag encoding maps signed to unsigned, then LEB128:

encode: (value << 1) ^ (value >> 31)
decode: (raw >> 1) ^ -(raw & 1)

Maps: 0 → 0, -1 → 1, 1 → 2, -2 → 3, etc.
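
The two formulas can be checked with a small Python sketch. The names `zigzag_encode32` and `zigzag_decode32` are hypothetical; the explicit 32-bit mask is needed only because Python integers are unbounded, whereas a fixed-width `int` wraps on its own.

```python
def zigzag_encode32(value: int) -> int:
    """Map a signed 32-bit integer to an unsigned 32-bit integer."""
    if not -(1 << 31) <= value < (1 << 31):
        raise ValueError("value does not fit in int32")
    # Python's >> on a negative int is an arithmetic shift, matching the formula.
    return ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF


def zigzag_decode32(raw: int) -> int:
    """Inverse of zigzag_encode32."""
    return (raw >> 1) ^ -(raw & 1)
```

The encoded value is then written as a VarUInt, so small magnitudes of either sign stay within one byte.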

VarULong (unsigned 64-bit)

Same LEB128 encoding, max 10 bytes for uint64.

Type Markers

All markers defined in BinaryTypeCode.cs. SlotCount = 64.

FixObj (0-63)

Single-byte object type. The marker byte is the type slot index — no additional type identifier needed.

[FixObj(N)]  [properties...]

Slot allocation: Slots 0-63 are reserved for runtime polymorphic types and are assigned dynamically on first encounter. Source-generated (SGen) types receive slots from 64 upward via AllocateWrapperSlot() (sequential, Interlocked.Increment). SGen slots are compile-time stable; runtime slots depend on serialization order.

Complex Types (64-71)

| Code | Name | Wire format |
|---|---|---|
| 64 | Object | [64] [VarUInt typeIndex] [properties...] |
| 65 | ObjectRef | [65] [VarUInt refCacheIndex] |
| 66 | Array | [66] [VarUInt count] [elements...] |
| 67 | Dictionary | [67] [VarUInt count] [key, value pairs...] |
| 68 | ByteArray | [68] [VarUInt length] [raw bytes] |
| 69 | ObjectWithMetadata | [69] [VarUInt typeIndex] [VarUInt hashCount] [hashes...] [properties...] |
| 70 | ObjectRefFirst | [70] [VarUInt refCacheIndex] [object body...] |
| 71 | ObjectWithMetadataRefFirst | [71] [VarUInt refCacheIndex] [metadata + properties...] |

Polymorphic Types (72-75)

Used when runtime type differs from declared property type and UseMetadata=false.

| Code | Name | Wire format | Notes |
|---|---|---|---|
| 72 | ObjectWithTypeName | [72] [UTF8 typeName] [inner marker] [body...] | prefix, inner Object/Array/Dict follows |
| 73 | ObjectWithTypeNameRefFirst | [73] [UTF8 typeName] [VarUInt refCacheIndex] [properties...] | combined, no inner marker |
| 74 | ObjectWithTypeIndex | [74] [VarUInt typeIndex] [inner marker] [body...] | prefix |
| 75 | ObjectWithTypeIndexRefFirst | [75] [VarUInt typeIndex] [VarUInt refCacheIndex] [properties...] | combined |

Second occurrence of a referenced polymorphic object uses plain ObjectRef(65) — no polymorphic prefix needed.

Primitives (76-90)

| Code | Name | Wire format |
|---|---|---|
| 76 | Null | [76] (no payload) |
| 77 | True | [77] (no payload) |
| 78 | False | [78] (no payload) |
| 79 | Int8 | [79] [1 byte] |
| 80 | UInt8 | [80] [1 byte] |
| 81 | Int16 | [81] [VarInt] |
| 82 | UInt16 | [82] [VarUInt] |
| 83 | Int32 | [83] [VarInt] |
| 84 | UInt32 | [84] [VarUInt] |
| 85 | Int64 | [85] [VarLong] |
| 86 | UInt64 | [86] [VarULong] |
| 87 | Float32 | [87] [4 bytes IEEE 754] |
| 88 | Float64 | [88] [8 bytes IEEE 754] |
| 89 | Decimal | [89] [16 bytes] |
| 90 | Char | [90] [VarUInt] |

Strings (91-94, 167)

| Code | Name | Wire format | Notes |
|---|---|---|---|
| 91 | String | [91] [VarUInt byteLength] [UTF-8 bytes] | generic UTF-8 (any content) |
| 92 | StringInterned | [92] [VarUInt cacheIndex] | 2nd+ occurrence |
| 93 | StringEmpty | [93] | no payload |
| 94 | StringInternFirst | [94] [VarUInt cacheIndex] [VarUInt byteLength] [UTF-8 bytes] | 1st occurrence |
| 167 | StringAscii | [167] [VarUInt byteLength] [ASCII bytes] | pure ASCII (every byte < 0x80); reader byte→char widens, no UTF-8 decode |

The writer detects ASCII by checking bytesWritten == charLength after a single-pass UTF-8 encode: every UTF-16 char below 0x80 produces exactly 1 UTF-8 byte, while every non-ASCII char contributes at least 2 bytes (a surrogate pair encodes to 4 bytes for 2 chars), so the counts match only for pure-ASCII input. The writer then emits StringAscii (167) or String (91) accordingly. The reader treats the marker as the ASCII-validity contract: StringAscii bypasses UTF-8 decode entirely.
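
The byte-count comparison described above can be illustrated in Python. The helper `pick_long_string_marker` is hypothetical and covers only the choice between String (91) and StringAscii (167); interning, StringEmpty and the FixStr variants are ignored. Since a Python `str` counts code points rather than UTF-16 code units, the sketch derives the UTF-16 length explicitly to mirror a .NET `string.Length`.

```python
STRING = 91
STRING_ASCII = 167


def pick_long_string_marker(text: str) -> int:
    """Choose between String and StringAscii using the byte-count check."""
    bytes_written = len(text.encode("utf-8"))
    # Number of UTF-16 code units, i.e. what .NET reports as string.Length.
    char_length = len(text.encode("utf-16-le")) // 2
    return STRING_ASCII if bytes_written == char_length else STRING
```

A single accented letter is enough to tip the result: `"hello"` selects 167, while `"héllo"` selects 91 because `é` takes two UTF-8 bytes for one UTF-16 char.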

Date/Time (95-98)

| Code | Name | Wire format |
|---|---|---|
| 95 | DateTime | [95] [8 bytes ticks] |
| 96 | DateTimeOffset | [96] [8 bytes ticks] [VarInt offsetMinutes] |
| 97 | TimeSpan | [97] [VarLong ticks] |
| 98 | Guid | [98] [16 bytes] |

Other Markers

| Code | Name | Wire format |
|---|---|---|
| 99 | Enum | [99] [VarInt underlyingValue] |
| 100 | MetadataHeader | Legacy: implies RefHandling=true + metadata present |
| 101 | NoMetadataHeader | Legacy: implies RefHandling=true, no metadata |
| 102 | PropertySkip | [102], marks a skipped property (default/null value) |

FixStr (103-134): short UTF-8 strings

Short strings (any UTF-8 content) encoded in a single marker byte + raw UTF-8 bytes (no length prefix):

[FixStrBase + byteLength]  [UTF-8 bytes]
  • Length range: 0-31 bytes (FixStrBase=103, FixStrMax=134)
  • Saves 1 byte vs String marker + VarUInt length
  • Content semantics: UTF-8 (may contain multi-byte sequences for non-ASCII chars)
  • Reader decodes via the general-purpose UTF-8 decode path

FixStrAscii (135-166): short ASCII strings

Short ASCII-only strings encoded in a single marker byte + raw ASCII bytes:

[FixStrAsciiBase + byteLength]  [ASCII bytes]
  • Length range: 0-31 bytes = chars (1:1 for ASCII) (FixStrAsciiBase=135, FixStrAsciiMax=166)
  • Same wire size as FixStr (1 marker byte + bytes), but the marker IS the ASCII-validity contract
  • Reader byte→char widens directly (Encoding.Latin1.GetString SIMD-accelerated path) — no UTF-8 decode, no run-time Ascii.IsValid scan
  • Writer chooses between FixStrAscii and FixStr post-encode via bytesWritten == charLength
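
To make the two short-string layouts concrete, here is a minimal Python sketch of a writer and reader for them. The function names `write_short_string` and `read_short_string` are hypothetical. Whether the real writer emits StringEmpty (93) instead of a zero-length FixStrAscii for `""` is not stated in this document, so the sketch simply follows the 0-31 length range.

```python
from typing import Tuple

FIX_STR_BASE, FIX_STR_MAX = 103, 134
FIX_STR_ASCII_BASE, FIX_STR_ASCII_MAX = 135, 166


def write_short_string(text: str) -> bytes:
    """Encode a string of at most 31 UTF-8 bytes as FixStr or FixStrAscii."""
    payload = text.encode("utf-8")
    if len(payload) > 31:
        raise ValueError("too long for FixStr / FixStrAscii")
    utf16_units = len(text.encode("utf-16-le")) // 2
    # Post-encode check: byte count equals char count only for pure ASCII.
    is_ascii = len(payload) == utf16_units
    base = FIX_STR_ASCII_BASE if is_ascii else FIX_STR_BASE
    return bytes([base + len(payload)]) + payload


def read_short_string(data: bytes, offset: int = 0) -> Tuple[str, int]:
    """Decode one FixStr / FixStrAscii value; return (text, next offset)."""
    marker = data[offset]
    if FIX_STR_BASE <= marker <= FIX_STR_MAX:
        length, codec = marker - FIX_STR_BASE, "utf-8"
    elif FIX_STR_ASCII_BASE <= marker <= FIX_STR_ASCII_MAX:
        # The marker guarantees ASCII, so a byte-to-char widen is enough.
        length, codec = marker - FIX_STR_ASCII_BASE, "latin-1"
    else:
        raise ValueError(f"not a short-string marker: {marker}")
    start = offset + 1
    return data[start:start + length].decode(codec), start + length
```

The `latin-1` decode plays the role of the byte-to-char widening: for bytes below 0x80 it yields the same characters as ASCII without any validation pass.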

Codepoints 168-175 are reserved for future string-related markers (e.g., compressed / base64 / mixed-ASCII variants), keeping the 91-167 range a single contiguous string-marker block.
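
The marker ranges given in this document can be gathered into one lookup table, which is a convenient way to see how a reader might dispatch on the first byte. The sketch below is illustrative only: the group labels and the function `classify_marker` are made up for this example, and codes 176-191 are reported as unspecified because this document does not describe payload markers in that range.

```python
# (first code, last code, label) for each range described in this document.
MARKER_RANGES = [
    (0, 63, "FixObj"),
    (64, 71, "Complex"),
    (72, 75, "Polymorphic"),
    (76, 90, "Primitive"),
    (91, 94, "String"),
    (95, 98, "DateTime/Guid"),
    (99, 102, "Other"),
    (103, 134, "FixStr"),
    (135, 166, "FixStrAscii"),
    (167, 167, "StringAscii"),
    (168, 175, "Reserved"),
    (192, 255, "TinyInt"),
]


def classify_marker(marker: int) -> str:
    """Return the label of the range containing a marker byte."""
    if not 0 <= marker <= 255:
        raise ValueError("marker must be a single byte")
    for first, last, label in MARKER_RANGES:
        if first <= marker <= last:
            return label
    return "Unspecified"
```

A production reader would more likely use a 256-entry jump table or a switch; the linear scan here only keeps the ranges readable.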

TinyInt (192-255)

Single-byte integer encoding for small values:

value = marker - 192 - 16    (range: -16 to 47)
marker = value + 16 + 192    (64 values total)

Saves 1 byte for frequently occurring small integers: Int32 (83) + VarInt takes 2 bytes for any value in this range, while TinyInt takes 1.
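
The TinyInt mapping is simple enough to verify with a few lines of Python. The names `encode_tiny_int` and `decode_tiny_int` are hypothetical, and returning `None` for out-of-range values is an assumption standing in for whatever fallback the real writer uses.

```python
from typing import Optional

TINY_INT_BASE = 192   # first TinyInt marker
TINY_INT_BIAS = 16    # shifts the range so that -16 maps to the first marker


def encode_tiny_int(value: int) -> Optional[int]:
    """Return the TinyInt marker for value, or None if it does not fit."""
    if -16 <= value <= 47:
        return value + TINY_INT_BIAS + TINY_INT_BASE
    return None  # caller would fall back to another integer marker


def decode_tiny_int(marker: int) -> int:
    """Recover the integer value from a TinyInt marker byte."""
    if not 192 <= marker <= 255:
        raise ValueError(f"not a TinyInt marker: {marker}")
    return marker - TINY_INT_BASE - TINY_INT_BIAS
```

Under this mapping the value 0 is the single byte 208, and the two ends of the range, -16 and 47, map to 192 and 255.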