AcBinary Features

Advanced serialization features built on top of the wire format. For core type markers and encoding see BINARY_FORMAT.md. For configuration options and presets see BINARY_OPTIONS.md. For internal architecture and memory management see BINARY_IMPLEMENTATION.md. For source generation details see BINARY_SGEN.md.

Compact Encoding Selection

The serializer applies compact encodings automatically:

| Data | Condition | Encoding | Savings |
|---|---|---|---|
| Integer | -16 ≤ v ≤ 47 | TinyInt (1 byte) | 2-5 bytes |
| String | ≤31 bytes, ASCII | FixStr (1+N bytes) | 1 byte (no length prefix) |
| Object | type index < 64 | FixObj (1 byte) | 1-5 bytes (no VarUInt index) |
| String | empty | StringEmpty (1 byte) | 1+ bytes |
| Bool | | True/False (1 byte) | no payload |
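
The string and object rows of the table can be read as a simple decision procedure. The sketch below is illustrative Python, not the library's C# code; the function names and the fallback name "Object" are hypothetical, and the Integer and Bool rows are left out.

```python
def choose_string_encoding(value: str) -> str:
    """Return the name of the encoding the table would select for a string."""
    data = value.encode("utf-8")
    if len(data) == 0:
        return "StringEmpty"          # 1 byte, no length and no payload
    if len(data) <= 31 and value.isascii():
        return "FixStr"               # 1 + N bytes, length folded into the marker
    return "String"                   # general form with an explicit length


def choose_object_encoding(type_index: int) -> str:
    """Return "FixObj" for small type indices, else a hypothetical general form."""
    # "Object" is a placeholder name; the general marker is not named in this section.
    return "FixObj" if 0 <= type_index < 64 else "Object"
```

The empty-string check comes first because an empty string would otherwise also satisfy the FixStr condition.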

String Interning Protocol

Controls deduplication of repeated string values.

Modes (StringInterningMode):

  • None — all strings inline, no overhead
  • Attribute — only [AcStringIntern] properties interned (default)
  • All — all strings within length limits interned

Length limits: MinStringInternLength=4, MaxStringInternLength=64 (configurable).

Wire protocol:

  1. Serializer pre-scans all eligible strings to build a plan (which strings repeat)
  2. First occurrence: [StringInternFirst(94)] [VarUInt cacheIndex] [VarUInt byteLength] [UTF-8 bytes]
  3. Subsequent: [StringInterned(92)] [VarUInt cacheIndex]
  4. Single-occurrence strings: written as normal String/FixStr (no interning overhead)
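
The four steps above can be sketched as follows. This is illustrative Python under stated assumptions, not the library's implementation: VarUInt is assumed to use 7 data bits per byte with a continuation bit, the length limits are assumed to count characters, and plain strings are emitted as a placeholder tuple because the String/FixStr markers are defined in BINARY_FORMAT.md rather than here. The names encode_varuint and write_strings are hypothetical.

```python
from collections import Counter

STRING_INTERN_FIRST = 94  # marker value from the wire protocol above
STRING_INTERNED = 92      # marker value from the wire protocol above


def encode_varuint(value: int) -> bytes:
    """Assumed VarUInt layout: 7 data bits per byte, high bit set on all but the last."""
    if value < 0:
        raise ValueError("VarUInt cannot encode negative values")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def write_strings(strings, min_len=4, max_len=64):
    """Return one wire token per input string, interning the repeated ones."""
    def eligible(s):
        return min_len <= len(s) <= max_len  # assumption: limits count characters

    # Step 1: pre-scan to find which eligible strings repeat.
    counts = Counter(s for s in strings if eligible(s))
    cache_index = {}
    tokens = []
    for s in strings:
        if eligible(s) and counts[s] > 1:
            if s not in cache_index:
                # Step 2: the first occurrence carries the index and the payload.
                cache_index[s] = len(cache_index)
                data = s.encode("utf-8")
                tokens.append(bytes([STRING_INTERN_FIRST])
                              + encode_varuint(cache_index[s])
                              + encode_varuint(len(data)) + data)
            else:
                # Step 3: later occurrences carry only the index.
                tokens.append(bytes([STRING_INTERNED]) + encode_varuint(cache_index[s]))
        else:
            # Step 4: single-occurrence or ineligible strings stay plain (placeholder).
            tokens.append(("String", s))
    return tokens
```

For example, write_strings(["alpha", "beta", "alpha"]) interns "alpha" (it repeats) and leaves "beta" as a plain string, so the third token is only two bytes long.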

Reference Tracking

Prevents infinite loops and preserves object identity for repeated references.

Modes (ReferenceHandlingMode):

  • None — no tracking (fastest, use when graph is a tree)
  • OnlyId — track only IId objects (matched by ID value)
  • All — track all reference types (two-phase scan required)

Two-phase process (a scan pass and a serialize pass, with a sort step in between):

  1. Scan pass (ScanPass.cs) — walks the object graph, detects multi-referenced objects and repeated strings. Builds a WriteDuplicateEntry[] array (the "write plan") containing VisitIndex, CacheMapIndex, IsFirst, and Value for each duplicate.
  2. Sort — write plan entries are sorted by VisitIndex to match the write pass traversal order.
  3. Serialize pass — consumes the sorted write plan via TryConsumeWritePlanEntry(). A cursor (_nextWritePlanVisitIndex) advances through the plan in O(1) — no dictionary lookups during serialization.
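
A minimal sketch of the scan, sort, and cursor-driven serialize steps, written in Python for illustration. It assumes a simplified graph in which every node (including arrays) is a Node with children, that both passes use the same pre-order traversal, and that repeat visits are not descended into. Node, scan_pass, and serialize_pass are hypothetical names; only the WriteDuplicateEntry fields come from the text above.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality, like reference types
class Node:
    name: str
    children: list = field(default_factory=list)


@dataclass
class WriteDuplicateEntry:
    visit_index: int
    cache_map_index: int
    is_first: bool
    value: object


def scan_pass(root):
    """Walk the graph once and return the write plan sorted by visit index."""
    visits = {}                      # id(node) -> (node, list of visit indices)
    counter = 0
    stack = [root]
    while stack:
        node = stack.pop()
        index = counter
        counter += 1
        seen = visits.get(id(node))
        if seen is not None:
            seen[1].append(index)    # repeat visit: record it, do not descend again
            continue
        visits[id(node)] = (node, [index])
        stack.extend(reversed(node.children))  # keep left-to-right order
    plan = []
    cache_map_index = 0
    for node, indices in visits.values():
        if len(indices) > 1:         # only multi-referenced objects enter the plan
            for n, visit_index in enumerate(indices):
                plan.append(WriteDuplicateEntry(visit_index, cache_map_index, n == 0, node))
            cache_map_index += 1
    plan.sort(key=lambda entry: entry.visit_index)
    return plan


def serialize_pass(root, plan):
    """Replay the same traversal, consuming the plan with a single cursor."""
    tokens = []
    cursor = 0                       # position of the next unconsumed plan entry
    counter = 0                      # visit index, advanced exactly as in scan_pass
    stack = [root]
    while stack:
        node = stack.pop()
        index = counter
        counter += 1
        entry = None
        if cursor < len(plan) and plan[cursor].visit_index == index:
            entry = plan[cursor]     # constant-time check, no dictionary lookup
            cursor += 1
        if entry is not None and not entry.is_first:
            tokens.append(("ObjectRef", entry.cache_map_index))
            continue                 # body was already written at first occurrence
        if entry is not None:
            tokens.append(("ObjectRefFirst", entry.cache_map_index, node.name))
        else:
            tokens.append(("Object", node.name))
        stack.extend(reversed(node.children))
    return tokens
```

Because the plan is sorted by visit index and both passes count visits identically, the serialize pass only ever compares against the next plan entry. Skipping the children of a repeat visit in both passes is what keeps the indices aligned and also terminates cycles.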

Wire protocol:

  • First occurrence: [ObjectRefFirst(70)] [VarUInt refCacheIndex] [object body...]
  • Subsequent: [ObjectRef(65)] [VarUInt refCacheIndex]

Example — same object referenced twice:

Input:  { Users: [userA, userA] }   (same instance)

Scan pass → WritePlan:
  [{VisitIndex:2, CacheMapIndex:0, IsFirst:true},
   {VisitIndex:3, CacheMapIndex:0, IsFirst:false}]

Wire output (Compact mode, ReferenceHandling=All):
  [version=1] [flags=0x96]  [VarUInt cacheCount=1]     ← header
  [FixObj(0)]                                           ← root object
    [Array(66)] [VarUInt(2)]                            ← Users array, 2 elements
      [ObjectRefFirst(70)] [VarUInt(0)] [props...]      ← userA, 1st occurrence
      [ObjectRef(65)] [VarUInt(0)]                      ← userA, 2nd (2 bytes only)
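
The reading side is not described in this section. One plausible way to restore identity from the two markers is sketched below in Python, assuming the reader keeps a cache keyed by refCacheIndex. The token tuples and the function name read_objects are hypothetical stand-ins for the decoded byte stream.

```python
def read_objects(tokens):
    """Rebuild a list of objects from (marker, ...) tokens, reusing cached instances."""
    ref_cache = {}                     # refCacheIndex -> materialized object
    result = []
    for token in tokens:
        marker = token[0]
        if marker == "ObjectRefFirst":
            _, ref_index, props = token
            obj = dict(props)          # stand-in for materializing the object body
            ref_cache[ref_index] = obj # register so later references resolve to it
            result.append(obj)
        elif marker == "ObjectRef":
            _, ref_index = token
            result.append(ref_cache[ref_index])  # same instance, not a copy
        else:
            raise ValueError(f"unexpected marker {marker!r}")
    return result
```

Feeding it the two userA tokens from the example yields a list whose two elements are the same instance, which mirrors the input graph.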

Hybrid Execution Model (Runtime vs Source Generated)

Two execution modes, seamlessly interoperable in a single serialization run:

| Mode | Trigger | Property access | When to use |
|---|---|---|---|
| SGen | [AcBinarySerializable] + UseGeneratedCode=true | Unsafe.As<T> direct | Hot-path types |
| Runtime | No attribute or UseGeneratedCode=false | Compiled delegates | 3rd-party types, fallback |

SGen root types use a fast path that skips the full dispatch chain (roughly 12 calls reduced to 3 checks). SGen children call directly into other SGen writers; non-SGen children fall back to the runtime path via bridge methods. The wire format is identical regardless of mode.
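
The dispatch idea can be illustrated with a small Python sketch: a registry of per-type writers stands in for generated code, and anything without an entry goes through a reflection-based fallback. This is an analogy under stated assumptions, not the SGen mechanism itself; GENERATED_WRITERS, generated_writer, write_runtime, and write_value are all hypothetical names.

```python
GENERATED_WRITERS = {}   # type -> writer function, filled by the decorator below


def generated_writer(cls):
    """Register a hand-written stand-in for a source-generated writer."""
    def register(func):
        GENERATED_WRITERS[cls] = func
        return func
    return register


def write_runtime(obj, out):
    """Fallback path: discover properties by reflection, sorted by name."""
    out.append(("Object", type(obj).__name__))
    for name in sorted(vars(obj)):
        write_value(getattr(obj, name), out)


def write_value(value, out, use_generated_code=True):
    """Dispatch to a generated writer when one exists, else to the runtime path."""
    if value is None or isinstance(value, (int, str, bool)):
        out.append(("Value", value))
        return
    writer = GENERATED_WRITERS.get(type(value)) if use_generated_code else None
    if writer is not None:
        writer(value, out)
    else:
        write_runtime(value, out)


class Address:            # no generated writer: handled by the runtime path
    def __init__(self, city):
        self.city = city


class User:
    def __init__(self, name, address):
        self.address = address
        self.name = name


@generated_writer(User)
def write_user(user, out):
    out.append(("Object", "User"))
    write_value(user.address, out)   # bridge: this child has no generated writer
    write_value(user.name, out)
```

Writing the same User with use_generated_code=True and with use_generated_code=False produces the same token list, which is the property the text describes for the wire format.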

For the full SGen architecture, bridge methods, generated code patterns, and wrapper slots, see BINARY_SGEN.md.

Property Ordering

Properties are serialized in a deterministic order defined by TypeMetadataBase.GetUnfilteredProperties():

  1. Walk the inheritance chain from derived → base (currentType.BaseType loop)
  2. At each level, collect declared public instance properties
  3. Sort alphabetically (StringComparer.Ordinal) within each level
  4. Result: base properties first, then derived (the reverse of the walk order), alphabetical within each level

This order is stable between serializer and deserializer as long as the type hierarchy and its property set do not change.
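
The ordering rule can be sketched in Python using class inheritance and property objects as a stand-in for .NET reflection. The function ordered_properties and the example classes are hypothetical. Python compares strings by code point, which matches an ordinal comparison for the ASCII names used here.

```python
def ordered_properties(cls):
    """Return property names base-first, sorted by code point within each level."""
    levels = []
    current = cls
    while current is not None and current is not object:
        # Collect only the properties declared at this level, not inherited ones.
        declared = [name for name, member in vars(current).items()
                    if isinstance(member, property)]
        levels.append(sorted(declared))
        current = current.__base__      # walk derived -> base
    ordered = []
    for level in reversed(levels):      # emit base -> derived
        ordered.extend(level)
    return ordered


class Entity:
    @property
    def Id(self):
        return 1

    @property
    def CreatedAt(self):
        return "2024-01-01"


class User(Entity):
    @property
    def name(self):
        return "ann"

    @property
    def Email(self):
        return "ann@example.com"


print(ordered_properties(User))  # ['CreatedAt', 'Id', 'Email', 'name']
```

Note that "Email" sorts before "name" because an ordinal comparison places uppercase letters before lowercase ones.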

Cross-Type Deserialization (UseMetadata)

When UseMetadata=true, property name hashes (FNV-1a via FnvHash.ComputeString) are written per type, enabling schema evolution:

  • Serializer writes property hashes in the metadata section (ObjectWithMetadata(69))
  • Deserializer builds an index mapping array (GetIndexMapping()) that maps source property indices to destination indices by matching FNV-1a hashes
  • This allows deserialization even when source and destination types have different property sets or ordering

When UseMetadata=false, properties are matched by positional index only — source and destination must have identical property layouts.

Edge cases:

  • Hash collision (CheckDuplicatePropName=true, default): throws InvalidOperationException. When false: collision silently ignored — risk of data corruption.
  • Source has unknown property (not in destination): silently skipped via SkipValue(), no error.
  • Destination has extra property (not in source): left at default value (new instance) or unchanged (populate mode).
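
The hash-based matching and the edge cases above can be sketched as follows. This is illustrative Python under stated assumptions: FnvHash.ComputeString is assumed to be 32-bit FNV-1a over the UTF-8 bytes of the name (the actual width and input encoding are not stated here), ValueError stands in for InvalidOperationException, and None stands in for a default value. The names build_index_mapping and populate are hypothetical.

```python
FNV_OFFSET_BASIS_32 = 0x811C9DC5
FNV_PRIME_32 = 0x01000193


def fnv1a_32(text: str) -> int:
    """32-bit FNV-1a: XOR each byte into the hash, then multiply by the prime."""
    h = FNV_OFFSET_BASIS_32
    for byte in text.encode("utf-8"):
        h ^= byte
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF
    return h


def build_index_mapping(source_hashes, destination_names, check_duplicates=True):
    """Map each source property index to a destination index, or -1 to skip it."""
    by_hash = {}
    for dest_index, name in enumerate(destination_names):
        h = fnv1a_32(name)
        if h in by_hash:
            if check_duplicates:
                raise ValueError(f"hash collision on property {name!r}")
            continue  # collision ignored: the later property becomes unreachable
        by_hash[h] = dest_index
    return [by_hash.get(h, -1) for h in source_hashes]


def populate(source_hashes, source_values, destination_names):
    """Copy source values into a destination layout using the index mapping."""
    mapping = build_index_mapping(source_hashes, destination_names)
    target = {name: None for name in destination_names}  # extras keep the default
    for dest_index, value in zip(mapping, source_values):
        if dest_index >= 0:             # -1 means unknown in destination: skip
            target[destination_names[dest_index]] = value
    return target
```

With a source layout of Age, Name, Legacy and a destination layout of Email, Name, Age, the mapping is [2, 1, -1]: Legacy is skipped and Email keeps its default.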