INDB Semantic Type System
INDB v0.7.0 treats data types not just as storage formats, but as semantic primitives. The type system models the complexity of "Thought Processing" — where data has location, structure, and varying levels of transparency.
This system is strictly enforced in core/types.py and extends beyond standard JSON.
1. The Type Primitives
The core set of types covers the spectrum from physical grounding to abstract privacy.
| Type | Semantic Role | Python Representative | Validation Constraint |
|---|---|---|---|
| blind | The Secret. Opaque, client-encrypted data. | str (Base64) | encoding="base64" |
| binary | The Payload. Raw binary data (images, files, media). | bytes | max_size, mime_type |
| location | The Anchor. Spatial grounding via OSM. | LocationField | lat (-90..90), lon (-180..180), osm_id (optional) |
| struct | The Graph. Nested complexity. | dict | Recursively validated |
| bigint | The Scale. Nanosecond precision/crypto. | int | > 2^53 supported |
| string | The Narrative. Textual content. | str | max_length, regex |
| number | The Value. Quantifiable data. | float, int | min, max |
| null | The Void. Absence of data. | None | - |
2. Conceptual Deep Dive
2.1 The Anchor: location
In INDB, Space is Index. Thoughts do not exist in a vacuum; they are "grounded" in reality.
- Concept: A location field transforms a passive record into an active spatial entity. It allows the Cognitive Engine to answer: "How relevant is this thought to where I am right now?"
- Optimization: Built for OpenStreetMap (OSM). Supports coordinates (lat/lon) and validated references to real-world objects via osm_id — Nodes, Ways, and Relations.
Two formats are accepted:
// Simple string path (virtual namespacing)
{ "location": "books/Machiavelli/ThePrince" }
// Full OSM object (geo-spatial anchor)
{ "location": { "lat": 55.7558, "lon": 37.6173, "osm_id": "node/1234567" } }
OSM tokens in raw_data_anchor use the prefix osm: (e.g. osm:London, UK) and are penalised during fusion scoring to prevent over-clustering on the same real-world location.
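The two accepted formats and the coordinate ranges from the type table can be checked with a few lines of Python. This is a minimal sketch, not the validation in core/types.py: the function name validate_location, the rule that path segments must be non-empty, and the exact osm_id pattern (element type, slash, positive integer) are assumptions made for illustration.

```python
import re
from typing import Union

# Assumed osm_id shape: OSM element type, a slash, then a positive integer.
_OSM_ID = re.compile(r"^(node|way|relation)/[1-9][0-9]*$")


def validate_location(value: Union[str, dict]) -> bool:
    """Return True if `value` matches one of the two accepted formats."""
    if isinstance(value, str):
        # Virtual namespace path: non-empty, with no empty segments.
        return bool(value) and all(value.split("/"))
    if not isinstance(value, dict):
        return False
    lat, lon = value.get("lat"), value.get("lon")
    for coord in (lat, lon):
        # bool is a subclass of int, so it is excluded explicitly.
        if isinstance(coord, bool) or not isinstance(coord, (int, float)):
            return False
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return False
    osm_id = value.get("osm_id")  # optional
    return osm_id is None or (
        isinstance(osm_id, str) and bool(_OSM_ID.match(osm_id))
    )
```

A coordinate-only object such as {"lat": 0, "lon": 0} passes, while an out-of-range latitude or an osm_id with an unknown element type is rejected.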
2.2 The Secret: blind
INDB acknowledges that some thoughts are private. The blind type introduces the concept of Zero-Knowledge Storage.
- Concept: A "Black Box" container. The database guarantees Opacity.
- Nobody without the key can see the data — not even we, the database operators. The server has no decryption key; this is architectural, not policy.
- Server View: Sees only a sealed box (Base64 string). It cannot open it, index its contents, or use it for "Cognitive Processing".
- What we do record: where and when — the fact of presence is fixed. Location and timestamp are mandatory when blind_payload is set.
The Black Box Contract
"You may hide what is inside. You may not hide that you were here."
When blind_payload is present on an event, two fields become mandatory:
| Field | Requirement | Reason |
|---|---|---|
| location | Required | Spatial accountability: the fact of presence is anchored |
| timestamp | Within ±5 min of server time | Replay attack prevention: old captured events are rejected |
This is enforced at the kernel level (routes/events.py). There is no bypass.
Why this matters: Zero-knowledge storage does not mean zero accountability. The content is yours alone. The fact that you were here, when, and where — belongs to the record.
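The contract can be expressed as a small check. This is a sketch under stated assumptions, not the code in routes/events.py: the function name check_blind_contract, the dict-shaped event, and the choice to treat timestamp as integer nanoseconds since the Unix epoch are all illustrative.

```python
import time
from typing import List, Optional

MAX_SKEW_NS = 5 * 60 * 1_000_000_000  # the ±5 min window, in nanoseconds


def check_blind_contract(event: dict, now_ns: Optional[int] = None) -> List[str]:
    """Return contract violations; an empty list means the event passes."""
    if event.get("blind_payload") is None:
        return []  # the contract only applies to blind events
    if now_ns is None:
        now_ns = time.time_ns()  # server clock
    errors = []
    if not event.get("location"):
        errors.append("location is required when blind_payload is set")
    ts = event.get("timestamp")
    if isinstance(ts, bool) or not isinstance(ts, int):
        errors.append("timestamp is required when blind_payload is set")
    elif abs(now_ns - ts) > MAX_SKEW_NS:
        # Too old (possible replay) or too far in the future.
        errors.append("timestamp is outside the allowed window")
    return errors
```

Passing now_ns explicitly keeps the check deterministic in tests; a server would normally omit it and rely on its own clock.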
- Client View: Possesses the key. Sees the raw semantic content.
- Behavior: The JIT Renderer and Emotional Firewall skip blind fields by design. Because the server holds no key, sensitive content cannot leak into derived narratives.
User Story: The "Therapy Safe"
Alice uses an AI Agent to organize her life. She trusts the agent to schedule meetings but wants to store her therapy session notes in the same system without the AI "reading" or "analyzing" them to suggest products. She uses the blind type.
Flow: Zero-Knowledge Lifecycle
- Encryption (Client Side): Alice's client generates a random 32-byte key (kept locally).
- Ingest (Transport): Client sends the blob via POST /api/v2/events. The server sees only the blob.
- Storage (Persistence): INDB stores the event.
  - Location is indexed (so Alice can search "Dr. Smith").
  - blind_payload is stored effectively as a binary blob.
  - Crucially: The "Cognitive Engine" scanning for "Anxiety" sees nothing in the payload. It only sees "Event at Dr. Smith's".
- Retrieval (Access): Alice requests her history via GET /api/v2/events?context=therapy. The server returns the encrypted blob intact.
- Decryption (Client Side): Alice's client uses the local key to unlock the memory.
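The client side of this lifecycle can be sketched as follows. The cipher is deliberately left to the caller (in practice an authenticated cipher from a vetted library), so encrypt and decrypt are hypothetical callables taking (key, data). The helper names and the exact shape of the request body are also assumptions. Only the 32-byte key, the Base64 encoding, and the blind_payload, location, and timestamp fields come from the text above.

```python
import base64
import secrets
from typing import Callable

Cipher = Callable[[bytes, bytes], bytes]  # (key, data) -> data


def new_client_key() -> bytes:
    """Generate a random 32-byte key. It never leaves the client."""
    return secrets.token_bytes(32)


def build_blind_event(plaintext: bytes, key: bytes, encrypt: Cipher,
                      location, timestamp_ns: int) -> dict:
    """Encrypt locally and wrap the ciphertext as a Base64 string."""
    ciphertext = encrypt(key, plaintext)
    return {
        "location": location,          # mandatory for blind events
        "timestamp": timestamp_ns,     # mandatory for blind events
        "blind_payload": base64.b64encode(ciphertext).decode("ascii"),
    }


def open_blind_event(event: dict, key: bytes, decrypt: Cipher) -> bytes:
    """Reverse of build_blind_event, run on the client after retrieval."""
    ciphertext = base64.b64decode(event["blind_payload"])
    return decrypt(key, ciphertext)
```

The dict returned by build_blind_event is what would be serialized into the POST body; the server only ever handles the Base64 string.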
2.3 The Payload: binary
INDB supports raw binary data for media, files, and large payloads.
- Concept: The binary type allows storage of non-textual data (images, videos, audio, documents) alongside structured events.
- Server View: Stores raw bytes efficiently using msgpack binary format.
- Client View: Can upload/download binary data via gRPC streaming or HTTP multipart.
- Use Cases:
- Media Storage: Photos, videos, audio recordings
- Document Attachments: PDFs, Word docs, spreadsheets
- Sensor Data: Raw binary sensor readings, telemetry
- Encrypted Files: Pre-encrypted files stored as opaque blobs
User Story: The "Sensor Logger"
Bob's IoT device captures high-frequency sensor data (accelerometer, gyroscope) at 1000Hz. Instead of converting to JSON (expensive), he sends raw binary packets directly to INDB via gRPC streaming.
Flow: Binary Data Lifecycle
- Capture (Client Side)
- Ingest (gRPC Stream): Client streams binary chunks via gRPC.
- Storage (Autonomous): INDB stores binary data in indb.bin using msgpack's binary type.
  - Efficient: No base64 encoding overhead
  - Searchable: Metadata is indexed, binary is opaque
  - Encrypted: AES-256-GCM protects the entire file
- Retrieval (Streaming): Client requests binary data.
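For the capture step, a device like Bob's might pack each reading into a fixed-size record with Python's struct module instead of building JSON. The wire layout below (a little-endian int64 timestamp in nanoseconds followed by six float32 values) is a hypothetical format chosen for illustration; INDB treats the resulting bytes as opaque either way.

```python
import struct

# Hypothetical record: int64 timestamp (ns), then accelerometer x/y/z and
# gyroscope x/y/z as float32. "<" means little-endian with no padding.
SAMPLE = struct.Struct("<q6f")


def pack_samples(samples) -> bytes:
    """Pack (ts_ns, ax, ay, az, gx, gy, gz) tuples into one binary packet."""
    return b"".join(SAMPLE.pack(*sample) for sample in samples)


def unpack_samples(packet: bytes) -> list:
    """Inverse of pack_samples; the packet length must be a multiple of 32."""
    return list(SAMPLE.iter_unpack(packet))
```

Each record is 32 bytes, so a 1000 Hz stream produces 32,000 bytes of payload per second before any transport framing.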
Key Differences: binary vs blind
- binary: Server knows it's binary, can validate size/type, but doesn't interpret content
- blind: Server doesn't even know what's inside (client-encrypted)
2.4 The Graph: struct
Thoughts are rarely flat key-value pairs. They have depth.
- Concept: struct allows for recursive nesting, modeling complex relationships (e.g., a Memory containing a Context which contains a Trigger).
- Validation: Unlike a generic JSON object, a struct can have a strict internal schema, ensuring that even complex nested thoughts adhere to the system's logic.
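Recursive validation of a struct can be pictured as a walk over a schema tree. The sketch below uses a hypothetical Field dataclass as a stand-in; it is not INDB's FieldSchema, and the strict rejection of unknown keys is an assumption about what "strict internal schema" means.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Field:
    """Hypothetical schema node; `nested` is set only for struct fields."""
    py_type: type
    nested: Optional[Dict[str, "Field"]] = None


def validate(value, field: Field, path: str = "$") -> List[str]:
    """Recursively collect validation errors for `value`."""
    if not isinstance(value, field.py_type):
        return [f"{path}: expected {field.py_type.__name__}"]
    if field.nested is None:
        return []  # leaf field, nothing more to check
    errors = []
    for key, child in field.nested.items():
        if key not in value:
            errors.append(f"{path}.{key}: missing")
        else:
            errors.extend(validate(value[key], child, f"{path}.{key}"))
    # Strict mode (assumed): keys not named in the schema are rejected.
    for key in sorted(value.keys() - field.nested.keys()):
        errors.append(f"{path}.{key}: unexpected field")
    return errors


# A Memory containing a Context which contains a Trigger.
memory_schema = Field(dict, {"context": Field(dict, {"trigger": Field(str)})})
```

Calling validate({"context": {"trigger": 3}}, memory_schema) reports the error at the path $.context.trigger, which shows how the nesting depth is carried through the recursion.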
3. Schema Definition
Schemas define the "Shape of Thought". They are constructed using FieldSchema.
from core.types import FieldType, FieldSchema

# A complex "Secure Journal Entry" schema
journal_schema = FieldSchema(
    name="entry",
    field_type=FieldType.STRUCT,
    nested_schema={
        "timestamp": FieldSchema("ts", FieldType.BIGINT),
        "location": FieldSchema("geo", FieldType.LOCATION),
        # The content is hidden from the DB core
        "content": FieldSchema("data", FieldType.BLIND, constraints={"algo": "chacha20"}),
        "tags": FieldSchema("meta", FieldType.STRING, constraints={"max_length": 10}),
    },
)
4. Type Inference
To simplify ingest, INDB includes an inference engine (infer_type) that guesses the semantic intent of raw data:
- Dictionary with lat/lon keys → FieldType.LOCATION
- Integer > 2^53 → FieldType.BIGINT
- Base64-like string (if configured) → FieldType.BLIND (Context dependent)
- Nested Dictionary → FieldType.STRUCT
- Raw bytes object → FieldType.BINARY
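The rules above can be read as an ordered chain of checks. The following is an illustrative reimplementation, not the actual infer_type: the FieldType enum is a local stand-in (the NUMBER and NULL member names are guesses), the detect_blind flag models "if configured", and treating every non-location dict as a struct is an assumption.

```python
import base64
import binascii
from enum import Enum


class FieldType(Enum):
    """Local stand-in for core.types.FieldType."""
    BLIND = "blind"
    BINARY = "binary"
    LOCATION = "location"
    STRUCT = "struct"
    BIGINT = "bigint"
    STRING = "string"
    NUMBER = "number"
    NULL = "null"


def _looks_like_base64(text: str) -> bool:
    if not text or len(text) % 4:
        return False
    try:
        base64.b64decode(text, validate=True)
        return True
    except (binascii.Error, ValueError):
        return False


def infer_type_sketch(value, detect_blind: bool = False) -> FieldType:
    if value is None:
        return FieldType.NULL
    if isinstance(value, (bytes, bytearray)):
        return FieldType.BINARY
    if isinstance(value, dict):
        # The lat/lon rule must run before the generic struct rule.
        if "lat" in value and "lon" in value:
            return FieldType.LOCATION
        return FieldType.STRUCT
    if isinstance(value, int):
        # bool is an int subclass and lands here too; no boolean type is listed.
        return FieldType.BIGINT if value > 2**53 else FieldType.NUMBER
    if isinstance(value, float):
        return FieldType.NUMBER
    if isinstance(value, str):
        if detect_blind and _looks_like_base64(value):
            return FieldType.BLIND
        return FieldType.STRING
    raise TypeError(f"no inference rule for {type(value).__name__}")
```

Short ordinary words such as "abcd" are also valid Base64, which is one reason blind detection is opt-in and context dependent.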
5. Binary Data Examples
Image Storage
# Store image with event
with open("photo.jpg", "rb") as f:
    image_data = f.read()

event = Event(
    data={"description": "Sunset photo"},
    binary_payload=image_data,
)
gRPC Streaming
# Stream large file in chunks
def upload_file(filename):
    with open(filename, "rb") as f:
        while chunk := f.read(1024 * 1024):  # 1MB chunks
            yield IngestRequest(
                binary_payload=chunk,
                metadata={"filename": filename},
            )