Store.Project.Conversation.Format (fnord v0.9.40)

Format detection and parsing for conversation files. Two formats coexist:

  • v0 - the legacy timestamp-prefixed shape: <unix_ts>:<json>. The <unix_ts> is a numeric prefix that lets list/1 and timestamp/1 sort conversations without parsing the JSON. Read-only at this point - no code path emits v0; older files still on disk are read transparently.

  • v1 - pure JSON. The top-level object carries version: 1, timestamp: <unix_int>, and the same messages/metadata/memory/tasks keys v0 has. Current on-the-wire shape; what every writer emits.

Why this module exists

All worktrees in a project share .fnord/projects/<project>/conversations/. Background services (MemoryIndexer, ConversationIndexer) in any worktree can read any conversation file. If one build starts emitting v1 files while another build only reads v0, the older build flags every v1 file as corrupt and skips it - data loss in practice.

The two-step rollout that got us here:

  1. Phase 1b - shipped a build whose readers understand BOTH v0 and v1, while the writer continued to emit v0. Reader-tolerant + writer-conservative.
  2. Phase 2c - flipped the writer to v1. Older Phase-1b readers parse the new files unchanged. Writer-aggressive.

This module is now at Phase 2c. v1 is the canonical on-disk format; v0 is a read-only legacy shape that still appears in files written before the flip.

Heal-on-read (and forward migration)

v0 files in the wild may carry legacy shapes from earlier code paths:

  • tasks map with bare lists or non-canonical statuses - healed by Store.Project.Conversation.TaskListStatusMigration.
  • tool_calls[].function.arguments stored as decoded maps instead of JSON strings - re-encoded by heal_tool_call_arguments/1 here. (See engram memory "Conversation file corruption - responses branch tool arguments" for the atom-table backstory.)
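
The second heal pass can be pictured with a minimal sketch. It is not the module's actual implementation: the string-keyed message shape, the module name HealArgsSketch, and the extra encode argument (any JSON encoder, for example &Jason.encode!/1) are assumptions made so the example stays self-contained, whereas the real function is heal_tool_call_arguments/1.

```elixir
defmodule HealArgsSketch do
  # Hypothetical message shape: plain maps with string keys.
  # `encode` is any one-argument JSON encoder, passed in so the sketch
  # does not depend on a particular JSON library.
  def heal_tool_call_arguments(messages, encode) when is_list(messages) do
    Enum.map(messages, fn
      %{"tool_calls" => calls} = msg when is_list(calls) ->
        %{msg | "tool_calls" => Enum.map(calls, &heal_call(&1, encode))}

      msg ->
        msg
    end)
  end

  # Re-encode only when the arguments were stored as a decoded map.
  defp heal_call(%{"function" => %{"arguments" => args} = fun} = call, encode)
       when is_map(args) do
    %{call | "function" => %{fun | "arguments" => encode.(args)}}
  end

  # Arguments that are already strings (or calls without them) pass through.
  defp heal_call(call, _encode), do: call
end
```

Because well-formed messages are returned unchanged, a pass like this is safe to run on every v0 read.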

When either heal pass triggers on a v0 file, the repaired content is persisted as v1 via write_v1_blob/2, not back as v0. Two reasons:

  1. The writer is at v1; emitting fresh v0 would create new legacy files.
  2. Older builds without the heal pass would silently mis-parse the healed-in-place v0 shape; a v1 file at least surfaces as a clean skip (:corrupt_conversation in builds that cannot read v1) rather than a silently mis-decoded conversation.

This means stale v0 files migrate forward incrementally as they are touched. Untouched v0 files stay v0 indefinitely (read-only paths don't rewrite them).

v1 files skip the heal passes entirely. Note that a heal-produced v1 file preserves its legacy message shapes verbatim (only the broken fields are repaired), so v1 content can still contain assistant-with-tool_calls messages - hydrate_message/1 handles those regardless of file version.

Summary

Functions

detect(content)

Detect the format of a raw conversation file's contents. v0 is identified by a \d+: prefix, v1 by a JSON object opener (with optional leading whitespace).

persist_heal_as_v1(conversation, data, ts_int)

Shared heal-persistence path: write a healed v0 conversation back as v1.

read(conversation)

Read a conversation from disk, dispatch to the right parser, apply heal passes for v0 files, and return the canonical in-memory data shape that Store.Project.Conversation.read/1 callers expect.

timestamp_of(contents)

Extract just the timestamp from a raw file contents string. v0 reads only the prefix (cheap); v1 has to decode the whole JSON object (less cheap, but paid only once a v1 file is encountered).

write(conversation, data, timestamp)

Write a conversation as a v1 file. The on-disk shape is pure JSON with version: 1 and a top-level integer timestamp field; the legacy <unix_ts>:<json> prefix is gone.

Types

version()

@type version() :: :v0 | :v1

Functions

detect(content)

@spec detect(binary()) :: {:ok, version()} | {:error, :unrecognized}

Detect the format of a raw conversation file's contents. v0 is identified by a \d+: prefix, v1 by a JSON object opener (with optional leading whitespace).
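
The detection rule described here can be sketched in a few lines. This is an illustration under assumptions rather than the module's source: the module name DetectSketch and the exact regular expressions are hypothetical, chosen only to match the documented behavior and return values.

```elixir
defmodule DetectSketch do
  # v0: one or more digits followed by a colon at the very start of the file.
  @v0_prefix ~r/\A\d+:/
  # v1: a JSON object opener, optionally preceded by whitespace.
  @v1_opener ~r/\A\s*\{/

  def detect(content) when is_binary(content) do
    cond do
      Regex.match?(@v0_prefix, content) -> {:ok, :v0}
      Regex.match?(@v1_opener, content) -> {:ok, :v1}
      true -> {:error, :unrecognized}
    end
  end
end
```

Under this sketch, contents such as 1700000000:{"messages":[]} would classify as :v0, an object with leading whitespace as :v1, and anything else as :unrecognized.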

persist_heal_as_v1(conversation, data, ts_int)

@spec persist_heal_as_v1(Store.Project.Conversation.t(), map(), integer()) ::
  :ok | {:error, any()}

Shared heal-persistence path: write a healed v0 conversation back as v1.

Called once by parse_v0/2 after composing the pure heal passes (TaskListStatusMigration.heal/1 and heal_tool_call_arguments/1 here), so the on-disk format after any heal is deterministically v1 - not "whichever pass happened to fire last."

Why heal forward to v1 rather than re-emit v0:

  1. The writer is at v1; emitting fresh v0 would create new legacy files.
  2. Older builds without the heal pass would silently mis-parse a healed v0 file; a v1 file at least surfaces as :corrupt_conversation in those builds so it gets skipped rather than mis-interpreted.

Errors are warnings, not raises - the caller's in-memory copy is already healed, and a failed persist just means the next read will heal again.
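
The "compose the pure passes, then persist once" flow might look roughly like the sketch below. Everything named here is an assumption for illustration: HealFlowSketch, the list of heal functions, and the persist callback are hypothetical stand-ins, and detecting a triggered heal by comparing the data before and after is only one plausible approach.

```elixir
defmodule HealFlowSketch do
  require Logger

  # `heal_passes` is a list of pure functions (data -> data).
  # `persist` is a function (data -> :ok | {:error, reason}) standing in
  # for the real v1 write path.
  def heal_and_persist(data, heal_passes, persist) do
    healed = Enum.reduce(heal_passes, data, fn pass, acc -> pass.(acc) end)

    # Persist at most once, and only if some pass actually changed the data.
    if healed != data do
      case persist.(healed) do
        :ok ->
          :ok

        {:error, reason} ->
          # A failed persist is only a warning: the next read heals again.
          Logger.warning("heal persist failed: #{inspect(reason)}")
      end
    end

    # The caller always receives the healed in-memory copy.
    healed
  end
end
```

Running the persist step after all passes have been composed is what makes the resulting on-disk format independent of which pass fired.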

read(conversation)

@spec read(Store.Project.Conversation.t()) ::
  {:ok, Store.Project.Conversation.data()} | {:error, any()}

Read a conversation from disk, dispatch to the right parser, apply heal passes for v0 files, and return the canonical in-memory data shape that Store.Project.Conversation.read/1 callers expect.

timestamp_of(contents)

@spec timestamp_of(binary()) :: {:ok, DateTime.t()} | {:error, any()}

Extract just the timestamp from a raw file contents string. v0 reads only the prefix (cheap); v1 has to decode the whole JSON object (less cheap, but paid only once a v1 file is encountered).
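
To illustrate why the v0 path is cheap, here is a minimal sketch of the prefix-only read. The module name TimestampSketch, the function name v0_timestamp/1, and the :invalid_timestamp error atom are hypothetical; the v1 branch is omitted because it needs a full JSON decode.

```elixir
defmodule TimestampSketch do
  # Reads only the numeric prefix of a v0 file (<unix_ts>:<json>);
  # the JSON payload after the first colon is never parsed.
  def v0_timestamp(contents) when is_binary(contents) do
    with [prefix, _json] <- String.split(contents, ":", parts: 2),
         {unix, ""} <- Integer.parse(prefix),
         {:ok, datetime} <- DateTime.from_unix(unix) do
      {:ok, datetime}
    else
      _ -> {:error, :invalid_timestamp}
    end
  end
end
```

The split uses parts: 2 so that colons inside the JSON payload are left alone.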

write(conversation, data, timestamp)

@spec write(Store.Project.Conversation.t(), map(), integer()) :: :ok | {:error, any()}

Write a conversation as a v1 file. The on-disk shape is pure JSON with version: 1 and a top-level integer timestamp field; the legacy <unix_ts>:<json> prefix is gone.

data is the canonical in-memory map (%{messages:, metadata:, memory:, tasks:}) - the same shape Conversation.read/1 returns. Messages are encoded via their Jason.Encoder impls (every AI.Message struct derives it); other fields encode as plain maps.
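
As a rough picture of the v1 envelope, the sketch below builds the top-level map that would then be handed to a JSON encoder. The module and function names are hypothetical, and the JSON encoding and file-write steps are left out so the example has no dependencies.

```elixir
defmodule V1EnvelopeSketch do
  # Builds the top-level v1 object from the canonical in-memory map:
  # the four data keys plus version: 1 and an integer timestamp.
  def envelope(%{messages: _, metadata: _, memory: _, tasks: _} = data, timestamp)
      when is_integer(timestamp) do
    data
    |> Map.take([:messages, :metadata, :memory, :tasks])
    |> Map.merge(%{version: 1, timestamp: timestamp})
  end
end
```

In this sketch the timestamp lives inside the object as a plain integer field, which is the documented replacement for the legacy numeric prefix.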