[#44] Normalize refs table to deduplicate property strings#80
Conversation
Major version bump because the analyze schema will get some changes.
The refs table repeated the property_path/property_type strings on every row, making it by far the largest part of the analyze database. Store the strings once in new property_names/property_types lookup tables and keep only integer ids in refs. A refs_view rejoins them to expose the original strings. On a large customer project this reduces the database from 515 MB to 351 MB (-32%) with negligible impact on analyze time. Also: - Add PRAGMA user_version = 1; find-refs now rejects older databases with a clear "unsupported schema version" message instead of an obscure SQL error. - Update the material/monoscript views and find-refs queries to read the strings via refs_view (or resolve ids once). - Fix view_material_texture_refs, which used double-quoted string literals that SQLite parsed as identifiers (pre-existing, failed on old databases too).
There was a problem hiding this comment.
Pull request overview
Normalizes the analyzer SQLite schema to significantly reduce database size by deduplicating refs.property_path and refs.property_type strings into lookup tables, while preserving query ergonomics via a refs_view. Updates find-refs, tests, and docs to use the new view and introduces schema versioning to gate incompatible databases.
Changes:
- Normalized
refsby introducingproperty_names/property_typeslookup tables and arefs_viewto preserve the pre-change shape for consumers. - Updated analyze writer to populate lookup tables once per unique string and store integer ids in
refs. - Updated
find-refs, SQL views, tests, and documentation; addedPRAGMA user_version = 1and rejection of older schemas infind-refs.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.
Show a summary per file
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Addressed at the source in 2cc7e92 rather than with IFNULL: refs_view now uses INNER JOIN, so property_path can never be NULL (every refs row's ids are always populated and their lookup rows written in the same transaction). That makes GetString(1) here unconditionally safe. I avoided IFNULL deliberately — it would silently return '' for a genuinely corrupt/orphaned row instead of failing loudly, which would be worse for a read-only analysis DB.
The refs property_path/property_type ids are always populated and their lookup rows written in the same transaction, so the joins always match. INNER JOIN expresses that invariant and removes the theoretical NULL that LEFT JOIN implied, making find-refs' property_path read unconditionally safe.

Summary
Fixes #44.
On large builds the
refstable dominates the analyze database — it repeated theproperty_pathandproperty_typestrings on every one of its (often millions of) rows. This normalizes those strings into two lookup tables sorefsstores only integer ids, and exposes arefs_viewthat rejoins them so existing string-based queries barely change.On a large customer project (5.8M refs rows) this reduces the database from 515 MB to 351 MB (-32%) with negligible impact on analyze time.
This changes the
refstable schema, so any tooling or SQL that readsrefs.property_path/refs.property_typedirectly needs a small update. Those two columns are no longer text — they now hold integer ids into the newproperty_names/property_typestables.Recommended fix: query
refs_viewinstead ofrefs. The view has the same column names and the same text values the oldrefstable had (object,referenced_object,property_path,property_type), so in most cases it's a one-word change:If you need to join
refsto other tables yourself (e.g. for performance), resolve the strings through the lookup tables:Queries that only use
object/referenced_object(e.g. dependency traversal) are unaffected.Databases now carry a schema version:
PRAGMA user_versionreturns1for this format and0for databases produced by older builds. Tools can check it to fail fast (asfind-refsnow does) rather than hitting confusing errors. There is no in-place migration — regenerate the database by re-runninganalyze.Changes
Schema (
Analyzer/Resources/)property_namesandproperty_typeslookup tables (id, name).refs.property_path/property_typechange fromTEXTtoINTEGERids into those tables.refs_viewreproduces the pre-change shape (object,referenced_object,property_path,property_typeas text) so consumers read the strings without joining by hand.PRAGMA user_version = 1— the first formal schema version (older databases report 0).view_material_shader_refs,view_material_texture_refsandscript_object_viewnow read viarefs_view.Populating during analyze
SerializedFileSQLiteWriterassigns string ids with twoIdProviders and writes each lookup row once (newAddPropertyName/AddPropertyTypecommands, following theAddTypepattern).find-refs
refs_view; resolves them_Scriptpath to its id once per run rather than per object.Bonus fix
view_material_texture_refsused double-quoted string literals ("Texture","Material") that SQLite parses as identifiers, so the view errored. This was pre-existing (it failed on old databases too); fixed to single-quoted literals while rewriting the view.Testing
dotnet test— full suite green (UnityFileSystem 425, Analyzer 62, UnityDataTool 234).BuildReportTests→refs_view).FindRefsTestscoverage:refs_viewround-trip, lookup-table consistency, and the unsupported-schema-version rejection.find-refsrenders property paths correctly (e.g.m_SavedProperties.m_TexEnvs[2].second.m_Texture), and runningfind-refsagainst a pre-change database exits 1 with the friendly message.