hasher/operations: fix in-flight hash caching, fingerprinting, and modtime-fallback for hash-less remotes #9500
Draft
davispw wants to merge 4 commits into

Summary
This pull request introduces three changes that make the `hasher` backend overlay work reliably with Google Photos and improve correctness for remotes without native hash support, modtimes, or immediate size reporting:

- Modtime fallback in `equal()`: falls back to comparing hashes when modtime is unsupported but a common hash type exists.
- In-flight hashing on `Update`: caches hashes computed during overwrites (`Update`) for remotes without native hashes.
- Local size fingerprinting: keys cached hashes by the local source size, so remotes that report a size of `0` or `-1` right after upload still hit the cache.

Previously, running the `hasher` overlay with Google Photos required a complex combination of flags to work around listing and update behaviors.

Even with these flags, overwriting a file caused the hasher cache entry to be pruned without replacement, so the next sync had to re-verify the hash. Downloading the file from Google Photos to compute that hash does not work either: Google Photos mutates the file contents (stripping metadata, re-compressing video), so the resulting MD5 does not match the original. This makes in-flight caching during uploads critical.
Configuration Simplification
With these changes, the required flags are simplified to:
- `--checksum` is no longer needed because the modtime fallback handles this automatically.
- `--gphotos-read-size` is no longer needed when using `--ignore-size`, because the local size is used for the fingerprint.

Technical Details
Modtime Fallback in `equal()`

When a backend does not support modtimes (`modifyWindow == fs.ModTimeNotSupported`) and the file sizes match, `equal()` in `fs/operations/operations.go` previously returned `true` immediately. If a file's content changed but its size stayed the same, the update was silently skipped.

We now fall back to hash comparison when a common hash type exists (e.g., MD5 via the `hasher` overlay).

Because Google Photos does not support modtime, size was previously the only basis for comparison. Falling back to hash comparison when modtime is unsupported is a robust default when a hasher cache is active. For backends with no common hash, the previous behavior is preserved.
In-flight Hashing on Update
For hash-less remotes, `hasher` previously computed the hash in-flight during `Put` but only pruned the cache entry on `Update` (overwrite). Because Google Photos mutates files on upload, the original hash cannot be recovered later by downloading the file.

In `backend/hasher/object.go`, `Update` now wraps the input reader in a `hashingReader` and caches the computed hash in BoltDB on successful upload, so the cache stays populated.

Local Size Fingerprinting
The hasher cache keys entries using a fingerprint: `"size,modtime,hash"`. This caused cache misses in two scenarios on Google Photos:

- With `batch_mode = async`, uploads return immediately, before the item is committed, and report a size of `0`. Hasher would cache the hash under `0,-,-`. On the next sync, the file is listed with its real size (e.g., `12345`), causing a cache miss and triggering an infinite upload loop.
- Google Photos objects report a size of `-1` unless `--gphotos-read-size` is used, so the hash was cached under `-1,-,-`, causing a mismatch once the real size is known.

To resolve this, `putHashes` now accepts an optional `localSize` override. During `Update` and `Put`, it passes `src.Size()` (the local file size) as the size component of the fingerprint.

Additionally, hasher implements ignore-size fingerprint matching in `backend/hasher/kv.go` when the global `IgnoreSize` flag is enabled. When `ci.IgnoreSize` is active, the size component is ignored during database lookup, so lookups with remote sizes of `-1` or `0` still hit the cache.

Automated Tests
- `fs/operations/operations_test.go`: `TestEqualHashFallback`
- `backend/hasher/hasher_internal_test.go`:
  - `UpdateInFlightHashing/UnderlyingLacksHashes`: `Update` for hash-less remotes
  - `UpdateInFlightHashing/UnderlyingSupportsHashes`: `Update` for hash-native remotes

Manual Test Plan (Add, Update, Remove Operations)
Important
This manual test plan relies on the trash workaround and the async batch mode panic fix. When verifying the hasher cache overlay with the Google Photos backend, you must have the fixes in `gphotos-async-panic` (PR #9502) and `gphotos-trash-album` (PR #9498) merged or present, to prevent nil panics and to ensure file overwrites and deletions are applied correctly on the remote.

Prerequisites
Verification Steps
1. Add: Sync `photo_a.jpg` and `photo_b.jpg` to the cache remote. Verify both are uploaded and their hashes are cached.
2. Update: Replace `photo_a.jpg` with `photo_c.jpg` and sync. Verify only `photo_a.jpg` is transferred and its new hash is cached.
3. Remove: Delete `photo_b.jpg` locally and sync. Verify the remote file is trashed and its hash is pruned from the cache.

Dependencies (gh-stack)
This PR is independent and has no dependencies.
Was the change discussed in an issue or in the forum before?
As far as I know, this was not discussed previously. This resolves upload loops and fingerprinting issues when using the `hasher` overlay with Google Photos.

Checklist