hasher/operations: fix in-flight hash caching, fingerprinting, and modtime-fallback for hash-less remotes by davispw · Pull Request #9500 · rclone/rclone · GitHub

hasher/operations: fix in-flight hash caching, fingerprinting, and modtime-fallback for hash-less remotes #9500

Draft
davispw wants to merge 4 commits into rclone:master from davispw:fix-equal-hash-modtime-unsupported

@davispw davispw commented Jun 7, 2026


Summary

This pull request introduces three changes that make the hasher backend overlay work reliably with Google Photos and improve correctness for remotes that lack native hash support, modtimes, or immediate size reporting:

  1. Modtime Fallback in equal(): Falls back to comparing hashes when modtime is unsupported but a common hash type exists.
  2. In-flight Hashing on Update: Computes and caches hashes in-flight during file overwrites (Update) for remotes without native hashes.
  3. Local Size Fingerprinting: Stores the computed hash under the local source file size fingerprint rather than the remote-reported size, preventing upload loops in async batch mode and listing mismatches.

Previously, running the hasher overlay with Google Photos required a complex combination of flags to work around listing and update behaviors:

--checksum                     # force hash-based equality check (no modtime)
--ignore-checksum              # ignore remote hash (Google Photos changes it)
--gphotos-read-size            # fetch actual file size via HEAD (otherwise -1)
--gphotos-batch-mode=sync      # async caused a nil-pointer panic and duplicate uploads

Even with these flags, overwriting a file caused the hasher cache entry to be pruned without replacement, meaning the next sync had to re-verify the hash. However, downloading the file from Google Photos to compute the hash yields incorrect results because Google Photos mutates the file contents (stripping metadata, re-compressing video), resulting in a mismatched MD5 hash. This makes in-flight caching during uploads critical.

Configuration Simplification

With these changes, the required flags are simplified to:

--ignore-checksum              # still needed (Google Photos may change hash)
--ignore-size                  # tells hasher to ignore remote size on fingerprint matching

Technical Details

Modtime Fallback in equal()

When a backend does not support modtimes (modifyWindow == fs.ModTimeNotSupported) and file sizes match, equal() in fs/operations/operations.go previously returned true immediately. If a file's content changed but its size remained the same, the update was silently skipped.

We now fall back to hash comparison when a common hash type exists (e.g., MD5 via the hasher overlay):

if modifyWindow == fs.ModTimeNotSupported {
    common := src.Fs().Hashes().Overlap(dst.Fs().Hashes())
    if common.Count() == 0 {
        return true  // no common hash type — can't do better
    }
    // Fall through to CheckHashes below
}

Google Photos does not support modtimes, so size was previously the only signal equal() could use for it. Comparing hashes instead is a safer default when a hasher cache is active. For backends with no common hash type, the previous behavior is preserved.
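
The decision described above can be modelled in isolation. The following is a minimal sketch, not the code from fs/operations/operations.go: hashSet, object, and equalNoModTime are hypothetical stand-ins for rclone's hash.Set, fs.Object, and the relevant branch of equal(). It also assumes both hashes are always known, whereas real hash checks may treat a missing hash as "cannot compare".

```go
package main

import "fmt"

// hashSet is a hypothetical stand-in for rclone's hash.Set:
// the set of hash type names a remote can provide.
type hashSet map[string]bool

// overlap returns the hash types present in both sets.
func (a hashSet) overlap(b hashSet) hashSet {
	out := hashSet{}
	for name := range a {
		if b[name] {
			out[name] = true
		}
	}
	return out
}

// object is a hypothetical, simplified model of a file on a remote.
type object struct {
	size   int64
	hashes map[string]string // hash type -> hex digest
}

// equalNoModTime models the comparison when modtimes are unsupported:
// sizes must match, and if a common hash type exists the hashes must too.
func equalNoModTime(src, dst object, srcTypes, dstTypes hashSet) bool {
	if src.size != dst.size {
		return false
	}
	common := srcTypes.overlap(dstTypes)
	if len(common) == 0 {
		return true // no common hash type: size is all we can compare
	}
	for name := range common {
		if src.hashes[name] != dst.hashes[name] {
			return false
		}
	}
	return true
}

func main() {
	md5Only := hashSet{"md5": true}
	src := object{size: 5, hashes: map[string]string{"md5": "aaa"}}
	dst := object{size: 5, hashes: map[string]string{"md5": "bbb"}}

	// Same size, different content: detected only when a common hash exists.
	fmt.Println(equalNoModTime(src, dst, md5Only, md5Only))   // false
	fmt.Println(equalNoModTime(src, dst, md5Only, hashSet{})) // true
}
```

The second call shows the preserved legacy behavior: with no common hash type, equal sizes are still treated as equal.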

In-flight Hashing on Update

For hash-less remotes, hasher previously computed the hash in-flight during Put but only pruned the cache entry on Update (overwrite). Because Google Photos mutates files on upload, downloading the file afterwards cannot recover the hash of the original content, so the entry could not be rebuilt correctly.

In backend/hasher/object.go, Update now wraps the input reader in a hashingReader and caches the computed hash in BoltDB on successful upload, ensuring the cache remains populated.
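
The general in-flight technique can be sketched with the standard library alone. This is an illustration under stated assumptions, not the PR's hashingReader: uploadAndHash and uploadString are hypothetical helpers, io.Discard stands in for the remote upload, and MD5 is used because it is the hash type mentioned in this PR.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"io"
	"strings"
)

// uploadAndHash streams src to dst (a stand-in for the remote upload)
// while computing the MD5 of exactly the bytes that were sent.
func uploadAndHash(dst io.Writer, src io.Reader) (sum string, n int64, err error) {
	h := md5.New()
	// TeeReader feeds every byte consumed by the upload into the hasher.
	n, err = io.Copy(dst, io.TeeReader(src, h))
	if err != nil {
		return "", n, err // never cache a hash for a failed upload
	}
	return hex.EncodeToString(h.Sum(nil)), n, nil
}

// uploadString is a convenience wrapper that "uploads" a string to io.Discard.
func uploadString(content string) (string, int64, error) {
	return uploadAndHash(io.Discard, strings.NewReader(content))
}

func main() {
	sum, n, err := uploadString("hello")
	if err != nil {
		fmt.Println("upload failed:", err)
		return
	}
	// On success the caller would store sum in the hash cache.
	fmt.Printf("md5=%s bytes=%d\n", sum, n)
}
```

Hashing the stream as it is sent means the cached value reflects the local content, regardless of how the remote later rewrites the file.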

Local Size Fingerprinting

The hasher cache keys entries using a fingerprint: "size,modtime,hash". This caused cache misses in two scenarios on Google Photos:

  1. Async Batch Upload Loops: Under batch_mode = async, uploads return immediately before the item is committed, reporting a size of 0. Hasher would cache the hash under 0,-,-. On the next sync, the file is listed with its real size (e.g., 12345), causing a cache miss and triggering an infinite upload loop.
  2. Missing Remote Sizes: Without async mode, Google Photos returns a size of -1 unless --gphotos-read-size is used, caching the hash under -1,-,- and causing a mismatch once the real size is known.

To resolve this, putHashes now accepts an optional localSize override. During Update and Put, it passes src.Size() (the local file size) as the fingerprint key.
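
A toy model shows why the key's size component matters. Everything here is hypothetical: cache, key, put, and get are illustrative names, the map stands in for BoltDB, and a negative localSize is used to mean "no override" (the real putHashes signature may differ).

```go
package main

import "fmt"

// cache maps a fingerprint string to a cached hash (stand-in for BoltDB).
type cache map[string]string

// key builds a "size,modtime,hash" fingerprint; "-" marks a component
// the remote does not provide.
func key(size int64) string {
	return fmt.Sprintf("%d,-,-", size)
}

// put stores hash under the remote-reported size unless a non-negative
// localSize overrides it.
func (c cache) put(remoteSize, localSize int64, hash string) {
	size := remoteSize
	if localSize >= 0 {
		size = localSize
	}
	c[key(size)] = hash
}

// get looks up the hash for an object listed with the given size.
func (c cache) get(listedSize int64) (string, bool) {
	h, ok := c[key(listedSize)]
	return h, ok
}

func main() {
	const noOverride = -1

	// Old behavior: async upload reported size 0, listing later reports 12345.
	old := cache{}
	old.put(0, noOverride, "abc")
	_, hit := old.get(12345)
	fmt.Println("remote-size key hit:", hit) // false

	// New behavior: the local source size is used for the key.
	fixed := cache{}
	fixed.put(0, 12345, "abc")
	_, hit = fixed.get(12345)
	fmt.Println("local-size key hit:", hit) // true
}
```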

Additionally, hasher implements ignore-size fingerprint matching in backend/hasher/kv.go when the global IgnoreSize flag is enabled. When ci.IgnoreSize is active, the size component is ignored during database lookup, allowing lookups with -1 or 0 remote sizes to successfully hit the cache.
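
Ignore-size matching can be sketched as comparing fingerprints with the leading size component dropped. This is an assumption about the shape of the check, not the code in backend/hasher/kv.go; sameFingerprint is a hypothetical helper operating on the "size,modtime,hash" string form.

```go
package main

import (
	"fmt"
	"strings"
)

// sameFingerprint compares a stored fingerprint with the current one.
// With ignoreSize set, the leading size component is dropped from both
// before comparing, so only "modtime,hash" has to match.
func sameFingerprint(stored, current string, ignoreSize bool) bool {
	if !ignoreSize {
		return stored == current
	}
	_, storedRest, ok1 := strings.Cut(stored, ",")
	_, currentRest, ok2 := strings.Cut(current, ",")
	return ok1 && ok2 && storedRest == currentRest
}

func main() {
	stored := "12345,-,-" // cached under the local source size
	listed := "-1,-,-"    // remote listing without a known size

	fmt.Println(sameFingerprint(stored, listed, false)) // false
	fmt.Println(sameFingerprint(stored, listed, true))  // true
}
```

Malformed fingerprints without a comma never match in ignore-size mode, which errs on the side of a cache miss.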


Automated Tests

fs/operations/operations_test.go:

| Test | Covers |
| --- | --- |
| TestEqualHashFallback | Hash comparison used when modtime unsupported and common hash exists |

backend/hasher/hasher_internal_test.go:

| Test | Covers |
| --- | --- |
| UpdateInFlightHashing/UnderlyingLacksHashes | Hash computed in-flight and cached on Update for hash-less remotes |
| UpdateInFlightHashing/UnderlyingSupportsHashes | Cache pruned (not re-written) on Update for hash-native remotes |

ok  github.com/rclone/rclone/backend/hasher      2.8s
ok  github.com/rclone/rclone/fs/operations       20.9s

Manual Test Plan (Add, Update, Remove Operations)

Important

This manual test plan relies on the trash workaround and async batch mode panic fix. When verifying the hasher cache overlay with the Google Photos backend, you must have the fixes in gphotos-async-panic (PR #9502) and gphotos-trash-album (PR #9498) merged/present to prevent nil panics and ensure file overwrites/deletions update correctly on the remote.

Prerequisites

go build .

# Download three distinct test images
curl -L -o photo_a.jpg "https://picsum.photos/seed/rclone_pr3_a/800/600"
curl -L -o photo_b.jpg "https://picsum.photos/seed/rclone_pr3_b/800/600"
curl -L -o photo_c.jpg "https://picsum.photos/seed/rclone_pr3_c/800/600"

# Configure the hasher overlay remote
rclone config create gphotos_cache hasher remote=gphotos: hashes=md5 max_age=24h

Verification Steps

  1. Add operation: Sync photo_a.jpg and photo_b.jpg to the cache remote. Verify both are uploaded and their hashes are cached.
  2. Skip operation: Re-sync with no changes. Verify zero transfers occur (cache hit).
  3. Update operation: Overwrite photo_a.jpg with photo_c.jpg and sync. Verify only photo_a.jpg is transferred and its new hash is cached.
  4. Skip after update: Re-sync again. Verify zero transfers occur (cache hit on the new hash).
  5. Remove operation: Delete photo_b.jpg locally and sync. Verify the remote file is trashed and its hash is pruned from the cache.

Dependencies (gh-stack)

This PR is independent and has no dependencies.


Was the change discussed in an issue or in the forum before?

As far as I know, this was not discussed previously. This resolves upload loops and fingerprinting issues when using the hasher overlay with Google Photos.


Checklist

  • I have read the contribution guidelines.
  • I have added tests for all changes in this PR if appropriate.
  • I have added documentation for the changes if appropriate.
  • All commit messages are in house style.
  • I'm done, this Pull Request is ready for review :-)

@davispw davispw force-pushed the fix-equal-hash-modtime-unsupported branch 5 times, most recently from dea1384 to ef9f34e Compare June 7, 2026 18:08
@davispw davispw changed the title from "fs/operations: use hash comparison when modtime is not supported" to "hasher/operations: fix in-flight hash caching, fingerprinting, and modtime-fallback for hash-less remotes" Jun 7, 2026
@davispw davispw force-pushed the fix-equal-hash-modtime-unsupported branch 9 times, most recently from c3aa7d9 to c9ee816 Compare June 13, 2026 18:17
@davispw davispw force-pushed the fix-equal-hash-modtime-unsupported branch from fcec711 to 99ffb3c Compare June 13, 2026 18:39