patch_generate: only calculate binary diffs if requested by pks-t · Pull Request #3922 · libgit2/libgit2 · GitHub
Skip to content

patch_generate: only calculate binary diffs if requested#3922

Merged
ethomson merged 1 commit into
libgit2:masterfrom
pks-t:pks/diff-only-load-binaries-when-requested
Sep 2, 2016
Merged

patch_generate: only calculate binary diffs if requested#3922
ethomson merged 1 commit into
libgit2:masterfrom
pks-t:pks/diff-only-load-binaries-when-requested

Conversation

@pks-t

@pks-t pks-t commented Sep 1, 2016

Copy link
Copy Markdown
Member

When generating diffs for binary files, we load and decompress
the blobs in order to generate the actual diff, which can be very
costly. While we cannot avoid this for the case when we are
called with the GIT_DIFF_SHOW_BINARY flag, we do not have to
load the blobs in the case where this flag is not set, as the
caller is expected to have no interest in the actual content of
binary files.

Fix the issue by only generating a binary diff when the caller is
actually interested in the diff. As libgit2 uses heuristics to
determine that a blob contains binary data by inspecting its size
without loading from the ODB, this saves us quite some time when
diffing in a repository with binary files.

When generating diffs for binary files, we load and decompress
the blobs in order to generate the actual diff, which can be very
costly. While we cannot avoid this for the case when we are
called with the `GIT_DIFF_SHOW_BINARY` flag, we do not have to
load the blobs in the case where this flag is not set, as the
caller is expected to have no interest in the actual content of
binary files.

Fix the issue by only generating a binary diff when the caller is
actually interested in the diff. As libgit2 uses heuristics to
determine that a blob contains binary data by inspecting its size
without loading from the ODB, this saves us quite some time when
diffing in a repository with binary files.
@pks-t

pks-t commented Sep 1, 2016

Copy link
Copy Markdown
Member Author

@carlosmn

carlosmn commented Sep 1, 2016

Copy link
Copy Markdown
Member

We should probably be passing in the minimal information there. The comment for the type doesn't say what fields you can expect to have filled, so it looks like we should define what gets filled and then fill that. Presumably we'd want to pass in the path, object ids and mode so you can display the message that it changed if that's all you're after.

@ethomson

ethomson commented Sep 1, 2016

Copy link
Copy Markdown
Member

I think I would prefer to pass a NULL here if we didn't load it. The path and object IDs and such are in the delta, and is suitable for a caller that just wanted to emulate Binary files a/foo.txt and b/foo.txt differ.

The git_diff_binary contains the old side and the new side of the binary data which is only useful for displaying the actual binary contents and emulating git diff --binary. If we don't load that data then I think NULL would be very appropriate.

This is a good fix, I'm going to merge it and add NULL handling on top.

Thanks @pks-t !

@ethomson

ethomson commented Sep 1, 2016

Copy link
Copy Markdown
Member

After poking around with this a bit more, I realized that NULL probably isn't appropriate but that instead a git_diff_delta should indicate whether there's any data in its files... (It can't look at the file sides since it may have been generated by parsing a patch that said Binary files differ or by running git_diff_blob_to_blob on two zero byte files, which are harder to discern.)

While looking at this I realized that we didn't have any way to parse a patch that had Binary files differ situation, so I'm fixing that up.

@ethomson

ethomson commented Sep 2, 2016

Copy link
Copy Markdown
Member

@ethomson ethomson merged commit ce54e77 into libgit2:master Sep 2, 2016
@pks-t pks-t deleted the pks/diff-only-load-binaries-when-requested branch October 10, 2016 07:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants