Replicating git history for a file takes 1 merge commit and 3 commits, and this is probably one of the most complex workflows I have encountered:
(might not be correct...)
git checkout -b work
git mv file file.tmp
git commit
git checkout -b copy HEAD^
git mv file file2
git commit
git checkout work # can be skipped if you merge "work" instead.
git merge copy # "work" and "copy" must conflict, stage file.tmp and file2 and commit the result.
git mv file.tmp file
git commit
<git blame is identical for file and file2>
I would love to squash this into a single commit, but git doesn’t have a copy operation or detection. :(
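For anyone who wants to try the steps above without touching a real repository, here is a rough sketch that replays them in a throwaway repo. The author names, file contents and commit messages are made up for the demo, and `git add -A` is just one way to stage the conflict result (it also drops the leftover unmerged entry for `file`); treat it as an illustration, not a vetted recipe.

```shell
set -eu
repo=$(mktemp -d)            # scratch repo, safe to delete afterwards
cd "$repo"
git init -q
git config user.name "Original Author"        # hypothetical identity
git config user.email "orig@example.com"
printf 'one\ntwo\nthree\n' > file
git add file
git commit -q -m "add file"

git config user.name "Splitter"               # hypothetical identity doing the copy
git config user.email "splitter@example.com"

git checkout -q -b work
git mv file file.tmp
git commit -q -m "rename file to file.tmp"

git checkout -q -b copy HEAD^
git mv file file2
git commit -q -m "rename file to file2"

git checkout -q work
# Expected to stop with a rename/rename conflict, so ignore the exit status.
git merge copy || true
# Keep both renamed files and record the merge.
git add -A
git commit -q -m "merge copy, keeping file.tmp and file2"

git mv file.tmp file
git commit -q -m "rename file.tmp back to file"

# Both files should now blame to the first commit's author.
git blame file
git blame file2
```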
You seem to be making this very complex. But it really isn’t. Yes, git doesn’t track renames. So you are working around it by splitting your operation into 2 commits.
1. A pure rename.
2. A file change.
This way 1 is always considered a rename and 2 is just a regular file change with the same path. You may also consider tuning rename detection: flags like --find-renames=<n> set the similarity threshold, and options like diff.renameLimit cap how many files are compared.
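The two-commit split described above might look roughly like this in a scratch repo. The file names, contents and identity are invented for the example; the point is only that the first commit leaves the content byte-identical, so rename detection has nothing to guess.

```shell
set -eu
repo=$(mktemp -d)            # scratch repo, safe to delete afterwards
cd "$repo"
git init -q
git config user.name "Example"                # hypothetical identity
git config user.email "example@example.com"
printf 'alpha\nbeta\ngamma\n' > old.txt
git add old.txt
git commit -q -m "add old.txt"

# Commit 1: a pure rename, content untouched.
git mv old.txt new.txt
git commit -q -m "rename old.txt to new.txt"

# Commit 2: a regular change at the new path.
printf 'delta\n' >> new.txt
git commit -q -am "edit new.txt"

# The rename commit on its own, with rename detection switched on.
git diff --find-renames --name-status HEAD~2 HEAD~1

# --follow keeps walking the log across the rename.
git log --follow --oneline -- new.txt
```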
Would it be nice if Git tracked renames? Probably. But that isn’t how the data model works so it is unlikely to happen soon. But maybe they could add some metadata.
Huh. I have never in my 19 year career using git, ever wanted to copy a file and pretend all of the history of that file is also the history of the new file. I mean, I don’t think I’ve ever even wanted to copy a file? Why are you copying a file?
Like, maybe I’m just too familiar with git to see the forest for the trees, but what the heck are you doing over there? 😅
And just in case it’s useful, a tip is that you can use git blame -C to have the blame algorithm use a heuristic to try and find a “source” line if it was moved, including from another file, during a commit, and then continue following the history of that line, to try and get the real commit where this was written, not just the last time it was moved around.
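A small sketch of what that tip looks like in practice, assuming a file is split by moving lines into a new file in a single commit. Names and contents are made up, and the lines are deliberately long because, as far as I know, `-C` ignores moved blocks below a minimum size.

```shell
set -eu
repo=$(mktemp -d)            # scratch repo, safe to delete afterwards
cd "$repo"
git init -q
git config user.name "Original Author"        # hypothetical identity
git config user.email "orig@example.com"
cat > big.txt <<'EOF'
function alpha returns the first configuration value
function bravo returns the second configuration value
function charlie returns the third configuration value
function delta returns the fourth configuration value
EOF
git add big.txt
git commit -q -m "add big.txt"

git config user.name "Splitter"               # hypothetical identity doing the split
git config user.email "splitter@example.com"
# Move the last two lines into a new file, all in one commit.
tail -n 2 big.txt > part.txt
head -n 2 big.txt > big.tmp
mv big.tmp big.txt
git add big.txt part.txt
git commit -q -m "split big.txt"

# Plain blame stops at the commit that created part.txt.
git blame part.txt
# With -C, blame looks for the lines in other files changed by that commit.
git blame -C part.txt
```

In this setup the plain blame should point at the split commit, while the `-C` run should trace the lines back to big.txt in the first commit.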
I’m splitting a several thousand LOC file, which I don’t have previous history in.
Like, maybe I’m just too familiar with git to see the forest for the trees, but what the heck are you doing over there?
Normally copying a file and committing transfers the authorship to you, because the copy just appears from nothing as a brand new file, never known to git. This would prevent browsing the per-line “who changed this last” history past the copy and obfuscate who wrote what and when.
(why the downvote?)
Interesting. Yeah, sounds like what git blame -C is for, so I’ve never made copies when splitting files, I’ve just moved lines between files naively. But I guess if one’s tools are limited and don’t have the ability to use -C, then I guess I could respect the hack that is that solution?
I mean, I’m 99% sure git doesn’t store blame or authorship info in the pack files, even as a convenience cache, and just guesses by traversing the patch log with heuristics live when you run blame anyway, so the history mostly doesn’t matter there, but the way you’ve done it does seem to have tricked the heuristics into doing what you want without relying on an option, so that’s neat! It’s an interesting hack, and I like interesting hacks 😛
By the way, if there are down votes, they’re not from me!
I’m not a copyright lawyer, but I think there are implications on who has authored the code, so preserving this detail can be important. The fancy copy reduced the lines blamed on me by over 90% in the final result.
git blame output can be affected by e.g. ignoring white-space changes.
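To illustrate the whitespace point, here is a minimal sketch where a second (made-up) author only re-indents one line. The `-w` flag of git blame ignores whitespace when comparing a commit with its parent, so the re-indent should no longer take the blame.

```shell
set -eu
repo=$(mktemp -d)            # scratch repo, safe to delete afterwards
cd "$repo"
git init -q
git config user.name "Original Author"        # hypothetical identity
git config user.email "orig@example.com"
printf 'if ready\nstart engine\nend\n' > script.txt
git add script.txt
git commit -q -m "add script.txt"

git config user.name "Formatter"              # hypothetical identity that only re-indents
git config user.email "formatter@example.com"
printf 'if ready\n    start engine\nend\n' > script.txt
git commit -q -am "indent body"

# Plain blame attributes the indented line to the formatting commit.
git blame script.txt
# With -w the whitespace-only change is skipped.
git blame -w script.txt
```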
I can come up with some contrived examples. Maybe someone screwed up the history and they’re trying to repair it such that no one needs to worry about a rebase on their next pull? “compliance”/legal/cya reasons? I also wish to know!