Source History #47

Closed
nyov opened this issue Nov 26, 2016 · 7 comments

Comments

nyov commented Nov 26, 2016

Hello @stevenh,

for a while now it has been bugging me that I did not notice your move to GitHub early on and, as a result, did not tell you about my SVN mirror of qstat (at https://github.com/nyov/qstat-svn).
This bugs me because I went to some pains to migrate it to git nicely, e.g. making a best effort to use svn-author renames for attribution, and I feel your history here is partly lacking that.
I thought I was ready to move on and leave things be, but apparently not quite.

And I am aware I'm not the first to bother you with this kind of request. But since it becomes harder over time to make a favorable case for changing history (versus hurting existing clones), I feel I should at least make a token attempt now to see whether you will consider building future work on a nicer history, and then be silent about it ever after.

I also feel that by importing only the qstat2 directory of the svn repo, some interesting history (such as the old website) has been wiped.
My version started as an svn mirror (and potentially still is), so (unlike #15) I have not rewritten any early commits, to keep the history faithful and the mirror-workflow working. (I've converted some svn release branches to git tags however.)
But it also means the whole subfolder structure has been imported. So there are qstat2/ and website/ folders currently in trunk.
I am not against the current repository layout here, but I feel the website should retain its place in the history. Dropping it at this point is not the same as having it wiped out from the whole history.

Additionally I did some work in 2013 to resurrect the website(s) as plain static html and move it to a gh-pages branch. I hope that might be another point in favor.
(And it's interesting to follow the program's roots back to 1997! It would be criminal not to keep that history -if only partial- of our glorious Quake days, while possible? ;)

There has also been an svn tag legacy with binaries for older qstat versions (2.5, 2.4, 2.3). I've dropped that tag in git, to not bloat the repo with binaries, but committed the unpacked sources as separate version tags.

Finally I had also committed patches on the SF bugtracker into branches (pending merge or cleanup), though these seem to be mostly addressed in this repo by now.

So that's my dilemma. I don't feel like driving down a different road with a lackluster fork, and maintaining one is not within my time schedule. But I also don't quite feel like feeding it to /dev/null without a try to reconcile in some way, first.

I am aware rewriting history in a published project is a big no; however I also feel that should be balanced against an effort to attribute those contributors for future code archaeologists (I see you rewrote your own svn user), and not having to work with an ugly history indefinitely.

After all, projects are often enough still lugging around artifacts of lackluster cvs2svn migrations. Do we really want to immortalize "ugly" history as produced by default svn2git migrations (such as the empty commits of the "This commit was manufactured by cvs2svn to create tag xyz" variety currently in this history)? What might that become next time?
My mirror's history is not without warts either (git-svn-id lines), but if svn is dead and the mirror sync no longer has to be kept up, that could be rectified now.

Further, this might be less a case of rewriting history than of providing an alternative one, since the two trees would have no common ancestor commits. Migrating local changes between trees would require some git rebase --onto newbase oldbase HEAD git-fu, but people could do it in their own time (having both trees in parallel) and might be agreeable to doing that once for the long-term benefits.
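
A minimal sketch of that rebase in a throwaway repository. The branch names oldbase, newbase and feature and the file names are made up for illustration, and the two imports deliberately share no ancestor:

```shell
set -e
cd "$(mktemp -d)"
git init -q .
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# "oldbase" stands in for the history people have cloned so far.
git symbolic-ref HEAD refs/heads/oldbase
echo 'int main(void) { return 0; }' > qstat.c
git add qstat.c
git commit -q -m 'old history: import'

# A local change built on top of the old history.
git checkout -q -b feature
echo 'local fix' > fix.txt
git add fix.txt
git commit -q -m 'local change'

# "newbase" is an unrelated root commit with the same content as oldbase.
git checkout -q --orphan newbase oldbase
git commit -q -m 'new history: import'

# Replay only the commits in oldbase..feature on top of newbase.
git rebase -q --onto newbase oldbase feature

# The local change now sits directly on the new history.
new_parent=$(git rev-parse feature~1)
git log --oneline feature
```

Because only the commits in oldbase..feature are replayed, nothing from the old tree is carried over; with real conflicts between the two trees the rebase would of course stop and ask for resolution.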

Or perhaps 'changing history' could yet be acceptable as part of a potential repository location change to its own umbrella "Organization", such as https://github.com/quakestat ?
Or as part of bumping the major release version (it's been "qstat2" for a long time now, and Q3A and Q4 have come and gone)?

I'd be glad to hear your view on the matter.
Thank you for your time.

nyov commented Nov 26, 2016

And for anyone investigating my clone, the number of branches there now might be confusing.
So these are the interesting branches at this time:
trunk - still points to SVN trunk
master - the additions from this GIT repo grafted onto the SVN trunk history
multiplay/master - this multiplay/qstat GIT tree, without common ancestors to master

A git replace ref is available to "graft" multiplay/master onto trunk (as in the master branch, but in a temporary manner, without rewriting the SHA1 ids).
Since git does not pull that refspec by default, one needs to do a
git config --add remote.origin.fetch '+refs/replace/*:refs/replace/*'
to pull it.
git replace should then list it, and git log on the branch will match master's history apart from the SHA1 ids (colordiff -up <(git log --oneline master) <(git log --oneline multiplay/master)).
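
To illustrate what such a replace ref does, here is a minimal sketch in a throwaway repository. The branch names mirror the ones above, but the commits are empty placeholders, and creating the ref with git replace --graft is an assumption about how a graft like this can be made, not a record of the commands used for the mirror:

```shell
set -e
cd "$(mktemp -d)"
git init -q .
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# Stand-in for the SVN history.
git symbolic-ref HEAD refs/heads/trunk
git commit -q --allow-empty -m 'svn r1'
git commit -q --allow-empty -m 'svn r2'

# Stand-in for the unrelated git history (no common ancestor with trunk).
git checkout -q --orphan multiplay/master
git commit -q --allow-empty -m 'git: first commit'
git commit -q --allow-empty -m 'git: later work'

root=$(git rev-list --max-parents=0 multiplay/master)
tip_before=$(git rev-parse multiplay/master)
count_before=$(git rev-list --count multiplay/master)

# Pretend the root of multiplay/master has the trunk tip as its parent.
git replace --graft "$root" "$(git rev-parse trunk)"

tip_after=$(git rev-parse multiplay/master)
count_after=$(git rev-list --count multiplay/master)
# The real, unreplaced history is still there underneath.
count_raw=$(git --no-replace-objects rev-list --count multiplay/master)

echo "commits before graft: $count_before, after: $count_after, raw: $count_raw"
```

The branch tip keeps its SHA1 id; only the view of its ancestry changes, and only in clones that have fetched refs/replace/*.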

stevenh commented Nov 26, 2016

There was no intention to lose any history.

The import should have maintained all attribution for all commits that could be determined. So if this was missing from the original SVN version then yes, it would have been missed here too.

I'd be happy to add appropriate attributes if this can be done easily without causing issues for others tracking this.

nyov commented Nov 26, 2016

Thanks for your reply.

And I see. Sadly, attribution is something git doesn't manage through repository-wide attributes or metadata, and it can't be amended without rewriting history. Git is dumb like that (it's called the 'stupid content tracker').
Each commit is part of a chain: changing any commit, or any attribute of one (such as the user email), changes/invalidates the checksums (SHA1 commit ids) of all the follow-up commits.
This is vital for integrity verification: when you compare any commit id to someone else's, you can be assured that your commit and the full history leading up to it match the other person's.
But alas, this means no easy solution like modifying attributes exists for git.
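
A minimal sketch of that chain effect, using git's plumbing in a throwaway repository. The dates are arbitrary fixed values, the example.invalid address is hypothetical, and the follow-up commit is identical in both chains apart from its parent:

```shell
set -e
cd "$(mktemp -d)"
git init -q .
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
# Fixed dates, so the only thing that varies is the author e-mail of the first commit.
export GIT_AUTHOR_DATE='1092083853 +0000' GIT_COMMITTER_DATE='1092083853 +0000'

# The empty tree; every commit below points at it.
tree=$(git write-tree)

# Build a two-commit chain and print the id of the second commit.
# $1 is the author e-mail of the FIRST commit only.
chain() {
  first=$(GIT_AUTHOR_NAME=l-n GIT_AUTHOR_EMAIL="$1" \
    git commit-tree -m 'remove useless fprintf' "$tree")
  GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid \
    git commit-tree -p "$first" -m 'follow-up commit' "$tree"
}

tip_svn=$(chain 'l-n@bac25679-d237-0410-bec6-f8029acc7ebe')
tip_fixed=$(chain 'l-n@example.invalid')

echo "follow-up commit on svn-style author: $tip_svn"
echo "follow-up commit on corrected author: $tip_fixed"
```

The two follow-up commits have the same tree, message, author and dates, yet different ids, because each one records the id of its parent.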

Since svn stores nothing but a username in the commit, while git tools identify users by email, the git-svn importer, without more info, can only convert them to something like svn-user@repository-uuid, which turns up like this in the git logs:

commit c9bb0946b60a88709381355b660e584862c3bef0
Author:     l-n <l-n@bac25679-d237-0410-bec6-f8029acc7ebe>
AuthorDate: Mon Aug 9 20:37:33 2004 +0000
Commit:     l-n <l-n@bac25679-d237-0410-bec6-f8029acc7ebe>
CommitDate: Mon Aug 9 20:37:33 2004 +0000

    remove useless fprintf

Git itself does not care whether e-mail addresses are valid, but tools that rely on the address, like git email patch, or GitHub for attribution, have nothing to go on here. So unless you know who this person was (which we do ;), you also can't contact them in the future, e.g. about copyright or license changes (as I did for another project).
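
For a fresh import, git-svn can be given an authors file that maps svn usernames to full identities. A minimal sketch of generating a skeleton for such a file from an existing conversion follows; the repository is a throwaway, and the names and example.invalid addresses it emits are placeholders to be edited by hand, not real contact details:

```shell
set -e
cd "$(mktemp -d)"
git init -q .
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# Recreate an identity of the kind git-svn produces without an authors file.
GIT_AUTHOR_NAME=l-n GIT_AUTHOR_EMAIL=l-n@bac25679-d237-0410-bec6-f8029acc7ebe \
  git commit -q --allow-empty -m 'remove useless fprintf'

# Emit one "svnuser = Name <email>" line per distinct author name.
git log --format='%an' | sort -u | while read -r user; do
  printf '%s = %s <%s@example.invalid>\n' "$user" "$user" "$user"
done > authors.txt

cat authors.txt
```

Once the placeholders are filled in, the file could be passed to a new import with git svn clone --authors-file=authors.txt. This only helps when converting again from svn; it does not change commits that already exist.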


One example case where history was lost in this conversion is this:
Compare the commit in this repository here:
9617404
to the equivalent one in the mirrored repo here:
nyov/qstat-svn@e03ea9e

You'll notice the first is empty, because its content related to a path outside the subdirectory imported in the git conversion here. The same is true for all other commits which touched the website/ dir or other content outside qstat2 in svn. Commits which touched files both inside and outside will have retained only the parts pertaining to files inside qstat2, and so no longer represent the full commit of the original author.
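
One way to find such emptied commits is to look for non-merge commits whose tree is identical to their parent's. A minimal sketch, assuming a throwaway repository with made-up commit messages:

```shell
set -e
cd "$(mktemp -d)"
git init -q .
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

echo a > qstat.c
git add qstat.c
git commit -q -m 'real change'
# Stand-in for a commit whose content lay outside the imported subdirectory.
git commit -q --allow-empty -m 'website-only change'
echo b >> qstat.c
git commit -q -am 'another real change'

# List subjects of single-parent commits that leave the tree unchanged.
empty=$(
  git rev-list --no-merges --min-parents=1 HEAD | while read -r c; do
    if [ "$(git rev-parse "$c^{tree}")" = "$(git rev-parse "$c~1^{tree}")" ]; then
      git log -1 --format=%s "$c"
    fi
  done
)
echo "empty commits: $empty"
```

This finds fully emptied commits only. A commit that lost just part of its content in the import still changes the tree and can only be spotted by comparing against the svn original.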

Again no simple fix is possible.
But messing with history is supposed to be hard. So nobody can change it under your nose. Or it would be impossible to trust it.
For this very reason it cannot be done without causing issues for others tracking this.
People would notice. And they would have to do something about it (e.g. by switching to a new repository location).

The options are limited to doing something now, before yet more clones might be affected later on, or not doing anything and living with the partial history.
I'm very sorry for this. I can only offer doing the legwork or helping out if you choose the first option.

stevenh commented Nov 26, 2016

I don't think enough people maintain forks of this repo for it to be a big issue as, let's face it, it's a legacy code base.

Given that, if you could provide a step-by-step guide, I'd be happy to do this.

nyov commented Nov 27, 2016

Awesome! I should have something shortly.

I also see a lot of PRs got merged, so they don't need to be rebased later. That's great.

Oh and it seems you have pushed @illwieckz's history, the tags at least.
That could have been half the work, except his history is missing the website as well :(
So this might be inadvisable because tags are a bit special in git. Git will always prefer local tags over remote, so if someone pulls these now, they won't see the updated ones later, unless they clean up manually.
Which might mean people will have three histories because these tags will keep linking this other history now and not the final one.
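
The tag behaviour can be reproduced in a throwaway setup. In this sketch the repository names and the tag v-demo are made up, and the moved tag simply points at a newer commit rather than at a rewritten history:

```shell
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

git init -q upstream
git -C upstream commit -q --allow-empty -m 'old history'
git -C upstream tag v-demo

# Someone clones while the tag still points at the old commit.
git clone -q upstream clone

# Upstream later moves the tag.
git -C upstream commit -q --allow-empty -m 'new history'
git -C upstream tag -f v-demo > /dev/null

# An ordinary fetch leaves the existing local tag alone.
git -C clone fetch -q origin
stale=$(git -C clone rev-parse v-demo)

# The manual cleanup: explicitly force-update tags from the remote.
git -C clone fetch -q --tags --force origin
fresh=$(git -C clone rev-parse v-demo)

echo "after plain fetch:  $stale"
echo "after forced fetch: $fresh"
```

Deleting the local tag with git tag -d and fetching again would work as well; either way it is a step each person with a clone has to take themselves.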

nyov commented Nov 27, 2016

As promised I've done the preparation work for a new git here, but if you want the actual steps I did to clone svn, clean up, and graft current git onto it, I can document that with some more time.

For now, here it is for review: https://github.com/nyov/qstat-tmp
As the svn trunk had 3 subdirectories but git here has none, I felt I had to fake some history there for cleanliness. I hope that is acceptable (otherwise all those changes would have "fallen" into the Create README.md commit that follows, making it look quite messy).
So I injected these two fakes between svn trunk and the first git commit:
nyov/qstat-tmp@1a187d8
nyov/qstat-tmp@d9f5bcb

/edit: I've also trimmed the initial few commits which are, as illwieckz said, just noise, as well as the empty cvs release branches. No information is lost because of this, but it is why this repository shows roughly 10 fewer commits:
(screenshot: history)

Let me know if you would find this history suitable or if there seem to be any issues.
@illwieckz, you may be interested in this as well, perhaps?

I believe just putting the new/alternative history into the repository should work fine, but I'll run a few tests and write up the process.

stevenh commented Sep 23, 2021

Very sorry we never got to this as I never got the issue notification emails.

Given how old this is now, I'm going to close it, but feel free to reopen if you want to take this forward.

stevenh closed this as completed Sep 23, 2021