
New script needed to pull down tarballs. #33

Open · 1 of 2 tasks
dak180 opened this issue Aug 5, 2015 · 11 comments

@dak180
Member

dak180 commented Aug 5, 2015

A new script to replace distfiles/generate-distfiles-and-finkinfo-mirror.pl is needed.

It should meet the following requirements. Given a collection of info files on disk, as fetched by one of the selfupdate-server scripts, it should:

  • Download new tarballs as needed, ensuring that each resulting tarball matches its hash.
  • Delete tarballs that are no longer referenced by any info files after a certain period of time (say, a week or so).

It explicitly should not care about how the info files end up on disk, though it should use the timestamp files as described to rate-limit itself; it should also set those selfsame timestamps to signal general health to other mirrors.
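A minimal sketch of what the download-and-verify step could look like, in Perl. The paths, the SHA-256 choice, and the fetch_tarball helper are illustrative assumptions, not the actual finkdist interface:

    #!/usr/bin/env perl
    # Sketch of the download-and-verify step. SHA-256 is used here for
    # illustration; the real script would read whichever checksum type
    # the .info file declares.
    use strict;
    use warnings;
    use Digest::SHA;
    use File::Basename qw(basename);

    # Hypothetical input: a source URL and its expected hex digest,
    # parsed out of an .info file elsewhere.
    sub fetch_tarball {
        my ($url, $expected, $destdir) = @_;
        my $dest = "$destdir/" . basename($url);
        return 1 if -f $dest && checksum_ok($dest, $expected);

        # Download to a temporary name so a failed transfer never
        # leaves a half-written file under the real name.
        my $tmp = "$dest.part";
        system('curl', '-fsSL', '-o', $tmp, $url) == 0
            or do { unlink $tmp; return 0 };

        unless (checksum_ok($tmp, $expected)) {
            unlink $tmp;    # mismatched download is discarded
            return 0;
        }
        return rename($tmp, $dest);
    }

    sub checksum_ok {
        my ($file, $expected) = @_;
        my $got = Digest::SHA->new(256)->addfile($file)->hexdigest;
        return lc($got) eq lc($expected);
    }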

This is being worked on in distfiles/finkdist.

@dak180
Member Author

dak180 commented Aug 5, 2015

@gecko2: @TheSin- mentioned that you might have started working on this; is any of that public somewhere?

@dak180
Member Author

dak180 commented Aug 11, 2015

If any @fink/fink-developers have the bandwidth to take this up please do not hesitate to assign this to yourself. ☺

@akhansen
Member

akhansen commented Oct 2, 2015

Is there a real reason to delete tarballs apart from disk storage requirements?

I'd advocate not deleting them on such a short timeframe because of the following scenario:

  1. Package foo-version1 only has a source available on our mirrors.
  2. Maintainer updates package foo to version2.
  3. Maintainer discovers that foo-version2 is broken and needs, at least temporarily, to be reverted to version1 using an Epoch.

If the discovery that foo-version2 is broken occurs sufficiently late, then foo-version1 can't be built any more.
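For concreteness, a hypothetical foo.info fragment for step 3 (only the version-related fields are shown): Fink compares epochs before versions, so 1:1.0-1 sorts above the broken 2.0-1.

    # Hypothetical foo.info fragment: reverting to version1 via Epoch.
    Package: foo
    Epoch: 1
    Version: 1.0
    Revision: 1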

@dak180
Member Author

dak180 commented Oct 2, 2015

We could just as easily specify that it keep them for 3 months or so; the reason is indeed to keep disk space requirements down.

If we can make a system that can delete tarballs after a specified time period at all, we can make that period whatever we think is best and can even make it configurable to fit the needs of different mirror hosts.
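Making that period configurable is cheap. A minimal sketch, assuming a hypothetical --ttl-days option; the flag name and 90-day default are not an agreed interface:

    #!/usr/bin/env perl
    # Sketch of a configurable retention window.
    use strict;
    use warnings;
    use Getopt::Long;

    my $ttl_days = 90;    # default to ~3 months, per the discussion above
    GetOptions('ttl-days=i' => \$ttl_days)
        or die "usage: $0 [--ttl-days N]\n";

    my $ttl_seconds = $ttl_days * 24 * 60 * 60;
    # ... hand $ttl_seconds to the cleanup pass ...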

@gecko2
Member

gecko2 commented Aug 15, 2016

Just updated the script to the one we are using (the one from my experimental tree).

@TheSin-
Member

TheSin- commented Aug 15, 2016

I hope it doesn't break snitch, since I see you changed the default checkout to something that will need a password; snitch has been using the one from git and self-updates it.

@TheSin-
Member

TheSin- commented Aug 15, 2016

As for deletion, the safest approach would be to check all dists, keep only what is referenced in the current dists by default, and delete the rest on an interval. Just adding that note here for when I start the rewrite of this script.

@TheSin-
Member

TheSin- commented Aug 16, 2016

Okay, my initial rewrite based on @gecko2's work is in ea7d557.

It's a WIP: it does not restrict any dists yet and has not been fully tested; I've just gotten it to start a run and begin downloading things. It does not do the dists pull itself; you would write a script that updates the fink and dists dirs before running it. That way fink and dists could each come from cvs, svn, rsync, or git; it won't matter. It just requires that the base of the dists dir (which doesn't need to be called dists) contain the 10.x dirs.

@TheSin-
Member

TheSin- commented Aug 16, 2016

Also, as for deleting: based on how the info files get loaded, it's hard to know what is currently in use and what isn't, so I think we might need a separate script for cleanup. It would have to load all info files and gather all files that should exist. Alternatively, you could run finkdist into a new dir, move it into place, and remove the old one, which would clear out anything stale; but that assumes no retention of old tarballs and likely isn't the best idea. Just spitballing here ;)

@gecko2
Member

gecko2 commented Aug 17, 2016

I already cleaned the download dir earlier by collecting all currently used checksums and removing every file not referenced by one. That got rid of around half of the archive the first time I did it.
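That one-off pass could look roughly like this. The paths, the MD5 field choice, and the regex are assumptions for illustration; real parsing would also need to handle Source-Checksum and the other checksum variants:

    #!/usr/bin/env perl
    # One-off cleanup: keep a stored file only if its digest appears in
    # some .info file anywhere in the trees.
    use strict;
    use warnings;
    use Digest::MD5;
    use File::Find;

    my $info_root = '/var/lib/finkinfo';    # assumed locations
    my $distfiles = '/var/lib/distfiles';

    # Collect every checksum referenced by any .info file in any tree.
    my %referenced;
    find(sub {
        return unless /\.info$/;
        open my $fh, '<', $_ or return;
        while (<$fh>) {
            $referenced{lc $1} = 1
                if /^Source\d*-MD5:\s*([0-9a-fA-F]{32})/;
        }
    }, $info_root);

    # Hash each stored file and delete it if nothing references it.
    find(sub {
        return unless -f $_;
        open my $fh, '<', $_ or return;
        my $md5 = Digest::MD5->new->addfile($fh)->hexdigest;
        unlink $_ unless $referenced{$md5};
    }, $distfiles);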

@dak180
Member Author

dak180 commented Sep 30, 2017

An outline of how a cleanup script might work:

It should take a time to live as an option, because different mirrors may want to keep things for different amounts of time. (A sketch of the whole procedure follows the outline below.)

  1. Make a list of all hashes in all info files in all trees.
  2. For each file, compute its hash.
  3. Check whether that hash exists in the list.
    1. If it does, check whether a timestamp file (a file at a specified location containing a timestamp) exists for that hash.
      1. If one does, delete the timestamp file; otherwise go to the next file.
    2. If it does not, check whether a timestamp file exists for that hash.
      1. If one does, read the timestamp in the file and compare it to the current time.
        1. If the difference is greater than the time to live, delete both the timestamp and tarball files; otherwise go to the next file.
      2. If one does not, make a timestamp file containing the current time and go to the next file.
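A sketch of that outline in Perl. The stamp-file location, the paths, and the SHA256(...) parsing are illustrative assumptions:

    #!/usr/bin/env perl
    # Timestamp-based cleanup per the outline above. A "timestamp file"
    # is $stampdir/<hash> holding the epoch time at which the tarball
    # was first seen unreferenced.
    use strict;
    use warnings;
    use Digest::SHA;
    use File::Find;

    my $ttl       = 90 * 24 * 60 * 60;       # time to live; should be an option
    my $info_root = '/var/lib/finkinfo';     # assumed locations
    my $distfiles = '/var/lib/distfiles';
    my $stampdir  = '/var/lib/distfile-stamps';

    # Step 1: collect all hashes from all info files in all trees.
    my %live;
    find(sub {
        return unless /\.info$/;
        open my $fh, '<', $_ or return;
        while (<$fh>) {
            $live{lc $1} = 1 if /SHA256\(([0-9a-fA-F]{64})\)/;
        }
    }, $info_root);

    # Steps 2 and 3: hash each stored tarball and act per the outline.
    find(sub {
        return unless -f $_;
        my $hash  = Digest::SHA->new(256)->addfile($_)->hexdigest;
        my $stamp = "$stampdir/$hash";
        if ($live{$hash}) {
            # 3.1.1: referenced again, so cancel any pending deletion.
            unlink $stamp if -f $stamp;
        } elsif (-f $stamp) {
            # 3.2.1: unreferenced and already stamped; expire after TTL.
            open my $fh, '<', $stamp or return;
            my $then = <$fh> // 0;
            chomp $then;
            if (time() - $then > $ttl) {
                unlink $_;
                unlink $stamp;
            }
        } else {
            # 3.2.2: first time seen unreferenced; start the clock.
            open my $fh, '>', $stamp or return;
            print $fh time(), "\n";
        }
    }, $distfiles);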
