
Race condition deleting build #34

Open

TheLastProject opened this issue Apr 5, 2018 · 11 comments

Comments

@TheLastProject
Contributor

I use uploadtool.sh for both the macOS .dmg and the Linux AppImage on https://github.com/Pext/Pext/releases. Somehow, a race condition made the macOS build delete the AppImage build.

At the time this happened, Pext/Pext@5458d76 was the most recent continuous commit, which is build 370 on Travis: https://travis-ci.org/Pext/Pext/builds/362457504.

I won't hold off on pushing more commits forever, which will surely fix it again, but hopefully the logs can help you figure out how the race condition happened.

@TheLastProject
Contributor Author

Oh no, when I clicked restart build it deleted the old logs :(

@probonopd
Owner

Inserting a delay into one of the arches should reduce the probability of timing collisions... I think
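
For illustration only, a minimal sketch of such a stagger in the upload step could look like this (the TRAVIS_OS_NAME check, the 120-second value, and the upload.sh invocation are assumptions about the project's setup, not anything uploadtool.sh provides itself):

```bash
# Hypothetical sketch: delay only the macOS job so the two uploadtool.sh runs
# are less likely to touch the 'continuous' release at the same moment.
# Note: this only lowers the probability of a collision, it does not remove the race.
if [ "$TRAVIS_OS_NAME" = "osx" ]; then
  sleep 120
fi
bash upload.sh Pext*.dmg*  # whatever upload invocation the project already uses
```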

@nurupo

nurupo commented May 27, 2018

Just want to point out that there are two different race conditions that can happen:

  1. Race condition between jobs of a single build.
    You push a commit to master, Travis starts a build with several jobs, the jobs get to the release deletion part at around the same time, and one job ends up deleting another job's build artifacts.
  2. Race condition between jobs of multiple builds.
    You push several times to master, which starts several concurrent Travis builds, each with a different commit hash. Jobs from those builds then keep deleting and re-creating releases depending on whether the commit hash matches the release's hash.

All of those race conditions are solvable by using the new Stages feature of Travis CI. Stages let you group jobs, and the stages run sequentially, e.g. jobs from stage 2 don't start running until all jobs from stage 1 have finished. Jobs within a stage still run in parallel, just as they normally do.
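
A hedged sketch of what such a stage layout could look like in .travis.yml (the stage names and script paths are placeholders, not the project's actual configuration):

```yaml
jobs:
  include:
    # Stage 1: the regular build jobs, still running in parallel with each other.
    - stage: build
      os: linux
      script: ./ci/build_appimage.sh
    - stage: build
      os: osx
      script: ./ci/build_dmg.sh
    # Stage 2: a single job that only starts after every stage-1 job has finished.
    - stage: publish
      script: ./ci/collect_and_publish.sh
```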

So the idea here is that you have two stages. All of your regular jobs are in the first stage, and each job creates its own draft release, using continuous-$TRAVIS_BUILD_NUMBER-$TRAVIS_JOB_NUMBER as the tag/release name. Then you have a single job in the 2nd stage. It is guaranteed to run after all other jobs in this build have finished, so all the draft releases have already been made. It downloads the build artifacts of all continuous-$TRAVIS_BUILD_NUMBER-* releases, deletes those releases, and creates a new release continuous-$TRAVIS_BUILD_NUMBER with all the artifacts combined. It also checks whether there are any continuous-$NUMBER or continuous-$NUMBER-* releases with $NUMBER < $TRAVIS_BUILD_NUMBER and deletes them, which serves as a cleanup of previous continuous builds.
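
For concreteness, here is a rough, untested sketch of what that 2nd-stage job could do against the GitHub Releases API with curl and jq; the GITHUB_TOKEN variable and the file paths are assumptions, and this is not code from uploadtool.sh:

```bash
#!/usr/bin/env bash
# Untested sketch of the 2nd-stage "collect and publish" job described above.
# Assumptions: a GITHUB_TOKEN with repo access, jq installed, and the standard
# Travis variables TRAVIS_REPO_SLUG and TRAVIS_BUILD_NUMBER.
set -euo pipefail

API="https://api.github.com/repos/$TRAVIS_REPO_SLUG"
AUTH="Authorization: token $GITHUB_TOKEN"

releases=$(curl -sS -H "$AUTH" "$API/releases?per_page=100")

# 1. Download the assets of every continuous-$TRAVIS_BUILD_NUMBER-* draft release.
mkdir -p artifacts
echo "$releases" \
  | jq -r --arg b "$TRAVIS_BUILD_NUMBER" \
      '.[] | select(.tag_name | startswith("continuous-" + $b + "-"))
           | .assets[] | "\(.id) \(.name)"' \
  | while read -r asset_id asset_name; do
      curl -sSL -H "$AUTH" -H "Accept: application/octet-stream" \
        -o "artifacts/$asset_name" "$API/releases/assets/$asset_id"
    done

# 2. Delete the per-job drafts and any continuous-$NUMBER / continuous-$NUMBER-*
#    release from this or earlier builds (cleanup of previous continuous builds).
#    Note: deleting a published release does not delete its git tag.
echo "$releases" \
  | jq -r --arg b "$TRAVIS_BUILD_NUMBER" \
      '.[] | select(.tag_name | test("^continuous-[0-9]+(-|$)"))
           | select((.tag_name | sub("^continuous-"; "") | sub("-.*"; "") | tonumber) <= ($b | tonumber))
           | .id' \
  | while read -r release_id; do
      curl -sS -X DELETE -H "$AUTH" "$API/releases/$release_id"
    done

# 3. Create the combined continuous-$TRAVIS_BUILD_NUMBER release and upload
#    everything collected in step 1.
upload_url=$(curl -sS -X POST -H "$AUTH" \
  -d "{\"tag_name\": \"continuous-$TRAVIS_BUILD_NUMBER\", \"prerelease\": true}" \
  "$API/releases" | jq -r '.upload_url' | sed -e 's/{.*}//')
for f in artifacts/*; do
  curl -sS -X POST -H "$AUTH" -H "Content-Type: application/octet-stream" \
    --data-binary "@$f" "$upload_url?name=$(basename "$f")"
done
```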

As you can see, race condition (1) no longer happens: the jobs can't overwrite each other's release, since each job creates its own separate draft release. Race condition (2) is also gone, because each build creates its own numbered release continuous-$TRAVIS_BUILD_NUMBER; the builds are not working on the same release called continuous, so there simply can't be any race condition here.

To fix the race conditions you will have to give up on a single continuous release; all releases will be numbered continuous-$TRAVIS_BUILD_NUMBER.

The jobs create draft releases, so they are visible only to repo owners; regular users won't see them. Users might see several continuous-$NUMBER releases if a race happens where a lower build number creates its release after a higher build number has already made sure that all releases with lower build numbers are removed. But this is not really an issue: at most it clutters the Releases page, and those extra releases will be removed with the next git push.

@TheLastProject
Contributor Author

To fix the race conditions you will have to give up on a single continuous release; all releases will be numbered continuous-$TRAVIS_BUILD_NUMBER.

That sounds like it'd break AppImageUpdate though, because it could no longer just watch the continuous release.

@nurupo

nurupo commented May 27, 2018

It could be patched to work with this.

@nurupo

nurupo commented May 27, 2018

Another way to go about it, one that lets you keep a continuous tag, is to do almost the same thing: have each job of the 1st stage upload its build artifacts into its own separate continuous-$TRAVIS_BUILD_NUMBER-$TRAVIS_JOB_NUMBER draft release, then have the job in the 2nd stage collect all those artifacts and delete the draft releases. So far this is identical to the previous solution, but now comes the difference. Use the GitHub or Travis API to make sure there is no newer build running for the current branch (you can probably get by without the API by just git cloning and checking that our commit is still the latest on the branch?). If there is a newer build running, the job simply exits. If there is no newer build running, it deletes the previous continuous release and creates a new one, uploading all the previously downloaded artifacts to it.

This obviously solves race condition (1). There should also be no race condition (2): even if a new build is created right after an already running build queries the API/git to confirm that there is no newer build (which is a new race condition), there should be enough delay between the two for the older build to finish uploading artifacts before the newer build deletes the continuous release, especially since all of this happens in the 2nd stage, meaning at least one job in the 1st stage has to run first, adding delay. Just to rephrase: if this race does happen, it happens while the older build is in its 2nd stage and the newer build has only just started its 1st stage, so you have all the time until the newer build reaches stage 2. You could also add a sleep 300 command before checking for a newer build, if the delay created by the existence of the 1st stage is not enough; see the sketch below.
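
As an illustration of the git-based variant of that check, something along these lines could go at the start of the 2nd-stage job (the two helper calls at the end are purely hypothetical placeholders):

```bash
# Sketch of the "is there a newer build?" check, done via git instead of the
# Travis/GitHub API: compare this build's commit with the current branch tip.
# TRAVIS_REPO_SLUG, TRAVIS_BRANCH and TRAVIS_COMMIT are standard Travis variables.
set -euo pipefail

# Optional extra safety margin before the check (the sleep 300 idea above).
# sleep 300

remote_head=$(git ls-remote "https://github.com/$TRAVIS_REPO_SLUG.git" \
                            "refs/heads/$TRAVIS_BRANCH" | cut -f1)

if [ "$remote_head" != "$TRAVIS_COMMIT" ]; then
  echo "Newer commit $remote_head exists on $TRAVIS_BRANCH; skipping publish."
  exit 0
fi

# No newer build detected: replace the single 'continuous' release.
# delete_release "continuous"               # hypothetical helper
# create_release "continuous" artifacts/*   # hypothetical helper
```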

I'm happy with this solution too: it eliminates both race conditions, the new race condition it introduces is very unlikely to happen, and if it does happen to you, it is fully solvable with a long enough sleep <constant>.

@nurupo

nurupo commented May 27, 2018

The only downside to this solution is that if you have, for example, 3 builds running, the older two builds will not publish build artifacts, because they will see that a newer third build is in progress and will exit; and if that third build fails, none of the 3 builds will publish build artifacts. This doesn't happen with the 1st solution I proposed, but with that one you don't get the continuous tag without any numbers. Those are the upsides and downsides of the two solutions.

@probonopd
Owner

Right now it works for me (tm) and every change is likely to break it...

@nurupo

nurupo commented May 28, 2018

Thanks for the heads-up, we will not expect this bug to be fixed anytime soon then.

@leozide

leozide commented Mar 12, 2019

I was running into this since I upload Linux and Mac continuous builds for my project, and I found a fix here: https://github.com/olive-editor/olive/blob/master/.travis/after_success.sh

I've copied that code to my yml file here: https://github.com/leozide/leocad/blob/master/.travis.yml

It may not solve all cases, but it's been working for me so far.

@nurupo

nurupo commented May 25, 2019

I published the code for ci-release-publisher today, which tries to avoid race conditions when publishing, so if you are having race condition issues you could take a look at it. It is a rather new project (the repository was created just a few hours ago), so probably no one besides me has used it yet and there might be some rather obvious bugs, but as long as you are aware of that, by all means give it a try and report any issues.
