[BUG] After upgrade to 3005.3, failure to fetch latest pillar data after being updated in git pillar repo branch #65467
3005.2 changed how git pillars were cached to fix a security issue. Having to clear the cache after upgrading is probably expected.
This does NOT only affect the pillar cache across upgrades. With a running 3006.3 master, the master will NOT refresh a git_pillar branch after the first time that branch is fetched. Steps to test:
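Roughly, using the branch and pillar names from the issue description at the bottom:

```bash
# 1. Commit and push a change to the pillar data on branch "testpillar"
#    of the git pillar repository.
# 2. Wait for the master's git_pillar update interval, or trigger one:
salt-run git_pillar.update
# 3. On a minion, query the pillar against that branch:
salt-call pillar.get test pillarenv=testpillar
# On the affected versions, step 3 keeps returning the old value.
```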
Is there any chance we may get a fix for this soon? It affects both 3006 and 3005, and it was introduced after the CVE patches, so it makes sense to fix it in both versions. This kind of issue makes it harder to upgrade Salt away from the versions with security issues while keeping everything working as desired.
@cmcmarrow do you have any insight into this? I think this regression happened as a result of your CVE patch 6120bca
There's been no GitHub activity on @cmcmarrow's account since September. Safe to say he's no longer involved in the project.
I would like to add another detail: we tried the newly released version 3005.5, which contains additional CVE patches, and we noticed that the pillar refresh issue is partially fixed and happens only on branches containing `/` in their names.
I had wondered if ...
@Ikramromdhani Can you upgrade to the latest Salt 3006 and retry? Salt 3005 passed its bug-fix phase last August (2023) and is about to run out of CVE support on Feb 25th 2024, 24 days from now. At this stage nothing will be getting done for Salt 3005 given that it is dead in 24 days; there is hardly time for a debug/test/fix/release cycle, especially since we would have to build classic packages and there is a couple of days in just getting that done.
@Ikramromdhani Rereading this, I remember Chad talking about how caching was changed and how the cache is now only updated when it should be, rather than as it was before, where the cache was needlessly getting updated as a side effect of other commands being run. I need to go back and find the actual commit etc.; it was last year, around autumn. And yup, Chad is gone, moved on to greener pastures, so I am reading up on GitFS etc. to get fully familiar (having to cover more areas since there are fewer people now on the core team). I will update next week with the code lines that changed, but I expect you were relying on a side effect of an operation that has since been fixed (namely, incorrectly updating the cache); I will get the full details for you. That said, it doesn't mean there isn't an error too.
We are also observing this error after recently upgrading.
We have observed that when we use a pillarenv whose branch name contains a `/`, the pillar data is not refreshed after the first fetch.
However, if we create a new branch without a `/` in its name, the pillar refreshes correctly. (This evidence conflicts with #65467 (comment), although the commenter had not actually confirmed it themselves.) Note: ...
@donkopotamus, that is what I was about to confirm. Tested today with 3006.6 using pygit2 and I still have issues pulling changes for pillars when relying on a pillarenv branch with a `/` in its name.
I observed that the cache directory organizes cached branches by folder name = branch name, and that branches with `/` in their names end up as nested subdirectories. For example, with three pillar branches, the layout looks something like the sketch below.
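An illustrative layout with placeholder names, showing how the `/` creates an extra directory level:

```text
/var/cache/salt/master/git_pillar/<hash>/
├── master/
├── testpillar/
└── feature/
    └── new-thing/      # branch "feature/new-thing" becomes two directory levels
```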
If this turns out to be the actual issue, then the obvious solution would appear to be escaping the `/` when mapping a branch name onto a cache directory name.
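For illustration, two escaping schemes one might consider (neither is presented as what Salt actually does; note the ambiguity of the first, which comes up again below):

```python
from urllib.parse import quote

branch = "feature/new-thing"

# Option A: replace the path separator with another character.
# Ambiguous: "feature/new-thing" and "feature-new-thing" would collide.
dir_a = branch.replace("/", "-")

# Option B: percent-encode the name, which round-trips unambiguously.
dir_b = quote(branch, safe="")

print(dir_a)  # feature-new-thing
print(dir_b)  # feature%2Fnew-thing
```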
@Ikramromdhani Won't make it into Salt 3006.7; that is close to release, waiting on clean tests in the pipeline. Possibly for 3007, but it is on my backlog; I am dealing with #65816, which may be related (similar code changes for pillarenv went in at the same time for gitfs locks), and writing tests for that fix now.
Not sure if it helps, but it looks like the CVE fix in 3005.2, and then #65017 in 3005.3 (which mitigated a negative effect of that fix), probably contributed to this issue.
Yup, the set of changes that Chad did (and that then had to be adjusted) applied both to the gitfs changes and to the pillarenv and cache changes, etc. That is why I expect some linkage to the gitfs changes.
Just a heads up that we tested 3006.7 and the problem is still there. Any chance of getting a fix for this in Salt 3007? This problem is holding us back from performing Salt upgrades.
@dmurphy18 Since the fix did not make it into 3007, I would just like to ask if there is a specific version to target? Thank you.
@Ikramromdhani Working on a fix for GitFS, and Git Pillar is right after it. The fix for GitFS is done, but I have to refactor a new test after PR review (#65937); after that I will work on Git Pillar, since it will possibly be affected by the GitFS changes.
@donkopotamus Just making sure: all of your changes on the branch are committed and pushed? I am doing the same and seeing the changes, regardless of pillarenv.
@dmurphy18 I understand why such questions need to be asked, but yes, changes are definitely committed and pushed! (I've also repeated it this morning.) This isn't an isolated test really ... I've tested each minor release. What else can we provide that might help us close the gap between your test and our production systems? The config snippets I posted above are our only pillar-related config changes on the master, with all other pillar-related settings being the default. Here is our ...
@dmurphy18 I believe I have located a symptom of the problem that might help. It's all about the location of the marker file. Take my test case above, where I created both a branch with a `/` in its name and an equivalent branch without one.
Now ... suppose I push a change that affects the pillar, as in my previous example.
With a branch name containing a `/`, the change is never picked up; with the plain branch name it is.
A further refinement of the above:
Meanwhile, on the minion we sleep for five minutes ...
As ...
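The sort of minion-side check being described might look like this (pillar key and branch name taken from the issue description at the bottom):

```bash
# Poll the pillar value for five minutes to see whether the pushed
# change ever shows up on the minion.
for i in $(seq 1 10); do
    salt-call pillar.get test pillarenv=testpillar
    sleep 30
done
```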
Summary
Addendum: No idea if it's relevant ... but the analysis above is for the disk cache for the ...
I currently have no idea if they affect a cache for specific named branches.
It does not ... if we replace the ...,
then the cache directory looks slightly different, and does not use the branch name at all. (This may explain why you cannot reproduce it ... are you using ...?)
The Problem?
So, looking at ...
Now ... look at what's dropping these marker files ...
Assuming that ... Next, looking at ...
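A minimal sketch of the pattern being described here, assuming each branch gets a working directory named after it and a `fetch_request` marker file signals that a fetch is needed (both the layout and the filename are assumptions, not necessarily what Salt uses):

```python
import os

# Assumed working-dir layout for one remote:
#   <WORK_ROOT>/master/
#   <WORK_ROOT>/testpillar/
#   <WORK_ROOT>/feature/new-thing/   <- branch "feature/new-thing" is nested
WORK_ROOT = "/var/cache/salt/master/git_pillar/work/abcd1234"


def place_markers(current_branch):
    """Mark every *other* branch as needing a fetch, but only look at the
    top level of the cache directory."""
    for entry in os.listdir(WORK_ROOT):
        if entry == current_branch:
            continue
        branch_dir = os.path.join(WORK_ROOT, entry)
        if os.path.isdir(branch_dir):
            # "fetch_request" is an assumed marker filename for this sketch.
            open(os.path.join(branch_dir, "fetch_request"), "w").close()
    # os.listdir() only returns "master", "testpillar" and "feature", so the
    # nested directory for "feature/new-thing" never receives a marker and
    # that branch is never refetched.
```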
Related, but I wonder what happens if you name a branch ...
@OrangeDog I'll take you out behind the bike shed 🤣, joking. Some interesting takes on that parameter input; I'm beginning to think we might need some sanity checking on valid input, etc. It brings horrors of Perl (Halloween/Day of the Dead/All Saints Day, after all) and sanity checking from a couple of decades ago to mind. Good point.
I'm pretty sure it would only be an issue if you never ran a minion on the master, since if you can control the repo you can already add states to do whatever you want.
@OrangeDog Thankfully a git ref cannot contain ...
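For reference, git itself validates ref names; for example:

```bash
# git's own validation of ref names; sequences like "..", spaces, "~", "^"
# and ":" are rejected, while "/" is perfectly legal:
git check-ref-format --branch "bad..name"          # fails
git check-ref-format --branch "bad name"           # fails
git check-ref-format --branch "feature/new-thing"  # succeeds
```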
@dmurphy18, I tested the new version of the salt master ...
@dmurphy18 any updates on this issue? We would like to move to 3006 and were wondering whether a fix could be included in 3006.10, or whether it may be something we need to live with.
I'm affected by this issue. The only workaround I found is to run on salt master:
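Along the lines of the cache-clearing workaround in the issue description below, e.g.:

```bash
# Clear the git_pillar cache on the master, then refresh pillar data.
rm -rf /var/cache/salt/master/git_pillar
salt '*' saltutil.refresh_pillar
```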
I would be happy to assist with testing or providing feedback if needed. However, I am not a developer, so I won't be able to contribute directly to fixing the code. Let me know if there's anything I can do to help further.
@Ikramromdhani I have been unable to reproduce this with GitPython and pygit2 < 1.15.0 (1.15.0 and above has its own problems, see #67017). I am going to be gone on Christmas break and then PTO, so I won't be working on this until the new year, 2025. So apart from the later versions of pygit2, there has to be something else in your environment which is affecting functionality, since I am unable to reproduce the issue. Is there any other detail that could be added? I will be rereading this to see if there is something I missed too.
@dmurphy18 @Ikramromdhani Can you review this sequence of comments on this issue again? I believe it identifies the exact cause of the issue:
The latter in particular identifies the problem as the piece of code that is responsible for dropping the marker files.
@dmurphy18 Also relevant in there is the following:
Can you confirm that you are using an ...?

A WORKING FIX
If I alter the mentioned code to instead walk the tree at ..., the stale-pillar problem goes away.

@Ikramromdhani I, for example, no longer blow away the entire git_pillar cache; I just trigger a fetch on the branch I need fixed and 60 seconds later that branch will have been updated. (Clearly I don't like having to do this manually all the time though!)
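A minimal sketch of walking the tree instead of only the top level, under the same layout and marker-filename assumptions as above (not the exact code in salt/utils/gitfs.py):

```python
import os


def place_markers_walking(work_root, current_branch):
    """Walk the whole working-dir tree so that branches whose names contain
    "/" (and therefore live in nested directories) also receive a marker
    file. Layout and the "fetch_request" filename are assumptions; a real
    fix would also need to distinguish branch working directories from
    ordinary subdirectories."""
    current_dir = os.path.normpath(os.path.join(work_root, current_branch))
    for dirpath, dirnames, _ in os.walk(work_root):
        for name in dirnames:
            branch_dir = os.path.join(dirpath, name)
            if os.path.normpath(branch_dir) == current_dir:
                continue
            with open(os.path.join(branch_dir, "fetch_request"), "w"):
                pass
```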
@dmurphy18 can you confirm you still cannot reproduce when using a configuration as below:
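Roughly like this (pygit2 as the provider, pillar environments mapped to git branches; the repository URL is a placeholder):

```yaml
git_pillar_provider: pygit2

ext_pillar:
  - git:
    - __env__ https://git.example.com/pillar.git
```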
@Ikramromdhani Back and looking at this again, it was a long PTO. |
@Ikramromdhani I did set up the environment with GitPython; I will try with pygit2 (see ...) once I get my test environment set back up.
@donkopotamus Thanks for those comments. But I am wondering why I am not seeing this with pygit2 < 1.15 or with GitPython, given that the code examples are in the base classes. Getting a test environment set up to check that, and getting back up to speed after a long vacation.
@dmurphy18, we are still witnessing the issue with ...
@Ikramromdhani @donkopotamus Have a solution which worked in hand testing, converting '/' in branch names to '-'.
That doesn't sound like a great solution, as branch names can also contain '-'.
@dmurphy18 As @OrangeDog notes, substituting '/' with '-' is ambiguous. Git ref names are specifically designed to generally map well, and be safe, on filesystems. The current method of mapping them onto a directory structure does not seem wrong in any sense ... it's simply the traversing of that structure that is wrong. i.e. I think:
@donkopotamus @OrangeDog Did I mention I hate gitfs 🤣. I was duplicating this:
Line 3387 in 2b44a6c
and yeah, I refactored that line, but the replace of '/' with '-' was originally there for 5 years previously. I am wondering why ...
So would it not also treat the '/' in a git branch name as an ...? Understanding that the replacement of '/' with dash '-' does preclude ...
Lines 483 to 488 in 2b44a6c
Noting the ... Still to put the changes out for team review, so I am sure they will have some comments too. Please let me know if I am wrong in my logic about a minor side effect.
We are trying to upgrade from 3002.2 to something newer (onedir) and stumbled upon this issue. Using ... However, we do not have ...
Edit: It seems I am not experiencing this issue with 3006.9, or I might have introduced some bad config when I was testing.
@dmurphy18 Just a side comment on this first, before talking about ...
So there is no potential for a clash here beyond the extremely minuscule possibility of a hash collision.
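A minimal illustration of that point (not the exact scheme Salt uses):

```python
import hashlib


def cachedir_name(remote_url: str) -> str:
    # Naming the cache directory after a hash of the remote means two
    # different remotes can only collide if their hashes collide, which
    # is astronomically unlikely.
    return hashlib.sha256(remote_url.encode("utf-8")).hexdigest()


print(cachedir_name("https://git.example.com/pillar.git"))
```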
Summary
This is a long and detailed response, but this might be a good summary of my thoughts:
My proposed solution is just a few lines of code, but I may well be missing something altogether.
Details
Yes ... but in this case that is exactly the point. Currently, branch names that contain a `/` ...
These are the working directories, with their git repos being in: ...
This is the current situation for how ...
Next, ...
This also is absolutely fine for all the reasons given above. When the pillar is fetched, how does it decide if it needs refetching? If the pillar for ...:
Lines 1307 to 1309 in 2b44a6c
i.e. it looks for a marker file in that branch's working directory.
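A sketch of that check, with the marker filename and layout assumed as in the earlier sketches:

```python
import os


def needs_fetch(branch_workdir: str) -> bool:
    """A marker file dropped into a branch's working directory means the
    next access must fetch from the remote ("fetch_request" is an assumed
    filename for this sketch)."""
    marker = os.path.join(branch_workdir, "fetch_request")
    if os.path.isfile(marker):
        os.remove(marker)
        return True
    return False
```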
This logic is also fine, and works perfectly well. But where do those marker files come from?
Lines 2791 to 2803 in 2b44a6c
The intent of this code should be to drop a marker file into the working directory of every other branch of the repo.
Instead, as it only looks at the top level of that directory structure rather than walking the whole tree, the working directories for branches whose names contain a `/` (and therefore sit in nested subdirectories) never receive a marker file.
Now, requests for the pillar of such a branch never trigger a refetch. How do we drop a marker file into its nested working directory by hand?
For example:
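With the assumed layout and marker filename from the sketches above, manually marking the nested branch for a refetch might look like:

```bash
# Branch "feature/new-thing" has a nested working directory, so the marker
# has to be placed two levels down (paths and filename are assumptions):
touch /var/cache/salt/master/git_pillar/work/<hash>/feature/new-thing/fetch_request
```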
Note that this logic all still works fine for the GitPillar when ...
@donkopotamus I had the PR passing and it got merged as I was about to edit it to try the os.walk approach (I had played around with os.walk and similar on the git branch name 'doggy/moggy' in a test file). The PR was passing all tests and had the milestone set for 3006.10, which is close to release, hence it made sense to merge it (by the merge master; I had requested that this morning before I saw your comment, was on PTO Friday and only saw your comment after lunch). Hence I opened a new Tech-Debt issue to fix this up to use the os.walk approach after the 3006.x branch is released, which is imminent; see #67722.
Hence closing this issue since PR #66954 has been merged.
Description of Issue
A minion/master is unable to get the new pillar data from an alternative branch in git after it was updated in the remote git pillar repository. This was working with 3005.1, but broke after upgrading to 3005.3. It also seems to be the case for those using 3006.3. Please refer to the community discussion: https://saltstackcommunity.slack.com/archives/C7K04SEJC/p1697576330136199
Setup
Amazon Linux 2 salt master, configured with a git repository for states and pillar data.
Steps to Reproduce Issue
Master config
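A minimal git_pillar master config consistent with the steps below (the repository URL is a placeholder):

```yaml
ext_pillar:
  - git:
    - __env__ https://git.example.com/repo.git
```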
Push an update to an existing pillar called test in repo.git on branch testpillar. On a minion, run salt-call pillar.get test pillarenv=testpillar
Expected behavior
The new value of pillar test should be visible in the output of the salt call on the minion.
Actual behavior
The minion gets the old data of pillar test. Please note that when executing the same command after deleting /var/cache/salt/master/git_pillar, the minion gets the correct value.
Versions Report