Skip to content

Conversation

@fao89
Copy link
Member

@fao89 fao89 commented Oct 24, 2025

Description

https://issues.redhat.com/browse/AAP-45875

  • What is being changed?
    This change is the foundation for enabling runtime platform feature flags for AAP. This updates the django-ansible-base to be the central location where all platform flags are defined. Components can inherit the ansible_base.feature_flags application to inherit all platform feature flag definitions.
  • Why is this change needed?
    To enable runtime platform feature flags in AAP.
  • How does this change address the issue?
    This change addresses the issue by defining a database flag source, which contains all the feature flags along with their associated metadata. These feature flags are installed into each components database at install-time and kept in sync via resource sync (Gateway is the provider)

Before this can be merged, the following should be done:

  1. Confirm each feature flags metadata
    1. Only Dispatcher and EDA_Analytics feature flags need their metadata confirmed now.

Type of Change

  • New feature (non-breaking change which adds functionality)
  • Documentation update
  • Test update

Self-Review Checklist

  • I have performed a self-review of my code
  • I have added relevant comments to complex code sections
  • I have updated documentation where needed
  • I have considered the security impact of these changes
  • I have considered performance implications
  • I have thought about error handling and edge cases
  • I have tested the changes in my local environment

Testing Instructions

Prerequisites

Steps to Test

Recommend testing this through AAP Dev, but it can be tested directly via the test app as well.

  1. Clone https://github.com/ansible/aap-dev/
  2. Enter aap-dev directory and run make configure-sources
  3. Use this PR for the DAB source
  4. Use this PR for the Gateway source - https://github.com/ansible-automation-platform/aap-gateway/pull/1056

Expected Results

AAP Deploys as expected with database feature flags.

Additional Context

Required Actions

if system_user:
queryset = queryset.exclude(object_id=system_user.id)

return queryset
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems extremely out-of-place for this patch. On the surface it seems reasonable, but I want to ask that we treat it separately so we can have a paper trail (new Jira, etc.). In what case is it known that we had a user sync problem with the _system user?

I want that for basic accountability surrounding the change. I believe there is risk in the change, and I don't want that risk to be attributed to the feature flags feature, which it seems to have nothing to do with it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was having issues with test_app/tests/resource_registry/test_resource_sync.py::test_resource_sync it seems to be flaky

@fao89 fao89 force-pushed the phase2/feature-flags/poc branch from 58dc37b to e27082d Compare October 27, 2025 14:30
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to ask for @chrismeyersfsu review, as he has some patches to code here, philosophically non-conflicting. The sync changes are something I want to advertise more widely.

@fao89 fao89 force-pushed the phase2/feature-flags/poc branch from e27082d to 3b5558b Compare October 27, 2025 14:33
@AlanCoding
Copy link
Member

Only Dispatcher and EDA_Analytics feature flags need their metadata confirmed now.

I'm not sure if we're going to keep the dispatcher feature flag, as it was for a transition period. So it's likely better to never port it to the updated flags system.

"""

queryset = AAPFlag.objects.order_by('id')
permission_classes = try_add_oauth2_scope_permission([IsSuperuserOrAuditor])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@john-westcott-iv remind me, is the rule for using try_add_oauth2_scope_permission that we need it for any view that is surfaced by aap-gateway? That would explain why it is added here, I just want to have it written down.

Copy link
Contributor

@BrennanPaciorek BrennanPaciorek Nov 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is correct. Without this, any views that don't have the permission class will get un-scoped permissions when using PATs (read tokens effectively become read write tokens).

class FeatureFlagsStatesView(AnsibleBaseDjangoAppApiView, ModelViewSet):
"""
A view class for displaying feature flags states.
To add/update/remove a feature flag, see the instructions in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here comes the big questions - we have a new endpoint being added, follow that.

Under what conditions will this be added? What specific actions are needed downstream of this change? Trying to fill in those blanks:

  • Each service (AWX, eda-server, etc.) already have this added to INSTALLED_APPS, but the new migration means that installers need to be sure that migration is run
  • the UI will need to make use of these new endpoints.

But even from these 2 above points, I have a point of confusion. All services will expose these new endpoints. So then what endpoints should the UI make use of? Which endpoints will work?

If a flag is modified via the Gateway API, will that be actively synchronized to the services, or passively?

If the UI is only going to use the endpoints from the Gateway API, should we disable the new API in the other services or make them read-only?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All services will expose these new endpoints. So then what endpoints should the UI make use of? Which endpoints will work?

Answered:

There is an additional viewset included in aap-gateway that adds partial_update, which is not present here. So this exposes everywhere as read-only. The read-only, GET, view is still in gateway (the resource server) but not in the other components, which are fully read-only.

If a flag is modified via the Gateway API, will that be actively synchronized to the services, or passively?

Actively.

The partial_update method definition does synchronize to components, actively.

If the UI is only going to use the endpoints from the Gateway API, should we disable the new API in the other services or make them read-only?

Yes, already done. 👍

fields = ["name", "state"]

def to_representation(self, instance=None) -> dict:
instance.state = True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is surprising to me. What is going on here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't get it too, @zkayyali812 could you please take a look?

Copy link
Contributor

@zkayyali812 zkayyali812 Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont exactly recall why this is here, but I think this is a great call out. Im guessing this line is a remnant, and could likely be removed.

@fao89 fao89 marked this pull request as ready for review October 30, 2025 18:41
Comment on lines +58 to +81
toggle_type = models.CharField(
max_length=20,
null=False,
choices=[('install-time', 'install-time'), ('run-time', 'run-time')],
default='run-time',
help_text=_("Details whether a flag is toggle-able at run-time or install-time. (Default: 'run-time')."),
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked in the proposal and I don't see any mention of this. I would like to get better clarity on how this will be used. The help text isn't actually doing it for me.

The default is incomparable with most currently-declared flags in feature_flags.yml.

By this, I mean, let's get together with UI developers who may work on it, like @nixocio, @PabloHiro on API design.

I just really want to get clarity on the meaning of this field, potential invalid states, and things like that. If we talk about this in advance, I do believe that is likely to prevent unnecessary migration work or bugs later on.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally the feature flags would all be runtime, I believe this is just a way to make it possible to have feature flags that can only be toggled at install time

)
description = models.CharField(max_length=500, null=False, default="", help_text=_("A detailed description giving an overview of the feature flag."))
support_url = models.CharField(max_length=250, null=False, default="", blank=True, help_text="A link to the documentation support URL for the feature")
labels = models.JSONField(null=True, default=list, help_text=_("A list of labels for the feature flag."), blank=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you do a field-level validator here to enforce that it's a list of strings? I don't see any validation, and I don't want to deal with validating at any layer higher than this because it's new and this is the easy way.

)
visibility = models.BooleanField(
default=False,
help_text=_("The visibility of the feature flag. If false, flag is hidden."),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hidden to who?

Comment on lines 47 to 68
support_level = models.CharField(
max_length=25,
null=False,
help_text=_("The support criteria for the feature flag. Must be one of (DEVELOPER_PREVIEW or TECHNOLOGY_PREVIEW)."),
choices=(('DEVELOPER_PREVIEW', 'Developer Preview'), ('TECHNOLOGY_PREVIEW', 'Technology Preview')),
blank=False,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

editable=False?

blank=False,
)
ui_name = models.CharField(max_length=64, null=False, blank=False, help_text=_("The pretty name to display in the application User Interface"))
condition = models.CharField(max_length=64, default="boolean", help_text=_("Used to specify a condition, which if met, will enable the feature flag."))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want an example. Also, it seems likely we need to set editable=False.

Ok, maybe almost all of these should set editable=False.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zkayyali812 could you help with that?

@fao89 fao89 force-pushed the phase2/feature-flags/poc branch from 3b5558b to 919e3bf Compare November 4, 2025 20:07
@fao89 fao89 force-pushed the phase2/feature-flags/poc branch 3 times, most recently from c2b991f to 59a1f14 Compare November 4, 2025 20:26
fao89 added 2 commits November 6, 2025 14:25
…ests

- Add FEATURE_CASE_INSENSITIVE_AUTH_MAPS_ENABLED flag definition to feature_flags.yaml
- Update claims.py to use the correct flag name with _ENABLED suffix
- Fix test_claims.py to use proper django-flags API (enable_flag/disable_flag)
- Remove deprecated settings manipulation in favor of feature flags API
- All 237 authentication claims tests now pass

Fixes failing tests that were expecting the missing feature flag for
case-insensitive authentication mapping functionality.

Signed-off-by: Fabricio Aguiar <[email protected]>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
This change resolves failing resource sync tests by ensuring the system user
is properly excluded from orphan detection during resource synchronization.

**Problem:**
Resource sync tests were failing because the system user (_system) was being
incorrectly identified as an "orphaned" resource and deleted during sync
operations. The issue was an inconsistency between:

1. The manifest endpoint (views.py) which already excluded system users
2. The orphan detection logic (sync.py) which did not exclude system users

This caused the system user to be absent from manifests but present in
orphan detection, leading to unintended deletion.

**Solution:**
- Added system user exclusion to get_orphan_resources() function
- Used get_system_user() utility for robust system user identification
- Filter by object_id rather than username for more reliable exclusion
- Maintains consistency with existing manifest endpoint behavior

**Impact:**
- Fixes 4 failing tests: test_resource_sync, test_delete_orphans,
  test_resource_sync_update_conflict, test_resource_sync_create_conflict
- Protects system user from being deleted during resource sync operations
- No impact on regular user resource synchronization behavior

Signed-off-by: Fabricio Aguiar <[email protected]>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
@fao89 fao89 force-pushed the phase2/feature-flags/poc branch from 97f0f83 to 1f5731b Compare November 6, 2025 14:25
@github-actions
Copy link

github-actions bot commented Nov 6, 2025

DVCS PR Check Results:

PR appears valid (JIRA key(s) found)

@sonarqubecloud
Copy link

sonarqubecloud bot commented Nov 6, 2025

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants