Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

platform 3251: Add support for protected namespaces and enhance scali… #4

Conversation

Stevesibilia
Copy link
Contributor

@Stevesibilia Stevesibilia commented Dec 5, 2024

User description

…ng functionality

  • Introduced filtering for protected namespaces in working-hours script.
  • Added functions to scale deployments and statefulsets based on label selectors.
  • Updated Terraform configuration to include protected namespaces and relevant label selectors.
  • Enhanced error handling and warnings for filtered namespaces.

PR Type

Enhancement


Description

  • Added comprehensive support for protected namespaces to prevent scaling in sensitive environments
  • Introduced statefulset scaling capabilities alongside existing deployment scaling
  • Enhanced resource scaling with a new generic scale_resources function
  • Added new label selectors for fine-grained control over namespace-wide scaling
  • Improved error handling and warnings for namespace filtering
  • Changed role binding from namespace-scoped to cluster-wide for better permission management

Changes walkthrough 📝

Relevant files
Enhancement
main.tf
Enhanced cluster role and config map for statefulset support

main.tf

  • Added support for statefulsets in cluster role permissions
  • Changed role binding from namespace-scoped to cluster-wide
  • Added protected namespaces and label selectors to config map
  • +10/-8   
    working-hours.sh
    Enhanced scaling script with protected namespaces support

    files/working-hours.sh

  • Added filter_namespaces function to handle protected namespaces
  • Added scale_resources function for generic resource scaling
  • Enhanced namespace filtering and resource scaling logic
  • Added support for statefulset scaling
  • +117/-11
    Configuration changes
    variables.tf
    Added protected namespaces and label selector variables   

    variables.tf

  • Added variables for protected namespaces management
  • Added new label selectors for statefulsets and namespace-wide scaling
  • Added documentation for working hours related variables
  • +37/-5   

    💡 PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information

    …ng functionality
    
    - Introduced filtering for protected namespaces in working-hours script.
    - Added functions to scale deployments and statefulsets based on label selectors.
    - Updated Terraform configuration to include protected namespaces and relevant label selectors.
    - Enhanced error handling and warnings for filtered namespaces.
    @sparkfabrik-ai-bot
    Copy link

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🧪 No relevant tests
    🔒 Security concerns

    Excessive Permissions:
    The change from namespace-scoped role binding to cluster-wide role binding (kubernetes_cluster_role_binding_v1) grants permissions across all namespaces, which could potentially allow unauthorized access to sensitive resources if the service account is compromised. Consider maintaining namespace-scoped permissions where possible.

    ⚡ Recommended focus areas for review

    Error Handling
    The scale_resources function returns on first error, which could leave the system in an inconsistent state if some resources were scaled but others failed. Consider implementing better error handling or rollback mechanism.

    Resource Race
    The script processes the same namespaces twice if they appear in both NAMESPACES and LABELLED_NAMESPACES_ALL, which could lead to race conditions or unnecessary scaling operations.

    Permission Scope
    The change from namespace-scoped to cluster-wide role binding might grant more permissions than necessary, potentially violating the principle of least privilege.

    Input Validation
    The filter_namespaces function doesn't validate input parameters for empty or malformed values, which could lead to unexpected behavior.

    @sparkfabrik-ai-bot
    Copy link

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Score
    Best practice
    Add error handling for Kubernetes cluster connectivity before performing operations

    Add error handling for the kubectl commands in the filter_namespaces function to
    handle cases where the cluster is unreachable or authentication fails.

    files/working-hours.sh [25-30]

    +if ! kubectl cluster-info > /dev/null 2>&1; then
    +    echo "Error: Unable to connect to Kubernetes cluster" >&2
    +    exit 1
    +fi
     for protected_ns in $(echo "${protected_namespaces}" | tr "," " "); do
         if echo "${filtered_namespaces}" | grep -q "${protected_ns}"; then
             filtered_out="${filtered_out}${protected_ns},"
             filtered_namespaces=$(echo "${filtered_namespaces}" | sed "s/${protected_ns}//g" | sed "s/,,/,/g" | sed "s/^,//" | sed "s/,$//")
         fi
     done
    Suggestion importance[1-10]: 8

    Why: Adding cluster connectivity checks is crucial for preventing script failures and providing clear error messages when the Kubernetes cluster is unreachable, improving reliability and troubleshooting.

    8
    Possible issue
    Validate numeric input parameters to prevent invalid scaling operations

    Add input validation for the GO_TO_REPLICAS variable to ensure it's a positive
    integer to prevent scaling to invalid values.

    files/working-hours.sh [10]

     GO_TO_REPLICAS=${GO_TO_REPLICAS:-1}
    +if ! [[ "$GO_TO_REPLICAS" =~ ^[0-9]+$ ]] || [ "$GO_TO_REPLICAS" -lt 0 ]; then
    +    echo "Error: GO_TO_REPLICAS must be a positive integer" >&2
    +    exit 1
    +fi
    Suggestion importance[1-10]: 7

    Why: Input validation for GO_TO_REPLICAS is important to prevent potential errors from invalid scaling values that could disrupt service availability.

    7
    Enhancement
    Add timeouts to prevent script from hanging on unresponsive operations

    Add timeout to kubectl commands in scale_resources function to prevent hanging in
    case of API server issues.

    files/working-hours.sh [78-81]

    -kubectl -n "${namespace}" scale --replicas "${target_replicas}" "${resource_type}" "${resource}"
    +if ! timeout 30s kubectl -n "${namespace}" scale --replicas "${target_replicas}" "${resource_type}" "${resource}"; then
    +    echo "Error: Scaling operation timed out for ${resource_type} ${resource}" >&2
    +    return 1
    +fi
    Suggestion importance[1-10]: 6

    Why: Adding timeouts to kubectl commands prevents the script from hanging indefinitely during API server issues, improving script reliability and error handling.

    6
    Security
    Prevent accidental deletion of critical system configurations

    Add a lifecycle block to prevent accidental deletion of protected namespaces
    configuration, as these are critical system resources.

    main.tf [150-157]

     resource "kubernetes_config_map_v1" "app_env" {
    +  lifecycle {
    +    prevent_destroy = true
    +  }
       data = {
         "PROTECTED_NAMESPACES" : join(",", local.protected_namespaces),
    Suggestion importance[1-10]: 5

    Why: Adding lifecycle blocks to prevent accidental deletion is a good practice, but the config map itself is not as critical as the suggestion implies since it can be recreated from the protected_namespaces variable.

    5

    @Stevesibilia Stevesibilia requested a review from Monska85 December 5, 2024 16:32
    CHANGELOG.md Outdated Show resolved Hide resolved
    files/working-hours.sh Outdated Show resolved Hide resolved
    files/working-hours.sh Outdated Show resolved Hide resolved
    Copy link
    Contributor

    @Monska85 Monska85 left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    LGTM 👍

    @Stevesibilia Stevesibilia merged commit df239cf into main Dec 6, 2024
    1 check passed
    @Stevesibilia Stevesibilia deleted the 3251-add-namespace-wide-scale-down-and-statefulset-support-to-application-sleep-cycles-terraform branch December 6, 2024 11:39
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    2 participants