
feat: dev and prod observer configurations #631

Merged: 5 commits into main, Sep 24, 2024
Conversation

@outerlook (Contributor) commented Sep 24, 2024

Description

  • initial config for observability
  • add docs about the setup
  • add configuration for both development and production sinks

Related Problem

How Has This Been Tested?

  1. Run the development setup: `task observer-dev`
  2. Open Grafana at http://localhost:3000 and log in with `admin`/`admin`
  3. Add Prometheus as a data source (`http://prometheus:9090`)
  4. Check that you can see metrics in Grafana

Advanced visualization will be handled later with dedicated monitoring tools; for this PR, having the metrics available in Grafana is enough.
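The manual data-source step in the test instructions could optionally be automated with Grafana's datasource provisioning. A minimal sketch, not part of this PR — the file path is illustrative and the URL assumes the Prometheus service name from the dev compose network:

```yaml
# grafana/provisioning/datasources/prometheus.yml (hypothetical path)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090   # service name resolvable on the compose network
    isDefault: true
```

Mounting this directory into the Grafana container would remove step 3 from the manual setup.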

Summary by CodeRabbit


  • New Features

    • Introduced a new observer-dev task for setting up development services using Docker Compose.
    • Added a new configuration file for Prometheus to scrape metrics from the Vector service.
    • Implemented a Docker Compose configuration for the Observer service in a production environment, including the Vector service.
  • Documentation

    • Added comprehensive documentation for the Observer project, detailing setup and configuration.
  • Configuration

    • Introduced new configuration files for Vector and Prometheus to support metrics collection and transmission.

Introduce Observer configurations for both development and production. The development setup uses Docker Compose to run Vector, Prometheus, and Grafana, with Prometheus scraping Vector metrics and Grafana set up for visualization; the production setup sends metrics directly to Datadog. The Taskfile is updated with a command for running the development setup.
@outerlook outerlook self-assigned this Sep 24, 2024
coderabbitai bot commented Sep 24, 2024

Walkthrough

A new task, observer-dev, has been added to the Taskfile.yml to facilitate the development of observation services using Docker Compose. This includes the introduction of several configuration files for both development and production environments, specifically for Vector and Prometheus. The changes aim to enhance the monitoring setup by defining services, configurations, and metrics collection for effective observability.

Changes

| File | Change Summary |
| --- | --- |
| `Taskfile.yml` | Added a new task `observer-dev` to run observer development services using Docker Compose. |
| `deployments/observer/dev-prometheus.yml` | Added a Prometheus configuration file for scraping metrics from the Vector service. |
| `deployments/observer/observer-compose.yml` | Introduced a production Docker Compose file for the Vector service, including necessary environment variables for Datadog integration. |
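Based on the change summary, the new Taskfile entry is presumably close to the following sketch; the exact task body and compose flags are assumptions, not taken from the diff:

```yaml
# Taskfile.yml (sketch; actual task definition may differ)
version: '3'

tasks:
  observer-dev:
    desc: Run observer development services with Docker Compose
    cmds:
      - docker compose -f deployments/observer/dev-observer-compose.yml up -d
```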

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Taskfile
    participant DockerCompose
    participant Vector
    participant Prometheus

    User->>Taskfile: Run observer-dev
    Taskfile->>DockerCompose: Start services
    DockerCompose->>Vector: Launch Vector service
    DockerCompose->>Prometheus: Launch Prometheus service
    Vector->>Prometheus: Export metrics
```

Assessment against linked issues

| Objective | Addressed | Explanation |
| --- | --- | --- |
| Implement collection of host metrics (CPU, Memory, Disk, Network usage, Uptime) using Vector's `host_metrics` source (571) | ❌ | The changes do not implement host metrics collection as specified. |

🐇 In the meadow, where metrics flow,
New tasks and configs help us grow.
With Vector and Prometheus in sight,
Our monitoring's ready, oh what a delight!
Let's hop to the code, let the metrics show,
For observability, we now have the glow! 🌟


Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 1a074f2 and 891c85a.

Files selected for processing (3)
  • Taskfile.yml (1 hunks)
  • deployments/observer/dev-prometheus.yml (1 hunks)
  • deployments/observer/observer-compose.yml (1 hunks)
Files skipped from review as they are similar to previous changes (3)
  • Taskfile.yml
  • deployments/observer/dev-prometheus.yml
  • deployments/observer/observer-compose.yml


@outerlook outerlook requested a review from MicBun September 24, 2024 16:52
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

Outside diff range and nitpick comments (10)
deployments/observer/vector-dev-destination.yml (1)

9-9: Add a newline at the end of the file

To adhere to common coding standards and prevent potential issues with some tools, please add a newline character at the end of the file.

Apply this change to add a newline at the end of the file:

```diff
 default_namespace: dev-observer
 flush_period_secs: 60
+
```
Tools
yamllint

[error] 9-9: no new line character at the end of file

(new-line-at-end-of-file)

deployments/observer/vector-prod-destination.yml (2)

1-9: Consider adding more detailed comments for each configuration option.

While the current comment is good, adding more detailed comments for each configuration option would improve maintainability and make it easier for other team members to understand and modify the configuration if needed.

Here's a suggestion for more detailed comments:

# Production destination for Vector, sends metrics to Datadog
sinks:
  datadog-destination:
    type: datadog_metrics  # Specifies the sink type for Datadog metrics
    inputs:
      - hostmetrics  # Source of the metrics to be sent to Datadog
    default_api_key: ${DATADOG_API_KEY}  # API key for authenticating with Datadog
    default_namespace: ${DATADOG_NAMESPACE}  # Namespace for organizing metrics in Datadog
    endpoint: ${DATADOG_ENDPOINT}  # Datadog API endpoint for sending metrics

1-9: Consider implementing error handling and retry logic.

Depending on the criticality of these metrics, you might want to implement error handling and retry logic to ensure data is not lost in case of temporary network issues or Datadog service disruptions.

Vector provides options for retry logic and error handling. Consider adding the following to your configuration:

sinks:
  datadog-destination:
    # ... existing configuration ...
    request:
      retry_attempts: 5
      retry_initial_backoff_secs: 1
      retry_max_duration_secs: 60
    healthcheck:
      enabled: true

This will attempt to retry failed requests up to 5 times with an initial backoff of 1 second, and enable healthchecks to ensure the sink is functioning correctly.

deployments/observer/vector-sources.yml (2)

4-6: LGTM: Appropriate source configuration for host metrics

The 'hostmetrics' source is correctly configured with the 'host_metrics' type, which aligns with Vector's documentation for collecting system-wide metrics. The included documentation link is helpful for future reference.

Consider adding a brief comment explaining the purpose of this specific source, e.g., "Collects various system-level metrics from the host machine."


7-14: LGTM: Comprehensive set of collectors enabled

The configuration enables a good set of essential system metric collectors (cpu, disk, filesystem, load, memory, and network). The exclusion of the 'cgroups' collector with an explanation shows thoughtful configuration.

Consider the following minor improvements:

  1. Update the comment on line 7 to explicitly list all default collectors for clarity.
  2. Add a brief explanation for why each collector is included, especially if this configuration differs from the default.

Example:

collectors: # defaults: [cpu, disk, filesystem, load, host, memory, network, cgroups]
  - cpu      # CPU usage metrics
  - disk     # Disk I/O metrics
  - filesystem # Filesystem usage metrics
  - load     # System load metrics
  - memory   # Memory usage metrics
  - network  # Network traffic metrics
  # - cgroups # Excluded: not needed for discriminated metrics in this context
deployments/observer/observer-compose.yml (1)

11-14: LGTM: Proper use of environment variables for Datadog integration.

The environment variables are correctly defined for Datadog integration, and the use of the ${VAR?} syntax ensures that the compose will fail if these required variables are not set, which is a good practice.

Consider adding documentation on how these environment variables should be managed securely. For example, you could create a .env.example file with placeholder values and instructions on how to set up the actual .env file:

# Create a .env.example file
cat << EOF > deployments/observer/.env.example
DATADOG_API_KEY=your_datadog_api_key_here
DATADOG_NAMESPACE=your_datadog_namespace_here
DATADOG_ENDPOINT=https://api.datadoghq.com
EOF

echo "Created .env.example file with placeholder values."

Then, update the README or a separate documentation file to explain how to use this example file to set up the actual environment variables securely.

deployments/observer/README.md (2)

1-25: LGTM! Consider adding a security note about changing the Grafana password.

The introduction and development setup sections are well-structured and provide clear instructions. The information aligns with the PR objectives and includes essential details about components and ports.

Consider adding a note recommending users to change the default Grafana password after initial setup for improved security. You could add this after line 24:

```diff
 - Grafana: 3000 (default admin password: `admin`)
+
+> **Note:** For security reasons, it's recommended to change the default Grafana password after initial setup.
```

26-34: LGTM! Consider adding instructions for production setup.

The production setup section provides essential information about using Vector with Datadog and lists the required environment variables. This aligns well with the PR objectives and the linked issue #572.

To improve the documentation, consider adding instructions on how to set up and run the production configuration. You could add this after line 34:

### Setup

1. Ensure the required environment variables are set.
2. Run the following command to start the production Observer:

   ```bash
   task observer-prod
   ```

   (Replace `observer-prod` with the actual task name if different.)
3. Verify that metrics are being sent to Datadog by checking your Datadog dashboard.

This addition would provide users with a clear path to setting up the production environment.

Tools

LanguageTool

[uncategorized] ~32-~32: Loose punctuation mark.
Context: ...vironment Variables  - `DATADOG_API_KEY`: Datadog API key - `DATADOG_NAMESPACE`: ...

(UNLIKELY_OPENING_PUNCTUATION)
deployments/observer/dev-observer-compose.yml (2)

1-4: Add Docker Compose version and consider adding a newline at the end of the file

The overall structure of the Docker Compose file is correct and the comments provide clear context. However, consider the following improvements:

  1. Add a version specification at the beginning of the file. This is recommended for Docker Compose files to ensure compatibility. For example:

     ```yaml
     version: '3.8'
     ```

  2. Add a newline character at the end of the file to comply with YAML best practices.

Also applies to: 30-31


5-10: Consider using a specific version tag for the Vector image

The Vector service configuration looks good overall. The configuration files are correctly mounted, and the command is set up properly. However, there's one suggestion for improvement:

Instead of using the latest-alpine tag for the Vector image, consider using a specific version tag. This ensures consistency across different environments and makes it easier to track and reproduce builds. For example:

image: timberio/vector:0.22.0-alpine

Replace 0.22.0 with the desired version number.

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between bf12535 and 1a074f2.

Files selected for processing (8)
  • Taskfile.yml (1 hunks)
  • deployments/observer/README.md (1 hunks)
  • deployments/observer/dev-observer-compose.yml (1 hunks)
  • deployments/observer/dev-prometheus.yml (1 hunks)
  • deployments/observer/observer-compose.yml (1 hunks)
  • deployments/observer/vector-dev-destination.yml (1 hunks)
  • deployments/observer/vector-prod-destination.yml (1 hunks)
  • deployments/observer/vector-sources.yml (1 hunks)
Additional context used
LanguageTool
deployments/observer/README.md

[uncategorized] ~32-~32: Loose punctuation mark.
Context: ...vironment Variables - DATADOG_API_KEY: Datadog API key - DATADOG_NAMESPACE: ...

(UNLIKELY_OPENING_PUNCTUATION)

yamllint
deployments/observer/dev-observer-compose.yml

[error] 31-31: no new line character at the end of file

(new-line-at-end-of-file)

deployments/observer/dev-prometheus.yml

[error] 8-8: no new line character at the end of file

(new-line-at-end-of-file)

deployments/observer/vector-dev-destination.yml

[error] 9-9: no new line character at the end of file

(new-line-at-end-of-file)

Additional comments not posted (15)
deployments/observer/dev-prometheus.yml (2)

1-8: LGTM! The Prometheus configuration looks good.

The configuration is well-structured and correctly set up to scrape metrics from the vector service. The global scrape interval of 15 seconds is a reasonable default, and the job configuration for the vector service is appropriately defined.
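From the details the review gives (15-second global scrape interval, one job targeting the vector service), the file is presumably close to this sketch; the target port is an assumption based on Vector's default Prometheus exporter port:

```yaml
# deployments/observer/dev-prometheus.yml (sketch reconstructed from the review comments)
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: vector
    static_configs:
      - targets: ['vector:9598']  # assumed exporter address on the compose network
```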



1-8: Verify complete observability setup, including Vector configuration.

While this Prometheus configuration is correct and necessary for the observability stack, it doesn't directly address the Vector setup for host log collection mentioned in the PR objectives (issue #572). Ensure that the Vector configuration for collecting and tagging host logs from multiple hosts is also included in this PR or a separate one.

To verify the complete observability setup:

  1. Check for Vector configuration files.
  2. If Vector configuration files exist, verify they include host log collection setup.
  3. Confirm the presence of Grafana configuration.

Please ensure all components (Prometheus, Vector, and Grafana) are properly configured to achieve the complete observability setup as outlined in the PR objectives.


deployments/observer/vector-dev-destination.yml (3)

1-9: LGTM: Well-structured configuration file

The overall structure of the configuration file is clear and well-organized. The comment at the beginning provides useful context for the purpose of this file.



2-6: Verify the input source for the Prometheus sink

The sink configuration for Prometheus looks good. However, please confirm that hostmetrics is the correct and intended input source for this sink. If there are other metrics you want to export to Prometheus, you may need to add them to the inputs array.


7-9: Review Prometheus exporter settings for security and performance

The Prometheus exporter settings look generally good, but consider the following points:

  1. Security: The exporter is set to listen on all interfaces (0.0.0.0). For a development environment, this might be fine, but ensure that appropriate network security measures are in place.

  2. Performance: The flush_period_secs is set to 60 seconds. Depending on your monitoring requirements, you might want to adjust this value. A shorter period will provide more frequent updates but increase network traffic and load on Prometheus.

  3. Namespace: The default_namespace is set to "dev-observer", which is good for distinguishing metrics in a development environment. Ensure this aligns with your naming conventions and monitoring strategy.

To verify the security of the exporter configuration, check whether the exporter port is reachable from outside the intended networks. If it is, consider restricting it to specific IP ranges or using a reverse proxy for added security.
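Putting the reviewed settings together (listen address, flush period, namespace, and the `hostmetrics` input), the dev sink file presumably resembles the following sketch; the sink name is an assumption:

```yaml
# deployments/observer/vector-dev-destination.yml (sketch reconstructed from the review comments)
sinks:
  prometheus-destination:
    type: prometheus_exporter
    inputs:
      - hostmetrics
    address: 0.0.0.0:9598          # listens on all interfaces; restrict outside dev
    default_namespace: dev-observer # distinguishes dev metrics
    flush_period_secs: 60           # how often expired metrics are flushed
```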


deployments/observer/vector-prod-destination.yml (1)

1-9: LGTM! The Vector configuration for Datadog metrics looks good.

The sink configuration for sending metrics to Datadog is correctly structured and follows best practices by using environment variables for sensitive information.

deployments/observer/vector-sources.yml (2)

1-16: LGTM: Well-structured YAML configuration

The overall structure and syntax of the YAML file are correct. The comments at the beginning provide useful context about the file's purpose and intended usage for both development and production environments.


1-16: Verify alignment with PR objectives and linked issue

The current configuration sets up host metrics collection, which aligns with the PR summary's goal of making metrics accessible in Grafana. However, there seems to be a discrepancy with the linked issue #572, which mentions setting up Vector for collecting host logs from multiple hosts.

Please clarify:

  1. Is log collection intended to be part of this PR, or will it be addressed separately?
  2. If log collection is part of this PR, consider adding a log collection source to this configuration file.

Example log collection source (if needed):

  host_logs:
    type: file
    include:
      - /var/log/**/*.log
    exclude:
      - /var/log/excluded_directory/**/*.log

If log collection is to be implemented:

  1. Ensure proper tagging of logs for identification across multiple hosts.
  2. Consider separating metrics and logs configurations if they become too complex to manage in a single file.
deployments/observer/observer-compose.yml (3)

1-3: LGTM: Clear file header and purpose.

The file header and comment effectively communicate the purpose of this configuration file for the production Observer service.


8-10: LGTM: Proper volume mounts for configuration files.

The volume mounts correctly map the local configuration files to their expected locations in the container. The separation of destination and source configurations is a good practice.

To ensure the referenced configuration files exist and are properly formatted, you can run the following commands:

#!/bin/bash
# Check if the configuration files exist and are valid YAML
for file in deployments/observer/vector-prod-destination.yml deployments/observer/vector-sources.yml; do
    if [ -f "$file" ]; then
        echo "File $file exists."
        if yamllint "$file"; then
            echo "File $file is valid YAML."
        else
            echo "File $file is not valid YAML."
        fi
    else
        echo "File $file does not exist."
    fi
done

7-7: LGTM: Flexible configuration setup, but be cautious with wildcards.

The command configuration allows for multiple configuration files, which provides flexibility. However, be aware that using wildcards could potentially lead to unexpected configurations if unintended files are present in the directory.

To ensure only intended configuration files are present, review the contents of the configuration directory before deploying.

deployments/observer/README.md (1)

1-34: Great job on the README! It effectively covers the Observer setup.

This README file successfully addresses the PR objectives by providing clear instructions for both development and production environments. It aligns well with the goal of establishing initial configurations for observability and addresses the setup of Vector for collecting logs, as mentioned in the linked issue #572.

The document is well-structured, concise, and informative. It covers the essential components (Vector, Prometheus, Grafana) and provides necessary details for setup and configuration.

With the suggested minor improvements (security note for Grafana password and additional production setup instructions), this README will serve as an excellent guide for users setting up the Observer in both development and production environments.


deployments/observer/dev-observer-compose.yml (2)

12-19: Prometheus service configuration looks good

The Prometheus service is well-configured:

  • A specific version tag (v2.30.3) is used for the image, which is a good practice.
  • The configuration file is correctly mounted.
  • The command for specifying the config file is correct.
  • The port mapping (9090) is properly set up.
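The configuration described above maps to roughly this compose fragment; the volume paths are assumptions based on the file names in this PR:

```yaml
# dev-observer-compose.yml excerpt (sketch; mount paths assumed)
prometheus:
  image: prom/prometheus:v2.30.3
  volumes:
    - ./dev-prometheus.yml:/etc/prometheus/prometheus.yml
  command:
    - '--config.file=/etc/prometheus/prometheus.yml'
  ports:
    - '9090:9090'
```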

21-28: Grafana service configuration is good, but consider password security

The Grafana service configuration looks good overall:

  • A specific version tag (8.2.2) is used for the image.
  • The port mapping (3000) is correctly set up.
  • A named volume is used for persistent storage, which is a good practice.

However, note that setting the admin password via an environment variable is not secure for production environments. For the development environment, it's acceptable, but consider using a more secure method (like secrets management) when moving to production.

To ensure this is indeed a development-only configuration, check that no production compose file relies on the same plain-text password approach.
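For the production hardening the comment suggests, Docker secrets are one option: Grafana reads any `GF_*` setting from a file when the variable name carries a `__FILE` suffix. A hedged sketch — the secret name and file path are illustrative, not part of this PR:

```yaml
# Production-oriented sketch using Docker secrets instead of a plain env var
services:
  grafana:
    image: grafana/grafana:8.2.2
    environment:
      - GF_SECURITY_ADMIN_PASSWORD__FILE=/run/secrets/grafana_admin_password
    secrets:
      - grafana_admin_password

secrets:
  grafana_admin_password:
    file: ./secrets/grafana_admin_password.txt  # keep this file out of version control
```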

Taskfile.yml (1)

53-56: Overall approval of Taskfile changes

The addition of the observer-dev task to the Taskfile is well-structured and consistent with the existing task definitions. Its placement in the file is logical, and it doesn't introduce any conflicts with other tasks.

The changes to the Taskfile are approved, pending the clarifications and improvements suggested in the previous comments.

@outerlook outerlook merged commit e88a116 into main Sep 24, 2024
6 checks passed
@outerlook outerlook deleted the feat/vector-host branch September 24, 2024 18:45
Development

Successfully merging this pull request may close these issues.

Problem: Vector host metrics collection not implemented
2 participants