Prod / dev divergence monitoring #315
I went ahead and stood up a quick VM in GCP and set up the environment (the setup instructions were reasonably easy to follow; the only minor snag was getting the right version of npm onto my Ubuntu machine). As I went to stand up the charts, I noticed that the Terraform main file references a branch that may have been cleaned up recently.
I removed the reference and pointed it at the main branch of the project instead. Hopefully that is okay; if not, it's easy to tear down the dev cluster and redeploy using the correct branch. It looks like main.tf expects the honeycomb_api_key argument to be replaced by an argument called secrets; however, after making that replacement I am encountering the following error. Unfortunately I am not familiar enough with the codebase, and when I searched through the dev folder I found only four references to this Honeycomb API key. I am not sure exactly what it is used for beyond a few OpenTelemetry/Jaeger references in several files within the .terraform folder. I do know that the way I am setting that API key is not correct, and I can't find any documentation on setting that variable correctly.
The error refers to the secret = "dummy" key-value pair I added.
You can view the one-line change I made here:
For the Terraform module download issue you ran into, see hashicorp/terraform#30119. It only affects the latest version of Terraform; we had the same problem in our CI.
As for the secrets: yes, that's a recent refactoring we haven't carried over to dev yet. Sorry about that too. Will fix.
Alright, I've made a commit and verified that the following sequence brings up a dev environment for me locally:
Make sure you aren't using the latest Terraform, as it has a bug when referencing GitHub-hosted modules.
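Since the advice is to avoid the Terraform release affected by hashicorp/terraform#30119, a small pre-flight check could catch it before running the Makefile. This is a hypothetical helper, not part of the repo: it reads the `terraform_version` field from `terraform version -json` and compares it against a denylist that you would fill in from the upstream issue, since the exact affected version is not stated here.

```python
import json
import subprocess

# Hypothetical denylist: fill in the affected release(s) from
# hashicorp/terraform#30119; this sketch does not assume which ones they are.
KNOWN_BAD_VERSIONS: frozenset = frozenset()


def parse_terraform_version(version_json: str) -> str:
    """Extract the version string from `terraform version -json` output."""
    return json.loads(version_json)["terraform_version"]


def is_known_bad(version: str, known_bad: frozenset) -> bool:
    """True if this Terraform version is on the denylist."""
    return version in known_bad


def check_installed_terraform(known_bad: frozenset = KNOWN_BAD_VERSIONS) -> str:
    """Run the local terraform binary and fail fast on a denylisted version."""
    out = subprocess.run(
        ["terraform", "version", "-json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    version = parse_terraform_version(out)
    if is_known_bad(version, known_bad):
        raise SystemExit(f"terraform {version} is denylisted; please downgrade")
    return version
```

Calling `check_installed_terraform()` at the top of a setup script would stop early instead of failing midway through a module download.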
Rgr. I'll go ahead and wipe my changes, downgrade Terraform, and get started on this over the weekend. I'm using the GCP VM as my local development environment.
Sorry, one (or two) last problems with respect to the services portion of the Makefile, which deploys the Galoy charts to the k3s cluster. It looks like the services I have the least experience with (Lightning: I run my own node, but I have only started looking at lnd) are not starting up correctly:
For the lnd daemon, this looks like a secret that may need to be created before startup, since you mentioned that Vault was being used, but I do not see Vault as part of the deployment.
For the lnd container in the lnd1-0 pod, this looks like it could be because the container above never starts up? I am not sure exactly how that works yet:
I am still interested in learning how these work, but I should be able to update the monitoring section without getting these specific containers running on my system, so I'll go ahead and update some of these values. I may just have a hard time verifying that the changes work, depending on how the Galoy-specific services interact with the bitcoind and lnd setup. I reviewed the values files and am getting started by replacing some of the overrides that seem obvious in terms of bringing dev and prod closer together, but I am going to be conservative since I am new to contributing. More interestingly, and I digress: I also found that Helm has added a tpl function, which is not particularly useful in this specific instance since Galoy is leveraging the community Prometheus charts. I wonder whether it could let you break your Terraform values out into a conf file in cases where the team writes and maintains its own Helm charts?
Cool inputs, thanks for the ideas! Please treat the failing LND as out of scope here. lndmon will always fail if lnd is failing, and it'll be tricky to debug your setup via this ticket. It shouldn't impact the monitoring-related work.
Okay, I had to increase the size of my VM to get enough resources to run all the software. It looks like half of the software in the galoy-dev-galoy namespace is expecting a secret. Perhaps another secret handled by Vault?
There is currently no Vault in the stack; I may have mentioned it as something we want to do. The missing secret gets populated by a container that needs access to values generated during startup: https://github.com/GaloyMoney/charts/blob/main/charts/lnd/templates/export-secrets-configmap.yaml#L8. It probably didn't work because of the limited resources. Try again from a clean state with enough space.
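As a rough illustration of what "populated by a container" ends in: the result is a Kubernetes Secret whose `data` values are base64-encoded strings. The sketch below shows only that final step, under stated assumptions; the real logic lives in the linked export-secrets-configmap.yaml, and the key names (`tls.cert`, `admin.macaroon`) and namespace here are hypothetical placeholders, not what the Galoy chart necessarily uses.

```python
import base64


def build_secret_manifest(name: str, namespace: str, files: dict) -> dict:
    """Build a Kubernetes v1 Secret manifest from raw file contents.

    Secret `data` values must be base64-encoded strings, so each byte
    string is encoded here before being placed in the manifest.
    """
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {
            key: base64.b64encode(value).decode("ascii")
            for key, value in files.items()
        },
    }


# Hypothetical example: key names and namespace are placeholders.
manifest = build_secret_manifest(
    "lnd1-credentials",
    "example-namespace",
    {"tls.cert": b"<cert bytes>", "admin.macaroon": b"<macaroon bytes>"},
)
```

While debugging, such a manifest could be serialized with `json.dumps(manifest)` and piped to `kubectl apply -f -`, which accepts JSON as well as YAML.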
@mjschmidt You can also do
@krtk6160 This makes a lot of sense: because lnd1-0 is not starting, the export-secrets container can't contact the lnd container inside the pod, which means the export-secrets container never sets the secrets for the Galoy banking services. I was able to verify this in the logs as well.
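The failure chain described above (lnd down, so export-secrets can't reach it, so the secret never appears, so the Galoy services wait) is a transitive dependency problem. A minimal sketch, assuming a hand-written dependency map whose component names are hypothetical labels taken from this thread rather than actual Kubernetes resource names, shows how one failing root blocks everything downstream:

```python
from collections import deque


def blocked_components(deps: dict, failing: set) -> set:
    """Return every component that (transitively) depends on a failing one."""
    # Invert the graph: for each component, record who depends on it.
    dependents: dict = {}
    for component, requires in deps.items():
        for req in requires:
            dependents.setdefault(req, []).append(component)

    # Breadth-first walk from the failing roots through their dependents.
    blocked: set = set()
    queue = deque(failing)
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in blocked:
                blocked.add(child)
                queue.append(child)
    return blocked


# Hypothetical chain modelled on the description in this thread.
DEPS = {
    "export-secrets": ["lnd"],
    "lnd1-credentials": ["export-secrets"],
    "galoy-services": ["lnd1-credentials"],
}
```

With `failing={"lnd"}`, every other entry comes back blocked, which matches why fixing lnd first is the only productive starting point.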
I could probably hack around it by editing the lnd ConfigMap, or fix it properly by editing the Terraform values file for lnd, since it all seems to be set by the lnd.conf file in the lnd1 ConfigMap, but I would need to know which key-value pair lnd is looking for in its configuration. Any ideas? I assume it's the key for whatever sets the address manager namespace.
@krtk6160 Sorry to ping you again; I was wondering whether you know what part of the code I could look at to find the answer to this. It looked like a secret wasn't getting mounted, so I went into the bitcoin-values YAML file and turned that on. It didn't appear to be a port problem when I looked at how bitcoind and lnd are supposed to interact over port 18443.
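When poking at the rendered lnd.conf from the lnd1 ConfigMap, it can help to pull out the bitcoind connection options and check which port lnd is pointed at. The sketch below assumes the file is INI-style with section headers, as lnd's sample config is, and uses lnd's `bitcoind.rpchost`/`bitcoind.rpcuser`/`bitcoind.rpcpass` option names; the sample values are hypothetical placeholders, and this does not answer which key the chart actually needs. 18443 is bitcoind's default regtest RPC port.

```python
import configparser
from typing import Dict, Optional


def bitcoind_settings(lnd_conf: str) -> Dict[str, str]:
    """Collect all bitcoind.* options from an lnd.conf-style INI string."""
    # strict=False tolerates repeated keys; interpolation=None leaves '%' alone.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(lnd_conf)
    settings: Dict[str, str] = {}
    for section in parser.sections():
        for key, value in parser.items(section):
            if key.startswith("bitcoind."):
                settings[key] = value
    return settings


def rpc_port(settings: Dict[str, str]) -> Optional[int]:
    """Port from bitcoind.rpchost, or None if it is unset or has no port."""
    host = settings.get("bitcoind.rpchost")
    if not host or ":" not in host:
        return None
    return int(host.rsplit(":", 1)[1])


# Hypothetical config fragment; host name and credentials are placeholders.
SAMPLE_LND_CONF = """
[Application Options]
debuglevel=info

[Bitcoind]
bitcoind.rpchost=bitcoind:18443
bitcoind.rpcuser=rpcuser
bitcoind.rpcpass=rpcpassword
"""
```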
So it looks like if you're using the Terraform setup, you cannot turn on secret creation in the bitcoin chart.
The reason why we have
Right, I got that; I am just unsure why my "lnd1-credentials" secret is not being created. From what I gather from reading it, my lnd daemon not starting up is what prevents my lnd container from connecting to bitcoin, and the lnd daemon is not starting because of a missing secret.
I was hoping that turning on the bitcoin "create secret" option would create the lnd secret as well.
@mjschmidt It would be interesting to verify whether or not you can run the setup locally with k3d as intended. If it works, you have a working example to compare with your remote VM setup; if not, we can more easily help you debug, since you'd be running it on a setup we intend to support and can try to reproduce.
I attempted the Galoy stack on my local machine (I was worried I would have problems on Windows 10), and unfortunately I got the same error I had with the Ubuntu setup: the lnd-credentials Kubernetes secret halting the deployment. On the bright side, it didn't take me very long, and it was a good opportunity to get Docker set up on my local machine, as I rarely use Windows for development (mostly Linux).
|
@mjschmidt Just FYI, we've also seen issues bringing up lnd locally since switching images from lncm to the Lightning Labs curated one. Will let you know when it's sorted out.
The settings in the default monitoring/values.yml are currently somewhat out of date.
Many values are being overridden in production.
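To see how far dev and prod have drifted, one option is to flatten both values trees into dotted keys and list every key where production differs from the defaults. This is a minimal sketch, not an existing tool in the repo; it keeps lists whole because Helm replaces lists rather than merging them when layering values files. Loading the actual files would need a YAML parser such as PyYAML's `yaml.safe_load`, left out here to keep the example self-contained.

```python
from typing import Any, Dict, Tuple


def flatten(values: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested values into dotted keys; lists are kept as leaf values."""
    flat: Dict[str, Any] = {}
    for key, value in values.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict) and value:
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def prod_overrides(
    defaults: Dict[str, Any], overrides: Dict[str, Any]
) -> Dict[str, Tuple[Any, Any]]:
    """Map each key whose production value differs from the default to (default, prod).

    Keys missing from the defaults show up with a default of None.
    """
    base = flatten(defaults)
    prod = flatten(overrides)
    return {
        key: (base.get(key), value)
        for key, value in prod.items()
        if base.get(key) != value
    }
```

Each entry in the result is a candidate for either updating the default in monitoring/values.yml or keeping it as a deliberate production-only override.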
Ideally we would like to:
Here are the current production overrides. Note that some values are injected via Terraform templates (e.g. ${graphql_playground_url}); I don't know how best to set defaults for those. At least for the dev setup, we should probably hard-code the values. Another file containing sensitive information is also merged in:
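For the dev setup, one way to hard-code values for template placeholders such as ${graphql_playground_url} is plain string substitution. Python's `string.Template` happens to use the same `${name}` syntax, so a sketch could look like the following. It only handles simple `${name}` interpolation (not Terraform expressions like `${var.x}` or `%{ }` directives), and the dev URL is a hypothetical placeholder.

```python
from string import Template


def render_dev_values(template_text: str, dev_defaults: dict) -> str:
    """Substitute ${name} placeholders with hard-coded dev values.

    Raises KeyError if a placeholder has no dev default, so gaps are noticed.
    """
    return Template(template_text).substitute(dev_defaults)


# Hypothetical dev default; the real value depends on the dev ingress setup.
DEV_DEFAULTS = {"graphql_playground_url": "http://localhost/graphql"}
```

Using `substitute` rather than `safe_substitute` is deliberate: a placeholder with no dev default raises KeyError instead of silently leaving `${...}` in the rendered values.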