Discussion: non-deterministic device name? #75

Open
rsyring opened this issue Mar 2, 2022 · 7 comments
Labels
question Further information is requested upstream Issues with the provider that are caused by issues in the Tailscale API

Comments

@rsyring

rsyring commented Mar 2, 2022

I'm not exactly sure where this belongs or what I'm asking for, but figured I should note it somewhere. I've noticed in my efforts to get Tailscale installed automatically on a host and then use that device for further work in Terraform, that the device name used by Tailscale is not guaranteed.

In Terraform, I might want to create a server "enterprise" in AWS, set its hostname as "enterprise", install & authorize Tailscale as part of the instance's first-boot configuration, and then wait_for (#72) the device so I can use its IP to set up a DNS record for that host.

But if there is already a device in Tailscale named "enterprise", then it looks like Tailscale will create the device as "enterprise1" and happily move on. That obviously breaks things if I then use tailscale_device with name as enterprise.example.com.

I've not yet tried to see whether Tailscale's behavior is any different using tailscale up --hostname. Also, at least in my case, the impact of this is lessened if deletions (#68) become supported.

@rsyring rsyring added the enhancement New feature or request label Mar 2, 2022
@rsyring
Author

rsyring commented Mar 2, 2022

Could be related if the resolution to this is a stable/deterministic ID for a host that can be used in Terraform: tailscale/tailscale#1532

@davidsbond
Contributor

Would you be better supported if the Terraform provider allowed you to query devices based on a field other than name? The provider could allow you to search based on hostname so you get more deterministic behaviour when you use tailscale up --hostname.

@rsyring
Author

rsyring commented Mar 2, 2022

@davidsbond thanks again for the time you are taking to discuss all of this. Since we don't have a real provider that is creating the device and, under the hood, returning that device's information from the create API call, we have to look up the device somehow, some way in a GET API call. That, obviously, requires an identifier.

I think the only way a lookup field other than name would help is if tailscale up accepted an identifier as an option and used it to create the device. That identifier would then need to be exposed through the API.

As far as I can tell, giving some other field just doesn't matter b/c that value is not known ahead of time and can't be used to identify the device when using tailscale_device.

@davidsbond
Contributor

davidsbond commented Mar 2, 2022

> @davidsbond thanks again for the time you are taking to discuss all of this.

No problem, thanks for being a user and helping improve the provider.

> Since we don't have a real provider that is creating the device and, under the hood, returning that device's information from the create API call, we have to look up the device somehow, some way in a GET API call. That, obviously, requires an identifier.

Can the value to the --hostname flag not be that identifier currently? The documentation for the flag currently states:

hostname to use instead of the one provided by the OS

So in theory, you can provide anything that could be a valid hostname in this field and it will accept it. So you could obtain the specific device using the hostname you provide the tailscale up command if the tailscale_device data source is updated to allow you to query on that field.

> As far as I can tell, giving some other field just doesn't matter b/c that value is not known ahead of time and can't be used to identify the device when using tailscale_device.

Why can't it be known ahead of time if you can provide it to the tailscale up command? IIRC you mentioned previously about AWS instances where you were providing a bootstrapping script. I'm more familiar with GCP VMs, but they provide functionality that allows you to pass metadata into the VM which you can access from scripts. Assuming the same is possible with AWS instances, couldn't you do the following:

  • Use a variable in your terraform configuration for the hostname
  • Use said variable in your terraform configuration that creates the aws_instance, providing it as metadata or an environment variable (or however AWS lets you do it)
  • Obtain that metadata within your bootstrapping script, which calls:
    tailscale up --hostname ${TAILSCALE_HOSTNAME_FROM_SOMEWHERE}
  • Use a theoretical hostname field with the original hostname variable on the tailscale_device data source to look it up

This way, you have greater control and could avoid conflicting hostnames if all your devices are Terraform-managed. I'm not familiar with your infrastructure, but you could go as far as using a random_string resource to generate a unique hostname per device.
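
A minimal sketch of the bootstrap step from the list above, not a tested cloud-init script. It assumes Terraform renders the chosen hostname and a pre-generated auth key into the script and passes them as the two arguments. `join_tailnet` and `TAILSCALE_BIN` are hypothetical names used only for illustration, and the `--auth-key` spelling follows the usage elsewhere in this thread.

```shell
#!/bin/sh
# Hypothetical bootstrap helper: join the tailnet under a hostname chosen in Terraform.
# TAILSCALE_BIN is an illustrative override so the command can be dry-run (e.g. with echo).
TAILSCALE_BIN="${TAILSCALE_BIN:-tailscale}"

join_tailnet() {
  ts_hostname="$1"   # value of the Terraform hostname variable (assumed to be passed in)
  ts_auth_key="$2"   # pre-generated Tailscale auth key (assumed to be passed in)

  # Refuse to run with a missing value rather than fall back to the OS hostname.
  if [ -z "$ts_hostname" ] || [ -z "$ts_auth_key" ]; then
    echo "usage: join_tailnet HOSTNAME AUTH_KEY" >&2
    return 2
  fi

  "$TAILSCALE_BIN" up --hostname "$ts_hostname" --auth-key "$ts_auth_key"
}
```

In a first-boot script this might be called as `join_tailnet "$TAILSCALE_HOSTNAME" "$TAILSCALE_AUTH_KEY"`, where both variables are placeholders for whatever the instance's user data provides.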

@rsyring
Author

rsyring commented Mar 3, 2022

> So you could obtain the specific device using the hostname you provide the tailscale up command if the tailscale_device data source is updated to allow you to query on that field.

Isn't that what we do now? The hostname becomes the host part of the device name, which is how tailscale_device works. That all works fine and the process you described above to bootstrap a server is very similar to what I'm doing.

> could avoid conflicting hostnames

This is really the key point. Yes, I can do that now. But it means the device's name in Tailscale now has some random value appended to it, which works but is ugly. And, on the off chance something weird happens and you still have a conflicting hostname, there is nothing to indicate this. The failure mode is silent, which I really don't like.

To be clear, this is the problem:

# Host 1
$ tailscale status
...snip...
100.115.118.27  zinc                 randy.syring@ linux   -

Now, let's say I don't realize that name is used and I run a cloud init script based on my Terraform scripts to run the equivalent of:

# Host 2
$ tailscale up --hostname zinc --auth-key ...
Success.
$ tailscale status
100.94.30.82    zinc-1               randy.syring@ linux   -
...snip...
100.115.118.27  zinc                 randy.syring@ linux   -

I have something like this in Terraform:

data "tailscale_device" "this" {
  name = "zinc.example.com"
  wait_for = 90
}

This will pull data from the first zinc host: the IP will be 100.115.118.27 (the previously existing device), not 100.94.30.82 (the one we just created with Terraform). There will be no error. The error will come later, when I try to use the IP address (in a DNS record, for example) and can't figure out why the server I'm connecting to through Tailscale isn't the server I want.

This isn't really a problem with this provider; it's just a silent "footgun" and I didn't want to keep this edge case to myself.
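
One way to make this failure loud rather than silent is to compare the name Tailscale actually assigned with the name that was requested, right after `tailscale up` in the bootstrap script. The sketch below is an illustration under stated assumptions, not something proposed in this thread: `check_assigned_name` is a hypothetical helper, and the commented usage assumes that `tailscale status --json` reports the node's own DNS name under `.Self.DNSName` and that `jq` is installed.

```shell
# Hypothetical helper: fail if the assigned device name differs from the requested one.
#   $1 = hostname passed to `tailscale up --hostname`
#   $2 = DNS name Tailscale assigned, e.g. "zinc-1.example.com."
check_assigned_name() {
  requested="$1"
  assigned="${2%%.*}"   # keep only the first DNS label, e.g. "zinc-1"

  if [ "$assigned" != "$requested" ]; then
    echo "tailscale name conflict: requested '$requested', got '$assigned'" >&2
    return 1
  fi
}

# Assumed usage after `tailscale up` (requires jq; the JSON field name is an assumption):
#   check_assigned_name zinc "$(tailscale status --json | jq -r '.Self.DNSName')" || exit 1
```

With the example above, host 2 would report the conflict because it was assigned zinc-1 rather than zinc.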

@davidsbond davidsbond added question Further information is requested upstream Issues with the provider that are caused by issues in the Tailscale API and removed enhancement New feature or request labels Mar 3, 2022
@davidsbond
Contributor

Thanks for all the detail. Short of appending some random string to the hostname, I'm not sure any other workarounds are possible. Let's see how the team at Tailscale responds to the issue you've raised. I'm happy to add any additional support that's within my capacity.

Unfortunately, a Terraform provider can only expose what the underlying API makes available. However, I can imagine that using Terraform/Pulumi will be important for other Tailscale users, so perhaps this use case will apply to others. I only have the very basic usage statistics that the Terraform registry provides, but the provider has had 7k downloads at this point, which is perhaps a significant enough number to justify better support.

@mindreader

Tailscale has been working well for me, but this one issue is a real bummer when it comes to trying to use it at scale.

When I delete a k8s cluster and recreate it without remembering to go into the Tailscale UI and manually delete each and every node, everything seems to work, but it gets the IPs of the old machines, and hard-to-debug connectivity issues appear later in the process.

Surely, if a new device pops up with a hostname that conflicts with an existing device that is ephemeral and disconnected, it makes sense to delete the old device on Tailscale's side immediately? It was going to be deleted automatically at some point anyway.

If that makes people nervous, an option such as on_ephemeral_hostname_conflict_remake (or some such) could be added to tailscale_tailnet_key to make such behavior explicit.


3 participants