Reevaluate how we reconnect VMs from other Virtual Centers #553

agrare · 2020-03-05T21:28:51Z

Overview

The VMware provider can currently "reconnect" virtual machines: when a VM that has an archived or orphaned record in the VMDB is registered as a new VM on a VMware provider that we are monitoring, the existing record is reused instead of a new one being created.

There are a number of scenarios where this is helpful:

You accidentally "Remove From Inventory" a virtual machine
You want to move a VM from one active vCenter to another
You want to upgrade your vCenter by adding a completely new one, moving the hosts over, and then deleting the old one

A VM can be registered by browsing the Datastore, selecting the .vmx file, and choosing "Register". This creates a new VirtualMachine with a new ManagedObjectReference, but it keeps the same summary.config.uuid, which comes from the uuid.bios property of the .vmx file.

In all of these cases we would have an archived VM record in our database, and the next refresh would pick up the new VM, because when saving VMs we first build an index of all VMs by uid_ems. We then try to find an existing archived VM with the same uid_ems to reconnect before we create a new VM.
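The lookup described above can be sketched in plain Ruby. This is only an illustration under stated assumptions: VmRecord and save_refreshed_vm are hypothetical names, the array stands in for the VMs table, and the real refresh code is not reproduced here.

```ruby
# Hypothetical in-memory stand-in for a VMDB row; not the real VmOrTemplate model.
VmRecord = Struct.new(:id, :ems_id, :ems_ref, :uid_ems, keyword_init: true)

# Save one VM seen during refresh: reuse an archived record with the same
# uid_ems if there is one, otherwise create a new record.
def save_refreshed_vm(records, ems_id:, ems_ref:, uid_ems:)
  # Index every record by uid_ems (the whole table, not scoped to one EMS).
  by_uid = records.group_by(&:uid_ems)

  # Only archived records (nil ems_id) are candidates for reconnection.
  archived = (by_uid[uid_ems] || []).find { |vm| vm.ems_id.nil? }

  if archived
    archived.ems_id  = ems_id
    archived.ems_ref = ems_ref
    archived
  else
    next_id = records.map(&:id).max.to_i + 1
    VmRecord.new(id: next_id, ems_id: ems_id, ems_ref: ems_ref, uid_ems: uid_ems)
            .tap { |vm| records << vm }
  end
end

# An archived record whose VM is re-registered under a new ems_ref.
records = [VmRecord.new(id: 1, ems_id: nil, ems_ref: "vm-101", uid_ems: "4218-aaaa")]
vm = save_refreshed_vm(records, ems_id: 7, ems_ref: "vm-2001", uid_ems: "4218-aaaa")
```

Here the archived record with id 1 is reused and pointed at the new provider, so anything keyed on that id stays attached.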

Benefits

Reconnecting an existing VM record lets you keep all of the records associated with that VM: events, metrics, SSA data, tags for automate, links to what it was provisioned from, and so on all carry over to the new VM.

Problems

The problem is that this is far more expensive than simply using the equivalent VmOrTemplate.find_by(:ems_id => ems.id, :ems_ref => ems_ref) and creating a new VM if that returns nil. We query the whole VMs table (not even scoped to the EMS) on every single refresh.
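For comparison, the cheaper find-or-create path can be sketched the same way. This is a minimal in-memory illustration, not the provider's code: ScopedVm and find_or_create_vm are hypothetical names, and the array again stands in for the VMs table.

```ruby
# Hypothetical in-memory stand-in for a VMDB row; not the real VmOrTemplate model.
ScopedVm = Struct.new(:ems_id, :ems_ref, :uid_ems, keyword_init: true)

# Find a VM by (ems_id, ems_ref) or create it. Archived records (nil ems_id)
# can never match, so nothing is reconnected on this path.
def find_or_create_vm(rows, ems_id:, ems_ref:, uid_ems:)
  existing = rows.find { |vm| vm.ems_id == ems_id && vm.ems_ref == ems_ref }
  return existing if existing

  ScopedVm.new(ems_id: ems_id, ems_ref: ems_ref, uid_ems: uid_ems).tap { |vm| rows << vm }
end

# The archived row shares the uid_ems but is ignored by the scoped lookup.
rows = [ScopedVm.new(ems_id: nil, ems_ref: "vm-101", uid_ems: "4218-aaaa")]
first  = find_or_create_vm(rows, ems_id: 7, ems_ref: "vm-2001", uid_ems: "4218-aaaa")
second = find_or_create_vm(rows, ems_id: 7, ems_ref: "vm-2001", uid_ems: "4218-aaaa")
```

The first call creates a second row next to the archived one, which is exactly the duplicate that reconnection is meant to avoid; the second call finds the row the first call created.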

It is also error prone. To ensure that VMs aren't stolen from other active providers, it only considers VMs with a nil ems_id. Most people don't delete the old EMS when they upgrade, so the VMs aren't archived yet and we end up creating duplicates anyway.

This is also complicated by the fact that it is extremely common to find multiple active VMs with duplicate BIOS UUIDs on the same vCenter. Since the BIOS UUID is written to the .vmx file, copying the VM directory and registering the copy creates a new VM with a duplicate UUID. VMware knows this, and when you register a VM it asks whether you "Moved or Copied" it. What it is really asking is "should I generate a new BIOS UUID for this VM or use the existing one?" If you answer this question wrong, boom: duplicate UUID.
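To make the duplicate-UUID case concrete, here is a small sketch that groups active VMs by provider and uid_ems and reports any UUID shared by more than one VM. ActiveVm and duplicate_bios_uuids are hypothetical names used only for illustration.

```ruby
# Hypothetical in-memory stand-in for a VMDB row; not the real VmOrTemplate model.
ActiveVm = Struct.new(:ems_id, :ems_ref, :uid_ems, keyword_init: true)

# Return { [ems_id, uid_ems] => [ems_ref, ...] } for every BIOS UUID that is
# shared by more than one active VM on the same provider.
def duplicate_bios_uuids(vms)
  vms.reject { |vm| vm.ems_id.nil? }                    # archived VMs are ignored
     .group_by { |vm| [vm.ems_id, vm.uid_ems] }
     .select { |_key, group| group.size > 1 }
     .transform_values { |group| group.map(&:ems_ref) }
end

vms = [
  ActiveVm.new(ems_id: 7,   ems_ref: "vm-10", uid_ems: "4218-aaaa"),
  ActiveVm.new(ems_id: 7,   ems_ref: "vm-11", uid_ems: "4218-aaaa"), # copied directory, UUID kept
  ActiveVm.new(ems_id: 7,   ems_ref: "vm-12", uid_ems: "4218-bbbb"),
  ActiveVm.new(ems_id: nil, ems_ref: "vm-10", uid_ems: "4218-aaaa")  # archived
]
dups = duplicate_bios_uuids(vms)
```

With duplicates like these, uid_ems alone cannot tell which active VM an archived record belongs to.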

Alternatives

VMs are removed and re-registered fairly infrequently, and most of the time it is planned (e.g. the vSphere upgrade case). Yet our reconnect lookup runs on every single refresh.

An alternative to this approach could be to replace this lookup with a helper script or other operation.

We could use the standard mechanism to find or create records based on the ems_id and the ems_ref, but allow users to "reconnect" VMs when they know that something is wrong or that they planned on migrating VMs to another provider.

We have an existing script to help people do this, tools/reconnect_vms.rb, which finds VMs with the same uid_ems where one VM is archived, the other is active, and the archived VM is older. It exists because customers did this upgrade dance incorrectly and created lots of duplicates instead of reconnecting VMs.
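The matching rule described for that script can be sketched as follows. This is not the contents of tools/reconnect_vms.rb; InventoryVm, reconnectable_pairs, and the created_on field used for "older" are assumptions made for the example, and skipping ambiguous groups is a design choice of this sketch.

```ruby
# Hypothetical in-memory stand-in for a VMDB row; not the real VmOrTemplate model.
InventoryVm = Struct.new(:id, :ems_id, :uid_ems, :created_on, keyword_init: true)

# Return [archived, active] pairs that share a uid_ems, where the archived
# record is older than the active one.
def reconnectable_pairs(vms)
  vms.group_by(&:uid_ems).each_with_object([]) do |(_uid, group), pairs|
    archived, active = group.partition { |vm| vm.ems_id.nil? }

    # Skip ambiguous groups: duplicate UUIDs make the match unsafe to automate.
    next unless archived.size == 1 && active.size == 1
    next unless archived.first.created_on < active.first.created_on

    pairs << [archived.first, active.first]
  end
end

inventory = [
  InventoryVm.new(id: 1, ems_id: nil, uid_ems: "4218-aaaa", created_on: Time.utc(2019, 1, 1)),
  InventoryVm.new(id: 2, ems_id: 7,   uid_ems: "4218-aaaa", created_on: Time.utc(2020, 3, 5)),
  InventoryVm.new(id: 3, ems_id: 7,   uid_ems: "4218-bbbb", created_on: Time.utc(2020, 3, 5))
]
pairs = reconnectable_pairs(inventory)
```

A user-triggered reconnect could then present these pairs for confirmation rather than acting on them during every refresh.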

We could even bake this into the product more by allowing users to search for re-connectable VMs through the UI/API either on a specific VM or on an entire EMS.


miq-bot · 2023-03-06T00:07:26Z

This issue has been automatically marked as stale because it has not been updated for at least 3 months.

If you can still reproduce this issue on the current release or on master, please reply with all of the information you have about it in order to keep the issue open.

Thank you for all your contributions! More information about the ManageIQ triage process can be found in the triage process documentation.

agrare added the redesign label Mar 5, 2020

gtanzillo assigned agrare Mar 12, 2020

miq-bot added the stale label Mar 6, 2023

agrare added pinned and removed stale labels Mar 6, 2023
