Skip to content

Add cold migration to osism manage compute migrate #1287

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

janhorstmann
Copy link
Contributor

Automatically cold migrate instances in SHUTOFF state when issueing osism manage compute migrate command.
Cold migration will be automatically confirmed if the command is waiting for the migration to finish. A warning has been added to the description of the --no-wait parameter.
A --no-cold-migration parameter is added to allow to skip cold migration and thus allow to fallback to the previous behaviour.

Part of osism/issues#1248

Automatically cold migrate instances in SHUTOFF state when issueing
`osism manage compute migrate` command.
Cold migration will be automatically confirmed if the command is waiting
for the migration to finish. A warning has been added to the description
of the `--no-wait` parameter.
A `--no-cold-migration` parameter is added to allow to skip cold
migration and thus allow to fallback to the previous behaviour.

Part of osism/issues#1248

Signed-off-by: Jan Horstmann <[email protected]>
Comment on lines 431 to +444
if not no_wait:
inner_wait = True
while inner_wait:
while True:
time.sleep(2)
s = conn.compute.get_server(server[0])
if s.status in ["MIGRATING"]:
logger.info(
f"Live migration of {server[0]} ({server[1]}) is still in progress"
f"{migration_type.capitalize()} migration of {server[0]} ({server[1]}) is still in progress"
)
inner_wait = True
else:
inner_wait = False
if migration_type == "cold" and s.status in [
"VERIFY_RESIZE"
]:
conn.compute.confirm_server_resize(s)
break

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The loop used for checking the migration status of servers (lines 431-444) can potentially lead to performance issues due to the continuous polling with a fixed sleep interval. This approach can cause unnecessary delays and resource usage, especially if the migration operation is lengthy.

Recommendation:
Consider implementing a more efficient waiting mechanism, possibly using event-driven or callback mechanisms provided by the openstack SDK, if available. This would reduce the CPU cycles wasted on polling and could provide a more responsive and resource-efficient solution.

Comment on lines 402 to 447
for server in result:
if server[2] not in ["ACTIVE", "PAUSED"]:
if server[2] in ["ACTIVE", "PAUSED"]:
migration_type = "live"
elif server[2] in ["SHUTOFF"] and not no_cold_migration:
migration_type = "cold"
else:
logger.info(
f"{server[0]} ({server[1]}) in status {server[2]} cannot be live migrated"
f"{server[0]} ({server[1]}) in status {server[2]} cannot be migrated"
)
continue

if yes:
answer = "yes"
else:
answer = prompt(
f"Live migrate server {server[0]} ({server[1]}) [yes/no]: "
f"{migration_type.capitalize()} migrate server {server[0]} ({server[1]}) [yes/no]: "
)

if answer in ["yes", "y"]:
logger.info(f"Live migrating server {server[0]}")
conn.compute.live_migrate_server(
server[0], host=target, block_migration="auto", force=force
logger.info(
f"{migration_type.capitalize()} migrating server {server[0]}"
)
if migration_type == "live":
conn.compute.live_migrate_server(
server[0], host=target, block_migration="auto", force=force
)
elif migration_type == "cold":
conn.compute.migrate_server(server[0], host=target)

if not no_wait:
inner_wait = True
while inner_wait:
while True:
time.sleep(2)
s = conn.compute.get_server(server[0])
if s.status in ["MIGRATING"]:
logger.info(
f"Live migration of {server[0]} ({server[1]}) is still in progress"
f"{migration_type.capitalize()} migration of {server[0]} ({server[1]}) is still in progress"
)
inner_wait = True
else:
inner_wait = False
if migration_type == "cold" and s.status in [
"VERIFY_RESIZE"
]:
conn.compute.confirm_server_resize(s)
break


class ComputeStart(Command):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error handling in the migration logic (lines 402-447) appears minimal, which could lead to scenarios where exceptions or unexpected states are not properly managed. For instance, network issues or API failures during the live_migrate_server or migrate_server calls are not explicitly handled, which could result in partial migrations or unclear application states.

Recommendation:
Enhance the robustness of the migration operations by implementing comprehensive exception handling around the API calls. Use try-except blocks to catch potential exceptions from the openstack SDK methods, and provide clear error messages or recovery options. This would improve the reliability and user experience of the migration commands.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant