DiskPool stuck Creating #1823

Open
Mefinst opened this issue Feb 23, 2025 · 12 comments


Mefinst commented Feb 23, 2025

Describe the bug
DiskPool resources hang in the Creating state because the IO-engine is unable to send an Ok response after the pool has, in fact, been created.

IO-engine pod logs

From those I conclude that the IO-engine successfully creates or destroys a pool when requested to do so.
The create_pool method times out.
Because the method times out, the IO-engine fails to send the success response.
The operator then sends commands one after another to destroy and import the pool: it destroys the previously created pool, tries to import it, then tries to create a new one, but hits the timeout again.

I believe the DiskPool operator should not send DestroyPoolRequest commands during the creation process.

DiskPool operator logs

Not very informative. They contain the same messages that are posted to kubectl describe diskpool.
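
For reference, the pool status and logs mentioned above can be gathered with commands along these lines (the label and workload names are assumptions based on the default "openebs" Helm release, not taken from this thread):

# DiskPool status as reported by the operator (dsp is the DiskPool short name)
kubectl -n openebs get dsp
kubectl -n openebs describe dsp chimera-hdd-pool-3

# io-engine and diskpool-operator logs (names assume the default chart labels/naming)
kubectl -n openebs logs -l app=io-engine -c io-engine
kubectl -n openebs logs deploy/openebs-operator-diskpool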

To Reproduce
Steps to reproduce the behavior:

  1. Install the OpenEBS Helm chart. Tested with 4.1.1, 4.1.2, and 4.2.0.
  2. Create a DiskPool (an example manifest is sketched after this list).
  3. The DiskPool is stuck in Creating.
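
For illustration, a minimal DiskPool manifest for step 2 might look like the following; the apiVersion depends on the chart version, and the node name and disk path are simply the ones that appear later in this thread:

kubectl apply -f - <<'EOF'
apiVersion: openebs.io/v1beta2
kind: DiskPool
metadata:
  name: chimera-hdd-pool-3
  namespace: openebs
spec:
  node: chimera
  disks:
    - aio:///dev/disk/by-path/pci-0000:00:17.0-ata-3
EOF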

Expected behavior
DiskPool created.

OS info (please complete the following information):
I use the OpenEBS chart, versions 4.1.1, 4.1.2, and 4.2.0, with default values.
I cannot disclose infrastructure details due to an NDA.

tiagolobocastro (Contributor) commented

Hi @Mefinst, could you please try the latest version of openebs (4.3.0)?


Mefinst commented Feb 24, 2025

> Hi @Mefinst, could you please try the latest version of openebs (4.3.0)?

How could I obtain 4.3.0?


tiagolobocastro commented Feb 24, 2025

Sorry, it's actually 4.2.0!
I got confused because your io-engine logs show v2.7.0.

The io-engine upgrade is not performed automatically (the DaemonSet has the OnDelete update strategy).
If you have running workloads, you can use kubectl-mayastor upgrade -n openebs to upgrade the io-engine gracefully. If you don't, then you can just delete the existing io-engine pods.
(Note: kubectl-openebs is being improved to include mayastor subcommands such as upgrade.)
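
Both options might look roughly like the following (the plugin invocation and the pod label are assumptions based on the default chart, not confirmed in this thread):

# Option 1: graceful io-engine upgrade via the mayastor kubectl plugin
kubectl-mayastor upgrade -n openebs

# Option 2: delete the io-engine pods; the OnDelete DaemonSet recreates them with the new image
# (assumes the io-engine pods carry the app=io-engine label)
kubectl -n openebs delete pod -l app=io-engine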


Mefinst commented Feb 24, 2025

Those logs are from version 4.1.1, to which I tried to downgrade after seeing that changes to the pool creation procedure were made for version 4.1.2 in issue 3820.
I get the same problem and the same logs with a clean install of any of the three versions from the initial message (remove the DiskPool resources, uninstall the Helm chart, wait for the namespace to be cleaned up, install the Helm chart).

tiagolobocastro (Contributor) commented

Could you please share a support bundle from the clean install?
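
If needed, a support bundle can typically be generated with the mayastor plugin; the exact subcommand may differ between plugin versions (this invocation is an assumption, not taken from the thread):

# Collect a support bundle for the openebs namespace
kubectl-mayastor dump system -n openebs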


Mefinst commented Feb 24, 2025

tiagolobocastro (Contributor) commented

Hey, this is still mayastor v2.7.1; are you sure you're on openebs 4.2?
helm ls -n openebs


Mefinst commented Feb 24, 2025

Yep. That was 4.1.1.

Here is a new one from 4.2.0.
mayastor-2025-02-24--14-45-22-UTC.tar.gz

❯ helm ls -n openebs
NAME   	NAMESPACE	REVISION	UPDATED                                	STATUS  	CHART        	APP VERSION
openebs	openebs  	1       	2025-02-24 14:41:16.536721252 +0000 UTC	deployed	openebs-4.2.0	4.2.0      

tiagolobocastro (Contributor) commented

Hmm, indeed something is quite wrong.
I suspect the delete path may also be hitting timeouts, and perhaps not handling them.
Would you be able to delete the io-engine pod on the chimera node?
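
For example (the label selector is an assumption based on the default chart labels; the node name comes from this thread):

# Delete the io-engine pod running on node "chimera"; the DaemonSet will recreate it
kubectl -n openebs delete pod -l app=io-engine --field-selector spec.nodeName=chimera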


Mefinst commented Feb 26, 2025

Deleting the io-engine pod did not help. I did collect support info after deleting it, though.

mayastor-2025-02-26--20-07-25-UTC.tar.gz

tiagolobocastro (Contributor) commented

Could you please scale down the agent-core deployment?
Then run this command in the io-engine container of the chimera io-engine pod: io-engine-client pool create chimera-hdd-pool-3 aio:///dev/disk/by-path/pci-0000:00:17.0-ata-3
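
Roughly, those two steps could look like this (the deployment name is an assumption based on the default release name; the pod name and exec invocation match the next comment):

# Stop the core agent so it does not race with the manual gRPC call
kubectl -n openebs scale deployment openebs-agent-core --replicas=0

# Create the pool directly through the io-engine gRPC client
kubectl -n openebs exec openebs-io-engine-7ccj7 -c io-engine -- \
  io-engine-client pool create chimera-hdd-pool-3 aio:///dev/disk/by-path/pci-0000:00:17.0-ata-3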


Mefinst commented Mar 2, 2025

❯ kubectl exec -n openebs openebs-io-engine-7ccj7 -c io-engine -- io-engine-client pool create chimera-hdd-pool-3 aio:///dev/disk/by-path/pci-0000:00:17.0-ata-3
gRPC status: status: AlreadyExists, message: ": volume already exists, failed to create pool chimera-hdd-pool-3", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Sun, 02 Mar 2025 03:29:42 GMT", "content-length": "0"} }
Backtrace [
    { fn: "<core::option::Option<std::backtrace::Backtrace> as snafu::GenerateImplicitData>::generate_with_source" },
    { fn: "io_engine_client::v1::pool_cli::handler::{{closure}}" },
    { fn: "io_engine_client::v1::main_::{{closure}}" },
    { fn: "io_engine_client::main::{{closure}}" },
    { fn: "io_engine_client::main" },
    { fn: "std::sys::backtrace::__rust_begin_short_backtrace" },
    { fn: "std::rt::lang_start::{{closure}}" },
    { fn: "std::rt::lang_start_internal" },
    { fn: "main" },
    { fn: "__libc_start_call_main" },
    { fn: "__libc_start_main@GLIBC_2.2.5" },
    { fn: "_start" },
]
command terminated with exit code 1
