Need to understand recovery when a device goes away and comes back #1448

Open
tasleson opened this issue Feb 27, 2019 · 1 comment
tasleson commented Feb 27, 2019

While testing a pool with 3 devices, I created a filesystem, mounted it, and started I/O. I then took the devices offline with echo offline > /sys/block/sdk/device/state. After I brought the devices back, the device mapper tables were unusable. If I unmount the filesystem and then try to mount it again, I get:

# mount /stratis/yank/some_fs /mnt/fubar/
mount: /mnt/fubar: can't read superblock on /dev/mapper/stratis-1-3b8e5c85a0d84b04ab5e826689e7d020-thin-fs-077152e10d9b4d2eae0f1c151e6a6651.
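
A minimal sketch of the offline/online toggle used in this test, for anyone trying to reproduce it. The helper name set_dev_state and the SYSFS_ROOT override are hypothetical, and writing "running" to bring a device back is an assumption: the report does not say how the devices were restored.

```shell
#!/bin/sh
# Hypothetical helper: write a SCSI device state for one block device.
# Usage (as root):
#   set_dev_state sdk offline    # take the device away
#   set_dev_state sdk running    # assumed way to bring it back
# SYSFS_ROOT is an assumption added so the function can be exercised
# against a scratch directory instead of the real /sys.
set_dev_state() {
    dev="$1"
    state="$2"
    state_file="${SYSFS_ROOT:-/sys}/block/${dev}/device/state"
    if [ ! -w "$state_file" ]; then
        echo "cannot write $state_file" >&2
        return 1
    fi
    echo "$state" > "$state_file"
}
```

The function would have to be called once per pool member; only sdk is named in the report, so the other device names are left to the reader.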

The Stratis daemon shows:

ERROR libstratis::engine::strat_engine::thinpool::thinpool: Thinpool status is fail -> Failed

The kernel logs show:

[10350.529741] device-mapper: thin: 253:3: metadata operation 'dm_pool_commit_metadata' failed: error = -5
[10350.529742] device-mapper: thin: 253:3: aborting current metadata transaction
[10350.531325] sd 7:0:3:0: rejecting I/O to offline device
[10350.536549] sd 7:0:2:0: rejecting I/O to offline device
[10350.545825] sd 7:0:3:0: rejecting I/O to offline device
[10350.546515] device-mapper: thin: 253:3: failed to abort metadata transaction
[10350.546518] device-mapper: thin: 253:3: switching pool to failure mode
[10350.549157] device-mapper: thin metadata: couldn't read superblock
[10350.549158] device-mapper: thin: 253:3: failed to set 'needs_check' flag in metadata
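
The failure mode reported in these logs can also be observed from user space: per the kernel thin-provisioning documentation, a thin-pool that has failed reports only the string Fail in its status line. A minimal sketch of a check for that, assuming the status line comes from dmsetup status called with a single device name (the function name thinpool_failed is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: read one `dmsetup status <name>` line on stdin and
# succeed (exit 0) only if it describes a thin-pool in failure mode.
# Expected field layout: <start> <length> <target type> <status...>
thinpool_failed() {
    awk '$3 == "thin-pool" && $4 == "Fail" { found = 1 }
         END { exit !found }'
}

# Usage (as root), with the pool's device-mapper name filled in by the reader:
#   dmsetup status "$POOL_DM_NAME" | thinpool_failed && echo "pool has failed"
```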

The only way to recover is to stop stratisd, remove the dm tables, and restart the service.
Is there something better we can do here to return to a working state without requiring user intervention?
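
The manual workaround described above can be sketched roughly as follows. This is an illustration under stated assumptions, not a supported procedure: it assumes stratisd runs as a systemd unit named stratisd, that every Stratis device-mapper name starts with "stratis-" (as the filesystem device in the mount error does), and that the filesystem is already unmounted. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: given `dmsetup ls` output on stdin, print the names
# of devices whose name starts with "stratis-" (an assumed naming prefix).
stratis_dm_names() {
    awk '$1 ~ /^stratis-/ { print $1 }'
}

# Hypothetical recovery routine; must be run as root.
recover_stratis() {
    systemctl stop stratisd || return 1
    # The devices are stacked, so removing one may fail while another still
    # uses it. Make a few passes until no Stratis devices remain.
    pass=0
    while [ "$pass" -lt 5 ]; do
        names=$(dmsetup ls | stratis_dm_names)
        [ -z "$names" ] && break
        for name in $names; do
            dmsetup remove "$name" 2>/dev/null
        done
        pass=$((pass + 1))
    done
    systemctl start stratisd
}
```

Retrying in passes avoids having to know the exact stacking order of the devices, at the cost of some failed remove attempts along the way.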

Note: This test was done with: stratis-storage/devicemapper-rs#430
