PendingReleaseNotes
>=17.2.8
--------
* RADOS: The `get_pool_is_selfmanaged_snaps_mode` C++ API has been deprecated
because it is prone to false negative results. Its safer replacement is
`pool_is_in_selfmanaged_snaps_mode`.
* RBD: When diffing against the beginning of time (`fromsnapname == NULL`) in
fast-diff mode (`whole_object == true` with `fast-diff` image feature enabled
and valid), diff-iterate is now guaranteed to execute locally if exclusive
lock is available. This brings a dramatic performance improvement for QEMU
live disk synchronization and backup use cases.
* RBD: The option ``--image-id`` has been added to `rbd children` CLI command,
so it can be run for images in the trash.
* RBD: `RBD_IMAGE_OPTION_CLONE_FORMAT` option has been exposed in Python
bindings via `clone_format` optional parameter to `clone`, `deep_copy` and
`migration_prepare` methods.
* RBD: `RBD_IMAGE_OPTION_FLATTEN` option has been exposed in Python bindings via
`flatten` optional parameter to `deep_copy` and `migration_prepare` methods.
>=17.2.7
--------
* `ceph mgr dump` command now displays the name of the mgr module that
registered a RADOS client in the `name` field added to elements of the
`active_clients` array. Previously, only the address of a module's RADOS
client was shown in the `active_clients` array.
* mClock Scheduler: The mClock scheduler (default scheduler in Quincy) has
undergone significant usability and design improvements to address the slow
backfill issue. Some important changes are:
* The 'balanced' profile is set as the default mClock profile because it
represents a compromise between prioritizing client IO or recovery IO. Users
can then choose either the 'high_client_ops' profile to prioritize client IO
or the 'high_recovery_ops' profile to prioritize recovery IO.
* QoS parameters like reservation and limit are now specified in terms of a
fraction (range: 0.0 to 1.0) of the OSD's IOPS capacity.
* The cost parameters (osd_mclock_cost_per_io_usec_* and
osd_mclock_cost_per_byte_usec_*) have been removed. The cost of an operation
is now determined using the random IOPS and maximum sequential bandwidth
capability of the OSD's underlying device.
* Degraded object recovery is given higher priority when compared to misplaced
object recovery because degraded objects present a data safety issue not
present with objects that are merely misplaced. Therefore, backfilling
operations with the 'balanced' and 'high_client_ops' mClock profiles may
progress slower than what was seen with the 'WeightedPriorityQueue' (WPQ)
scheduler.
* The QoS allocations in all the mClock profiles are optimized based on the above
fixes and enhancements.
* For more detailed information see:
https://docs.ceph.com/en/quincy/rados/configuration/mclock-config-ref/
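For example, to prioritize client IO on all OSDs and confirm the active profile
(a minimal illustration using the profile names listed above)::
ceph config set osd osd_mclock_profile high_client_ops
ceph config get osd osd_mclock_profile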
* RGW: S3 multipart uploads using Server-Side Encryption now replicate correctly in
multi-site. Previously, the replicas of such objects were corrupted on decryption.
A new tool, ``radosgw-admin bucket resync encrypted multipart``, can be used to
identify these original multipart uploads. The ``LastModified`` timestamp of any
identified object is incremented by 1ns to cause peer zones to replicate it again.
For multi-site deployments that make any use of Server-Side Encryption, we
recommend running this command against every bucket in every zone after all
zones have upgraded.
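For example, per bucket (the ``--bucket`` argument shown here is an assumption
based on the usual radosgw-admin conventions; consult the tool's help output)::
radosgw-admin bucket resync encrypted multipart --bucket=<bucket-name>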
* CEPHFS: The MDS now evicts clients which are not advancing their request tids, as
this causes a large buildup of session metadata, resulting in the MDS going
read-only due to the RADOS operation exceeding the size threshold. The
`mds_session_metadata_threshold` config option controls the maximum size that
the (encoded) session metadata can grow to.
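The threshold can be adjusted like any other MDS option, for example (the value
shown is illustrative only)::
ceph config set mds mds_session_metadata_threshold 16777216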
* CEPHFS: After recovering a Ceph File System using the disaster recovery
procedure, the recovered files under the `lost+found` directory can now be deleted.
* `ceph config dump --format <json|xml>` output will display the localized
option names instead of their normalized versions. For example,
"mgr/prometheus/x/server_port" will be displayed instead of
"mgr/prometheus/server_port". This matches the output of the non-pretty-print
formatted version of the command.
>=17.2.6
--------
* `ceph mgr dump` command now outputs `last_failure_osd_epoch` and
`active_clients` fields at the top level. Previously, these fields were
output under `always_on_modules` field.
>=17.2.5
--------
>=19.0.0
--------
* RGW: S3 multipart uploads using Server-Side Encryption now replicate correctly in
multi-site. Previously, the replicas of such objects were corrupted on decryption.
A new tool, ``radosgw-admin bucket resync encrypted multipart``, can be used to
identify these original multipart uploads. The ``LastModified`` timestamp of any
identified object is incremented by 1ns to cause peer zones to replicate it again.
For multi-site deployments that make any use of Server-Side Encryption, we
recommend running this command against every bucket in every zone after all
zones have upgraded.
* CEPHFS: The MDS now evicts clients which are not advancing their request tids, as
this causes a large buildup of session metadata, resulting in the MDS going
read-only due to the RADOS operation exceeding the size threshold. The
`mds_session_metadata_threshold` config option controls the maximum size that
the (encoded) session metadata can grow to.
* CephFS: For clusters with multiple CephFS file systems, all the snap-schedule
commands now expect the '--fs' argument.
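For example, on a cluster with more than one file system (the file system name
is a placeholder; see the snap-schedule documentation for the full syntax)::
ceph fs snap-schedule status / --fs mycephfs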
* CephFS: The period specifier ``m`` now implies minutes and the period specifier
``M`` now implies months. This has been made consistent with the rest
of the system.
* RGW: New tools have been added to radosgw-admin for identifying and
correcting issues with versioned bucket indexes. Historical bugs with the
versioned bucket index transaction workflow made it possible for the index
to accumulate extraneous "book-keeping" olh entries and plain placeholder
entries. In some specific scenarios where clients made concurrent requests
referencing the same object key, it was likely that a lot of extra index
entries would accumulate. When a significant number of these entries are
present in a single bucket index shard, they can cause high bucket listing
latencies and lifecycle processing failures. To check whether a versioned
bucket has unnecessary olh entries, users can now run ``radosgw-admin
bucket check olh``. If the ``--fix`` flag is used, the extra entries will
be safely removed. As a distinct issue from the one described thus far, it is
also possible that some versioned buckets are maintaining extra unlinked
objects that are not listable from the S3/Swift APIs. These extra objects
are typically a result of PUT requests that exited abnormally, in the middle
of a bucket index transaction - so the client would not have received a
successful response. Bugs in prior releases made these unlinked objects easy
to reproduce with any PUT request that was made on a bucket that was actively
resharding. Besides the extra space that these hidden, unlinked objects
consume, there can be another side effect in certain scenarios, caused by
the nature of the failure mode that produced them, where a client of a bucket
that was a victim of this bug may find the object associated with the key to
be in an inconsistent state. To check whether a versioned bucket has unlinked
entries, users can now run ``radosgw-admin bucket check unlinked``. If the
``--fix`` flag is used, the unlinked objects will be safely removed. Finally,
a third issue made it possible for versioned bucket index stats to be
accounted inaccurately. The tooling for recalculating versioned bucket stats
also had a bug, and was not previously capable of fixing these inaccuracies.
This release resolves those issues and users can now expect that the existing
``radosgw-admin bucket check`` command will produce correct results. We
recommend that users with versioned buckets, especially those that existed
on prior releases, use these new tools to check whether their buckets are
affected and to clean them up accordingly.
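For example, to inspect and then clean a single bucket (the ``--bucket``
argument is assumed from standard radosgw-admin usage)::
radosgw-admin bucket check olh --bucket=<bucket-name>
radosgw-admin bucket check olh --bucket=<bucket-name> --fix
radosgw-admin bucket check unlinked --bucket=<bucket-name>
radosgw-admin bucket check unlinked --bucket=<bucket-name> --fix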
* mgr/snap-schedule: For clusters with multiple CephFS file systems, all the
snap-schedule commands now expect the '--fs' argument.
* RGW: Fixed a S3 Object Lock bug with PutObjectRetention requests that specify
a RetainUntilDate after the year 2106. This date was truncated to 32 bits when
stored, so a much earlier date was used for object lock enforcement. This does
not affect PutBucketObjectLockConfiguration where a duration is given in Days.
The RetainUntilDate encoding is fixed for new PutObjectRetention requests, but
cannot repair the dates of existing object locks. Such objects can be identified
with a HeadObject request based on the x-amz-object-lock-retain-until-date
response header.
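As a sketch, an affected object can be inspected with any S3 client that issues
HeadObject, e.g. with the AWS CLI (endpoint and names are placeholders)::
aws s3api head-object --endpoint-url http://<rgw-endpoint> --bucket <bucket> --key <key>
# inspect the returned ObjectLockRetainUntilDate value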
* RADOS: The `get_pool_is_selfmanaged_snaps_mode` C++ API has been deprecated
because it is prone to false negative results. Its safer replacement is
`pool_is_in_selfmanaged_snaps_mode`.
* RADOS: For bug 62338 (https://tracker.ceph.com/issues/62338), we did not choose
to condition the fix on a server flag in order to simplify backporting. As
a result, in rare cases it may be possible for a PG to flip between two acting
sets while an upgrade to a version with the fix is in progress. If you observe
this behavior, you should be able to work around it by completing the upgrade or
by disabling async recovery by setting osd_async_recovery_min_cost to a very
large value on all OSDs until the upgrade is complete:
``ceph config set osd osd_async_recovery_min_cost 1099511627776``
* RADOS: A detailed version of the `balancer status` CLI command in the balancer
module is now available. Users may run `ceph balancer status detail` to see more
details about which PGs were updated in the balancer's last optimization.
See https://docs.ceph.com/en/latest/rados/operations/balancer/ for more information.
* CephFS: Full support for subvolumes and subvolume groups is now available
for the snap_schedule Manager module.
>=18.0.0
--------
* RBD: The semantics of compare-and-write C++ API (`Image::compare_and_write`
and `Image::aio_compare_and_write` methods) now match those of C API. Both
compare and write steps operate only on `len` bytes even if the respective
buffers are larger. The previous behavior of comparing up to the size of
the compare buffer was prone to subtle breakage upon straddling a stripe
unit boundary.
* RBD: compare-and-write operation is no longer limited to 512-byte sectors.
Assuming proper alignment, it now allows operating on stripe units (4M by
default).
* RBD: New `rbd_aio_compare_and_writev` API method to support scatter/gather
on both compare and write buffers. This complements the existing `rbd_aio_readv`
and `rbd_aio_writev` methods.
* RBD: `rbd device unmap` command gained `--namespace` option. Support for
namespaces was added to RBD in Nautilus 14.2.0 and it has been possible to
map and unmap images in namespaces using the `image-spec` syntax since then
but the corresponding option available in most other commands was missing.
* CEPHFS: Rename the `mds_max_retries_on_remount_failure` option to
`client_max_retries_on_remount_failure` and move it from mds.yaml.in to
mds-client.yaml.in, because this option has only ever been used by the MDS
client.
>=17.2.4
--------
* Cephfs: The 'AT_NO_ATTR_SYNC' macro is deprecated, please use the standard
'AT_STATX_DONT_SYNC' macro. The 'AT_NO_ATTR_SYNC' macro will be removed in
the future.
* OSD: The issue of high CPU utilization during recovery/backfill operations
has been fixed. For more details, see: https://tracker.ceph.com/issues/56530.
* Trimming of PGLog dups is now controlled by the size instead of the version.
This fixes the PGLog inflation issue that was happening when the on-line
(in OSD) trimming got jammed after a PG split operation. Also, a new off-line
mechanism has been added: `ceph-objectstore-tool` gained a `trim-pg-log-dups` op
that targets situations where an OSD is unable to boot because of those inflated
dups. In that case, the "You can be hit by THE DUPS BUG" warning will be visible
in the OSD logs.
Relevant tracker: https://tracker.ceph.com/issues/53729
* RBD: `rbd device unmap` command gained `--namespace` option. Support for
namespaces was added to RBD in Nautilus 14.2.0 and it has been possible to
map and unmap images in namespaces using the `image-spec` syntax since then
but the corresponding option available in most other commands was missing.
* RGW: Compression is now supported for objects uploaded with Server-Side Encryption.
When both are enabled, compression is applied before encryption.
* RGW: the "pubsub" functionality for storing bucket notifications inside Ceph
is removed. Together with it, the "pubsub" zone should not be used anymore.
The REST operations, as well as radosgw-admin commands for manipulating
subscriptions, as well as fetching and acking the notifications are removed
as well.
In case that the endpoint to which the notifications are sent maybe down or
disconnected, it is recommended to use persistent notifications to guarantee
the delivery of the notifications. In case the system that consumes the
notifications needs to pull them (instead of the notifications be pushed
to it), an external message bus (e.g. rabbitmq, Kafka) should be used for
that purpose.
* RGW: The serialized format of notification and topics has changed, so that
new/updated topics will be unreadable by old RGWs. We recommend completing
the RGW upgrades before creating or modifying any notification topics.
* RBD: Trailing newline in passphrase files (`<passphrase-file>` argument in
`rbd encryption format` command and `--encryption-passphrase-file` option
in other commands) is no longer stripped.
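If your workflow previously relied on the stripping behavior, create passphrase
files without a trailing newline, for example (image and passphrase names are
illustrative)::
printf '%s' 'my secret passphrase' > passphrase.txt
rbd encryption format mypool/myimage luks2 passphrase.txt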
* RBD: Support for layered client-side encryption is added. Cloned images
can now be encrypted each with its own encryption format and passphrase,
potentially different from that of the parent image. The efficient
copy-on-write semantics intrinsic to unformatted (regular) cloned images
are retained.
* CEPHFS: Rename the `mds_max_retries_on_remount_failure` option to
`client_max_retries_on_remount_failure` and move it from mds.yaml.in to
mds-client.yaml.in, because this option has only ever been used by the MDS
client.
* The `perf dump` and `perf schema` commands are deprecated in favor of new
`counter dump` and `counter schema` commands. These new commands add support
for labeled perf counters and also emit existing unlabeled perf counters. Some
unlabeled perf counters became labeled in this release, with more to follow in
future releases; such converted perf counters are no longer emitted by the
`perf dump` and `perf schema` commands.
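For example, a daemon's labeled and unlabeled counters can be inspected through
its admin socket (the daemon name is a placeholder)::
ceph daemon osd.0 counter schema
ceph daemon osd.0 counter dump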
* `ceph mgr dump` command now outputs `last_failure_osd_epoch` and
`active_clients` fields at the top level. Previously, these fields were
output under `always_on_modules` field.
* `ceph mgr dump` command now displays the name of the mgr module that
registered a RADOS client in the `name` field added to elements of the
`active_clients` array. Previously, only the address of a module's RADOS
client was shown in the `active_clients` array.
* RBD: All rbd-mirror daemon perf counters became labeled and as such are now
emitted only by the new `counter dump` and `counter schema` commands. As part
of the conversion, many also got renamed to better disambiguate journal-based
and snapshot-based mirroring.
* RBD: list-watchers C++ API (`Image::list_watchers`) now clears the passed
`std::list` before potentially appending to it, aligning with the semantics
of the corresponding C API (`rbd_watchers_list`).
* Telemetry: Users who are opted-in to telemetry can also opt-in to
participating in a leaderboard in the telemetry public
dashboards (https://telemetry-public.ceph.com/). Users can now also add a
description of the cluster to publicly appear in the leaderboard.
For more details, see:
https://docs.ceph.com/en/latest/mgr/telemetry/#leaderboard
See a sample report with `ceph telemetry preview`.
Opt-in to telemetry with `ceph telemetry on`.
Opt-in to the leaderboard with
`ceph config set mgr mgr/telemetry/leaderboard true`.
Add leaderboard description with:
`ceph config set mgr mgr/telemetry/leaderboard_description 'Cluster description'`.
* CEPHFS: After recovering a Ceph File System using the disaster recovery
procedure, the recovered files under the `lost+found` directory can now be deleted.
* core: cache-tiering is now deprecated.
* mClock Scheduler: The mClock scheduler (default scheduler in Quincy) has
undergone significant usability and design improvements to address the slow
backfill issue. Some important changes are:
* The 'balanced' profile is set as the default mClock profile because it
represents a compromise between prioritizing client IO or recovery IO. Users
can then choose either the 'high_client_ops' profile to prioritize client IO
or the 'high_recovery_ops' profile to prioritize recovery IO.
* QoS parameters like reservation and limit are now specified in terms of a
fraction (range: 0.0 to 1.0) of the OSD's IOPS capacity.
* The cost parameters (osd_mclock_cost_per_io_usec_* and
osd_mclock_cost_per_byte_usec_*) have been removed. The cost of an operation
is now determined using the random IOPS and maximum sequential bandwidth
capability of the OSD's underlying device.
* Degraded object recovery is given higher priority when compared to misplaced
object recovery because degraded objects present a data safety issue not
present with objects that are merely misplaced. Therefore, backfilling
operations with the 'balanced' and 'high_client_ops' mClock profiles may
progress slower than what was seen with the 'WeightedPriorityQueue' (WPQ)
scheduler.
* The QoS allocations in all the mClock profiles are optimized based on the above
fixes and enhancements.
* For more detailed information see:
https://docs.ceph.com/en/latest/rados/configuration/mclock-config-ref/
* mgr/snap_schedule: The snap-schedule mgr module now retains one less snapshot
than the number mentioned against the config tunable `mds_max_snaps_per_dir`
so that a new snapshot can be created and retained during the next schedule
run.
>=17.2.1
--------
* The "BlueStore zero block detection" feature (first introduced to Quincy in
https://github.com/ceph/ceph/pull/43337) has been turned off by default with a
new global configuration called `bluestore_zero_block_detection`. This feature,
intended for large-scale synthetic testing, does not interact well with some RBD
and CephFS features. Any side effects experienced in previous Quincy versions
would no longer occur, provided that the configuration remains set to false.
Relevant tracker: https://tracker.ceph.com/issues/55521
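To verify or adjust the setting explicitly (generic config commands; the option
name is given above)::
ceph config get osd bluestore_zero_block_detection
ceph config set global bluestore_zero_block_detection false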
* telemetry: Added new Rook metrics to the 'basic' channel to report Rook's
version, Kubernetes version, node metrics, etc.
See a sample report with `ceph telemetry preview`.
Opt-in with `ceph telemetry on`.
For more details, see:
https://docs.ceph.com/en/latest/mgr/telemetry/
>=17.0.0
--------
* Filestore has been deprecated in Quincy, considering that BlueStore has been
the default objectstore for quite some time.
* A critical bug in the OMAP format upgrade has been fixed. It could cause data
corruption (improperly formatted OMAP keys) after a pre-Pacific cluster upgrade
if the bluestore-quick-fix-on-mount parameter is set to true or
ceph-bluestore-tool's quick-fix/repair commands are invoked.
Relevant tracker: https://tracker.ceph.com/issues/53062
* The `ceph-mgr-modules-core` debian package no longer recommends `ceph-mgr-rook`,
because the latter depends on `python3-numpy`, which cannot be imported multiple
times in different Python sub-interpreters if the version of `python3-numpy` is
older than 1.19. Since `apt-get` installs the `Recommends` packages by default,
`ceph-mgr-rook` was always installed along with the `ceph-mgr` debian package as
an indirect dependency. If your workflow depends on this behavior, you might want
to install `ceph-mgr-rook` separately.
* the "kvs" Ceph object class is not packaged anymore. "kvs" Ceph object class
offers a distributed flat b-tree key-value store implemented on top of librados
objects omap. Because we don't have existing internal users of this object
class, it is not packaged anymore.
* A new library is available, libcephsqlite. It provides a SQLite Virtual File
System (VFS) on top of RADOS. The database and journals are striped over
RADOS across multiple objects for virtually unlimited scaling and throughput
only limited by the SQLite client. Applications using SQLite may change to
the Ceph VFS with minimal changes, usually just by specifying the alternate
VFS. We expect the library to be most impactful and useful for applications
that were storing state in RADOS omap, especially without striping which
limits scalability.
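As a rough sketch, a database on a RADOS pool can be opened from the sqlite3
shell after loading the VFS (the URI format shown is an approximation; see the
libcephsqlite documentation for the exact syntax)::
sqlite3 -cmd '.load libcephsqlite.so' 'file:///mypool:mynamespace/mydb.db?vfs=ceph'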
* The ``device_health_metrics`` pool has been renamed ``.mgr``. It is now
used as a common store for all ``ceph-mgr`` modules.
* fs: A file system can be created with a specific ID ("fscid"). This is useful
in certain recovery scenarios, e.g., monitor database lost and rebuilt, and
the restored file system is expected to have the same ID as before.
* fs: A file system can be renamed using the `fs rename` command. Any cephx
credentials authorized for the old file system name will need to be
reauthorized to the new file system name. Since the operations of the clients
using these re-authorized IDs may be disrupted, this command requires the
"--yes-i-really-mean-it" flag. Also, mirroring is expected to be disabled
on the file system.
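For example (file system names are placeholders)::
ceph fs rename myfs myfs-new --yes-i-really-mean-it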
* fs: A FS volume can be renamed using the `fs volume rename` command. Any cephx
credentials authorized for the old volume name will need to be reauthorized to
the new volume name. Since the operations of the clients using these re-authorized
IDs may be disrupted, this command requires the "--yes-i-really-mean-it" flag. Also,
mirroring is expected to be disabled on the file system.
* MDS upgrades no longer require stopping all standby MDS daemons before
upgrading the sole active MDS for a file system.
* RGW: RGW now supports rate limiting by user and/or by bucket.
With this feature it is possible to limit the total operations and/or bytes
per minute delivered for a user and/or a bucket.
The feature allows the admin to limit only READ operations and/or WRITE
operations. The rate limiting configuration can also be applied to all users
and all buckets by using the global configuration.
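As an illustration, per-user limits might be configured with the radosgw-admin
ratelimit commands (flag names follow the rate limiting documentation and should
be double-checked against your version)::
radosgw-admin ratelimit set --ratelimit-scope=user --uid=<uid> --max-read-ops=1024 --max-write-ops=256
radosgw-admin ratelimit enable --ratelimit-scope=user --uid=<uid>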
* RGW: `radosgw-admin realm delete` is now renamed to `radosgw-admin realm rm`. This
is consistent with the help message.
* OSD: Ceph now uses mclock_scheduler for bluestore OSDs as its default osd_op_queue
to provide QoS. The 'mclock_scheduler' is not supported for filestore OSDs.
Therefore, the default 'osd_op_queue' is set to 'wpq' for filestore OSDs
and is enforced even if the user attempts to change it. For more details on
configuring mclock, see:
https://docs.ceph.com/en/latest/rados/configuration/mclock-config-ref/
* CephFS: Failure to replay the journal by a standby-replay daemon will now
cause the rank to be marked damaged.
* RGW: S3 bucket notification events now contain an `eTag` key instead of `etag`,
and eventName values no longer carry the `s3:` prefix, fixing deviations from
the message format observed on AWS.
* RGW: It is now possible to specify SSL options and ciphers for the beast frontend.
The default SSL options setting is "no_sslv2:no_sslv3:no_tlsv1:no_tlsv1_1".
If you want to restore the old behavior, add 'ssl_options=' (empty) to
the ``rgw frontends`` configuration.
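For example, a ceph.conf snippet that restores the old behavior might look like
this (section name and certificate path are placeholders)::
[client.rgw.myhost]
rgw frontends = beast ssl_port=443 ssl_certificate=/etc/ceph/rgw.crt ssl_options=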
* RGW: The behavior for Multipart Upload was modified so that only
CompleteMultipartUpload notification is sent at the end of the multipart upload.
The POST notification at the beginning of the upload, and PUT notifications that
were sent on each part are not sent anymore.
* MGR: The pg_autoscaler has a new 'scale-down' profile which provides more
performance from the start for new pools. However, the module will keep
using its old behavior by default, now called the 'scale-up' profile.
For more details, see:
https://docs.ceph.com/en/latest/rados/operations/placement-groups/
* MGR: The pg_autoscaler can now be turned `on` and `off` globally
with the `noautoscale` flag. By default this flag is unset and
the default pg_autoscale mode remains the same.
For more details, see:
https://docs.ceph.com/en/latest/rados/operations/placement-groups/
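For example (per the placement-groups documentation referenced above)::
ceph osd pool set noautoscale
ceph osd pool unset noautoscale
ceph osd pool get noautoscale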
* The ``ceph pg dump`` command now prints three additional columns:
`LAST_SCRUB_DURATION` shows the duration (in seconds) of the last completed scrub;
`SCRUB_SCHEDULING` conveys whether a PG is scheduled to be scrubbed at a specified
time, queued for scrubbing, or being scrubbed;
`OBJECTS_SCRUBBED` shows the number of objects scrubbed in a PG after scrub begins.
* A health warning will now be reported if the ``require-osd-release`` flag is not
set to the appropriate release after a cluster upgrade.
* LevelDB support has been removed. ``WITH_LEVELDB`` is no longer a supported
build option.
* MON/MGR: Pools can now be created with the `--bulk` flag. Any pool created with
`bulk` will use a profile of the `pg_autoscaler` that provides more performance
from the start. However, any pool created without the `--bulk` flag will keep
using its old behavior by default. For more details, see:
https://docs.ceph.com/en/latest/rados/operations/placement-groups/
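For example, to create a bulk pool and confirm the flag (the pool name is a
placeholder)::
ceph osd pool create mypool --bulk
ceph osd pool get mypool bulk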
* Cephadm: ``osd_memory_target_autotune`` will be enabled by default which will set
``mgr/cephadm/autotune_memory_target_ratio`` to ``0.7`` of total RAM. This will be
unsuitable for hyperconverged infrastructures. For hyperconverged Ceph, please refer
to the documentation or set ``mgr/cephadm/autotune_memory_target_ratio`` to ``0.2``.
* telemetry: Improved the opt-in flow so that users can keep sharing the same
data, even when new data collections are available. A new 'perf' channel
that collects various performance metrics is now available to opt-in to with:
`ceph telemetry on`
`ceph telemetry enable channel perf`
See a sample report with `ceph telemetry preview`
For more details, see:
https://docs.ceph.com/en/latest/mgr/telemetry/
* MGR: The progress module disables the pg recovery event by default,
since the event is expensive and has interrupted other services when
OSDs are being marked in/out of the cluster. However,
the user may still enable this event anytime. For more details, see:
https://docs.ceph.com/en/latest/mgr/progress/
>=16.0.0
--------
* mgr/nfs: The ``nfs`` module has been moved out of the volumes plugin. The
``nfs`` mgr module must be enabled before using the ``ceph nfs`` commands.
* volumes/nfs: The ``cephfs`` cluster type has been removed from the
``nfs cluster create`` subcommand. Clusters deployed by cephadm can
support an NFS export of both ``rgw`` and ``cephfs`` from a single
NFS cluster instance.
* The ``nfs cluster update`` command has been removed. You can modify
the placement of an existing NFS service (and/or its associated
ingress service) using ``orch ls --export`` and ``orch apply -i
...``.
* The ``orch apply nfs`` command no longer requires a pool or
namespace argument. We strongly encourage users to use the defaults
so that the ``nfs cluster ls`` and related commands will work
properly.
* The ``nfs cluster delete`` and ``nfs export delete`` commands are
deprecated and will be removed in a future release. Please use
``nfs cluster rm`` and ``nfs export rm`` instead.
* The ``nfs export create`` CLI arguments have changed, with the
*fsname* or *bucket-name* argument position moving to the right of
the *cluster-id* and *pseudo-path*. Consider transitioning to
using named arguments instead of positional arguments (e.g., ``ceph
nfs export create cephfs --cluster-id mycluster --pseudo-path /foo
--fsname myfs`` instead of ``ceph nfs export create cephfs
mycluster /foo myfs``) to ensure correct behavior with any
version.
* mgr/pg_autoscaler: The autoscaler will now start out by scaling each
pool to have a full complement of PGs and will only decrease that
number when other pools need more PGs due to increased usage.
This improves the out-of-the-box performance of Ceph by allowing more PGs
to be created for a given pool.
* CephFS: Disabling allow_standby_replay on a file system will also stop all
standby-replay daemons for that file system.
* New bluestore_rocksdb_options_annex config parameter. Complements
bluestore_rocksdb_options and allows setting rocksdb options without repeating
the existing defaults.
* The MDS in Pacific makes backwards-incompatible changes to the on-RADOS
metadata structures, which prevent a downgrade to older releases
(Octopus and older).
* $pid expansion in config paths like `admin_socket` will now properly expand
to the daemon pid for commands like `ceph-mds` or `ceph-osd`. Previously only
`ceph-fuse`/`rbd-nbd` expanded `$pid` with the actual daemon pid.
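For example, a ceph.conf entry such as the following now yields a per-daemon
socket path for ceph-osd and ceph-mds as well (the path is illustrative)::
[osd]
admin socket = /var/run/ceph/$cluster-$name-$pid.asok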
* The allowable options for some "radosgw-admin" commands have been changed.
* "mdlog-list", "datalog-list", "sync-error-list" no longer accept
start and end dates, but do accept a single optional start marker.
* "mdlog-trim", "datalog-trim", "sync-error-trim" only accept a
single marker giving the end of the trimmed range.
* Similarly the date ranges and marker ranges have been removed on
the RESTful DATALog and MDLog list and trim operations.
* ceph-volume: The ``lvm batch`` subcommand received a major rewrite. This
closes a number of bugs and improves usability in terms of size specification
and calculation, as well as idempotency behaviour and the disk replacement
process. Please refer to
https://docs.ceph.com/en/latest/ceph-volume/lvm/batch/ for more detailed
information.
* Configuration variables for permitted scrub times have changed. The legal
values for ``osd_scrub_begin_hour`` and ``osd_scrub_end_hour`` are ``0`` -
``23``. The use of 24 is now illegal. Specifying ``0`` for both values
causes every hour to be allowed. The legal values for
``osd_scrub_begin_week_day`` and ``osd_scrub_end_week_day`` are ``0`` -
``6``. The use of ``7`` is now illegal. Specifying ``0`` for both values
causes every day of the week to be allowed.
* Support for multiple file systems in a single Ceph cluster is now stable.
New Ceph clusters enable support for multiple file systems by default.
Existing clusters must still set the "enable_multiple" flag on the fs.
See the CephFS documentation for more information.
* volume/nfs: The "ganesha-" prefix from cluster id and nfs-ganesha common
config object was removed to ensure a consistent namespace across different
orchestrator backends. Delete any existing nfs-ganesha clusters prior
to upgrading and redeploy new clusters after upgrading to Pacific.
* A new health check, DAEMON_OLD_VERSION, warns if different versions of
Ceph are running on daemons. It generates a health error if multiple
versions are detected. This condition must exist for over
``mon_warn_older_version_delay`` (set to 1 week by default) in order for the
health condition to be triggered. This allows most upgrades to proceed
without falsely seeing the warning. If the upgrade is paused for an extended
time period, health mute can be used like this: "ceph health mute
DAEMON_OLD_VERSION --sticky". In this case, after the upgrade has finished, use
"ceph health unmute DAEMON_OLD_VERSION".
* MGR: progress module can now be turned on/off, using these commands:
``ceph progress on`` and ``ceph progress off``.
* The ceph_volume_client.py library used for manipulating legacy "volumes" in
CephFS is removed. All remaining users should use the "fs volume" interface
exposed by the ceph-mgr:
https://docs.ceph.com/en/latest/cephfs/fs-volumes/
* An AWS-compliant API: "GetTopicAttributes" was added to replace the existing
"GetTopic" API. The new API should be used to fetch information about topics
used for bucket notifications.
* librbd: The shared, read-only parent cache's config option
``immutable_object_cache_watermark`` has now been updated to properly reflect
the upper cache utilization before space is reclaimed. The default
``immutable_object_cache_watermark`` is now ``0.9``. If the capacity reaches
90%, the daemon will start evicting cold cache entries.
* OSD: the option ``osd_fast_shutdown_notify_mon`` has been introduced to allow
the OSD to notify the monitor it is shutting down even if ``osd_fast_shutdown``
is enabled. This helps with the monitor logs on larger clusters that may get
many 'osd.X reported immediately failed by osd.Y' messages, which can confuse tools.
* rgw/kms/vault: the transit logic has been revamped to better use
the transit engine in vault. To take advantage of this new
functionality configuration changes are required. See the current
documentation (radosgw/vault) for more details.
* Scrubs are now more aggressive in trying to find as many simultaneous scrubbable PGs as possible within the osd_max_scrubs limitation.
It is possible that increasing osd_scrub_sleep may be necessary to maintain client responsiveness.
* Version 2 of the cephx authentication protocol (``CEPHX_V2`` feature bit) is
now required by default. It was introduced in 2018, adding replay attack
protection for authorizers and making msgr v1 message signatures stronger
(CVE-2018-1128 and CVE-2018-1129). Support is present in Jewel 10.2.11,
Luminous 12.2.6, Mimic 13.2.1, Nautilus 14.2.0 and later; upstream kernels
4.9.150, 4.14.86, 4.19 and later; various distribution kernels, in particular
CentOS 7.6 and later. To enable older clients, set ``cephx_require_version``
and ``cephx_service_require_version`` config options to 1.
>=15.0.0
--------
* MON: The cluster log now logs health detail every ``mon_health_to_clog_interval``,
which has been changed from 1hr to 10min. Logging of health detail will be
skipped if there is no change in the health summary since it was last logged.
* The ``ceph df`` command now lists the number of pgs in each pool.
* Monitors now have the config option ``mon_allow_pool_size_one``, which is disabled
by default. However, if enabled, users now have to pass the
``--yes-i-really-mean-it`` flag to ``osd pool set size 1`` if they are really
sure about configuring a pool size of 1.
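For example (the pool name is a placeholder)::
ceph config set global mon_allow_pool_size_one true
ceph osd pool set mypool size 1 --yes-i-really-mean-it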
* librbd now inherits the stripe unit and count from its parent image upon creation.
This can be overridden by specifying different stripe settings during clone creation.
* The balancer is now on by default in upmap mode. Since upmap mode requires
``require_min_compat_client`` luminous, new clusters will only support luminous
and newer clients by default. Existing clusters can enable upmap support by running
``ceph osd set-require-min-compat-client luminous``. It is still possible to turn
the balancer off using the ``ceph balancer off`` command. In earlier versions,
the balancer was included in the ``always_on_modules`` list, but needed to be
turned on explicitly using the ``ceph balancer on`` command.
* MGR: the "cloud" mode of the diskprediction module is not supported anymore
and the ``ceph-mgr-diskprediction-cloud`` manager module has been removed. This
is because the external cloud service run by ProphetStor is no longer accessible
and there is no immediate replacement for it at this time. The "local" prediction
mode will continue to be supported.
* Cephadm: There were a lot of small usability improvements and bug fixes:
* Grafana when deployed by Cephadm now binds to all network interfaces.
* ``cephadm check-host`` now prints all detected problems at once.
* Cephadm now calls ``ceph dashboard set-grafana-api-ssl-verify false``
when generating an SSL certificate for Grafana.
* The Alertmanager is now correctly pointed to the Ceph Dashboard
* ``cephadm adopt`` now supports adopting an Alertmanager
* ``ceph orch ps`` now supports filtering by service name
* ``ceph orch host ls`` now marks hosts as offline, if they are not
accessible.
* Cephadm can now deploy NFS Ganesha services. For example, to deploy NFS with
a service id of mynfs, that will use the RADOS pool nfs-ganesha and namespace
nfs-ns::
ceph orch apply nfs mynfs nfs-ganesha nfs-ns
* Cephadm: ``ceph orch ls --export`` now returns all service specifications in
yaml representation that is consumable by ``ceph orch apply``. In addition,
the commands ``orch ps`` and ``orch ls`` now support ``--format yaml`` and
``--format json-pretty``.
* CephFS: Automatic static subtree partitioning policies may now be configured
using the new distributed and random ephemeral pinning extended attributes on
directories. See the documentation for more information:
https://docs.ceph.com/docs/master/cephfs/multimds/
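For example, distributed ephemeral pinning can be enabled on a directory by
setting the extended attribute from a client mount (the mount path is a
placeholder)::
setfattr -n ceph.dir.pin.distributed -v 1 /mnt/cephfs/parentdir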
* Cephadm: ``ceph orch apply osd`` supports a ``--preview`` flag that prints a preview of
the OSD specification before deploying OSDs. This makes it possible to
verify that the specification is correct, before applying it.
* RGW: The ``radosgw-admin`` sub-commands dealing with orphans --
``radosgw-admin orphans find``, ``radosgw-admin orphans finish``, and
``radosgw-admin orphans list-jobs`` -- have been deprecated. They have
not been actively maintained and they store intermediate results on
the cluster, which could fill a nearly-full cluster. They have been
replaced by a tool, currently considered experimental,
``rgw-orphan-list``.
* RBD: The name of the rbd pool object that is used to store
rbd trash purge schedule is changed from "rbd_trash_trash_purge_schedule"
to "rbd_trash_purge_schedule". Users that have already started using
``rbd trash purge schedule`` functionality and have per pool or namespace
schedules configured should copy the "rbd_trash_trash_purge_schedule"
object to "rbd_trash_purge_schedule" before the upgrade and remove
"rbd_trash_trash_purge_schedule" using the following commands in every RBD
pool and namespace where a trash purge schedule was previously
configured::
rados -p <pool-name> [-N namespace] cp rbd_trash_trash_purge_schedule rbd_trash_purge_schedule
rados -p <pool-name> [-N namespace] rm rbd_trash_trash_purge_schedule
or use any other convenient way to restore the schedule after the
upgrade.
* librbd: The shared, read-only parent cache has been moved to a separate librbd
plugin. If the parent cache was previously in-use, you must also instruct
librbd to load the plugin by adding the following to your configuration::
rbd_plugins = parent_cache
* Monitors now have a config option ``mon_osd_warn_num_repaired``, 10 by default.
If any OSD has repaired more than this many I/O errors in stored data, an
``OSD_TOO_MANY_REPAIRS`` health warning is generated.
* Introduce commands that manipulate required client features of a file system::
ceph fs required_client_features <fs name> add <feature>
ceph fs required_client_features <fs name> rm <feature>
ceph fs feature ls
* OSD: A new configuration option ``osd_compact_on_start`` has been added which triggers
an OSD compaction on start. Setting this option to ``true`` and restarting an OSD
will result in an offline compaction of the OSD prior to booting.
* OSD: the option named ``bdev_nvme_retry_count`` has been removed, because
SPDK v20.07 provides no easy access to bdev_nvme options and this option was
hardly used.
* Now when noscrub and/or nodeep-scrub flags are set globally or per pool,
scheduled scrubs of the type disabled will be aborted. All user initiated
scrubs are NOT interrupted.
* Alpine build related script, documentation and test have been removed since
the most updated APKBUILD script of Ceph is already included by Alpine Linux's
aports repository.
* fs: Names of new FSs, volumes, subvolumes and subvolume groups can only
contain alphanumeric and ``-``, ``_`` and ``.`` characters. Some commands
or CephX credentials may not work with old FSs with non-conformant names.
* It is now possible to specify the initial monitor to contact for Ceph tools
and daemons using the ``mon_host_override`` config option or
``--mon-host-override <ip>`` command-line switch. This generally should only
be used for debugging and only affects initial communication with Ceph's
monitor cluster.
* `blacklist` has been replaced with `blocklist` throughout. The following commands have changed:
- ``ceph osd blacklist ...`` are now ``ceph osd blocklist ...``
- ``ceph <tell|daemon> osd.<NNN> dump_blacklist`` is now ``ceph <tell|daemon> osd.<NNN> dump_blocklist``
* The following config options have changed:
- ``mon osd blacklist default expire`` is now ``mon osd blocklist default expire``
- ``mon mds blacklist interval`` is now ``mon mds blocklist interval``
- ``mon mgr blacklist interval`` is now ``mon mgr blocklist interval``
- ``rbd blacklist on break lock`` is now ``rbd blocklist on break lock``
- ``rbd blacklist expire seconds`` is now ``rbd blocklist expire seconds``
- ``mds session blacklist on timeout`` is now ``mds session blocklist on timeout``
- ``mds session blacklist on evict`` is now ``mds session blocklist on evict``
* CephFS: Compatibility code for the old on-disk format of snapshots has been
removed. The current on-disk format of snapshots was introduced in the Mimic
release. If there are any snapshots created by a Ceph release older than Mimic,
then before upgrading, either delete them all or scrub the whole filesystem:
ceph daemon <mds of rank 0> scrub_path / force recursive repair
ceph daemon <mds of rank 0> scrub_path '~mdsdir' force recursive repair
* CephFS: Scrub is supported in a multiple active MDS setup. MDS rank 0 handles
scrub commands and forwards them to other MDS ranks if necessary.
* The following librados API calls have changed:
- ``rados_blacklist_add`` is now ``rados_blocklist_add``; the former will issue a deprecation warning and be removed in a future release.
- ``rados.blacklist_add`` is now ``rados.blocklist_add`` in the C++ API.
* The JSON output for the following commands now shows ``blocklist`` instead of ``blacklist``:
- ``ceph osd dump``
- ``ceph <tell|daemon> osd.<N> dump_blocklist``
* caps: MON and MDS caps can now be used to restrict a client's ability to view
and operate on specific Ceph file systems. The FS can be specified using
``fsname`` in caps. This also affects the subcommand ``fs authorize``; the caps
produced by it will be specific to the FS name passed in its arguments.
* fs: root_squash flag can be set in MDS caps. It disallows file system
operations that need write access for clients with uid=0 or gid=0. This
feature should prevent accidents such as an inadvertent `sudo rm -rf /<path>`.
* fs: "fs authorize" now sets MON cap to "allow <perm> fsname=<fsname>"
instead of setting it to "allow r" all the time.
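For example, the following now produces a MON cap restricted to the named file
system (client and file system names are placeholders)::
ceph fs authorize myfs client.alice / rw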
* ``ceph pg #.# list_unfound`` output has been enhanced to provide
might_have_unfound information which indicates which OSDs may
contain the unfound objects.
* The ``ceph orch apply rgw`` syntax and behavior have changed. RGW
services can now be arbitrarily named (it is no longer forced to be
`realm.zone`). The ``--rgw-realm=...`` and ``--rgw-zone=...``
arguments are now optional, which means that if they are omitted, a
vanilla single-cluster RGW will be deployed. When the realm and
zone are provided, the user is now responsible for setting up the
multisite configuration beforehand--cephadm no longer attempts to
create missing realms or zones.
* The ``min_size`` and ``max_size`` CRUSH rule properties have been removed. Older
CRUSH maps will still compile but Ceph will issue a warning that these fields are
ignored.
* The cephadm NFS support has been simplified to no longer allow the
pool and namespace where configuration is stored to be customized.
As a result, the ``ceph orch apply nfs`` command no longer has
``--pool`` or ``--namespace`` arguments.
Existing cephadm NFS deployments (from an earlier version of Pacific or
from Octopus) will be automatically migrated when the cluster is
upgraded. Note that the NFS ganesha daemons will be redeployed and
it is possible that their IPs will change.
* RGW now requires a secure connection to the monitor by default
(``auth_client_required=cephx`` and ``ms_mon_client_mode=secure``).
If you have cephx authentication disabled on your cluster, you may
need to adjust these settings for RGW.