
health HEALTH_WARN 64 pgs incomplete; 64 pgs stuck inactive; 64 pgs stuck unclean #187

abhishek6590 opened this issue Feb 3, 2015 · 20 comments


@abhishek6590

Hi,

I am having an issue with ceph health -
health HEALTH_WARN 64 pgs incomplete; 64 pgs stuck inactive; 64 pgs stuck unclean
Please suggest what I should check.

Thanks,
Abhishek

@hufman
Contributor

hufman commented Feb 3, 2015

That sounds like there aren't any OSD processes running and connected to the cluster. If you check the output of ceph osd tree, does it show that the cluster expects to have an OSD? If not, the ceph-disk-prepare script (which comes from the ceph::osd recipe) didn't run. If it does, then the ceph::osd recipe ran and initialized an OSD, but for some reason that OSD didn't connect to the cluster. Check the OSD server to make sure the process is running, and then look at the logs in /var/log/ceph/ceph-osd* to see why the OSD isn't connecting.
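
As a rough sketch, those checks could look like this, assuming the OSD host is reachable over SSH and the OSD id is 0 (both placeholders):

ceph osd tree                                            # does the cluster expect any OSDs?
ssh osd-host 'ps aux | grep [c]eph-osd'                  # is the OSD daemon actually running?
ssh osd-host 'tail -n 50 /var/log/ceph/ceph-osd.0.log'   # why isn't it connecting?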

@abhishek6590
Author

Hi, ceph osd tree is showing this output -
#ceph osd tree

id weight type name up/down reweight

-1 0.09 root default
-2 0.09 host server3
0 0.09 osd.0 up 1

and the logs are showing:
tail -f ceph-osd.0.log
2015-02-03 12:50:44.115354 7f0d0d1b7900 0 cls/hello/cls_hello.cc:271: loading cls_hello
2015-02-03 12:50:44.157671 7f0d0d1b7900 0 osd.0 4 crush map has features 1107558400, adjusting msgr requires for clients
2015-02-03 12:50:44.157682 7f0d0d1b7900 0 osd.0 4 crush map has features 1107558400 was 8705, adjusting msgr requires for mons
2015-02-03 12:50:44.157687 7f0d0d1b7900 0 osd.0 4 crush map has features 1107558400, adjusting msgr requires for osds
2015-02-03 12:50:44.157703 7f0d0d1b7900 0 osd.0 4 load_pgs
2015-02-03 12:50:44.201885 7f0d0d1b7900 0 osd.0 4 load_pgs opened 64 pgs
2015-02-03 12:50:44.212991 7f0d0d1b7900 -1 osd.0 4 set_disk_tp_priority(22) Invalid argument: osd_disk_thread_ioprio_class is but only the following values are allowed: idle, be or rt
2015-02-03 12:50:44.290354 7f0cfb587700 0 osd.0 4 ignoring osdmap until we have initialized
2015-02-03 12:50:44.290416 7f0cfb587700 0 osd.0 4 ignoring osdmap until we have initialized
2015-02-03 12:50:44.371616 7f0d0d1b7900 0 osd.0 4 done with init, starting boot process

Please advise.

Thanks,

@hufman
Contributor

hufman commented Feb 4, 2015

Ah yes, you'll need at least 3 OSDs for Ceph to be happy and healthy. Depending on how your CRUSH map is configured (I forget the defaults), these OSDs will have to be on separate hosts.
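
For a small test cluster, a common workaround is to lower the replica count and relax the CRUSH failure domain before creating any pools. A minimal sketch of the ceph.conf settings (the exact values here are assumptions, not the cookbook's defaults):

[global]
# two replicas instead of the default three
osd pool default size = 2
# allow I/O even when only a single replica is available
osd pool default min size = 1
# 0 = osd: let replicas land on the same host (single-node test clusters)
osd crush chooseleaf type = 0

Note these defaults only apply to pools created afterwards; an existing pool can be resized with ceph osd pool set <pool> size 2.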

@zdubery

zdubery commented Nov 9, 2016

Hi

I am a bit confused by this statement: "you'll need at least 3 OSDs to be happy and healthy". I followed the instructions (here: http://docs.ceph.com/docs/hammer/start/quick-ceph-deploy/) and once I get to the command "ceph health", the response is: "health HEALTH_ERR 64 pgs incomplete; 64 pgs stuck inactive; 64 pgs stuck unclean". That is right after installing it...

Ceph documentation clearly stated:
"Change the default number of replicas in the Ceph configuration file from 3 to 2 so that Ceph can achieve an active + clean state with just two Ceph OSDs. Add the following line under the [global] section:
osd pool default size = 2"

I have attempted this install at least 3 times now and the response is the same every time. I am running 1 admin node, 1 monitor and 2 OSDs on 4 VirtualBox Ubuntu 14.04 LTS VMs within Ubuntu 16 (the previous attempt was within Ubuntu 14).

The debug information is not very helpful at all. Ceph is also not writing to the /var/log/ceph/ location at all, even after I set ownership with
sudo chown ceph:root /var/log/ceph

ceph-deploy osd activate tells me that the OSDs are active, but ceph osd tree shows otherwise (down).

The config is read from /etc/ceph/ceph.conf all the time (even though I install everything from the my-cluster directory), which is incorrect. When I ran the install, the config was created in /home/user/my-cluster/ceph.conf, yet it is read from /etc/ceph/ceph.conf.
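
For what it's worth, ceph-deploy only uses the ceph.conf in your working directory at deploy time; the daemons read /etc/ceph/ceph.conf on each node, so local edits have to be pushed out. A minimal sketch, assuming node names mon1, osd1 and osd2:

cd ~/my-cluster
ceph-deploy --overwrite-conf config push mon1 osd1 osd2    # overwrites /etc/ceph/ceph.conf on those hosts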

So I will attempt 3 OSDs now, even though the site states otherwise...

Any suggestions would be very helpful.

Thanks,

zd

@sweetie233

Hi, I have the same problem as you, and I have reinstalled Ceph more than 3 times. I'm really frustrated. Have you figured it out? I'd appreciate any suggestions.

@zdubery

zdubery commented Dec 3, 2016 via email

@sweetie233

Hi

First, thank you so much for your suggestion!!!
My file system is ext4, and I just did the thing you suggested, but it seems to make no difference.

I reviewed the OSD's log thoroughly and found the following messages:
osd.0 0 backend (filestore) is unable to support max object name[space] len
osd.0 0 osd max object name len = 2048
osd.0 0 osd max object namespace len = 256
osd.0 0 (36) File name too long
journal close /var/lib/ceph/osd/ceph-0/journal
** ERROR: osd init failed: (36) File name too long

Then I found this page:
http://docs.ceph.com/docs/jewel/rados/configuration/filesystem-recommendations/

I just reinstalled Ceph again and placed the following lines in the [global] section of the config:
osd_max_object_name_len = 256
osd_max_object_namespace_len = 64

It works!!! I'm so happy, and I appreciate your reply very much!!!

Thanks again!
Best wishes~

@zdubery

zdubery commented Dec 4, 2016 via email

@subhashchand

If you are using an ext4 file system, you need to place this in the [global] section of the config:

vim /etc/ceph/ceph.conf

osd_max_object_name_len = 256
osd_max_object_namespace_len = 64

See http://docs.ceph.com/docs/jewel/rados/configuration/filesystem-recommendations/

Then check with:
#ceph status
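
A rough sketch of applying this without reinstalling, assuming a Jewel install on a systemd host and OSD id 0 (both assumptions):

sudo vim /etc/ceph/ceph.conf            # add the two settings under [global]
sudo systemctl restart ceph-osd@0       # restart each affected OSD (id 0 assumed)
ceph -s                                 # PGs should go active+clean once the OSDs rejoin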

@getarz4u15ster

getarz4u15ster commented Jan 27, 2017

I'm having the same problem; however, I am using the preferred XFS filesystem. Any suggestions?

[From monitor node i get the following]
HEALTH_ERR 64 pgs are stuck inactive for more than 300 seconds; 64 pgs stuck inactive; no osds

[From OSD node]
2017-01-27 07:55:28.000882 7fde7846d700 0 -- :/429908835 >> ipaddress:6789/0 pipe(0x7fde74063f30 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7fde7405c5a0).fault

[From Monitor node out of /var/log/ceph/ceph.log]
2017-01-27 06:47:11.121804 mon.0 ipaddress:6789/0 1 : cluster [INF] mon.oso-node1@0 won leader election with quorum 0
2017-01-27 06:47:11.121931 mon.0 ipaddress:6789/0 2 : cluster [INF] monmap e1: 1 mons at {oso-node1=ipaddress:6789/0}
2017-01-27 06:47:11.122008 mon.0 ipaddress:6789/0 3 : cluster [INF] pgmap v2: 64 pgs: 64 creating; 0 bytes data, 0 kB used, 0 kB / 0 kB avail
2017-01-27 06:47:11.122090 mon.0 ipaddress:6789/0 4 : cluster [INF] fsmap e1:
2017-01-27 06:47:11.122203 mon.0 ipaddress:6789/0 5 : cluster [INF] osdmap e1: 0 osds: 0 up, 0 in
2017-01-27 06:54:50.687322 mon.0 ipaddress:6789/0 1 : cluster [INF] mon.oso-node1@0 won leader election with quorum 0
2017-01-27 06:54:50.687415 mon.0 ipaddress:6789/0 2 : cluster [INF] monmap e1: 1 mons at {oso-node1=ipaddress:6789/0}
2017-01-27 06:54:50.687497 mon.0 ipaddress:6789/0 3 : cluster [INF] pgmap v2: 64 pgs: 64 creating; 0 bytes data, 0 kB used, 0 kB / 0 kB avail
2017-01-27 06:54:50.687577 mon.0 ipaddress:6789/0 4 : cluster [INF] fsmap e1:
2017-01-27 06:54:50.687716 mon.0 ipaddress:6789/0 5 : cluster [INF] osdmap e1: 0 osds: 0 up, 0 in

@swq499809608

f_redirected e754) currently waiting for peered
2017-03-02 10:58:39.952422 osd.25 [WRN] 100 slow requests, 1 included below; oldest blocked for > 324.251003 secs
2017-03-02 10:58:39.952444 osd.25 [WRN] slow request 240.250943 seconds old, received at 2017-03-02 10:54:39.701431: osd_op(client.512724.0:135407 97.84ada7c9 (undecoded) ondisk+write+known_if_redirected e754) currently waiting for peered
2017-03-02 10:58:40.091373 osd.27 [WRN] 100 slow requests, 1 included below; oldest blocked for > 324.389960 secs
2017-03-02 10:58:40.091378 osd.27 [WRN] slow request 240.389941 seconds old, received at 2017-03-02 10:54:39.701397: osd_op(client.512724.0:135408 97.31099063 (undecoded) ondisk+write+known_if_redirected e754) currently waiting for peered
2017-03-02 10:58:40.952740 osd.25 [WRN] 100 slow requests, 1 included below; oldest blocked for > 325.251301 secs
2017-03-02 10:58:40.952791 osd.25 [WRN] slow request 240.243998 seconds old, received at 2017-03-02 10:54:40.708674: osd_op(client.36294.0:8895939 97.84ada7c9 (undecoded) ondisk+write+known_if_redirected e754) currently waiting for peered
2017-03-02 10:58:41.091613 osd.27 [WRN] 100 slow requests, 1 included below; oldest blocked for > 325.390198 secs
2017-03-02 10:58:41.091619 osd.27 [WRN] slow request 240.382847 seconds old, received at 2017-03-02 10:54:40.708729: osd_op(client.36294.0:8895940 97.31099063 (undecoded) ondisk+write+known_if_redirected e754) currently waiting for peered
2017-03-02 10:58:43.953496 osd.25 [WRN] 100 slow requests, 1 included below; oldest blocked for > 328.252086 secs
2017-03-02 10:58:43.953517 osd.25 [WRN] slow request 240.022847 seconds old, received at 2017-03-02 10:54:43.930609: osd_op(client.36291.0:8893352 97.84ada7c9 (undecoded) ondisk+write+known_if_redirected e754) currently waiting for peered
2017-03-02 10:58:44.092310 osd.27 [WRN] 100 slow requests, 1 included below; oldest blocked for > 328.390885 secs
2017-03-02 10:58:44.092315 osd.27 [WRN] slow request 240.161657 seconds old, received at 2017-03-02 10:54:43.930605: osd_op(client.36291.0:8893353 97.31099063 (undecoded) ondisk+write+known_if_redirected e754) currently waiting for peered
2017-03-02 10:58:44.953818 osd.25 [WRN] 100 slow requests, 1 included below; oldest blocked for > 329.252386 secs
2017-03-02 10:58:44.953827 osd.25 [WRN] slow request 240.251734 seconds old, received at 2017-03-02 10:54:44.702023: osd_op(client.512724.0:135415 97.84ada7c9 (undecoded) ondisk+write+known_if_redirected e754) currently waiting for peered
2017-03-02 10:58:45.092587 osd.27 [WRN] 100 slow requests, 1 included below; oldest blocked for > 329.391155 secs
2017-03-02 10:58:45.092597 osd.27 [WRN] slow request 240.390484 seconds old, received at 2017-03-02 10:54:44.702049: osd_op(client.512724.0:135416 97.31099063 (undecoded) ondisk+write+known_if_redirected e754) currently waiting for peered
2017-03-02 10:58:45.954085 osd.25 [WRN] 100 slow requests, 1 included below; oldest blocked for > 330.252673 secs
2017-03-02 10:58:45.954103 osd.25 [WRN] slow request 240.244915 seconds old, received at 2017-03-02 10:54:45.709129: osd_op(client.36294.0:8895947 97.84ada7c9 (undecoded) ondisk+write+known_if_redirected e754) currently waiting for peered
2017-03-02 10:58:46.092838 osd.27 [WRN] 100 slow requests, 1 included below; oldest blocked for > 330.391422 secs
2017-03-02 10:58:46.092850 osd.27 [WRN] slow request 240.383640 seconds old, received at 2017-03-02 10:54:45.709160: osd_op(client.36294.0:8895948 97.31099063 (undecoded) ondisk+write+known_if_redirected e754) currently waiting for peered

@ghost

ghost commented Jul 26, 2017

After adding the following lines to the /etc/ceph/ceph.conf file and rebooting the system, the issue somehow still exists.

osd_max_object_name_len = 256
osd_max_object_namespace_len = 64

ceph status

cluster b3609cba-0b6d-4311-8aa3-6968c0e66f5e
 health HEALTH_WARN
        64 pgs degraded
        64 pgs stuck degraded
        64 pgs stuck unclean
        64 pgs stuck undersized
        64 pgs undersized
 monmap e1: 1 mons at {0=10.11.108.188:6789/0}
        election epoch 3, quorum 0 0
 osdmap e15: 2 osds: 2 up, 2 in
        flags sortbitwise,require_jewel_osds
  pgmap v36: 64 pgs, 1 pools, 0 bytes data, 0 objects
        69172 kB used, 3338 GB / 3338 GB avail
              64 active+undersized+degraded
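
With 2 OSDs up and in but every PG active+undersized+degraded, the pool's replica size is most likely still the default of 3. A sketch of checking and lowering it on the existing pool, assuming the default pool name rbd:

ceph osd pool get rbd size        # probably still 3
ceph osd pool set rbd size 2
ceph osd pool set rbd min_size 1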

@mosyang

mosyang commented Jul 28, 2017

I ran into those ext4 file system issues before. I tried the settings below in ceph.conf but finally gave up.

osd_max_object_name_len = 256
osd_max_object_namespace_len = 64
osd check max object name len on startup = false

However, I then followed this helpful document to deploy Ceph Jewel 10.2.9 on Ubuntu 16.04: log in to all OSD nodes and format the /dev/sdb partition with the XFS file system. After that, I followed the official document to deploy Ceph on my Ubuntu 16.04 servers. Everything works fine now.
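
A rough sketch of that per-node step with ceph-deploy on Jewel, assuming the data disk is /dev/sdb and the node is called osd1 (both assumptions; this wipes the disk, and the prepare step will repartition it anyway):

# on each OSD node (destroys any data on /dev/sdb):
sudo mkfs.xfs -f /dev/sdb
# from the admin node's cluster directory:
ceph-deploy osd prepare osd1:/dev/sdb
ceph-deploy osd activate osd1:/dev/sdb1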

@Runomu

Runomu commented Aug 1, 2017

I have exactly the same problem with 14.04 LTS and ext4. I tried almost everything and all the suggestions above, but I'm still getting the following from ceph -s, and the output after that from ceph osd tree:

health HEALTH_ERR
64 pgs are stuck inactive for more than 300 seconds
64 pgs stuck inactive
64 pgs stuck unclean

ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0 root default
0 0 osd.0 down 0 1.00000

@mattshma

mattshma commented Sep 8, 2017

After appending those lines to the admin node's ceph.conf:

osd max object name len = 256
osd max object namespace len = 64

I think you should then run ceph-deploy --overwrite-conf admin osd1 osd2 to push the changes to the OSD nodes. You should also make sure the ceph user has read permission on /etc/ceph/ceph.client.admin.keyring on the OSD nodes.
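
A sketch of those two steps, reusing mattshma's node names (the keyring path is the default):

ceph-deploy --overwrite-conf admin osd1 osd2
ssh osd1 'sudo chmod +r /etc/ceph/ceph.client.admin.keyring'
ssh osd2 'sudo chmod +r /etc/ceph/ceph.client.admin.keyring'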

@alamintech

When my server reboots, I see errors that OSDs are down and PGs are inactive.
Please help me figure out how I can solve this. This storage is used as CloudStack primary storage.

[screenshot]

Thanks.

@alamintech

Please, can anyone help me?
[screenshot]

@zdover23

zdover23 commented Sep 7, 2020 via email

@alamintech

I looked, but I can't find a solution for this:

[screenshot]

@alamintech

After the server reboot, the OSD service won't start. Please help me, anyone.
