Skip to content
This repository has been archived by the owner on Sep 22, 2022. It is now read-only.

rare test019-syncrepl-cascade failure on a busy machine #102

Closed
erthink opened this issue Aug 15, 2016 · 3 comments
Closed

rare test019-syncrepl-cascade failure on a busy machine #102

erthink opened this issue Aug 15, 2016 · 3 comments

Comments

@erthink
Copy link
Owner

erthink commented Aug 15, 2016

running defines.sh
Starting master slapd on TCP/IP port 17901...
Using ldapsearch to check that master slapd is running (port 17901)...
Waiting 1 seconds for master slapd to start...
Waiting 0.5 seconds for master slapd to start...
Waiting 1 seconds for master slapd to start...
Using ldapadd to create the context prefix entry in the master...
Starting R1-slave slapd on TCP/IP port 17902...
Using ldapsearch to check that R1-slave slapd is running (port 17902)...
Waiting 1 seconds for R1-slave slapd to start...
Waiting 0.5 seconds for R1-slave slapd to start...
Waiting 1 seconds for R1-slave slapd to start...
Starting R2-slave slapd on TCP/IP port 17903...
Using ldapsearch to check that R2-slave slapd is running (port 17903)...
Waiting 1 seconds for R2-slave slapd to start...
Waiting 0.5 seconds for R2-slave slapd to start...
Waiting 1 seconds for R2-slave slapd to start...
Starting P1-slave slapd on TCP/IP port 17904...
Using ldapsearch to check that P1-slave slapd is running (port 17904)...
Waiting 1 seconds for P1-slave slapd to start...
Waiting 0.5 seconds for P1-slave slapd to start...
Starting P2 slave slapd on TCP/IP port 17905...
Using ldapsearch to check that P2-slave slapd is running (port 17905)...
Waiting 1 seconds for P2-slave slapd to start...
Waiting 0.5 seconds for P2-slave slapd to start...
Starting P3 slave slapd on TCP/IP port 17906...
Using ldapsearch to check that P3-slave slapd is running (port 17906)...
Waiting 1 seconds for P3-slave slapd to start...
Using ldapadd to populate the master directory...
Waiting while syncrepl replicates a changes (between 17901 and 17902)... Done in 1 seconds
Waiting while syncrepl replicates a changes (between 17901 and 17903)... Done in 2 seconds
Waiting while syncrepl replicates a changes (between 17901 and 17904)... Done in 1 seconds
Waiting while syncrepl replicates a changes (between 17901 and 17905)... Done in 1 seconds
Waiting while syncrepl replicates a changes (between 17901 and 17906)... Done in 1 seconds
Using ldapmodify to modify master directory...
Waiting while syncrepl replicates a changes (between 17901 and 17902)... Done in 2 seconds
Waiting while syncrepl replicates a changes (between 17901 and 17903)... Done in 3 seconds
Waiting while syncrepl replicates a changes (between 17901 and 17904)... Done in 1 seconds
Waiting while syncrepl replicates a changes (between 17901 and 17905)... Done in 1 seconds
Waiting while syncrepl replicates a changes (between 17901 and 17906)... Done in 1 seconds
Performing modify alone on provider...
Waiting while syncrepl replicates a changes (between 17901 and 17902)... Done in 1 seconds
Waiting while syncrepl replicates a changes (between 17901 and 17903)... Done in 2 seconds
Waiting while syncrepl replicates a changes (between 17901 and 17904)... Done in 1 seconds
Waiting while syncrepl replicates a changes (between 17901 and 17905)... Done in 1 seconds
Waiting while syncrepl replicates a changes (between 17901 and 17906)... Done in 1 seconds
Using ldapsearch to read all the entries from the master...
Using ldapsearch to read all the entries from the R1 slave...
Using ldapsearch to read all the entries from the R2 slave...
Using ldapsearch to read all the entries from the P1 slave...
Using ldapsearch to read all the entries from the P2 slave...
Using ldapsearch to read all the entries from the P3 slave...
>>>>> waiting for things (23828 24064 24177 24412 24507 24673) to exit... done
Filtering master ldapsearch results...
Filtering R1 slave ldapsearch results...
Filtering R2 slave ldapsearch results...
Filtering P1 slave ldapsearch results...
Filtering P2 slave ldapsearch results...
Filtering P3 slave ldapsearch results...
Comparing retrieved entries from master and R1 slave...
Comparing retrieved entries from master and R2 slave...
test failed - master and R2 slave databases differ

diff server1.flt server3.flt

112,128d111
< dn: cn=Gern Jensen,ou=Information Technology Division,ou=People,dc=example,dc=com
< objectClass: OpenLDAPperson
< cn: Gern Jensen
< sn: Jensen
< uid: gjensen
< title: Chief Investigator, ITD
< postalAddress: ITD $ 535 W. William St $ Ann Arbor, MI 48103
< seeAlso: cn=All Staff,ou=Groups,dc=example,dc=com
< drink: Coffee
< homePostalAddress: 844 Brown St. Apt. 4 $ Ann Arbor, MI 48104
< description: Very odd
< facsimileTelephoneNumber: +1 313 555 7557
< telephoneNumber: +1 313 555 8343
< mail: [email protected]
< homePhone: +1 313 555 8844
< entryCSN: 20160814160805.787321Z#000000#000#000000
== 2f4cd285-9945-4fa5-a316-9541e136ea10

272,283d254
< 
< dn: ou=Retired,ou=People,dc=example,dc=com
< objectClass: organizationalUnit
< ou: Retired
< entryCSN: 20160814160805.792539Z#000000#000#000000
== b2928418-dd66-4c64-bfdd-0856522cb6a8

< dn: cn=Rosco P. Coltrane,ou=Retired,ou=People,dc=example,dc=com
< objectClass: OpenLDAPperson
< sn: Coltrane
< uid: rosco
< cn: Rosco P. Coltrane
< entryCSN: 20160814160805.796549Z#000000#000#000000
== fc4c9f2d-c43d-4dc0-8407-412334baba96

slapd.1.log

...
160814-16:08:05.780811_24042 syncprov_op_response: MODIFY entry cn=all staff,ou=groups,dc=example,dc=com, (null), (null)
160814-16:08:05.780819_24042 syncprov_op_response: cookie-forward, opc->sctxcsn => 20160814160805.780515Z#000000#000#000000, n-dirty was 1
160814-16:08:05.780901_24042 syncprov_matchops: sid 000 fscope 1 rc 6
160814-16:08:05.780914_24042 syncprov_matchops: sid 000 fscope 1 rc 6
160814-16:08:05.780922_24042 send_ldap_response: msgid=5 tag=103 err=0
160814-16:08:05.780956_24042 conn=1017 op=4 RESULT tag=103 err=0 text=
160814-16:08:05.780961_24042 slap_graduate_commit_csn: removing 0x60800006d840 (conn 1017, opid 4) 20160814160805.780515Z#000000#000#000000, op 0x6190000037a0
160814-16:08:05.780998_24042 syncprov_send_resp: LDAP_SYNC_MODIFY, cn=all staff,ou=groups,dc=example,dc=com, to=000, a59f9337-5a14-46f8-b815-0a6313294bb4, 20160814160805.780515Z#000000#000#000000
160814-16:08:05.781003_24042 syncprov_state_ctrl: rid=000, LDAP_SYNC_MODIFY, (null), a59f9337-5a14-46f8-b815-0a6313294bb4, 20160814160805.780515Z#000000#000#000000
160814-16:08:05.781022_24042 => send_search_entry: conn 1006 dn="cn=All Staff,ou=Groups,dc=example,dc=com"
160814-16:08:05.781063_24042 <= send_search_entry: conn 1006 exit.
160814-16:08:05.781083_24042 syncprov_send_resp: LDAP_SYNC_MODIFY, cn=all staff,ou=groups,dc=example,dc=com, to=000, a59f9337-5a14-46f8-b815-0a6313294bb4, 20160814160805.780515Z#000000#000#000000
160814-16:08:05.781088_24042 syncprov_state_ctrl: rid=000, LDAP_SYNC_MODIFY, (null), a59f9337-5a14-46f8-b815-0a6313294bb4, 20160814160805.780515Z#000000#000#000000
160814-16:08:05.781118_24042 => send_search_entry: conn 1004 dn="cn=All Staff,ou=Groups,dc=example,dc=com"
160814-16:08:05.781165_24042 <= send_search_entry: conn 1004 exit.
...
160814-16:08:05.787041_24044 conn=1017 op=5 ADD dn="cn=Gern Jensen,ou=Information Technology Division,ou=People,dc=example,dc=com"
160814-16:08:05.787326_24044 slap_queue_csn: tail-queueing 0x608000090d40 (conn 1017, opid 5) 20160814160805.787321Z#000000#000#000000, op 0x6190000082a0
...
160814-16:08:05.787752_24044 syncprov_op_response: ADD entry cn=gern jensen,ou=information technology division,ou=people,dc=example,dc=com, 20160814160805.787321Z#000000#000#000000, 2f4cd285-9945-4fa5-a31
160814-16:08:05.787761_24044 syncprov_op_response: cookie-forward, opc->sctxcsn => 20160814160805.787321Z#000000#000#000000, n-dirty was 1
...
160814-16:08:05.792460_24043 conn=1017 op=6 ADD dn="ou=Retired,ou=People,dc=example,dc=com"
160814-16:08:05.792545_24043 slap_queue_csn: tail-queueing 0x60800008f1c0 (conn 1017, opid 6) 20160814160805.792539Z#000000#000#000000, op 0x61900006c2a0
160814-16:08:05.792804_24043 syncprov_op_response: ADD entry ou=retired,ou=people,dc=example,dc=com, 20160814160805.792539Z#000000#000#000000, b2928418-dd66-4c64-bfdd-0856522cb6a8
160814-16:08:05.792813_24043 syncprov_op_response: cookie-forward, opc->sctxcsn => 20160814160805.792539Z#000000#000#000000, n-dirty was 1
...
160814-16:08:05.793811_24042 conn=1017 op=7 do_add: dn (cn=Rosco P. Coltrane, ou=Information Technology Division, ou=People, dc=example,dc=com)
160814-16:08:05.793969_24042 slap_queue_csn: tail-queueing 0x60800006cec0 (conn 1017, opid 7) 20160814160805.793963Z#000000#000#000000, op 0x6190000037a0
160814-16:08:05.794363_24042 syncprov_op_response: ADD entry cn=rosco p. coltrane,ou=information technology division,ou=people,dc=example,dc=com, 20160814160805.793963Z#000000#000#000000, fc4c9f2d-c43d-4d
160814-16:08:05.794371_24042 syncprov_op_response: cookie-forward, opc->sctxcsn => 20160814160805.793963Z#000000#000#000000, n-dirty was 1
...
160814-16:08:05.796303_24040 conn=1017 op=8 do_modrdn
160814-16:08:05.796311_24040 do_modrdn: dn (cn=Rosco P. Coltrane, ou=Information Technology Division, ou=People, dc=example,dc=com) newrdn (cn=Rosco P. Coltrane) newsuperior (ou=Retired, ou=People, dc=exa
160814-16:08:05.796555_24040 slap_queue_csn: tail-queueing 0x6080000962c0 (conn 1017, opid 8) 20160814160805.796549Z#000000#000#000000, op 0x61900005e6a0
160814-16:08:05.797024_24040 syncprov_op_response: RENAME entry (null), (null), (null)
160814-16:08:05.797032_24040 syncprov_op_response: cookie-forward, opc->sctxcsn => 20160814160805.796549Z#000000#000#000000, n-dirty was 1
160814-16:08:05.797152_24040 slap_graduate_commit_csn: removing 0x6080000962c0 (conn 1017, opid 8) 20160814160805.796549Z#000000#000#000000, op 0x61900005e6a0
...
...
160814-16:08:07.639761_24044 syncprov-search: rid=001, sid=ffffffff, hint -1, srs 0x7f2eae68adb0, sop (nil)
160814-16:08:07.639764_24044 syncprov-search: cookie: provider[rid=0, sid=0]#1 | consumer[rid=1, sid=-1]#1
160814-16:08:07.639767_24044 syncprov-search: cookie: ~> provider 20160814160805.813230Z#000000#000#000000 > consumer 20160814160757.895461Z#000000#000#000000
160814-16:08:07.639771_24044 syncprov_op_search: sid ffffffff, provider's state is newer, pivot 20160814160757.895461Z#000000#000#000000

160814-16:08:07.640332_24044 syncprov_sendinfo: LDAP_TAG_SYNC_ID_SET, no-cookie, syncUUIDs=#23, refreshDeletes=0
160814-16:08:07.640339_24044 send_ldap_intermediate: err=0 oid=1.3.6.1.4.1.4203.1.9.1.4 len=422
160814-16:08:07.640342_24044 send_ldap_response: msgid=2 tag=121 err=0
160814-16:08:07.640370_24044 syncprov-findpresent: sid -1, pivot 20160814160757.895461Z#000000#000#000000, find-present 23, rc 0
160814-16:08:07.640375_24044 syncprov-findpresent:: provider[rid=0, sid=0]#1 | consumer[rid=1, sid=-1]#1
160814-16:08:07.640378_24044 syncprov-findpresent:: ~> provider 20160814160805.813230Z#000000#000#000000 > consumer 20160814160757.895461Z#000000#000#000000

160814-16:08:07.640527_24044 syncprov_state_ctrl: rid=000, LDAP_SYNC_ADD, cn=all staff,ou=groups,dc=example,dc=com, a59f9337-5a14-46f8-b815-0a6313294bb4, no-CSN
160814-16:08:07.640640_24044 syncprov_state_ctrl: rid=000, LDAP_SYNC_ADD, cn=bjorn jensen,ou=information technology division,ou=people,dc=example,dc=com, c1d2e24c-f41c-45ce-b87b-6073f91da76f, no-CSN
160814-16:08:07.640732_24044 syncprov_state_ctrl: rid=000, LDAP_SYNC_ADD, cn=itd staff,ou=groups,dc=example,dc=com, b23d5193-1511-4830-9f19-80a7c27c1adf, no-CSN
160814-16:08:07.640807_24044 syncprov_state_ctrl: rid=000, LDAP_SYNC_ADD, cn=james a jones 1,ou=alumni association,ou=people,dc=example,dc=com, 84e67eac-192d-44dc-b44a-aeb666a5f9ed, no-CSN
160814-16:08:07.640938_24044 syncprov_state_ctrl: rid=000, LDAP_SYNC_ADD, dc=itsdomain1,dc=example,dc=com, 2948357a-a2a5-49e1-ad6c-355683007421, no-CSN
160814-16:08:07.641005_24044 syncprov_state_ctrl: rid=000, LDAP_SYNC_ADD, dc=itsdomain2,dc=example,dc=com, 3c519a93-a7ea-44df-be15-1a828c4bac8d, no-CSN

160814-16:08:07.641067_24044 syncprov_state_ctrl: rid=000, LDAP_SYNC_ADD, cn=gern jensen,ou=information technology division,ou=people,dc=example,dc=com, 2f4cd285-9945-4fa5-a316-9541e136ea10, no-CSN
160814-16:08:07.641141_24044 syncprov_state_ctrl: rid=000, LDAP_SYNC_ADD, ou=retired,ou=people,dc=example,dc=com, b2928418-dd66-4c64-bfdd-0856522cb6a8, no-CSN
160814-16:08:07.641204_24044 syncprov_state_ctrl: rid=000, LDAP_SYNC_ADD, cn=rosco p. coltrane,ou=retired,ou=people,dc=example,dc=com, fc4c9f2d-c43d-4dc0-8407-412334baba96, no-CSN
160814-16:08:07.641258_24044 syncprov-response: cookie=rid=001,sid=000,csn=20160814160805.813230Z#000000#000#000000
160814-16:08:07.641261_24044 syncprov_done_ctrl: rid=001,sid=000,csn=20160814160805.813230Z#000000#000#000000, refreshDeletes=0

slapd.2.log

...
160814-16:08:07.653878_24174 syncrepl_pull_contextCSN: reloaded, age => 3
160814-16:08:07.653886_24174 cookie_merge: forward 20160814160757.895461Z#000000#000#000000 => 20160814160805.811469Z#000000#000#000000
...
160814-16:08:07.656300_24128 conn=1008 op=1 SRCH base="dc=example,dc=com" scope=2 deref=0 filter="(objectClass=*)"
160814-16:08:07.656302_24128 conn=1008 op=1 SRCH attr=* structuralObjectClass entryCSN
160814-16:08:07.656313_24128 syncprov-search: rid=001, sid=ffffffff, hint -1, srs 0x7f7237eecdd8, sop (nil)
160814-16:08:07.656316_24128 syncprov-search: cookie: provider[rid=0, sid=0]#1 | consumer[rid=1, sid=-1]#1
160814-16:08:07.656319_24128 syncprov-search: cookie: ~> provider 20160814160805.811469Z#000000#000#000000 > consumer 20160814160757.895461Z#000000#000#000000
160814-16:08:07.656324_24128 syncprov_op_search: sid ffffffff, provider's state is newer, pivot 20160814160757.895461Z#000000#000#000000

160814-16:08:07.656328_24128 srs csn 20160814160757.895461Z#000000#000#000000
160814-16:08:07.656395_24128 log csn 20160814160805.742861Z#000000#000#000000
160814-16:08:07.656397_24128 log csn 20160814160805.755768Z#000000#000#000000
160814-16:08:07.656399_24128 log csn 20160814160805.762433Z#000000#000#000000
160814-16:08:07.656401_24128 log csn 20160814160805.780515Z#000000#000#000000
160814-16:08:07.656402_24128 log csn 20160814160805.811469Z#000000#000#000000
160814-16:08:07.656404_24128 log csn 20160814160805.811469Z#000000#000#000000

160814-16:08:07.658178_24128 syncprov_state_ctrl: rid=000, LDAP_SYNC_ADD, cn=all staff,ou=groups,dc=example,dc=com, a59f9337-5a14-46f8-b815-0a6313294bb4, no-CSN
160814-16:08:07.660695_24128 syncprov_state_ctrl: rid=000, LDAP_SYNC_ADD, cn=bjorn jensen,ou=information technology division,ou=people,dc=example,dc=com, c1d2e24c-f41c-45ce-b87b-6073f91da76f, no-CSN
160814-16:08:07.660818_24128 syncprov_state_ctrl: rid=000, LDAP_SYNC_ADD, cn=itd staff,ou=groups,dc=example,dc=com, b23d5193-1511-4830-9f19-80a7c27c1adf, no-CSN
160814-16:08:07.660868_24128 syncprov_state_ctrl: rid=000, LDAP_SYNC_ADD, cn=james a jones 1,ou=alumni association,ou=people,dc=example,dc=com, 84e67eac-192d-44dc-b44a-aeb666a5f9ed, no-CSN
160814-16:08:07.661000_24128 syncprov_state_ctrl: rid=000, LDAP_SYNC_ADD, dc=itsdomain1,dc=example,dc=com, 2948357a-a2a5-49e1-ad6c-355683007421, no-CSN

160814-16:08:07.661043_24128 syncprov-response: ffffffff, Entry dc=testdomain2,dc=example,dc=com, 20160814160757.895461Z#000000#000#000000 <= consumer 20160814160757.895461Z#000000#, skip

160814-16:08:07.661056_24128 syncprov-response: cookie=rid=001,sid=000,csn=20160814160805.811469Z#000000#000#000000
160814-16:08:07.661059_24128 syncprov_done_ctrl: rid=001,sid=000,csn=20160814160805.811469Z#000000#000#000000, refreshDeletes=1
160814-16:08:07.661066_24128 send_ldap_response: msgid=2 tag=101 err=0
160814-16:08:07.661082_24128 conn=1008 op=1 SEARCH RESULT tag=101 err=0 nentries=5 text=
...
160814-16:08:07.673793_24174 syncrepl_entry: rid=001 LDAP_SYNC_ADD, (null), uuid 2f4cd285-9945-4fa5-a316-9541e136ea10,
        incoming-csn 20160814160805.787321Z#000000#000#000000, present-csn (null)
160814-16:08:07.674190_24174 syncprov_op_response: ADD entry cn=gern jensen,ou=information technology division,ou=people,dc=example,dc=com, 20160814160805.787321Z#000000#000#000000, 2f4cd285-9945-4fa5-a31
160814-16:08:07.674195_24174 syncprov_op_response: cookie-kept
...

slapd.3.log

...
160814-16:08:07.656026_24261 syncrepl_quorum: notify-status dc=example,dc=com, rid 1/0x6180000364a0, sid 0, ready->dirty
160814-16:08:07.656032_24261 syncrepl_gate: dc=example,dc=com dirty
160814-16:08:07.656039_24261 quorum_syncrepl_gate: dc=example,dc=com, limit 0, running 1, 0x6180000364a0 in, rc 0
160814-16:08:07.656042_24261 syncrepl_quorum: notify-status dc=example,dc=com, rid 1/0x6180000364a0, sid 0, dirty->refresh
160814-16:08:07.656046_24261 syncrepl: refresh-begin rid=001
160814-16:08:07.656054_24261 refresh-begin-cookie 0x618000036658: rid=1, sid=-1
160814-16:08:07.656057_24261 refresh-begin-cookie: 0) sid=0 20160814160757.895461Z#000000#000#000000
160814-16:08:07.656065_24261 syncrepl_process_search: rid=001, type 1, hint 1, cookie rid=001,csn=20160814160757.895461Z#000000#000#000000
...
160814-16:08:07.660005_24261 syncrepl_entry: rid=001 LDAP_SYNC_ADD, cn=All Staff,ou=Groups,dc=example,dc=com, uuid a59f9337-5a14-46f8-b815-0a6313294bb4,
        incoming-csn 20160814160805.780515Z#000000#000#000000, present-csn 20160814160757.861437Z#000000#000#000000
160814-16:08:07.660011_24261 mdb_modify: cn=All Staff,ou=Groups,dc=example,dc=com
160814-16:08:07.663130_24261 syncrepl_entry: rid=001 LDAP_SYNC_ADD, cn=Bjorn Jensen,ou=Information Technology Division,ou=People,dc=example,dc=com, uuid c1d2e24c-f41c-45ce-b87b-6073f91da76f,
        incoming-csn 20160814160805.755768Z#000000#000#000000, present-csn 20160814160757.869900Z#000000#000#000000
160814-16:08:07.665986_24261 syncrepl_entry: rid=001 LDAP_SYNC_ADD, cn=ITD Staff,ou=Groups,dc=example,dc=com, uuid b23d5193-1511-4830-9f19-80a7c27c1adf,
        incoming-csn 20160814160805.762433Z#000000#000#000000, present-csn 20160814160757.873837Z#000000#000#000000
160814-16:08:07.667483_24261 syncrepl_entry: rid=001 LDAP_SYNC_ADD, cn=James A Jones 1,ou=Alumni Association,ou=People,dc=example,dc=com, uuid 84e67eac-192d-44dc-b44a-aeb666a5f9ed,
        incoming-csn 20160814160805.742861Z#000000#000#000000, present-csn 20160814160757.874941Z#000000#000#000000

160814-16:08:07.668897_24261 syncrepl_entry: rid=001 LDAP_SYNC_ADD, dc=testdomain1,dc=example,dc=com, uuid 2948357a-a2a5-49e1-ad6c-355683007421,
        incoming-csn 20160814160805.811469Z#000000#000#000000, present-csn 20160814160757.885476Z#000000#000#000000
160814-16:08:07.668921_24261 160814-16:08:07.668921_24261 ==>mdb_modrdn(dc=testdomain1,dc=example,dc=com,dc=itsdomain1,NULL)
160814-16:08:07.668929_24261 mdb_dn2entry("dc=example,dc=com")
160814-16:08:07.668931_24261 => mdb_dn2id("dc=example,dc=com")
160814-16:08:07.668936_24261 <= mdb_dn2id: got id=0x1
160814-16:08:07.668939_24261 => mdb_entry_decode:
160814-16:08:07.668942_24261 <= mdb_entry_decode
160814-16:08:07.668954_24261 mdb_modrdn: wr to children of entry dc=example,dc=com OK
160814-16:08:07.668956_24261 mdb_modrdn: parent dn=dc=example,dc=com
160814-16:08:07.668958_24261 mdb_dn2entry("dc=testdomain1,dc=example,dc=com")
160814-16:08:07.668959_24261 => mdb_dn2id("dc=testdomain1,dc=example,dc=com")
160814-16:08:07.668972_24261 <= mdb_dn2id: got id=0x14
160814-16:08:07.668977_24261 => mdb_entry_decode:
160814-16:08:07.668980_24261 <= mdb_entry_decode
160814-16:08:07.668984_24261 >>> dnNormalize: <dc=itsdomain1,dc=example,dc=com>
160814-16:08:07.668987_24261 => ldap_bv2dn(dc=itsdomain1,dc=example,dc=com,0)
160814-16:08:07.668998_24261 <= ldap_bv2dn(dc=itsdomain1,dc=example,dc=com)=0
160814-16:08:07.669009_24261 => ldap_dn2bv(272)
160814-16:08:07.669014_24261 <= ldap_dn2bv(dc=itsdomain1,dc=example,dc=com)=0
160814-16:08:07.669019_24261 <<< dnNormalize: <dc=itsdomain1,dc=example,dc=com>
160814-16:08:07.669023_24261 mdb_modrdn: new ndn=dc=itsdomain1,dc=example,dc=com
160814-16:08:07.669029_24261 => mdb_dn2id("dc=itsdomain1,dc=example,dc=com")
160814-16:08:07.669041_24261 <= mdb_dn2id: get failed: MDB_NOTFOUND: No matching key/data pair found (-30798)
160814-16:08:07.669045_24261 => mdb_dn2id_delete 0x14
160814-16:08:07.669067_24261 <= mdb_dn2id_delete 0x14: 0
160814-16:08:07.669074_24261 => mdb_dn2id_add 0x14: "dc=itsdomain1,dc=example,dc=com"
160814-16:08:07.669084_24261 <= mdb_dn2id_add 0x14: 0
160814-16:08:07.669087_24261 mdb_modify_internal: 0x00000014: dc=itsdomain1,dc=example,dc=com
160814-16:08:07.669111_24261 mdb_modify_internal: delete dc
160814-16:08:07.669121_24261 mdb_modify_internal: softadd dc
160814-16:08:07.669144_24261 => mdb_entry_encode(0x00000014): dc=itsdomain1,dc=example,dc=com
160814-16:08:07.669150_24261 <= mdb_entry_encode(0x00000014): dc=itsdomain1,dc=example,dc=com
160814-16:08:07.669183_24261 mdb_modrdn: rdn modified id=00000014 dn="dc=testdomain1,dc=example,dc=com"
160814-16:08:07.669231_24261 slap_queue_csn: tail-queueing 0x6080000a45c0 (conn 1, opid 31) 20160814160805.811469Z#000000#000#000000, op 0x7fc7fd901260
160814-16:08:07.669355_24261 slap_graduate_commit_csn: removing 0x6080000a45c0 (conn 1, opid 31) 20160814160805.811469Z#000000#000#000000, op 0x7fc7fd901260
160814-16:08:07.669364_24261 syncrepl_entry: rid=001 be_modify dc=itsdomain1,dc=example,dc=com (0)

160814-16:08:07.669467_24261 syncrepl_process: rid=001 LDAP_RES_SEARCH_RESULT
160814-16:08:07.669486_24261 syncrepl_cookie_take: rid=001 raw-cookie=rid=001,sid=000,csn=20160814160805.811469Z#000000#000#000000
160814-16:08:07.670718_24261 compare-cookie: local[rid=1, sid=-1]#1 | remote[rid=1, sid=0]#1
160814-16:08:07.670727_24261 compare-cookie: ~> local 20160814160757.895461Z#000000#000#000000 < remote 20160814160805.811469Z#000000#000#000000 <~
160814-16:08:07.672034_24261 cookie_merge: forward 20160814160757.895461Z#000000#000#000000 => 20160814160805.811469Z#000000#000#000000
160814-16:08:07.672044_24261 push: current[rid=0, sid=0]#1 | next[rid=0, sid=0]#1
160814-16:08:07.672048_24261 push: ~> current 20160814160757.895461Z#000000#000#000000 < next 20160814160805.811469Z#000000#000#000000 <~
160814-16:08:07.672247_24261 syncrepl_cookie_push: age => 3
160814-16:08:07.672251_24261 syncrepl_cookie_push: write-done
160814-16:08:07.672254_24261 syncrepl: refresh-done rid=001, rc 0, take 0 seconds
160814-16:08:07.672272_24261 <<= syncrep_process rid=001, rc -45 (SYNC_DONE)
160814-16:08:07.672275_24261 syncrepl_quorum: notify-status dc=example,dc=com, rid 1/0x6180000364a0, sid 0, refresh->ready
160814-16:08:07.672281_24261 syncrepl_gate: dc=example,dc=com clean
160814-16:08:07.672285_24261 quorum_syncrepl_gate: dc=example,dc=com, limit 0, running 0, 0x6180000364a0 out, rc 0
160814-16:08:07.672287_24261 syncrepl: refresh-end rid=001, rc 0, take 0 seconds
160814-16:08:07.672289_24261 <=do_syncrepl rid=001, rc -45 (SYNC_DONE)
160814-16:08:07.672292_24261 syncrepl_shutdown_io: rid=001 unbind-ext >>
@erthink
Copy link
Owner Author

erthink commented Aug 15, 2016

Seems this problem already fixed in ReOpenLDAP by using biglock in the syncrepl message loop.
But currently test019-syncrepl-cascade don't use a biglock feature.

@erthink
Copy link
Owner Author

erthink commented Aug 16, 2016

Проблема в гонках при каскадном refreshOnly обновлении содержимого в схеме slap1 => slap2 =>slap3.

slap3 выполняет обновление со slap2, в то время когда сам slap2 еще не завершил свою синхронизацию со slap1:

  • slap2 обрабатывает поступающий к нему набор записей syncprov_state_ctrl(LDAP_SYNC_ADD), которые поступают в произвольном порядке.
  • slap3 успевает завершить синхронизацию со slap2 раньше, чем slap2 успевает получить и обработать все записи syncprov_state_ctrl(LDAP_SYNC_ADD), получив таким образом лишь некоторую часть записей со slap1 в произвольном порядке;
  • на следующем цикле slap3 отправляет запрос на синхронизацию с cookie по максимальному entryCSN из имеющихся у него записей (это обеспечивается логикой syncprov), но так как на предыдущем были получены не все записи со slap1, то этот CSN может быть больше чем часть недополученных записей;
  • slap2 производит выборку используя CSN из полученного cookie, но в эту выборку не попадут ранее недополученные записи, entryCSN которых меньше.

Варианты решения проблемы:

  • добавить rw-блокировку чтобы не позволять syncprov начинать обслуживать новый refresh-search до завершения цикла синхронизации syncrepl, но этом может приводить к deadlock в fullmesh multimastrer кластере из нескольких узлов.
  • на стороне syncrepl перед применением уведомлений LDAP_SYNC_ADD/LDAP_SYNC_MODIFY производить их сортировку по entryCSN.
  • производить сортировку выборки по entryCSN на стороне syncprov.
  • в syncprov_operational() не отдавать "грязный" набор contextCSN, со значениями которые не были обновлены явно syncrepl по завершению цикла.

@erthink
Copy link
Owner Author

erthink commented Feb 20, 2017

Seems that this issue has the same reason as the #121

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

1 participant