Skip to content
This repository has been archived by the owner on Nov 24, 2023. It is now read-only.

.*: add compactor for syncer #2261

Open
wants to merge 113 commits into
base: master
Choose a base branch
from
Open

Conversation

GMHDBJD
Copy link
Collaborator

@GMHDBJD GMHDBJD commented Oct 25, 2021

What problem does this PR solve?

add compactor to compact dml

What is changed and how it works?

# Assuming that UPDATE statement does not update the primary key. If no primary key, choose a UNIQUE NOT NULL Key.
# If UPDATE statement update the primary key or unique key, split it into one DELETE and one INSERT statement.
# We list two successive DMLs of one row and the result that can be compacted. X means this situation will not happen.
INSERT + INSERT => X
INSERT + UPDATE => INSERT
INSERT + DELETE => DELETE
UPDATE + INSERT => X
UPDATE + UPDATE => UPDATE
UPDATE + DELETE => DELETE
DELETE + INSERT => UPDATE
DELETE + UPDATE => X
DELETE + DELETE => X

Check List

Tests

  • Integration test
  • UT

@GMHDBJD GMHDBJD removed the status/WIP This PR is still work in progress label Oct 26, 2021
chaos/cases/conf/task-optimistic.yaml Show resolved Hide resolved
dm/config/task.go Show resolved Hide resolved
syncer/dml_worker.go Outdated Show resolved Hide resolved
syncer/syncer.go Show resolved Hide resolved
syncer/compactor.go Show resolved Hide resolved
syncer/compactor.go Outdated Show resolved Hide resolved
syncer/compactor.go Outdated Show resolved Hide resolved
syncer/compactor.go Show resolved Hide resolved
syncer/compactor.go Outdated Show resolved Hide resolved
syncer/compactor.go Show resolved Hide resolved
Comment on lines +268 to +272
chanSize := s.cfg.QueueSize * s.cfg.WorkerCount / 2
if s.cfg.Compact {
chanSize /= 2
}
s.dmlJobCh = make(chan *job, chanSize)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DM originally cached s.cfg.QueueSize * s.cfg.WorkerCount dml jobs in memory. Now if compact: false, dmlJobCh will dmlWorker will both cached s.cfg.QueueSize * s.cfg.WorkerCount/2 jobs. If compact: true, dmlJobCh, compactor buffer, compactor output channel and dmlWorker will all cached s.cfg.QueueSize * s.cfg.WorkerCount/4 jobs.
Actually we can use a larger compact buffer-size, but if so, when user pause-task/stop-task, they may need to wait a longer time to wait all jobs flushed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems causality output channel didn't get adjusted

syncer/compactor_test.go Show resolved Hide resolved
syncer/compactor_test.go Show resolved Hide resolved
Copy link
Contributor

@Ehco1996 Ehco1996 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
if all chanSize and bufferSize is accept by other reviews, btw pelease add some comments when the parameters are determined

@ti-chi-bot ti-chi-bot added the status/LGT1 One reviewer already commented LGTM label Oct 28, 2021
@GMHDBJD
Copy link
Collaborator Author

GMHDBJD commented Oct 28, 2021

upgrade failed due to https://github.com/pingcap/dm/issues/1961

Copy link
Collaborator

@lance6716 lance6716 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will review after food

dm/config/task.go Outdated Show resolved Hide resolved
syncer/compactor.go Show resolved Hide resolved
*delDML = *updateDML
delDML.op = del
// use oldValues of update as values of delete and reset oldValues
delDML.values = updateDML.oldValues
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe in future we can remove the op of DML,

swicth DML:
only has new value: it's an INSERT
only has old value: it's a DELETE
else: it's a UPDATE

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe some problems like a table with all columns are generate columns, so new value and old value will both be null in insert statement.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe there should be at least one column, to let other columns generated from it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this kind of implicit opType is no good for the readbility if our code? Are there any harm to reserve the explicit op field?

syncer/compactor.go Outdated Show resolved Hide resolved
metrics.QueueSizeGauge.WithLabelValues(c.task, "compactor_input", c.source).Set(float64(len(c.inCh)))

if j.tp == flush {
c.flushBuffer()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will block the consumption of c.inCh. if we has a lot of jobs to be flushed, we are now blocked to wait every job sent to next stage. This will increase the risk of upstream read timeout.

Maybe we can implement ping-pong buffer in future.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is not the same problem. Upstream read timeout may also caused by the dmlJobCh is full. And after we support async flush checkpoint, we only need flush job when stop-task/pause-task?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when DM write 1 job per second to downstream and we have 100 jobs in compactor to flush, we now have to wait 100 seconds before reading upstream. Without compactor we only wait 1 second.

Copy link
Collaborator Author

@GMHDBJD GMHDBJD Oct 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if len(c.inCh) == 0 || len(c.buffer) >= c.bufferSize || len(c.outCh) == 0 {

Does len(c.in)==0 meets requirements?

Copy link
Collaborator

@lance6716 lance6716 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will review tests later

syncer/compactor.go Outdated Show resolved Hide resolved
syncer/compactor.go Outdated Show resolved Hide resolved
Comment on lines +268 to +272
chanSize := s.cfg.QueueSize * s.cfg.WorkerCount / 2
if s.cfg.Compact {
chanSize /= 2
}
s.dmlJobCh = make(chan *job, chanSize)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems causality output channel didn't get adjusted

syncer/compactor.go Outdated Show resolved Hide resolved
}

for i := 0; i < len(values); i++ {
if values[i] != oldValues[i] {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're comparing interface{}, so I guess they are not equal for most time?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interface values are comparable. Two interface values are equal if they have identical dynamic types and equal dynamic values or if both have value nil.

https://stackoverflow.com/questions/34245932/checking-equality-of-interface

@GMHDBJD
Copy link
Collaborator Author

GMHDBJD commented Oct 28, 2021

seems causality output channel didn't get adjusted

I think causality output channel no need to adjust, because it has no buffer and default queue size 1024 is enough.

Copy link
Collaborator

@lance6716 lance6716 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rest lgtm

tests/shardddl1/run.sh Outdated Show resolved Hide resolved
@ti-chi-bot ti-chi-bot added status/LGT2 Two reviewers already commented LGTM, ready for merge and removed status/LGT1 One reviewer already commented LGTM labels Oct 28, 2021
Comment on lines +118 to +120
if c.safeMode {
j.dml.safeMode = true
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if c.safeMode {
j.dml.safeMode = true
}
j.dml.safeMode = c.safeMode

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

key := j.dml.identifyKey()
prevPos, ok := tableKeyMap[key]
// if no such key in the buffer, add it
if !ok || prevPos >= len(c.buffer) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can prevPos bigger than buffer length

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just avoid panic. have removed it

syncer/compactor.go Show resolved Hide resolved
*delDML = *updateDML
delDML.op = del
// use oldValues of update as values of delete and reset oldValues
delDML.values = updateDML.oldValues
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this kind of implicit opType is no good for the readbility if our code? Are there any harm to reserve the explicit op field?

@@ -59,10 +59,14 @@ type DMLWorker struct {

// dmlWorkerWrap creates and runs a dmlWorker instance and returns flush job channel.
func dmlWorkerWrap(inCh chan *job, syncer *Syncer) chan *job {
chanSize := syncer.cfg.QueueSize / 2
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment for this special logic? I'm curious if we can change the default value of this field to avoid this change. BTW, doesn't anyone know the effect of the chan size change? How can a user know he/she need to adjust its value and what is the proper value?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update comment in pingcap/ticdc#3162

Copy link
Collaborator Author

@GMHDBJD GMHDBJD Oct 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think no user will change the queue-size... so I keep the original value and didn't add a new config item like compact-buffer-size

@GMHDBJD
Copy link
Collaborator Author

GMHDBJD commented Oct 28, 2021

Please comment in https://github.com/pingcap/ticdc/pull/3162

@dianqihanwangzi
Copy link

/run-unit-test

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
needs-cherry-pick-release-2.0 This PR should be cherry-picked to release-2.0. Remove this label after cherry-picked to release-2.0 needs-update-docs Should update docs after this PR is merged. Remove this label once the docs are updated size/XXL status/LGT2 Two reviewers already commented LGTM, ready for merge status/PTAL This PR is ready for review. Add this label back after committing new changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants