Commit 94c1df7

Merge branch 'main' into CELEBORN-656-FOLLOWUP

AngersZhuuuu committed Jul 24, 2023
2 parents 8b854bf + 67c18e6

Showing 14 changed files with 319 additions and 55 deletions.
@@ -466,7 +466,7 @@ class CelebornConf(loadDefaults: Boolean) extends Cloneable with Logging with Se

def networkAllocatorVerboseMetric: Boolean = get(NETWORK_MEMORY_ALLOCATOR_VERBOSE_METRIC)

-  def shuffleIoMaxChunksBeingTransferred: Long = {
+  def shuffleIoMaxChunksBeingTransferred: Option[Long] = {
get(MAX_CHUNKS_BEING_TRANSFERRED)
}

@@ -1418,7 +1418,7 @@ object CelebornConf extends Logging {
.bytesConf(ByteUnit.BYTE)
.createWithDefaultString("2m")

-  val MAX_CHUNKS_BEING_TRANSFERRED: ConfigEntry[Long] =
+  val MAX_CHUNKS_BEING_TRANSFERRED: OptionalConfigEntry[Long] =
buildConf("celeborn.shuffle.io.maxChunksBeingTransferred")
.categories("network")
.doc("The max number of chunks allowed to be transferred at the same time on shuffle service. Note " +
@@ -1427,7 +1427,7 @@
"`celeborn.<module>.io.retryWait`), if those limits are reached the task will fail with fetch failure.")
.version("0.2.0")
.longConf
-      .createWithDefault(Long.MaxValue)
+      .createOptional

val PUSH_TIMEOUT_CHECK_INTERVAL: ConfigEntry[Long] =
buildConf("celeborn.<module>.push.timeoutCheck.interval")
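The diff above replaces a `Long.MaxValue` sentinel default with an `OptionalConfigEntry[Long]`. A minimal sketch of the difference, using a plain `Map[String, String]` in place of Celeborn's actual `ConfigBuilder`/`ConfigEntry` machinery (the `MaxChunksSketch` object and its method names are hypothetical illustrations, not Celeborn code):

```scala
// Hypothetical sketch -- not Celeborn's real ConfigEntry API. Shows why an
// Option[Long] distinguishes "no limit configured" from "limit set very high".
object MaxChunksSketch {
  private val Key = "celeborn.shuffle.io.maxChunksBeingTransferred"

  // Before: an unset key falls back to a Long.MaxValue sentinel, so "no limit"
  // and "explicitly configured to Long.MaxValue" are indistinguishable.
  def maxChunksWithSentinel(conf: Map[String, String]): Long =
    conf.get(Key).map(_.toLong).getOrElse(Long.MaxValue)

  // After: None means no limit is configured at all; Some(n) enables the check.
  def maxChunksOptional(conf: Map[String, String]): Option[Long] =
    conf.get(Key).map(_.toLong)

  // A server can skip the guard entirely when the limit is absent.
  def shouldRejectConnection(chunksInFlight: Long, limit: Option[Long]): Boolean =
    limit.exists(chunksInFlight >= _)
}
```

With `createOptional`, callers pattern-match on the `Option` and can avoid installing a chunk-count guard at all when the config is unset, instead of comparing every connection against an effectively unreachable sentinel.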
4 changes: 4 additions & 0 deletions docs/assets/img/backpressure.svg
4 changes: 4 additions & 0 deletions docs/assets/img/mappartition.svg
4 changes: 4 additions & 0 deletions docs/assets/img/multilayer.svg
4 changes: 4 additions & 0 deletions docs/assets/img/reducepartition.svg
2 changes: 1 addition & 1 deletion docs/configuration/network.md
@@ -49,5 +49,5 @@ license: |
| celeborn.rpc.dispatcher.threads | &lt;undefined&gt; | Threads number of message dispatcher event loop | 0.3.0 |
| celeborn.rpc.io.threads | &lt;undefined&gt; | Netty IO thread number of NettyRpcEnv to handle RPC request. The default threads number is the number of runtime available processors. | 0.2.0 |
| celeborn.rpc.lookupTimeout | 30s | Timeout for RPC lookup operations. | 0.2.0 |
-| celeborn.shuffle.io.maxChunksBeingTransferred | 9223372036854775807 | The max number of chunks allowed to be transferred at the same time on shuffle service. Note that new incoming connections will be closed when the max number is hit. The client will retry according to the shuffle retry configs (see `celeborn.<module>.io.maxRetries` and `celeborn.<module>.io.retryWait`), if those limits are reached the task will fail with fetch failure. | 0.2.0 |
+| celeborn.shuffle.io.maxChunksBeingTransferred | &lt;undefined&gt; | The max number of chunks allowed to be transferred at the same time on shuffle service. Note that new incoming connections will be closed when the max number is hit. The client will retry according to the shuffle retry configs (see `celeborn.<module>.io.maxRetries` and `celeborn.<module>.io.retryWait`), if those limits are reached the task will fail with fetch failure. | 0.2.0 |
<!--end-include-->
1 change: 1 addition & 0 deletions docs/developers/pushdata.md
@@ -100,6 +100,7 @@ For the first three cases, `Worker` informs `ShuffleClient` that it should trigg
`ShuffleClient` triggers split itself.

There are two kinds of Split:

- `HARD_SPLIT`, meaning old `PartitionLocation` epoch refuses to accept any data, and future data of the
`PartitionLocation` will only be pushed after new `PartitionLocation` epoch is ready
- `SOFT_SPLIT`, meaning old `PartitionLocation` epoch continues to accept data, when new epoch is ready, `ShuffleClient`
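The two split modes described in the doc change above can be summarized in a small sketch (the `SplitSketch` object, `SplitMode` trait, and `oldEpochAcceptsData` helper are hypothetical illustrations of the documented semantics, not Celeborn's actual types):

```scala
// Hypothetical illustration of the HARD_SPLIT / SOFT_SPLIT semantics from the
// pushdata docs; not Celeborn's actual definitions.
object SplitSketch {
  sealed trait SplitMode
  case object HardSplit extends SplitMode // old epoch refuses any further data
  case object SoftSplit extends SplitMode // old epoch keeps accepting until the new epoch is ready

  // Whether pushes may still target the old PartitionLocation epoch.
  def oldEpochAcceptsData(mode: SplitMode, newEpochReady: Boolean): Boolean =
    mode match {
      case HardSplit => false
      case SoftSplit => !newEpochReady
    }
}
```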