HDFS-17658. HDFS decommissioning does not consider if Under Construction blocks are sufficiently replicated which causes HDFS Data Loss #7179
base: trunk
Is this UnderConstruction or UnderReplication?
Good question. I am definitely open to changing the terminology here.
I don't know if `UnderReplication` is the correct term, though. I think that term is more applicable to a block that needs to be replicated asynchronously by the Namenode to meet its replication factor (e.g. due to datanode decommissioning or datanode failures). From the Namenode logs we can see the block replica state is reported as `RBW` (replica being written) / `RECEIVING_BLOCK`.
I took the term `UnderConstruction` from here in the code: hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminManager.java (line 305 in cd2cffe).
However, upon further inspection it seems this term relates to a BlockCollection (as opposed to a block replica).
Let me know your thoughts. I can refactor `UnderConstructionBlocks` to something like `RbwBlocks` or `ReplicaBeingWrittenBlocks` if that is a more accurate term.
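To illustrate the distinction being discussed, here is a minimal, hypothetical sketch of a decommissioning check that tracks under-replicated blocks and under-construction (`RBW`) replicas separately, and only lets a datanode finish decommissioning once both sets are empty. The `ReplicaState` values are named after HDFS replica states, but `DecommissionCheckSketch`, `BlockInfo`, `Result`, and `check` are illustrative assumptions, not the code in this PR or the real `DatanodeAdminManager` API. For simplicity it assumes any `RBW` replica on the node blocks completion.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: these names do not correspond to real Hadoop classes.
public class DecommissionCheckSketch {

    // Subset of replica states, named after HDFS replica states.
    enum ReplicaState { FINALIZED, RBW }

    // A block on the decommissioning node: its local replica state, how many live
    // replicas exist on other nodes, and the file's replication factor.
    record BlockInfo(long blockId, ReplicaState state,
                     int liveReplicasElsewhere, int replicationFactor) {}

    static final class Result {
        final Set<Long> underReplicated = new HashSet<>();
        final Set<Long> underConstruction = new HashSet<>();

        // The node may leave only when nothing is pending in either category.
        boolean canFinishDecommission() {
            return underReplicated.isEmpty() && underConstruction.isEmpty();
        }
    }

    static Result check(Iterable<BlockInfo> blocksOnNode) {
        Result result = new Result();
        for (BlockInfo b : blocksOnNode) {
            if (b.state() == ReplicaState.RBW) {
                // Replica is still being written, so it cannot simply be re-replicated
                // yet; track it instead of treating it as sufficiently replicated.
                result.underConstruction.add(b.blockId());
            } else if (b.liveReplicasElsewhere() < b.replicationFactor()) {
                // Finalized block that still needs more replicas elsewhere.
                result.underReplicated.add(b.blockId());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Result r = check(List.of(
                new BlockInfo(1L, ReplicaState.FINALIZED, 3, 3),
                new BlockInfo(2L, ReplicaState.RBW, 2, 3)));
        System.out.println(r.underConstruction + " " + r.canFinishDecommission()); // [2] false
    }
}
```

Keeping the two sets separate mirrors the naming question above: an under-replicated block is something the Namenode can fix by copying a finalized replica, whereas an under-construction replica has to wait for the writer to finish first.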
I see that the "UnderConstruction" terminology is already used in the code for both blocks and block replicas. Therefore, I think the name `UnderConstructionBlocks` aligns with the existing terminology in the code base.
Sounds good to me. I appreciate you taking the time to refer to the related code and reply in detail.
Will this INFO log become noisy if the client takes a long time to close the stream?
This log will be printed at most once per datanode per DatanodeAdminDefaultMonitor cycle. This means that in a 1k-datanode HDFS cluster, up to 1k log lines could be printed every 30 seconds (if all 1k datanodes have under construction blocks).
This matches the existing behaviour where, if a decommissioning datanode has under-replicated blocks, one log line is printed for that datanode in every DatanodeAdminDefaultMonitor cycle.
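To put the worst case above in numbers, here is a small hypothetical sketch that logs at most one line per decommissioning datanode per monitor cycle and computes the daily upper bound. The `NodeStatus` record, the log text, and the method names are illustrative assumptions, not the PR's code or its actual log message. With the 30-second cycle mentioned above there are 86,400 / 30 = 2,880 cycles per day, so 1k datanodes could produce at most 2,880,000 lines per day.

```java
import java.util.List;

// Hypothetical sketch: NodeStatus and the log text are illustrative, not Hadoop APIs.
public class DecommissionLogVolumeSketch {

    // Per-cycle snapshot of one decommissioning datanode.
    record NodeStatus(String name, int underConstructionBlocks) {}

    // One monitor cycle: at most one log line per datanode, and only if it
    // still has under-construction blocks. Returns the number of lines logged.
    static int logCycle(List<NodeStatus> decommissioningNodes) {
        int lines = 0;
        for (NodeStatus node : decommissioningNodes) {
            if (node.underConstructionBlocks() > 0) {
                System.out.printf("Datanode %s still has %d under construction blocks%n",
                        node.name(), node.underConstructionBlocks());
                lines++;
            }
        }
        return lines;
    }

    // Upper bound on log lines per day if every node logs in every cycle.
    static long worstCaseLinesPerDay(int nodes, int cycleSeconds) {
        long cyclesPerDay = 86_400L / cycleSeconds;
        return nodes * cyclesPerDay;
    }

    public static void main(String[] args) {
        int logged = logCycle(List.of(
                new NodeStatus("dn1", 2),
                new NodeStatus("dn2", 0)));
        System.out.println(logged);                         // 1
        System.out.println(worstCaseLinesPerDay(1000, 30)); // 2880000
    }
}
```

Because the volume scales with the number of decommissioning datanodes rather than the number of blocks, a slow-closing client stream adds at most one line per cycle for its datanode.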