-
Notifications
You must be signed in to change notification settings - Fork 414
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OAK-11444 [full-gc] Save document id and empty properties names before deletion #2038
base: trunk
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, this adds a new Collection for the first time in more than a decade. I believe this needs more discussion, optimally not in the PR but in the ticket.
oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/Collection.java
Outdated
Show resolved
Hide resolved
oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/FullGcBin.java
Outdated
Show resolved
Hide resolved
oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/FullGcBin.java
Outdated
Show resolved
Hide resolved
oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/FullGcBin.java
Outdated
Show resolved
Hide resolved
oak-store-document/src/test/java/org/apache/jackrabbit/oak/plugins/document/FullGcBinTest.java
Outdated
Show resolved
Hide resolved
oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/FullGcBin.java
Outdated
Show resolved
Hide resolved
oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/FullGcBin.java
Outdated
Show resolved
Hide resolved
oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/FullGcBin.java
Outdated
Show resolved
Hide resolved
oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/FullGcBin.java
Outdated
Show resolved
Hide resolved
oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/FullGcBin.java
Outdated
Show resolved
Hide resolved
oak-store-document/src/test/java/org/apache/jackrabbit/oak/plugins/document/FullGcBinTest.java
Outdated
Show resolved
Hide resolved
I've addressed all issues mentioned and increased test coverage to 100% |
import java.util.Map; | ||
import java.util.stream.Collectors; | ||
|
||
public class FullGcBin { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please consider adding a short class description summarizing how and where this class is used, for example: "used in MongoDB garbage collection in order to first store the deleted document paths in MongoDB before doing the deletion".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
documentStore = ds; | ||
} | ||
|
||
public int remove(Map<String, Long> orphanOrDeletedRemovalMap) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please consider adding a short description (one line) for what the methods do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
try { | ||
return documentStore.create(Collection.SETTINGS, docs); | ||
} catch (Exception e) { | ||
LOG.error("Error while adding delete candidate documents to bin", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest we also log the paths of the documents that failed to be added to the bin, they will be helpful for debug purposes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
documentStore.createOrUpdate(Collection.SETTINGS, binOpList); | ||
return true; | ||
} catch (Exception e) { | ||
LOG.error("Error while adding removed properties to bin", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest we also log the paths of the properties that failed to be added to the bin, will be helpful for debugging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added suggestions for logging of paths and adding class and method descriptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The TTL index for the /bin documents should also be created automatically.
I suggest doing this in MongoDocumentStore.ensureIndexes method.
what should be the TTL for this documents? 30 days? 60 days? ... |
I suggest we start with 90 days. |
@@ -191,10 +191,10 @@ private static class RevisionsOptions extends Utils.NodeStoreOptions { | |||
fullGcProgressSize = parser.accepts("fullGcProgressSize", "The number of documents to check for " + | |||
"garbage in each Full GC cycle") | |||
.withRequiredArg().ofType(Integer.class).defaultsTo(10000); | |||
fullGcMaxAge = parser.accepts("fullGcMaxAge", "The maximum age of the document in seconds " + | |||
fullGcMaxAgeSec = parser.accepts("fullGcMaxAge", "The maximum age of the document in seconds " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would keep both variable name & param name same.
fullGcMaxAgeSec = parser.accepts("fullGcMaxAge", "The maximum age of the document in seconds " + | |
fullGcMaxAge = parser.accepts("fullGcMaxAge", "The maximum age of the document in seconds " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
oak-run/src/main/java/org/apache/jackrabbit/oak/run/RevisionsCommand.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, approved.
* Each method delegates directly to DocumentStore | ||
*/ | ||
public class MongoFullGcNodeBin implements FullGcNodeBin { | ||
private static final Logger LOG = getLogger(MongoFullGcNodeBin.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use Fully qualified class name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you mean @rishabhdaim ... why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you mean @rishabhdaim ... why?
me too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
avoid static import of "getLogger" for consistency with the rest of Oak :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yes, LoggerFactory.getLogger
instead of getLogger
. Right!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
return mongoDocumentStore.findAndUpdate(Collection.NODES, updateOpList); | ||
} | ||
|
||
protected boolean addToBin(Map<String, Long> orphanOrDeletedRemovalMap) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add the comments about what would be added in DB after this operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
public MongoFullGcNodeBin(MongoDocumentStore ds) { | ||
mongoDocumentStore = ds; | ||
enabled = System.getProperty("oak.document.fullGcBin.enabled", "false").equals("true"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We typically have a (public) constant and then use this here.
Also, Boolean.getBoolean(...)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this part is gone in the configuration PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know. But the convenience comes with drawbacks. That's why we have a utility method (logging, value checking etc...). How many times have you wondered whether you really have set the correct property? And what the default was without checking the source code?
Here's an example use:
jackrabbit-oak/oak-core/src/main/java/org/apache/jackrabbit/oak/core/MutableRoot.java
Line 142 in 48b26b5
private static final boolean CLASSIC_MOVE = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also using "System.getProperty" instead of "Boolean.getBoolean" makes things worse because you lose common parsing of boolean values...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
understood. updated as suggested
return mongoDocumentStore.findAndUpdate(Collection.NODES, updateOpList); | ||
} | ||
|
||
protected boolean addToBin(Map<String, Long> orphanOrDeletedRemovalMap) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
protected boolean addToBin(Map<String, Long> orphanOrDeletedRemovalMap) { | |
private boolean addToBin(Map<String, Long> orphanOrDeletedRemovalMap) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
return false; | ||
} | ||
|
||
protected boolean persist(List<BasicDBObject> inserts) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
protected boolean persist(List<BasicDBObject> inserts) { | |
private boolean persist(List<BasicDBObject> inserts) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
return true; | ||
} | ||
|
||
BasicDBObject toBasicDBObject(UpdateOp op) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BasicDBObject toBasicDBObject(UpdateOp op) { | |
private BasicDBObject toBasicDBObject(UpdateOp op) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
@@ -481,6 +481,11 @@ private void logSplitDocIdsTobeDeleted(Bson query) { | |||
LOG.debug(sb.toString()); | |||
} | |||
|
|||
@Override | |||
public FullGcNodeBin getFullGCBin() { | |||
return new MongoFullGcNodeBin(store); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be singleton.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I think the place to adding getBinCollection()
is wrong. We should have this in DocumentStore
interface with default implementation of noBin
and then override this for MongoDocumentStore
only.
For reference we could use default Throttler throttler()
.
For test we could override this for MemoryDocumentStore
too.
return false; | ||
} | ||
|
||
private boolean addToBin(List<UpdateOp> updateOpList) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both addToBin
method does exactly the same thing, just differ in input. We should extract out the common code to a common method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we also have the conversion of input param to BasicDBObject
which is different. The common part is extracted in the persist()
method
we discussed about adding a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems all the comments have been addressed, but apparently there is an open question. If the implementation detail has to be challenged I propose to discuss in Jira ticket, so we have everything at one place.
I have merged OAK-11525 containing the configuration part into this PR so we have everything in a single place |
I'm running the pipeline just to ensure it builds properly, and will merge afterwards. Thanks! |
…llection * OAK-11525 Update fullGC configuration for audit logging into Mongo collection
Introduce a
FullGcBin
class as a wrapper aroundDocumentStore
that expose two methods used to clean garbage from NODES collectionWhen enabled
Each methods saves the document ID or empty properties names (that will be deleted) to a separate
_bin
collection as aBinDocument
then delegates deletion toDocumentStore
When disabled (default)
Each method delegates directly to DocumentStore
This PR does not contain the necessary configuration parameter to turn on/off the fullGcBin. I'll create a separate PR for that (due the fact how config params are passed in, it will require additional 15 file changes and will make this PR hard to follow)
Update: PR that adds configuration for this feature is available here: daniancu#1