-
Notifications
You must be signed in to change notification settings - Fork 269
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Bsa Download Action #2214
Add Bsa Download Action #2214
Conversation
|
||
public static Order deserialize(String text) { | ||
List<String> items = SPLITTER.splitToList(text); | ||
return of(Long.valueOf(items.get(0)), OrderType.valueOf(items.get(1))); |
Check notice
Code scanning / CodeQL
Missing catch of NumberFormatException Note
import org.junit.jupiter.api.Test; | ||
|
||
/** Unit tests for {@link Label}. */ | ||
class LabelTest { |
Check notice
Code scanning / CodeQL
Unused classes and interfaces Note test
import org.junit.jupiter.api.Test; | ||
|
||
/** Unit tests for {@link NonBlockedDomain}. */ | ||
class NonBlockedDomainTest { |
Check notice
Code scanning / CodeQL
Unused classes and interfaces Note test
ImmutableMap<BlockList, String> fetchedChecksums = | ||
ImmutableMap.of(BLOCK, block.peekChecksum(), BLOCK_PLUS, blockPlus.peekChecksum()); |
Check notice
Code scanning / CodeQL
Unread local variable Note
ImmutableMap<BlockList, String> prevChecksums = | ||
schedule | ||
.latestCompleted() | ||
.map(DownloadSchedule.CompletedJob::checksums) | ||
.orElseGet(ImmutableMap::of); |
Check notice
Code scanning / CodeQL
Unread local variable Note
16478ab
to
cca9663
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 35 files reviewed, 7 unresolved discussions (waiting on @Github-advanced-security[bot] and @weiminyu)
core/src/main/java/google/registry/module/backend/BackendRequestComponent.java
line 105 at r3 (raw file):
interface BackendRequestComponent { BsaDownloadAction bsaDownloadAction();
We probably want to make it a separate service altogether. This way static IP change is more safe, because as it turned out we have multiple upload actions that require whitelisting with ICANN. Given the infrequent nature of those actions and possibility of messing them up we would be better of limiting the scope of change to a separate service specifically to BSA
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 35 files reviewed, 7 unresolved discussions (waiting on @ptkach)
core/src/main/java/google/registry/module/backend/BackendRequestComponent.java
line 105 at r3 (raw file):
Previously, ptkach (Pavlo Tkach) wrote…
We probably want to make it a separate service altogether. This way static IP change is more safe, because as it turned out we have multiple upload actions that require whitelisting with ICANN. Given the infrequent nature of those actions and possibility of messing them up we would be better of limiting the scope of change to a separate service specifically to BSA
Makes sense. Could you set it up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 35 files reviewed, 7 unresolved discussions (waiting on @Github-advanced-security[bot] and @weiminyu)
core/src/main/java/google/registry/module/backend/BackendRequestComponent.java
line 105 at r3 (raw file):
Previously, weiminyu (Weimin Yu) wrote…
Makes sense. Could you set it up?
Yeah, I can do it
cca9663
to
c918fa1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 35 files reviewed, 10 unresolved discussions (waiting on @Github-advanced-security[bot], @ptkach, and @weiminyu)
db/src/main/resources/sql/schema/db-schema.sql.generated
line 88 at r4 (raw file):
); create table "BsaDomainInUse" (
Add domain_repo_id
as a pointer so that these can easily be checked in the future to see which might no longer be in use if the domain gets deleted?
db/src/main/resources/sql/schema/db-schema.sql.generated
line 101 at r4 (raw file):
creation_time timestamptz not null, stage text not null, update_timestamp timestamptz,
Why is there a creation time and an update time? Aren't these immutable? I would expect there to just be a download time (could also be called creation time).
db/src/main/resources/sql/schema/db-schema.sql.generated
line 107 at r4 (raw file):
create table "BsaLabel" ( label text not null, creation_time timestamptz not null,
Do we want a pointer to the first BsaDownload
the label occurred in? Or would you do that by matching up the creation times?
I think it might be better to have an revision/download ID here, and then join it to the BsaDownload
table if you need to know the creation time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 35 files reviewed, 11 unresolved discussions (waiting on @Github-advanced-security[bot], @ptkach, and @weiminyu)
core/src/main/java/google/registry/bsa/persistence/BsaDomainInUse.java
line 29 at r4 (raw file):
import javax.persistence.IdClass; /** A domain matching a BSA label but is in use (registered or reserved), so cannot be blocked. */
I wouldn't necessarily call a reserved domain an "domain in use". BsaUnblockableDomain
with a reason or either registered or reserved makes more sense to me semantically.
Also what about domains that are unblockable by reason of the IDN table not supporting them? We don't want to track those here, and will instead depend on regenerating the entire set (with notifications) on the (very rare) chance of the IDN table on the TLD being changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 35 files reviewed, 13 unresolved discussions (waiting on @Github-advanced-security[bot], @ptkach, and @weiminyu)
core/src/main/java/google/registry/bsa/BlockList.java
line 18 at r4 (raw file):
/** Identifiers of the BSA lists with blocking labels. */ public enum BlockList {
You can put these small enums as static inner members of the classes in which they're used rather than having them be their own files, like NonBlockedDomain.Reason
below.
Also I think the name here needs to be workshopped. Maybe BlockListType
, BlockListName
or similar?
core/src/main/java/google/registry/bsa/api/NonBlockedDomain.java
line 27 at r4 (raw file):
*/ @AutoValue public abstract class NonBlockedDomain {
UnblockableDomain
makes more sense to me here as a name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 35 files reviewed, 13 unresolved discussions (waiting on @CydeWeys, @Github-advanced-security[bot], and @ptkach)
db/src/main/resources/sql/schema/db-schema.sql.generated
line 88 at r4 (raw file):
Previously, CydeWeys (Ben McIlwain) wrote…
Add
domain_repo_id
as a pointer so that these can easily be checked in the future to see which might no longer be in use if the domain gets deleted?
Domain_name is the foreign key so checking should be efficient too. Also, this may be a reserved name that is not registered yet.
db/src/main/resources/sql/schema/db-schema.sql.generated
line 101 at r4 (raw file):
Previously, CydeWeys (Ben McIlwain) wrote…
Why is there a creation time and an update time? Aren't these immutable? I would expect there to just be a download time (could also be called creation time).
A download progresses through multiple stages. If it somehow fails in the middle and retries, the update_time will be different from the
creation time.
db/src/main/resources/sql/schema/db-schema.sql.generated
line 107 at r4 (raw file):
Previously, CydeWeys (Ben McIlwain) wrote…
Do we want a pointer to the first
BsaDownload
the label occurred in? Or would you do that by matching up the creation times?I think it might be better to have an revision/download ID here, and then join it to the
BsaDownload
table if you need to know the creation time.
The creation_time should have been called first_download_time. It's the same timestamp of the first BsaDownload it appears in.
It's a bit too late to change the name if we don't want to wait another week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 35 files reviewed, 13 unresolved discussions (waiting on @CydeWeys, @Github-advanced-security[bot], and @ptkach)
db/src/main/resources/sql/schema/db-schema.sql.generated
line 107 at r4 (raw file):
Previously, weiminyu (Weimin Yu) wrote…
The creation_time should have been called first_download_time. It's the same timestamp of the first BsaDownload it appears in.
It's a bit too late to change the name if we don't want to wait another week.
By the way, this file is carried over from go/r3pr/2205, which is approved and will be submitted next.
Please make comments there if you want to discuss more about the schema.
d0a930d
to
5744d8f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 44 files reviewed, 12 unresolved discussions (waiting on @Github-advanced-security[bot], @ptkach, and @weiminyu)
db/src/main/resources/sql/schema/db-schema.sql.generated
line 88 at r4 (raw file):
Previously, weiminyu (Weimin Yu) wrote…
Domain_name is the foreign key so checking should be efficient too. Also, this may be a reserved name that is not registered yet.
But you don't have domain_name
in here either -- it would need to be constructed through string concatenation. And the primary key is better than the foreign key, as domain_name
on the Domain
table is not unique (whereas repo id is). If a domain has existed several times, then it will have several entries in the Domain
table, and now you have order them by create time and take the most recent one. It's just a lot harder than JOIN
to the Domain
table on domain_repo_id
.
And for the case of a reservation, the domain_repo_id
field can be nullable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 44 files reviewed, 12 unresolved discussions (waiting on @CydeWeys, @Github-advanced-security[bot], and @ptkach)
db/src/main/resources/sql/schema/db-schema.sql.generated
line 88 at r4 (raw file):
Previously, CydeWeys (Ben McIlwain) wrote…
But you don't have
domain_name
in here either -- it would need to be constructed through string concatenation. And the primary key is better than the foreign key, asdomain_name
on theDomain
table is not unique (whereas repo id is). If a domain has existed several times, then it will have several entries in theDomain
table, and now you have order them by create time and take the most recent one. It's just a lot harder thanJOIN
to theDomain
table ondomain_repo_id
.And for the case of a reservation, the
domain_repo_id
field can be nullable.
ForeignKeyUtils.load
already does the filtering on timestamps, and it is used when we create BsaDomainInUse entities from labels.
We can reuse the same code when doing the refresh. ForeignKeyUtils.load
also already supports bulk query, which is also needed by the refresh check.
core/src/main/java/google/registry/bsa/BlockList.java
line 18 at r4 (raw file):
Previously, CydeWeys (Ben McIlwain) wrote…
You can put these small enums as static inner members of the classes in which they're used rather than having them be their own files, like
NonBlockedDomain.Reason
below.Also I think the name here needs to be workshopped. Maybe
BlockListType
,BlockListName
or similar?
These enums are used in multiple classes, and there is not an obvious choice of enclosing class for each of them.
Unless you want a Definitions
interface or class to hold all of them.
core/src/main/java/google/registry/bsa/persistence/BsaDomainInUse.java
line 29 at r4 (raw file):
Previously, CydeWeys (Ben McIlwain) wrote…
I wouldn't necessarily call a reserved domain an "domain in use".
BsaUnblockableDomain
with a reason or either registered or reserved makes more sense to me semantically.Also what about domains that are unblockable by reason of the IDN table not supporting them? We don't want to track those here, and will instead depend on regenerating the entire set (with notifications) on the (very rare) chance of the IDN table on the TLD being changed?
I just want a name that is distinguishable from the one used for communicating with BSA. We can change the name of this class, but changing the name of the underlying
schema table would be time-consuming.
And yes, invalid (due to idn) domains are not tracked here. See the Reason enum below: it doesn't have INVALID
, which is present in the API object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 44 files reviewed, 12 unresolved discussions (waiting on @Github-advanced-security[bot], @ptkach, and @weiminyu)
db/src/main/resources/sql/schema/db-schema.sql.generated
line 88 at r4 (raw file):
Previously, weiminyu (Weimin Yu) wrote…
ForeignKeyUtils.load
already does the filtering on timestamps, and it is used when we create BsaDomainInUse entities from labels.
We can reuse the same code when doing the refresh.ForeignKeyUtils.load
also already supports bulk query, which is also needed by the refresh check.
I was thinking it might be useful for manual queries or dashboards to do analytics on the list (to join to the exact Domain
row in question easily). You don't have access to ``ForeignKeyUtils.load()` from a SQL query sadly; it gets much more complicated (need to concatenate domain + label, join to the Domain table based on that, sort by create time, pick the most recent, etc.).
But maybe we wouldn't use this table for that, and just use the BsaLabel
table? The only reason we really need this table is to keep track of what we already sent to the BSA API, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 44 files reviewed, 12 unresolved discussions (waiting on @CydeWeys, @Github-advanced-security[bot], and @ptkach)
core/src/main/java/google/registry/bsa/BsaDownloadAction.java
line 90 at r1 (raw file):
Previously, github-advanced-security[bot] wrote…
Unread local variable
Variable 'ImmutableMap<BlockList,String> fetchedChecksums' is never read.
Will address will BSA feature is available.
core/src/main/java/google/registry/bsa/BsaDownloadAction.java
line 95 at r1 (raw file):
Previously, github-advanced-security[bot] wrote…
Unread local variable
Variable 'ImmutableMap<BlockList,String> prevChecksums' is never read.
Will address will BSA feature is available.
core/src/main/java/google/registry/bsa/api/NonBlockedDomain.java
line 27 at r4 (raw file):
Previously, CydeWeys (Ben McIlwain) wrote…
UnblockableDomain
makes more sense to me here as a name.
Will address in a future PR.
core/src/main/java/google/registry/bsa/api/Order.java
line 42 at r1 (raw file):
Previously, github-advanced-security[bot] wrote…
Missing catch of NumberFormatException
Potential uncaught 'java.lang.NumberFormatException'.
Addressed in a follow-up PR
core/src/test/java/google/registry/bsa/api/LabelTest.java
line 25 at r1 (raw file):
Previously, github-advanced-security[bot] wrote…
Unused classes and interfaces
Unused class: LabelTest is not referenced within this codebase. If not used as an external API it should be removed.
Github check gone wrong
. This is test.
core/src/test/java/google/registry/bsa/api/NonBlockedDomainTest.java
line 24 at r1 (raw file):
Previously, github-advanced-security[bot] wrote…
Unused classes and interfaces
Unused class: NonBlockedDomainTest is not referenced within this codebase. If not used as an external API it should be removed.
Github check gone wrong. This is test.
core/src/test/java/google/registry/bsa/api/OrderTest.java
line 24 at r1 (raw file):
Previously, github-advanced-security[bot] wrote…
Unused classes and interfaces
Unused class: OrderTest is not referenced within this codebase. If not used as an external API it should be removed.
Github check gone wrong. This is test.
db/src/main/resources/sql/schema/db-schema.sql.generated
line 88 at r4 (raw file):
Previously, CydeWeys (Ben McIlwain) wrote…
I was thinking it might be useful for manual queries or dashboards to do analytics on the list (to join to the exact
Domain
row in question easily). You don't have access to ``ForeignKeyUtils.load()` from a SQL query sadly; it gets much more complicated (need to concatenate domain + label, join to the Domain table based on that, sort by create time, pick the most recent, etc.).But maybe we wouldn't use this table for that, and just use the
BsaLabel
table? The only reason we really need this table is to keep track of what we already sent to the BSA API, right?
Yes, this table is just to track what we sent to BSA.
If we want to do a present-time analysis we can just join the "Domain" table with the "BsaLabel" table, with a deletion_time > now()
filter.
core/src/main/java/google/registry/bsa/persistence/BsaDomainInUse.java
line 29 at r4 (raw file):
Previously, weiminyu (Weimin Yu) wrote…
I just want a name that is distinguishable from the one used for communicating with BSA. We can change the name of this class, but changing the name of the underlying
schema table would be time-consuming.And yes, invalid (due to idn) domains are not tracked here. See the Reason enum below: it doesn't have
INVALID
, which is present in the API object.
Started the renaming process. New table checked in.
core/src/main/java/google/registry/module/backend/BackendRequestComponent.java
line 105 at r3 (raw file):
Previously, ptkach (Pavlo Tkach) wrote…
Yeah, I can do it
Fixed
5744d8f
to
065fc66
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 44 files reviewed, 12 unresolved discussions (waiting on @CydeWeys, @Github-advanced-security[bot], and @ptkach)
core/src/main/java/google/registry/bsa/BlockList.java
line 18 at r4 (raw file):
Previously, weiminyu (Weimin Yu) wrote…
These enums are used in multiple classes, and there is not an obvious choice of enclosing class for each of them.
Unless you want aDefinitions
interface or class to hold all of them.
We can have a refactor PR for naming etc when all is done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 44 files reviewed, 21 unresolved discussions (waiting on @Github-advanced-security[bot], @ptkach, and @weiminyu)
core/src/main/java/google/registry/bsa/BlockListFetcher.java
line 78 at r6 (raw file):
errorDetails = new String(ByteStreams.toByteArray(errorStream), UTF_8); } catch (Exception e) { // ignore
Set errorDetails
in here to "Couldn't get error details from connection"
(or whatever else is correctly descriptive), just so that the exception message below has the full info.
And there's really nothing of value on e
here ever? Not its message or anything?
core/src/main/java/google/registry/bsa/BlockListFetcher.java
line 94 at r6 (raw file):
} static class LazyBlockList implements Closeable {
I guess Lazy<BlockList>
wasn't good enough?
core/src/main/java/google/registry/bsa/BsaDownloadAction.java
line 69 at r6 (raw file):
try { if (!bsaLock.executeWithLock(this::runWithinLock)) { logger.atInfo().log("Job is being executed by another worker.");
Error message should say the action being taken. E.g.: "Job is being executed by another worker; terminating."
core/src/main/java/google/registry/bsa/BsaDownloadAction.java
line 74 at r6 (raw file):
logger.atWarning().withCause(throwable).log("Failed to update block lists."); } // Always return OK. Let the next cron job retry.
How often is the cron entry going to run this? Might we want to retry a few times in the case of an error rather than waiting for the next at least, what, hour?
core/src/main/java/google/registry/bsa/BsaDownloadAction.java
line 81 at r6 (raw file):
Optional<DownloadSchedule> scheduleOptional = downloadScheduler.schedule(); if (!scheduleOptional.isPresent()) { logger.atInfo().log("Nothing to do.");
Log message needs more clarity. Why might there be nothing to do and what is happening as a result? Is this a configuration thing, or what causes this case?
core/src/main/java/google/registry/bsa/BsaDownloadAction.java
line 85 at r6 (raw file):
} DownloadSchedule schedule = scheduleOptional.get(); switch (schedule.stage()) {
So this, what, keeps running over and over again, ratcheting through the stages and updating the progress each time? And this supports resuming from the previous stage it left off at I assume?
core/src/main/java/google/registry/bsa/BsaLock.java
line 23 at r6 (raw file):
import org.joda.time.Duration; /** Helper for guarding all BSA related work with a common lock. */
So this is a global singleton lock? If so, mention that here.
core/src/main/java/google/registry/bsa/api/BsaCredential.java
line 43 at r6 (raw file):
import org.joda.time.Instant; /** Self-refreshing credential for accessing the BSA API. */
Add more documentation here about what the credential consists of and how/why it self-refreshes (e.g. because it expires, and also possibly other reasons?).
core/src/main/java/google/registry/bsa/api/BsaException.java
line 17 at r6 (raw file):
package google.registry.bsa.api; public class BsaException extends RuntimeException {
Why are they all RuntimeException
s?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 44 files reviewed, 26 unresolved discussions (waiting on @Github-advanced-security[bot], @ptkach, and @weiminyu)
core/src/main/java/google/registry/bsa/BsaDownloadAction.java
line 87 at r6 (raw file):
switch (schedule.stage()) { case DOWNLOAD: try (LazyBlockList block = blockListFetcher.fetch(BLOCK);
You'll probably want to break these out into helper methods but I assume that comes in a subsequent PR.
core/src/main/java/google/registry/bsa/api/BsaCredential.java
line 88 at r6 (raw file):
ensureAuthTokenValid(); } catch (IOException e) { throw new BsaException(e, /* retriable= */ true);
Maybe there's a case here for a subclass of BsaException
, something like RetriableBsaException
or TransientBsaException
? Rather than just having it as a parameter to the constructor?
core/src/main/java/google/registry/bsa/api/BsaCredential.java
line 123 at r6 (raw file):
errorDetails = new String(ByteStreams.toByteArray(errorStream), UTF_8); } catch (Exception e) { // ignore
This always makes me a bit wary ... what is the case that's being ignored here?
core/src/main/java/google/registry/bsa/api/Label.java
line 28 at r6 (raw file):
*/ @AutoValue public abstract class Label {
Label
is quite generic. Why not BsaLabel
? Is that because it overloads the DB table name?
core/src/main/java/google/registry/bsa/api/NonBlockedDomain.java
line 27 at r4 (raw file):
Previously, weiminyu (Weimin Yu) wrote…
Will address in a future PR.
OK
core/src/main/java/google/registry/bsa/api/Order.java
line 27 at r6 (raw file):
*/ @AutoValue public abstract class Order {
Perhaps BsaOrder
?
core/src/main/java/google/registry/bsa/persistence/DownloadSchedule.java
line 30 at r6 (raw file):
import java.util.Optional; /** Information needed when handling a download from BSA. */
Can you expand this documentation? This class is not 100% clear.
core/src/main/java/google/registry/config/files/default-config.yaml
line 620 at r6 (raw file):
# Max time period during which downloads can be skipped because checksums have # not changed from the previous one. bsaMaxNopIntervalHours: 24
Nop
? Should be Noop
perhaps?
core/src/main/java/google/registry/model/tld/Tld.java
line 577 at r6 (raw file):
/** Returns the time when this TLD was enrolled in the Brand Safety Alliance (BSA) program. */ @JsonIgnore // Annotation can be removed once we add the field and annotate it.
So is this going to be removed or not? I think it should be.
core/src/main/java/google/registry/bsa/BlockList.java
line 18 at r4 (raw file):
Previously, weiminyu (Weimin Yu) wrote…
We can have a refactor PR for naming etc when all is done.
Yeah it's fine, this is low importance obviously and trivial to refactor later on. It's just that there's quite a lot of different files floating around and I think some things could be consolidated eventually.
core/src/main/java/google/registry/bsa/persistence/BsaDomainInUse.java
line 29 at r4 (raw file):
Previously, weiminyu (Weimin Yu) wrote…
Started the renaming process. New table checked in.
Thank you!
Add the BsaDomainRefresh class which tracks the refresh actions. The refresh actions checks for changes in the set of registered and reserved domains, which are called unblockables to BSA.
065fc66
to
1660eae
Compare
Add a skeletion download action.
Also added a downloader (BlockListFetcher) and a GcsClient.
This change is