Skip to content
This repository has been archived by the owner on Oct 12, 2021. It is now read-only.

Support table parameters #45

Merged
merged 11 commits into from
Nov 20, 2019
Merged

Support table parameters #45

merged 11 commits into from
Nov 20, 2019

Conversation

max-jacobs
Copy link
Contributor

Making a small update for SY to support the latest Circus Train and to allow skipping CT's housekeeping process.

Main change is in CircusTrainRunner


@Configuration
@ConfigurationProperties
public class OrphanedDataStrategyConfig {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so we need a suffix on this one since it's not the actual OrphanedDataStrategy. I see we have one other class in this package called EventReceiverConfiguration - should we follow the naming convention and change Config->Configuration?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes good point, one day I'll name these config classes appropriately...

import com.hotels.bdp.circustrain.api.conf.OrphanedDataStrategy;

@Configuration
@ConfigurationProperties
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this need a prefix "orphaned-data-strategy" to get bound to the value from the YAML?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so because in this case there are no "sub" properties. Spring is clever enough to know to map the yaml property "orphaned-data-strategy" to the field in here called "orphanedDataStrategy". I've tested this out on EMR doing an actual replication and it works. Shame there's no integration tests in SY like in CT to test out different config settings.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think can be separate we already have an issue: #5

@@ -147,6 +164,7 @@ The table below describes all the available configuration values for Shunting Ya
|`sqs.queue`|Yes|Fully qualified URI of the [AWS SQS](https://aws.amazon.com/sqs/) Queue to read the Hive events from.|
|`sqs.wait.time.seconds`|No|Wait time in seconds for which the receiver will poll the SQS queue for a batch of messages. Default is 10 seconds. Read more about long polling with AWS SQS [here](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-long-polling.html).|
|`source-table-filter.table-names`|No|A list of tables selected for Shunting Yard replication. Supported format: `database_1.table_1, database_2.table_2`. If these are not provided, Shunting Yard will not replicate any tables.|
|`orphaned-data-strategy`|No|Orphaned data strategy to use for replications. Possible values: `NONE` and `HOUSEKEEPING`. Default is `HOUSEKEEPING`.|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm missing where this default happens in the code? I see you've added the ability to set an orphaned data strategy but is there a test (maybe an existing one so it doesn't show up in the PR) that shows that if you don't set this then the value is HOUSEKEEPING?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this happens in OrphanedDataStrategyConfig line 26 the default is if no yaml configuration overrides this initial setting.

So there's no test for this because i guess it would require an integration test of sorts in order to test the mapping from yaml to bean, and we don't have these in SY at the moment. should I make it part of this PR to add these?

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Show resolved Hide resolved
README.md Outdated
@@ -163,6 +181,18 @@ Graphite configuration can be passed to Shunting Yard using an optional `--ct-co
namespace: com.company.shuntingyard
prefix: dev

### Configuring table transformations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section in not required to be added coz it's already available in CT's readme.

Instead, I suggest we add a section after Housekeeping section below titled "Using Beekeeper for Housekeeping" and then provide the table-properties that need to be set in order to activate Beekeeper for your target tables / refer to the README of Beekeeper where you mention it. Similar to how its mentioned for configuring Housekeeping: https://github.com/ExpediaGroup/shunting-yard/blob/master/README.md#sample-ct-configyml-for-housekeeping

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will update this

@@ -96,6 +96,7 @@ public void withMultiValuePartitionFilter() {
assertThat(replication.getReplicaTable().getDatabaseName()).isEqualTo("replicaDatabaseName");
assertThat(replication.getReplicaTable().getTableName()).isEqualTo("replicaTableName");
assertThat(replication.getReplicaTable().getTableLocation()).isEqualTo("replicaTableLocation");
assertThat(replication.getOrphanedDataStrategy()).isEqualTo(OrphanedDataStrategy.HOUSEKEEPING);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be good to add a test when OrphanedDataStrategy is null.

The CircusTrainConfig doesn't do a null check on this argument at the moment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, will add this in, thanks

max-jacobs and others added 5 commits October 22, 2019 11:39
Co-Authored-By: Abhimanyu Gupta <[email protected]>
Co-Authored-By: Abhimanyu Gupta <[email protected]>
Co-Authored-By: Abhimanyu Gupta <[email protected]>
Copy link
Contributor

@massdosage massdosage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, let's just leave this to be updated and merged once we've released a new version of CT.

@max-jacobs
Copy link
Contributor Author

This now uses the latest Circus Train version, also tested this on EMR and it works, so going to merge.

@max-jacobs max-jacobs merged commit d77cca3 into master Nov 20, 2019
@max-jacobs max-jacobs deleted the support-table-parameters branch November 20, 2019 16:29
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants