Add purge commands to CLI tool #373

Merged: 27 commits from feat/cli-purge-data into main, Sep 27, 2023.

Commits (27):
a3ddd2c  Add helper types for use as flags (TylerHendrickson, Sep 25, 2023)
13def00  Add cli tool to purge prepared s3 data (TylerHendrickson, Sep 25, 2023)
3dd50aa  Add cli tool to purge prepared DDB table data (TylerHendrickson, Sep 25, 2023)
3118a8f  Allow scan parallization in "purge prepared-data-table" cli tool (TylerHendrickson, Sep 25, 2023)
f2ad486  Allow pausing between S3 uploads during ffis-import CLI command (TylerHendrickson, Sep 25, 2023)
96e0809  Support --dry-run when purging DDB items (TylerHendrickson, Sep 25, 2023)
a60a69a  Simulate deletion during --dry-run (TylerHendrickson, Sep 25, 2023)
2ec6471  Remove QuietMode=true from S3 DeleteObjects operation (TylerHendrickson, Sep 25, 2023)
77d593c  Debug item preparation (TylerHendrickson, Sep 25, 2023)
bb5d14a  Call stop() on fatal error (TylerHendrickson, Sep 25, 2023)
813567d  Fix channel close (TylerHendrickson, Sep 25, 2023)
5c84500  Ensure batchedRequests channel is closed (TylerHendrickson, Sep 25, 2023)
b429596  Preserve PK for PutItem attributes (TylerHendrickson, Sep 25, 2023)
17b8527  Change debug log placement (TylerHendrickson, Sep 25, 2023)
cc839d1  Keep totals inside goroutines (TylerHendrickson, Sep 25, 2023)
3b8b47e  Add debug log for prepared PutRequest results (TylerHendrickson, Sep 25, 2023)
f354707  Temporarily limit scans to 1 item for testing (TylerHendrickson, Sep 25, 2023)
d8fa5e2  Log final count of purged DDB items (TylerHendrickson, Sep 25, 2023)
ed903b3  Temporarily limit scans to 10 items for testing (TylerHendrickson, Sep 25, 2023)
e456b34  Remove artificial scan limits from testing (TylerHendrickson, Sep 25, 2023)
c85fad5  Add stream-detection safety check (TylerHendrickson, Sep 25, 2023)
40342ea  Add stream-detection safety check (TylerHendrickson, Sep 25, 2023)
4852a0b  Refactor stream safety check (TylerHendrickson, Sep 25, 2023)
d80c57d  Refactor flag names for clarity (TylerHendrickson, Sep 26, 2023)
8b230fa  Add detailed --help text for purge command (TylerHendrickson, Sep 26, 2023)
fd6f4b3  Merge branch 'main' into feat/cli-purge-data (TylerHendrickson, Sep 26, 2023)
85a882a  Merge branch 'main' into feat/cli-purge-data (TylerHendrickson, Sep 27, 2023)
cli/grants-ingest/ffisImport/cmd.go (16 changes: 11 additions & 5 deletions)

@@ -3,6 +3,7 @@ package ffisImport
 import (
 	"context"
 	"fmt"
+	"time"

 	"github.com/alecthomas/kong"
 	"github.com/aws/aws-sdk-go-v2/service/s3"
@@ -16,11 +17,12 @@ type Cmd struct {
 	S3Bucket string `arg:"" name:"bucket" help:"Destination S3 bucket name"`

 	// Flags
-	S3Prefix       string `name:"s3-prefix" help:"Path prefix for mapped S3 keys" default:"sources"`
-	S3DateLayout   string `name:"s3-date-layout" help:"Date layout for mapped S3 keys" default:"2006/01/02"`
-	S3Suffix       string `name:"s3-suffix" help:"Path suffix for mapped S3 keys" default:"ffis.org/download.xlsx"`
-	S3UsePathStyle bool   `name:"s3-use-path-style" help:"Use path-style addressing for S3 bucket"`
-	DryRun         bool   `help:"Dry run only - no files will be uploaded to S3"`
+	S3Prefix       string        `name:"s3-prefix" help:"Path prefix for mapped S3 keys" default:"sources"`
+	S3DateLayout   string        `name:"s3-date-layout" help:"Date layout for mapped S3 keys" default:"2006/01/02"`
+	S3Suffix       string        `name:"s3-suffix" help:"Path suffix for mapped S3 keys" default:"ffis.org/download.xlsx"`
+	S3UsePathStyle bool          `name:"s3-use-path-style" help:"Use path-style addressing for S3 bucket"`
+	DryRun         bool          `help:"Dry run only - no files will be uploaded to S3"`
+	Wait           time.Duration `help:"Duration to wait between uploads" default:"0s"`
 }

 func (cmd *Cmd) Help() string {
@@ -58,6 +60,10 @@ func (cmd *Cmd) Run(app *kong.Kong, logger *log.Logger) error {
 		logger := log.WithSuffix(*logger,
 			"source", src, "destination", fmt.Sprintf("s3://%s/%s", cmd.S3Bucket, dst),
 			"progress", fmt.Sprintf("%d of %d", i+1, len(srcToDst)))
+		if i > 0 && cmd.Wait > 0 {
+			log.Info(logger, fmt.Sprintf("Pausing for %s before next upload...", cmd.Wait))
+			time.Sleep(cmd.Wait)
+		}
 		log.Debug(logger, "Uploading file to S3")
 		if !cmd.DryRun {
 			if err := uploadToS3(ctx, s3svc, cmd.S3Bucket, src, dst); err != nil {
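The hunk above pauses between uploads but skips the pause before the first one (`i > 0`), so a single upload incurs no delay. A minimal stdlib-only sketch of that throttling pattern is below; `processAll` and its parameters are illustrative names for this example, not helpers from the PR.

```go
package main

import (
	"fmt"
	"time"
)

// processAll visits each item in order, sleeping for wait between items
// but never before the first item, mirroring the i > 0 && wait > 0 guard
// used in the diff between S3 uploads.
func processAll(items []string, wait time.Duration, process func(string)) {
	for i, item := range items {
		if i > 0 && wait > 0 {
			fmt.Printf("Pausing for %s before next item...\n", wait)
			time.Sleep(wait)
		}
		process(item)
	}
}

func main() {
	var visited []string
	processAll([]string{"a", "b", "c"}, 10*time.Millisecond, func(s string) {
		visited = append(visited, s)
	})
	fmt.Println(visited)
}
```

Because kong parses `time.Duration` flags with Go's standard duration syntax, a value like `--wait=2s` or `--wait=500ms` slots directly into this loop.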
cli/grants-ingest/main.go (14 changes: 6 additions & 8 deletions)

@@ -8,6 +8,7 @@ import (
 	"github.com/go-kit/log/level"
 	"github.com/posener/complete"
 	"github.com/usdigitalresponse/grants-ingest/cli/grants-ingest/ffisImport"
+	"github.com/usdigitalresponse/grants-ingest/cli/grants-ingest/purgeData"
 	"github.com/usdigitalresponse/grants-ingest/internal/log"
 	"github.com/willabides/kongplete"
 )
@@ -36,17 +37,14 @@ type CLI struct {
 	Globals

 	FFISImport ffisImport.Cmd `cmd:"ffis-import" help:"Import FFIS spreadsheets to S3."`
+	Purge      purgeData.Cmd  `cmd:"purge" help:"Purge data from various locations."`

 	Completion kongplete.InstallCompletions `cmd:"" help:"Install shell completions"`
 }

 func main() {
-	cli := CLI{
-		Globals: Globals{},
-	}
-
-	var logger log.Logger
-	parser := kong.Must(&cli,
+	parser := kong.Must(&CLI{Globals: Globals{}},
 		kong.Name("grants-ingest"),
 		kong.Description("CLI utility for the grants-ingest service."),
 		kong.UsageOnError(),
@@ -59,9 +57,9 @@ func main() {
 		kongplete.WithPredictor("dir", complete.PredictDirs("*")),
 	)

-	ctx, err := parser.Parse(os.Args[1:])
+	cli, err := parser.Parse(os.Args[1:])
 	parser.FatalIfErrorf(err)
-	if err := ctx.Run(&cli.Globals); err != nil {
-		ctx.Exit(1)
+	if err := cli.Run(); err != nil {
+		cli.Exit(1)
 	}
 }
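The main.go refactor above wires the new `purge` command into the CLI struct and lets the parsed context's `Run()` dispatch to whichever subcommand was selected. The sketch below illustrates that dispatch-by-name shape with the standard library only; it is not kong's actual API, and the command names and map are purely illustrative.

```go
package main

import (
	"errors"
	"fmt"
)

// dispatch looks up a subcommand by name and runs it, returning an error
// for unknown names. This is a simplified stand-in for what a CLI
// framework does after parsing arguments: select one registered command
// and invoke its Run method.
func dispatch(commands map[string]func() error, name string) error {
	cmd, ok := commands[name]
	if !ok {
		return errors.New("unknown command: " + name)
	}
	return cmd()
}

func main() {
	commands := map[string]func() error{
		"ffis-import": func() error { fmt.Println("running ffis-import"); return nil },
		"purge":       func() error { fmt.Println("running purge"); return nil },
	}
	if err := dispatch(commands, "purge"); err != nil {
		fmt.Println("error:", err)
	}
}
```

Registering a new top-level command in the real CLI is the one-line struct-field addition shown in the diff; kong derives the command tree from the struct tags.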
cli/grants-ingest/purgeData/cmd.go (new file, 55 additions)

@@ -0,0 +1,55 @@
package purgeData

import (
	"github.com/usdigitalresponse/grants-ingest/cli/grants-ingest/purgeData/preparedDataBucket"
	"github.com/usdigitalresponse/grants-ingest/cli/grants-ingest/purgeData/preparedDataTable"
)

type Cmd struct {
	// Sub-commands
	PreparedDataBucket preparedDataBucket.Cmd `cmd:"prepared-data-bucket" help:"Purge data from the prepared data S3 bucket."`
	PreparedDataTable  preparedDataTable.Cmd  `cmd:"prepared-data-table" help:"Purge data from the prepared data DynamoDB table."`
}

func (cmd *Cmd) Help() string {
	return `
This command serves as the entrypoint for subcommands that purge specific types of data from
an environment, especially data from a particular source (e.g. Grants.gov or FFIS.org),
presumably before running a backfill operation that restores the purged data. These operations
may prove useful after a bug has corrupted data, or when downstream consumers of published
events require new events to be published (although the latter should be avoided in favor of
alternatives, especially if the number of such consumers grows over time, in order to keep the
event-publishing behaviors of this service consistent with its documentation).

In most cases, purge operations that target Grants.gov data are the easiest and most effective
to orchestrate. The following serves as a runbook for these scenarios:
1. Disable any DynamoDB-based streams/triggers. Although not strictly necessary, this step
   may be useful in scenarios where quick restoration is warranted, and it keeps data which
   is known to be corrupt from entering the stream in the first place.
2. Purge Grants.gov data from S3 by running the prepared-data-bucket subcommand with the
   --purge-gov option.
3. Purge Grants.gov data from DynamoDB by running the prepared-data-table subcommand with the
   --purge-gov option.
4. Restore the DynamoDB-based streams/triggers disabled in Step 1.
5. Trigger a Lambda execution to re-ingest the purged data (or wait for the Lambda execution
   to run according to its schedule).

In addition to the workflow enumerated above, subcommands may be executed on an as-needed basis.
Given the destructive nature of these operations, please exercise caution, especially when dealing
with Production and other shared environments' data.

Observe the following recommended best practices:
- Always test workflows involving these commands against lower environments.
- Use the --dry-run and --log-level=debug options when running against sensitive environments.
- Consider writing bash scripts that can be peer-reviewed to guard against human error.
- Make backups of important data before performing destructive actions.
- Always communicate explicitly before running these commands against sensitive and/or shared
  environments.

Finally, note that these operations are optimized for disaster recovery and other time-sensitive
scenarios rather than for cost. Testing is of course encouraged, but be aware that scan-type
operations like these can be costly when run against large data sets. Therefore, avoid running
these commands in a scheduled or automated fashion that could result in an unbounded number of
executions.`
}
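The help text above leans on --dry-run as the primary safety net. The core of that pattern is a guard around the destructive step: matching items are still scanned and counted, but nothing is deleted unless the flag is off. A minimal stdlib-only sketch follows; `purgeItems` and its parameters are illustrative names for this example, not the PR's actual implementation (which operates on S3 objects and DynamoDB items via the AWS SDK).

```go
package main

import (
	"fmt"
	"strings"
)

// purgeItems scans keys, applies a match predicate, and deletes matches
// via deleteFn unless dryRun is set, in which case it only logs what it
// would delete. It returns the number of matched items either way, so a
// dry run reports the same count a real run would purge.
func purgeItems(keys []string, matches func(string) bool, dryRun bool, deleteFn func(string)) int {
	purged := 0
	for _, k := range keys {
		if !matches(k) {
			continue
		}
		if dryRun {
			fmt.Println("[dry-run] would delete", k)
		} else {
			deleteFn(k)
		}
		purged++
	}
	return purged
}

func main() {
	keys := []string{"grants.gov/opp-1", "ffis.org/sheet-2", "grants.gov/opp-3"}
	isGov := func(k string) bool { return strings.HasPrefix(k, "grants.gov/") }
	n := purgeItems(keys, isGov, true, func(string) {})
	fmt.Println("matched:", n)
}
```

Running a dry pass first and comparing its count against the subsequent real run is a cheap consistency check before touching a shared environment.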