[api] delete file #1122

Open: wants to merge 18 commits into main

@MalinAhlberg (Contributor) commented Nov 7, 2024

Related issue(s) and PR(s)
This PR closes #1134.

Description
This PR adds delete-file functionality to the API component. It deletes the file from the inbox and adds a new event to the file event log, setting the file status to disabled.

It also adds the fileID to the output of the API's list endpoint, since that field is needed in order to delete a file:

curl -H "Authorization: Bearer $token" "http://localhost:8090/users/[email protected]/files" | jq .
[
  {
    "fileID": "d32f7117-bb85-40e2-9c52-101bf9c1ca5a",
    "inboxPath": "test_dummy.org/race_file.c4gh",
    "fileStatus": "ready",
    "createAt": "2024-11-19T10:50:07.836513Z"
  },
  {...

How to test
Run `make build-all`, then `PR_NUMBER=$(date +%F) docker compose -f .github/integration/sda-s3-integration.yml run integration_test`.
List the files (e.g. with http://localhost:8090/users/[email protected]/files) and check that files in the inbox can be deleted and that archived files cannot.

@MalinAhlberg force-pushed the feature/api-delete-file branch 8 times, most recently from b31bc38 to 446a1ff, November 21, 2024 10:22
@MalinAhlberg marked this pull request as ready for review November 21, 2024 10:22
@MalinAhlberg requested a review from a team November 21, 2024 10:25

@kostas-kou (Contributor) left a comment

Good job. I only have some minor comments.

sda/cmd/api/api.go (outdated, resolved)
sda/internal/database/database.go (resolved)
.github/integration/tests/sda/60_api_admin_test.sh (outdated, resolved)
sda/cmd/api/api.md (resolved)

@MalinAhlberg (Contributor, Author) commented Nov 27, 2024

Great comments, @kostas-kou! Fixed most of them in 92903a6, but left this one for others to see (update: solved during stand-up).

@kostas-kou previously approved these changes Nov 28, 2024
@pahatz previously approved these changes Nov 28, 2024

@pahatz (Contributor) left a comment

Great work, especially on the tests, which are very extensive.
I don't have any significant remarks on the PR.

sda/cmd/api/api.go (outdated, resolved)
sda/internal/database/db_functions.go (resolved)

@MalinAhlberg (Contributor, Author)

@kostas-kou and @pahatz, thanks for your reviews! I have fixed the weird comment, rebased on main, and rebased again to get rid of the fixup commits. Only 3ac7e5e is new; the rest is the same as when you reviewed.

@MalinAhlberg (Contributor, Author)

...and added 8a745c4 for the RBAC.

@pahatz previously approved these changes Nov 29, 2024
@aaperis previously approved these changes Dec 2, 2024

@aaperis (Contributor) left a comment

Haven't tested, but it looks good. Tiny minor comments :-)

Comment on lines +44 to +53
last_event=$(psql -U postgres -h postgres -d sda -At -c "SELECT event FROM sda.file_event_log WHERE file_id='$fileid' order by started_at desc limit 1;")

if [ "$last_event" != "disabled" ]; then
echo "The file $fileid does not have the expected last event 'disabled', but $last_event."
exit 1
fi
Contributor

Maybe it is too much, but shouldn't the test also check at this point that the file was deleted from the S3 bucket?

@MalinAhlberg (Contributor, Author), Dec 2, 2024

Yes... it should. Fixed in 109e7e2.

- `/file/:username/*fileid`
- accepts `DELETE` requests
- marks the file as `disabled` in the database, and deletes it from the inbox.
- The file is identfied by its id, returned by `users/:username/:files`
Contributor

Suggested change
- The file is identfied by its id, returned by `users/:username/:files`
- The file is identified by its id, returned by `users/:username/:files`

}

submissionUser := c.Param("username")
log.Warn("submission user:", submissionUser)
Contributor

Why log.Warn and not log.Debug or log.Info?

@MalinAhlberg (Contributor, Author), Dec 2, 2024

No reason 😬 ... Changed in 109e7e2


fileID := c.Param("file")
fileID = strings.TrimPrefix(fileID, "/")
log.Warn("submission file:", fileID)
Contributor

Why log.Warn?

@MalinAhlberg (Contributor, Author), Dec 2, 2024

No reason. Changed in 109e7e2

@jbygdell (Collaborator) left a comment

More specific comments are coming

@jbygdell (Collaborator) left a comment

There are a few things here that we need to think more about.

  • In FEGA, the submitter can trigger ingestion of an uploaded file. That means we also need to be able to remove a file from both the inbox and the archive during a cleanup operation, as well as from the backup site if one is configured.
  • The API can also be used by the submitter, which means we sometimes can't reuse the same functions or structs, since that might leak information the submitter should not be able to see.

{
"role": "admin",
"path": "/file/*",
"action": "(DELETE)"
Collaborator

Suggested change
"action": "(DELETE)"
"action": "DELETE"

Single action doesn't require parentheses
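A quick way to see why the parentheses are redundant, assuming the policy `action` is applied as a regular expression to the request method (as casbin's `regexMatch` does; whether this repo's model uses exactly that matcher is an assumption here). `actionAllowed` is a hypothetical helper built on Go's `regexp.MatchString`:

```go
package main

import (
	"fmt"
	"regexp"
)

// actionAllowed treats the policy action as a regular expression and matches
// it against the request method. An invalid pattern never matches.
func actionAllowed(method, policyAction string) bool {
	ok, err := regexp.MatchString(policyAction, method)
	return err == nil && ok
}

func main() {
	for _, p := range []string{"(DELETE)", "DELETE", "(GET)|(POST)"} {
		// Columns: pattern, matches DELETE, matches GET.
		fmt.Println(p, actionAllowed("DELETE", p), actionAllowed("GET", p))
	}
}
```

Parentheses only matter for grouping alternatives such as `(GET)|(POST)`. Note that `regexp.MatchString` is unanchored, so `^DELETE$` would be the strict form of the pattern.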

@@ -54,6 +56,10 @@ func main() {
if err != nil {
log.Fatal(err)
}
Conf.API.INBOX, err = storage.NewBackend(Conf.Inbox)
if err != nil {
log.Fatal(err)
Collaborator

log.Fatal is the equivalent of log.Error followed by os.Exit(1), which we shouldn't do once we have established connections to the MQ and DB.

The proper way is to have a shutdown function that ensures the external connections get closed before the application exits.

@@ -20,6 +20,11 @@
"path": "/file/accession",
"action": "POST"
},
{
"role": "admin",
Collaborator

Suggested change
"role": "admin",
"role": "submission",

Removing files from the inbox is something the helpdesk should be able to do.

@@ -100,6 +106,7 @@ func setup(config *config.Config) *http.Server {
r.POST("/c4gh-keys/add", rbac(e), addC4ghHash) // Adds a key hash to the database
r.GET("/c4gh-keys/list", rbac(e), listC4ghHashes) // Lists key hashes in the database
r.POST("/c4gh-keys/deprecate/*keyHash", rbac(e), deprecateC4ghHash) // Deprecate a given key hash
r.DELETE("/file/:username/*file", rbac(e), deleteFile) // Delete a file from inbox
Collaborator

The *file here will catch a full path, i.e. /path/to/file.c4gh. If we want those semantics, we should probably call it filePath.

If we don't intend to use the file path but instead use the file ID (to cope with BPs' expected upload structure), the *file should be :file or :fileID, as we only expect a single value (no slashes).
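Even with a single-segment parameter it may be worth rejecting anything that looks like a path. A minimal sketch that works for both a catch-all value (which in gin keeps its leading slash, hence the `TrimPrefix` in the PR) and a plain `:fileID` value. It assumes file IDs are always UUIDs like the `fileID` in the listing example; that is an assumption, not something the PR states, and `parseFileID` is a hypothetical helper:

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

// uuidRe matches the canonical 8-4-4-4-12 hex form. Treating every file ID
// as a UUID is an assumption based on the listing example.
var uuidRe = regexp.MustCompile(`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)

// parseFileID strips an optional leading slash and rejects paths and
// anything that is not a UUID.
func parseFileID(raw string) (string, error) {
	id := strings.TrimPrefix(raw, "/")
	if strings.Contains(id, "/") {
		return "", errors.New("expected a single file ID, got a path")
	}
	if !uuidRe.MatchString(id) {
		return "", fmt.Errorf("%q is not a valid file ID", id)
	}
	return id, nil
}

func main() {
	for _, raw := range []string{"/d32f7117-bb85-40e2-9c52-101bf9c1ca5a", "/path/to/file.c4gh"} {
		id, err := parseFileID(raw)
		fmt.Println(id, err)
	}
}
```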

Comment on lines +312 to +314
filePath := ""
// Get the file path from the fileID and submission user
if filePath, err = Conf.API.DB.GetInboxFilePathFromID(submissionUser, fileID); err != nil {
Collaborator

Suggested change
filePath := ""
// Get the file path from the fileID and submission user
if filePath, err = Conf.API.DB.GetInboxFilePathFromID(submissionUser, fileID); err != nil {
// Get the file path from the fileID and submission user
filePath, err := Conf.API.DB.GetInboxFilePathFromID(submissionUser, fileID)
if err != nil {

// The deleteFile function deletes files from the inbox and marks them as
// discarded in the db. Files are identified by their ids and the user id.
func deleteFile(c *gin.Context) {

Collaborator

Suggested change

}

// Requires a filepath instead of fileID
// Note: The remove fails randomly sometimes
Collaborator

Suggested change
// Note: The remove fails randomly sometimes

File removal should never fail randomly.
If this happens in the tests, it's because of timing issues in the S3 backend (metadata not yet stored when the deletion is initiated).

exit 1
fi
# delete it
resp="$(curl -s -k -L -o /dev/null -w "%{http_code}\n" -H "Authorization: Bearer $token" -H "Content-Type: application/json" -X DELETE "http://api:8080/file/[email protected]/$fileid")"
Collaborator

Suggested change
resp="$(curl -s -k -L -o /dev/null -w "%{http_code}\n" -H "Authorization: Bearer $token" -H "Content-Type: application/json" -X DELETE "http://api:8080/file/[email protected]/$fileid")"
resp="$(curl -s -k -L -o /dev/null -w "%{http_code}\n" -H "Authorization: Bearer $token" -X DELETE "http://api:8080/file/[email protected]/$fileid")"

Content type header not needed since we are not sending any payload.

@@ -46,6 +46,7 @@ type SyncData struct {
}

type SubmissionFileInfo struct {
FileID string `json:"fileID"`
Collaborator

This struct is also used when the submission user wants to list the files that have been uploaded and the FileID is not something that the submitter needs to see.

Comment on lines +88 to +90
"AND EXISTS (SELECT 1 FROM " +
"(SELECT event from sda.file_event_log where file_id = $2 order by started_at desc limit 1) " +
"as subquery WHERE event in ('registered', 'uploaded', 'submitted', 'ingested', 'error'))"
Collaborator

This will be a bit iffy, since in FEGA, if the submission user adds a file to a run by mistake, that file will be ingested, archived, backed up and given a stable ID (a case we will actually need to be able to handle).
If we want to limit this to files that have only been uploaded to the inbox, ingested cannot be a valid event, since by then the file will also be in the archive.

Instead, we probably need to check whether the file has ever had the state ingested, verified, archived or ready, and if so also remove it from the archive. If the file has had the state backed up, we also need to remove it from the backup site.
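The rule suggested above can be expressed as a check over the file's whole event history rather than only its latest event. A minimal sketch; `cleanupTargets` is hypothetical, the event names come from this thread, and `backed up` being the literal event string is an assumption. Inbox removal is assumed to always be attempted and is not modelled:

```go
package main

import "fmt"

// cleanupTargets reports whether a file must also be removed from the
// archive and from the backup site, based on every event it has ever had.
func cleanupTargets(history []string) (archive, backup bool) {
	for _, ev := range history {
		switch ev {
		case "ingested", "verified", "archived", "ready":
			archive = true
		case "backed up": // assumed literal event name
			backup = true
		}
	}
	return archive, backup
}

func main() {
	histories := [][]string{
		{"registered", "uploaded"},
		{"uploaded", "ingested", "error"},
		{"uploaded", "submitted", "ingested", "archived", "verified", "backed up", "ready"},
	}
	for _, h := range histories {
		archive, backup := cleanupTargets(h)
		fmt.Println("latest:", h[len(h)-1], "archive:", archive, "backup:", backup)
	}
}
```

With a latest-event-only condition, a file whose last event is `error` after `ingested` would pass as inbox-only even though it may already be in the archive; checking the full history catches that case.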

Successfully merging this pull request may close these issues.

[api] delete files from inbox
7 participants