Skip to content

Conversation

@j-coll
Copy link
Member

@j-coll j-coll commented Jul 30, 2025

No description provided.

@j-coll j-coll requested a review from pfurio July 30, 2025 07:25
@halender
Copy link
Contributor

@j-coll j-coll requested a review from Copilot July 30, 2025 10:15
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This pull request implements functionality to load multiple VCF files with the same name (TASK-6346). The solution handles file name duplication by tracking duplicated file names in metadata and using full file paths as identifiers when conflicts occur.

  • Adds duplicated file name tracking in metadata management
  • Implements file path-based storage for files with duplicate names
  • Provides new administrative tools for file/workspace URI management

Reviewed Changes

Copilot reviewed 52 out of 52 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
opencga-storage/opencga-storage-hadoop/opencga-storage-hadoop-core/src/test/java/org/opencb/opencga/storage/hadoop/variant/HadoopVariantStorageEngineDuplicatedFileNameTest.java New test class extending base duplicated file name tests for Hadoop storage
opencga-storage/opencga-storage-hadoop/opencga-storage-hadoop-core/src/main/java/org/opencb/opencga/storage/hadoop/variant/metadata/HBaseFileMetadataDBAdaptor.java Updates file metadata storage to handle duplicated names in HBase
opencga-storage/opencga-storage-core/src/main/java/org/opencb/opencga/storage/core/metadata/VariantStorageMetadataManager.java Core logic for handling file registration with duplicate names and metadata management
opencga-storage/opencga-storage-core/src/main/java/org/opencb/opencga/storage/core/metadata/models/FileMetadata.java Adds duplicatedName field to track original name for duplicated files
opencga-core/src/main/java/org/opencb/opencga/core/models/file/FileUriChangeParam.java New model for file URI change operations
opencga-catalog/src/main/java/org/opencb/opencga/catalog/managers/FileManager.java Implements file URI update functionality with validation
opencga-app/src/main/java/org/opencb/opencga/app/cli/main/executors/FilesCommandExecutor.java CLI executor for file URI update operations

* Name of the file, if it is duplicated.
* If this is not null, then the file is a duplicated file.
* The name of the original file is stored here.
*/
Copy link

Copilot AI Jul 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The duplicatedName field should have a @JsonProperty annotation or similar to ensure proper serialization/deserialization, especially given this is a metadata field that will be persisted.

Suggested change
*/
*/
@com.fasterxml.jackson.annotation.JsonProperty

Copilot uses AI. Check for mistakes.
}
}
} else {
getFileIdOrDuplicated(studyId, filePath);
Copy link

Copilot AI Jul 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line appears to call getFileIdOrDuplicated but doesn't use the return value. This looks like a leftover from development that should be removed as it serves no purpose.

Suggested change
getFileIdOrDuplicated(studyId, filePath);

Copilot uses AI. Check for mistakes.
Comment on lines 4025 to 4030
if (Files.exists(oldPath)) {
throw new CatalogException("File " + param.getOriginal() + " still exists in the path.");
}
if (Files.notExists(newPath)) {
throw new CatalogException("File " + param.getUpdated() + " not found in the path.");
}
Copy link

Copilot AI Jul 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file existence checks should be performed with proper error handling for race conditions. Between checking if oldPath exists and newPath exists, the file system state could change, potentially leading to inconsistent behavior.

Suggested change
if (Files.exists(oldPath)) {
throw new CatalogException("File " + param.getOriginal() + " still exists in the path.");
}
if (Files.notExists(newPath)) {
throw new CatalogException("File " + param.getUpdated() + " not found in the path.");
}
try {
if (Files.isRegularFile(oldPath)) {
throw new CatalogException("File " + param.getOriginal() + " still exists in the path.");
}
if (!Files.isRegularFile(newPath)) {
throw new CatalogException("File " + param.getUpdated() + " not found in the path.");
}
} catch (IOException e) {
throw new CatalogException("Error accessing file paths: " + e.getMessage(), e);
}

Copilot uses AI. Check for mistakes.
OpenCGAResult<Long> countResult = count(clientSession, tmpQuery);
if (countResult.getNumMatches() > 0) {
throw new CatalogDBException("There already exists another file under path '" + desiredPath + "'.");
if (parameters.containsKey(QueryParams.URI.key()) && parameters.containsKey(QueryParams.PATH.key())) {
Copy link

Copilot AI Jul 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The validation that prevents setting both URI and PATH simultaneously should be documented in the method javadoc to clarify this constraint for API consumers.

Copilot uses AI. Check for mistakes.
}
fileIdCache.clear();
fileNameCache.clear();
fileId = DUPLICATED_NAME_ID;
Copy link

Copilot AI Jul 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The magic constant DUPLICATED_NAME_ID (-999) is used throughout the code but could benefit from being defined as a properly documented constant with its meaning and usage clearly explained.

Copilot uses AI. Check for mistakes.
Comment on lines +163 to +164
if (!configuration.getWorkspace().contains(newInstallationDirectory)
&& !newInstallationDirectory.contains(configuration.getWorkspace())) {
Copy link

Copilot AI Jul 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The workspace validation using string contains() is insufficient for path security. This could allow path traversal attacks. Use proper path canonicalization and validation instead.

Suggested change
if (!configuration.getWorkspace().contains(newInstallationDirectory)
&& !newInstallationDirectory.contains(configuration.getWorkspace())) {
java.nio.file.Path workspacePath = java.nio.file.Paths.get(configuration.getWorkspace()).toRealPath();
java.nio.file.Path newDirectoryPath = java.nio.file.Paths.get(newInstallationDirectory).toRealPath();
if (!newDirectoryPath.startsWith(workspacePath)) {

Copilot uses AI. Check for mistakes.
@j-coll j-coll requested a review from jtarraga September 22, 2025 08:11
@jtarraga jtarraga requested a review from Copilot October 3, 2025 08:49
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 52 out of 52 changed files in this pull request and generated no new comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

5 participants