[SNOW-1882616] Error out for duplicate keys in variant #929

Merged: 2 commits, Jan 16, 2025
@@ -78,7 +78,8 @@ class DataValidationUtil {

  private static final ObjectMapper objectMapper = new ObjectMapper();

-  private static final JsonFactory factory = new JsonFactory();
+  private static final JsonFactory factory =
+      new JsonFactory().configure(JsonGenerator.Feature.STRICT_DUPLICATE_DETECTION, true);

  // The version of Jackson we are using does not support serialization of date objects from the
  // java.time package. Here we define a module with custom java.time serializers. Additionally, we
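For context on what STRICT_DUPLICATE_DETECTION buys here, the following is a minimal, self-contained sketch (not the SDK's actual parse/minify loop; the parser-to-generator copy is an assumed simplification). A generator created from a factory configured this way fails with an IOException whose message contains "Duplicate field" as soon as the same key is written twice within one object, which is the message the new catch block below matches on.

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.core.JsonParser;
import java.io.IOException;
import java.io.StringWriter;

public class DuplicateFieldSketch {
  public static void main(String[] args) {
    JsonFactory factory =
        new JsonFactory().configure(JsonGenerator.Feature.STRICT_DUPLICATE_DETECTION, true);
    StringWriter minified = new StringWriter();
    try (JsonParser parser = factory.createParser("{\"key\":1, \"key\":2}");
        JsonGenerator generator = factory.createGenerator(minified)) {
      // Copy parse events to the generator, i.e. re-serialize (minify) the input.
      while (parser.nextToken() != null) {
        generator.copyCurrentEvent(parser); // re-writing the second "key" triggers detection
      }
    } catch (IOException e) {
      // Prints something along the lines of: Duplicate field 'key'
      System.out.println(e.getMessage());
    }
  }
}
```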
@@ -176,7 +177,16 @@ private static String validateAndParseSemiStructured(
      throw valueFormatNotAllowedException(
          columnName, snowflakeType, "Not a valid JSON", insertRowIndex);
    } catch (IOException e) {
-      throw new SFException(e, ErrorCode.IO_ERROR, "Cannot create JSON Parser or JSON generator");
+      if (e.getMessage().contains("Duplicate field")) {
+        throw valueFormatNotAllowedException(
+            columnName, snowflakeType, "Not a valid JSON: duplicate field", insertRowIndex);
+      }
+      throw new SFException(
+          e,
+          ErrorCode.IO_ERROR,
+          String.format(
+              "Cannot create JSON Parser or JSON generator for column %s of type %s, rowIndex:%d",
+              columnName, snowflakeType, insertRowIndex));
    }
    // We return the minified string from the result writer
    return resultWriter.toString();

Review comment on the new catch block:
Let's 'chain' exceptions like you do below: pass the original e exception to the constructor of the one you throw. This will preserve information contained in the original exception.
I wonder if the "Not a valid JSON: duplicate field" message you're adding to the child exception becomes redundant in this case: the original one already has similar information. Keep or edit or delete, up to you.

Reply from @sfc-gh-lsembera (Contributor), Jan 15, 2025:
@sfc-gh-dorlovsky We don't want to preserve information here, it may lead to customer data being exposed in client side logs.
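To make the review exchange above concrete, here is a small hedged sketch of the chaining question. IllegalArgumentException stands in for the SDK's SFException, and the field name "ssn" stands in for a sensitive customer key: attaching the original Jackson exception as a cause carries the offending field name into any client-side stack trace, while the unchained variant only surfaces the generic message.

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonGenerator;
import java.io.StringWriter;

public class ExceptionChainingSketch {
  public static void main(String[] args) throws Exception {
    JsonFactory factory =
        new JsonFactory().configure(JsonGenerator.Feature.STRICT_DUPLICATE_DETECTION, true);
    try (JsonGenerator generator = factory.createGenerator(new StringWriter())) {
      generator.writeStartObject();
      generator.writeNumberField("ssn", 1); // field name stands in for customer data
      generator.writeNumberField("ssn", 2); // duplicate key: Jackson throws here
    } catch (Exception cause) {
      // Chained: the stack trace includes "Caused by: ... Duplicate field 'ssn'",
      // i.e. the customer-provided key leaks into client-side logs.
      new IllegalArgumentException("Not a valid JSON: duplicate field", cause).printStackTrace();
      // Unchained (the approach taken in this PR): only the generic message is logged.
      new IllegalArgumentException("Not a valid JSON: duplicate field").printStackTrace();
    }
  }
}
```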
@@ -856,6 +856,35 @@ public void testValidateAndParseObject() throws Exception {
() -> validateAndParseObjectNew("COL", Collections.singletonMap("foo", new Object()), 0));
}

+  @Test
+  public void testValidateDuplicateKeys() {
+    // simple JSON object with duplicate keys can not be ingested
+    expectError(
+        ErrorCode.INVALID_VALUE_ROW,
+        () -> validateAndParseObjectNew("COL", "{\"key\":1, \"key\":2}", 0));
+    expectError(
+        ErrorCode.INVALID_VALUE_ROW,
+        () -> validateAndParseVariantNew("COL", "{\"key\":1, \"key\":2}", 0));
+
+    // nested JSON object with duplicate keys can not be ingested
+    expectError(
+        ErrorCode.INVALID_VALUE_ROW,
+        () ->
+            validateAndParseObjectNew("COL", "{\"key\":1, \"nested\":{\"key\":2, \"key\":3}}", 0));
+    expectError(
+        ErrorCode.INVALID_VALUE_ROW,
+        () ->
+            validateAndParseVariantNew("COL", "{\"key\":1, \"nested\":{\"key\":2, \"key\":3}}", 0));
+
+    // array of objects with duplicate keys can not be ingested
+    expectError(
+        ErrorCode.INVALID_VALUE_ROW,
+        () -> validateAndParseArrayNew("COL", "[{\"key\":1, \"key\":2}]", 0));
+    expectError(
+        ErrorCode.INVALID_VALUE_ROW,
+        () -> validateAndParseVariantNew("COL", "[{\"key\":1, \"key\":2}]", 0));
+  }

Review comment on the new test (Contributor):
Could we add a test that tests validateAndParseVariantNew, as well as validateAndParseArrayNew (array of objects with duplicated keys)?

Reply (Contributor Author):
Thanks for bringing this up. Added!

@Test
public void testTooLargeVariant() {
char[] stringContent = new char[16 * 1024 * 1024 - 16]; // {"a":"11","b":""}
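One nuance of the new testValidateDuplicateKeys coverage worth spelling out: Jackson's duplicate detection is scoped per JSON object, so reusing a field name at a different nesting level is not a duplicate and is still expected to validate. A hypothetical companion test (not part of this PR; the method name is invented) that would live in the same test class and reuse its helpers could look like this:

```java
  @Test
  public void testDuplicateKeysAtDifferentNestingLevels() {
    // Hypothetical companion check, not in this PR: "key" appears at two different
    // nesting levels, which is not a duplicate, so validation is expected to pass.
    validateAndParseObjectNew("COL", "{\"key\":1, \"nested\":{\"key\":2}}", 0);
    validateAndParseVariantNew("COL", "{\"key\":1, \"nested\":{\"key\":2}}", 0);
  }
```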