Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DO NOT MERGE] Add validateValueType function to ensure data type integrity #1002

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

i5o
Copy link
Contributor

@i5o i5o commented Dec 19, 2023

Added the validateValueType function, significantly enhancing the robustness of our data handling in the createIntegrationEntity process.

The function is designed to enforce strict type checking, ensuring that all data conforms to our predefined SUPPORTED_TYPES.

This only applies for createIntegrationEntity, we already have types for assign but IMO it's never too late to validate anyway

@i5o i5o requested a review from a team as a code owner December 19, 2023 14:23
Copy link

jit-ci bot commented Dec 19, 2023

Hi, I’m Jit, a friendly security platform designed to help developers build secure applications from day zero with an MVS (Minimal viable security) mindset.

In case there are security findings, they will be communicated to you as a comment inside the PR.

Hope you’ll enjoy using Jit.

Questions? Comments? Want to learn more? Get in touch with us.

@xdumaine
Copy link
Contributor

It might be worth adding a couple scale tests to this to ensure that running on tens or hundreds of thousands of values doesn't significantly impact performance. You don't have to include those on every ci run, but they're good to have to validate. We did that with normalization in persister and caught a couple of important bugs that would have caused significant slow downs.

@i5o
Copy link
Contributor Author

i5o commented Dec 19, 2023

@xdumaine I've added tests for 500k of items, they run properly on my M1 Pro laptop.

 PASS  packages/integration-sdk-core/src/data/__tests__/validateValueType.test.ts
  ● Console

    console.log
      Execution time for string dataset: 62.04570806026459 ms

      at Object.log (src/data/__tests__/validateValueType.test.ts:80:15)

    console.log
      Execution time for number dataset: 66.66854095458984 ms

      at Object.log (src/data/__tests__/validateValueType.test.ts:80:15)

    console.log
      Execution time for boolean dataset: 66.96124994754791 ms

      at Object.log (src/data/__tests__/validateValueType.test.ts:80:15)

    console.log
      Execution time for array dataset: 278.7055000066757 ms

      at Object.log (src/data/__tests__/validateValueType.test.ts:80:15)

    console.log
      Execution time for undefined dataset: 68.97012507915497 ms

      at Object.log (src/data/__tests__/validateValueType.test.ts:80:15)

    console.log
      Execution time until error: 0.13349997997283936 ms

      at Object.log (src/data/__tests__/validateValueType.test.ts:118:15)
      ````

@i5o i5o enabled auto-merge December 19, 2023 16:06
@JakeFerrero JakeFerrero disabled auto-merge December 19, 2023 16:18
@i5o i5o changed the title Add validateValueType function to ensure data type integrity [DO NOT MERGE] Add validateValueType function to ensure data type integrity Dec 19, 2023
@i5o
Copy link
Contributor Author

i5o commented Dec 19, 2023

Added [DO NOT MERGE] to review after holidays

@zemberdotnet
Copy link
Member

I think this approach is both valid and a good incremental step forward. It solves a problem we have. I do think that it may be worth considering if using a JSON schema validator on the entities and properties would not provide a more battle-tested form of data validation. We already make use of some JSON Schema validation during development and with better integration I think it could be a powerful safeguard in our platform.

Additionally, JSON schemas would help with search and entity property documentation, a current and longstanding problem.

// For non-array values, check if the type is supported
const valueType = typeof value;
if (!SUPPORTED_TYPES.includes(valueType)) {
throw new IntegrationError({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can we keep steps from failing on unsupported data? This is a step towards strictness that'll have any overall impact on job statuses and the ingestion team.
How else could we improve the overall health of jobs and data?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should open a bigger discussion for this but this should be a MAJOR, we have some ideas in the AWS Pod as to how to proceed

@Nick-NCSU
Copy link
Contributor

Nick-NCSU commented Jan 2, 2024

Can we simplify this and avoid recursive calls with something like

validate(entity: Entity): void {
  for(const [key, value] of Object.entries(entity)) {
    const valid = Array.isArray(value) ? value.every(p => SUPPORTED_TYPES.includes(typeof p)) : SUPPORTED_TYPES.includes(typeof value);
    valid || throw new IntegrationError({
        code: 'UNSUPPORTED_TYPE',
        message: `Unsupported type found at "${key}": Nested arrays are not supported.`,
      });
  }
}

@i5o
Copy link
Contributor Author

i5o commented Jan 2, 2024

@Nick-NCSU tried adding your code into my existing codebase but couldn't succeed, feel free to push to this branch!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants