Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HPCC-29970 Check case-insensitive owner ID in LF/WU #17624

Closed
wants to merge 1 commit into from

Conversation

wangkx
Copy link
Member

@wangkx wangkx commented Jul 27, 2023

The owner ID of a logical file or workunit is based on user's login name which can be case-insensitive. The code for the ID verification should allow the case-insensitive ID as we do in CDaliWorkUnitFactory.getWorkUnitsByOwner().

  1. Change the setFileProtectTree(), getNamedPropTree(), getAttrQueryStr(), and related code to allow case- insensitive owner ID.
  2. Allow case-insensitive owner ID when validating ECL WU access by ownership.
  3. In the IDFUWorkUnitFactory.getWorkUnitsByOwner(), allow case-insensitive owner ID and wildcard match (the same as in CDaliWorkUnitFactory.getWorkUnitsByOwner()).

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:

The owner ID of a logical file or workunit is based on user's
login name which can be case-insensitive. The code for the ID
verification should allow the case-insensitive ID as we do in
CDaliWorkUnitFactory.getWorkUnitsByOwner().
1. Change the setFileProtectTree(), getNamedPropTree(),
getAttrQueryStr(), and related code to allow case-
insensitive owner ID.
2. Allow case-insensitive owner ID when validating ECL WU
access by ownership.
3. In the IDFUWorkUnitFactory.getWorkUnitsByOwner(), allow
case-insensitive owner ID and wildcard match (the same as
in CDaliWorkUnitFactory.getWorkUnitsByOwner()).

Signed-off-by: wangkx <[email protected]>
@github-actions
Copy link

@wangkx wangkx requested a review from afishbeck July 27, 2023 22:19
@@ -3287,7 +3287,7 @@ class CDFUWorkUnitFactory : implements IDFUWorkUnitFactory, implements ISDSSubsc
{
StringBuffer path("*");
if (owner && *owner)
path.append("[@submitID=\"").append(owner).append("\"]");
path.append("[@submitID=?~\"").append(owner).append("\"]");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the '~' to specify a wild match? If so, that MIGHT not be needed anymore (ptree might always support wild card characters?). but I guess it doesn't hurt.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I said doesn't hurt and makes the wildcard more explicit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the implication is that wildcard is desired here? (it would not have done so before with a regular equality qualifier)

I would not expect getWorkUnitsByOwner("usera") to also return dfu workunits owned by "notusera" ..

I do notice it's already this way for ecl workunits in it's implementation of getWorkUnitsByOwner, but should it be?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, using anything but a regular equality is lot more inefficient, because jptree optimizes lookups of sibling via qualifiers on attributes with straight equalities (by auto building HT's on the fly), so using "=?" (or =?~") will result in a linear scan of all dfu workunits instead.

The normal approach to case-insensitivity employed elsewhere is to store it lower case it when set, and looked up.
Perhaps the original cased submitID is needed here(?), but we could add an internal attribute for this purpose instead and thus avoid "=?".
Either way, that won't help with pre-existing workunits that don't have the lower-case @submitID (but could be 'fixed' automatically via other changes).

@wangkx - first question is - does submitID need to preserve case when it is set?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jakesmith I have 3 concerns for not preserving case. 1. confuse a user ('who changed my username?' or 'the submitID was not in lower case on previous ECLWatch page'). 2. how to handle old WUs which contain case sensitive submitID? 3. break some existing logic (where? I do not know).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wangkx - this is quite a significant performance issue.

  • @submitID doesn't need to change.
  • A new internal attribute can be added (lower-cased) explicitly for the purpose of optimized caseless searching.
  • Old WUs can be scanned for the new missing attribute at startup time (it would be best to do in Dali itself), and the new attribute added.
  • Code for searching attributed can use regular hard equality matching, which will be an order of magnitude faster than the linear search that =? would involve.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why this one was marked as resolved.
There are 2 relates issues here:

  1. the wildcard question - why should getWorkUnitsByOwner("usera") return results for "notusera" ?
  2. the efficiency issues caused by not performing a standard equality match

@wangkx
Copy link
Member Author

wangkx commented Aug 3, 2023

@ghalliday please merge.

@ghalliday
Copy link
Member

It looked good to me. @jakesmith please can you sanity check - e.g., that use of =~ doesn't introduce any major inefficiencies.

Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wangkx - a couple of questions.

@@ -3287,7 +3287,7 @@ class CDFUWorkUnitFactory : implements IDFUWorkUnitFactory, implements ISDSSubsc
{
StringBuffer path("*");
if (owner && *owner)
path.append("[@submitID=\"").append(owner).append("\"]");
path.append("[@submitID=?~\"").append(owner).append("\"]");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the implication is that wildcard is desired here? (it would not have done so before with a regular equality qualifier)

I would not expect getWorkUnitsByOwner("usera") to also return dfu workunits owned by "notusera" ..

I do notice it's already this way for ecl workunits in it's implementation of getWorkUnitsByOwner, but should it be?

{
assertex(key[0]=='@');
str.appendf("%s[%s=\"%s\"]",sub,key,name);
if (nameCaseInsensitive)
str.appendf("%s[%s=?\"%s\"]",sub,key,name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as long as these are not top-level properties, and the only one nameCaseInsnsitive=true is used for is for Protect/@name, then the loss of efficiency this implies is okay I think.

@@ -3287,7 +3287,7 @@ class CDFUWorkUnitFactory : implements IDFUWorkUnitFactory, implements ISDSSubsc
{
StringBuffer path("*");
if (owner && *owner)
path.append("[@submitID=\"").append(owner).append("\"]");
path.append("[@submitID=?~\"").append(owner).append("\"]");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, using anything but a regular equality is lot more inefficient, because jptree optimizes lookups of sibling via qualifiers on attributes with straight equalities (by auto building HT's on the fly), so using "=?" (or =?~") will result in a linear scan of all dfu workunits instead.

The normal approach to case-insensitivity employed elsewhere is to store it lower case it when set, and looked up.
Perhaps the original cased submitID is needed here(?), but we could add an internal attribute for this purpose instead and thus avoid "=?".
Either way, that won't help with pre-existing workunits that don't have the lower-case @submitID (but could be 'fixed' automatically via other changes).

@wangkx - first question is - does submitID need to preserve case when it is set?

@wangkx wangkx requested a review from jakesmith August 15, 2023 17:47
Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wangkx - please see new comments.

@wangkx
Copy link
Member Author

wangkx commented Aug 16, 2023

Need to re-design. Closing for now.

@wangkx wangkx closed this Aug 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants