Replace XSD string type with more appropriate NCName for VariableId, RuleId and ParameterName #42

Open
cdanger opened this issue Jan 10, 2025 · 5 comments
Labels
bug Something isn't working

Comments

@cdanger

cdanger commented Jan 10, 2025

In XACML 3.0, the xs:string type is used for the VariableId (in VariableDefinition), RuleId (in Rule) and ParameterName (in CombinerParameter) attributes. This means any XML text can be used as a variable ID, rule ID or parameter name, including problematic values such as an empty string "", a string with leading or trailing whitespace, a string made up entirely of whitespace, or, even worse, a multi-line string containing line breaks!

The NCName (or Name) type is usually more appropriate for identifiers like these.
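
To make the problem concrete, here is a minimal Python sketch that checks the problematic values listed above against an ASCII-only, NCName-style pattern. The pattern and the helper name `is_ascii_ncname` are assumptions for illustration only; the real NCName type also allows many non-ASCII letters, and this sketch does not model how an XML validator normalizes whitespace.

```python
import re

# ASCII-only approximation of xs:NCName (assumption: non-ASCII name characters are ignored).
ASCII_NCNAME = re.compile(r"[_A-Za-z][-._A-Za-z0-9]*")


def is_ascii_ncname(value: str) -> bool:
    """Return True if the whole string matches the ASCII-only NCName approximation."""
    # fullmatch is used instead of '^...$' because Python's '$' also matches
    # just before a trailing newline, which would let "rule1\n" through.
    return ASCII_NCNAME.fullmatch(value) is not None


# Values that a plain string type accepts but that make poor identifiers.
bad_values = ["", " rule1", "rule1 ", "   ", "rule\n1"]
for value in bad_values:
    print(repr(value), is_ascii_ncname(value))  # each line ends with False

print(is_ascii_ncname("rule_1"))  # True
```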

@cdanger added the bug label on Jan 10, 2025
@humantypo

That makes sense. Any thoughts on the ramifications for JSON? NCName seems more restrictive than JSON strings, so do we specify the lowest common denominator for data types?

@cdanger

cdanger commented Jan 11, 2025

For the JSON variant, we can translate NCName to a JSON string type with a pattern as follows in JSON schema (more info):

{
  "type": "string",
  "pattern": "^[_A-Za-z][-._A-Za-z0-9]*$"
}

The pattern value is the regular expression for an NCName, restricted to ASCII, based on the definition in the Schema for Datatype Definitions from the XML Schema standard:

...
<xs:simpleType name="NCName" id="NCName">
    ...
      <xs:pattern value="[\i-[:]][\c-[:]]*" id="NCName.pattern">
        ...
  </xs:simpleType>
...

\i and \c are XML Schema shorthand character classes. Assuming we only use plain ASCII here, \i corresponds to [_:A-Za-z] and \c to [-._:A-Za-z0-9]; the -[:] subtraction in the XSD pattern then removes the colon from each class. More details.
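
The step from `[\i-[:]][\c-[:]]*` to the ASCII pattern used in the JSON schema can be sketched in Python, which has neither the `\i`/`\c` shorthands nor character class subtraction. The snippet below starts from the ASCII sets quoted above, removes the colon with ordinary set subtraction, and checks that the result agrees with the proposed pattern on a few samples. The variable and helper names are hypothetical, and the ASCII-only restriction is an assumption carried over from the comment.

```python
import re
import string

# ASCII approximations of the XML shorthand classes (assumption: non-ASCII ignored).
NAME_START = set("_:" + string.ascii_letters)       # stands in for \i
NAME_CHAR = NAME_START | set("-." + string.digits)  # stands in for \c

# XSD class subtraction [\i-[:]] and [\c-[:]]: drop the colon from each set.
NCNAME_START = NAME_START - {":"}
NCNAME_CHAR = NAME_CHAR - {":"}


def to_char_class(chars) -> str:
    """Build a regex character class matching exactly the given characters."""
    return "[" + "".join(re.escape(c) for c in sorted(chars)) + "]"


derived = re.compile(to_char_class(NCNAME_START) + to_char_class(NCNAME_CHAR) + "*")
proposed = re.compile(r"[_A-Za-z][-._A-Za-z0-9]*")  # JSON pattern, anchors dropped

# Both patterns agree on these samples when the whole string must match.
for sample in ["rule_1", "a.b-c", "xs:string", "1rule", "-x", ""]:
    assert bool(derived.fullmatch(sample)) == bool(proposed.fullmatch(sample))
```

The anchors are dropped in favour of `fullmatch` because Python's `$` also matches just before a trailing newline, which differs from the ECMAScript-style regular expressions that JSON Schema patterns are normally interpreted as.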

@humantypo

This makes sense.** Perhaps we should consider a general lexicon of regex definitions that apply to the corresponding XML/JSON schema datatypes?

**I think we may want to tighten this one up a bit. It matches things like "-------------------".

https://pythex.org/?regex=%5Cb(0%7C%5B1-9%5D%7B1%2C%7D)%5C.(0%7B1%2C4%7D%7C0*%5B1-9%5D%7B1%2C%7D)(%5C.(0%7B1%2C4%7D%7C0*%5B1-9%5D%7B1%2C%7D))%3F%5Cb&test_string=1.0%0A1.00%0A1.0000%0A1.00000000000000000%0A0000.%0A0.01%0A.01%0A.%0A1.1%0A1.005%0A123.%0A2.3.6%0A09.83%0A9.800000%0A765.8765%0A0%0A1&ignorecase=0&multiline=1&dotall=0&verbose=1

@cdanger

cdanger commented Jan 12, 2025

I think your link is wrong; it shows a different regex.
"-------------------" does not match ^[_A-Za-z][-._A-Za-z0-9]*$ because the first character cannot be a hyphen.

@humantypo

Sorry, a typo. It matches “a------------”
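
The two test strings from this exchange can be checked directly. A quick Python sketch, using the ASCII-only pattern from the earlier comment with `re.fullmatch` standing in for the `^...$` anchors:

```python
import re

# ASCII-only NCName pattern from the thread, without the ^...$ anchors.
pattern = re.compile(r"[_A-Za-z][-._A-Za-z0-9]*")

# A leading hyphen is rejected by the first character class.
print(bool(pattern.fullmatch("-------------------")))  # False

# Hyphens are allowed in every position after the first character.
print(bool(pattern.fullmatch("a------------")))  # True
```

The second result is consistent with the XSD pattern quoted above, since \c includes the hyphen; rejecting such values would mean going stricter than NCName itself.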
