Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HPCC-29381 Improve XSD generation for ESP services #17349

Merged
merged 2 commits into from
Jul 13, 2023

Conversation

kenrowland
Copy link
Contributor

@kenrowland kenrowland commented May 18, 2023

Replaced explicit XSD and WSDL generation in genreated code by using existing library function.

Signed-Off-By: Kenneth Rowland [email protected]

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:

@github-actions
Copy link

Copy link
Contributor

@asselitx asselitx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only other thing is a kind of overall suggestion- Tony can weigh in too if it is appropriate or not. It may be good to add a kind of unittest or maybe regression test to have a known baseline showing that the esdl and hidl generated wsdls are matching, then we can keep track in future updates if anything breaks it.

response->setContent(verxml.str());
response->setContentType(HTTP_TYPE_APPLICATION_XML_UTF8);
response->setStatus(HTTP_STATUS_OK);
response->send();
return 0;
}

void getServiceSchema(const char *serviceQName, const char *methodQName, double version, const char *nstr,
const StringBuffer &xmlFilename, Owned<IProperties> &params, EsdlXslTypeId EsdlXslType, StringBuffer &schema)
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be premature optimization, but it could make sense to consider creating an IEsdlDefinitionHelper as a member of the EspHttpBinding and initializing it with the xslt and parameters at an appropriate time (initialization or service loading). Then we can avoid the overhead of compiling the transform each time the wsdl is requested. However, since the ECLWatch services aren't customer-facing transactional services with latency SLAs, it may not be required. I'm not sure where the pro/con balance is for doing it this way, but may depend on how feasible it is to hook into HttpBinding in a similar way as was done for EsdlBindingImpl.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be premature optimization, but it could make sense to consider creating an IEsdlDefinitionHelper as a member of the EspHttpBinding and initializing it with the xslt and parameters at an appropriate time (initialization or service loading). Then we can avoid the overhead of compiling the transform each time the wsdl is requested. However, since the ECLWatch services aren't customer-facing transactional services with latency SLAs, it may not be required. I'm not sure where the pro/con balance is for doing it this way, but may depend on how feasible it is to hook into HttpBinding in a similar way as was done for EsdlBindingImpl.

I think saving off the compiled esp_service_xml2xsd.xslt after the first use and then reusing it would be a good optimization. These wsdls and xsds are used in quite a few ways. I'd be fine with having a new jira opened for that if you don't want to do it in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optimization was something I wanted to do after all uses of the getXsdDefinition method (generated code) are replaced. Then all options/settings are known.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Can you open a JIRA about only compiling the xslt once?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few differences I'm seeing using this transform compared to hidl that I think will likely require some changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparison of the PR esdl-generated wsdl for WsWorkunits vs. hidl-generated yields some differences:

 

WSDL Namespace

In the PR, the default namespace for wsdl is declared on the each parent element that needs it, starting with the <definitions> element. Also the same namespace is declared with the wsdl prefix on the <definitions> element and the <xsl:stylesheet> element.

 

This results in:

 

  1. An extra xmlns:wsdl declaration in the output <definitions> element (possibly not a big deal)
  2. Extra xmlns="" declarations on enum <item> elements, such as those in the LogSelectColumnMode xsd:simpleType. Strictly speaking, these elements probably should be in the xsd namespace, but they aren't in the hidl-generated output.

 

These discrepancies can be cleared up by:

 

  1. Placing the only xmlns="http://schemas.xmlsoap.org/wsdl/" declaration in the <xsl:stylesheet> element.
  2. Declaring xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/" in the stylesheet element.
  3. Adding exclude-result-prefixes="wsdl" attribute to the stylesheet element.

 

Numeric Default Values

 

It appears that where the scm file defines a default numeric value of zero, the esdl wsdl is generating a default="…" attribute but the hidl wsdl is not. Reference <xsd:element name="WUSubmit"> or <xsd:element name="WUAnalyseHotspot">.

 

Strangely, you can also see in WsAnalyseHotspot that hidl does output default="1" for the ThresholdPercent double, while esdl outputs default="1.0". And also both agree when the default value is -1 such as WUQueryResponse.LastPage.

 

Optional

 

It appears that the esdl wsdl is not generating the optional elements. See that <xsd:element name="WUAnalyseHotspot"> is missing RootScope and PropertyOptions. I think this is because you're passing a non-null IProperties in to the call to getDependencies(). Passing a non-null pointer there means that you intend to enforce the 'optional' attributes from the scm definitions. If you do that you'd need to pass in explicitly those optional values that are in effect for the request. However since it is empty, that means you're enforcing the optional flags but there are none in effect.

 

This is confusing I apologize and in retrospect I'd probably make that getDependencies() interface more clear.

 

Element Types

 

It looks like hidl and esdl aren't agreeing on the types output for some cases:

 

hidl type esdl type example
xsd:base64Binary xsd:binary WUDeployWorkunit.Object
xsd:long xsd:int64 WUFile.SizeLimit
xsd:dateTime xsd:string ECLWorkunitLW.DateTimeScheduled

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asselitx for the optional stuff. Are you saying that simply requesting the xsd should include all the optionals?

It's been a long time since I've tried that, but seems like optionals would dissapear if not requests, but ultimately of course we just want to match current behavior

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was just getting the wsdl from the link in a browser at the URL like http://127.0.0.1:8010/WsWorkunits

But I think you're right @afishbeck my original suggestion is not the correct solution across the board- I was referencing the behavior of the esdl xsd tool command.

We will for sure need to pass through the actual "optionals" included in the URL request, but at least in this case it appears that by default hidl is showing us those elements marked optional even when its not explicitly included in the URL. I didn't check to see if there was a special case for just that value or if it is exposing all.

Further, I see that this use of a bare [optional] attribute differs from what I've seen used ( across the board I believe) in scapps, where it always has a string label like [optional("foobar")]. I don't know if this "bare" usage is a special case or how it has been treated historically.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asselitx I don't think the bare optionals are correct. Looks like they were added when someone thought it was just a way of marking an entity as "not required". @kenrowland you may want to remove those but in any case if they are never filtered out that is fine.

I'm not sure exactly, but the esdl tool by default I think may be trying to give us the superset. ESP shouldn't expose an optional tag unless it's requested by the caller.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want you can ask Shamser about the bare optionals. I think he added them but thought it just meant the opposite of required.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@afishbeck @asselitx I am a little confused on whether the optionals should be included or not. As stated by Terrance, removal of the empty opts parameter and using nullptr instead results in the optional request members being included in the wsdl output. I am not aware of any way externally to optionally include them or not.

Ultimately if we are striving to match what the current hidl generated code outputs, then it needs to be included. The hidl generated code makes no distinction between optional and non-optional and simply outputs all request members.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should test and support real optionals, but ignore the invalid ones just like it seems to do now.

A real optional looks like this:
[optional("myoptionname")] string a;

that string "a" should not show up in the xsd if you do this:
https://esp:8010/wsabc/mymethod?xsd

but should show up if you do this:
https://esp:8010/wsabc/mymethod?xsd&myoptionname

[optional] string a;
Should do nothing, and the "optional" should probably be removed.

Please experiment by adding a real optional to one of the fields and trying the old and new xsd mechanisms.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kenrowland I think my approval was a bit premature. Modifying an ecm file with a correctly implemented [optional("foobar")] attribute has the effect that now it is never output even when &foobar is included in the URL. I'll need to dig in a little more to understand exactly why the [optional] decorations appeared to affect the output in your previous revision.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be addressed now.

@afishbeck
Copy link
Member

The only other thing is a kind of overall suggestion- Tony can weigh in too if it is appropriate or not. It may be good to add a kind of unittest or maybe regression test to have a known baseline showing that the esdl and hidl generated wsdls are matching, then we can keep track in future updates if anything breaks it.

But the hidl generated ones are going away. We could store them off as a unit test, but we'd have to find a way to make it manageable because the actual content of the interfaces can change.

@asselitx
Copy link
Contributor

The only other thing is a kind of overall suggestion- Tony can weigh in too if it is appropriate or not. It may be good to add a kind of unittest or maybe regression test to have a known baseline showing that the esdl and hidl generated wsdls are matching, then we can keep track in future updates if anything breaks it.

But the hidl generated ones are going away. We could store them off as a unit test, but we'd have to find a way to make it manageable because the actual content of the interfaces can change.

Ah yes of course. Probably then it doesn't really make sense in this case.

@afishbeck
Copy link
Member

@kenrowland In regards to the change in types.

xsd:base64Binary vs xsd:binary
xsd:long vs xsd:int64
xsd:dateTime vs xsd:string

There is a chance it would break clients.

A good test would be to have Rodrigo see what changes that made to the Java classes he generates from the xsd/wsdl.

I'm not sure what java types are used for what schema types, but for a made up example, if what was generated as a java LocalDateTime is now generated as a string, then code that uses the generated code might break.

@rpastrana might have some thoughts. But I would consider adding some flags to the xslt or wherever needed to allow you to specify which mapping you want.

@afishbeck
Copy link
Member

or since you are using an isolated xslt it may be even easier than that. I forget where the mappings happen.

@kenrowland
Copy link
Contributor Author

kenrowland commented May 26, 2023

@rpastrana Please comment on whether a date time typed method request member being typed as an xsd:string in the generated WSDL for an ESP service method is an issue. Note that this would be a change from the current generated WSDL

@afishbeck The source type for how the new WSDL generation is done is the service XML generated as part of the build. The XML has date time fields typed as a string. The esdl-xml command used to generate the XML as part of the build would need to change so it honors the the original date time type. I should note that running the esdl command against the source ecm file also results in a "string" type in the generated WSDL. For example, if you run

esdl wsdl .../src/esp/scm/ws_workunits.ecm WsWorkunits > test.out

And inspect the output, you'll see

<xsd:element minOccurs="0" name="DateTimeScheduled" type="xsd:string"/>

The DateTimeScheduled field is declared in the ecm as a date time type.

@afishbeck
Copy link
Member

@rpastrana Please comment on whether a date time typed method request member being typed as an xsd:string in the generated WSDL for an ESP service method is an issue. Note that this would be a change from the current generated WSDL

@afishbeck The source type for how the new WSDL generation is done is the service XML generated as part of the build. The XML has date time fields typed as a string. The esdl-xml command used to generate the XML as part of the build would need to change so it honors the the original date time type. I should note that running the esdl command against the source ecm file also results in a "string" type in the generated WSDL. For example, if you run

esdl wsdl .../src/esp/scm/ws_workunits.ecm WsWorkunits > test.out

And inspect the output, you'll see

<xsd:element minOccurs="0" name="DateTimeScheduled" type="xsd:string"/>

The DateTimeScheduled field is declared in the ecm as a date time type.

Agreed, the fact that the generated xml would have to change would make fixing it trickier.

@rpastrana
Copy link
Member

@kenrowland In regards to the change in types.

xsd:base64Binary vs xsd:binary xsd:long vs xsd:int64 xsd:dateTime vs xsd:string

There is a chance it would break clients.

A good test would be to have Rodrigo see what changes that made to the Java classes he generates from the xsd/wsdl.

I'm not sure what java types are used for what schema types, but for a made up example, if what was generated as a java LocalDateTime is now generated as a string, then code that uses the generated code might break.

@rpastrana might have some thoughts. But I would consider adding some flags to the xslt or wherever needed to allow you to specify which mapping you want.

Thanks for the heads-up @afishbeck, yes changing existing field types will break the hpcc4j client (in general all clients based on the existing WSDLs would be affected).

@afishbeck
Copy link
Member

@kenrowland I would suggest adding an option on the code that generates the xml files so that when an element is generated that would differ between esdl and hidl versions of the xsd an extra attribute would be output. When it doesn't differ it should be left off.

This should be used for all of the xml files generated by our build process.

For example:

<xsd:element minOccurs="0" name="DateTimeScheduled" type="xsd:string" hidl_type="xsd:dateTime" />

Then the esdl xsd will continue to use xsd:string, but the hidl replacement xsd can use hidl_type when it is present.

You could open another JIRA about adding an option to the "esdl xsd" and "esdl wsdl" cli commands to cause them to output the hidl version when it's wanted.

@kenrowland
Copy link
Contributor Author

@afishbeck @asselitx I am going to divert and work on the generated service XML in order to get the types correct. I'll use the strategy mentioned by Tony where a hidl type is output.

@afishbeck
Copy link
Member

I think the title of the PR / Commit should be changed. The XSD isn't improving it's staying the same, this is really about providing a runtime alternative to HIDL generated XSD code.

Copy link
Contributor

@asselitx asselitx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparison against hidl-based wsdls with a variety of different URL parameters showed no differences.

esp/scm/additional.cmake Show resolved Hide resolved
@@ -0,0 +1,239 @@
/*##############################################################################
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if you delete the files from their previous locations it will show this as a relocation rather than a new file. In any case we never want the file in two places people (myself especially) will get confused.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted duplicate files.

<?xml version="1.0" encoding="UTF-8"?>
<!--
##############################################################################
# HPCC SYSTEMS software Copyright (C) 2012 HPCC Systems®.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is a new file (not just relocated) then the date should be 2023.

@@ -5371,6 +5384,9 @@ void EspServInfo::write_esp_binding()
}
outs("\tDBGLOG(\"Client version: %g\", context.getClientVersion());\n");

// getXSDDefinition(StringBuffer &content, const char *packageName, const char *serviceName, double version, const char *method);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the commented out code you said you were removing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the original generated code that calls the hardcoded code in the service. I left if for reference until it fully removed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still think this should just be removed, but I assume this will be removed as part of one of your next PRs. Can be merged as is.

//method ==> getXmlFilename for xsd and wsdl transformations
outf("\nint C%sSoapBinding::getXmlFilename(StringBuffer &filename)\n", name_);
outs("{\n");
outf("\tfilename.append(\"%s\");\n", packagename);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could add the .xml here, producing one line:

outf("\tfilename.append("%s.xml");\n", packagename);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reduced to one line

xlstFile.append("xslt").append(PATHSEPSTR).append("esp_service_xml2xsd.xslt");

defHelper->loadTransform(xlstFile, params, EsdlXslType);
defHelper->toWSDL(*structs, schema, EsdlXslType, version, nullptr, nstr, optFlags);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of passing nullptr for the opts parameter, we'll need to pull the URL parameters from the request and pass those instead. Then will be the issue of how to deal with ecm elements tagged [optional]. Either we can remove that from the ecm if it would have no negative effect, or we could figure out if there is some other value to insert (for this code path) in the IProperties* opts parameter that the transform would recognize as the name for that 'optional'. My hunch is maybe an empty string, but I'll need to check the generated xml format of esdl.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passed the context request parameters to the function and used accordingly.

Copy link
Contributor

@asselitx asselitx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Tested modified WsWorkunits interface with an optional("foobar") element that correctly toggled in the wsdl depending on the presence of the &foobar URL parameter.

Owned<IEsdlDefinition> esdlDef;
Owned<IEsdlDefinitionHelper> defHelper;

esdlDef.set(createEsdlDefinition(nullptr, nullptr));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These look like they would leak to me. Have you tested with valgrind? To hand ownership to "Owned" you should use setown().
But the easiest thing would be just to do:

Owned<IEsdlDefinition> esdlDef = createEsdlDefinition(nullptr, nullptr);
Owned<IEsdlDefinitionHelper> defHelper = createEsdlDefinitionHelper();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed as indicated.

Hard to tell looking at Owned and Shared if it would leak or not.

I ran valgrind and nothing jumped out regarding a leak in that area. I ran after the above change you suggested.

Copy link
Member

@afishbeck afishbeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kenrowland some comments.


std::string xmlFilePath(getCFD());
xmlFilePath.append("esdl_files").append(PATHSEPSTR).append(xmlFilename.str());
esdlDef->addDefinitionsFromFile(xmlFilePath.c_str());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should keep in mind for the future that we could cache some esdlDefs loaded from the xml files if accessing these is perceived to be slow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HPCC-29598 is open to optimize this process. Once all use of the current XSD/WSDL generation are removed (there is at least one more Jira to do that), optimization will be done.

See the License for the specific language governing permissions and
limitations under the License.
############################################################################## */

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this file now exists in two places? It should be removed from the old location and only this version should now be used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the extra copy

See the License for the specific language governing permissions and
limitations under the License.
############################################################################## */

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, file should only exist in one place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the extra copy.

@kenrowland kenrowland requested a review from afishbeck July 7, 2023 15:26
Copy link
Member

@afishbeck afishbeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved.

@@ -5371,6 +5384,9 @@ void EspServInfo::write_esp_binding()
}
outs("\tDBGLOG(\"Client version: %g\", context.getClientVersion());\n");

// getXSDDefinition(StringBuffer &content, const char *packageName, const char *serviceName, double version, const char *method);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still think this should just be removed, but I assume this will be removed as part of one of your next PRs. Can be merged as is.

Replaced explicit XSD and WSDL generation in genreated code by
using existing library function.

Signed-Off-By: Kenneth Rowland [email protected]
@kenrowland
Copy link
Contributor Author

@ghalliday Please merge

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this change is breaking the build

@GordonSmith GordonSmith merged commit 75bc83c into hpcc-systems:master Jul 13, 2023
49 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants