PMML version downgrade is blocked by Version#XPMML-annotated vendor extension markup #433

Open
pchitimi opened this issue Aug 28, 2024 · 8 comments

pchitimi commented Aug 28, 2024

Hello! I am seeing the following issue when attempting to export a model Pipeline to PMML 4.3. I am uncertain if the model requires at least 4.4 or if there are other issues at play here.

Exception in thread "main" java.lang.UnsupportedOperationException
	at org.dmg.pmml.Version$1.getVersion(Version.java:23)
	at com.sklearn2pmml.Main.run(Main.java:107)
	at com.sklearn2pmml.Main.main(Main.java:84)

Using the debug flag, the output I observe is as follows:

python: 3.10.14
sklearn2pmml: 0.110.0
sklearn: 1.3.2
pandas: 2.2.2
numpy: 1.26.4
dill: 0.3.8
joblib: 1.4.2
openjdk: 21.0.4
Executing command:
java -cp /opt/anaconda3/lib/python3.10/site-packages/sklearn2pmml/resources/sklearn2pmml-1.0-SNAPSHOT.jar:/opt/anaconda3/lib/python3.10/site-packages/sklearn2pmml/resources/gson-2.10.1.jar:/opt/anaconda3/lib/python3.10/site-packages/sklearn2pmml/resources/guava-33.0.0-jre.jar:/opt/anaconda3/lib/python3.10/site-packages/sklearn2pmml/resources/h2o-genmodel-3.46.0.4.jar:/opt/anaconda3/lib/python3.10/site-packages/sklearn2pmml/resources/h2o-logger-3.46.0.4.jar:/opt/anaconda3/lib/python3.10/site-packages/sklearn2pmml/resources/h2o-tree-api-0.3.17.jar:/opt/anaconda3/lib/python3.10/site-packages/sklearn2pmml/resources/istack-commons-runtime-4.0.1.jar:/opt/anaconda3/lib/python3.10/site-packages/sklearn2pmml/resources/jackson-annotations-2.13.3.jar:/opt/anaconda3/lib/python3.10/site-packages/sklearn2pmml/resources/jakarta.activation-2.0.1.jar:/opt/anaconda3/lib/python3.10/site-packages/sklearn2pmml/resources/jakarta.xml.bind-api-3.0.1.jar:/opt/anaconda3/lib/python3.10/site-packages/sklearn2pmml/resources/jaxb-core-3.0.2.jar:/opt/anaconda3/lib/python3.10/site-packages/sklearn2pmml/resources/jaxb-runtime-3.0.2.jar:/opt/anaconda3/lib/python3.10/site-packages/sklearn2pmml/resources/jcommander-1.72.jar:/opt/anaconda3/lib/python3.10/site-packages/sklearn2pmml/resources/pickle-1.5.jar:/opt/anaconda3/lib/python3.10/site-packages/sklearn2pmml/resources/pmml-converter-1.5.6.jar:/opt/anaconda3/lib/python3.10/site-packages/sklearn2pmml/resources/pmml-h2o-1.2.12.jar:/opt/anaconda3/lib/python3.10/site-packages/sklearn2pmml/resources/pmml-lightgbm-1.5.3.jar:/opt/anaconda3/lib/python3.10/site-packages/sklearn2pmml/resources/pmml-model-1.6.4.jar:/opt/anaconda3/lib/python3.10/site-packages/sklearn2pmml/resources/pmml-model-metro-1.6.4.jar:/opt/anaconda3/lib/python3.10/site-packages/sklearn2pmml/resources/pmml-python-1.2.2.jar:/opt/anaconda3/lib/python3.10/site-packages/sklearn2pmml/resources/pmml-sklearn-1.8.4.jar:/opt/anaconda3/lib/python3.10/site-packages/sklearn2pmml/resources/pmml-sklearn
-extension-1.8.4.jar:/opt/anaconda3/lib/python3.10/site-packages/sklearn2pmml/resources/pmml-sklearn-h2o-1.8.4.jar:/opt/anaconda3/lib/python3.10/site-packages/sklearn2pmml/resources/pmml-sklearn-lightgbm-1.8.4.jar:/opt/anaconda3/lib/python3.10/site-packages/sklearn2pmml/resources/pmml-sklearn-statsmodels-1.8.4.jar:/opt/anaconda3/lib/python3.10/site-packages/sklearn2pmml/resources/pmml-sklearn-xgboost-1.8.4.jar:/opt/anaconda3/lib/python3.10/site-packages/sklearn2pmml/resources/pmml-statsmodels-1.1.0.jar:/opt/anaconda3/lib/python3.10/site-packages/sklearn2pmml/resources/pmml-xgboost-1.8.5.jar:/opt/anaconda3/lib/python3.10/site-packages/sklearn2pmml/resources/serpent-1.40.jar:/opt/anaconda3/lib/python3.10/site-packages/sklearn2pmml/resources/slf4j-api-1.7.36.jar:/opt/anaconda3/lib/python3.10/site-packages/sklearn2pmml/resources/slf4j-jdk14-1.7.36.jar:/opt/anaconda3/lib/python3.10/site-packages/sklearn2pmml/resources/ubjson-0.1.8.jar:/opt/anaconda3/lib/python3.10/site-packages/sklearn2pmml/resources/ubjson-gson-0.1.8.jar com.sklearn2pmml.Main --pkl-input /var/folders/sh/qjnmlk9d3271rj192qp33swc0000gr/T/estimator-ebmpdaxc.pkl.z --pmml-output pda_xgb_raw.pmml --pmml-schema 4.3
Standard output is empty
Standard error:
Exception in thread "main" java.lang.UnsupportedOperationException
	at org.dmg.pmml.Version$1.getVersion(Version.java:23)
	at com.sklearn2pmml.Main.run(Main.java:107)
	at com.sklearn2pmml.Main.main(Main.java:84)

Thank you for your assistance!

@vruusmann
Member

I am seeing the following issue when attempting to export a model Pipeline to PMML 4.3.

Exception in thread "main" java.lang.UnsupportedOperationException
at org.dmg.pmml.Version$1.getVersion(Version.java:23)

This exception is thrown by the Version#XPMML special enum constant:
https://github.com/jpmml/jpmml-model/blob/1.6.5/pmml-model/src/main/java/org/dmg/pmml/Version.java#L16-L25

It means that your model may be PMML 4.3 compatible, but it "contains" some vendor extensions: XML markup (typically an XML attribute) that is not part of the PMML specification.

Anyway, the good news is that if you are using JPMML converters (on the Python ML side) and JPMML evaluators (on the Java application side), then this vendor extension is likely to be recognized/supported in both PMML 4.3 and 4.4 modes.

The SkLearn2PMML package should contain special logic for dealing with vendor extensions. The Version#XPMML enum constant is not a standalone PMML version per se. It's more like a "mask" on top of some valid PMML version such as PMML 4.3 or 4.4 (to be interpreted as "PMML 4.3 with some JPMML-specific attributes").

@vruusmann
Member

Thinking about this issue some more, I can see the following improvements:

  • The Version#XPMML enum constant needs special handling. It should never cause the version downgrade to fail. At most, it may cause some warning messages to be emitted (eg. "The generated document is PMML schema version $major.$minor compatible, but contains such-and-such JPMML vendor extensions").
  • Similarly, the version downgrade should print out a list of blocking (ie. version incompatible) markup. Right now it's operating in a really black box mode. This requires implementing some new version inspection tools into the JPMML-Model library. The current tool (the org.jpmml.model.visitors.VersionInspector class) simply gives a "yes, all good" or "no, something is not right" binary answer, which is not sufficient.
  • All JPMML vendor extensions should have the minimum required JPMML library version information attached to them. Meaning, the downgrade to PMML 4.3 succeeds, but with a caveat - "requires JPMML runtime $major.$minor (or newer) for evaluation".

vruusmann changed the title from "Issue with exporting to PMML4.3 using pmml_schema parameter" to "PMML version downgrade is blocked by Version#XPMML-annotated vendor extension markup" on Aug 29, 2024
@vruusmann
Member

@pchitimi What you can try right now to clarify the situation: export your model using the default (ie. latest) PMML schema version, and open it in a text editor; then, search for XML elements and attributes whose names start with "x-" (letter "X" followed by a hyphen). How many/which can you find?

If it's only one or two pieces of markup, we can verify them together, and you can then proceed to perform the version downgrade manually - by editing the XML namespace declaration.
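
The search suggested above can also be scripted instead of done by hand in a text editor. A minimal sketch using only the Python standard library; the function names and the inline `sample` document are hypothetical stand-ins, not part of SkLearn2PMML:

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(name):
    """Strip the '{namespace}' prefix that ElementTree puts on qualified names."""
    return name.rsplit("}", 1)[-1]


def find_vendor_extensions(root):
    """Count elements and attributes whose local name starts with 'x-'."""
    counts = Counter()
    for element in root.iter():
        tag = local_name(element.tag)
        if tag.lower().startswith("x-"):
            counts["<" + tag + ">"] += 1
        for attribute in element.attrib:
            if local_name(attribute).lower().startswith("x-"):
                counts[tag + "@" + local_name(attribute)] += 1
    return counts


# Hypothetical stand-in for a document exported with the default schema version.
sample = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <MiningModel functionName="regression" x-mathContext="float">
    <TreeModel functionName="regression" x-mathContext="float"/>
  </MiningModel>
</PMML>"""

counts = find_vendor_extensions(ET.fromstring(sample))
for name, count in sorted(counts.items()):
    print(name, count)
```

For a real export, the root element would come from `ET.parse(path).getroot()` rather than from an inline string.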

@vruusmann
Member

Thinking about this issue some more, I can see the following improvements:

Also, perhaps the version downgrade functionality should be available as a separate SkLearn2PMML utility function.

This functionality potentially has many controlling options. Adding them to the main sklearn2pmml.sklearn2pmml utility function as extra parameters would complicate the situation too much.

@pchitimi
Author

pchitimi commented Aug 29, 2024

Thank you very much for the detailed response including the potential improvement paths @vruusmann!

As per your guidance, I was able to identify 4 unique (97 total) XML element/attributes whose name starts with "x-":

<MiningModel functionName="regression" x-mathContext="float">
<MiningModel functionName="classification" algorithmName="XGBoost (GBTree)" x-mathContext="float">
<RegressionModel functionName="classification" normalizationMethod="logit" x-mathContext="float">
<TreeModel functionName="regression" noTrueChildStrategy="returnLastPrediction" x-mathContext="float">

@vruusmann
Member

I was able to identify 4 unique XML element/attributes whose name starts with "x-"

They are all <Model>@x-mathContext attributes, which instruct the JPMML evaluator to carry out all model-internal computations using 32-bit floating point data type/math operations (the default would be 64-bit).

Fundamentally, this particular attribute can be omitted without breaking the underlying model (the predicted results will come out with extra precision, which qualifies as "noise"). It is a very ancient vendor extension, which should be recognized by all JPMML-Evaluator 1.4.X and newer versions.

Anyway, my expectation is that the SkLearn2PMML package should never fail because of the <Model>@x-mathContext attribute.

The trouble is that this attribute is always present for XGBoost models.
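
To get a feel for how small the 32-bit versus 64-bit discrepancy is, here is a rough sketch in plain Python. It only emulates 32-bit rounding with the `struct` module and is not how JPMML-Evaluator works internally; the leaf scores are made-up values:

```python
import struct


def to_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", value))[0]


# Hypothetical leaf scores of a small boosted tree ensemble.
leaf_scores = [0.1, 0.2, 0.3]

sum64 = 0.0
sum32 = 0.0
for score in leaf_scores:
    # 64-bit accumulation, the evaluator default per the comment above.
    sum64 = sum64 + score
    # 32-bit accumulation: round the input and each partial sum to float32.
    sum32 = to_float32(sum32 + to_float32(score))

print("64-bit sum:", sum64)
print("32-bit sum:", sum32)
print("difference:", abs(sum64 - sum32))
```

The two sums agree to about seven significant digits, which is the "extra precision" that shows up when the attribute is dropped.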

@pchitimi
Author

pchitimi commented Aug 30, 2024

Gotcha, just to make sure I understand:

  • We can remove the <Model>@x-mathContext attributes without significant changes to the model (only a precision change)
  • In conjunction with a manual change to the XML namespace declaration, we can effectively convert the PMML 4.4 file to a PMML 4.3 file by removing these attributes.

Is my understanding correct or did I miss anything?

@vruusmann
Member

Is my understanding correct or did I miss anything?

Yes, these two changes should achieve the "PMML schema version downgrade" from 4.4 to 4.3 for XGBoost models.
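
A minimal sketch of those two manual edits as a text-level rewrite, assuming the only vendor extension in the document is x-mathContext (any other "x-" markup would need a separate review). The function name and the `sample` document are hypothetical; the PMML@version update is an optional third step:

```python
import re

PMML_4_4_NS = "http://www.dmg.org/PMML-4_4"
PMML_4_3_NS = "http://www.dmg.org/PMML-4_3"


def downgrade_to_4_3(pmml_text):
    """Apply the manual 4.4 -> 4.3 edits to the text of a PMML document."""
    # 1. Drop every x-mathContext attribute, together with its leading whitespace.
    text = re.sub(r'\s+x-mathContext="[^"]*"', "", pmml_text)
    # 2. Switch the XML namespace declaration from PMML 4.4 to PMML 4.3.
    text = text.replace(PMML_4_4_NS, PMML_4_3_NS)
    # 3. Update the version attribute of the root PMML element (non-critical).
    text = re.sub(r'(<PMML\b[^>]*\bversion=")4\.4(")', r"\g<1>4.3\g<2>", text, count=1)
    return text


# Hypothetical stand-in for a document exported with the default schema version.
sample = (
    '<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">\n'
    '  <MiningModel functionName="regression" x-mathContext="float"/>\n'
    "</PMML>"
)

downgraded = downgrade_to_4_3(sample)
print(downgraded)
```

Working on the raw text rather than re-serializing a parsed tree keeps the rest of the file byte-for-byte identical, which makes the result easy to check with a diff.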

For comparison, you may train a toy LightGBM model (structurally very similar to XGBoost models), and do the following:

  • Export default (pmml_schema = None)
  • Export to PMML 4.3 (pmml_schema = "4.3")

Then diff these two files (eg. using the command-line diff tool) - you will see exactly what was changed, line by line. The differences should be limited to the XML namespace URL and the PMML@version attribute value (the latter being non-critical).

LightGBM models don't need the <Model>@x-mathContext attribute, so the conversion should succeed every time.
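
If a command-line diff tool is not at hand, the comparison can be sketched with Python's `difflib`. The two inline documents and the file labels below are hypothetical stand-ins for the default export and the PMML 4.3 export:

```python
import difflib


def diff_pmml(default_text, downgraded_text):
    """Return the unified-diff lines between two PMML documents."""
    return list(difflib.unified_diff(
        default_text.splitlines(),
        downgraded_text.splitlines(),
        fromfile="default.pmml",
        tofile="pmml-4.3.pmml",
        lineterm="",
    ))


# Hypothetical stand-ins for the two exported files.
default_doc = (
    '<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">\n'
    "  <DataDictionary/>\n"
    "</PMML>"
)
downgraded_doc = (
    '<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">\n'
    "  <DataDictionary/>\n"
    "</PMML>"
)

diff_lines = diff_pmml(default_doc, downgraded_doc)
for line in diff_lines:
    print(line)
```

Only the root element line should show up as changed; anything else in the diff is markup that the downgrade touched and that is worth a closer look.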

vruusmann added a commit to jpmml/jpmml-model that referenced this issue Oct 1, 2024