The specification defines the float type as 32-bit. Python floats are 64-bit, so when the data is packed per the template, 64-bit floats are written to the output file. Other parsers (e.g. mmtf-java) try to load these as 32-bit floats, and hence fail. We can overcome this easily by updating the msgpack.packb call to include use_single_float=True.
However, it seems mmtf-java also violates the standard and uses doubles (64-bit floats) for the ncsOperatorList, so even with the above change it still can't parse the output. Given mmtf-java is used for the RCSB files, we can assume they won't shift to 32-bit floats, as that would break their parsing for even more files.
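To make the size mismatch concrete, here is a minimal sketch of how the two float encodings look on the wire in msgpack. It uses only the standard library's `struct` module rather than msgpack-python itself, and the sample value `0.1` is an arbitrary choice for illustration.

```python
import struct

value = 0.1  # arbitrary sample value, chosen only for illustration

# msgpack "float 32": marker byte 0xca followed by a big-endian IEEE 754 single.
as_float32 = b"\xca" + struct.pack(">f", value)

# msgpack "float 64": marker byte 0xcb followed by a big-endian IEEE 754 double.
as_float64 = b"\xcb" + struct.pack(">d", value)

# The two encodings differ in both marker and length (5 bytes vs 9 bytes),
# so a reader expecting one cannot consume the other.
print(len(as_float32), len(as_float64))

# Narrowing to 32 bits also loses precision on the round trip.
roundtrip32 = struct.unpack(">f", as_float32[1:])[0]
print(roundtrip32 == value)
```

A reader that insists on one marker will reject the other, which matches the failure mode described above in both directions.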
Additionally, the msgpack-python implementation does not support selecting doubles for only one field (msgpack/msgpack-python#326). Instead, you have to pack the biological assemblies list separately and then combine the two, as in the snippet below.
Code for packing separately.
# The mmtf standard expects everything as 32-bit, hence use_single_float.
# Note that encode_data no longer includes bioAssemblyList.
main = msgpack.packb(self.encode_data(), use_bin_type=True, use_single_float=True)

# Assemblies need to be 64-bit for Java compatibility.
assemblies = msgpack.packb(
    {"bioAssemblyList": self.bio_assembly},
    use_bin_type=True,
    use_single_float=False,
)

# In msgpack, a map with more than 15 (and up to 65535) entries starts with
# `\xde` followed by a two-byte big-endian length, e.g. `\xde\x12\x34` for
# a length of 0x1234.
# Our `main` map has 30-something elements, so only the last length byte matters.
# Build the new header: the map indicator, a `\x00`, and the old length plus one.
# bytes([...]) keeps this a single byte for any value up to 255.
new_map_length: bytes = b"\xde\x00" + bytes([main[2] + 1])

# Strip the first three bytes from `main` (the map indicator byte and two length bytes).
main = main[3:]

# Strip the first byte from `assemblies` (it has fewer than 16 elements, so its
# indicator and length share a single byte).
assemblies = assemblies[1:]

# Finally, put it all back together.
new_data = new_map_length + main + assemblies
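The byte splicing above relies on `main` having between 16 and 254 entries. As a hedged sketch, the same idea can be generalised by reading and rebuilding the map header for all three msgpack map formats. The helper names (`split_map_header`, `build_map_header`, `merge_packed_maps`) are hypothetical and not part of msgpack-python, and the sketch assumes the two maps share no keys.

```python
import struct
from typing import Tuple


def split_map_header(packed: bytes) -> Tuple[int, bytes]:
    """Return (entry count, remaining body) for a msgpack-encoded map."""
    first = packed[0]
    if 0x80 <= first <= 0x8F:  # fixmap: count stored in the low 4 bits
        return first & 0x0F, packed[1:]
    if first == 0xDE:  # map 16: two-byte big-endian count
        return struct.unpack(">H", packed[1:3])[0], packed[3:]
    if first == 0xDF:  # map 32: four-byte big-endian count
        return struct.unpack(">I", packed[1:5])[0], packed[5:]
    raise ValueError("input does not start with a msgpack map header")


def build_map_header(count: int) -> bytes:
    """Encode a msgpack map header for the given number of entries."""
    if count <= 0x0F:
        return bytes([0x80 | count])
    if count <= 0xFFFF:
        return b"\xde" + struct.pack(">H", count)
    return b"\xdf" + struct.pack(">I", count)


def merge_packed_maps(first: bytes, second: bytes) -> bytes:
    """Concatenate two packed maps into one (assumes no duplicate keys)."""
    count1, body1 = split_map_header(first)
    count2, body2 = split_map_header(second)
    return build_map_header(count1 + count2) + body1 + body2


# Tiny hand-built example: {1: 2} merged with {3: 4} gives {1: 2, 3: 4}.
merged = merge_packed_maps(b"\x81\x01\x02", b"\x81\x03\x04")
print(merged)
```

With these helpers, the final step of the snippet above would reduce to `merge_packed_maps(main, assemblies)`, applied to the two packed byte strings before any stripping.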
For reference I have raised this issue in the mmtf-java repo too - rcsb/mmtf-java#53.