Output mmtf uses 64bit floats which violates the mmtf specification. #50

Open
zacharyrs opened this issue May 7, 2021 · 0 comments

@zacharyrs
Contributor

The specification defines the float type as 32-bit. Python floats are 64-bit, so when the data is packed per the template, 64-bit floats are written to the output file. Other parsers (e.g. mmtf-java) try to load these as 32-bit floats and fail. We can overcome this easily by updating the msgpack.packb call to include use_single_float=True.
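To make the size mismatch concrete, here is a minimal sketch of how the MessagePack format lays out the two float families on the wire. It hand-encodes a value with the standard library only, so it does not depend on msgpack-python; the helper names `encode_float32` and `encode_float64` are hypothetical and exist purely for illustration.

```python
import struct


def encode_float32(value: float) -> bytes:
    """MessagePack 'float 32': marker 0xca + 4-byte big-endian IEEE 754 single."""
    return b"\xca" + struct.pack(">f", value)


def encode_float64(value: float) -> bytes:
    """MessagePack 'float 64': marker 0xcb + 8-byte big-endian IEEE 754 double."""
    return b"\xcb" + struct.pack(">d", value)


single = encode_float32(1.5)
double = encode_float64(1.5)

print(single.hex())  # ca3fc00000
print(double.hex())  # cb3ff8000000000000
```

A reader that only accepts the `0xca` marker for float fields will reject the 9-byte `0xcb` form, which is what Python floats become when packed without `use_single_float=True`.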

However, it seems mmtf-java also violates the standard and uses doubles (64-bit floats) for the ncsOperatorList, so even with the above change it still can't parse the output. Given that mmtf-java is used for the RCSB files, we can assume they won't shift to 32-bit floats, since that would break their parsing for even more files.

Additionally, the msgpack-python implementation does not support selecting doubles for only one field (msgpack/msgpack-python#326). Instead, you have to pack the biological assemblies list separately and then combine the two results, as in the collapsed snippet below.

Code for packing separately.
# The mmtf standard expects everything as 32bit - hence use_single_float.
# Note the encode_data no longer includes bioAssemblyList.
main = msgpack.packb(self.encode_data(), use_bin_type=True, use_single_float=True)

# Assemblies need to be 64bit for Java compatibility.
assemblies = msgpack.packb(
    {"bioAssemblyList": self.bio_assembly},
    use_bin_type=True,
    use_single_float=False,
)

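# In msgpack, a map with more than 15 elements (map 16) starts with three bytes,
# e.g. `\xde\x12\x34`, where `\xde` is the map indicator and 0x1234 is the map length.

# Our `main` map has 30-something elements, hence only the last length byte matters.

# Build the new header: the map indicator, a `\x00`, then the old length plus one.
# `bytes([...])` is used instead of `chr(...).encode()`, which would emit two
# UTF-8 bytes for any value above 127.
new_map_length: bytes = b"\xde\x00" + bytes([main[2] + 1])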

# Strip the first three bytes from `main` (the map indicator byte and two bytes for length).
main = main[3:]

# Strip the first byte from `assemblies` (it has at most 15 elements, so its header is a single byte).
assemblies = assemblies[1:]

# Finally put it all back together.
new_data = new_map_length + main + assemblies
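The snippet above hard-codes the header sizes for this particular pair of maps. As a hedged sketch of the same idea in a more general form, the helpers below read and rewrite the map header according to the MessagePack format (fixmap, map 16, map 32) using only the standard library. The function names are hypothetical, and the sketch assumes both inputs are valid packed maps that share no keys.

```python
import struct
from typing import Tuple


def split_map_header(packed: bytes) -> Tuple[int, bytes]:
    """Return (entry_count, body) for a MessagePack-encoded map."""
    first = packed[0]
    if 0x80 <= first <= 0x8F:  # fixmap: count lives in the low 4 bits
        return first & 0x0F, packed[1:]
    if first == 0xDE:  # map 16: 2-byte big-endian count
        return struct.unpack(">H", packed[1:3])[0], packed[3:]
    if first == 0xDF:  # map 32: 4-byte big-endian count
        return struct.unpack(">I", packed[1:5])[0], packed[5:]
    raise ValueError("input does not start with a MessagePack map header")


def make_map_header(count: int) -> bytes:
    """Build the smallest map header that can hold `count` entries."""
    if count <= 0x0F:
        return bytes([0x80 | count])
    if count <= 0xFFFF:
        return b"\xde" + struct.pack(">H", count)
    return b"\xdf" + struct.pack(">I", count)


def merge_packed_maps(first: bytes, second: bytes) -> bytes:
    """Concatenate two packed maps into one (assumes no duplicate keys)."""
    count_a, body_a = split_map_header(first)
    count_b, body_b = split_map_header(second)
    return make_map_header(count_a + count_b) + body_a + body_b
```

With helpers like these, the final step would read `new_data = merge_packed_maps(main, assemblies)`, and it would keep working if `main` ever grew past 255 entries or shrank to 15 or fewer.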

For reference I have raised this issue in the mmtf-java repo too - rcsb/mmtf-java#53.
