ncsOperatorList read as doubles - mmtf spec says should be floats. #53

Open
zacharyrs opened this issue May 7, 2021 · 7 comments

@zacharyrs

zacharyrs commented May 7, 2021

When unpacking an MMTF file, this implementation expects doubles (64-bit) for the transformation matrices in ncsOperatorList.
The specification defines the float type as 32-bit and says this field is populated with floats.
Not sure whether this should be changed. I suspect it might break parsing of existing MMTF files, so maybe the decoder needs to accept both types?
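
A minimal sketch of what "accept both types" could look like at the wire level, assuming only the msgpack format itself: a 32-bit float is the tag byte 0xca followed by 4 big-endian bytes, and a 64-bit float is 0xcb followed by 8. The helper name `read_float` is hypothetical and is not part of any MMTF library; it only illustrates dispatching on the tag instead of assuming one width.

```python
import struct

# msgpack type tags for IEEE 754 floats (a big-endian payload follows the tag).
FLOAT32_TAG = 0xCA
FLOAT64_TAG = 0xCB


def read_float(buf, offset=0):
    """Decode one msgpack float at `offset`, accepting either width.

    Hypothetical helper for illustration. Returns (value, next_offset).
    """
    tag = buf[offset]
    if tag == FLOAT32_TAG:
        (value,) = struct.unpack_from(">f", buf, offset + 1)
        return value, offset + 5  # 1 tag byte + 4 payload bytes
    if tag == FLOAT64_TAG:
        (value,) = struct.unpack_from(">d", buf, offset + 1)
        return value, offset + 9  # 1 tag byte + 8 payload bytes
    raise ValueError(f"expected a msgpack float, got tag 0x{tag:02x}")


# The same number encoded both ways decodes to the same Python float.
single = bytes([FLOAT32_TAG]) + struct.pack(">f", 1.5)
double = bytes([FLOAT64_TAG]) + struct.pack(">d", 1.5)
print(read_float(single))
print(read_float(double))
```

A decoder written this way would read existing files that store doubles as well as files that follow the spec, at the cost of one branch per value.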

zacharyrs changed the title from "ncsOperatorList read as a double - mmtf specification says this should be floats." to "ncsOperatorList read as doubles - mmtf spec says should be floats." on May 7, 2021
@zacharyrs
Author

Unfortunately this breaks cross-compatibility with mmtf-python, which by default dumps everything as 64-bit floats (doubles).
The msgpack-python implementation doesn't support packing one particular field (the transforms) as 64-bit floats and everything else as 32-bit floats - see here.

@zacharyrs
Author

I have a partial workaround: making mmtf-python follow the same decisions as this implementation (all 32-bit except the transforms list) - rcsb/mmtf-python#50.
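
To make the "all 32-bit except the transforms list" layout concrete, here is a rough sketch that hand-encodes two float arrays with different widths using only the msgpack format rules. This is not how rcsb/mmtf-python#50 is implemented; `pack_array_header` and `pack_floats` are hypothetical helpers, and the matrix and unit cell values are made up for illustration.

```python
import struct


def pack_array_header(n):
    """msgpack array header: fixarray (< 16 items), array16, or array32."""
    if n < 16:
        return bytes([0x90 | n])
    if n < 65536:
        return b"\xdc" + struct.pack(">H", n)
    return b"\xdd" + struct.pack(">I", n)


def pack_floats(values, double):
    """Encode a flat list of floats, choosing the width per call."""
    tag, fmt = (b"\xcb", ">d") if double else (b"\xca", ">f")
    body = b"".join(tag + struct.pack(fmt, v) for v in values)
    return pack_array_header(len(values)) + body


# One 4x4 transformation matrix (16 values), written as doubles.
identity = [1.0, 0.0, 0.0, 0.0,
            0.0, 1.0, 0.0, 0.0,
            0.0, 0.0, 1.0, 0.0,
            0.0, 0.0, 0.0, 1.0]
ncs_operator = pack_floats(identity, double=True)

# Another float field, e.g. a unit cell, written as 32-bit floats.
unit_cell = pack_floats([10.0, 20.0, 30.0, 90.0, 90.0, 90.0], double=False)

print(len(ncs_operator), len(unit_cell))
```

The point of the sketch is that the width is a per-value property on the wire, so mixing widths in one file is valid msgpack even when a packer only exposes a single global switch.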

@josemduarte
Member

Good catch @zacharyrs ! Thanks for the detailed report.

Changing the RCSB MMTF files is doable but, as you say, could cause quite a bit of trouble. I like your Python workaround as a solution. However, to be consistent, the spec would have to officially acknowledge that ncsOperatorList uses doubles, right?

One important note: MMTF is now in minimal maintenance mode. The preferred compressed format for PDB data is BinaryCIF.

@zacharyrs
Author

zacharyrs commented May 19, 2021

Thanks @josemduarte!

Yes, the python solution basically just means both implementations violate the specification in the same way. It avoids the hassle of breaking things.

I didn't realise mmtf had been dropped to maintenance... I assume BinaryCIF follows the CIF spec, it's just encoded?

I recall CIF not caring about bond information, which was what I liked about mmtf - I guess I'll have to read into it more.

@josemduarte
Member

> I assume BinaryCIF follows the CIF spec, it's just encoded?

Yes, that's correct

> I recall CIF not caring about bond information, which was what I liked about mmtf - I guess I'll have to read into it more.

Bond information is available but indirectly via the chemical component dictionary

@zacharyrs
Author

> Bond information is available but indirectly via the chemical component dictionary

Is that guaranteed for all molecules or is it optional?

@josemduarte
Member

The chemical component dictionary contains all intra-residue bond information, but it is not embedded within the structure BCIF files. We will consider offering the whole chemical component dictionary as one BCIF bundle, which should make it more convenient to use.
