Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Help testing magika 0.6.0rcX and collect problems #798

Open
reyammer opened this issue Nov 19, 2024 · 9 comments
Open

Help testing magika 0.6.0rcX and collect problems #798

reyammer opened this issue Nov 19, 2024 · 9 comments
Labels
help wanted Extra attention is needed python Pull requests that update Python package rust Pull requests that update Rust code testing This issue is about improving testing

Comments

@reyammer
Copy link
Collaborator

reyammer commented Nov 19, 2024

We have uploaded magika 0.6.0rc2 to pypi: https://pypi.org/project/magika/0.6.0rc2/ (and we'll keep pushing minor updates as new -rcX).

This version, which will become 0.6.0 introduces many more content types and some changes (some of which breaking, but should be easy to port to new version). This version also ships the rust CLI, together with the python API.

Please help us testing this new version on various platforms. To install the latest -rc: pip install --pre magika.

To ease tracking, please report any problem by creating a new github issue and reference this #798 issue.

@reyammer reyammer added help wanted Extra attention is needed python Pull requests that update Python package rust Pull requests that update Rust code testing This issue is about improving testing labels Nov 19, 2024
@reyammer
Copy link
Collaborator Author

reyammer commented Nov 19, 2024

Issue: on Mac, pip install magika==0.6.0rc2 works, but uv add magika==0.6.0rc2 doesn't. The culprit seems to be that uv tries to install onnxruntime==1.20.0, which seems broken. Trying uv add onnxruntime also fails (as it tries to install 1.20.0). Instead, this works: uv add onnxruntime==1.19.2; uv add magika==0.6.0rc2.

Opened an issue with the uv folks: astral-sh/uv#9228

Fix is to add a version constraint for onnxruntime. Tracking here: #801.

@reyammer reyammer changed the title Help testing magika 0.6.0rc2 and collect problems Help testing magika 0.6.0rcX and collect problems Nov 21, 2024
@hauntsaninja
Copy link
Contributor

It looks like magika 0.6 is no longer pure Python. The RC on PyPI is missing builds for platforms like linux aarch64

@reyammer
Copy link
Collaborator Author

reyammer commented Jan 10, 2025

Thanks for reporting; that's something we would like to fix. We'll look into this. In the meantime, if you have any pointer on how to extend the compilation to these other platforms with github runners, please let us know, thanks! /cc @ia0

@ia0
Copy link
Member

ia0 commented Jan 10, 2025

The RC on PyPI is missing builds for platforms like linux aarch64

if you have any pointer on how to extend the compilation to these other platforms with github runners

You can try to modify #851, in particular the rust/onnx/build.sh and maturin.sh files. I got onnx to build but it fails the test suite: https://github.com/google/magika/actions/runs/12716267732/job/35450215968?pr=851

@hauntsaninja
Copy link
Contributor

You can use qemu to emulate aarch64 on Github's x86 runners, see e.g. https://github.com/openai/tiktoken/blob/main/.github/workflows/build_wheels.yml#L47

@ia0
Copy link
Member

ia0 commented Jan 11, 2025

The problem is not the theory but the practice. ONNX Runtime indeed recommends emulation (easy and slow) rather than cross-compilation (difficult and fast). The thing is we use ghcr.io/rust-cross/manylinux_2_28-cross:aarch64 which should already do some cross-compilation setup, but it doesn't seem enough for building ONNX Runtime (it doesn't have aarch64-linux-gnu-gcc for example).

So there's 3 options:

  • Do not use a manylinux container and just build in qemu like you suggest. For this we need to make sure this is still manylinux compatible, see Test python manylinux package #752.
  • Use the cross manylinux container but still use qemu inside it. This should theoretically work. It will be very slow, but we cache ONNX Runtime between builds, so we pay each time we update the ONNX Runtime build script (at least when the version used by the ort crate changes).
  • Use the cross manylinux container and fix it to build ONNX Runtime. This should also theoretically work but needs work (probably more than the qemu in container option above). The build should be much faster in this case.

If you want to try any of those options, that would be great! Otherwise, I'll try to look at it eventually (my preference is the third option, but if it's really too much work, I'll try the second). It would be indeed nice to support Linux aarch64.

@leechristensen
Copy link

I don't know if it's a regression of #371, but event trace logs files aren't being detected correctly for me.

@reyammer
Copy link
Collaborator Author

I don't know if it's a regression of #371, but event trace logs files aren't being detected correctly for me.

Not really a regression; Supporting that file format didn't make it yet in any model we have (Context: There are many GH issues like those; we closed them and tagged them with a special label so that we know what to work on next, but "closed github issue" != "supported".)

@reyammer
Copy link
Collaborator Author

Heads up: one of the reported concerns has been increased inference time for the new standard_v2_1 model. The new standard_v3_0 model is ready for testing (#866) and it should be 3x faster than standard_v2_1 and 20% faster than standard_v1. Integration with rust is tracked in #868.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed python Pull requests that update Python package rust Pull requests that update Rust code testing This issue is about improving testing
Projects
None yet
Development

No branches or pull requests

4 participants