
WIP: Huggingface Integration #228

Merged · 100 commits · Oct 19, 2024

Conversation

@ljchang (Member) commented Jul 31, 2024

This PR modifies all models to be compatible with weights stored on the Hugging Face Hub.

  • PyTorch models all use huggingface_hub tools
  • sklearn and xgboost models all use skops tools
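
As a point of reference, here is a minimal sketch of the two loading paths, assuming the weights live in Hugging Face Hub repos (the repo ID and filenames below are illustrative, not the actual py-feat ones):

```python
import torch
from huggingface_hub import hf_hub_download
from skops.io import get_untrusted_types, load as skops_load

# PyTorch models: fetch a checkpoint from the Hub and load its state dict
pt_path = hf_hub_download(repo_id="py-feat/example-model", filename="model.pth")
state_dict = torch.load(pt_path, map_location="cpu")

# sklearn / xgboost models: fetch a .skops file and load it after auditing its types
skops_path = hf_hub_download(repo_id="py-feat/example-model", filename="model.skops")
unknown_types = get_untrusted_types(file=skops_path)
clf = skops_load(skops_path, trusted=unknown_types)
```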

Tasks

  • Retinaface
  • mobilenet
  • mobilefacenet
  • pfld
  • img2pose
  • facenet
  • xgb_au
  • svm_au
  • Resmasknet
  • svm_emo
  • integrate downloading with detector initialization for pytorch models
  • integrate downloading with detector initialization for skops models
  • Retinaface model card
  • mobilenet model card
  • mobilefacenet model card
  • pfld model card
  • img2pose model card
  • facenet model card
  • xgb_au model card
  • svm_au model card
  • Resmasknet model card
  • Fix Retinaface
  • svm_emo model card
  • add batching compatibility
  • add mps compatibility
  • add ability to select model
  • add ability to not include model
  • add video detection
  • add tensor detection
  • add 6 pose instead of just 3

@ljchang (Member, Author) commented Aug 3, 2024

@ejolly: I sent this to you on Slack as well, but thought I would put everything in the same place for you to play with.

I finished the first version of the new FastDetector version of py-feat. It's pretty fast, even with the AU outputs not being in PyTorch.

I've now run into a new problem that it would be helpful to reason through together. The new FastDetector unfortunately can't really accommodate batches. Due to the way img2pose is written, it processes data as lists of tensors, most likely because each image can contain a varying number of faces. To output a combined tensor of (batch, channel, height, width, n_faces), we would have to pad the n_faces dimension up to the image with the maximum number of faces and then carry a mask around to work with the whole thing. That doesn't seem like a very good idea, which is probably why the authors opted for the list of tensors.
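
To make the trade-off concrete, here is a rough sketch of what that padding-plus-mask approach might look like (shapes and names are illustrative, not the actual img2pose internals):

```python
import torch

def pad_faces(per_image_faces):
    """per_image_faces: list of tensors, each with shape (n_faces_i, C, H, W)."""
    max_faces = max(t.shape[0] for t in per_image_faces)
    padded, mask = [], []
    for t in per_image_faces:
        n = t.shape[0]
        filler = torch.zeros((max_faces - n, *t.shape[1:]), dtype=t.dtype)
        padded.append(torch.cat([t, filler], dim=0))
        mask.append(torch.tensor([True] * n + [False] * (max_faces - n)))
    # returns (batch, max_faces, C, H, W) plus a (batch, max_faces) mask marking real faces
    return torch.stack(padded), torch.stack(mask)
```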

This isn't really a big deal, as I can modify everything to work on lists, but there will be some overhead and I have no idea what the torch/GPU speedup will be once I'm done.
Alternatively, I can wrap the detector so that it always loops over each image; within a single image the number of detected faces is fixed, so we still get the full benefit of the tensor representation whenever an image contains more than one face, but we won't get any benefit from batching. This is basically how I have it written at the moment. Updating all of the functions to deal with lists would require reworking everything, and I have no idea whether that will be easy or hard.
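
For comparison, a bare-bones sketch of that per-image wrapper, with hypothetical helper names standing in for the real detection steps:

```python
def detect_images(images, detect_faces, run_face_models):
    results = []
    for img in images:                          # no batching across images
        faces = detect_faces(img)               # one tensor of shape (n_faces, C, H, W)
        results.append(run_face_models(faces))  # downstream models see all faces at once
    return results
```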

Just wanted to get your thoughts on this problem. Essentially, there is a reason TK/Jin went with the list-of-lists approach: since each frame can contain a varying number of faces, it is easiest to just operate on a single image at a time. However, they then also looped over the faces within an image, which I've addressed in the new version by operating on tensors. If I go with the currently easier option of just looping over each image within the detector class, I have no idea whether batching will ultimately speed up our computations. I think it helped even with the list-of-lists representation, but I can't remember by how much.

I've just pushed my whole WIP to the repository in case you want to play with it. I made you a FastDetector.py script that contains all of the code I've been working on, as well as an example of how to run it.

I've updated a bunch of other functions in the repository for various things.

One thing that I haven't fully solved is the MPS problem. I started working on it last night, but got stuck. I think it should be fixable, but I'm struggling to figure out precisely where the problem is. Haven't tried it on NVIDIA yet.

Thought you might want to play with it and possibly see if it runs as-is in pyfeat-live.

@ejolly (Contributor) commented Aug 5, 2024

@ljchang Cloned this branch and installed the requirements, but I can't get your new model class to initialize. Not sure if there are some additional setup steps you did that we need to add to the codebase.

I wrote a quick test file to try (I pushed my changes so you should be able to run it too):

cd feat/tests (sometimes pytest has weird errors when trying to run it from the root of the project)
pytest test_fast_detector.py

Here's the error I get:

==================================================================== ERRORS =====================================================================
_______________________________________________ ERROR collecting feat/tests/test_fast_detector.py _______________________________________________
test_fast_detector.py:10: in <module>
    class Test_Fast_Detector:
test_fast_detector.py:13: in Test_Fast_Detector
    detector = FastDetector(device="cpu")
../FastDetector.py:345: in __init__
    au_weights = load_model_weights(model_type='au', model='xgb', location='huggingface')
../pretrained.py:280: in load_model_weights
    loaded_model = load(model_path, trusted=unknown_types)
../../env/lib/python3.11/site-packages/skops/io/_persist.py:152: in load
    instance = tree.construct()
../../env/lib/python3.11/site-packages/skops/io/_audit.py:165: in construct
    self._constructed = self._construct()
../../env/lib/python3.11/site-packages/skops/io/_general.py:408: in _construct
    cls = gettype(self.module_name, self.class_name)
../../env/lib/python3.11/site-packages/skops/io/_utils.py:69: in gettype
    return _import_obj(module_name, cls_or_func)
../../env/lib/python3.11/site-packages/skops/io/_utils.py:64: in _import_obj
    return getattr(importlib.import_module(module, package=package), cls_or_func)
E   AttributeError: module '__main__' has no attribute 'XGBClassifier'
=============================================================== warnings summary ================================================================
../emo_detectors/ResMaskNet/resmasknet_test.py:4
  /Users/esh/Documents/pypackages/py-feat/feat/emo_detectors/ResMaskNet/resmasknet_test.py:4: DeprecationWarning: lib2to3 package is deprecated and may not be able to parse Python 3.10+
    from lib2to3.pytree import convert

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================ short test summary info ============================================================
ERROR test_fast_detector.py - AttributeError: module '__main__' has no attribute 'XGBClassifier'
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
========================================================== 1 warning, 1 error in 1.11s ==========================================================

@ljchang (Member, Author) commented Aug 5, 2024

Ok, this turned out to be a pretty annoying issue that has to do with security in skops persistence. Basically, they are trying to prevent any malicious file from being loaded, and they changed how serialized files are loaded because of a reported security issue. However, this makes it difficult to load model weights in an environment different from the one in which the original weights were created (see this issue).

I made a kludgy solution for now where I manually populate XGBClassifier into the __main__ namespace. I also added a test to make sure this stays true in the future.
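
A minimal sketch of that kludge, assuming xgboost is importable in the loading environment (not necessarily the exact code in the branch):

```python
import __main__
from xgboost import XGBClassifier

# skops resolves the serialized estimator's class via module '__main__',
# so expose XGBClassifier there before calling skops.io.load()
if not hasattr(__main__, "XGBClassifier"):
    setattr(__main__, "XGBClassifier", XGBClassifier)
```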

@ljchang (Member, Author) commented Aug 14, 2024

@ejolly: the landmark issue was just the result of the tensors reordering the coordinates. [x0,y0,x1,y1,...] -> [x0,x1,...,y0,y1,...].
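
A toy illustration of that reordering (made-up values, not the py-feat code):

```python
import torch

interleaved = torch.tensor([1.0, 10.0, 2.0, 20.0, 3.0, 30.0])  # [x0, y0, x1, y1, x2, y2]
xy = interleaved.reshape(-1, 2)                                 # (n_landmarks, 2)
reordered = torch.cat([xy[:, 0], xy[:, 1]])                     # [x0, x1, x2, y0, y1, y2]
```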

I've also been looking for easy ways to speed things up. Only tiny effects from using inference_mode and torch.compile.

Would be good to talk over batching. I think I have an idea how to do it more quickly, but it's ugly.

Want to see if this update fixes the landmarks for pyfeat-live now?

@ejolly (Contributor) commented Oct 16, 2024

@ljchang Almost done with this; here are the changes so far:

  • All references to FastDetector have been replaced with Detector; the old module and tests are gone and the new tests are passing
  • MPDetector is a new module that contains the functions and new class for your prototype MediaPipe detector; no tests for the class yet
  • Had to mark the Python 3.12 tests as allowed to fail, not because anything is broken on 3.12 (I'm using that version locally), but because of some stupid GitHub Actions problem
  • Also, like our other packages, Windows testing is experimental, though I'm tempted to disable it entirely and note that we can't officially support Windows. Thoughts?

Here's what I have left:

  • Pull in some nicer plotly plotting functions in iplot branch
  • Remove old documentation references to deprecated Detector methods and update section on pre-trained default models
  • Merge to main
  • Cut release

Then the new release should be good to go and I'll hook it up to py-feat. The experimental module can be used at any time via from feat.MPDetector import MPDetector, but will be invisible to users otherwise.

@ljchang (Member, Author) commented Oct 16, 2024

Great! That all sounds good to me. I think we are also going to need to update our documentation for the very small API changes.

I think it will be easy to port the new detector tests to MPDetector when we're ready, but it's probably not quite ready for that yet, especially the gaze and head-pose estimators.

@LandryBulls commented

Hey guys. I have been working on running the new version of PyFeat, using the most up-to-date version of Detector on the huggingface branch.

I'm facing an issue where, on longer videos, I seem to run out of RAM at the last step, when the results are being stored in a pandas DataFrame. I am running on a list of image paths generated from a long video (30 mins at 60 fps, 1080p). The process seems to run fine up until that point, at which the kernel dies and the only logging info I get is Killed, with no traceback. This only happens on longer videos. I'm using a batch size of 32 and num_workers = 8 on our lab machine with an RTX 3090 and 126GB of RAM.

The other issue was that I had to change line 126 of feat/utils/image_operations.py from landmarks = np.array(landmarks).copy() to landmarks = np.array(landmarks.cpu()).copy(); otherwise it threw an error.

In the meantime I'm going to address the kernel death issue by saving to disk every so many frames, rather than at the end. This has worked for me in the past.
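
Roughly along these lines, sketched with hypothetical names rather than the real py-feat API:

```python
import pandas as pd

def detect_in_chunks(detector, image_paths, out_csv, chunk_size=1000):
    """Run detection on chunks of frames and append each chunk's results to disk."""
    for start in range(0, len(image_paths), chunk_size):
        chunk = image_paths[start:start + chunk_size]
        results = detector.detect(chunk)  # DataFrame-like output for this chunk
        pd.DataFrame(results).to_csv(out_csv, mode="a", header=(start == 0), index=False)
```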

Thanks for all the great work!

@ejolly (Contributor) commented Oct 18, 2024

Thanks @LandryBulls! I just added a new arg you can pass to .detect(save='somefile.csv'), which will tell the detector to append each batch's detections to a CSV file instead of holding them in RAM and combining them into a giant DataFrame at the end.
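
For example, something like this should work (the detector setup and input paths here are illustrative):

```python
from feat import Detector

frame_paths = [f"frames/frame_{i:06d}.png" for i in range(100)]  # placeholder inputs
detector = Detector(device="cuda")
# Each batch's detections are appended to the CSV instead of being accumulated in RAM
detector.detect(frame_paths, save="detections.csv")
```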

We haven't stress-tested a super huge job with that much parallelization, so it sounds like your system ran out of RAM right when it needed a lot of it: combining all batches into a single DataFrame and computing identities over it after processing finished.

Once I finish up some bug fixes and other stuff this will be in the new release!

@ejolly (Contributor) commented Oct 19, 2024

Status: just need to finish cleaning up the docs and finalize the renaming bug-fixes after going through them (2, 3, and 4).
