
continued refactoring of feature extractor and classifier #38

Merged
keighrim merged 14 commits into develop from refactor-feat-extractor on Dec 15, 2023

Conversation

keighrim
Member

@keighrim keighrim commented Dec 12, 2023

More fixes for #31.


I cherry-picked some old commits of mine and tried to resolve all the conflicts with the current code. @marcverhagen could you verify the code runs? A few notes:

  • I changed many config key names. Please find the full list in the modeling/config/classifier-full.yml file.
  • There is also modeling/config/classifier-no-position.yml (symlinked to example-config.yml), which has configs for the "old" model without pos_enc.
  • I merged the two config files (model config and classifier config) into a single classifier config YAML file. I'm not sure how that will impact the 'export' code for the model configs in the train.py module, though.

Let me know if you have questions.
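For reference, a merged config might look roughly like the sketch below. The key names are taken from this discussion (model_file, frameRate, labels, img_enc_name, pos_enc_name, num_layers, dropouts), but the grouping and values are illustrative, not the actual contents of classifier-full.yml:

```yaml
# hypothetical sketch of a merged classifier + model config
model_file: "modeling/models/20231026-164841.kfold_000.pt"

# classifier-side settings (how the classifier operates)
frameRate: 1000
labels: ["slate", "chyron", "credit"]

# model-side settings (inherent to the chosen checkpoint;
# changing these without retraining breaks state_dict loading)
img_enc_name: "convnext_tiny"
pos_enc_name: "sinusoidal-concat"
num_layers: 3
dropouts: 0.1
```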

@keighrim keighrim changed the title from "continued refactoring of" to "continued refactoring of feature extractor and classifier" Dec 12, 2023
@marcverhagen
Contributor

There are a few issues with running the classifier; most of them so far seem minor:

  • Both app.py and classify.py have a default config file that does not exist. I ran classify.py using the configuration in modeling/config/classifier-no-positional.yml.
  • There is a warning that softmax expects a dim argument; I do not remember having seen this before.
  • The code relies on "other" being included in self.labels, but it isn't, so an error is thrown when the other category has the highest score. Maybe revisit the choice to not include "other".
  • The prediction is now just the value for the highest-scoring label, but downstream processing needs all of them. This actually makes the previous problem go away, but I still want to revisit the labels.
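On the softmax warning: recent PyTorch versions warn when softmax is applied without an explicit dim, because the implicitly chosen dimension is deprecated. The dim just names the axis the scores are normalized over; what softmax does to one row of label scores can be sketched with the stdlib (the numbers here are made up):

```python
import math

def softmax(scores):
    # numerically stable softmax over one row of label scores:
    # subtract the max before exponentiating, then normalize
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# e.g. raw scores for ["slate", "chyron", "credit"]
probs = softmax([2.0, 1.0, 0.1])
```

In torch code the usual fix is to pass the axis explicitly, e.g. torch.nn.Softmax(dim=1) when scores are batched per frame.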

With some poking around in the code I could get the classifier to at least spit out the frame predictions; making it work with the downstream knitting code needs a few more small edits.

I have not yet tried to create a new model and use that.

About combining model config and classifier config...

I assume classifier config means those settings that affect how the classifier operates, like frameRate, and that model config refers to settings like num_layers and dropout. If we combine them, it has to be made clear that the user cannot simply change the latter, since those are inherent to the chosen model. Also, I thought the model settings were saved with the model when it was created (alongside the results file), which is why I had model_file and model_config:

model_file: "modeling/models/20231026-164841.kfold_000.pt"
model_config: "modeling/models/20231026-164841.config.yml"

I wasn't necessarily happy with that and was thinking about just having

model: "modeling/models/20231026-164841.kfold_000.pt"

and have the code figure out where to find the configuration of that model.
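That convention could be a small helper that derives the config path from the checkpoint name; a sketch, assuming checkpoints keep the <timestamp>.<fold>.pt naming shown above with the training config saved alongside as <timestamp>.config.yml:

```python
from pathlib import Path

def config_path_for(model_path: str) -> Path:
    # assumes checkpoints are named <timestamp>.<fold>.pt and their
    # training config sits alongside as <timestamp>.config.yml
    p = Path(model_path)
    timestamp = p.name.split(".")[0]
    return p.with_name(f"{timestamp}.config.yml")

# config_path_for("modeling/models/20231026-164841.kfold_000.pt")
# resolves to modeling/models/20231026-164841.config.yml
```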

Merging the two does not impact the trainer's export code, but we do now manually take some of the settings from the config export and add them to another config file. It looks like we would now need to manually update the configurations whenever we pick a different model.

@marcverhagen
Contributor

@keighrim Are the configuration settings in modeling/config/trainer.yml the ones that I should use for generating the new model to include as the default model in the app?

@marcverhagen
Contributor

I ran the classifier again using the positional model that I created yesterday, same error:

python classify.py --config modeling/config/classifier-test.yaml --input modeling/data/cpb-aacip-690722078b2-0000-0100.mp4 
Traceback (most recent call last):
  File "/Users/marc/Documents/git/clams/app-swt-detection/classify.py", line 289, in <module>
    classifier = Classifier(**yaml.safe_load(open(args.config)))
  File "/Users/marc/Documents/git/clams/app-swt-detection/classify.py", line 42, in __init__
    self.classifier.load_state_dict(torch.load(config["model_file"]))
  File "/Applications/ADDED/venv/clams/app-swt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Sequential:
	size mismatch for fc1.weight: copying a param with shape torch.Size([128, 1280]) from checkpoint, the shape in current model is torch.Size([128, 60768]).

The config file used here was the same as classifier-full.yml except for:

  • it uses a different model
  • it uses convnext_tiny instead of convnext_lg
  • it uses [ "slate", "chyron", "credit"] as the labels

It did still have a reference to "other" in the labels list, so I got rid of that. This gave more errors:

python classify.py --config modeling/config/classifier-test.yaml --input modeling/data/cpb-aacip-690722078b2-0000-0100.mp4 
Traceback (most recent call last):
  File "/Users/marc/Documents/git/clams/app-swt-detection/classify.py", line 289, in <module>
    classifier = Classifier(**yaml.safe_load(open(args.config)))
  File "/Users/marc/Documents/git/clams/app-swt-detection/classify.py", line 42, in __init__
    self.classifier.load_state_dict(torch.load(config["model_file"]))
  File "/Applications/ADDED/venv/clams/app-swt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Sequential:
	size mismatch for fc1.weight: copying a param with shape torch.Size([128, 1280]) from checkpoint, the shape in current model is torch.Size([128, 60768]).
	size mismatch for fc_out.weight: copying a param with shape torch.Size([4, 64]) from checkpoint, the shape in current model is torch.Size([3, 64]).
	size mismatch for fc_out.bias: copying a param with shape torch.Size([4]) from checkpoint, the shape in current model is torch.Size([3]).
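These size mismatches mean the network built from the config has different layer shapes than the ones stored in the checkpoint: here a 1280-dim vs 60768-dim input to fc1, and 4 vs 3 output labels after dropping "other". The comparison load_state_dict performs can be sketched in plain Python (shapes copied from the traceback above):

```python
def shape_mismatches(model_shapes, ckpt_shapes):
    # report parameters whose checkpoint shape differs from the
    # shape expected by the freshly constructed model
    return {name: (ckpt_shapes[name], expected)
            for name, expected in model_shapes.items()
            if name in ckpt_shapes and ckpt_shapes[name] != expected}

model = {"fc1.weight": (128, 60768),
         "fc_out.weight": (3, 64),
         "fc_out.bias": (3,)}
ckpt = {"fc1.weight": (128, 1280),
        "fc_out.weight": (4, 64),
        "fc_out.bias": (4,)}
bad = shape_mismatches(model, ckpt)  # all three parameters mismatch
```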

I also tried creating the model again:

python -m modeling.train -c modeling/config/trainer.yml features/feature-extraction
2023-12-14 13:58:07 __main__ INFO     4595099136 Using config: {'num_epochs': 5, 'num_splits': 5, 'img_enc_name': 'convnext_tiny', 'block_guids_train': ['cpb-aacip-254-75r7szdz'], 'block_guids_valid': ['cpb-aacip-254-75r7szdz', 'cpb-aacip-259-4j09zf95', 'cpb-aacip-526-hd7np1xn78', 'cpb-aacip-75-72b8h82x', 'cpb-aacip-fe9efa663c6', 'cpb-aacip-f5847a01db5', 'cpb-aacip-f2a88c88d9d', 'cpb-aacip-ec590a6761d', 'cpb-aacip-c7c64922fcd', 'cpb-aacip-f3fa7215348', 'cpb-aacip-f13ae523e20', 'cpb-aacip-e7a25f07d35', 'cpb-aacip-ce6d5e4bd7f', 'cpb-aacip-690722078b2', 'cpb-aacip-e649135e6ec', 'cpb-aacip-15-93gxdjk6', 'cpb-aacip-512-4f1mg7h078', 'cpb-aacip-512-4m9183583s', 'cpb-aacip-512-4b2x34nt7g', 'cpb-aacip-512-3n20c4tr34', 'cpb-aacip-512-3f4kk9534t'], 'num_layers': 3, 'dropouts': 0.1, 'pos_enc_name': 'sinusoidal-concat', 'pos_unit': 60000, 'pos_enc_dim': 512, 'pos_max_input_length': 5640000, 'bins': {'pre': {'slate': ['S'], 'chyron': ['I', 'N', 'Y'], 'credit': ['C']}}}
2023-12-14 13:58:07 __main__ WARNING  4595099136 sinusoidal-concat
2023-12-14 13:58:08 __main__ INFO     4595099136 train: 0 videos, 0 images, valid: 0 videos, 0 images
2023-12-14 13:58:08 __main__ INFO     4595099136 Skipping fold 0 due to lack of data
2023-12-14 13:58:08 __main__ WARNING  4595099136 sinusoidal-concat
2023-12-14 13:58:08 __main__ INFO     4595099136 train: 0 videos, 0 images, valid: 0 videos, 0 images
2023-12-14 13:58:08 __main__ INFO     4595099136 Skipping fold 1 due to lack of data
2023-12-14 13:58:08 __main__ WARNING  4595099136 sinusoidal-concat
2023-12-14 13:58:09 __main__ INFO     4595099136 train: 0 videos, 0 images, valid: 0 videos, 0 images
2023-12-14 13:58:09 __main__ INFO     4595099136 Skipping fold 2 due to lack of data
2023-12-14 13:58:09 __main__ WARNING  4595099136 sinusoidal-concat
2023-12-14 13:58:09 __main__ INFO     4595099136 train: 0 videos, 0 images, valid: 0 videos, 0 images
2023-12-14 13:58:09 __main__ INFO     4595099136 Skipping fold 3 due to lack of data
2023-12-14 13:58:09 __main__ WARNING  4595099136 sinusoidal-concat
2023-12-14 13:58:10 __main__ INFO     4595099136 train: 0 videos, 0 images, valid: 0 videos, 0 images
2023-12-14 13:58:10 __main__ INFO     4595099136 Skipping fold 4 due to lack of data
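The "0 videos, 0 images" lines suggest every extracted video ended up excluded: block_guids_valid in the config above lists many guids, and if the block lists happen to cover every guid found under features/feature-extraction, all folds come out empty. A hypothetical sketch of that filtering (not the trainer's actual code):

```python
def split_guids(all_guids, block_guids_train, block_guids_valid):
    # hypothetical version of the trainer's split: drop blocked
    # guids from each side independently
    train = [g for g in all_guids if g not in set(block_guids_train)]
    valid = [g for g in all_guids if g not in set(block_guids_valid)]
    return train, valid

# if the only extracted guid is blocked on both sides, both splits
# are empty and the fold is skipped
train, valid = split_guids(
    ["cpb-aacip-254-75r7szdz"],
    ["cpb-aacip-254-75r7szdz"],
    ["cpb-aacip-254-75r7szdz"],
)
```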

And after that we get an error.

@keighrim
Member Author

New commits contain many fixes including a fix for the positional encoder bug that ended up with 60768-dimensional vectors.
Replaced built-in models in modeling/models directory, trained with the included trainer.yml config file, and should be compatible with the included classifier.yml config file.
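For context, "sinusoidal-concat" appends a sinusoidal position encoding to the image feature vector, so the dimension should grow by pos_enc_dim (e.g. 1280-dim convnext_tiny features plus a 512-dim encoding gives 1792), not balloon to 60768. A stdlib-only sketch of one plausible intended behavior (the actual dimensions depend on the config):

```python
import math

def sinusoidal_encoding(pos, dim):
    # standard sin/cos position encoding of length `dim`
    enc = []
    for i in range(0, dim, 2):
        angle = pos / (10000 ** (i / dim))
        enc.append(math.sin(angle))
        if i + 1 < dim:
            enc.append(math.cos(angle))
    return enc

def encode_with_position(img_feats, pos, pos_enc_dim=512):
    # "concat" variant: the feature vector grows by pos_enc_dim
    return img_feats + sinusoidal_encoding(pos, pos_enc_dim)
```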

@marcverhagen
Contributor

With the latest changes the classifier now runs on the convnext model with positional encodings. After some more testing I will merge this into the 14-clamsapp branch and prepare a new app version.

@marcverhagen
Contributor

... or I may just review this pull request so it can be merged into develop

@keighrim
Member Author

I made one more small change before merging this (please see the latest commit message for details).

@keighrim keighrim merged commit f033551 into develop Dec 15, 2023
@keighrim keighrim deleted the refactor-feat-extractor branch December 17, 2023 01:50
@marcverhagen marcverhagen restored the refactor-feat-extractor branch February 7, 2024 21:13
@marcverhagen marcverhagen deleted the refactor-feat-extractor branch February 7, 2024 21:14