
NMT and TGNN Project Documentation with Anas Mohammad #267

Open
thangk opened this issue Nov 26, 2024 · 16 comments
@thangk
Collaborator

thangk commented Nov 26, 2024

A placeholder issue for tracking all future NMT and TGNN project milestones: issues, solutions, guides, results, and anything else related to the two projects with Anas Mohammad.

I'll update this first post as we progress and as needed.

@thangk thangk self-assigned this Nov 26, 2024
@thangk thangk added the documentation Improvements or additions to documentation label Nov 26, 2024
@Groovyfalafel

Excited to work under the supervision of Kap.

@thangk thangk changed the title NMT and TGNN Project Milestones with Anas Mohammad NMT and TGNN Project Documentation with Anas Mohammad Nov 26, 2024
@thangk
Collaborator Author

thangk commented Nov 29, 2024

@Groovyfalafel

Today's agenda:

  • explain the OpeNTF library and our specific part in the project
  • run the dblp toy dataset
  • get instructions on the tasks to work on next week (start the convs2s ablation study/experiments)

Can you come on Teams at 2 pm to work on these?

@Groovyfalafel

Sounds great. I'll be available at 3 pm; I'm busy until then.

@thangk
Collaborator Author

thangk commented Dec 2, 2024

@Groovyfalafel,

Can you re-run the transformer model? Before running, do the following (see the sketch after this list):

  • re-sync your forked repo, as I've updated _template.sh to use a toy dataset
  • re-duplicate a transformer model config, change train_steps to something like 20, and update the 3 lines below it to 10% of that value
  • set up a new bash script from the _template_v1.4.1.sh I've just updated
  • in the script, set the dataset to toy_dblp
  • re-run it
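
A rough command-line version of those steps (a minimal sketch; the remote name, config file names, run-script names, and the three config keys below train_steps are assumptions, not the repo's actual layout):

```bash
# Sketch only -- file and remote names below are assumptions, not the repo's actual layout.

# 1. Re-sync the forked repo with upstream
git fetch upstream
git merge upstream/main

# 2. Duplicate the transformer config and shorten the run
cp transformer.yml transformer_toy.yml
# In transformer_toy.yml set, e.g.:
#   train_steps: 20
#   valid_steps: 2            # hypothetical key names -- the point is that the
#   save_checkpoint_steps: 2  # three lines below train_steps become 10% of 20
#   report_every: 2

# 3. Create a run script from the updated template and point it at the toy dataset
cp _template_v1.4.1.sh run_transformer_toy_dblp.sh
# In run_transformer_toy_dblp.sh, set the dataset to toy_dblp

# 4. Re-run
bash run_transformer_toy_dblp.sh
```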

@Groovyfalafel

I will be doing so soon. So far I have:

  1. Tried to download the image from Docker Hub but ran into a few errors; I had to log in to Docker from the terminal to fix it.

  2. Tried running a bash script to run a model, but it displayed an error message indicating there is no such file. Found a solution on Stack Overflow telling me to install dos2unix. It worked, and I can now run bash scripts (the fix is sketched below).

  3. Tried running a model, but it freezes after a couple of seconds. I kept my PC open, yet it still wouldn't finish training; the process ran for hours with no result. Hopefully, after these fixes, it should work.
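
For reference, the fix in step 2 looked roughly like this (a sketch; it assumes a Debian/Ubuntu environment and a hypothetical script name):

```bash
# Sketch of the CRLF line-ending fix from step 2 (hypothetical script name).
sudo apt-get install dos2unix          # one-time install of the converter
dos2unix run_transformer_toy_dblp.sh   # convert Windows CRLF line endings to LF in place
bash run_transformer_toy_dblp.sh       # the "no such file" error should be gone
```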

@thangk
Collaborator Author

thangk commented Dec 3, 2024

@Groovyfalafel

Were you able to run the toy dblp until the end?

@Groovyfalafel

I have tried running it and implemented what you suggested, setting the bucket size to 128, yet I am still running into issues.

@thangk
Collaborator Author

thangk commented Dec 4, 2024

I have tried running it and implemented what you suggested, setting the bucket size to 128, yet I am still running into issues.

Anything in the non-error log?

@Groovyfalafel

Nothing unusual, no.

@thangk
Collaborator Author

thangk commented Dec 4, 2024

What errors do you get? Post what's in the non-error log, and also the last 20 lines or so of the error log, using code blocks here.
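
Something like this will grab them (a sketch; the log file paths are assumptions, so substitute whatever your run script actually writes):

```bash
# Hypothetical log paths -- substitute the files your run script writes.
cat output/nmt_toy_dblp.out.log          # the non-error log
tail -n 20 output/nmt_toy_dblp.err.log   # last 20 lines of the error log
```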

@Groovyfalafel

Groovyfalafel commented Dec 5, 2024

```
Loading indexes pickle from ./../data/preprocessed/dblp/toy.dblp.v12.json/indexes.pkl ...
It took 0.016460180282592773 seconds to load from the pickles.
It took 0.03437948226928711 seconds to load the sparse matrices.

Only one GPU detected. Using it (if CUDA is available).

Using device: cuda
Running for (dataset, model): (dblp, nmt_toy_dblp_second_test) ... 
			* corpus_1: 6381
[2024-12-03 21:51:45,573 INFO] Weighted corpora loaded so far:
			* corpus_1: 6382
[2024-12-03 21:51:45,577 INFO] Weighted corpora loaded so far:
			* corpus_1: 6383
[2024-12-03 21:51:45,580 INFO] Weighted corpora loaded so far:
			* corpus_1: 6384
[2024-12-03 21:51:45,584 INFO] Weighted corpora loaded so far:
			* corpus_1: 6385
[2024-12-03 21:51:45,587 INFO] Weighted corpora loaded so far:
			* corpus_1: 6386
[2024-12-03 21:51:45,591 INFO] Weighted corpora loaded so far:
			* corpus_1: 6387
[2024-12-03 21:51:45,594 INFO] Weighted corpora loaded so far:
			* corpus_1: 6388
[2024-12-03 21:51:45,597 INFO] Weighted corpora loaded so far:
			* corpus_1: 6389
[2024-12-03 21:51:45,600 INFO] Weighted corpora loaded so far:
			* corpus_1: 6390
[2024-12-03 21:51:45,603 INFO] Weighted corpora loaded so far:
			* corpus_1: 6391
[2024-12-03 21:51:45,606 INFO] Weighted corpora loaded so far:
			* corpus_1: 6392
[2024-12-03 21:51:45,609 INFO] Weighted corpora loaded so far:
			* corpus_1: 6393
[2024-12-03 21:51:45,612 INFO] Weighted corpora loaded so far:
			* corpus_1: 6394
[2024-12-03 21:51:45,615 INFO] Weighted corpora loaded so far:
			* corpus_1: 6395
[2024-12-03 21:51:45,619 INFO] Weighted corpora loaded so far:
			* corpus_1: 6396
[2024-12-03 21:51:45,621 INFO] Weighted corpora loaded so far:
			* corpus_1: 6397
[2024-12-03 21:51:45,625 INFO] Weighted corpora loaded so far:
			* corpus_1: 6398
[2024-12-03 21:51:45,627 INFO] Weighted corpora loaded so far:
			* corpus_1: 6399
[2024-12-03 21:51:45,631 INFO] Weighted corpora loaded so far:
			* corpus_1: 6400
```

@thangk
Collaborator Author

thangk commented Dec 6, 2024

@Groovyfalafel

Can you lower the bucket size even more, to something like 16, and then retry?
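
For example (a sketch; the config and script names are assumed, as is the option key being bucket_size in the duplicated yml):

```bash
# Assumes the duplicated config is transformer_toy.yml and the option key is bucket_size.
sed -i 's/^bucket_size:.*/bucket_size: 16/' transformer_toy.yml
bash run_transformer_toy_dblp.sh
```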

@Groovyfalafel

Froze again, this time ending at line 3179.

@thangk
Collaborator Author

thangk commented Dec 9, 2024

Froze again, this time ending at line 3179.

And nothing in the non-error log?

@Groovyfalafel

No, nothing new.

@thangk
Collaborator Author

thangk commented Dec 13, 2024

@Groovyfalafel

Okay, we'll try something else. Can you try using another model, either RNN or CNN (convs2s)? Duplicate the model file like the Transformer yml, and the rest is the same (i.e., make the run script for it). Set the bucket size to a low number again, like 8 or 16. A rough sketch follows below.
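
(A sketch mirroring the transformer setup from earlier in this thread; all file names here are assumptions.)

```bash
# Sketch of the convs2s setup -- file names are assumptions mirroring the transformer run.
cp transformer_toy.yml convs2s_toy.yml
# In convs2s_toy.yml: switch the encoder/decoder types to the CNN (convs2s)
# or RNN variants and set bucket_size to something small like 8 or 16.
cp run_transformer_toy_dblp.sh run_convs2s_toy_dblp.sh
# Point the new script at convs2s_toy.yml, then:
bash run_convs2s_toy_dblp.sh
```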
