
-inf issue #1

Open
jinz2014 opened this issue Sep 4, 2024 · 0 comments
jinz2014 commented Sep 4, 2024

Did you see the '-inf' in your run?

+-----------------------+----------------------------------------------------+
| Parameter             | Value                                              |
+-----------------------+----------------------------------------------------+
| train data pattern    | data/tinyshakespeare/tiny_shakespeare_train.bin    |
| val data pattern      | data/tinyshakespeare/tiny_shakespeare_val.bin      |
| output log file       | nullptr                                            |
| batch size B          | 4                                                  |
| sequence length T     | 512                                                |
| learning rate         | 0.000300                                           |
| val_loss_every        | 20                                                 |
| val_max_steps         | 20                                                 |
| sample_every          | 20                                                 |
| genT                  | 64                                                 |
+-----------------------+----------------------------------------------------+
| device                | Intel(R) Arc(TM) A770 Graphics                     |
+-----------------------+----------------------------------------------------+
| max_sequence_length T | 1024                                               |
| vocab_size V          | 50257                                              |
| padded_vocab_size Vp  | 50304                                              |
| num_layers L          | 12                                                 |
| num_heads NH          | 12                                                 |
| channels C            | 768                                                |
| num_parameters        | 124475904                                          |
+-----------------------+----------------------------------------------------+
| train_num_batches     | 149                                                |
| val_num_batches       | 16                                                 |
+-----------------------+----------------------------------------------------+
allocated 474 MiB for model parameters
allocated 2277 MiB for activations
val loss -inf
allocated 474 MiB for parameter gradients
allocated 78 MiB for activation gradients
allocated 474 MiB for AdamW optimizer state m
allocated 474 MiB for AdamW optimizer state v
step    1/149: train loss -inf (1373.163524 ms, 1491 tok/s)
step    2/149: train loss -inf (561.510626 ms, 3647 tok/s)
step    3/149: train loss -inf (562.402664 ms, 3641 tok/s)
step    4/149: train loss -inf (562.877144 ms, 3638 tok/s)
step    5/149: train loss 3.270137 (562.379342 ms, 3641 tok/s)
step    6/149: train loss -inf (562.222351 ms, 3642 tok/s)
step    7/149: train loss -inf (563.365131 ms, 3635 tok/s)
step    8/149: train loss -inf (562.343304 ms, 3641 tok/s)
step    9/149: train loss -inf (561.130251 ms, 3649 tok/s)
step   10/149: train loss 3.771136 (562.232482 ms, 3642 tok/s)
step   11/149: train loss 3.410619 (562.445602 ms, 3641 tok/s)
step   12/149: train loss -inf (562.126695 ms, 3643 tok/s)
step   13/149: train loss -inf (561.149267 ms, 3649 tok/s)
step   14/149: train loss -inf (562.618085 ms, 3640 tok/s)
step   15/149: train loss 3.552519 (562.104564 ms, 3643 tok/s)
step   16/149: train loss -inf (562.188434 ms, 3642 tok/s)
step   17/149: train loss 3.505062 (561.892586 ms, 3644 tok/s)
step   18/149: train loss 3.899063 (561.642640 ms, 3646 tok/s)
step   19/149: train loss 3.790717 (563.634025 ms, 3633 tok/s)
step   20/149: train loss 4.134653 (560.460468 ms, 3654 tok/s)
val loss -inf
generating:
---
O, disorporate, Bering to arm of Trussell, and for private use take me, since you are these children and not these. Unto the wise he, the fool I misjudged him, set me here
Yea.
Letter from Faith
A great prince I presume

---
step   21/149: train loss 3.076251 (574.284356 ms, 3566 tok/s)
step   22/149: train loss 4.044003 (560.395851 ms, 3654 tok/s)
step   23/149: train loss 3.664719 (561.161018 ms, 3649 tok/s)
step   24/149: train loss 3.619468 (560.915251 ms, 3651 tok/s)
step   25/149: train loss 3.448017 (560.725216 ms, 3652 tok/s)
step   26/149: train loss 3.467965 (562.267702 ms, 3642 tok/s)
step   27/149: train loss -inf (560.669723 ms, 3652 tok/s)
step   28/149: train loss 3.983095 (561.373767 ms, 3648 tok/s)
step   29/149: train loss 3.626441 (561.407721 ms, 3647 tok/s)
step   30/149: train loss 3.650180 (561.128670 ms, 3649 tok/s)
step   31/149: train loss 4.230763 (561.467428 ms, 3647 tok/s)
step   32/149: train loss 3.920545 (561.079357 ms, 3650 tok/s)
step   33/149: train loss 3.523292 (561.587132 ms, 3646 tok/s)
step   34/149: train loss 3.645729 (562.089952 ms, 3643 tok/s)
step   35/149: train loss -inf (561.293155 ms, 3648 tok/s)
step   36/149: train loss 3.296374 (562.754651 ms, 3639 tok/s)
step   37/149: train loss 3.665959 (561.024074 ms, 3650 tok/s)
step   38/149: train loss 3.581248 (561.042401 ms, 3650 tok/s)
step   39/149: train loss -inf (561.285337 ms, 3648 tok/s)
step   40/149: train loss 3.861797 (560.898981 ms, 3651 tok/s)
val loss -inf