Replies: 4 comments
-
I used RMSPropTF or SGD combined with clip-grad=0.002 and clip-mode=agc, and the models work rather well on my own dataset.
I believe it should also work with Adam. My guess is that you haven't set clip-grad and clip-mode.
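If you're training with timm's train.py script, those settings map onto its command-line flags. A hypothetical invocation (the dataset path and model name below are placeholders, not from this thread):

```shell
# Hypothetical example: substitute your own dataset path and model.
python train.py /path/to/dataset \
    --model dm_nfnet_f0 \
    --opt rmsproptf --lr 0.1 \
    --clip-grad 0.002 --clip-mode agc
```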
Good luck
Linh
-
@choibigo this isn't a bug, so moving to the discussion forums. I would not advise deviating far from the paper hparams, or from the rmsproptf w/ stronger clipping mentioned by @linhduongtuan that I've found to work decently (it can also work with rmsproptf or sgd/momentum + global norm clipping of 1.0). If you want to use Adam, you're going to need to do some parameter sweeps.
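For the "global norm clipping of 1.0" option, PyTorch has a built-in utility. A minimal sketch of where it fits in a training loop (the tiny model and random data here are placeholders, not nfnet specifics):

```python
import torch

# Placeholder model/data just to show where clipping goes in the loop.
model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
x, y = torch.randn(8, 4), torch.randn(8, 2)

opt.zero_grad()
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()
# Rescale all gradients together so their combined L2 norm is <= 1.0.
# This must happen after backward() and before step().
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
opt.step()
```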
-
And if you aren't using any grad clipping, it won't work so well. The NFRegNets train okay(ish) without grad clipping, but the NFResNets and NFNets are pretty unstable without it. Again, you'll need to do some hparam sweeps to find the best combo of learning rate, opt epsilons, and grad clipping for your task and optimizer choice.
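To make the AGC variant concrete, here is a pure-Python sketch of the core rule from the NFNets paper: unlike global-norm clipping, each parameter's gradient is clipped relative to the norm of that parameter itself, and the clip-grad=0.002 mentioned above plays the role of the clipping ratio. The function name below is illustrative, not timm's actual API.

```python
def agc_scale(weight_norm, grad_norm, clipping=0.01, eps=1e-3):
    """Return the factor to multiply a gradient by so that
    ||grad|| <= clipping * max(||weight||, eps).

    `eps` keeps near-zero-norm weights from forcing the gradient to zero.
    """
    max_norm = clipping * max(weight_norm, eps)
    if grad_norm <= max_norm:
        return 1.0  # gradient already small enough relative to the weight
    return max_norm / (grad_norm + 1e-6)

# A gradient 100x larger than the allowed ratio gets scaled down hard:
print(agc_scale(weight_norm=1.0, grad_norm=1.0, clipping=0.01))  # ~0.01
```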
-
Thank you rwightman.
I'm studying PyTorch and networks, and I don't know much about grad clipping yet.
Do you have any code I could refer to for grad clipping, or a link to a reference? It would be very helpful if you could share a reference link or some code.
-
Hi,
I'm trying to train nfnet. I used the PyTorch built-in optimizer (Adam), but the loss value increased past a billion. I want to use an optimizer from PyTorch itself. How can we train nfnet normally using the PyTorch built-in optimizers?
I also trained nfnet using PyTorch's built-in SGD. With SGD the loss values converge, but the accuracy was very low (I used CIFAR-10; test accuracy was 69%). How can we improve accuracy using the PyTorch built-in optimizers? I want to use the nfnet network.
I will wait for your reply.
Thank you