19

My fine-tuning run kept crashing until I set the gradient accumulation steps to 4

It was failing on a 12GB VRAM card after about 20 minutes every single time. Anyone know other tricks for memory issues with smaller models?
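For context on why accumulation helps: only one small micro-batch has to sit in memory at a time, while the averaged gradient still matches what a larger batch would give. Below is a minimal pure-Python sketch of that idea on a toy one-weight model. It is not the actual training setup from this run; the names `grad_mse` and `accumulated_grad` and the data are made up for illustration.

```python
# Toy model: y_hat = w * x, loss = mean((y_hat - y)^2) over a batch.
# All names and data here are hypothetical, not from any training library.

def grad_mse(w, batch):
    """Gradient of the mean squared error with respect to w for one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def accumulated_grad(w, data, micro_batch_size):
    """Average gradient built from micro-batches, processed one at a time."""
    accum_steps = len(data) // micro_batch_size
    total = 0.0
    for step in range(accum_steps):
        start = step * micro_batch_size
        micro = data[start:start + micro_batch_size]
        # Scale each micro-batch gradient so the sum equals the full-batch mean.
        total += grad_mse(w, micro) / accum_steps
    return total

data = [(1.0, 2.0), (2.0, 3.0), (3.0, 7.0), (4.0, 9.0)]
w = 0.5

full = grad_mse(w, data)               # one batch of 4 examples
accum = accumulated_grad(w, data, 1)   # 4 accumulation steps, batch size 1
print(abs(full - accum) < 1e-9)
```

In a real framework the equivalent is usually to divide the loss by the number of accumulation steps and call the optimizer step only every Nth micro-batch, so the effective batch size is micro-batch size times accumulation steps.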
3 comments

gavin_kim3 · 2mo ago
Try lowering your batch size to one.
6
ericfox · 2mo ago
Honestly, lowering the batch size just makes things slower for me.
4
briancampbell
... and that's exactly the kind of thing that trips people up, especially when they're new to it. I've noticed the same pattern everywhere: people always want more of something when less would actually work better. Like with cooking, you always need less salt than you think. Or with tools, you need fewer features to get the job done. Batch size is the same way. It's one of those things where the obvious answer (bigger is faster) is actually wrong in practice. A batch size of one forces you to deal with one example at a time, which sounds slower but usually ends up being faster because you're not chasing your tail.
2