
You can fit models with more parameters into smaller GPUs with quantization!
Large language models are, at their core, big bags of floating-point numbers, so how do you figure out how much VRAM you need? Try this simple back-of-the-napkin formula:
VRAM ≈ parameter count × bytes per parameter × 1.3 (the extra ~30% covers inference overhead such as activations)
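To make the arithmetic concrete, here's a minimal Python sketch of that estimate. The function name, the 7B-parameter example model, and the precision choices are illustrative assumptions, not part of the original formula; the byte sizes themselves (2 for fp16, 1 for int8, 0.5 for int4) are standard.

```python
def estimate_vram_gb(param_count: float, bytes_per_param: float, overhead: float = 0.30) -> float:
    """Back-of-the-napkin VRAM estimate: weights plus a fixed inference overhead."""
    weight_bytes = param_count * bytes_per_param
    return weight_bytes * (1 + overhead) / 1e9  # bytes -> GB

# Illustrative example: a 7B-parameter model at different precisions
for label, bytes_per_param in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    print(f"7B @ {label}: ~{estimate_vram_gb(7e9, bytes_per_param):.1f} GB")
```

By this estimate, dropping a 7B model from fp16 (~18 GB) to int4 (~4.6 GB) is exactly the quantization win mentioned above: the same parameter count in a fraction of the memory.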