
You can fit models with more parameters into smaller GPUs with quantization!
Large language models are, at their core, big bags of floating-point numbers, so how do you figure out how much VRAM you need? Try this simple back-of-the-napkin formula:
VRAM ≈ parameter count × bytes per parameter × 1.3 (the extra ~30% covers inference overhead such as activations)
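To make the arithmetic concrete, here's a minimal Python sketch of that estimate. The function name, the 7B-parameter example model, and the precision choices are illustrative assumptions, not part of the original formula; the byte sizes themselves (2 for fp16, 1 for int8, 0.5 for int4) are standard.

```python
def estimate_vram_gb(param_count: float, bytes_per_param: float, overhead: float = 0.30) -> float:
    """Back-of-the-napkin VRAM estimate: weights plus a fixed inference overhead."""
    weight_bytes = param_count * bytes_per_param
    return weight_bytes * (1 + overhead) / 1e9  # bytes -> GB

# Illustrative example: a 7B-parameter model at different precisions
for label, bytes_per_param in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    print(f"7B @ {label}: ~{estimate_vram_gb(7e9, bytes_per_param):.1f} GB")
```

By this estimate, dropping a 7B model from fp16 (~18 GB) to int4 (~4.6 GB) is exactly the quantization win mentioned above: the same parameter count in a fraction of the memory.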