ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
# Summary
This paper introduces ZeRO (Zero Redundancy Optimizer), a set of memory optimization techniques for training extremely large (trillion-parameter) deep learning models. ZeR...