Automated optimization of unfamiliar tensor operations
We and our colleagues presented a new auto-scheduler, DietCode, at this year's Conference on Machine Learning and Systems. It handles dynamic-shape workloads much more efficiently than its predecessors: where existing auto-schedulers optimize each input shape individually, DietCode optimizes all shapes simultaneously.
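To see why joint optimization pays off, consider a rough sketch of the tuning cost. The names, trial counts, and micro-kernel pool size below are all hypothetical, used only to illustrate the contrast between per-shape search and a single shared search; they do not reflect DietCode's actual implementation.

```python
# Hypothetical cost model: per-shape auto-scheduling runs a full search
# for every input shape, while joint optimization searches once over a
# small shared pool of micro-kernels and dispatches each shape to the
# best one.

SEARCH_TRIALS_PER_RUN = 1000  # assumed cost of one tuning search


def per_shape_tuning_cost(shapes):
    # One independent search per shape.
    return len(shapes) * SEARCH_TRIALS_PER_RUN


def joint_tuning_cost(shapes, num_micro_kernels=8):
    # A single search over shape-generic micro-kernels; mapping each
    # shape to its best kernel afterward is assumed to be cheap.
    return num_micro_kernels * SEARCH_TRIALS_PER_RUN


shapes = list(range(1, 129))  # sequence lengths of 1 to 128 tokens
print(per_shape_tuning_cost(shapes))  # 128000 trials
print(joint_tuning_cost(shapes))      # 8000 trials
```

Under these assumed numbers, the per-shape search cost grows linearly with the number of shapes, while the joint search cost stays fixed, which is the intuition behind the large speedups reported below.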
We tested our approach on a natural-language-processing (NLP) task whose inputs range in size from 1 to 128 tokens. Using a random sample of input sizes that reflects a plausible real-world distribution, DietCode speeds up optimization almost sixfold; when all possible shapes are taken into account, the speedup rises to more than 94-fold.
DietCode is not only faster at optimization but also produces better-performing code: up to 75% faster than code from previous auto-schedulers, and up to 19% faster than hand-optimized code in existing tensor-operation libraries. It promises to accelerate our customers' dynamically shaped machine learning workloads.