Hugging Face officially launches Kernels: GPU operators installable with a single line of code, just like a model.
According to ME News, 1M AI News reported on April 15 (UTC+8) that Hugging Face CEO Clem Delangue announced the official launch of Kernels on the Hub.

GPU operators (kernels) are low-level optimized code that lets graphics cards run at full speed, accelerating inference and training by 1.7 to 2.5 times. Installing them, however, has long been a nightmare: FlashAttention, the most widely used operator, takes roughly 96 GB of memory and several hours to compile locally, and the slightest mismatch with the installed PyTorch or CUDA version causes errors, leaving most developers stuck at this step.

Kernels Hub moves compilation to the cloud. Hugging Face pre-compiles operators for a range of graphics cards and system environments; a developer writes a single line of code, the Hub matches it to the local hardware, and the pre-compiled binary downloads in seconds, ready for immediate use. Multiple versions of the same operator can be loaded in one process, and the kernels are compatible with torch.compile.

Kernels entered testing last June and was upgraded this month to a first-class repository type on the Hub, alongside Models, Datasets, and Spaces. There are currently 61 pre-compiled operators covering common scenarios such as attention mechanisms, normalization, mixture-of-experts routing, and quantization. Four hardware acceleration platforms are supported: NVIDIA CUDA, AMD ROCm, Apple Metal, and Intel XPU, and Kernels has been integrated into Hugging Face's inference framework TGI and the Transformers library. (Source: ME)
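The "single line of code" refers to fetching a pre-compiled operator from the Hub at runtime instead of compiling it locally. Below is a minimal sketch using the `kernels` Python package; the package name, the `get_kernel` call, and the repo id `kernels-community/activation` follow Hugging Face's published examples and should be treated as assumptions. The sketch is guarded so it falls back to stock PyTorch when no GPU or no `kernels` install is available:

```python
import torch

try:
    # `pip install kernels` -- Hugging Face's loader for Hub-hosted GPU operators.
    from kernels import get_kernel
except ImportError:
    get_kernel = None  # fall back to plain PyTorch below

def fast_gelu(x: torch.Tensor) -> torch.Tensor:
    """Apply GELU, using a pre-compiled Hub kernel when one is available."""
    if get_kernel is not None and x.is_cuda:
        # The one-liner: the Hub matches this machine's GPU / CUDA / torch
        # versions and downloads a pre-built binary -- no local compilation.
        activation = get_kernel("kernels-community/activation")
        out = torch.empty_like(x)
        activation.gelu_fast(out, x)  # kernel writes its result into `out`
        return out
    # Fallback: the stock PyTorch implementation (CPU, or `kernels` missing).
    return torch.nn.functional.gelu(x)

y = fast_gelu(torch.randn(4, 8))
print(tuple(y.shape))  # → (4, 8)
```

On a CUDA machine with `kernels` installed, the `get_kernel` line is the only change needed to swap in the optimized operator; the fallback branch keeps the function usable everywhere else.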