Intel Labs Explores Low-Rank Adapters and Neural Architecture Search for LLM Compression

Large language models (LLMs) have become indispensable for a range of natural language processing applications, including machine translation, text summarization, and conversational AI. However, their increasing size and complexity have led to significant challenges in computational efficiency and memory consumption. As these models grow, their resource demands make them difficult to deploy in environments with limited computational capacity.

The primary obstacle with LLMs lies in their massive computational requirements. Training and fine-tuning these models involve billions of parameters, making them resource-intensive and limiting their accessibility. Existing methods for improving efficiency, such as parameter-efficient fine-tuning (PEFT), provide some relief but often compromise performance. The challenge is to find an approach that can significantly reduce computational requirements while preserving the model's accuracy and effectiveness in real-world scenarios. Researchers have therefore explored methods that enable effective model tuning without requiring extensive computational resources.

Researchers at Intel Labs and Intel Corporation have introduced an approach that integrates Low-Rank Adaptation (LoRA) with Neural Architecture Search (NAS) techniques. This method seeks to address the limitations of traditional fine-tuning approaches while improving efficiency and performance. The research team developed a framework that optimizes memory consumption and computational speed by leveraging structured low-rank representations. The technique involves a weight-sharing super-network that dynamically adjusts sub-structures to improve training efficiency. This integration allows the model to be fine-tuned effectively while maintaining a minimal computational footprint.
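To make the idea of structured low-rank representations concrete, here is a minimal PyTorch sketch of a LoRA-style layer: a frozen linear projection augmented with a trainable low-rank update. The class name `LoRALinear` and the hyperparameters (rank 8, scaling factor alpha) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer augmented with a trainable low-rank update: W x + (B A) x."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only the adapter matrices are trained
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The low-rank delta is added on top of the frozen projection.
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

# Example: wrap a 4096 x 4096 projection with a rank-8 adapter.
layer = LoRALinear(nn.Linear(4096, 4096), rank=8)
out = layer(torch.randn(2, 4096))
```

For a 4096×4096 projection, the frozen weight holds roughly 16.8M parameters, while a rank-8 adapter adds only 2 × 4096 × 8 ≈ 65k trainable parameters, which is why low-rank adaptation cuts fine-tuning cost so sharply.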

The methodology introduced by Intel Labs centers on LoNAS (Low-Rank Neural Architecture Search), which uses elastic LoRA adapters for model fine-tuning. In contrast to conventional approaches that require full fine-tuning of LLMs, LoNAS selectively activates model sub-structures, reducing redundancy. The key innovation lies in the flexibility of the elastic adapters, which adjust dynamically based on model requirements. The approach is supported by heuristic sub-network search, which further streamlines the fine-tuning process. By focusing only on relevant model parameters, the technique achieves a balance between computational efficiency and performance. The process is structured to enable selective activation of low-rank structures while maintaining high inference speed, as sketched below.
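The following sketch illustrates, under the same assumptions as the previous example, what an "elastic" adapter could look like: the adapter matrices are allocated at a maximum rank, and a sub-network is activated by slicing them to a smaller rank at runtime, so all configurations share weights. The class and method names are hypothetical; the actual LoNAS search space and sampling strategy are described in the paper.

```python
import torch
import torch.nn as nn

class ElasticLoRALinear(nn.Module):
    """LoRA adapter whose active rank can be changed per forward pass (weight sharing)."""

    def __init__(self, base: nn.Linear, max_rank: int = 32):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False
        self.lora_A = nn.Parameter(torch.randn(max_rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, max_rank))
        self.active_rank = max_rank  # current sub-network choice

    def set_active_rank(self, rank: int) -> None:
        self.active_rank = rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        r = self.active_rank
        A = self.lora_A[:r, :]   # slice the shared adapter weights
        B = self.lora_B[:, :r]
        return self.base(x) + x @ A.T @ B.T

# Sampling different ranks during training keeps all sub-adapters trained under shared weights;
# a search phase can then pick one rank per layer from a candidate set such as [32, 16, 8].
layer = ElasticLoRALinear(nn.Linear(4096, 4096), max_rank=32)
layer.set_active_rank(8)  # activate a smaller sub-network for cheaper inference
out = layer(torch.randn(2, 4096))
```

A heuristic search can then evaluate a handful of candidate rank configurations on a validation set and keep the one with the best accuracy/latency trade-off, rather than exhaustively exploring the full super-network.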

Performance evaluation of the proposed method highlights its significant improvements over conventional techniques. Experimental results indicate that LoNAS achieves an inference speedup of up to 1.4x while reducing model parameters by approximately 80%. When applied to fine-tuning LLaMA-7B on a 15k unified commonsense reasoning dataset, LoNAS demonstrated an average accuracy of 65.8%. A comparative analysis of different LoNAS configurations showed that heuristic sub-network optimization achieved an inference speedup of 1.23x, while searched sub-network configurations yielded speedups of 1.28x and 1.41x. Furthermore, on Mistral-7B-v0.3, accuracy on GSM8K tasks increased from 44.1% to 50.1% while maintaining efficiency across different model sizes. These findings confirm that the proposed method significantly improves the performance of LLMs while reducing computational requirements.

Further improvements to the framework include the introduction of Shears, an advanced fine-tuning strategy built on LoNAS. Shears uses Neural Low-Rank Adapter Search (NLS) to restrict elasticity to the adapter rank, reducing unnecessary computation. The approach applies sparsity to the base model using predefined metrics, ensuring that fine-tuning remains efficient. This strategy has been particularly effective in maintaining model accuracy while reducing the number of active parameters. Another extension, SQFT, incorporates sparsity and low numerical precision for improved fine-tuning. Using quantization-aware techniques, SQFT ensures that sparse models can be fine-tuned without losing efficiency. These refinements highlight the adaptability of LoNAS and its potential for further optimization.
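As a rough illustration of the Shears recipe described above, the sketch below applies magnitude-based sparsity to a frozen base layer before low-rank adapters are attached and their ranks searched. Magnitude pruning, the function name, and the 50% sparsity level are assumptions for illustration only; the paper's exact sparsity criterion and the quantization-aware handling in SQFT are not reproduced here.

```python
import torch
import torch.nn as nn

def apply_magnitude_sparsity(linear: nn.Linear, sparsity: float = 0.5) -> None:
    """Zero out the smallest-magnitude weights of a frozen base layer (illustrative pruning)."""
    with torch.no_grad():
        w = linear.weight
        k = max(1, int(w.numel() * sparsity))
        threshold = w.abs().flatten().kthvalue(k).values
        w.mul_((w.abs() > threshold).to(w.dtype))

# Outline of a Shears-style setup: sparsify and freeze the base projection first,
# then attach elastic low-rank adapters (as in the earlier sketch) and search only over their ranks.
base = nn.Linear(4096, 4096)
for p in base.parameters():
    p.requires_grad = False
apply_magnitude_sparsity(base, sparsity=0.5)
print(f"base weight sparsity: {(base.weight == 0).float().mean().item():.2f}")
```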

The integration of LoRA and NAS offers a transformative approach to large language model optimization. By leveraging structured low-rank representations, the research shows that computational efficiency can be improved significantly without compromising performance. The study conducted by Intel Labs confirms that combining these techniques reduces the burden of fine-tuning while preserving model integrity. Future research could investigate further optimizations, including improved sub-network selection and more effective heuristic strategies. This approach sets a precedent for making LLMs more accessible and deployable in diverse environments, paving the way for more efficient AI models.


Check out the Paper and the GitHub page. All credit for this research goes to the researchers of this project.



Nikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields such as biomaterials and biomedical science. With a strong background in materials science, he is exploring new advancements and creating opportunities to contribute.

