Sneha Hanumanthaiah
Publications
Deep learning models have achieved state-of-the-art performance across numerous domains, but their increasing size and computational complexity pose significant challenges for deployment in resource-constrained environments. Model pruning is a key technique for addressing this issue by reducing the number of model parameters. However, existing methods often force a trade-off between compression rate, computational speed-up, and performance preservation. This paper introduces a novel hybrid pruning methodology that strategically combines Weight Statistics Aware Pruning (WSAP)-based unstructured pruning with hardware-friendly structured channel pruning. Our approach first determines per-layer pruning ratios using a WSAP heuristic based on the Coefficient of Variation (CoV) of each layer's weights, allowing more aggressive pruning of less critical layers. It then applies both fine-grained and channel-based pruning to maximize model compression while preserving accuracy. We demonstrate the effectiveness and generality of our method on two diverse tasks: Video Quality Assessment (VQA) with the DOVER-Mobile model and time-series forecasting with the CrossFormer model. Our results show that the proposed hybrid method achieves a superior balance of efficiency and performance, reducing model parameters by up to 80% and FLOPs by over 50% while maintaining the accuracy of the original models. These improvements make our method well-suited for trustworthy and efficient deployment of deep learning models in shared and constrained environments.
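The abstract does not include implementation details, so the following is only a minimal PyTorch sketch of how such a CoV-guided hybrid scheme could look. The specifics are illustrative assumptions, not the paper's actual WSAP heuristic: the CoV is computed over weight magnitudes, the CoV-to-ratio mapping is a simple linear normalization between `min_ratio` and `max_ratio`, the fixed `channel_ratio` is arbitrary, and `torch.nn.utils.prune` stands in for whatever pruning machinery the authors use.

```python
# Minimal sketch of CoV-guided hybrid pruning (illustrative; not the
# paper's implementation). Assumes PyTorch with torch.nn.utils.prune.
import torch.nn as nn
import torch.nn.utils.prune as prune


def cov_of_weights(module: nn.Module) -> float:
    """Coefficient of Variation (std/mean) of the layer's weight magnitudes."""
    w = module.weight.detach().abs().flatten()
    return (w.std() / (w.mean() + 1e-8)).item()


def hybrid_prune(model: nn.Module,
                 min_ratio: float = 0.2,
                 max_ratio: float = 0.8,
                 channel_ratio: float = 0.3) -> None:
    """Apply CoV-guided unstructured pruning, then structured channel pruning."""
    layers = [m for m in model.modules()
              if isinstance(m, (nn.Conv2d, nn.Linear))]
    covs = [cov_of_weights(m) for m in layers]
    lo, hi = min(covs), max(covs)

    for module, cov in zip(layers, covs):
        # Assumed heuristic: normalize the CoV to [0, 1] across layers and
        # map higher-CoV layers (many near-zero weights) to more aggressive
        # pruning ratios. The paper's actual WSAP mapping may differ.
        t = (cov - lo) / (hi - lo + 1e-8)
        ratio = min_ratio + t * (max_ratio - min_ratio)
        # Fine-grained (unstructured) magnitude pruning at the per-layer ratio.
        prune.l1_unstructured(module, name="weight", amount=ratio)
        # Hardware-friendly structured pruning along output channels (dim=0).
        prune.ln_structured(module, name="weight", amount=channel_ratio,
                            n=2, dim=0)
        # Bake the combined masks into the weights to make pruning permanent.
        prune.remove(module, "weight")
```

Note that `prune.remove` only zeroes the masked weights in place; realizing the FLOP reductions reported for the structured part would additionally require physically removing the zeroed channels from the architecture, which this sketch omits.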