Deep Learning for Lossless Audio Compression
Abstract
Audio and speech compression techniques reduce the storage space that audio data requires and the rate needed to transmit it over communication and network systems. In this paper, the researchers use neural networks and artificial intelligence to compress audio signals. They investigated compression ratios of 8, 4, 2, and 1 (no compression) and selected the highest ratio, 8. This choice is a compromise between the SNR of the recovered audio signal and the time required for implementation. The researchers tested 119 different audio files from the standard BBC audio library. The duration of these files is about 1000 seconds. The average SNR was 26.33 dB, and the mean square error was -52.58 dB. To reduce the running time, the number of epochs was 30, the hidden layers were 64 to 128, the quantization level was 1, and the dimensions were 15 to 20; with these settings, each second of the input signal needed 100 seconds to be compressed. The input audio files were single-channel (mono); stereo and multi-channel files were converted to mono. According to the results, the proposed process achieved good audio compression, while the other parameters remained acceptable.
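The abstract reports quality as an SNR and a mean square error, both in dB, and notes that stereo files were converted to mono. The following is a minimal sketch of how such figures could be computed, not the authors' implementation. It assumes the common definitions SNR = 10·log10(signal energy / error energy) and MSE(dB) = 10·log10(mean squared error), and it assumes the mono conversion averages the two channels. The function names `to_mono`, `snr_db`, and `mse_db` are hypothetical.

```python
import math


def to_mono(left, right):
    """Downmix two channels to one by averaging (an assumed convention)."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]


def snr_db(original, recovered):
    """Signal-to-noise ratio in dB between original and recovered samples."""
    signal_energy = sum(x * x for x in original)
    error_energy = sum((x - y) ** 2 for x, y in zip(original, recovered))
    if error_energy == 0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / error_energy)


def mse_db(original, recovered):
    """Mean square error expressed in dB (10 * log10 of the linear MSE)."""
    mse = sum((x - y) ** 2 for x, y in zip(original, recovered)) / len(original)
    if mse == 0:
        return -math.inf  # no error at all
    return 10.0 * math.log10(mse)


# Toy signals for illustration only; these are not data from the paper.
original = [0.5, -0.5, 0.5, -0.5]
recovered = [0.45, -0.45, 0.45, -0.45]
print(f"SNR: {snr_db(original, recovered):.2f} dB")
print(f"MSE: {mse_db(original, recovered):.2f} dB")
```

Under these definitions, a higher SNR and a more negative MSE in dB both indicate a recovered signal that is closer to the original.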