Sigma - Virtual Assistant
DOI: https://doi.org/10.63094/AITUSRJ.25.4.1.3

Keywords: Virtual Assistant, Sigma, Hot word detection, Command Execution, Media playback, Conversational responses, Natural Language Processing (NLP), Machine Learning (ML), Python, Voice recognition

Abstract
This paper presents Sigma, a virtual assistant capable of performing complex actions. Sigma can search YouTube for any video and play it for the user, launch other applications, chat, send WhatsApp messages, and respond to voice commands. Sigma handles system commands and user interaction through the PyAudio, pvporcupine, and HugChat libraries, which allow it to execute assigned tasks, detect trigger (hot) words, and converse with users naturally. The project focuses on providing a friendly GUI and on combining several modules into a complete, easy-to-use virtual assistant.
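The abstract describes routing recognized speech to several action categories (media playback, application launch, messaging, and conversational fallback). A minimal sketch of such command routing is shown below; the function name, command prefixes, and return format are illustrative assumptions, not the project's actual API.

```python
def route_command(transcript: str) -> str:
    """Map a recognized utterance to one of Sigma's action categories.

    Hypothetical dispatcher: prefixes and labels are assumptions for
    illustration only. Returns a "category:argument" string that a
    caller could hand off to the matching handler.
    """
    text = transcript.lower().strip()
    if text.startswith("play "):
        # Media playback: search YouTube for the remainder of the utterance.
        return f"youtube:{text[len('play '):]}"
    if text.startswith("open "):
        # Application launch.
        return f"launch:{text[len('open '):]}"
    if text.startswith("send whatsapp "):
        # Messaging.
        return f"whatsapp:{text[len('send whatsapp '):]}"
    # Anything else falls back to the conversational model (e.g. HugChat).
    return f"chat:{text}"
```

For example, `route_command("play lofi beats")` would yield `"youtube:lofi beats"`, while an unmatched utterance falls through to the chat category. A real assistant would replace the prefix matching with NLP-based intent classification, as several of the references below discuss.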
References
Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735-1780.
This paper introduces Long Short-Term Memory (LSTM), a recurrent neural network (RNN) architecture crucial for many natural language processing tasks.
Vinyals, O., & Le, Q. (2015). A Neural Conversational Model. arXiv preprint arXiv:1506.05869.
Discusses a sequence-to-sequence neural network model for conversational agents.
Cho, K., van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. arXiv preprint arXiv:1406.1078.
Explores the use of RNN encoder-decoder architecture, foundational for modern virtual assistants.
Kumar, A., Gupta, P., & Sharma, S. (2018). Smart Virtual Assistant: A Review. Journal of Computational and Theoretical Nanoscience, 15(3), 791-796.
Provides a comprehensive review of smart virtual assistants and their underlying technologies.
Chen, Q., Zhuo, Z., & Wang, W. (2019). BERT for Joint Intent Classification and Slot Filling. arXiv preprint arXiv:1902.10909.
Discusses BERT, a powerful NLP model for tasks like intent classification and slot filling, crucial for virtual assistants.
Sarikaya, R., Hinton, G., & Deoras, A. (2014). Application of Deep Belief Networks for Natural Language Understanding. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(4), 778-784.
Examines the use of deep belief networks in natural language understanding.
Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving Language Understanding by Generative Pre-Training. OpenAI.
Introduces the concept of generative pre-training for language models, a precursor to models like GPT-3.
Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., ... & Dean, J. (2016). Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. arXiv preprint arXiv:1609.08144.
Discusses Google's neural machine translation system, which uses sequence-to-sequence models.
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805.
Introduces BERT, a model that has significantly improved various NLP tasks.
Serban, I. V., Lowe, R., Charlin, L., & Pineau, J. (2015). A Survey of Available Corpora for Building Data-Driven Dialogue Systems. arXiv preprint arXiv:1512.05742.
Surveys the datasets available for training dialogue systems.
Deng, L., & Liu, Y. (2018). Deep Learning in Natural Language Processing. Springer.
Comprehensive overview of deep learning techniques in NLP.
Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to Sequence Learning with Neural Networks. Advances in Neural Information Processing Systems, 27, 3104-3112.
Fundamental paper on sequence-to-sequence models used in NLP.
Young, T., Hazarika, D., Poria, S., & Cambria, E. (2018). Recent Trends in Deep Learning Based Natural Language Processing. IEEE Computational Intelligence Magazine, 13(3), 55-75.
Reviews recent trends in deep learning applied to NLP.
Schmidhuber, J. (2015). Deep Learning in Neural Networks: An Overview. Neural Networks, 61, 85-117.
Provides an extensive overview of deep learning in neural networks.
Amodei, D., Ananthanarayanan, S., Anubhai, R., Bai, J., Battenberg, E., Case, C., ... & Ng, A. Y. (2016). Deep Speech 2: End-to-End Speech Recognition in English and Mandarin. International Conference on Machine Learning (ICML).
Discusses end-to-end speech recognition systems.
López, G., Quesada, L., & Guerrero, L. A. (2017). Alexa vs. Siri vs. Cortana vs. Google Assistant: A Comparison of Speech-Based Natural User Interfaces. International Conference on Applied Human Factors and Ergonomics. Springer, Cham.
Compares various virtual assistants and their speech-based interfaces.
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., ... & Amodei, D. (2020). Language Models are Few-Shot Learners. arXiv preprint arXiv:2005.14165.
Discusses GPT-3, a state-of-the-art language model for various NLP tasks.
Feng, S., Xing, L., Qian, Y., Chen, G., & Huang, C. (2021). Challenges in Reinforcement Learning for Conversational AI: A Survey. Journal of Artificial Intelligence Research, 71, 309-348.
Reviews the challenges and approaches in applying reinforcement learning to conversational AI.
Joulin, A., Grave, E., Bojanowski, P., & Mikolov, T. (2017). Bag of Tricks for Efficient Text Classification. arXiv preprint arXiv:1607.01759.
Explores simple and efficient methods for text classification, which are essential for intent recognition in virtual assistants.
Kim, Y. (2014). Convolutional Neural Networks for Sentence Classification. arXiv preprint arXiv:1408.5882.
Discusses the application of convolutional neural networks (CNNs) for sentence classification tasks.
License
Copyright (c) 2025 AITU SCIENTIFIC RESEARCH JOURNAL

This work is licensed under a Creative Commons Attribution 4.0 International License.