Deepak Bolleddu - Personal Website

Research Blogs

Research Interests

Reinforcement Learning

Multi-agent Systems & AI Agents.

Vision & Speech AI

Multimodal Emotion & Affective Computing.

LLMs & NLP

Alignment, Reasoning, & Safety.

I am a Machine Learning Engineer and Researcher focused on the intersection of Multimodal AI, Computer Vision, and Trustworthy AI Agents.

Currently, I work on enhancing LLM collaboration in Medical AI and reasoning capabilities, with a particular interest in speech and pattern recognition to help machines understand human affect. I have actively contributed to several research problems in these domains. Previously, I worked as a Senior Software Engineer at Infosys, where I focused on building complex AI agents for enterprise applications.

I hold an Honors Master's in Computer Science from the University of Wollongong. My research there explored advanced lexical complexity and natural language understanding, achieving top performance on NAACL shared tasks. My goal is to build AI that doesn't just process data, but truly understands the nuances of human communication.

Projects & Research


Multimodal Speech Emotion Recognition (SER)

Summary: Developed a robust affect detection system using wav2vec 2.0 acoustic features fused with textual sentiment embeddings. Optimized for low-latency inference to enable real-time emotional feedback in AI tutors.

Abstract: This research explores cross-modal fusion techniques for Speech Emotion Recognition. By combining self-supervised acoustic representations with transformer-based NLP models, we achieved a 12% relative improvement in Weighted Average Recall (WAR) on the IEMOCAP dataset. The study also investigates the impact of background noise on affective feature extraction.
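To make the metric concrete, here is a minimal sketch of a support-weighted average recall and of how a relative improvement is computed. The function names, the toy labels, and the 0.60 baseline are illustrative assumptions, not values or code from the study.

```python
from collections import Counter


def weighted_average_recall(y_true, y_pred):
    """Per-class recall, averaged with weights proportional to class support."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = len(y_true)
    return sum((support[c] / total) * (hits[c] / support[c]) for c in support)


def relative_improvement(baseline, new):
    """Relative (not absolute) gain of `new` over `baseline`."""
    return (new - baseline) / baseline


# Toy labels for illustration only; these are NOT IEMOCAP results.
y_true = ["angry", "happy", "happy", "sad", "neutral", "neutral", "neutral", "sad"]
y_pred = ["angry", "happy", "sad", "sad", "neutral", "happy", "neutral", "sad"]

war = weighted_average_recall(y_true, y_pred)

# Hypothetical baseline: a 12% relative gain would move 0.60 to 0.672.
gain = relative_improvement(0.60, 0.672)
```

Because each class is weighted by its support, this quantity equals overall accuracy; an unweighted mean of per-class recalls would treat rare emotions the same as frequent ones.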

Code

Speech AI Transformers


End-to-End Emotion-Aware Voice Assistants

Summary: Built an integrated pipeline that modifies LLM response style based on the user's detected emotional state. The system uses a gated fusion mechanism to adjust 'empathy' levels in generated dialogue.
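As a rough illustration of what a gated fusion step can look like, the sketch below blends two equal-length feature vectors with a per-dimension sigmoid gate. This is one common generic form, not necessarily the mechanism used in this project; `gated_fusion`, the vector names, and the weights are all hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(emotion_vec, text_vec, gate_weights, gate_bias):
    """Blend two equal-length vectors with a per-dimension gate.

    gate_i  = sigmoid(W_i . [emotion; text] + b_i)
    fused_i = gate_i * emotion_i + (1 - gate_i) * text_i
    """
    concat = list(emotion_vec) + list(text_vec)
    fused = []
    for i, (e, t) in enumerate(zip(emotion_vec, text_vec)):
        pre_activation = sum(w * x for w, x in zip(gate_weights[i], concat))
        g = sigmoid(pre_activation + gate_bias[i])
        fused.append(g * e + (1.0 - g) * t)
    return fused


# With zero weights and bias the gate is 0.5, so the output is a plain average.
neutral_gate = gated_fusion([1.0, 0.0], [0.0, 1.0], [[0.0] * 4, [0.0] * 4], [0.0, 0.0])

# A large positive bias pushes the gate toward 1, favoring the emotion features.
emotion_heavy = gated_fusion([1.0, 0.0], [0.0, 1.0], [[0.0] * 4, [0.0] * 4], [50.0, 50.0])
```

In a trained model the gate weights would be learned, so the network decides per dimension how much of the emotional signal to let through.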

Abstract: We propose an architectural framework for "Emotional SLU" (Spoken Language Understanding). Unlike standard pipelines that ignore prosody, our model conditions the LLM's system prompt on real-time valence and arousal scores, leading to higher user satisfaction in qualitative human-centered evaluations.
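The idea of conditioning a system prompt on valence and arousal can be sketched as follows. The score range of [-1, 1], the quadrant labels, the style hints, and the function names are assumptions made for illustration; the actual system may map scores to prompts quite differently.

```python
def describe_affect(valence, arousal):
    """Map valence/arousal scores (assumed to lie in [-1, 1]) to a coarse label."""
    if valence >= 0:
        return "excited" if arousal >= 0 else "calm"
    return "distressed" if arousal >= 0 else "low"


# Hypothetical style hints, one per affect quadrant.
STYLE_HINTS = {
    "excited": "Match the user's energy and keep answers upbeat.",
    "calm": "Keep a relaxed, conversational tone.",
    "distressed": "Acknowledge the user's frustration and respond calmly and concisely.",
    "low": "Be gentle and encouraging, and avoid overwhelming detail.",
}


def build_system_prompt(valence, arousal, base="You are a helpful voice assistant."):
    """Append an affect-dependent style instruction to a base system prompt."""
    affect = describe_affect(valence, arousal)
    return (
        f"{base} The user currently sounds {affect} "
        f"(valence={valence:+.2f}, arousal={arousal:+.2f}). {STYLE_HINTS[affect]}"
    )


prompt = build_system_prompt(-0.7, 0.8)
```

Rebuilding the prompt on every turn lets the response style track the user's state without retraining the language model.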

Demo

HCI NLP

Advanced Facial Recognition & Emotion Detection

Summary: High-performance system for real-time facial recognition and nuanced emotion detection. Includes a model architecture designed for resource-constrained AR/VR environments.

Code

Computer Vision


3D Image Models & Building LLMs with Object Detection

Summary: This research investigates the intersection of 3D vision and large language models (LLMs). The project includes a series of Neural Radiance Fields (NeRF) experiments, a novel method for object-aware image retrieval, and detailed integration notes for enhancing Retrieval-Augmented Generation (RAG) pipelines with structured visual data.

Abstract: This work explores the synergy between 3D scene representation and language understanding. We show how object detection models can be used to ground LLMs in visual data, creating "object-aware" context. This method is then applied to enhance Retrieval-Augmented Generation (RAG) pipelines, allowing for more accurate and context-rich responses based on visual queries.
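A toy version of "object-aware" retrieval might look like the sketch below: each image is indexed by the object labels a detector found in it, images are ranked by label overlap with the query, and the top matches are turned into text context for an LLM. The file names, labels, Jaccard scoring, and function names are hypothetical stand-ins, and the detector output is hard-coded rather than produced by a real model.

```python
def jaccard(a, b):
    """Overlap between two label collections, from 0.0 (disjoint) to 1.0 (identical)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query_objects, index, k=2):
    """Return the k image names whose detected objects best match the query."""
    ranked = sorted(
        index.items(),
        key=lambda item: jaccard(query_objects, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


def build_context(names, index):
    """Format retrieved images as plain-text context lines for an LLM prompt."""
    return "\n".join(
        f"{name}: contains {', '.join(sorted(index[name]))}" for name in names
    )


# Hard-coded stand-in for object detector output (hypothetical data).
index = {
    "kitchen.jpg": ["cup", "table", "kettle"],
    "street.jpg": ["car", "bicycle", "person"],
    "office.jpg": ["laptop", "cup", "chair"],
}

top = retrieve(["cup", "laptop"], index, k=2)
context = build_context(top, index)
```

The resulting context string can be prepended to the user's question, so the model answers with explicit knowledge of what each retrieved image contains.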

PDF Code

AI Machine Learning Computer Vision


Neural Radiance Fields — PyTorch

Summary: Implementation details, tips for fast training, and examples of object detection models.
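For readers new to NeRF, the core rendering step composites the color and density samples along a camera ray into one pixel color. The sketch below shows that standard compositing rule in dependency-free Python rather than PyTorch, purely for illustration; it is not taken from the project's code, and `composite_ray` and the sample values are hypothetical.

```python
import math


def composite_ray(densities, colors, deltas):
    """Alpha-composite samples along one ray.

    alpha_i  = 1 - exp(-sigma_i * delta_i)
    weight_i = T_i * alpha_i, where T_i is the transmittance up to sample i
    """
    transmittance = 1.0
    rgb = [0.0, 0.0, 0.0]
    weights = []
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        weights.append(weight)
        rgb = [acc + weight * c for acc, c in zip(rgb, color)]
        transmittance *= 1.0 - alpha  # light remaining after this sample
    return rgb, weights


# Hypothetical ray: empty space first, then a dense red surface, then blue behind it.
rgb, weights = composite_ray(
    densities=[0.0, 100.0, 100.0],
    colors=[[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
    deltas=[0.5, 0.5, 0.5],
)
```

Because the dense red sample absorbs nearly all the light, the blue sample behind it contributes almost nothing, which is how occlusion emerges from the weights.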

PDF / Demo

AI Machine Learning Computer Vision