Projects

Here are key projects that reflect my work across machine learning systems, generative AI, security, and applied research.


AI & Machine Learning Projects

PSRG PR Agent | Multi-Agent GitHub PR Reviewer

Developed a real-time, multi-agent system using LangGraph to automate GitHub Pull Request reviews. Architected a novel PR-Aware Self-Referee Graph (PSRG) for high-fidelity code analysis and patch suggestion. Deployed specialized agents powered exclusively by open-source, ≤1B parameter models.

Tech Stack: LangGraph, Multi-Agent Systems, LLMs, GitHub API

Fine-tuned Qwen2.5-7B with LoRA on an Indian law dataset, using PEFT and Unsloth optimizations. Deployed a scalable inference system on the Modal platform using 4-bit quantization and Flash Attention 2.

Tech Stack: LoRA, Modal, Quantization, Qwen2.5-7B, Indian Legal Dataset
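
To give a sense of why LoRA makes fine-tuning a 7B model practical, here is a minimal sketch of the parameter arithmetic for a single weight matrix. The layer shape and rank below are hypothetical values chosen for illustration, not the settings used in this project.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Number of weights in a dense d_out x d_in matrix."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights added by LoRA: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


def trainable_fraction(d_in: int, d_out: int, rank: int) -> float:
    """Share of the layer's weights that LoRA actually trains."""
    return lora_params(d_in, d_out, rank) / full_params(d_in, d_out)


if __name__ == "__main__":
    # Hypothetical layer shape and rank, for illustration only.
    d_in = d_out = 4096
    rank = 16
    print(f"full: {full_params(d_in, d_out)}")
    print(f"lora: {lora_params(d_in, d_out, rank)}")
    print(f"fraction trained: {trainable_fraction(d_in, d_out, rank):.4%}")
```

The frozen base weights can then be held in a quantized format while only the small A and B matrices receive gradients.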

CoDSPy | AI-Powered Code Optimization System

Engineered a code analysis platform using Chain-of-Thought and ReAct reasoning with locally hosted LLMs. Implemented autonomous code optimization, syntax inspection, and automated test case generation.

Tech Stack: Python, DSPy, Gradio, Local LLMs, CoT Reasoning

KASPER | PDF Malware Detection System

Designed a deep learning architecture for PDF malware detection with adversarial robustness. Achieved 99.5% accuracy and strong resilience to adversarial samples using custom malware injection pipelines and spline-based kernel layers.

Tech Stack: Python, PyTorch, Adversarial ML, Custom Malware Pipeline


Distributed Systems & Infrastructure

Arcane ML | Distributed ML Framework

Built a production-ready framework for distributed ML training across SSH clusters, Modal Cloud GPUs, and local setups. Engineered support for PyTorch DDP, enabling efficient multi-GPU scaling and gradient synchronization. Developed a unified CLI to abstract and simplify complex distributed training workflows.

Tech Stack: Python, PyTorch DDP, Distributed Systems, Modal Cloud
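
The gradient synchronization that DDP performs can be pictured as an all-reduce that averages gradients across workers before each optimizer step. The sketch below simulates that idea in plain Python; it is a conceptual illustration with hypothetical function names, not Arcane ML code and not the PyTorch API.

```python
from typing import List


def all_reduce_mean(worker_grads: List[List[float]]) -> List[float]:
    """Average gradients element-wise across workers, as an all-reduce would."""
    if not worker_grads:
        raise ValueError("need at least one worker")
    n_workers = len(worker_grads)
    n_params = len(worker_grads[0])
    if any(len(g) != n_params for g in worker_grads):
        raise ValueError("all workers must hold gradients of the same shape")
    return [sum(g[i] for g in worker_grads) / n_workers for i in range(n_params)]


def sgd_step(params: List[float], grads: List[float], lr: float) -> List[float]:
    """Apply one plain SGD update using the synchronized gradients."""
    return [p - lr * g for p, g in zip(params, grads)]


if __name__ == "__main__":
    # Two simulated workers, each with gradients from its own data shard.
    grads = all_reduce_mean([[1.0, 2.0], [3.0, 4.0]])
    # Every replica applies the same averaged gradient, so weights stay in sync.
    print(sgd_step([1.0, 1.0], grads, lr=0.5))
```

Because every replica applies the identical averaged gradient, the model copies stay in lockstep without ever exchanging weights directly.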

goDFS | Decentralized File Storage System

Built a decentralized, fully distributed content-addressable file storage system using Golang. Designed to handle and stream very large files efficiently across distributed nodes.

Tech Stack: Go, Distributed Systems, Content-Addressable Storage
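
As a rough illustration of content addressing (written in Python for brevity, and not goDFS's actual Go code), the sketch below derives a storage key from the SHA-256 hash of the data and reads the input in chunks so large files never need to fit in memory. The `ContentStore` class and its in-memory dictionary are hypothetical stand-ins for a real node's disk and network layers.

```python
import hashlib
import io
from typing import BinaryIO, Dict


def content_key(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a stream chunk by chunk and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


class ContentStore:
    """Toy in-memory store where the key is derived from the content itself."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = content_key(io.BytesIO(data))
        # Identical content maps to the same key, so duplicates cost nothing.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        data = self._blobs[key]
        # Re-hashing on read lets a client detect corrupted or tampered data.
        if content_key(io.BytesIO(data)) != key:
            raise ValueError("stored content does not match its key")
        return data

    def __len__(self) -> int:
        return len(self._blobs)


if __name__ == "__main__":
    store = ContentStore()
    key = store.put(b"hello distributed world")
    print(key, len(store.get(key)))
```

Since the key is a function of the bytes alone, any node can verify a file it receives from a peer without trusting that peer.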

makedis | Modern Deployment Framework

Developed a modern deployment framework aimed at streamlining application deployments. Enhanced static asset delivery and enabled scalable reverse proxy configurations for robust, industry-standard deployments.

Tech Stack: Go, Reverse Proxy, Static Asset Delivery, Deployment Automation


Data Processing & Analytics

E-Commerce Real-Time Analytics | Big Data Processing Pipeline

Built a real-time sales analytics system to process high-velocity financial data streams using Apache Flink. Integrated PostgreSQL and Elasticsearch for persistent storage and fast search capabilities. Orchestrated multi-container setup using Docker Compose for scalable deployment.

Tech Stack: Apache Flink, PostgreSQL, Elasticsearch, Docker Compose

RSServe | RSS Aggregator

Developed a fully-fledged RSS aggregator in Go that efficiently collects and organizes RSS feeds, providing users with a seamless reading experience.

Tech Stack: Go, RSS Processing, Web Scraping


Game AI & Algorithms

Daisy | AlphaZero Implementation

Implemented the AlphaZero algorithm from scratch. Built a game-playing AI that combines deep learning with Monte Carlo Tree Search to achieve superhuman performance in board games.

Tech Stack: Python, Deep Learning, Monte Carlo Tree Search, Game AI
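
At the heart of AlphaZero's tree search is the PUCT rule, which balances a move's average value against the network's prior and how often the move has been visited. Below is a minimal sketch of that selection step; the `Edge` class, the function names, and the `c_puct` default are illustrative assumptions rather than Daisy's actual code.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class Edge:
    """Search statistics for one move out of a tree node."""
    prior: float            # policy network probability for this move
    visits: int = 0         # number of simulations through this move
    value_sum: float = 0.0  # sum of backed-up values

    @property
    def q(self) -> float:
        """Mean value of the move, or 0 if it has never been visited."""
        return self.value_sum / self.visits if self.visits else 0.0


def puct_score(edge: Edge, parent_visits: int, c_puct: float = 1.5) -> float:
    """Q plus an exploration bonus that shrinks as the move is visited."""
    bonus = c_puct * edge.prior * math.sqrt(parent_visits) / (1 + edge.visits)
    return edge.q + bonus


def select_action(edges: Dict[str, Edge], c_puct: float = 1.5) -> str:
    """Pick the move with the highest PUCT score."""
    parent_visits = sum(e.visits for e in edges.values())
    return max(edges, key=lambda a: puct_score(edges[a], parent_visits, c_puct))


if __name__ == "__main__":
    edges = {
        "a": Edge(prior=0.5, visits=10, value_sum=2.0),
        "b": Edge(prior=0.5, visits=0),
    }
    # The unvisited move gets a large exploration bonus and is chosen.
    print(select_action(edges))
```

A well-explored move must earn its place through its average value, while an unvisited move with a decent prior still gets tried.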


Automation & Productivity Tools

clutchCV | LinkedIn Job Search AI Agent

Developed an AI agent designed to enhance job searching on LinkedIn. Used the LinkedIn API to automate finding suitable job matches based on the user’s resume and experience.

Tech Stack: Python, LinkedIn API, AI Agent, Job Matching Algorithm


Open Source Contributions

Billion-Scale Vector Embeddings Dataset | Google Summer of Code 2025, UC Santa Cruz

Built a billion-scale vector embedding dataset from open-source codebases using open-source models for ANN algorithm benchmarking. Designing realistic benchmarks with embeddings of 768, 1024, and 2048 dimensions to reflect modern workloads. Addressing limitations of existing benchmarks to enable robust evaluations of vector search algorithms.

Tech Stack: Python, Open-source LLMs, Vector Databases, ANN Algorithms
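
ANN benchmarks are commonly scored by recall@k: the share of the true nearest neighbours that an approximate index returns. The sketch below shows that measurement against a brute-force baseline on tiny toy vectors; the function names and data are illustrative assumptions, not the project's benchmark harness.

```python
import math
from typing import List, Sequence


def brute_force_knn(query: Sequence[float],
                    vectors: Sequence[Sequence[float]],
                    k: int) -> List[int]:
    """Return the indices of the k vectors closest to the query (Euclidean)."""
    ranked = sorted((math.dist(query, v), i) for i, v in enumerate(vectors))
    return [i for _, i in ranked[:k]]


def recall_at_k(approx_ids: Sequence[int], exact_ids: Sequence[int]) -> float:
    """Fraction of the exact neighbours that the approximate result recovered."""
    if not exact_ids:
        raise ValueError("exact_ids must not be empty")
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


if __name__ == "__main__":
    vectors = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [0.0, 2.0]]
    exact = brute_force_knn([0.0, 0.0], vectors, k=2)
    # Pretend an ANN index returned ids 0 and 3 for the same query.
    print(exact, recall_at_k([0, 3], exact))
```

At billion scale the exact baseline is the expensive part, which is one reason precomputed ground-truth neighbours are typically shipped alongside such datasets.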