Projects

Selected projects across ML systems, distributed infrastructure, and applied AI research. Full list on GitHub.

Cerebrum
Production LLM Training & Serving
End-to-end LLM system with distributed FSDP/DDP training, Flash Attention 2, and Mixture-of-Experts support. It includes Mixture-of-Refusals (MoR), a novel safety routing mechanism that achieves a 2–3× speedup on safe queries with identical safety guarantees. Serving runs on the vLLM inference engine with quantization, speculative decoding, and prefix caching, deployed on Kubernetes with Prometheus monitoring.
PyTorch · FSDP/DDP · vLLM · Kubernetes · Flash Attention
Arcane ML
Distributed Training Framework
Production-ready framework for distributed ML training across SSH clusters, Modal Cloud GPUs, and local multi-GPU setups. Built on PyTorch DDP with automatic gradient synchronization, it provides a unified CLI that abstracts away the complexity of distributed workflows.
Python · PyTorch DDP · Modal Cloud · Distributed Systems
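PyTorch DDP synchronizes gradients by running an all-reduce (over NCCL or Gloo) that averages each parameter's gradient across ranks. In addition, real DDP buckets gradients and overlaps communication with the backward pass. The stdlib-only sketch below simulates one common all-reduce algorithm, ring all-reduce (reduce-scatter followed by all-gather), on in-memory lists. `ring_allreduce_mean` is a hypothetical helper. It is not taken from Arcane ML or PyTorch.

```python
from typing import List


def ring_allreduce_mean(grads: List[List[float]]) -> List[List[float]]:
    """Simulate a ring all-reduce that averages one gradient vector per worker.

    grads[i] is worker i's local gradient. Every returned row is the
    element-wise mean across workers, i.e. what each rank holds after sync.
    """
    n = len(grads)
    if n == 0:
        return []
    length = len(grads[0])
    if any(len(g) != length for g in grads):
        raise ValueError("all workers must hold gradients of the same length")
    # Split the vector into n contiguous chunks (some may be empty if length < n).
    bounds = [(length * c // n, length * (c + 1) // n) for c in range(n)]
    bufs = [list(g) for g in grads]  # copy so the inputs are not mutated

    def ring_step(offset: int, accumulate: bool) -> None:
        # Each worker i sends chunk (i + offset) % n to its right neighbour.
        # All messages are collected first to model simultaneous sends.
        msgs = []
        for i in range(n):
            lo, hi = bounds[(i + offset) % n]
            msgs.append(((i + 1) % n, lo, bufs[i][lo:hi]))
        for dst, lo, data in msgs:
            for j, v in enumerate(data):
                if accumulate:
                    bufs[dst][lo + j] += v
                else:
                    bufs[dst][lo + j] = v

    # Phase 1, reduce-scatter: after n - 1 steps, worker i holds the
    # fully summed chunk (i + 1) % n.
    for s in range(n - 1):
        ring_step(offset=-s, accumulate=True)
    # Phase 2, all-gather: pass the summed chunks around the ring.
    for s in range(n - 1):
        ring_step(offset=1 - s, accumulate=False)
    # DDP averages gradients rather than summing them.
    return [[v / n for v in buf] for buf in bufs]


if __name__ == "__main__":
    workers = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    print(ring_allreduce_mean(workers))  # every row: [4.0, 5.0, 6.0]
```

In the ring scheme, each worker sends roughly 2(n − 1)/n of the vector in total, almost independent of the worker count n. This is why ring-style all-reduce scales well as workers are added.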
PDF Malware Detection
IIT Indore · Applied Soft Computing (Q1)
Deep learning framework for PDF malware detection with 99.5% accuracy, robust against FGSM and PGD adversarial attacks. Custom malware injection pipeline for training; explainability via Kolmogorov-Arnold Networks. Published in Applied Soft Computing (Q1 Journal).
PyTorch · Adversarial ML · KANs · Security
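For reference, the two attacks named above are commonly written in their generic L∞ form as follows. Here x is an input feature vector, y its label, 𝓛 the classifier's loss with parameters θ, ε the perturbation budget, α the PGD step size, and Π the projection back onto the ε-ball around x. This page does not describe how PDF files are mapped to model inputs, so treat these formulas as the textbook definitions rather than the paper's exact setup.

```latex
% FGSM: a single signed-gradient step of size epsilon
x_{\text{adv}} = x + \epsilon \,\operatorname{sign}\!\big(\nabla_x \mathcal{L}(\theta, x, y)\big)

% PGD: iterated FGSM-style steps of size alpha, projected onto the epsilon-ball
x^{(t+1)} = \Pi_{\|x' - x\|_\infty \le \epsilon}\Big(x^{(t)} + \alpha \,\operatorname{sign}\!\big(\nabla_x \mathcal{L}(\theta, x^{(t)}, y)\big)\Big),
\qquad x^{(0)} = x \text{ (or a random point in the ball)}
```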
JurisQwen
Legal Domain LLM
Qwen2.5-7B fine-tuned on Indian legal datasets using LoRA + PEFT + Unsloth. Deployed with 4-bit quantization and Flash Attention 2 on Modal. Specialized for Indian legal document analysis and question-answering.
LoRA · Qwen2.5-7B · Quantization · Modal · Legal AI
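As a reminder of what LoRA fine-tuning trains: each adapted weight matrix W₀ ∈ ℝ^{d×k} stays frozen, and a low-rank update BA, scaled by α/r, is learned alongside it. B is typically initialized to zero, so training starts exactly from the base model. The worked numbers below use Qwen2.5-7B's hidden size (3584) and an illustrative rank r = 16. JurisQwen's actual rank, α, and target modules are not stated here.

```latex
h = W_0 x + \frac{\alpha}{r} B A x, \qquad
B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k)

\text{trainable parameters per adapted matrix: } r(d + k) \text{ instead of } dk

\text{e.g. } d = k = 3584,\ r = 16:\quad 16 \cdot 7168 = 114{,}688 \text{ vs. } 3584^2 = 12{,}845{,}056 \;(\approx 0.89\%)
```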
AI-Powered Code Optimization
Code optimization system using Chain-of-Thought and ReAct reasoning with local LLMs. Performs autonomous refactoring, syntax analysis, and automated test generation, running fully locally with no API costs.
DSPy · CoT Reasoning · Gradio · Local LLMs
Attention Rollout
Transformer Attention Visualizer · Apple Silicon
Interactive visualizer that animates transformer attention weights in real time as a local LLM generates text. For each newly generated token, it shows which prior tokens the model attended to, across all 28 layers and 12 heads per layer. Includes a live heatmap, a per-layer scrubber with an entropy sparkline, and per-head selection. No cloud API required.
PyTorch · FastAPI · React · D3 · SSE · Apple Silicon
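One natural reading of a per-layer "entropy sparkline" is the Shannon entropy of each head's attention distribution for the newest token, averaged over heads. Low entropy means attention is concentrated on a few tokens, and high entropy means it is spread out. The sketch below assumes that definition; the project may aggregate differently. It expects attention as nested lists `attn[layer][head][key]` of probabilities, and both function names are hypothetical.

```python
import math
from typing import List


def attention_entropy(probs: List[float]) -> float:
    """Shannon entropy (in nats) of one head's attention over prior tokens."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def layer_entropies(attn: List[List[List[float]]]) -> List[float]:
    """Mean attention entropy per layer for the newest query token.

    attn[layer][head] is one softmax row: a probability distribution over
    key positions, e.g. the last query row of that head's attention map.
    """
    return [
        sum(attention_entropy(head) for head in layer) / len(layer)
        for layer in attn
    ]


if __name__ == "__main__":
    attn = [
        [[1.0, 0.0, 0.0, 0.0], [0.25, 0.25, 0.25, 0.25]],  # layer 0: one sharp, one uniform head
        [[0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]],  # layer 1: both uniform
    ]
    print(layer_entropies(attn))  # approximately [ln 2, ln 4]
```

Dividing each head's entropy by ln(number of attended tokens) normalizes it to [0, 1]. This makes sparkline values comparable as the context grows.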
Decentralized File Storage System
Fully decentralized, content-addressable file storage system in Go. Streams large files across distributed nodes, combining fault tolerance from its decentralized architecture with high-performance concurrent operations.
Go · Content-Addressable Storage · Distributed Systems
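In a content-addressable store, an object's key is a hash of its bytes. Identical content therefore deduplicates automatically, and a node can verify any data it receives by re-hashing it. The Python sketch below shows only that core idea, hashing a stream chunk by chunk and placing it in a sharded directory. The project itself is written in Go, and the SHA-256 choice, path layout, and `CASStore` class are illustrative assumptions.

```python
import hashlib
import io
import os
import tempfile
from typing import BinaryIO


class CASStore:
    """Minimal local content-addressable store (illustrative, not the Go project)."""

    def __init__(self, root: str) -> None:
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key: str) -> str:
        # Shard by hash prefix so no single directory grows too large.
        return os.path.join(self.root, key[:2], key[2:4], key)

    def put_stream(self, stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
        """Store a stream chunk by chunk and return its SHA-256 address."""
        h = hashlib.sha256()
        # The key is only known after the last byte, so spool to a temp file.
        fd, tmp = tempfile.mkstemp(dir=self.root)
        try:
            with os.fdopen(fd, "wb") as out:
                while chunk := stream.read(chunk_size):
                    h.update(chunk)
                    out.write(chunk)
            key = h.hexdigest()
            path = self._path(key)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            # Same content always maps to the same path, so it is stored once.
            os.replace(tmp, path)
        finally:
            if os.path.exists(tmp):
                os.remove(tmp)
        return key

    def put(self, data: bytes) -> str:
        return self.put_stream(io.BytesIO(data))

    def get(self, key: str) -> bytes:
        with open(self._path(key), "rb") as f:
            data = f.read()
        # Re-hash on read so corrupted or tampered content is detected.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError(f"integrity check failed for {key}")
        return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = CASStore(d)
        key = store.put(b"hello")
        print(key == store.put(b"hello"), store.get(key))  # True b'hello'
```

Hashing incrementally means a large file never has to fit in memory. The same key can then serve as the lookup identifier when the chunk is replicated to other nodes.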
Distributed 3D Printer Management
Distributed 3D printer management system that uses the Raft consensus algorithm for data persistence, replacing a traditional centralized database with a consensus-based distributed log.
Go · Raft Consensus · Distributed Systems
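A Raft log can stand in for a centralized database because of the leader's commit rule. An entry is committed once it is stored on a majority of servers, but the leader only advances its commit index directly over entries from its own current term. Earlier-term entries become committed indirectly. The function below is a Python sketch of that single rule as stated in the Raft paper's leader rules. It is not the project's Go code, and the parameter names and 1-based indexing are modelling assumptions.

```python
from typing import Dict, List


def advance_commit_index(
    commit_index: int,
    current_term: int,
    log_terms: List[int],
    match_index: Dict[str, int],
) -> int:
    """Return the leader's new commitIndex.

    log_terms[i - 1] is the term of log entry i (Raft indexes from 1).
    match_index maps each follower to the highest entry known to be
    replicated on it; the leader always holds its whole log.
    """
    cluster_size = len(match_index) + 1  # followers + leader
    majority = cluster_size // 2 + 1
    # Scan from the newest entry down; the first qualifying N is the largest.
    for n in range(len(log_terms), commit_index, -1):
        if log_terms[n - 1] != current_term:
            # Entries from earlier terms are only committed indirectly.
            continue
        replicas = 1 + sum(1 for m in match_index.values() if m >= n)
        if replicas >= majority:
            return n
    return commit_index


if __name__ == "__main__":
    # Five-server cluster, leader in term 3.
    followers = {"a": 5, "b": 4, "c": 2, "d": 1}
    print(advance_commit_index(2, 3, [1, 1, 2, 3, 3], followers))  # 4
```

In the example, entry 5 exists only on the leader and follower `a`, which is two of five servers. Entry 4 is on three of five, a majority, so it becomes the new commit index.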
Billion-Scale Vector Embeddings Benchmark
Google Summer of Code 2025 · UC Santa Cruz
Billion-scale vector embedding benchmarks (768, 1024, 2048 dimensions) built from open-source codebases using open-source models. Addresses critical limitations of existing ANN benchmarks, enabling robust evaluation under realistic workloads. Selected for GSoC 2025 (acceptance rate <10%).
Python · Vector Search · ANN Algorithms · Benchmarking
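ANN benchmarks typically score an index by recall@k, usually reported alongside throughput. Recall@k is the fraction of the true k nearest neighbours, found by exact search, that the approximate search actually returns. A minimal stdlib sketch of that metric follows. This page does not describe the project's exact metrics or tooling, so the squared-L2 distance, brute-force ground truth, and helper names are illustrative assumptions. At billion scale, even computing the exact ground truth is itself a substantial job.

```python
import heapq
from typing import List, Sequence


def exact_knn(query: Sequence[float], base: List[Sequence[float]], k: int) -> List[int]:
    """Ground-truth ids of the k nearest base vectors by squared L2 distance."""

    def dist(i: int) -> float:
        return sum((q - b) ** 2 for q, b in zip(query, base[i]))

    return heapq.nsmallest(k, range(len(base)), key=dist)


def recall_at_k(approx_ids: List[List[int]], true_ids: List[List[int]], k: int) -> float:
    """Mean fraction of the true top-k neighbours recovered, over all queries."""
    hits = sum(len(set(a[:k]) & set(t[:k])) for a, t in zip(approx_ids, true_ids))
    return hits / (k * len(true_ids))


if __name__ == "__main__":
    base = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
    print(exact_knn([0.1, 0.0], base, 2))  # [0, 1]
    print(recall_at_k([[0, 2], [2, 3]], [[0, 1], [2, 3]], 2))  # 0.75
```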
AlphaZero from Scratch
Complete AlphaZero implementation from scratch, with self-play training and neural-network-guided Monte Carlo Tree Search, achieving superhuman board-game performance.
Python · Deep RL · MCTS · Game AI
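In AlphaZero-style search, the network's policy prior steers MCTS through a PUCT selection rule. At each node, the search picks the child maximising Q(s,a) + c_puct · P(s,a) · √N(s) / (1 + N(s,a)). The Python sketch below shows only that selection step. The `Node` fields, the assumption that values are stored from the parent's player-to-move perspective, and the default c_puct = 1.5 are illustrative choices, not details taken from the project.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    prior: float                 # P(s, a) from the policy head
    visit_count: int = 0         # N(s, a)
    value_sum: float = 0.0       # W(s, a), assumed from the parent's player-to-move view
    children: Dict[int, "Node"] = field(default_factory=dict)

    def q_value(self) -> float:
        # Mean action value Q(s, a); unvisited children default to 0.
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def puct_score(parent: Node, child: Node, c_puct: float) -> float:
    # Exploration bonus: large for high-prior, rarely visited children.
    u = c_puct * child.prior * math.sqrt(parent.visit_count) / (1 + child.visit_count)
    return child.q_value() + u


def select_child(parent: Node, c_puct: float = 1.5) -> Optional[int]:
    """Return the action whose child maximises Q + U, or None at a leaf."""
    if not parent.children:
        return None
    return max(parent.children, key=lambda a: puct_score(parent, parent.children[a], c_puct))


if __name__ == "__main__":
    root = Node(prior=1.0, visit_count=10)
    root.children[0] = Node(prior=0.7, visit_count=5, value_sum=1.0)  # Q = 0.2
    root.children[1] = Node(prior=0.3)                                # unvisited
    print(select_child(root))  # 1: the exploration bonus outweighs action 0's Q
```

As visits accumulate, the bonus term shrinks like 1/(1 + N(s,a)). The search therefore moves from following the network's priors toward exploiting the backed-up values Q.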