Projects
Selected projects across ML systems, distributed infrastructure, and applied AI research. Full list on GitHub.
ML Systems & Research
Cerebrum
Production LLM Training & Serving
End-to-end LLM system with distributed FSDP/DDP training, FlashAttention-2, and Mixture-of-Experts support. Includes Mixture-of-Refusals (MoR), a novel safety routing mechanism achieving 2–3× speedup on safe queries with identical safety guarantees. Inference runs on a vLLM engine with quantization, speculative decoding, and prefix caching, deployed on Kubernetes with Prometheus monitoring.
Arcane ML
Distributed Training Framework
Production-ready framework for distributed ML training across SSH clusters, Modal Cloud GPUs, and local multi-GPU setups. Built on PyTorch DDP with automatic gradient synchronization, behind a unified CLI that abstracts the complexity of distributed workflows.
KASPER
PDF Malware Detection · IIT Indore · Applied Soft Computing (Q1)
Deep learning framework for PDF malware detection with 99.5% accuracy, robust against FGSM and PGD adversarial attacks. Custom malware injection pipeline for training; explainability via Kolmogorov-Arnold Networks. Published in Applied Soft Computing (Q1 Journal).
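For readers unfamiliar with FGSM, the attack perturbs each input feature by a fixed step in the direction that increases the loss. The sketch below is not KASPER's code: it assumes a tiny logistic-regression classifier as a stand-in for the deep model, and the names `fgsm_perturb`, `weights`, `bias`, and `epsilon` are illustrative only.

```python
import math


def sigmoid(z):
    """Logistic function mapping a score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def fgsm_perturb(x, y, weights, bias, epsilon):
    """Apply one FGSM step to input x with true label y (0 or 1).

    Assumption: the victim model is logistic regression, so the gradient
    of the binary cross-entropy loss w.r.t. the input is (p - y) * w.
    """
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    error = sigmoid(z) - y
    # Move every feature by epsilon in the direction that raises the loss.
    return [xi + epsilon * sign(error * w) for xi, w in zip(x, weights)]
```

PGD can be viewed as repeating a step like this several times while projecting the result back into an epsilon-ball around the original input.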
JurisQwen
Legal Domain LLM
Qwen2.5-7B fine-tuned on Indian legal datasets with LoRA, using PEFT and Unsloth. Deployed with 4-bit quantization and FlashAttention-2 on Modal. Specialized for Indian legal document analysis and question-answering.
CoDSPy
AI-Powered Code Optimization
Code optimization system using Chain-of-Thought and ReAct reasoning with local LLMs. Performs autonomous refactoring, syntax analysis, and automated test generation. Runs fully locally, with no API costs.
Attention Rollout Live
Transformer Attention Visualizer · Apple Silicon
Interactive visualizer that animates transformer attention weights in real time as a local LLM generates text. Every new token shows which prior tokens the model attended to, across all 28 layers and 12 heads. Live heatmap, per-layer scrubber with entropy sparkline, and per-head selection. No cloud API required.
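One plausible way to drive a per-layer entropy sparkline is to compute the Shannon entropy of each head's attention distribution and average across heads. This is a minimal sketch under that assumption; the function names `attention_entropy` and `layer_entropy` are hypothetical and not taken from the project.

```python
import math


def attention_entropy(weights):
    """Shannon entropy (in nats) of one head's attention over prior tokens."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("attention weights must sum to a positive value")
    entropy = 0.0
    for w in weights:
        p = w / total  # renormalize to guard against rounding drift
        if p > 0:
            entropy -= p * math.log(p)
    return entropy


def layer_entropy(heads):
    """Mean entropy across the heads of one layer (one sparkline point)."""
    return sum(attention_entropy(h) for h in heads) / len(heads)
```

Low entropy means a head focuses on a few tokens; entropy near log(n) for n prior tokens means attention is spread almost uniformly.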
Distributed Infrastructure
goDFS
Decentralized File Storage System
Fully decentralized, content-addressable file storage system in Go. Streams large files across distributed nodes, with fault tolerance from its decentralized architecture and high-performance concurrent operations.
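Content addressing means a file's key is derived from its bytes, typically via a cryptographic hash. The sketch below illustrates the idea in Python rather than Go, using an in-memory dictionary; the choice of SHA-256 and the names `ContentStore` and `hash_chunks` are assumptions, not goDFS internals.

```python
import hashlib


def hash_chunks(chunks):
    """Hash an iterable of byte chunks without holding the whole file in memory."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


class ContentStore:
    """In-memory content-addressable store: the key is the SHA-256 of the data."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(key, data)  # identical content is stored once
        return key

    def get(self, key: str) -> bytes:
        data = self._blobs[key]
        # Re-hash on read so corrupted blobs are detected, not returned.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("stored blob does not match its address")
        return data
```

Because the key depends only on the content, any node can verify a blob it receives from a peer without trusting that peer.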
Raft3D
Distributed 3D Printer Management
Distributed 3D printer management system that uses the Raft consensus algorithm for data persistence, replacing a traditional centralized database with a consensus-based replicated log.
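In Raft, the leader treats a log entry as committed once it is stored on a majority of servers and belongs to the leader's current term. The sketch below shows that rule in isolation; it is not Raft3D's code, and the function names and the `log_terms` dictionary (log index to term) are illustrative assumptions.

```python
def majority_match_index(match_index):
    """Highest log index replicated on a majority of servers.

    match_index holds one entry per server, including the leader itself.
    """
    if not match_index:
        raise ValueError("need at least one server")
    ranked = sorted(match_index, reverse=True)
    majority = len(ranked) // 2 + 1
    # The majority-th largest value is present on at least `majority` servers.
    return ranked[majority - 1]


def advance_commit_index(commit_index, match_index, log_terms, current_term):
    """Return the leader's new commit index under Raft's commit rule."""
    candidate = majority_match_index(match_index)
    # Only entries from the leader's current term are committed by counting.
    if candidate > commit_index and log_terms.get(candidate) == current_term:
        return candidate
    return commit_index
```

For example, with five servers whose match indices are 5, 5, 3, 2, and 1, index 3 is the highest entry held by at least three of them.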
Open Source
Billion-Scale Vector Embeddings Benchmark
Google Summer of Code 2025 · UC Santa Cruz
Billion-scale vector embedding benchmarks (768, 1024, 2048 dimensions) built from open-source codebases using open-source models. Addresses critical limitations of existing ANN benchmarks, enabling robust evaluation under realistic workloads. Selected for GSoC 2025 (acceptance rate <10%).
Daisy
AlphaZero from Scratch
Complete AlphaZero implementation from scratch: self-play training and neural network-guided Monte Carlo Tree Search, achieving superhuman board game performance.
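The "neural network-guided" part of the search comes from the PUCT selection rule, which combines each move's average value with an exploration bonus weighted by the network's prior. A minimal sketch follows; the dictionary layout, the names `puct_score` and `select_child`, and the default `c_puct=1.5` are assumptions for illustration, not Daisy's actual code.

```python
import math


def puct_score(q, prior, parent_visits, child_visits, c_puct=1.5):
    """Mean value plus a prior-weighted exploration bonus (PUCT)."""
    bonus = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + bonus


def select_child(children, c_puct=1.5):
    """Pick the action with the highest PUCT score.

    children maps each action to a dict with keys 'q' (mean value),
    'prior' (policy-network probability), and 'visits' (visit count).
    """
    parent_visits = sum(c["visits"] for c in children.values())
    return max(
        children,
        key=lambda a: puct_score(
            children[a]["q"],
            children[a]["prior"],
            parent_visits,
            children[a]["visits"],
            c_puct,
        ),
    )
```

The bonus shrinks as a move accumulates visits, so the search gradually shifts from trusting the prior to trusting the measured values.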