Projects

Selected projects across ML systems, distributed infrastructure, and applied AI research. Full list on GitHub.

ML Systems & Research

Cerebrum

Production LLM Training & Serving

End-to-end LLM system with distributed FSDP/DDP training, Flash Attention 2, and Mixture-of-Experts support. Includes Mixture-of-Refusals (MoR), a novel safety routing mechanism achieving 2–3× speedup on safe queries with identical safety guarantees. Inference runs on vLLM with quantization, speculative decoding, and prefix caching, deployed on Kubernetes with Prometheus monitoring.

PyTorch FSDP/DDP vLLM Kubernetes Flash Attention

Arcane ML

GitHub ↗

Distributed Training Framework

Production-ready framework for distributed ML training across SSH clusters, Modal Cloud GPUs, and local multi-GPU setups. Built on PyTorch DDP with automatic gradient synchronization, behind a unified CLI that abstracts the complexity of distributed workflows.

Python PyTorch DDP Modal Cloud Distributed Systems

KASPER

GitHub ↗

PDF Malware Detection · IIT Indore · Applied Soft Computing (Q1)

Deep learning framework for PDF malware detection with 99.5% accuracy, robust against FGSM and PGD adversarial attacks. Custom malware injection pipeline for training; explainability via Kolmogorov-Arnold Networks. Published in Applied Soft Computing (Q1 Journal).

PyTorch Adversarial ML KANs Security

JurisQwen

GitHub ↗

Legal Domain LLM

Qwen2.5-7B fine-tuned on Indian legal datasets using LoRA + PEFT + Unsloth. Deployed with 4-bit quantization and Flash Attention 2 on Modal. Specialized for Indian legal document analysis and question-answering.

LoRA Qwen2.5-7B Quantization Modal Legal AI

CoDSPy

GitHub ↗

AI-Powered Code Optimization

Code optimization system using Chain-of-Thought and ReAct reasoning with local LLMs. Autonomous refactoring, syntax analysis, and automated test generation. Runs fully locally, with no API costs.

DSPy CoT Reasoning Gradio Local LLMs

Attention Rollout Live

GitHub ↗

Transformer Attention Visualizer · Apple Silicon

Interactive visualizer that animates transformer attention weights in real time as a local LLM generates text. Every new token shows which prior tokens the model attended to, across all 28 layers and 12 heads. Live heatmap, per-layer scrubber with entropy sparkline, and per-head selection. No cloud API required.

PyTorch FastAPI React D3 SSE Apple Silicon
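
The per-layer entropy sparkline can be illustrated with a minimal sketch. It assumes each head's attention over prior tokens arrives as a plain list of non-negative weights and that one sparkline point is the mean Shannon entropy across a layer's heads; the function names are hypothetical and not taken from the project.

```python
import math


def attention_entropy(weights):
    """Shannon entropy in bits of one head's attention over prior tokens."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("attention weights must sum to a positive value")
    # Normalize defensively so the weights form a probability distribution.
    probs = [w / total for w in weights]
    # Zero-probability tokens contribute nothing to the entropy.
    return -sum(p * math.log2(p) for p in probs if p > 0)


def layer_entropy(heads):
    """Mean entropy across all heads of one layer (assumed sparkline value)."""
    return sum(attention_entropy(h) for h in heads) / len(heads)


# A head spread evenly over four tokens versus a head locked onto one token.
diffuse = [0.25, 0.25, 0.25, 0.25]
focused = [1.0, 0.0, 0.0, 0.0]
print(attention_entropy(diffuse), attention_entropy(focused))
print(layer_entropy([diffuse, focused]))
```

Low entropy means a head concentrates on a few tokens; high entropy means its attention is spread out.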

Distributed Infrastructure

goDFS

GitHub ↗

Decentralized File Storage System

Fully decentralized, content-addressable file storage system written in Go. Streams large files across distributed nodes, with fault tolerance from its decentralized architecture and high-performance concurrent operations.

Go Content-Addressable Storage Distributed Systems
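
Content addressing means a file's key is derived from its bytes rather than from a name. The sketch below shows the idea in Python rather than the project's Go; the choice of SHA-256, the in-memory dictionary, and the `ContentStore` class are illustrative assumptions, not goDFS internals.

```python
import hashlib


def content_key(data: bytes) -> str:
    """Derive the storage key from the content itself (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


class ContentStore:
    """Hypothetical in-memory stand-in for one storage node."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = content_key(data)
        # Identical content maps to the same key, so duplicates are stored once.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        data = self._blobs[key]
        # Re-hashing on read detects corruption, since the key is the checksum.
        if content_key(data) != key:
            raise ValueError("stored content does not match its key")
        return data

    def __len__(self):
        return len(self._blobs)


store = ContentStore()
key = store.put(b"large file contents")
store.put(b"large file contents")  # same bytes, same key, no second copy
print(key[:16], len(store))
```

Because any node can verify a blob against its key, peers do not need to trust each other to serve correct data.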

Raft3D

GitHub ↗

Distributed 3D Printer Management

Distributed 3D printer management system that uses the Raft consensus algorithm for data persistence, replacing a traditional centralized database with a consensus-based distributed log.

Go Raft Consensus Distributed Systems
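
What makes a consensus-based log a substitute for a database is Raft's commit rule: an entry is durable once a majority of nodes hold it. The sketch below follows the leader-side rule from the Raft paper, written in Python for brevity; the function and parameter names are hypothetical, and it does not reflect how Raft3D itself is implemented.

```python
def advance_commit_index(commit_index, current_term, log_terms, follower_match_index):
    """Return the leader's new commit index under Raft's majority rule.

    log_terms[i] is the term of log entry i + 1 (Raft indices are 1-based).
    follower_match_index holds, per follower, the highest replicated index.
    """
    cluster_size = len(follower_match_index) + 1  # followers plus the leader
    majority = cluster_size // 2 + 1
    # Scan from the newest entry down to find the highest committable index.
    for n in range(len(log_terms), commit_index, -1):
        # A leader only commits entries from its own term directly.
        if log_terms[n - 1] != current_term:
            continue
        replicas = 1 + sum(1 for m in follower_match_index if m >= n)
        if replicas >= majority:
            return n
    return commit_index


# Five-node cluster: entry 3 is on the leader and two followers, entry 4 is not.
print(advance_commit_index(
    commit_index=1,
    current_term=2,
    log_terms=[1, 1, 2, 2],
    follower_match_index=[3, 3, 1, 0],
))
```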

Open Source

Billion-Scale Vector Embeddings Benchmark

GitHub ↗

Google Summer of Code 2025 · UC Santa Cruz

Billion-scale vector embedding benchmarks (768, 1024, 2048 dimensions) built from open-source codebases using open-source models. Addresses critical limitations of existing ANN benchmarks, enabling robust evaluation under realistic workloads. Selected for GSoC 2025 (acceptance rate <10%).

Python Vector Search ANN Algorithms Benchmarking
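
ANN benchmarks are commonly scored by comparing approximate results against exact nearest neighbors. The sketch below computes recall@k, a standard metric for this; whether this benchmark reports exactly this metric is an assumption, and the function name is hypothetical.

```python
def recall_at_k(retrieved, ground_truth, k):
    """Fraction of the true top-k neighbors found in the approximate top-k."""
    truth = set(ground_truth[:k])
    if not truth:
        raise ValueError("ground truth must contain at least one neighbor")
    # Set intersection ignores ordering and any duplicate ids in the results.
    hits = len(set(retrieved[:k]) & truth)
    return hits / len(truth)


# Ids returned by an approximate index versus an exact brute-force search.
approximate = [1, 2, 3, 9]
exact = [1, 2, 3, 4]
print(recall_at_k(approximate, exact, k=4))
```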

Daisy

GitHub ↗

AlphaZero from Scratch

Complete AlphaZero implementation from scratch: self-play training and neural network-guided Monte Carlo Tree Search, achieving superhuman board game performance.

Python Deep RL MCTS Game AI
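
Neural network-guided tree search in AlphaZero picks moves with the PUCT rule, which balances a move's average value against the network's prior for it. The sketch below shows that selection step only; the dictionary layout, function names, and the `c_puct` value are illustrative assumptions, not Daisy's actual code.

```python
import math


def puct_score(parent_visits, child_visits, child_value_sum, prior, c_puct=1.5):
    """PUCT score: mean value (Q) plus a prior-weighted exploration bonus (U)."""
    q = child_value_sum / child_visits if child_visits else 0.0
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + u


def select_child(children, c_puct=1.5):
    """Pick the action with the highest PUCT score among a node's children."""
    parent_visits = sum(c["visits"] for c in children.values())
    return max(
        children,
        key=lambda a: puct_score(
            parent_visits,
            children[a]["visits"],
            children[a]["value_sum"],
            children[a]["prior"],
            c_puct,
        ),
    )


# "a" is well explored; "b" is unvisited but strongly favored by the policy prior.
children = {
    "a": {"visits": 10, "value_sum": 5.0, "prior": 0.2},
    "b": {"visits": 0, "value_sum": 0.0, "prior": 0.8},
}
print(select_child(children))
```

As a child accumulates visits its exploration bonus shrinks, so the search gradually shifts from following the prior to following measured value.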