Mani Pal

Engineer-researcher


LLM systems, CUDA kernels, inference optimization, compression, interpretability, and distributed AI infrastructure.

Resume

Compact technical resume

The downloadable PDF is included in the app. This page keeps the frontier-AI highlights visible up front instead of routing visitors through a traditional portfolio flow.

Download PDF resume

Technical skills

PyTorch, JAX/Flax, Triton, CUDA C++, vLLM, SGLang, TensorRT-LLM, llama.cpp, FSDP, DeepSpeed ZeRO, GRPO, DPO, Tensor Networks, MPO, GGUF, Kubernetes, FastAPI, Rust, TypeScript, PostgreSQL

700M LLM from scratch

Hybrid Mamba-2/Transformer model with GRPO and DPO post-training, GGUF export, and CPU inference logs.

FlashAttention-2 CUDA kernels

Tiled, IO-aware attention kernel profiled at a 2.1x speedup over PyTorch SDPA on A100.

2.4x speculative decoding

Draft-verifier runtime with temperature-corrected rejection sampling and adaptive gamma.

CompactifAI extension

Adaptive tensor-network compression with layer sensitivity profiling and policy search.

vLLM contribution

Fix for a request-ID bug in the disaggregated-prefill KV cache path, with production reliability impact.

Published interpretability research

Circuit-level grokking study published on Zenodo and prepared for arXiv submission.

Experience

Engineering roles

Trellions

ML Engineer / LLM Engineer

May 2026 – Present

  • Own the AI layer for a YC-backed recruitment platform with LLM scoring, job description generation, Q&A, and bias detection.
  • Maintain production RAG pipelines with Pinecone, Weaviate, pgvector, hybrid search, and cross-encoder reranking.
  • Own MLOps across MLflow, W&B, model versioning, CI/CD, drift monitoring, and latency/throughput SLAs.

RecurX

Founding Engineer & Technical Lead

Jan 2024 – Nov 2025

  • Architected cross-chain payment infrastructure across Ethereum and Solana with sub-200ms transaction latency.
  • Reduced p95 API latency from 280ms to 182ms through Redis, indexing, and query optimization.
  • Built deployment and observability systems with Docker, GitHub Actions, AWS ECS, OpenTelemetry, Grafana, and Prometheus.

Independent Contract AI/LLM Engineer

Clients across India, US, and EU

Jan 2021 – Dec 2023

  • Built a production legal-tech RAG system serving 50+ law firms, reducing attorney document-research time by 65%.
  • Fine-tuned Mistral-7B and Llama-2 via QLoRA on AWS SageMaker.
  • Improved high-cardinality PostgreSQL throughput by 10x through materialized views, partitioning, and EXPLAIN-driven tuning.