kanishk

I'm kanishk, an undergraduate student in computer science.

I love working on LLMs and machine learning; making machines think using mathematics excites me a lot.

I'm currently exploring reinforcement learning, NLP, and computer vision. I also like going down rabbit holes in science, tech, and history.

GitHub HuggingFace Substack Email

Projects

Looped NanoGPT Experiments

Investigating whether a single transformer block reused recurrently can substitute for N independent blocks.

GitHub ↗

PolyDB Context Graph Engine

An API-first backend that gives LLMs structured context over databases.

GitHub ↗

Can LLMs act as RL Agents?

Using language models as pure policies across classic Gym environments.

Blog ↗ GitHub ↗

TinyLLM

A decoder-only transformer built in pure PyTorch with modern LLM components.

GitHub ↗

TinyPEFT

A lightweight parameter-efficient fine-tuning engine for practical model adaptation.

GitHub ↗

Linesman — AI Referee

A football VAR system combining computer vision with multimodal reasoning.

GitHub ↗

KisanGPT

A multilingual assistant for Indian farmers covering crops, markets, and disease diagnosis.

Live ↗ GitHub ↗

Blogs

Intro to Reinforcement Learning read ↗

Mixture of Experts Explained read ↗

Create Your Own GPT — PEFT read ↗

RoPE Explained read ↗

The Transformer Architecture read ↗

want to talk or build something together?

mkanishkkulkarni@gmail.com

Looped NanoGPT Experiments

An investigation into whether a single transformer block, reused recurrently across N iterations with shared weights, can substitute for N independent blocks while retaining competitive language modeling performance. Experiments on TinyStories and FineWeb-Edu show the perplexity gap narrowing with more training. They also yield a meaningful negative result: scaling the number of loops at test time degrades performance rather than improving it.

GitHub ↗
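
The weight-sharing idea behind Looped NanoGPT can be illustrated with a toy sketch. This is not the repository's code: the "block" here is a single weight matrix with a residual tanh update, and every name (make_block, apply_block, looped_forward, and so on) is hypothetical. It only shows that looping one block N times uses 1/N of the parameters of N stacked blocks while running the same number of steps.

```python
import math
import random

def make_block(dim, rng):
    """Toy stand-in for a transformer block: a single dim x dim weight matrix."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]

def apply_block(weights, x):
    """Residual update x + tanh(W x), standing in for attention + MLP."""
    return [xi + math.tanh(sum(w * xj for w, xj in zip(row, x)))
            for xi, row in zip(x, weights)]

def stacked_forward(blocks, x):
    """N independent blocks, each with its own weights."""
    for weights in blocks:
        x = apply_block(weights, x)
    return x

def looped_forward(block, x, n_loops):
    """One block applied n_loops times with the same weights."""
    for _ in range(n_loops):
        x = apply_block(block, x)
    return x

def count_params(blocks):
    """Total number of scalar weights across a list of blocks."""
    return sum(len(row) for weights in blocks for row in weights)

rng = random.Random(0)
dim, n = 4, 6
stacked = [make_block(dim, rng) for _ in range(n)]
shared = make_block(dim, rng)
x = [1.0, 0.0, -1.0, 0.5]

print(count_params(stacked), count_params([shared]))  # 96 16
print(looped_forward(shared, x, n))
```

Because n_loops is just an argument, the same shared block can be run for more iterations at inference than it saw in training, which is the setting the negative result above concerns.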

PolyDB Context Graph Engine

An API-first backend for giving LLMs and agents structured database context. It connects to multiple databases, normalizes metadata into a unified model, builds a context graph, and adds embedding-based retrieval plus MCP-accessible tooling.

GitHub ↗

Can LLMs act as RL Agents?

Explores using an LLM as a pure policy in classic Gym environments: CartPole, FrozenLake, and LunarLander. The model reasons over episode history in natural language with no gradient updates or fine-tuning, relying entirely on in-context reasoning to select actions.

Blog ↗ GitHub ↗
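
A rough sketch of what "LLM as a pure policy" can look like in code, under stated assumptions: the prompt layout, the parsing rule, and the llm callable are all hypothetical, and a stub stands in for a real model so the example runs offline. The project's actual prompts and environment loop may differ.

```python
import re

def format_prompt(env_name, actions, history, observation):
    """Render the episode so far as plain text for the model to reason over."""
    lines = [
        f"Environment: {env_name}",
        f"Valid actions: {', '.join(str(a) for a in actions)}",
        "History (observation, action, reward):",
    ]
    for step, (obs, act, rew) in enumerate(history):
        lines.append(f"  step {step}: {obs}, {act}, {rew}")
    lines.append(f"Current observation: {observation}")
    lines.append("Think briefly, then end with the chosen action as an integer.")
    return "\n".join(lines)

def parse_action(reply, actions, default):
    """Take the last valid integer in the reply; fall back to a default."""
    for token in reversed(re.findall(r"-?\d+", reply)):
        if int(token) in actions:
            return int(token)
    return default

def choose_action(llm, env_name, actions, history, observation):
    """llm is any callable mapping a prompt string to a reply string (hypothetical)."""
    reply = llm(format_prompt(env_name, actions, history, observation))
    return parse_action(reply, actions, default=actions[0])

# Stub model so the sketch runs without any API access.
def fake_llm(prompt):
    return "The pole is tipping right, so I push right. Action: 1"

history = [((0.01, 0.02, -0.03, 0.04), 0, 1.0)]
action = choose_action(fake_llm, "CartPole", [0, 1], history,
                       (0.02, -0.17, -0.03, 0.33))
print(action)  # 1
```

No weights change anywhere in this loop: the only "learning" signal is the growing history that gets rendered into the next prompt.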

TinyLLM

A decoder-only Transformer built entirely from scratch in pure PyTorch. It implements modern architectural components: RMSNorm, Rotary Positional Embeddings (RoPE), SwiGLU feed-forward networks, and KV caching. These mirror the design choices found in production-scale LLMs like LLaMA.

GitHub ↗
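
As one example of the components listed for TinyLLM, RMSNorm rescales a vector by its root mean square and then applies a learned per-feature gain. The sketch below uses plain Python lists rather than PyTorch tensors to stay dependency-free; the function name and the eps default are assumptions, not TinyLLM's API.

```python
import math

def rms_norm(x, gain, eps=1e-6):
    """RMSNorm: divide by the root mean square of x, then scale by gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for v, g in zip(x, gain)]

x = [3.0, 4.0]
y = rms_norm(x, gain=[1.0, 1.0])
mean_square = sum(v * v for v in y) / len(y)
print(y, mean_square)  # mean_square is very close to 1.0
```

Unlike LayerNorm, there is no mean subtraction and no bias term, which is part of why it is cheaper to compute.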

TinyPEFT

A lightweight parameter-efficient fine-tuning engine supporting LoRA, Adapter Tuning, BitFit, and Prompt Tuning in a single unified package. Designed to make fine-tuning large language models cheap and accessible. Install it with pip install tinypeft.

GitHub ↗
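
To show why LoRA is cheap, here is a minimal pure-Python sketch of a LoRA-wrapped linear layer. It is not TinyPEFT's API: the class name LoRALinear, its methods, and the initialization constants are hypothetical. The frozen weight W is left untouched, and only the low-rank factors A and B would be trained, with the output computed as W x + (alpha / r) * B (A x).

```python
import random

def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

class LoRALinear:
    """Frozen weight plus a trainable low-rank update (illustrative only)."""

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight                  # frozen base weight, d_out x d_in
        self.scale = alpha / rank
        # A starts small and random, B starts at zero, so the update is zero at init.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

weight = [[0.1 * (i + j) for j in range(4)] for i in range(3)]  # 3 x 4
layer = LoRALinear(weight, rank=2, alpha=4)
x = [1.0, 2.0, 3.0, 4.0]

print(layer.trainable_params())  # 14, versus 12 frozen weights in this tiny case
print(layer.forward(x) == matvec(weight, x))  # True at initialization
```

At this toy size the adapter is not smaller than the base weight; the saving appears when d_in and d_out are in the thousands and the rank stays small, since the adapter grows as r * (d_in + d_out) rather than d_in * d_out.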

Linesman — AI Referee

A Video Assistant Referee (VAR) system that analyzes football match footage to evaluate incidents and make foul decisions according to FIFA Law 12. Uses a two-stage pipeline combining computer vision for player and action detection with a multimodal LLM for objective, rule-aware decision-making.

GitHub ↗

Sentinel

A global disease surveillance dashboard that aggregates real-time outbreak data from multiple public health APIs and WHO feeds. Visualizes emerging threats on an interactive map, providing early-warning signals for pandemic preparedness and public health monitoring.

GitHub ↗

KisanGPT

A multilingual AI assistant built specifically for Indian farmers, combining image recognition for crop disease diagnosis with real-time market price feeds from data.gov.in. Supports regional languages and covers crop advice, pest management, government schemes, and weather-based recommendations.

Live ↗ GitHub ↗

Reward Model (Qwen 2.5 3B)

A reward model fine-tuned on Anthropic's RLHF preference dataset, designed to score model completions for use in RLHF training pipelines. Evaluated on the RewardBench benchmark to assess alignment quality across categories like safety, reasoning, and instruction-following.

HuggingFace ↗ GitHub ↗
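
The page does not say which objective the reward model was trained with. A common choice for preference data is the Bradley-Terry pairwise loss, -log sigmoid(r_chosen - r_rejected), so the sketch below assumes that; the function names are hypothetical, and the scores are plain floats standing in for model outputs.

```python
import math

def pairwise_loss(r_chosen, r_rejected):
    """-log(sigmoid(margin)), written as a numerically stable softplus(-margin)."""
    margin = r_chosen - r_rejected
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

def pairwise_accuracy(pairs):
    """Fraction of (chosen, rejected) score pairs ranked in the right order."""
    return sum(1 for c, r in pairs if c > r) / len(pairs)

scores = [(2.0, 0.5), (0.1, 0.4), (1.2, -0.3)]  # made-up scores for illustration
print([round(pairwise_loss(c, r), 4) for c, r in scores])
print(pairwise_accuracy(scores))
```

The loss is small when the chosen completion scores well above the rejected one and grows roughly linearly when the order is wrong. Pairwise accuracy of this kind is also how preference benchmarks typically score a reward model.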

RL Implementations

Clean, from-scratch implementations of core reinforcement learning algorithms (REINFORCE, Actor-Critic, and PPO), built in PyTorch with Gymnasium environments.

GitHub ↗
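
One building block shared by REINFORCE and its successors is the discounted return, G_t = r_t + gamma * G_{t+1}, computed backwards over an episode. The sketch below is a dependency-free illustration with a hypothetical function name, not code taken from the repository.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t for every step t, accumulating from the end of the episode."""
    returns = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns

print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

In REINFORCE these returns weight the log-probability of each action taken; Actor-Critic and PPO replace them with advantage estimates to reduce variance.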

Paper Implementations

A collection of transformer architecture deep dives implemented from paper to code: KV Caching, Flash Attention, Rotary Positional Embeddings, and GPT. Each implementation is paired with a detailed blog post explaining the math and intuition behind the technique.

GitHub ↗ Blog ↗

Llama 3.2 fine-tuned on Physics

Llama 3.2 fine-tuned on curated physics question-answer datasets using LoRA and PEFT via Unsloth for efficient training. The resulting model demonstrates improved reasoning on physics problems spanning mechanics, electromagnetism, and thermodynamics compared to the base model.

Model ↗ GitHub ↗

KeyGoblin

A CLI security tool that scans web applications for exposed API keys, secret tokens, and sensitive endpoints in JavaScript bundles, HTML source, and network responses. Helps developers catch credential leaks before they reach production or get picked up by automated scanners.

GitHub ↗
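
The core of a scanner like KeyGoblin can be sketched as pattern matching over fetched text. The two patterns and the scan_text function below are illustrative assumptions, not KeyGoblin's actual rules or interface; the AWS-style string is the placeholder key used in AWS documentation, not a live credential.

```python
import re

# Hypothetical rule set: one provider-specific pattern, one generic assignment pattern.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "generic_assignment": re.compile(
        r"\b(api[_-]?key|secret|token)\b\s*[:=]\s*['\"]([A-Za-z0-9_\-]{16,})['\"]",
        re.IGNORECASE,
    ),
}

def scan_text(text):
    """Return (line_number, rule_name, matched_text) for every hit."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                findings.append((lineno, name, match.group(0)))
    return findings

sample = '''const apiKey = "abcd1234efgh5678ijkl";
fetch("/api/items");
var k = "AKIAIOSFODNN7EXAMPLE";'''

for finding in scan_text(sample):
    print(finding)
```

A real scanner would add many more rules, entropy checks to cut false positives, and redaction of matched values in its output.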

MCP server for Obsidian

A Model Context Protocol server that gives Claude and other MCP-compatible hosts the ability to read and write notes directly inside an Obsidian vault. Enables AI-assisted note-taking workflows, knowledge graph traversal, and automated linking between related notes.

GitHub ↗

HourSwap

A barter platform designed for students to exchange skills and services without money: trade a guitar lesson for a coding session, or tutoring for design help. Built to foster community and make skill-sharing accessible on campus without financial barriers.

Live ↗ GitHub ↗

Intro to Reinforcement Learning

A high-level introduction to the core ideas behind agents, rewards, policies, and learning through interaction.

read ↗

Mixture of Experts Explained

A practical breakdown of sparse expert routing, capacity management, and why MoE systems scale differently.

read ↗
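
Sparse expert routing, as covered in the Mixture of Experts post, can be sketched as a softmax over router logits followed by a top-k selection. The function below is a generic illustration with hypothetical names; it leaves out capacity limits and load-balancing losses, and it renormalizes the kept weights, which some MoE variants do and others do not.

```python
import math

def top_k_route(router_logits, k=2):
    """Pick the k highest-probability experts and renormalize their weights."""
    peak = max(router_logits)
    exps = [math.exp(logit - peak) for logit in router_logits]  # stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept = sum(probs[i] for i in top)
    return [(i, probs[i] / kept) for i in top]

routes = top_k_route([0.1, 2.0, -1.0, 1.5], k=2)
print(routes)  # experts 1 and 3, with weights that sum to 1
```

Only the selected experts run for a given token, so compute per token stays roughly constant while total parameter count grows with the number of experts.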

Create Your Own GPT — PEFT

An accessible walkthrough of parameter-efficient fine-tuning and the tradeoffs behind adapting large models cheaply.

read ↗

RoPE Explained and Implemented

An implementation-oriented explanation of rotary positional embeddings with intuition for how they preserve sequence structure.

read ↗
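
A minimal sketch of rotary positional embeddings, assuming the interleaved-pair convention in which features (0, 1), (2, 3), and so on are each rotated by an angle proportional to the token position. Some implementations pair the first and second halves of the vector instead. The function name and the base of 10000 are assumptions for illustration.

```python
import math

def rope(x, position, base=10000.0):
    """Rotate each consecutive feature pair of x by a position-dependent angle."""
    dim = len(x)
    assert dim % 2 == 0, "RoPE needs an even feature dimension"
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)  # lower frequency for later pairs
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        out.append(x[i] * cos_t - x[i + 1] * sin_t)
        out.append(x[i] * sin_t + x[i + 1] * cos_t)
    return out

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

q = [1.0, 0.5, -0.3, 2.0]
k = [0.2, -1.0, 0.7, 0.4]

# The attention score depends only on the offset between positions (here, 2).
print(dot(rope(q, 5), rope(k, 3)))
print(dot(rope(q, 2), rope(k, 0)))
```

The two printed scores agree up to floating-point error, because rotating the query by position m and the key by position n leaves a dot product that depends only on m - n.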

Beyond Transformers — GPT, BERT, BART

A comparative guide to how major transformer families differ in objective, architecture, and downstream use.

read ↗

The Transformer Architecture Explained

A guided tour through attention, feed-forward blocks, residual pathways, and the design logic of transformer models.

read ↗

Functional Programming and Lambda Calculus

A conceptual bridge from formal computation to functional programming ideas that still shape modern languages.

read ↗

How far is AGI?

A reflection on capability progress, open bottlenecks, and what it would actually mean to claim general intelligence.

read ↗

Why I hate existentialism

A personal essay pushing against existentialist framing and testing its assumptions against lived experience.

read ↗