kanishk's space

I'm kanishk, an undergraduate student in computer science.

Here, I share my love for machine learning, robotics, literature and other geeky stuff.

I like working on LLMs, curating datasets and deploying models in the real world. I also indulge in a fair bit of web development.

things I love:

deep learning
neovim (I use nvim + kitty), linux (used arch but I'm on macos right now)
applied robotics
hidden/subaltern history
tinkering with side projects that break more often than they work
long rabbit holes about science, tech or history
football (fc barcelona ftw)

If you want to talk or build something together, reach out to me at:

mkanishkkulkarni@gmail.com

These are some of the projects I've worked on:

Can LLMs act as RL Agents?

This project explores whether an LLM can function as a policy inside classic Gym environments, without reinforcement learning, gradient updates, or weight training.
Instead of updating parameters, the LLM reasons over context and episode history to choose actions.

I tested this by using an LLM directly as the policy in:
• CartPole
• FrozenLake
• LunarLander
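To make the "LLM as policy" idea concrete, here is a minimal sketch of the loop, not the project's actual code. The prompt format, `build_prompt`, `parse_action`, `run_episode`, the toy `TwoStepEnv` and the `always_one` placeholder are all hypothetical names I made up for illustration. The only real convention assumed is the Gymnasium-style `reset()` / `step()` signature.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, int, float]]  # (observation, action, reward) per step


def build_prompt(env_name: str, n_actions: int, history: History, obs: str) -> str:
    """Pack the episode so far into a text prompt (hypothetical format)."""
    lines = [
        f"You control an agent in {env_name}.",
        f"Valid actions are integers 0..{n_actions - 1}.",
    ]
    for past_obs, action, reward in history:
        lines.append(f"obs={past_obs} action={action} reward={reward}")
    lines.append(f"obs={obs}")
    lines.append("Reply with one action integer.")
    return "\n".join(lines)


def parse_action(reply: str, n_actions: int, fallback: int = 0) -> int:
    """Return the first valid action integer in the reply, else the fallback."""
    for token in reply.replace(",", " ").split():
        if token.isdecimal() and int(token) < n_actions:
            return int(token)
    return fallback


def run_episode(env, env_name: str, n_actions: int,
                query_llm: Callable[[str], str], max_steps: int = 200) -> float:
    """Play one episode; the model only sees text, no weights are updated."""
    obs, _info = env.reset()
    history: History = []
    total = 0.0
    for _ in range(max_steps):
        prompt = build_prompt(env_name, n_actions, history, str(obs))
        action = parse_action(query_llm(prompt), n_actions)
        next_obs, reward, terminated, truncated, _info = env.step(action)
        history.append((str(obs), action, float(reward)))
        total += float(reward)
        obs = next_obs
        if terminated or truncated:
            break
    return total


class TwoStepEnv:
    """Toy stand-in with the Gymnasium reset/step signature (not a real Gym env)."""

    def reset(self):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else 0.0
        return self.t, reward, self.t >= 2, False, {}


def always_one(prompt: str) -> str:
    """Placeholder for a real LLM call; always answers with action 1."""
    return "1"


total_reward = run_episode(TwoStepEnv(), "TwoStepEnv", 2, always_one)
```

Swapping `TwoStepEnv` for a real Gymnasium environment and `always_one` for an actual model call should keep the loop the same, since all the "learning" lives in the growing history inside the prompt.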

Blog GitHub

TinyLLM

TinyLLM is a lightweight, from-scratch implementation of a large language model built in pure PyTorch.
It implements a modern decoder-only Transformer architecture with causal self-attention.
Architecture components:
• RMSNorm
• RoPE (Rotary Positional Embeddings)
• Multi-Head Self Attention
• SwiGLU Feedforward Network
• KV Cache for efficient autoregressive generation
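Two of the components above are small enough to sketch in a few lines. This is a framework-free illustration in plain Python so the math stays visible; it is not TinyLLM's code, and the function names `rms_norm` and `rope` are my own. The RoPE variant here rotates adjacent pairs of dimensions, which is one common convention and an assumption on my part (other implementations pair the first and second halves of the vector instead).

```python
import math
from typing import List


def rms_norm(x: List[float], weight: List[float], eps: float = 1e-6) -> List[float]:
    """Scale x by the reciprocal of its root-mean-square, then by a learned gain."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * w for v, w in zip(x, weight)]


def rope(x: List[float], position: int, base: float = 10000.0) -> List[float]:
    """Rotate each adjacent pair (x[i], x[i+1]) by a position-dependent angle.

    Assumes len(x) is even. Pair k = i // 2 uses frequency base ** (-2k / d).
    """
    d = len(x)
    out = list(x)
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        out[i] = x[i] * cos_t - x[i + 1] * sin_t
        out[i + 1] = x[i] * sin_t + x[i + 1] * cos_t
    return out


# Attention scores between rotated queries and keys depend only on the
# position offset (2 in both cases below), not on the absolute positions.
q, k = [1.0, 0.0, 0.5, 2.0], [0.3, 1.0, -1.0, 0.2]
score_a = sum(a * b for a, b in zip(rope(q, 3), rope(k, 1)))
score_b = sum(a * b for a, b in zip(rope(q, 5), rope(k, 3)))
```

`score_a` and `score_b` agree up to floating-point error, which is the property that makes rotary embeddings encode relative position.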

GitHub

TinyPEFT

TinyPEFT is a small, pure-PyTorch Parameter-Efficient Fine-Tuning (PEFT) engine for fine-tuning large language models.
It currently supports: LoRA, Adapter Tuning, BitFit and Prompt Tuning.

Install: pip install tinypeft
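For a feel of what LoRA does, here is a rough plain-Python sketch of the general technique. It is not TinyPEFT's API: `LoRALinear`, `matvec`, and the `rank` / `alpha` / `seed` parameters are hypothetical names for illustration. The frozen weight `W` gets a trainable low-rank update, so the layer computes `W x + (alpha / rank) * B (A x)`, with `B` starting at zero.

```python
import random
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (illustrative only)."""

    def __init__(self, weight: Matrix, rank: int, alpha: float, seed: int = 0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight                 # frozen base weight, shape (d_out, d_in)
        self.scale = alpha / rank
        # A starts small and random, B starts at zero, so the update is zero at init.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x: List[float]) -> List[float]:
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self) -> int:
        """Only A and B are trained: rank * d_in + d_out * rank values."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
layer = LoRALinear(W, rank=1, alpha=2.0)
out_at_init = layer.forward([1.0, 1.0, 1.0])
```

At initialization the layer behaves exactly like the frozen base layer, and only 5 values are trainable here versus 6 in `W`. The saving is tiny on a toy matrix but grows quickly with real layer sizes.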

GitHub

Kisangpt

An AI-powered agricultural assistant for Indian farmers. It supports multiple languages and image recognition, and draws on real-time data from data.gov.in.

Live Demo GitHub

Reward Model

A reward model based on Qwen 2.5 3B, fine-tuned on the Anthropic RLHF dataset. Designed to score completions for RLHF pipelines and evaluation tasks. Evaluated on RewardBench.
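As a rough illustration of how a reward model is usually trained and checked on chosen/rejected pairs, here is a small sketch. The pairwise (Bradley-Terry style) loss below is a common choice for preference data; whether this project uses exactly this objective is an assumption, and `pairwise_loss` and `pairwise_accuracy` are hypothetical helper names.

```python
import math
from typing import List, Tuple


def pairwise_loss(chosen: float, rejected: float) -> float:
    """-log(sigmoid(chosen - rejected)), written in a numerically stable form."""
    margin = chosen - rejected
    # softplus(-margin) = max(-margin, 0) + log(1 + exp(-|margin|))
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def pairwise_accuracy(pairs: List[Tuple[float, float]]) -> float:
    """Fraction of pairs where the chosen completion outscores the rejected one."""
    return sum(c > r for c, r in pairs) / len(pairs)


# Scores here are made-up numbers, purely to exercise the functions.
example_pairs = [(2.0, 1.0), (0.0, 1.0), (3.0, -1.0), (1.0, 0.0)]
accuracy = pairwise_accuracy(example_pairs)
mean_loss = sum(pairwise_loss(c, r) for c, r in example_pairs) / len(example_pairs)
```

The loss shrinks as the chosen completion's score pulls ahead of the rejected one, and the accuracy metric is the kind of pairwise comparison that preference benchmarks tend to report.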

HuggingFace GitHub

RL Algorithm Implementations

From-scratch implementations of RL algorithms using PyTorch and Gymnasium. Implemented: REINFORCE (CartPole), Actor-Critic (TicTacToe), PPO (LunarLander).
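The core of REINFORCE fits in a few lines, so here is a minimal sketch in plain Python rather than the repo's PyTorch code. `discounted_returns` and `reinforce_loss` are hypothetical names, and `gamma=0.99` is just a typical default, not a value taken from the project.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every step, computed back to front."""
    returns: List[float] = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def reinforce_loss(log_probs: List[float], returns: List[float]) -> float:
    """REINFORCE objective as a loss: -sum_t log pi(a_t | s_t) * G_t."""
    return -sum(lp * g for lp, g in zip(log_probs, returns))


episode_returns = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
```

In a real training loop the log-probabilities would come from the policy network and the loss would be backpropagated. Here they are plain floats so the arithmetic is easy to follow.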

GitHub