I'm Kanishk, an undergraduate student in computer science.
Here, I share my love for machine learning, robotics, literature and other geeky stuff.
I like working on LLMs, curating datasets, and deploying them in the real world. I also indulge in a fair bit of web development.
Things I love:
- deep learning
- neovim (I use nvim + kitty), linux (used Arch, but I'm on macOS right now)
- applied robotics
- hidden/subaltern history
- tinkering with side projects that break more often than they work
- long rabbit holes about science, tech or history
- football (FC Barcelona ftw)
These are some of the projects I've worked on:
Can LLMs act as RL Agents?
This project explores whether an LLM can function as a policy inside classic Gym environments, without reinforcement learning, gradient updates, or weight training.
Instead of updating parameters, the LLM reasons over context and episode history to choose actions.
I tested this by using an LLM directly as the policy in:
• CartPole
• FrozenLake
• LunarLander
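The loop described above can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the project's code: `ToyCorridor`, `build_prompt`, `parse_action`, and `fake_llm` are hypothetical names, the toy environment returns a simplified `(obs, reward, done)` tuple rather than the full Gymnasium step tuple, and `fake_llm` stands in for a real model call.

```python
from typing import Callable, List, Tuple

Step = Tuple[int, int, float]  # (observation, action, reward)


class ToyCorridor:
    """Hypothetical stand-in for a Gym environment: walk right from cell 0 to the last cell."""

    def __init__(self, length: int = 5) -> None:
        self.length = length
        self.pos = 0

    def reset(self) -> int:
        self.pos = 0
        return self.pos

    def step(self, action: int) -> Tuple[int, float, bool]:
        # action 0 = left, 1 = right; the agent cannot move below cell 0
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        done = self.pos >= self.length - 1
        return self.pos, (1.0 if done else 0.0), done


def build_prompt(obs: int, history: List[Step]) -> str:
    """Put the task description and recent episode history into the context."""
    lines = ["You control an agent in a corridor. Actions: 0 = left, 1 = right."]
    for past_obs, past_action, past_reward in history[-10:]:
        lines.append(f"obs={past_obs} action={past_action} reward={past_reward}")
    lines.append(f"obs={obs} action=")
    return "\n".join(lines)


def parse_action(reply: str, n_actions: int = 2) -> int:
    """Return the first digit in the reply that is a valid action, else fall back to 0."""
    for ch in reply:
        if ch in "0123456789" and int(ch) < n_actions:
            return int(ch)
    return 0


def run_episode(env: ToyCorridor, llm: Callable[[str], str], max_steps: int = 50) -> float:
    """Use the LLM as the policy: no gradients, only context that grows with history."""
    obs = env.reset()
    history: List[Step] = []
    total = 0.0
    for _ in range(max_steps):
        action = parse_action(llm(build_prompt(obs, history)))
        next_obs, reward, done = env.step(action)
        history.append((obs, action, reward))
        total += reward
        obs = next_obs
        if done:
            break
    return total


def fake_llm(prompt: str) -> str:
    """Placeholder for a real model call; always answers 'move right'."""
    return "I will move right: 1"
```

Swapping `fake_llm` for a function that calls an actual model is the only change needed to try a real policy; the parsing fallback matters because model replies are not guaranteed to contain a valid action.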
TinyLLM
TinyLLM is a lightweight, from-scratch implementation of a large language model built in pure PyTorch.
It implements a modern decoder-only Transformer architecture with causal self-attention.
Architecture components:
• RMSNorm
• RoPE (Rotary Positional Embeddings)
• Multi-Head Self Attention
• SwiGLU Feedforward Network
• KV Cache for efficient autoregressive generation
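Two of the components listed above are small enough to sketch without any framework. The snippet below is a plain-Python illustration, not TinyLLM's PyTorch code: the function names are hypothetical, and the RoPE variant shown rotates adjacent pairs `(x[0], x[1]), (x[2], x[3]), ...`, which is an assumption (some implementations pair the first and second halves of the vector instead).

```python
import math
from typing import List


def rms_norm(x: List[float], weight: List[float], eps: float = 1e-6) -> List[float]:
    """Divide x by its root-mean-square, then apply a per-dimension learned gain."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def rope(x: List[float], position: int, base: float = 10000.0) -> List[float]:
    """Rotate each adjacent pair of x by an angle that grows with the token position."""
    d = len(x)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even number of dimensions")
    out: List[float] = []
    for i in range(0, d, 2):
        # i is the element index, so -i / d equals -2 * pair_index / d
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out.extend([x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c])
    return out


def dot(a: List[float], b: List[float]) -> float:
    return sum(p * q for p, q in zip(a, b))
```

The useful property of RoPE is that `dot(rope(q, m), rope(k, n))` depends only on the offset `m - n`, so attention scores encode relative position without a separate positional embedding table.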
TinyPEFT
TinyPEFT is a small, pure-PyTorch Parameter-Efficient Fine-Tuning (PEFT) engine for fine-tuning large language models.
It currently supports: LoRA, Adapter Tuning, BitFit, and Prompt Tuning.
Install: pip install tinypeft
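For a feel of what LoRA, the first method in that list, does, here is a framework-free sketch. It does not use or reproduce the tinypeft API; `matvec` and `lora_forward` are hypothetical helpers, and the shapes follow the usual LoRA formulation where the frozen weight `W` gets a low-rank update `B @ A` scaled by `alpha / r`.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector, alpha: float, r: int) -> Vector:
    """Compute y = W x + (alpha / r) * B (A x).

    W is d_out x d_in and stays frozen; A is r x d_in and B is d_out x r,
    and only A and B would be trained.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


# Tiny example: d_in = 3, d_out = 2, rank r = 1
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
A = [[1.0, 1.0, 1.0]]
B_zero = [[0.0], [0.0]]  # B is commonly initialised to zero, so training starts at the base model
x = [1.0, 2.0, 3.0]
y_initial = lora_forward(W, A, B_zero, x, alpha=2.0, r=1)
```

With `B` at zero the adapted layer reproduces the frozen layer exactly, and the trainable parameter count is `r * (d_in + d_out)` instead of `d_in * d_out`.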
Kisangpt
An AI-powered agricultural assistant supporting multiple languages and image recognition,
using real-time data from data.gov.in to assist Indian farmers.
Reward Model
A reward model based on Qwen 2.5 3B, fine-tuned on the Anthropic RLHF dataset.
Designed to score completions for RLHF pipelines and evaluation tasks.
Evaluated on RewardBench.
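Reward models trained on preference pairs are commonly fit with a pairwise (Bradley-Terry style) loss, and pairwise benchmarks typically count how often the chosen completion outscores the rejected one. The sketch below shows both under that assumption; it is not taken from this project's training code, and the function names are hypothetical.

```python
import math
from typing import List, Tuple


def pairwise_loss(r_chosen: float, r_rejected: float) -> float:
    """Return -log(sigmoid(r_chosen - r_rejected)), computed in a numerically stable way."""
    margin = r_chosen - r_rejected
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch so exp() never sees a large positive input
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def pairwise_accuracy(pairs: List[Tuple[float, float]]) -> float:
    """Fraction of (chosen_score, rejected_score) pairs where the chosen one scores higher."""
    return sum(1 for chosen, rejected in pairs if chosen > rejected) / len(pairs)
```

The loss shrinks as the model scores the preferred completion further above the rejected one, and equals `log(2)` when it cannot tell them apart.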
RL Algorithm Implementations
From-scratch implementations of RL algorithms using PyTorch and Gymnasium.
Implemented: REINFORCE (CartPole), Actor-Critic (TicTacToe), PPO (LunarLander).
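The core of REINFORCE, the simplest of the three, fits in a few lines. This is a plain-Python sketch of the textbook formulation rather than the repository's PyTorch code; `discounted_returns` and `reinforce_loss` are hypothetical names, and a real implementation would take `log_probs` from the policy network so gradients can flow through them.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Compute G_t = r_t + gamma * G_{t+1} for every step by scanning the episode backwards."""
    returns: List[float] = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def reinforce_loss(log_probs: List[float], returns: List[float]) -> float:
    """Return -sum_t log pi(a_t | s_t) * G_t; minimising it raises the odds of high-return actions."""
    return -sum(lp * g for lp, g in zip(log_probs, returns))


# Three steps of reward 1 with gamma = 0.5
example_returns = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)  # [1.75, 1.5, 1.0]
```

Actor-Critic and PPO build on the same idea, mainly by replacing the raw return with an advantage estimate to reduce variance.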
Paper Implementations
Implementations of transformer architectures including KV caching,
Flash Attention, RoPE and GPT.
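KV caching is the easiest of these to show in isolation. The sketch below is a single-head, plain-Python illustration under simplifying assumptions (no batching, no projections, hypothetical names `attend`, `KVCache`, and `causal_attention_full`); it is not the repository's code. It checks the property that makes the cache valid: attending over cached keys and values gives the same result as recomputing causal attention from scratch.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention of one query over the given keys and values."""
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


class KVCache:
    """Stores keys and values of past tokens so each new token only adds one row."""

    def __init__(self) -> None:
        self.keys: List[Vector] = []
        self.values: List[Vector] = []

    def step(self, q: Vector, k: Vector, v: Vector) -> Vector:
        self.keys.append(k)
        self.values.append(v)
        return attend(q, self.keys, self.values)


def causal_attention_full(qs: List[Vector], ks: List[Vector], vs: List[Vector]) -> List[Vector]:
    """Reference: for each position, attend over all positions up to and including it."""
    return [attend(qs[t], ks[: t + 1], vs[: t + 1]) for t in range(len(qs))]
```

During generation, each new token calls `step` once, so the per-token work grows with the sequence length instead of requiring the whole prefix to be reprocessed.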
Llama 3.2 fine-tuned on physics
Fine-tuned Llama 3.2 on physics datasets using LoRA and PEFT via Unsloth.
KeyGoblin
A CLI tool for detecting exposed API keys, tokens, and endpoints in web applications.
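The general technique behind this kind of scanner is pattern matching over fetched source text. The sketch below is illustrative only and does not reflect KeyGoblin's actual rules or CLI: the `PATTERNS` table and `scan` function are hypothetical, and the two regexes (the well-known `AKIA` prefix of AWS access key IDs, plus a generic quoted assignment) are just examples.

```python
import re
from typing import List, Tuple

# Hypothetical rule set: rule name -> compiled pattern
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "generic_assignment": re.compile(
        r"""\b(?:api[_-]?key|secret|token)\b\s*[:=]\s*["'][A-Za-z0-9_\-]{16,}["']""",
        re.IGNORECASE,
    ),
}


def scan(text: str) -> List[Tuple[int, str, str]]:
    """Return (line_number, rule_name, matched_text) for every pattern hit."""
    findings: List[Tuple[int, str, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                findings.append((lineno, name, match.group(0)))
    return findings
```

Regex rules like these produce false positives, so a practical tool would usually add entropy checks or allowlists on top.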
MCP server for Obsidian
An MCP server allowing Claude or other MCP hosts to interact with Obsidian vaults.
HourSwap
A barter-trade platform where students exchange skills without money.
Built for an entrepreneurship class.
I mostly write on Substack. Here are the links:
Random stuff
Just some blogs about things I find interesting.