Training an LLM to Trade Stocks with Reinforcement Learning
#reinforcement-learning #llm #pytorch #deepseek #trading
A 0.5B-parameter language model with no fine-tuning beat a PPO agent trained for 20,000 steps. Here's how I'm building the full loop, from environment design to GRPO fine-tuning on real Indian equity data.