Alex Mallen
Member of Technical Staff at Redwood Research
Researching and writing on reward hacking, AI control, and AI safety.

Intro
I'm Alex Mallen, a Member of Technical Staff at Redwood Research. My work focuses on technical problems in AI safety, particularly reward hacking, AI control, and alignment. I started in this field at the University of Washington, where I researched factual reliability in language models and time-series forecasting, which led to roles at the Allen Institute and EleutherAI. I care about keeping AI systems safe and controllable, which means not treating models as black boxes and instead extracting robust latent knowledge from them. I'm always happy to connect with others working on interpretability or alignment to discuss how to build more reliable and trustworthy machine learning systems.
Networking
What I can offer
- Technical expertise in ML interpretability and alignment
- Deep knowledge of NLP and retrieval-augmented generation
- Experience in probabilistic time-series analysis
What I'm looking for
- Expanding my professional network
- Exploring mutual opportunities in AI safety and research
Background
Career
Began as an undergraduate researcher at the University of Washington focusing on time-series and NLP, interned at the Allen Institute, and transitioned into full-time AI safety research at EleutherAI before joining Redwood Research.
Education
Bachelor’s degree in Computer Science, University of Washington (2020–2023).
Achievements
- Co-authored 'When Not to Trust Language Models'
- Outperformed 177 global contestants in energy demand forecasting
- Collaborated with NASA Goddard on atmospheric anomaly detection
- Featured guest on the DataSkeptic podcast
Opinions
- Current reinforcement learning paradigms require significant safety guardrails regarding reward hacking and AI control.
- Models should not be treated as black boxes; we must prioritize extracting latent knowledge and robust representations.
Personality
Communication style
Professional, academic, and precise with a focus on technical accuracy.
Formality: 8/10
Vocabulary