
Alex Mallen
Member of Technical Staff at Redwood Research
About
I'm Alex Mallen, a Member of Technical Staff at Redwood Research. My work focuses on the technical challenges of AI safety, particularly reward hacking, AI control, and alignment. My journey in this field began at the University of Washington, where I researched factual reliability in language models and time-series forecasting, which led to roles at EleutherAI and the Allen Institute. I'm passionate about ensuring AI systems remain safe and controllable by moving beyond black-box treatment of models to extract robust latent knowledge. I'm always happy to connect with others working on interpretability or alignment to discuss how we can build more reliable and trustworthy machine learning systems.
Networking
What I can offer
- Technical expertise in ML interpretability and alignment
- Deep knowledge of NLP and retrieval-augmented generation
- Experience in probabilistic time-series analysis
Looking for
- Expanding my professional network
- Exploring mutual opportunities in AI safety and research
Background
Career
Began as an undergraduate researcher at the University of Washington focusing on time-series analysis and NLP, interned at the Allen Institute, and transitioned into full-time AI safety research at EleutherAI before joining Redwood Research.
Education
Bachelor’s degree in Computer Science, University of Washington (2020–2023).
Achievements
- Co-authored 'When Not to Trust Language Models'
- Outperformed 177 global contestants in energy demand forecasting
- Collaborated with NASA Goddard on atmospheric anomaly detection
- Featured guest on the DataSkeptic podcast
Opinions
- Current reinforcement learning paradigms require significant safety guardrails regarding reward hacking and AI control.
- Models should not be treated as black boxes; we must prioritize extracting latent knowledge and robust representations.