Early-stage AI infrastructure startup building agent training platforms
Member of Technical Staff, Infrastructure / DevOps
We're building infrastructure for training and evaluating AI agents at scale. This role focuses on designing and maintaining the systems that support our platform, from container orchestration to distributed data pipelines. You'll work closely with a small, focused engineering team to solve the infrastructure challenges that come with running agent training at speed. This is an on-site position based in San Francisco, and you'll own pieces of our architecture as we grow.
What we're looking for
- 3–8 years of experience in infrastructure, DevOps, SRE, or related backend systems work
- Proficiency with Python and/or Rust for systems-level programming and tooling
- Hands-on experience with AWS, Docker, and container orchestration in production
- Experience designing and operating data systems, including caching layers such as Redis
- Ability to balance reliability, performance, and developer experience in infrastructure decisions