Senior Software Engineer, AI Infrastructure Engineering ReViz Seattle, WA
Company: Tbwa Chiat/Day Inc
Location: Seattle
Posted on: November 8, 2024
Job Description:
Senior Software Engineer, AI Infrastructure
Seattle, WA
Hybrid: Individuals in this role are expected to live in the Greater
Seattle area and are encouraged to spend 1-3 days per week on-site
in our Seattle offices.
Compensation Range: $140,230 - $213,150
Who We Are
We're software engineers redefining how researchers and
engineers use state of the art GPU clusters. We own and actively
develop Beaker, a GPU-first job orchestration system used by Ai2
researchers to manage and execute frontier research workloads, such
as large-scale, distributed pretraining and online reinforcement
learning. We're also responsible for Ai2's on-premise GPU servers
from the bare-metal up, operating a high performance storage
cluster and designing and developing critical systems that teams
across the institute rely on for pushing forward cutting-edge, open
science.
Who You Are
You're a talented, self-directed software
engineer and operator who thrives in a collaborative, fast-paced
environment. You can quickly and effortlessly
produce code to solve a problem, but you're also capable of
articulating a bigger vision and driving consensus. You're ready to
work across the entire stack, from bare metal to the application
layer - though we don't expect you to be an expert in everything.
You lead by example and lift up the team, setting an ambitious bar
in a balanced, healthy fashion.
You'll be responsible for developing
and designing systems that make it effortless for Ai2's researchers
and engineers to run large scale, state of the art GPU workloads.
You'll work closely with those stakeholders to understand the
friction in existing solutions and opportunities for new
capabilities. Your work will contribute to the evolution of a
platform purpose built for AI research, directly supporting the
institute's frontier AI efforts, such as online reinforcement
learning, distributed pre-training on large clusters, and PB-scale
dataset curation and synthesis.
As a senior individual contributor,
you'll be expected to work independently on problems both big and
small. You'll wrangle large, ambiguous projects, but also won't
hesitate to contribute smaller, incremental changes to push forward
things you're not directly responsible for. You'll be expected to
act as a leader and mentor, promoting a healthy culture and
accelerating the growth of your peers.
Your responsibilities will include:
- Designing critical systems that solve emerging requirements of
cutting-edge research projects while partnering closely with
stakeholders
- Delivering software changes, from inception to release
- Contributing to Beaker, our custom job orchestrator written in
Go
- Operating a fleet of state-of-the-art GPU servers and
supporting infrastructure (storage, networking, etc.)
- Reviewing code and design documents; mentoring team
members
- Participating in our weekly on-call cycle during operating
hours (9 AM to 5 PM, PT)
- Contributing to the long-term vision for revolutionizing GPU-first
computation
- Participating in planning exercises; proactively contributing
to their improvement
- Fostering a healthy, high-performance engineering culture
What You'll Need:
- 6+ years developing highly available software in a professional
setting
- Proficiency in Golang, Python, SQL, shell scripting, and Linux
server administration
- A strong understanding of running containerized workloads
(Docker)
- Familiarity with cloud infrastructure (GCP, AWS)
- Excellent writing and collaboration skills
Bonus Qualifications:
- Experience operating GPU clusters or developing distributed ML
workloads
- Deep systems administration expertise
- Familiarity with Kubernetes
- Prior experience serving AI models (inference)
Physical Demands and Work Environment:
The physical demands described here are
representative of those that must be met by a team member to
successfully perform the essential functions of this position.
Reasonable accommodations may be made to enable individuals with
disabilities to perform the functions.
- Must be able to remain in a stationary position for long
periods of time.
- The ability to communicate information and ideas so others will
understand.
- The ability to observe details at close range.
A Little More About Ai2:
Ai2 is a Seattle-based non-profit AI research institute
founded in 2014 by the late Paul Allen. Our mission is building
breakthrough AI to solve the world's biggest problems. We develop
foundational AI research and innovation to deliver real-world
impact through large-scale open models, data, robotics,
conservation, and beyond.
In addition to Ai2's core mission, we also
aim to contribute to humanity through our treatment of each member
of the Ai2 Team. Some highlights are:
- We are a learning organization - because everything Ai2 does is
ground-breaking, we are learning every day.
- We value diversity - we seek to hire, support, and promote
people from all genders, ethnicities, and all levels of
experience.
- We value inclusion - we understand the value that people's
individual experiences and perspectives can bring to an
organization.
- We emphasize a healthy work/life balance - we believe our team
members are happiest and most productive when their work/life
balance is optimized.
- We are collaborative and transparent - we consider ourselves a
team, all moving with a common purpose.
- We are in Seattle - and our office is on the water!
- We are friendly - chances are you will like every one of the
200+ (and growing) people who work here.
Ai2 is proud to be an Equal
Opportunity employer. We do not discriminate based upon race,
religion, color, national origin, sex, sexual orientation, gender,
gender identity, gender expression, age, or disability.
This employer participates in E-Verify.
We are committed to providing
reasonable accommodations to employees and applicants with
disabilities to the full extent required by the Americans with
Disabilities Act (ADA).