Site Reliability Engineer II
Company: Cox Automotive
Location: Avondale Estates
Posted on: April 2, 2026
Job Description:
The Site Reliability Engineer II will be part of the Site
Reliability Engineering (SRE) team. The SRE team drives
reliability, observability, and engineering practice maturity
across more than 150 teams made up of over a thousand engineers in our
part of Cox Automotive. We build processes, documentation, and
tools that scale: deep observability to detect and diagnose issues
faster, engineering maturity assessments that drive measurable
improvement, reusable golden paths that accelerate delivery, and
trusted advisory relationships that align reliability with business
priorities. Much of our work focuses on eliminating toil through
automation (increasingly leveraging AI and agentic solutions) and
establishing self-service capabilities that multiply our impact. If
you love building monitoring systems that reveal truth, evaluating
engineering practices to raise the bar organization-wide, exploring
cutting-edge AI technologies to solve operational challenges, and
acting as a trusted advisor to engineers and leadership, we want to
talk to you.
What You'll Do:
- Define and drive adoption of SLIs, SLOs, error budgets, and
  high-quality alerting standards across the organization
- Architect end-to-end observability strategies (metrics, logs,
  traces, business signals) with consistent taxonomy and
  discoverability
- Build centralized dashboards, reliability scorecards, and runbooks
  used by engineering teams and leadership
- Establish engineering practice maturity baselines and partner with
  teams on measurable improvement plans
- Create golden paths (standardized pipelines, infrastructure
  modules, and service templates) that enable rapid, consistent
  delivery
- Pioneer the use of AI and agentic solutions to automate toil,
  accelerate incident response, and enhance operational workflows
- Lead internal workshops, game days, and learning programs to spread
  operational excellence
- Act as a trusted advisor to product and engineering leadership,
  providing data-driven insights on reliability risk and trade-offs
- Guide post-incident reviews toward systemic remediation
  (guardrails, automation, design changes) rather than superficial
  fixes
- Design and extend self-service platforms for deployment,
  progressive delivery, and automated recovery
- Reduce MTTR through better telemetry, automation, AI-assisted
  diagnostics, and resilience patterns
- Mentor engineers across teams to become local reliability
  champions, scaling SRE impact without adding headcount
Who You Are:
- Experience programming in at least one of the following languages:
  Python, TypeScript, or Java.
- Bachelor's degree in a related discipline and 4 years' experience
  in a related field. The right candidate could also have a different
  combination, such as a master's degree and 2 years' experience; a
  Ph.D. and up to 1 year of experience; or 16 years' experience in a
  related field.
- Applicants must currently be authorized to work in the United
  States for any employer without current or future sponsorship. No
  OPT, CPT, STEM/OPT, or visa sponsorship now or in the future.
- Expertise in designing, analyzing, and troubleshooting large-scale
  distributed systems.
- Deep hands-on experience with modern observability tools
  (CloudWatch and New Relic).
- Proven ability to assess engineering practices and drive measurable
  improvements across multiple teams.
- Experience establishing SLIs/SLOs, managing error budgets, and
  improving alert signal-to-noise ratios.
- Strong background in release engineering, CI/CD, and progressive
  deployment strategies.
- Deep expertise in AWS, Terraform, AWS CDK, and GitHub/GitHub
  Actions.
- Enthusiasm for applying AI, LLMs, and agentic automation to
  operational and reliability challenges.
- Track record of reducing MTTR and improving availability through
  automation and architectural improvements.
- Excellent written and verbal communication skills tailored to both
  engineers and executives.
- Systematic problem-solving approach with a sense of drive and
  ownership.
- Understanding of Linux operating systems, networking, and
  performance fundamentals.
- Ability to build trust and influence decisions through data-driven
  insights.
- Experience facilitating effective post-incident analysis and
  driving systemic remediation.
- Desire to work in a fast-paced, evolving, growing, dynamic
  environment.
USD 89,400.00 - 134,000.00 per year
Compensation:
Compensation includes a base salary in the range of $89,400.00 -
$134,000.00. The base salary may vary within the anticipated base
pay range based on factors such as the ultimate location of the
position and the selected candidate's knowledge, skills, and
abilities. Position may be eligible for additional compensation
that may include an incentive program.
Benefits:
The Company offers
eligible employees the flexibility to take as much vacation with
pay as they deem consistent with their duties, the company's needs,
and its obligations; seven paid holidays throughout the calendar
year; and up to 160 hours of paid wellness annually for their own
wellness or that of family members. Employees are also eligible for
additional paid time off in the form of bereavement leave, time off
to vote, jury duty leave, volunteer time off, military leave, and
parental leave.
Keywords: Cox Automotive, Sandy Springs, Site Reliability Engineer II, Engineering, Avondale Estates, Georgia