Software Engineer, Platform
As a Production AI Ops Lead, you will design and develop the production lifecycle of full-stack AI applications and support end-to-end system reliability, real-time inference observability, sovereign data orchestration, high-security software integration, and resilient cloud infrastructure for international government partners. You will own the production outcome, taking full accountability for the long-term performance and reliability of AI use cases deployed across international government agencies. You will maintain full-stack integrity by overseeing the health of the platform and ensuring seamless integration between the AI core and all full-stack components, from APIs to UI. You will also build automated systems to monitor model performance and data drift across geographically dispersed environments, manage the technical lifecycle within diverse regulatory frameworks, and lead the response to production issues in mission-critical environments. Finally, you will translate deep technical performance metrics into clear insights for senior international government officials and partner with Engineering and ML teams so that field lessons influence future technical architecture and decisions.
Software Engineer, Backend
As a backend engineer, you would play a critical role in the search architecture at Exa. Your work may involve building massive-scale machine learning systems, working on projects based on your skills and interests, such as recreating Google-level keyword search over 10 billion pages in one month, building state-of-the-art crawling systems that work optimally for any website, and building custom vector databases that can run over a billion vectors in under 100 milliseconds.
Relocate to SF: Software Engineer (AI Agents)
In this role, you will build the next set of AI features at Pylon, iterate rapidly based on customer feedback, and improve the quality and performance of existing AI features.
Relocate to SF: Software Engineer (AI Infra)
Build the platforms that power Pylon's AI features, such as prompt execution and search infrastructure. Improve LLM observability, including online and offline AI evaluations and scorers, and prepare Pylon's AI for future scaling. Enhance the quality and performance of AI features.
Software engineer, generative AI (UK)
Design and develop robust, secure, and scalable generative AI services and applications using Python and modern frameworks to drive enterprise-wide transformation; build and optimize high-performance, low-latency APIs and microservices to integrate advanced AI models and sophisticated agentic workflows into the core platform; make meaningful system design decisions and own the architecture of core platform components from initial proposal through production deployment; implement and maintain responsive user interfaces using technologies like React and TypeScript; clearly communicate changes, plans, and proposals to cross-functional teams and collaborate with product managers, data scientists, and DevOps engineers; partner with DevOps teams to build continuous deployment, logging, and monitoring systems to ensure top-tier performance, security, and reliability across distributed workloads.
Software engineer, generative AI
Design and develop robust, secure, and scalable generative AI services and applications using Python and modern frameworks to drive enterprise-wide transformation. Build and optimize high-performance, low-latency APIs and microservices to integrate advanced AI models and sophisticated agentic workflows into the core platform. Make meaningful system design decisions and own the architecture of core platform components from initial proposal through production deployment. Implement and maintain responsive user interfaces using technologies like React and TypeScript to deliver intuitive user experiences and bridge the gap between backend services and frontend enablement. Communicate changes, plans, and proposals clearly to cross-functional teams and collaborate closely with product managers, data scientists, and DevOps engineers. Partner with DevOps teams to build continuous deployment, logging, and monitoring systems that ensure top-tier performance, security, and reliability across distributed workloads.
Host Systems Software Engineer
The Host Systems Software Engineer is responsible for designing, implementing, and debugging host-side systems software for AI infrastructure, including Linux kernel drivers and supporting userspace components. They build and optimize software paths for high-throughput, low-latency communication such as RDMA and related networking functionality, and develop software related to PCIe, DMA, NICs, accelerators, memory movement, and device interaction. The role involves bringing up new hardware platforms, diagnosing complex issues across kernel, firmware, networking, and hardware boundaries, and building tooling for integration, testing, diagnostics, observability, qualification, and performance characterization. Collaboration with hardware, networking, and platform teams to define interfaces and integrate new capabilities is essential, as is working with external vendors to integrate technologies and resolve issues. The engineer contributes across the systems software stack as the platform and team evolve and helps shape the technical direction and engineering practices for the growing systems software stack.
Software Engineer, ML Data Infrastructure
The Software Engineer, ML Data Infrastructure will collaborate with engineers to build advanced AI design experiences and tackle complex technical challenges, including scaling distributed systems and enabling generative media experiences. They will build robust data infrastructure at petabyte scale, ensuring reliability and performance across multi-modal training pipelines, and optimize data processing workflows for high throughput across distributed systems, TPU infrastructure, and large-scale storage. They will also partner with research scientists to understand data requirements and translate them into production-grade systems that accelerate model development cycles.
Full Stack Product Engineer
As a Full-Stack Product Engineer at Ideogram, you will build products that bring generative AI directly to creators, working across the entire technology stack from designing user experiences to optimizing backend systems that serve millions. Your focus will be on shipping features that users love by combining product intuition, strong ownership, and user empathy. You will design APIs and data models to support evolving product needs, utilize AI-native engineering tools to speed up development, debugging, and understanding of the codebase, and work effectively across frontend and backend systems. You will also be responsible for explaining technical concepts to both technical and non-technical stakeholders, participating in constructive code reviews, collaborating with the team, and taking full responsibility for the outcomes of your work, not just the code.
Senior Engineering Manager, Management Plane Systems
Lead the team responsible for the automation, observability, configuration management, and policy enforcement layer that runs across the entire network fleet. Own the architecture, development, and production operation of the SDN Management Plane, including the automation and observability platform for managing the network fleet across all regions. Build and operate CI/CD pipelines for network configuration, including automated testing, policy validation, and push-on-green delivery of network changes. Design and implement software systems that enforce reconciliation between declared and actual network state, detect configuration drift, and trigger automated remediation workflows. Define provisioning and onboarding automation for new nodes, regions, and customer environments. Drive the design of network observability systems such as streaming telemetry, synthetic probing, anomaly detection, and real-time traffic monitoring across GPU clusters. Design and implement self-healing network capabilities using closed-loop automation to detect, diagnose, and resolve network faults without human intervention. Set the technical vision for applying GenAI and machine learning to network operations. Partner with Control Plane and Data Plane teams to ensure software interfaces between layers and collaborate with infrastructure and compute teams to support GPU cluster networking requirements. Act as internal platform owner for network automation and treat engineering teams as customers with real product requirements. Lead, mentor, and grow a team of senior and staff-level software and network automation engineers, set technical standards, review architecture and design decisions, and own team performance and development. Foster a high-ownership engineering culture focused on shipping production software.
