Research Engineer, Data Infrastructure
The role involves building and operating the next generation of data infrastructure at Mistral AI as a core contributor to the design and scaling of massive compute fleets and storage systems. Responsibilities include architecting and maintaining multi-cluster orchestration layers that optimize workload placement across diverse hardware and regions; designing storage systems that anticipate exabyte-scale growth; contributing to the internal training platform that supports model training and fine-tuning across Kubernetes and SLURM environments; implementing and managing metadata and lineage systems that provide visibility and traceability across data and model pipelines; and managing cloud-native deployments with modern workflows to ensure scalability and operational excellence. The role also carries full lifecycle ownership, from migrating away from legacy orchestrators to running production-grade pipelines and participating in on-call rotations for critical training jobs.
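As a rough illustration of the multi-cluster placement problem this role owns, the sketch below scores candidate clusters for a training job and picks one. The cluster fields, weights, and names are hypothetical and stand in for whatever a real orchestration layer actually considers (queueing, preemption, data locality, and much more).

```python
# Hypothetical sketch: score candidate clusters for a training job and pick a placement.
# Inventory fields, weights, and cluster names are illustrative, not any real system.
from dataclasses import dataclass

@dataclass
class Cluster:
    name: str
    region: str
    free_gpus: int
    storage_bandwidth_gbps: float  # aggregate read bandwidth to the training dataset

def score(cluster: Cluster, job_gpus: int, preferred_region: str) -> float:
    """Higher is better; clusters that cannot fit the job score -inf."""
    if cluster.free_gpus < job_gpus:
        return float("-inf")
    headroom = (cluster.free_gpus - job_gpus) / cluster.free_gpus
    locality = 1.0 if cluster.region == preferred_region else 0.0
    return 0.6 * locality + 0.3 * cluster.storage_bandwidth_gbps / 100 + 0.1 * headroom

def place(clusters: list[Cluster], job_gpus: int, preferred_region: str) -> Cluster:
    best = max(clusters, key=lambda c: score(c, job_gpus, preferred_region))
    if score(best, job_gpus, preferred_region) == float("-inf"):
        raise RuntimeError("no cluster can fit the job")
    return best

if __name__ == "__main__":
    fleet = [
        Cluster("eu-a", "eu-west", free_gpus=512, storage_bandwidth_gbps=80),
        Cluster("us-b", "us-east", free_gpus=2048, storage_bandwidth_gbps=120),
    ]
    print(place(fleet, job_gpus=256, preferred_region="eu-west").name)
```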
Agentic Finance Engineer
The Agentic Finance Engineer is responsible for designing, building, and maintaining a reliable financial data foundation with modern tools, covering revenue, AP/AR, procurement, close, strategic finance, and FP&A. They partner closely with the data infrastructure team to build the financial data model, defining canonical datasets, dimensional schemas, and transformation logic for Finance stakeholders, and work with Finance leads to translate business requirements into technical architecture. They build and maintain dashboards and self-serve reporting tools that give real-time visibility into key metrics. The engineer owns the Agentic Finance roadmap, prioritizes use cases, and drives features from ideation to deployment, identifying high-value automation opportunities across Finance and corporate operations and shipping solutions that eliminate manual work. They build intelligent, reliable automation using agents, AI-powered tools, and multi-step ETL jobs, along with the internal tooling Finance teams use day to day, such as lightweight apps, workflow automations, and AI-assisted processes. The engineer enforces data integrity standards and testing practices to ensure auditability and reliability, ensures AI-assisted processes meet governance and controls standards with clear auditability, and champions a culture of data quality and documentation so Finance teams trust and rely on the systems built.
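As a flavor of the transformation logic in scope, here is a minimal sketch of one canonical Finance dataset, an AR aging roll-up with a basic reconciliation check. The column names and aging buckets are assumptions, and a production model would more likely live in dbt or a similar framework than in a standalone script.

```python
# Illustrative sketch only: a tiny "canonical dataset" transformation for AR aging
# with a basic integrity check. Table and column names are hypothetical.
from datetime import date
import pandas as pd

def build_ar_aging(invoices: pd.DataFrame, as_of: date) -> pd.DataFrame:
    """Bucket open invoices by days past due; expects invoice_id, customer_id,
    due_date, and amount_open columns."""
    df = invoices.copy()
    df["days_past_due"] = (pd.Timestamp(as_of) - pd.to_datetime(df["due_date"])).dt.days
    bins = [-float("inf"), 0, 30, 60, 90, float("inf")]
    labels = ["current", "1-30", "31-60", "61-90", "90+"]
    df["aging_bucket"] = pd.cut(df["days_past_due"], bins=bins, labels=labels)
    out = (df.groupby(["customer_id", "aging_bucket"], observed=True)["amount_open"]
             .sum().reset_index())
    # Integrity check: the aggregate must reconcile with the source to the cent.
    assert abs(out["amount_open"].sum() - invoices["amount_open"].sum()) < 0.01
    return out
```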
Senior Data Intelligence Engineer
The Senior Data Intelligence Engineer is responsible for building and maintaining high-fidelity dbt and SQL models that serve as the foundation for complex, usage-based revenue models. They develop tools and permissions frameworks enabling 'Analyst Agents' to query data sources such as Athena, correlate Salesforce churn signals, and identify API latency issues. The engineer acts as the technical liaison with the Engineering/Infrastructure team to ensure data contracts are reliable and ready for autonomous agents. They partner with the Head of Data to ingest and transform thousands of hours of unstructured internal call audio, using Deepgram’s own models, into queryable insights for go-to-market teams. The role includes maintaining a culture focused on automating manual and repetitive SQL tasks through code and agent systems rather than legacy dashboards.
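A tool like the Athena access described above could, in its simplest form, look like the sketch below, which wraps query submission and polling with boto3. The database, output bucket, and example query are placeholders, and a production agent tool would add permissions checks, pagination, and cost guards.

```python
# Hedged sketch of a minimal Athena query helper such an "Analyst Agent" tool
# might wrap. Bucket, database, and query names are placeholders.
import time
import boto3

def run_athena_query(sql: str, database: str, output_s3: str) -> list[list[str]]:
    athena = boto3.client("athena")
    qid = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_s3},
    )["QueryExecutionId"]
    while True:
        state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)
    if state != "SUCCEEDED":
        raise RuntimeError(f"query {qid} ended in state {state}")
    rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
    return [[col.get("VarCharValue", "") for col in row["Data"]] for row in rows]

# Example call (placeholder names):
# run_athena_query("SELECT account_id, SUM(tokens) FROM usage GROUP BY 1",
#                  database="analytics", output_s3="s3://my-athena-results/")
```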
Member of Technical Staff (Data): World Models
Design, automate, maintain, and optimize Python ETL pipelines (Spark/Ray) for large-scale multimodal data. Build and maintain data cataloging, lineage, quality tooling, integrity verification, access controls, and lifecycle management systems. Provide guidance, internal tools, and documentation to colleagues on data best practices. Serve as a custodian of the company’s datasets, ensuring overall data health, quality, and discoverability.
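A minimal sketch of one such Spark/Ray ETL stage is shown below, here using Ray Data to filter multimodal metadata. The paths, column names, and filtering rule are illustrative only.

```python
# Minimal sketch of a Ray Data ETL stage for multimodal metadata; paths,
# column names, and the filtering rule are illustrative only.
import ray
import pandas as pd

def drop_bad_captions(batch: pd.DataFrame) -> pd.DataFrame:
    # Keep rows whose caption is non-empty and whose asset passed integrity checks.
    mask = batch["caption"].str.len().gt(0) & batch["checksum_ok"]
    return batch[mask]

if __name__ == "__main__":
    ray.init()
    ds = ray.data.read_parquet("s3://example-bucket/raw/metadata/")
    clean = ds.map_batches(drop_bad_captions, batch_format="pandas")
    clean.write_parquet("s3://example-bucket/curated/metadata/")
```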
Tech Lead Manager, Data Infrastructure
The Tech Lead Manager, Data Infrastructure at Cartesia is responsible for defining the overall multi-modal data strategy across pre-training and post-training, including human, synthetic, and web-scale data sources. They lead, manage, and mentor a team of data engineers and specialists. They design and oversee the construction of robust, scalable data pipelines for text, audio, and video, and establish and enforce rigorous standards for data quality across the organization. They develop a deep understanding of how data affects model capability, proactively identify and source novel datasets, and manage relationships and budgets with external data vendors and partners.
Engineer, Supercomputing & Distributed Systems
Build and operate infrastructure for research and inference including distributed training, 1000+ Kubernetes GPU clusters, and petabyte-scale data pipelines. Design multi-stage pipelines converting petabytes of raw data into clean, annotated datasets. Run classification models on billions of images. Deploy and combine large language models to caption massive multimedia data. Manage distributed training and inference on GPU Kubernetes clusters. Solve orchestration and scaling challenges for large-scale GPU job processing. Scale workloads and research between clusters in multiple datacenters. Profile and optimize dataloaders streaming thousands of images per second. Profile and debug InfiniBand networking on huge training runs. Build fault tolerance systems for large-scale pretraining. Collaborate with researchers on evolving reinforcement learning infrastructure. Find clean scenes in millions of videos using distributed shot-boundary detection. Customize and train models to filter billions of images. Build systems bridging raw cluster capacity and research output.
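One concrete piece of this, shot-boundary detection, can be sketched on a single machine as below, using OpenCV histogram correlation between consecutive frames. The real system would fan this out across the cluster (for example with Ray), and the threshold here is arbitrary.

```python
# Hedged, single-machine sketch of histogram-based shot-boundary detection;
# a production pipeline would distribute this across many workers.
import cv2

def shot_boundaries(video_path: str, threshold: float = 0.6) -> list[int]:
    """Return frame indices where the scene likely changes."""
    cap = cv2.VideoCapture(video_path)
    cuts, prev_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            # Correlation near 1.0 means similar frames; a dip signals a cut.
            if cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL) < threshold:
                cuts.append(idx)
        prev_hist, idx = hist, idx + 1
    cap.release()
    return cuts
```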
Senior Data Engineer
The Senior Data Engineer at HackerOne leads the end-to-end design and delivery of scalable, secure, and intelligent data products and solutions that support the company's transformation into an AI-first organization. The role involves partnering across business and engineering teams to identify opportunities for automation, integration, and system modernization, and driving the architecture and execution of platform-level capabilities that leverage AI and modern tooling to reduce manual effort, improve decision-making, and increase system resilience. The engineer provides technical leadership to internal engineers and external development partners to ensure design quality, operational excellence, and long-term maintainability; shapes and contributes to incident and on-call response strategies, playbooks, and processes so that systems fail gracefully and recover quickly; mentors other engineers; and advocates for technical excellence and a culture of innovation and continuous improvement. The role also includes championing effective change management so that systems are successfully launched, adopted, understood, and evolved.
Data Engineer - Foundational
As a Data Engineer on the Foundational team, you will build ETL/ELT pipelines to extract, decode, and store raw Electro-Optical (EO) and Infrared (IR) video from field logs into optimized formats like WebDataset, TFRecords, or Parquet. You will develop algorithms to synchronize EO and IR frames temporally and spatially to provide paired inputs for model training. You will architect storage-to-GPU pipelines to ensure multi-node training clusters maintain over 90% GPU utilization without I/O bottlenecks. Your role includes writing and optimizing distributed data processing jobs using tools such as Apache Spark, Ray, or Apache Beam to process thousands of hours of tactical video logs. You will implement automated quality checks to filter corrupted or blank frames and maintain reproducible training runs through robust versioning and lineage tracking. Additionally, you will assess and implement advanced storage solutions like MinIO and S3 tiering to manage growing datasets while optimizing cost and latency.
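The temporal synchronization step can be illustrated with a small sketch that pairs EO and IR frames by nearest timestamp. The tolerance and array layout are assumptions, and a real pipeline would also handle spatial registration and clock drift.

```python
# Illustrative sketch of temporal pairing between EO and IR frame streams by
# nearest timestamp; field names and the tolerance are assumptions.
import numpy as np

def pair_frames(eo_ts: np.ndarray, ir_ts: np.ndarray, tol_s: float = 0.02) -> list[tuple[int, int]]:
    """Return (eo_index, ir_index) pairs whose timestamps differ by <= tol_s seconds.
    Both timestamp arrays must be sorted ascending."""
    pairs = []
    idx = np.searchsorted(ir_ts, eo_ts)  # insertion points of EO times into IR times
    for i, j in enumerate(idx):
        # Compare against the IR frame just before and just after the insertion point.
        candidates = [k for k in (j - 1, j) if 0 <= k < len(ir_ts)]
        best = min(candidates, key=lambda k: abs(ir_ts[k] - eo_ts[i]))
        if abs(ir_ts[best] - eo_ts[i]) <= tol_s:
            pairs.append((i, best))
    return pairs

# e.g. pair_frames(np.array([0.00, 0.033, 0.066]), np.array([0.01, 0.04, 0.09]))
# -> [(0, 0), (1, 1)]  (the third EO frame has no IR frame within tolerance)
```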
Senior AI Platform Engineer (Autonomous Driving)
Set technical strategy and oversee development of a high scale, reliable data platform to manage, visualize, and serve large-scale datasets for ML model training and validation. Build the data lakehouse for autonomous driving scene datasets, including sensor data, calibration data, and annotation data. Drive the Autonomous Driving Data SDK development, including scene data search, datasets preparation, and dataset loading. Identify and resolve performance bottlenecks in the data processing pipelines, including data processing latency, data search latency, and Test Procedure (TP) coverage. Bootstrap and maintain infrastructure for data platform components such as Data Processing Pipeline, Database, Data Lakehouse, and Data Serving. Collaborate with cross-functional teams, including ML algorithm, ML application, and Cloud Infrastructure teams, to align ML platforms with the overall Autonomous Driving System Architecture.
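For illustration only, a minimal surface for the scene-search portion of such an SDK might resemble the sketch below; the class names, fields, and query semantics are assumptions rather than the actual Data SDK.

```python
# Hypothetical sketch of a scene-search / dataset-loading SDK surface;
# names, fields, and query semantics are all assumptions.
from dataclasses import dataclass, field
from typing import Iterator

@dataclass
class SceneQuery:
    tags: list[str] = field(default_factory=list)  # e.g. ["night", "unprotected_left_turn"]
    min_duration_s: float = 0.0

@dataclass
class Scene:
    scene_id: str
    duration_s: float
    sensor_uris: dict[str, str]  # sensor name -> storage URI
    calibration_uri: str
    annotation_uri: str

class SceneCatalog:
    """Thin client over the lakehouse's scene index (backend not shown here)."""

    def __init__(self, scenes: list[Scene], tag_index: dict[str, set[str]]):
        self._scenes = {s.scene_id: s for s in scenes}
        self._tags = tag_index  # tag -> set of scene_ids

    def search(self, query: SceneQuery) -> Iterator[Scene]:
        hits = set(self._scenes) if not query.tags else set.intersection(
            *(self._tags.get(t, set()) for t in query.tags))
        for sid in sorted(hits):
            scene = self._scenes[sid]
            if scene.duration_s >= query.min_duration_s:
                yield scene
```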
Senior AI Data Pipeline Engineer
Design and build high-performance, scalable data pipelines to support diverse AI and Machine Learning initiatives across the organization. Architect and implement multi-region data infrastructure to ensure global data availability and seamless synchronization. Develop flexible pipeline architectures that allow for complex branching and logic isolation to support multiple concurrent AI projects. Optimize large-scale data processing workloads using Databricks and Spark to maximize throughput and minimize processing costs. Maintain and evolve the containerized data environment on Kubernetes, ensuring robust and reliable execution of data workloads. Collaborate with AI researchers and platform teams to streamline the flow of high-quality data into training and evaluation pipelines.
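A throughput-oriented job of the kind described above might, in its simplest form, look like the PySpark sketch below; the paths, column names, and partitioning scheme are placeholders.

```python
# Hedged PySpark sketch of a throughput-oriented curation job;
# paths, column names, and the partitioning scheme are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("example-curation").getOrCreate()

raw = spark.read.parquet("s3://example-bucket/raw/events/")

curated = (
    raw.filter(F.col("quality_score") >= 0.8)   # drop low-quality records early
       .withColumn("event_date", F.to_date("event_ts"))
       .repartition("event_date")               # align the shuffle with the write layout
)

(curated.write
        .mode("overwrite")
        .partitionBy("event_date")              # allows date pruning at read time
        .parquet("s3://example-bucket/curated/events/"))
```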
