Researcher: Agent Post-Training, API & Power-Users
The role involves improving the capabilities, reliability, and product fit of OpenAI’s agentic models for power users and API developers. Responsibilities include designing and running experiments to enhance model behavior in API and power-user workflows such as function calling, tool use, coding, planning, and long-horizon execution. The role requires building evals, graders, and environments from real developer and power-user workflows, and turning observed failures into training data, hypotheses, and improvements. The researcher partners with API developers and power users to identify behavior gaps and translate product signals into post-training interventions. They improve model behavior when models are composed into systems, ensuring reliable tool use, respect for developer intent, appropriate error handling, clarification when needed, and task coherence. The role also includes owning end-to-end model behavior projects from failure analysis through training, eval design, integration into major model runs, and launch readiness. Developing feedback loops that use power-user traces and production-like environments to identify model failures and gaps is part of the job. The researcher helps decide which capabilities, fixes, and integrations are ready for major model runs, and debugs hard model failures by analyzing traces, evals, training data, and product context. The role further involves working on early-training and alignment interventions, improving large-scale training and launch machinery, and taking on cross-functional projects that touch model training, product infrastructure, and production agent harnesses, including multi-agent systems and training against production-like environments.
Researcher, Training - London
Design, prototype and scale up new architectures to improve model intelligence; execute and analyze experiments autonomously and collaboratively; study, debug, and optimize both model performance and computational performance; contribute to training and inference infrastructure.
Research Engineer / Research Scientist (Pre-training)
In this role, you will push the frontier of visual generative models. You will work on large-scale pre-training for text-to-image foundation models, shaping objectives, algorithms, data, and systems, and turn novel ideas into models that power products used by millions of users. You will work with a creative and ambitious team of researchers and engineers building the future of the creative economy.
RE/RS, Data Understanding - Foundations
The Data Understanding team is responsible for creating high quality datasets and their quantized representations for OpenAI, which includes synthesizing data, building VQ representations, processing, filtering, deduplication, quality control, and tokenization to enable effective use in large model training runs. The role involves advancing how OpenAI builds and understands pretraining data at scale by treating data quality and curation as core research problems. Responsibilities include developing new methods to select, combine, and transform data, creating datasets that improve model capabilities, designing rigorous experiments to understand how data choices and interventions affect model learning and downstream behavior, and working closely with frontier models and web-scale data to build evidence for effective approaches and translate successful research into scalable data processing pipelines.
RE/RS, Data Understanding (MM)
The Data Understanding team is responsible for creating high-quality datasets and their quantized representations for OpenAI, which includes synthesizing multimodal data, building VQ representations, processing, filtering, deduplication, quality control, and tokenization for effective use in large model training runs. The role involves advancing how OpenAI prepares, curates, synthesizes, and understands multimodal data at scale. Responsibilities include working on research and production problems such as synthesizing multimodal content (images, audio, and video) together with its supervision signals, improving noisy data pipelines, building better quality filters, using models to automate data preparation, and measuring whether changes in the dataset improve model performance. The position also requires owning and driving a research agenda, choosing the right multimodal data problems, and carrying long-running work through to impact, while engaging in an empirical, collaborative approach to research.
Research Engineer - Evals
Build the eval harness for AGI covering model capability, agentic behavior, on-device performance, and end-user experience. Own eval suites gating every model and agent release, including capability, behavior, regressions, and human-rated rubrics. Maintain dashboards and tooling to facilitate fast researcher experiment loops and informed leadership decisions. Set and uphold the criteria for what counts as ready to ship. Assist research by ensuring measurements align with goals. Aid product engineers by instrumenting real-user behavior on devices. Support partnerships by translating performance improvements into measurable terms for OEM partners.
Senior Scientist, Analytical Chemistry
The Senior Scientist is responsible for owning the end-to-end analytical strategy for GC-MS-based programs, including method design, validation frameworks, and data quality standards for targeted and untargeted analyses. They define and evolve sample preparation methodologies for headspace, liquid-phase, and solid-phase extraction of fragrance compounds from complex matrices and consumer products. They maintain and improve Osmo's high-throughput analytical pipeline, ensuring data integrity, reproducibility, and compatibility with downstream machine learning workflows. The role involves partnering with the Platform and ML teams as the chemistry-side technical owner of the data interface, determining methods and procedures for new analytical assignments independently while coordinating execution across team members and collaborating functions. They enforce high standards of scientific rigor and data quality, mentor and develop junior and mid-level scientists, establish best practices, review work for scientific integrity, and elevate the team’s overall analytical capability. Additional responsibilities include writing, editing, and auditing analytical and experimental protocols, serving as an internal expert resource and external-facing collaborator for analytical chemistry questions across Osmo’s scientific and commercial programs.
Researcher, Context - Agent Post-Training
The Context Researcher on the Agent Post-Training team will design and run experiments to improve the scaling of compute on context. The researcher will own end-to-end improvements to the post-training stack, including reinforcement learning, data pipelines, graders, reward signals, evaluations, diagnostics, and model-behavior analysis. Responsibilities include building evaluations and environments to identify model failures and turning those failures into training data, product fixes, or new research directions. The researcher will partner with Codex and ChatGPT product teams to translate product signals into model improvements and work on early-training and alignment interventions such as data mixtures, objectives, synthetic data, and evaluation loops to shape downstream agent behavior. The role involves deciding which integrations, capabilities, and fixes are ready for major model runs, and improving machinery for large-scale training and launch, including experiment velocity, reliability, observability, reproducibility, cost, latency, and production readiness. The researcher will take on cross-functional projects involving model training, product infrastructure, and the production agent harness, and will debug failures in shipped or near-shipped models by turning qualitative behaviors into concrete hypotheses, experiments, and fixes.
Researcher, Connectors - Agent Post-Training
As a member of Agent Post-Training, Connectors, you will teach models how to interface with professional software using code, helping train agents to use code, APIs, tools, and structured integrations to operate across applications like Slack, Google Workspace, GitHub, Notion, Linear, Salesforce, and other core systems. You will design and run experiments to improve agentic model behavior for complex software and plugins, own end-to-end improvements to the post-training stack including RL, data pipelines, graders, reward signals, evaluations, diagnostics, and model behavior analysis, and build evaluations and environments that expose model failures to turn those failures into training data, product fixes, or new research directions. You will partner with product teams to understand user needs and translate product signals into model improvements, work on early-training and alignment interventions such as data mixtures, objectives, synthetic data, and evaluation loops, and decide which integrations and capabilities to include in major model runs. Additionally, you will improve large-scale training and launch infrastructure for experiment velocity, reliability, observability, reproducibility, cost, latency, and production readiness, take on cross-functional projects touching model training, product infrastructure, and the production agent harness, and debug failures in shipped or near-shipped models to develop concrete hypotheses, experiments, and fixes.
Researcher, Computer Use - Agent Post-Training
As a member of Agent Post-Training, Computer Use, you will teach models to operate computers, helping to train models that can navigate browsers and desktops, use tools and applications, reason through complex workflows, collaborate with users and other agents, and complete long-horizon tasks with reliability and judgment. Responsibilities include designing and running experiments to improve agentic model behavior for complex computer use, owning end-to-end improvements to the post-training stack such as reinforcement learning, data pipelines, graders, reward signals, evaluations, diagnostics, and model-behavior analysis. You will build evaluations and environments to identify model failures and convert those into training data, product fixes, or research directions. The role involves partnering with product teams to understand user needs and translate product signals into model improvements, working on early-training and alignment interventions, deciding on suitable integrations and fixes for major model runs, and improving large-scale training and launch machinery regarding experiment velocity, reliability, observability, reproducibility, cost, latency, and production readiness. You will also handle cross-functional projects involving model training, product infrastructure, and production agent harness, debug failures in shipped or near-shipped models, and transform qualitative model behavior into concrete hypotheses, experiments, and fixes.
