“Software engineering is changing, and by the end of 2025 it’s going to look fundamentally different.” Greg Brockman’s opening line at OpenAI’s launch event set the tone for what followed. OpenAI released Codex, a cloud‑native software agent designed to work alongside developers.
Codex is not a single product but a family of agents powered by codex‑1, OpenAI’s latest coding model. Codex CLI arrived a few weeks ago as a lightweight companion that runs inside your terminal. Today the spotlight shifts to its bigger sibling: a remote agent that lives entirely inside ChatGPT. You can spin up several “mini‑computers” and tackle multiple tasks while you’re off grabbing coffee. This article is an overview of Codex on ChatGPT; project‑based articles on the topic will follow soon.
From Autocomplete to Vibe Coding
OpenAI started working toward AI-assisted coding back in 2021, when the original Codex model launched and powered tools like GitHub Copilot. Back then, it functioned more like an autocomplete support for developers.
Since then, a lot has changed. Thanks to major advances in reinforcement learning, Codex has grown far more capable.
Now, in a world where vibe coding is becoming the new normal, you can just describe what you want in natural language and Codex figures out how to build it. The newest version, codex‑1, is built on OpenAI’s o3 architecture and fine-tuned on real-world pull requests. It doesn’t just generate code; it follows best practices like linting, writing tests, and keeping a consistent style, making it genuinely useful for real development work.
Also Read: A Guide to Master the Art of Vibe Coding
Availability and Limits
Codex is currently available to ChatGPT Pro, Enterprise, and Team users. Plus and EDU users are expected to gain access soon. During the research preview, usage is subject to generous limits, but these may evolve based on demand. Future plans include an API for Codex, integration into CI pipelines, and unification between the CLI and ChatGPT versions to allow seamless handoffs between local and cloud development.
How to Access Codex in the ChatGPT Interface?
Time needed: 5 minutes
Follow these simple steps to start using Codex:
- Find Codex in ChatGPT
Open ChatGPT and look at the left navigation rail; you’ll see a new “Codex (beta)” icon. Click it to reveal the agent dashboard.
- Multi-factor authentication
Click “Set up MFA to continue,” scan the QR code with your preferred authentication app (like Google Authenticator or Authy), then enter the code to verify. That’s it, you’re all set.
- Connect GitHub (first‑time only)
A single OAuth click authorises Codex to read/write on your repos. You can restrict it to specific organisations or personal projects.
- Select a repository & branch
Pick the project you’d like Codex to work on. The agent clones this branch into its own sandbox.
- Configure the environment (optional)
Add environment variables, secrets, or setup commands just like you would in a CI job. Linters and formatters come preinstalled, but you can override the versions if needed.
- Choose a task template
Ask: “Explain the architecture.”
Code: “Find and fix the flaky test in test_api.py.” (See the illustrative snippet after this list.)
Suggest: Let Codex scan the repo and propose maintenance chores.
Or just type a custom instruction in natural language.
- Run & multitask
Press “Launch”. Each job spins up its own micro‑VM; you can queue dozens in parallel and continue chatting elsewhere in ChatGPT.
- Review results
Green check‑marks indicate passing tests. Click a task card to see the diff, the model’s explanation, and the full work‑log.
- Merge or iterate
Hit “Open PR” to push the branch back to GitHub or reply to the task with follow‑up instructions if changes are needed.
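To make the “Code” task template concrete, here is a hypothetical example of the kind of flaky test you might hand off; `test_api.py`, the function, and the timing bound are all illustrative, not taken from OpenAI’s demo.

```python
# test_api.py -- a hypothetical flaky test you might ask Codex to fix.
# The flakiness comes from asserting on wall-clock time, which varies per run.
import time


def fetch_status():
    """Stand-in for a real API call; takes a variable amount of time."""
    time.sleep(0.05)
    return {"status": "ok"}


def test_fetch_status_is_fast():
    start = time.monotonic()
    assert fetch_status()["status"] == "ok"
    # Flaky: on a slow CI machine this bound is sometimes exceeded.
    assert time.monotonic() - start < 0.06


def test_fetch_status_fixed():
    # The kind of fix Codex might propose: assert on behaviour, not timing.
    assert fetch_status() == {"status": "ok"}
```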
OpenAI Codex Demo
In this section, I am sharing several examples that demonstrate how this new software development agent can simplify your day-to-day work.
Example 1: Accelerate Development
OpenAI engineer Nacho Soto shows how Codex helps him start new tasks faster by setting up project scaffolding, like Swift packages. With simple prompts, he was able to offload the setup and focus on building features while Codex handles the rest in the background.
Example 2: Review Workflows
Codex supports more than just code generation. It also fits into review workflows, where developers check AI-generated pull requests, spot issues like formatting problems, and prompt Codex to make fixes.
Example 3: Fixing Papercuts with Codex
Engineer Max Johnson explains how Codex helps fix small bugs and code quality issues without breaking his focus. Rather than switching contexts, he hands off these tasks to Codex and reviews the results later to improve the codebase.
Example 4: Finding Errors in the Codebase While On-Call
Calvin shares how Codex helps with urgent tasks during on-call shifts. By sending stack traces to Codex, he can quickly get diagnostics or fixes. It also helps fine-tune alerts and handle routine ops work, cutting down on manual effort.
OpenAI Codex vs o3
Prompt: “Please fix the following issue in the matplotlib/matplotlib repository. Please resolve the issue in the problem below by editing and testing code files in your current code execution session. The repository is cloned in the /testbed folder. You must fully solve the problem for your answer to be considered correct.”
Problem statement: [Bug]: Windows correction is not correct in `mlab._spectral_helper`
### Bug summary
Windows correction is not correct in `mlab._spectral_helper`:
https://github.com/matplotlib/matplotlib/blob/3418bada1c1f44da1f73916c5603e3ae79fe58c1/lib/matplotlib/mlab.py#L423-L430
The `np.abs` is not needed, and give wrong result for window with negative value, such as `flattop`.
For reference, the implementation of scipy can be found here:
https://github.com/scipy/scipy/blob/d9f75db82fdffef06187c9d8d2f0f5b36c7a791b/scipy/signal/_spectral_py.py#L1854-L1859
### Code for reproduction
```python
import numpy as np
from scipy import signal
window = signal.windows.flattop(512)
print(np.abs(window).sum()**2 - window.sum()**2)
```
### Actual outcome
4372.942556173262
### Expected outcome
0
### Additional information
_No response_
### Operating system
_No response_
### Matplotlib Version
latest
### Matplotlib Backend
_No response_
### Python version
_No response_
### Jupyter version
_No response_
### Installation
None
Output:
Observation:
The fix generated by Codex is more accurate and complete than the one from o3. It correctly removes the unnecessary use of np.abs() in window normalization within mlab._spectral_helper, which had caused incorrect results for windows with negative values like flattop. Codex replaces the faulty logic with a proper mathematical expression, using `(window**2).sum()` instead of `(np.abs(window)**2).sum()`, which matches best practices seen in SciPy’s implementation. It also includes a unit test to confirm the behavior, making the fix both verifiable and reliable. In comparison, the o3 output seems incomplete and doesn’t clearly resolve the core issue, making Codex the stronger option.
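For context, here is a minimal sketch of the normalization step in question and the kind of change described, assuming a real-valued window; it is not the exact diff Codex produced.

```python
import numpy as np


def normalize_psd(result, window, Fs, scale_by_freq):
    # Simplified sketch of the window-correction step in mlab._spectral_helper.
    # The buggy version used np.abs(window), which gives the wrong answer for
    # windows with negative samples such as scipy.signal.windows.flattop.
    if scale_by_freq:
        result /= Fs
        # Scale by the window's norm to compensate for windowing loss.
        result /= (window ** 2).sum()      # was: (np.abs(window) ** 2).sum()
    else:
        # Preserve power in the segment, not amplitude.
        result /= window.sum() ** 2        # was: np.abs(window).sum() ** 2
    return result
```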
Working of OpenAI’s Codex
- Codex writes code: The model starts by generating code to solve a given task.
- It runs the code: The output is not just evaluated for plausibility, but actually executed.
- It checks test results: Codex observes whether the generated code passes the relevant tests.
- It gets rewarded only if the task is completed successfully: Unlike traditional LLMs that focus on next-word prediction, Codex only gets a high score if the code works end-to-end.
- It learns through feedback: If the code fails, Codex retries: creating repro scripts, fixing lint errors, and adjusting formatting until it meets standards.
- It evolves like a junior developer: This training method teaches Codex to behave less like a text generator and more like a thoughtful engineer following real-world coding practices.
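The loop below is a conceptual sketch of that execute-and-verify cycle; the helper functions are illustrative stand-ins, not OpenAI’s training code or API.

```python
from dataclasses import dataclass


@dataclass
class TestResult:
    passed: bool
    log: str = ""


def generate_patch(task: str, feedback: str) -> str:
    """Stand-in for the model proposing a code change."""
    return f"candidate fix for {task!r} (informed by {feedback!r})"


def run_tests(patch: str) -> TestResult:
    """Stand-in for actually executing the patched code against its tests."""
    return TestResult(passed="fix" in patch, log="2 tests failed")


def solve_task(task: str, max_attempts: int = 5):
    feedback = ""
    for _ in range(max_attempts):
        patch = generate_patch(task, feedback)   # steps 1-2: write and run code
        result = run_tests(patch)                # step 3: check test results
        if result.passed:
            return patch                         # step 4: success is the only "reward"
        feedback = result.log                    # step 5: failures feed the next attempt
    return None
```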

Codex‑1 outperforms previous models both on standardized benchmarks and in internal OpenAI workflows. According to OpenAI’s published results, it achieves higher accuracy on the SWE-Bench Verified benchmark across all attempt counts and leads on OpenAI’s internal software engineering tasks. This highlights codex‑1’s real-world reliability, especially for developers integrating it into daily workflows.

A Peek Inside the Cloud Workshop
Every time you press Run in the Codex sidebar, the system creates a micro‑VM sandbox: its own file‑system, CPU, RAM, and locked‑down network policy. Your repository is cloned, environment variables injected, and common developer tools (linters, formatters, test runners) pre‑installed. That isolation delivers two immediate benefits:
- Safety & Reproducibility – Rogue scripts can’t touch your laptop or leak secrets; the whole run can be replayed later.
- Parallelism at Scale – Need to fix typos, harmonise time‑outs, and hunt a mysterious bug? Launch three tasks and review the results side‑by‑side.
An optional AGENTS.md file acts like a README for robots: you describe the project layout, how to run tests, preferred commit style, even a request to print ASCII cats between steps. The richer the instructions, the smoother Codex behaves.
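The exact contents are up to you; a minimal illustrative AGENTS.md might look like the sketch below, where the layout, commands, and conventions are placeholders for your own project.

```markdown
# AGENTS.md — guidance for Codex (illustrative example)

## Project layout
- src/      application code
- tests/    pytest suite

## How to run checks
- pytest -q
- ruff check src tests

## Conventions
- Add type hints to any new function.
- Commit messages follow Conventional Commits, e.g. "fix: handle empty window".
```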
Conclusion
“I just landed a multi‑file refactor that never touched my laptop.”
– OpenAI Engineer
Stories like that hint at a future where coding resembles high‑level orchestration: you provide intent, the agent grinds through the details. Codex represents a shift in how developers interact with code, moving from writing everything manually to orchestrating high-level tasks. Engineers now focus more on intent and validation, while Codex handles execution. For many, this signals the beginning of a new development workflow, where human and agent collaboration becomes the standard rather than the exception.
How are you planning to use Codex? Let me know in the comment section below!