Using AI Agents for Reproducible Research

Dominik Lukeš (Dominik.Lukes@oerc.ox.ac.uk)

dominiks-handouts.pages.dev/oa8c

Introduction

Introduction

Workshop plan

  • 9:30-10:00 - Getting ready
  • 10:00-11:00 - How agents work
  • 11:00-11:15 - Your first agent task
  • 11:15-11:30 - Break
  • 11:30-12:30 - Learning to instruct your agent with AGENTS.md and SKILLS.md
  • 12:30-13:30 - Lunch
  • 13:30-14:15 - Make your own skill
  • 14:15-15:00 - Using CLI tools (Git, GitHub)
  • 15:00-15:15 - Break
  • 15:15-16:00 - Reproduce a paper
  • 16:00-16:30 - Dissemination and implications

Introduction

Getting to know each other

Introduction

About me

Dominik Lukes Dominik.Lukes@oerc.ox.ac.uk AI Competency Centre

ainewsroundup.pages.dev

Introduction

About you

  • Research
    • Discipline
    • Methods
  • Computer use
    • Mac/Linux/Windows
    • CLI experience
    • Coding tools experience
  • AI Use
    • ChatGPT or other chatbots
    • Codex or other agents
    • Other tools

Introduction

About the workshop

  • First time
  • Responding to urgent need
  • Constantly changing field
  • Unsettled methods

Getting ready for the rest of the day

Getting ready for the rest of the day

Making sure you have Codex

bit.ly/ox-codex-access

Getting ready for the rest of the day

Making sure you have workshop files

github.com/techczech/agents-for-reproduci…

Getting ready for the rest of the day

Questions?

What and why of agents?

What and why of agents?

What was the last time you used ChatGPT?

What and why of agents?

I haven't ChatGPTChatGPT regularly in months and not at all last week.

Screenshot illustrating daily Codex use

What and why of agents?

I have been using Codex every day

codex profile card 1

What and why of agents?

What this means

  • I am still using AI
  • I am using the same model power in a different context
  • ChatGPT is no longer my main interface to the large language model

What and why of agents?

Codex

The Codex desktop app

What and why of agents?

Linux only? VS Code plugin

Codex VS Code plugin

Chatbot vs agent

Agent

Model using tools in a loop to achieve complex tasks that require multiple steps without user intervention.

Chatbot

Chatbot answering questions based on prompts and documents (sometimes called agent e.g. Customer service agent)

What and why of agents?

How it works: chatbot vs agent

Chatbot

  • Prompt
  • Documents
  • Web-search
  • Model responds

Agent

  • User types prompt
  • Model makes plan
  • Model calls tools (search, code, command)
  • Computer runs tools and sends results to model
  • Model changes plan
  • Model responds

What and why of agents?

ChatGPT v Codex

ChatGPT

  • Web-based app
  • Uses basic tools
  • Works on uploaded files

Codex

  • Desktop app
  • Uses tools in local sandbox
  • Works on local files

What and why of agents?

ChatGPT and Codex

ChatGPT

  • Powered by GPT-5
  • Login with SSO
  • Usage limits
  • Available to all

Codex

  • Powered by GPT-5
  • Login with SSO
  • Usage limits
  • Available after gate

What and why of agents?

Why that matters

  • The work is more transparent
  • Outputs are easier to retrieve later
  • Files stay in the normal workflow
  • The model can act through tools on the computer

What and why of agents?

A step up in utility

AI as oracle
AI as tool maker
AI as tool user

What Codex Is

What Codex Is

What exactly is Codex?

  • General computer productivity tool
  • Desktop app
  • Works on the local machine
  • Acts through tools on that machine
  • Powered by the GPT-5.4 family

What Codex Is

What tools does it use?

  • Run commands
  • Open and inspect files
  • Create files and folders
  • Run computer code

What Codex Is

Folders, projects, and sandboxes

  • Codex works in folders
  • A project is really a folder
  • A sandbox limits what it can touch
  • Permissions matter

What Codex Is

ChatGPT and Codex comparison

ChatGPTCodex
Main formCloud-based appDesktop app
How you control itBy chattingBy chatting
Model powerGPT-5 familyGPT-5 family
Where chat history livesIn the cloud appIn the Codex app, but important project state should still be saved into files
Where working files liveCloud or uploaded into the cloud appLocal machine
What tools it can useLimited by ChatGPT Code InterpreterAnything you can do on your computer with the keyboard
CostsFlat fee with some usage limitsFully usage based pricing

What Codex Is

Alternatives to Codex

  • Claude Code
  • Cursor
  • Google Antigravity

Benefits of working locally

Benefits of working locally

Where the work lives

  • The work is saved to a folder on your machine
  • Outputs become part of your normal file structure
  • The chat is not the main archive

Benefits of working locally

Benefits of local work

  • Retrieval
  • Portability across tools
  • Transparency

Benefits of working locally

What gets sent to the model

TaskWhat may be enough
OrganiseFile name, size, dates, header, excerpt
SummarisePart of the file or the whole file
InterpretPart of the file or the whole file

Benefits of working locally

Trade-offs of local work

AspectAdvantageTrade-off
File handlingFiles stay in your normal folder structureYou need to manage that structure properly
RetrievalOutputs are easier to find laterYou need to save and name things well
ScaleEasier to work across many local filesYou need to know what folder you are giving it access to
PrivacySome tasks can stay mostly localSome tasks still require remote model processing
PersistenceThe work is tied to your project folderIt does not automatically follow you to every machine

Benefits of working locally

Some other nice features

Five Things You Can Do with Codex

Five Things You Can Do with Codex

Five things you can do with Codex

  • Catalogue and organise data
  • Manage projects and create project outputs
  • Build and maintain a note system
  • Create small tools, websites, and dissemination outputs
  • Set up and control your computer

Five Things You Can Do with Codex

Catalogue and organise your data

Five Things You Can Do with Codex

Example: organising a downloads folder

  • Ask what is in the folder
  • Check names, dates, sizes, and duplicates
  • Ask for suggested organising schemes
  • Rename or move files after review
Organising a downloads folder with Codex

Five Things You Can Do with Codex

Manage projects and create project outputs

Five Things You Can Do with Codex

Research project outputs

  • Trackers
  • Timelines
  • Literature-review tracker
  • Dashboards
  • Fieldwork or admin timeline

Five Things You Can Do with Codex

Research project example

  • Read milestones and task lists from a project folder
  • Build a literature-review tracker
  • Turn notes into a dashboard or timeline
Research project tracker example

Five Things You Can Do with Codex

Build and maintain a note system

Five Things You Can Do with Codex

Notes example

  • Design a note structure in Markdown
  • Add new notes into the same system
  • Link notes to each other
  • Summarise material into reusable research notes
Markdown note system example

Five Things You Can Do with Codex

Create small tools, websites, and dissemination outputs

Five Things You Can Do with Codex

From script to website

  • Start from a prompt and a folder of files
  • Generate a script, tool, or web page
  • Keep the files locally so you can keep editing them
  • Publish later if you want to
From a script to a published website

Five Things You Can Do with Codex

Set up and control your computer

Five Things You Can Do with Codex

Setup example

  • Install a utility or CLI tool
  • Configure it on the machine
  • Keep a log of what changed
  • Repeat the same setup in another project folder
Machine setup example

What do you need to know about agents?

What do you need to know about agents?

Every AI app is two things.

Model + orchestration.

The orchestration is now called the harness.

What do you need to know about agents?

The model and the harness, and what each does.

HARNESS · on your computer

does things

  • parses model output
  • formats text and tables
  • opens, creates, edits files
  • runs shell commands
  • loads files into context
  • sandboxing & permissions
  • shows the user what's going on
  • often via complex CLI tools
  • works inside a folder

MODEL · in the cloud

thinks

  • follows instructions
  • writes commands
  • understands text and images
  • makes plans, reasons
  • keeps things in context
  • can read screenshots
  • can emit click coordinates
  • no persistent memory
  • string in, string out

What do you need to know about agents?

On your computer · in the cloud.

YOUR COMPUTER

YOUR FOLDER

  • harness — parses, runs, displays
  • your files
  • terminal · browser · tools
  • files survive sessions · audit trail · privacy

THE CLOUD

A SERVER

  • model — stateless · text in, text out
  • no access to your files
  • apps / no memory

What do you need to know about agents?

Most "intelligence" is tool-use.

It is also true of us. We just stopped noticing.

What do you need to know about agents?

~300 instructions in mind.

  • Context windows: 200k–1M tokens
  • Models can keep ~300 instructions in mind and act on them

What do you need to know about agents?

Models are now updated every six weeks.

Not retrained from scratch. Fine-tuned variants. The model moves under you.

The agentic loop

The agentic loop

The agentic loop.

The agentic loop

Step 1 — the harness assembles the prompt.

The prompt is not just what you typed.

The agentic loop

What the prompt is made of.

  • system prompt — sets the agent's personality and constraints
  • prior turns of this conversation
  • tool definitions — names, descriptions, argument schemas
  • AGENTS.md from the current folder, if present
  • files the user attached or the model previously asked for
  • the user's latest message

The agentic loop

Step 2 — the model responds to the prompt.

The model is generating tokens one at a time. Reasoning is just earlier tokens.

The agentic loop

A response can be:

  • a chat reply — just text for the user
  • a tool call — a structured invocation the harness will recognise
  • reasoning followed by either of the above

The agentic loop

Step 3 — the harness parses the response.

  • Pattern-match for tool-call markers
  • Dispatch to the right handler
  • Sandbox dangerous calls
  • Ask the user for permission when needed
  • This is plain code · never an AI decision

The agentic loop

Step 4 — append, repeat.

  • Tool result becomes the next conversation message
  • Model sees it as if a colleague had replied
  • Or: response was just chat — render it, wait for the user
  • The loop closes either way

The agentic loop

List a folder.

USER
How many PDFs are in my Downloads folder?
HARNESS → MODEL
[system: tools available — list_dir(path), read_file(path)] [user: How many PDFs are in my Downloads folder?]
MODEL · reasoning
I need to inspect the directory first. I'll call list_dir on ~/Downloads, then count files matching .pdf.
MODEL → HARNESS
<tool_call name="list_dir" args={"path": "~/Downloads"}/>
TOOL RESULT
paper-draft.pdf notes-2026-04.md receipt.pdf holiday.jpg slides.pdf budget.xlsx …
MODEL → USER
There are 12 files in Downloads. Three are PDFs: • paper-draft.pdf • receipt.pdf • slides.pdf

The agentic loop

Visualise data — AGENTS.md as project rule.

USER
Make a chart of monthly visitors from data.csv
HARNESS → MODEL
[system: tools — read_file, write_file, run_shell, list_dir] [file: AGENTS.md (auto-loaded from current folder): "This project uses Python only. Never JavaScript. Always save outputs in ./figures/."] [user: Make a chart of monthly visitors from data.csv]
MODEL · reasoning
AGENTS.md says Python only — I'll use matplotlib, not plotly.js. Output goes in ./figures/. Let me read the CSV first.
MODEL → HARNESS
<tool_call name="read_file" args={"path": "data.csv"}/>
TOOL RESULT
month,visitors 2026-01,1240 2026-02,1387 2026-03,1604 2026-04,1851
MODEL → HARNESS
<tool_call name="write_file" args={"path": "figures/visitors.py", "content": "import matplotlib.pyplot as plt\n# (full Python script)\n…"}/>
TOOL RESULT
wrote figures/visitors.py (442 bytes)
MODEL → HARNESS
<tool_call name="run_shell" args={"cmd": "python figures/visitors.py"}/>
TOOL RESULT
saved figures/visitors.png
MODEL → USER
Done — figures/visitors.png shows the four-month trend (climbing from 1,240 in January to 1,851 in April).

The agentic loop

Check Gmail — with authentication.

USER
Summarise unread email from this week.
HARNESS → MODEL
[tools: gmail_search, gmail_read, gmail_authenticate] [user message]
MODEL · reasoning
I'll need to query Gmail. First check auth — call gmail_search and see if it errors.
MODEL → HARNESS
<tool_call name="gmail_search" args={"q": "is:unread newer_than:7d"}/>
TOOL RESULT
error
not authenticated
MODEL → HARNESS
<tool_call name="gmail_authenticate" args={}/>
HARNESS
// opens browser to OAuth consent // user clicks "Allow" // harness stores refresh token securely // model never sees the token
TOOL RESULT
authenticated as dominik@oerc.ox.ac.uk
MODEL → HARNESS
<tool_call name="gmail_search" args={"q": "is:unread newer_than:7d"}/>
TOOL RESULT
8 messages
3 from collaborators, 2 newsletters, 2 admin, 1 student
MODEL → USER
Eight unread messages this week. Three are from collaborators (one waiting on your reply), two are newsletters (skim or skip), two are admin (one needs action by Friday), and one is from a student asking about supervision.

The agentic loop

Research and write a report.

USER
Write me a 2-page brief on retrieval-augmented generation in legal research, with citations. Save it as brief.md.
MODEL · reasoning
Plan
(1) survey current literature, (2) identify the 4–5 main approaches, (3) check for legal-domain specifics, (4) outline, (5) draft, (6) revise. Start with a broad search.
MODEL → HARNESS
<tool_call name="web_search" args={"q": "RAG retrieval-augmented generation survey 2025"}/>
TOOL RESULT
[10 results — survey papers, blog posts, two arxiv preprints]
MODEL · reasoning
I want the survey paper and the legal-domain results. Open the top two.
MODEL → HARNESS
<tool_call name="web_fetch" args={"url": "arxiv.org/abs/2503.12345"}/> <tool_call name="web_fetch" args={"url": "harvardlawreview.org/…"}/>
TOOL RESULT
[two long documents]
MODEL · reasoning
Useful. Both flag retrieval quality and citation faithfulness as the bottleneck. I should write notes before continuing or I'll lose detail.
MODEL → HARNESS
<tool_call name="write_file" args={"path": "research/notes-rag.md", "content": "# RAG notes\n\n…"}/>
TOOL RESULT
wrote research/notes-rag.md
MODEL → HARNESS
<tool_call name="web_search" args={"q": "RAG legal research case law citation faithfulness"}/>
TOOL RESULT
[8 more results, three legal-tech specific]
MODEL → HARNESS
<tool_call name="web_fetch" args={"url": "thomson-reuters.com/…"}/> <tool_call name="web_fetch" args={"url": "casetext-blog.com/…"}/>
TOOL RESULT
[two more documents]
MODEL · reasoning
Enough material. Write the outline first, then the draft.
MODEL → HARNESS
<tool_call name="write_file" args={"path": "brief-outline.md", "content": "..."}/>
MODEL → HARNESS
<tool_call name="write_file" args={"path": "brief.md", "content": "# RAG in Legal Research\n\n…(2 pages)…"}/>
TOOL RESULT
wrote brief.md (1,734 words)
MODEL → USER
Brief saved to brief.md (1,734 words). Three sources, six citations, structured around retrieval quality / citation faithfulness / legal-domain specifics. Notes are in research/notes-rag.md if you want to dig further.

What else do you need to know?

What else do you need to know?

What do you need to know about computers in the age of AI?

Question: Who uses git to manage versions of your work?

What else do you need to know?

Do you need to know any code?

file 2fb4de740d68a2395842e1c6b284909b

What else do you need to know?

History of software abstractions

  • Machine code: 89 06 D1 E0 8B 07
  • Assembler: movl (%rdi), %eax sall $1, %eax movl %eax, (%rsi)
  • C: int b = 42; int ptr_b = &b; int a = (ptr_b) * 2;
  • Python: a = b * 2
  • LLM: Write code that multiplies all values a user enters by 2

What else do you need to know?

But actually most of the code we write is

jsimport ffi from 'ffi-napi';
import ref from 'ref-napi';

// 1. Define explicit types using the ref library
const intPtr = ref.refType(ref.types.int);

// 2. Load the compiled native C library and declare the function signature
const lib = ffi.Library('./libmultiply', {
  'double_value': ['void', [intPtr, intPtr]] // returns void, takes two int pointers
});

// 3. Allocate physical memory buffers to act as pointers
const b_ptr = ref.alloc(ref.types.int, 42); // b = 42
const a_ptr = ref.alloc(ref.types.int, 0);  // a = 0 (allocated space for result)

// 4. Pass the raw memory pointers to the compiled C function
lib.double_value(b_ptr, a_ptr);

// 5. Dereference the pointer to retrieve the value from physical memory
const a = a_ptr.deref(); 

console.log(`Value at pointer a_ptr: ${a}`); // Outputs: 84

What else do you need to know?

It's hard to read code for humans

There’s a subtle reason that programmers always want to throw away the code and start over. The reason is that they think the old code is a mess. And here is the interesting observation: they are probably wrong. The reason that they think the old code is a mess is because of a cardinal, fundamental law of programming:

It’s harder to read code than to write it.

What else do you need to know?

It's easy to read code for Large Language Models

For Large Language Models reading and writing code are perfectly symmetrical skills just like reading and writing emails for humans.

What else do you need to know?

But you still have to learn something because of...

What else do you need to know?

Law of Leaky abstractions

All non-trivial abstractions, to some degree, are leaky.

What else do you need to know?

Leaky abstractions

  • Abstractions hide underlying complexity and make it seem that things just work as one thing.
  • But no abstraction is perfect. The complexity leaks, and if you have some understanding of it, you can deal with it better.
  • Abstractions are all around us. Words, objects, machines, concepts, theories.

What else do you need to know?What do you need to know about computers?

What do you need to know about computers?

What else do you need to know?Leaky abstractions are not just a problem for computers

Leaky abstractions are not just a problem for computers

What else do you need to know?

The knowledge needed today

Knowledge you don't need

  • Complex algorithms
  • Syntax of computer languages
  • Variables, functions, etc.
  • How to use developer tools

Knowledge you do need

  • How your computer works
  • Principles of software architecture
  • Developer tools landscape
  • How AI agents work
  • What is possible with software

What else do you need to know?

Some useful things to know

Computer

  1. File system details

  2. Plain text / Markdown vs Rich Text

  3. CLI tools

  4. Pipes and deamons

Development

  1. Libraries

  2. Git and GitHub

  3. Content vs display

  4. What an API can do for you

  5. Hosting (Cloudflare)

Software architecture

  1. When you need a database

  2. Refactoring

  3. Modularity

  4. Security concerns

Agents

  1. Model v harness

  2. Model tool use

  3. Model capabilities

What else do you need to know?

Big lesson from the agent revolution

It is now worth learning more about computers at every level because you can now do something about it.

What else do you need to know?

Today

First step on the journey.

Your "first" agent task

Your "first" agent task

The assignment

Organise a messy folder full of files.

Your "first" agent task

What to do:

  • 1Unzip sample folder on your computer
  • 2Open messy-research-archive in Codex
  • 3Ask Codex to analyse it

Your "first" agent task

Debrief

How did it go? Lessons learned?

Your "first" agent task

What permissions did you give Codex?

YOLO mode warning

Break - 15 minutes

How to instruct your agents with AGENTS.md

How to instruct your agents with AGENTS.md

Instructions and outputs belong in files and not in chat

  • AGENTS.md
  • Plan files
  • Notes and reports

How to instruct your agents with AGENTS.md

Plain text is best

How to instruct your agents with AGENTS.mdUse Markdown to give plaint text structure

Use Markdown to give plaint text structure

Markdown elementExample
Heading# Project plan
Bullet list- interview notes
Linkinstructions
Markdown source example

How to instruct your agents with AGENTS.md

Why files?

  • Chat is transient files are persistent
  • Other agents can use the files
  • Other people can access the files

How to instruct your agents with AGENTS.mdContext management

Context management

Files help you manage what gets put in the context window must manage the context window.

How to instruct your agents with AGENTS.mdThe real prompt

The real prompt

Codex context transparency view

Every time you send a prompt, the Large Language Model (LLM) receives about 20-30k words of other instructions including the whole history of the current chat.

How to instruct your agents with AGENTS.mdWhen the chat history is too long

When the chat history is too long

How to instruct your agents with AGENTS.md

AGENTS.md is 'permanent context'

  • Place AGENTS.md in a folder
  • Codex or Antigravity automatically load it when they work in the folder
  • Subfolders can have their own AGENTS.md

How to instruct your agents with AGENTS.md

Claude workaround

  • Claude uses CLAUDE.md
  • Put it next to AGENTS.md
  • Put @./AGENTS.md as the only text in CLAUDE.md

How to instruct your agents with AGENTS.mdWhat to put in AGENTS.md

What to put in AGENTS.md

IncludeWhy it helps
What the folder is forKeeps the project purpose visible
Naming conventionsMakes new files consistent
Expected outputsTells Codex what to create
Things to avoidReduces accidental changes
Privacy or sharing notesMakes handling expectations explicit
AGENTS.md example

How to instruct your agents with AGENTS.md

Take advantage of "progressive disclosure"

  • 1Models can only keep about 300 instructions "in mind"
  • 2Agents can use tools to load files
  • 3Point AGENTS.md at files with more instructions

How to instruct your agents with AGENTS.md

My "new project" set up

AGENT-INSTRUCTIONS
CHANGELOG
TASK-LOG
AGENTS.md

How to instruct your agents with AGENTS.mdTime to explore

Time to explore

From AGENTS.md to SKILL.md

From AGENTS.md to SKILL.mdWhat is a skill?

What is a skill?

Agent Skills are a lightweight, open format for extending AI agent capabilities with specialized knowledge and workflows.

From AGENTS.md to SKILL.md

How does a skill work?

Skill is a folder with instruction files in it. By convention the top file is called SKILL.md and the agent loads its metadata into its context window.

From AGENTS.md to SKILL.md

How does an agent use a skill?

  • 1You can "install" a skill
  • 2You can point the agent at the skill link or folder and tell it to use it

From AGENTS.md to SKILL.md

SKILL.md vs AGENTS.md

AGENTS.md

  1. Placed in project folder

  2. Automatically loaded by agent when working in the folder

SKILL.md

  1. Placed in its own folder

  2. Agent loads it on request or when it thinks it needs it

Lunch - 1 hour

Trying and making skills

Trying and making skills

Where do Skills live in Codex?

Trying and making skills

How do you call a skill?

Trying and making skills

Use a skill

Install Matt Pocock's TEACH skill from skills/skills/productivity/teach at main · mattpocock/skills · GitHub

Note: This way is very inefficient and 'token expensive', it is easier to download the skill.

Trying and making skills

Create a skill

Agents using tools

Agents using tools

Three types of tools

  • Shell commands
    • cd
    • ls
    • mkdir
  • CLI tools
    • git
    • gh
    • wrangler
  • API services
    • Lab notebooks
    • Canvas

Agents using tools

Two tools for today

  • GitHub
    • keeping records
    • versioning
    • sharing
  • Cloudflare
    • building websites
    • disseminating

Tracking data with Git and Github

Tracking data with Git and Github

Why do you need git?

Track versions of your documents.

Tracking data with Git and Github

Prerequisites

  • 1Create a GitHub account
  • 2Install git
  • 3Install GitHub CLI tool gh

Tracking data with Git and GithubAgents can help you

Agents can help you

Break

Putting it all together - Working with data

Putting it all together - Working with dataGetting started

Getting started

Open research-data-for-replication in Codex

Putting it all together - Working with data

Setting up a project

  • 1Ask to make it a git repository
  • 2Ask Codex to help you create AGENTS.md with instructions
  • 3Ask Codex to analyse the data

Putting it all together - Working with data

Dissemination with Cloudflare (optional)

Your task is to publish a simple free website with your results using Cloudflare Pages. Codex can help you. You should be able to take all that you learned and be able to achieve the result.

Agents and Your Data: Big questions

Agents and Your Data: Big questions

Big questions

  • Security, safety and privacy
  • Cost
  • Learning

Agents and Your Data: Big questions

Three questions about safety

  • Safe use of your computer
  • Secure tools built
  • Data protection

Agents and Your Data: Big questionsSafe use of your computer

Safe use of your computer

Agents and Your Data: Big questions

Final question - Cost / Benefit

How much are you willing to invest in AI-assisted research?

  • Money
  • Time
  • Effort to learn and keep up

Agents and Your Data: Big questions

What does research look like in the age of AI agents?

Thank you

Dominik Lukeš (Dominik.Lukes@oerc.ox.ac.uk)

dominiks-handouts.pages.dev/oa8c