Using AI Agents for Reproducible Research

Dominik Lukeš (Dominik.Lukes@oerc.ox.ac.uk)

dominiks-handouts.pages.dev/oa8c

Introduction

Workshop plan

9:30-10:00 - Getting ready
10:00-11:00 - How agents work
11:00-11:15 - Your first agent task
11:15-11:30 - Break
11:30-12:30 - Learning to instruct your agent with AGENTS.md and SKILL.md
12:30-13:30 - Lunch
13:30-14:15 - Make your own skill
14:15-15:00 - Using CLI tools (Git, GitHub)
15:00-15:15 - Break
15:15-16:00 - Reproduce a paper
16:00-16:30 - Dissemination and implications

Introduction

Getting to know each other

Introduction

About me

Dominik Lukeš (Dominik.Lukes@oerc.ox.ac.uk), AI Competency Centre

Introduction

About me

Research in linguistics, metaphor and discourse analysis

AI Trends Tracking

ainewsroundup.pages.dev

Introduction

About you

Research
- Discipline
- Methods
Computer use
- Mac/Linux/Windows
- CLI experience
- Coding tools experience
AI Use
- ChatGPT or other chatbots
- Codex or other agents
- Other tools

Introduction

About the workshop

First time
Responding to urgent need
Constantly changing field
Unsettled methods

Getting ready for the rest of the day

Making sure you have Codex

bit.ly/ox-codex-access

Getting ready for the rest of the day

Making sure you have workshop files

github.com/techczech/agents-for-reproduci…

Getting ready for the rest of the day

Questions?

What and why of agents?

When was the last time you used ChatGPT?

What and why of agents?

I haven't used ChatGPT regularly in months, and not at all last week.

What and why of agents?

I have been using Codex every day

What and why of agents?

What this means

I am still using AI
I am using the same model power in a different context
ChatGPT is no longer my main interface to the large language model

What and why of agents?

Codex

What and why of agents?

Linux only? VS Code plugin

Agent

Model using tools in a loop to achieve complex tasks that require multiple steps without user intervention.

Chatbot

Model answering questions based on prompts and documents (sometimes called an agent, e.g. a customer service agent).

What and why of agents?

How it works: chatbot vs agent

Chatbot

Prompt
Documents
Web-search
Model responds

Agent

User types prompt
Model makes plan
Model calls tools (search, code, command)
Computer runs tools and sends results to model
Model changes plan
Model responds

What and why of agents?

ChatGPT v Codex

ChatGPT

Web-based app
Uses basic tools
Works on uploaded files

Codex

Desktop app
Uses tools in local sandbox
Works on local files

What and why of agents?

ChatGPT and Codex

ChatGPT

Powered by GPT-5
Login with SSO
Usage limits
Available to all

Codex

Powered by GPT-5
Login with SSO
Usage limits
Available after gate

What and why of agents?

Why that matters

The work is more transparent
Outputs are easier to retrieve later
Files stay in the normal workflow
The model can act through tools on the computer

What and why of agents?

A step up in utility

AI as oracle

AI as tool maker

AI as tool user

What Codex Is

What exactly is Codex?

General computer productivity tool
Desktop app
Works on the local machine
Acts through tools on that machine
Powered by the GPT-5.4 family

What Codex Is

What tools does it use?

Run commands
Open and inspect files
Create files and folders
Run computer code

What Codex Is

Folders, projects, and sandboxes

Codex works in folders
A project is really a folder
A sandbox limits what it can touch
Permissions matter

What Codex Is

ChatGPT and Codex comparison

	ChatGPT	Codex
Main form	Cloud-based app	Desktop app
How you control it	By chatting	By chatting
Model power	GPT-5 family	GPT-5 family
Where chat history lives	In the cloud app	In the Codex app, but important project state should still be saved into files
Where working files live	Cloud or uploaded into the cloud app	Local machine
What tools it can use	Limited by ChatGPT Code Interpreter	Anything you can do on your computer with the keyboard
Costs	Flat fee with some usage limits	Fully usage-based pricing

What Codex Is

Alternatives to Codex

Claude Code
Cursor
Google Antigravity

Benefits of working locally

Where the work lives

The work is saved to a folder on your machine
Outputs become part of your normal file structure
The chat is not the main archive

Benefits of working locally

Benefits of local work

Retrieval
Portability across tools
Transparency

Benefits of working locally

What gets sent to the model

Task	What may be enough
Organise	File name, size, dates, header, excerpt
Summarise	Part of the file or the whole file
Interpret	Part of the file or the whole file
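For an organising task, a compact description of each file may be all that needs to leave your machine. The sketch below shows what such a description could look like. The helper names `describe_file` and `describe_folder` and the 200-character excerpt length are assumptions made for illustration, not a description of how Codex actually works.

```python
from datetime import datetime, timezone
from pathlib import Path


def describe_file(path: Path, excerpt_chars: int = 200) -> dict:
    """Collect name, size, modified date and a short text excerpt for one file.

    Hypothetical helper for illustration; not Codex's actual implementation.
    """
    stat = path.stat()
    try:
        with path.open("r", encoding="utf-8") as handle:
            excerpt = handle.read(excerpt_chars)
    except UnicodeDecodeError:
        excerpt = ""  # not readable as UTF-8 text: describe by metadata only
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    return {
        "name": path.name,
        "size_bytes": stat.st_size,
        "modified": modified.date().isoformat(),
        "excerpt": excerpt,
    }


def describe_folder(folder: Path) -> list[dict]:
    """Describe every regular file directly inside the folder, sorted by name."""
    return [describe_file(p) for p in sorted(folder.iterdir()) if p.is_file()]
```

Summarising or interpreting a file needs more than this: part or all of the contents would have to be sent as well.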

Benefits of working locally

Trade-offs of local work

Aspect	Advantage	Trade-off
File handling	Files stay in your normal folder structure	You need to manage that structure properly
Retrieval	Outputs are easier to find later	You need to save and name things well
Scale	Easier to work across many local files	You need to know what folder you are giving it access to
Privacy	Some tasks can stay mostly local	Some tasks still require remote model processing
Persistence	The work is tied to your project folder	It does not automatically follow you to every machine

Benefits of working locally

Some other nice features

you can queue or steer messages

you can have multiple tasks running

context window is visible

dictation anywhere

Five Things You Can Do with Codex

Five things you can do with Codex

Catalogue and organise data
Manage projects and create project outputs
Build and maintain a note system
Create small tools, websites, and dissemination outputs
Set up and control your computer

Five Things You Can Do with Codex

Catalogue and organise your data

Five Things You Can Do with Codex

Example: organising a downloads folder

Ask what is in the folder
Check names, dates, sizes, and duplicates
Ask for suggested organising schemes
Rename or move files after review
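When asked to check for duplicates, an agent will often write and run a short script rather than compare files "by eye". The sketch below is one possible version, assuming duplicates means byte-identical contents. The function name `find_duplicates` is made up for this example. It only reports groups and does not move or delete anything, which leaves the review step to you.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(folder: Path) -> list[list[str]]:
    """Group files in a folder tree whose contents are byte-identical."""
    by_digest = defaultdict(list)
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            # Files with the same SHA-256 digest are treated as the same content.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path.relative_to(folder).as_posix())
    # Keep only groups with more than one file: those are the duplicates.
    return [names for names in by_digest.values() if len(names) > 1]
```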

Organising a downloads folder with Codex

Five Things You Can Do with Codex

Manage projects and create project outputs

Five Things You Can Do with Codex

Research project outputs

Trackers
Timelines
Literature-review tracker
Dashboards
Fieldwork or admin timeline

Five Things You Can Do with Codex

Research project example

Read milestones and task lists from a project folder
Build a literature-review tracker
Turn notes into a dashboard or timeline

Five Things You Can Do with Codex

Build and maintain a note system

Five Things You Can Do with Codex

Notes example

Design a note structure in Markdown
Add new notes into the same system
Link notes to each other
Summarise material into reusable research notes

Five Things You Can Do with Codex

Create small tools, websites, and dissemination outputs

Five Things You Can Do with Codex

From script to website

Start from a prompt and a folder of files
Generate a script, tool, or web page
Keep the files locally so you can keep editing them
Publish later if you want to

Five Things You Can Do with Codex

Set up and control your computer

Five Things You Can Do with Codex

Setup example

Install a utility or CLI tool
Configure it on the machine
Keep a log of what changed
Repeat the same setup in another project folder

What do you need to know about agents?

Every AI app is two things.

Model + orchestration.

The orchestration is now called the harness.

What do you need to know about agents?

The model and the harness, and what each does.

HARNESS · on your computer

does things

parses model output
formats text and tables
opens, creates, edits files
runs shell commands
loads files into context
sandboxing & permissions
shows the user what's going on
often via complex CLI tools
works inside a folder

MODEL · in the cloud

thinks

follows instructions
writes commands
understands text and images
makes plans, reasons
keeps things in context
can read screenshots
can emit click coordinates
no persistent memory
string in, string out

What do you need to know about agents?

On your computer · in the cloud.

YOUR COMPUTER

YOUR FOLDER

harness — parses, runs, displays
your files
terminal · browser · tools
files survive sessions · audit trail · privacy

THE CLOUD

A SERVER

model — stateless · text in, text out
no access to your files
apps / no memory

What do you need to know about agents?

Most "intelligence" is tool-use.

It is also true of us. We just stopped noticing.

What do you need to know about agents?

~300 instructions in mind.

Context windows: 200k–1M tokens
Models can keep ~300 instructions in mind and act on them

What do you need to know about agents?

Models are now updated every six weeks.

Not retrained from scratch. Fine-tuned variants. The model moves under you.

The agentic loop

The agentic loop.

The agentic loop

Step 1 — the harness assembles the prompt.

The prompt is not just what you typed.

The agentic loop

What the prompt is made of.

system prompt — sets the agent's personality and constraints
prior turns of this conversation
tool definitions — names, descriptions, argument schemas
AGENTS.md from the current folder, if present
files the user attached or the model previously asked for
the user's latest message
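The list above can be sketched as a small function. This is a simplified illustration, not how any particular harness is implemented: real harnesses usually send structured messages rather than one long string, and the bracketed labels and the name `assemble_prompt` are invented for this example.

```python
from pathlib import Path


def assemble_prompt(system_prompt, history, tool_definitions, folder,
                    attached_files, user_message):
    """Build the full text sent to the model from its separate ingredients.

    history is a list of (role, text) pairs; attached_files maps a file
    name to its text. All labels are hypothetical.
    """
    parts = [f"[system] {system_prompt}"]
    parts += [f"[{role}] {text}" for role, text in history]
    parts += [f"[tool] {definition}" for definition in tool_definitions]

    # AGENTS.md is added only if it exists in the folder being worked on.
    agents_file = Path(folder) / "AGENTS.md"
    if agents_file.is_file():
        parts.append(f"[file: AGENTS.md] {agents_file.read_text(encoding='utf-8')}")

    parts += [f"[file: {name}] {text}" for name, text in attached_files.items()]
    parts.append(f"[user] {user_message}")
    return "\n".join(parts)
```

The point of the sketch is that the user's message is only the last and often the smallest part of what the model receives.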

The agentic loop

Step 2 — the model responds to the prompt.

The model is generating tokens one at a time. Reasoning is just earlier tokens.

The agentic loop

A response can be:

a chat reply — just text for the user
a tool call — a structured invocation the harness will recognise
reasoning followed by either of the above

The agentic loop

Step 3 — the harness parses the response.

Pattern-match for tool-call markers
Dispatch to the right handler
Sandbox dangerous calls
Ask the user for permission when needed
This is plain code · never an AI decision
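A minimal sketch of this step, assuming the illustrative `<tool_call … />` marker format used in the worked examples on these slides. Real harnesses typically receive tool calls as structured data from the model API instead of matching text. The names `parse_response`, `dispatch` and `NEEDS_PERMISSION` are hypothetical.

```python
import json
import re

# Matches the illustrative marker: <tool_call name="..." args={...}/>
TOOL_CALL = re.compile(
    r'<tool_call name="(?P<name>\w+)" args=(?P<args>\{.*?\})/>', re.DOTALL
)

# Hypothetical policy: these tools only run after the user agrees.
NEEDS_PERMISSION = {"run_shell", "write_file"}


def parse_response(text):
    """Return ("tool_call", name, args) or ("chat", text, None)."""
    match = TOOL_CALL.search(text)
    if match is None:
        return ("chat", text, None)
    return ("tool_call", match.group("name"), json.loads(match.group("args")))


def dispatch(name, args, handlers, ask_user):
    """Run the handler for a tool call, asking for permission where required."""
    if name not in handlers:
        return f"error: unknown tool {name}"
    if name in NEEDS_PERMISSION and not ask_user(name, args):
        return "error: user denied permission"
    return handlers[name](**args)
```

Nothing in this code involves a model: whether a call is allowed is decided by ordinary rules and by the user.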

The agentic loop

Step 4 — append, repeat.

Tool result becomes the next conversation message
Model sees it as if a colleague had replied
Or: response was just chat — render it, wait for the user
The loop closes either way
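The four steps can be put together in a few lines. The sketch below replaces the real model with a scripted stand-in so that it runs without any API. `make_scripted_model`, `run_agent` and the message format are assumptions made for illustration only.

```python
from pathlib import Path


def list_dir(path):
    """Tool: return the names in a folder as one space-separated string."""
    return " ".join(sorted(p.name for p in Path(path).iterdir()))


def make_scripted_model(folder):
    """Stand-in for a real model: asks for a listing, then counts the PDFs."""
    def model(messages):
        role, content = messages[-1]
        if role == "tool":
            pdfs = [name for name in content.split() if name.endswith(".pdf")]
            return {"type": "chat", "text": f"{len(pdfs)} PDFs: {', '.join(pdfs)}"}
        return {"type": "tool_call", "name": "list_dir", "args": {"path": folder}}
    return model


def run_agent(model, tools, user_message, max_steps=5):
    """Call the model, run any tool it asks for, append the result, repeat."""
    messages = [("user", user_message)]
    for _ in range(max_steps):
        response = model(messages)
        if response["type"] == "chat":
            return response["text"], messages  # plain reply: show it and stop
        result = tools[response["name"]](**response["args"])
        messages.append(("tool", result))  # the tool result becomes the next message
    return "stopped: step limit reached", messages
```

The `max_steps` limit is a design choice in this sketch: without some limit, a model that keeps asking for tools would keep the loop running indefinitely.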

The agentic loop

List a folder.

USER

How many PDFs are in my Downloads folder?

HARNESS → MODEL

[system: tools available — list_dir(path), read_file(path)]
[user: How many PDFs are in my Downloads folder?]

MODEL · reasoning

I need to inspect the directory first. I'll call list_dir on ~/Downloads, then count files matching .pdf.

MODEL → HARNESS

<tool_call name="list_dir" args={"path": "~/Downloads"}/>

TOOL RESULT

paper-draft.pdf notes-2026-04.md receipt.pdf holiday.jpg slides.pdf budget.xlsx …

MODEL → USER

There are 12 files in Downloads. Three are PDFs:
• paper-draft.pdf
• receipt.pdf
• slides.pdf

The agentic loop

Visualise data — AGENTS.md as project rule.

USER

Make a chart of monthly visitors from data.csv

HARNESS → MODEL

[system: tools — read_file, write_file, run_shell, list_dir]
[file: AGENTS.md (auto-loaded from current folder): "This project uses Python only. Never JavaScript. Always save outputs in ./figures/."]
[user: Make a chart of monthly visitors from data.csv]

MODEL · reasoning

AGENTS.md says Python only — I'll use matplotlib, not plotly.js. Output goes in ./figures/. Let me read the CSV first.

MODEL → HARNESS

<tool_call name="read_file" args={"path": "data.csv"}/>

TOOL RESULT

month,visitors
2026-01,1240
2026-02,1387
2026-03,1604
2026-04,1851

MODEL → HARNESS

<tool_call name="write_file" args={"path": "figures/visitors.py", "content": "import matplotlib.pyplot as plt\n# (full Python script)\n…"}/>

TOOL RESULT

wrote figures/visitors.py (442 bytes)

MODEL → HARNESS

<tool_call name="run_shell" args={"cmd": "python figures/visitors.py"}/>

TOOL RESULT

saved figures/visitors.png

MODEL → USER

Done — figures/visitors.png shows the four-month trend (climbing from 1,240 in January to 1,851 in April).
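The script the model writes in this exchange is elided on the slide. The sketch below shows what a script of that kind might look like. It is an illustrative reconstruction under stated assumptions, not the actual file: the function names are invented, and it assumes `data.csv` has `month` and `visitors` columns as in the tool result above and that matplotlib is installed.

```python
import csv
from pathlib import Path


def load_visitors(csv_path):
    """Read (month, visitors) pairs from a CSV with 'month' and 'visitors' columns."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return [(row["month"], int(row["visitors"])) for row in csv.DictReader(handle)]


def plot_visitors(rows, out_path):
    """Save a line chart of monthly visitors as an image file."""
    # Imported here so that load_visitors works even without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # draw to a file; no display window needed
    import matplotlib.pyplot as plt

    months = [month for month, _ in rows]
    counts = [count for _, count in rows]
    fig, ax = plt.subplots()
    ax.plot(months, counts, marker="o")
    ax.set_xlabel("Month")
    ax.set_ylabel("Visitors")
    ax.set_title("Monthly visitors")
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(out_path)
    plt.close(fig)
```

Calling `plot_visitors(load_visitors("data.csv"), "figures/visitors.png")` from the project folder would follow both AGENTS.md rules: Python only, output in ./figures/.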

The agentic loop

Check Gmail — with authentication.

USER

Summarise unread email from this week.

HARNESS → MODEL

[tools: gmail_search, gmail_read, gmail_authenticate] [user message]

MODEL · reasoning

I'll need to query Gmail. First check auth — call gmail_search and see if it errors.

MODEL → HARNESS

<tool_call name="gmail_search" args={"q": "is:unread newer_than:7d"}/>

TOOL RESULT

error: not authenticated

MODEL → HARNESS

<tool_call name="gmail_authenticate" args={}/>

HARNESS

// opens browser to OAuth consent
// user clicks "Allow"
// harness stores refresh token securely
// model never sees the token

TOOL RESULT

authenticated as dominik@oerc.ox.ac.uk

MODEL → HARNESS

<tool_call name="gmail_search" args={"q": "is:unread newer_than:7d"}/>

TOOL RESULT

8 messages: 3 from collaborators, 2 newsletters, 2 admin, 1 student

MODEL → USER

Eight unread messages this week. Three are from collaborators (one waiting on your reply), two are newsletters (skim or skip), two are admin (one needs action by Friday), and one is from a student asking about supervision.

The agentic loop

Research and write a report.

USER

Write me a 2-page brief on retrieval-augmented generation in legal research, with citations. Save it as brief.md.

MODEL · reasoning

Plan: (1) survey current literature, (2) identify the 4–5 main approaches, (3) check for legal-domain specifics, (4) outline, (5) draft, (6) revise. Start with a broad search.

MODEL → HARNESS

<tool_call name="web_search" args={"q": "RAG retrieval-augmented generation survey 2025"}/>

TOOL RESULT

[10 results — survey papers, blog posts, two arxiv preprints]

MODEL · reasoning

I want the survey paper and the legal-domain results. Open the top two.

MODEL → HARNESS

<tool_call name="web_fetch" args={"url": "arxiv.org/abs/2503.12345"}/>
<tool_call name="web_fetch" args={"url": "harvardlawreview.org/…"}/>

TOOL RESULT

[two long documents]

MODEL · reasoning

Useful. Both flag retrieval quality and citation faithfulness as the bottleneck. I should write notes before continuing or I'll lose detail.

MODEL → HARNESS

<tool_call name="write_file" args={"path": "research/notes-rag.md", "content": "# RAG notes\n\n…"}/>

TOOL RESULT

wrote research/notes-rag.md

MODEL → HARNESS

<tool_call name="web_search" args={"q": "RAG legal research case law citation faithfulness"}/>

TOOL RESULT

[8 more results, three legal-tech specific]

MODEL → HARNESS

<tool_call name="web_fetch" args={"url": "thomson-reuters.com/…"}/>
<tool_call name="web_fetch" args={"url": "casetext-blog.com/…"}/>

TOOL RESULT

[two more documents]

MODEL · reasoning

Enough material. Write the outline first, then the draft.

MODEL → HARNESS

<tool_call name="write_file" args={"path": "brief-outline.md", "content": "..."}/>

MODEL → HARNESS

<tool_call name="write_file" args={"path": "brief.md", "content": "# RAG in Legal Research\n\n…(2 pages)…"}/>

TOOL RESULT

wrote brief.md (1,734 words)

MODEL → USER

Brief saved to brief.md (1,734 words). Three sources, six citations, structured around retrieval quality / citation faithfulness / legal-domain specifics. Notes are in research/notes-rag.md if you want to dig further.

What else do you need to know?

What do you need to know about computers in the age of AI?

Question: Who uses git to manage versions of your work?

What else do you need to know?

Do you need to know any code?

What else do you need to know?

History of software abstractions

Machine code: 8B 07 D1 E0 89 06
Assembler: movl (%rdi), %eax; sall $1, %eax; movl %eax, (%rsi)
C: int b = 42; int *ptr_b = &b; int a = (*ptr_b) * 2;
Python: a = b * 2
LLM: Write code that multiplies all values a user enters by 2

What else do you need to know?

But actually most of the code we write looks like this:

import ffi from 'ffi-napi';
import ref from 'ref-napi';

// 1. Define explicit types using the ref library
const intPtr = ref.refType(ref.types.int);

// 2. Load the compiled native C library and declare the function signature
const lib = ffi.Library('./libmultiply', {
  'double_value': ['void', [intPtr, intPtr]] // returns void, takes two int pointers
});

// 3. Allocate physical memory buffers to act as pointers
const b_ptr = ref.alloc(ref.types.int, 42); // b = 42
const a_ptr = ref.alloc(ref.types.int, 0);  // a = 0 (allocated space for result)

// 4. Pass the raw memory pointers to the compiled C function
lib.double_value(b_ptr, a_ptr);

// 5. Dereference the pointer to retrieve the value from physical memory
const a = a_ptr.deref(); 

console.log(`Value at pointer a_ptr: ${a}`); // Outputs: 84

What else do you need to know?

Code is hard for humans to read

There’s a subtle reason that programmers always want to throw away the code and start over. The reason is that they think the old code is a mess. And here is the interesting observation: they are probably wrong. The reason that they think the old code is a mess is because of a cardinal, fundamental law of programming:
It’s harder to read code than to write it.

Joel Spolsky, 2000, Things You Should Never Do, Part I – Joel on Software

What else do you need to know?

Code is easy for Large Language Models to read

For Large Language Models, reading and writing code are perfectly symmetrical skills, just like reading and writing emails for humans.

What else do you need to know?

But you still have to learn something because of...

What else do you need to know?

Law of Leaky Abstractions

All non-trivial abstractions, to some degree, are leaky.

Joel Spolsky, The Law of Leaky Abstractions

What else do you need to know?

Leaky abstractions

Abstractions hide underlying complexity and make it seem that things just work as one thing.
But no abstraction is perfect. The complexity leaks, and if you have some understanding of it, you can deal with it better.
Abstractions are all around us. Words, objects, machines, concepts, theories.

What else do you need to know?

What do you need to know about computers?

Step 1: Ask the LLM to make you software

Write a tool that will analyse the word frequencies in a text and display them in a graph.

Step 2: Get result

Question: Why is "the" not the most frequent word?

Explore puzzle

Abstraction you have to know: Typical word frequency analysers filter out "stopwords" (very common words such as "the") before counting.
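A small sketch makes the puzzle concrete. The stopword list here is a tiny made-up sample, and `word_frequencies` is a hypothetical function name. Real tools ship much longer lists that differ from one library to another, which is exactly the kind of detail that leaks through the abstraction.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; real libraries use far longer ones.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is"}


def word_frequencies(text, remove_stopwords=True):
    """Count lower-cased words, optionally leaving out stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    if remove_stopwords:
        words = [word for word in words if word not in STOPWORDS]
    return Counter(words)


sample = "The cat sat on the mat. The cat slept."
print(word_frequencies(sample).most_common(1))                          # [('cat', 2)]
print(word_frequencies(sample, remove_stopwords=False).most_common(1))  # [('the', 3)]
```

With stopwords removed, "cat" is the top word; with them kept, "the" is. Knowing that the filter exists lets you ask the agent to turn it off or adjust the list.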

What else do you need to know?

Leaky abstractions are not just a problem for computers

What else do you need to know?

The knowledge needed today

Knowledge you don't need

Complex algorithms
Syntax of computer languages
Variables, functions, etc.
How to use developer tools

Knowledge you do need

How your computer works
Principles of software architecture
Developer tools landscape
How AI agents work
What is possible with software

What else do you need to know?

Some useful things to know

Computer

File system details
Plain text / Markdown vs Rich Text
CLI tools
Pipes and daemons

Development

Libraries
Git and GitHub
Content vs display
What an API can do for you
Hosting (Cloudflare)

Software architecture

When you need a database
Refactoring
Modularity
Security concerns

Agents

Model v harness
Model tool use
Model capabilities

What else do you need to know?

Big lesson from the agent revolution

It is now worth learning more about computers at every level because you can now do something about it.

What else do you need to know?

Today

First step on the journey.

Your "first" agent task

The assignment

Organise a messy folder full of files.

Your "first" agent task

What to do:

1. Unzip the sample folder on your computer
2. Open messy-research-archive in Codex
3. Ask Codex to analyse it

Your "first" agent task

Debrief

How did it go? Lessons learned?

Your "first" agent task

What permissions did you give Codex?

Break - 15 minutes

How to instruct your agents with AGENTS.md

Instructions and outputs belong in files and not in chat

AGENTS.md
Plan files
Notes and reports

How to instruct your agents with AGENTS.md

Plain text is best

How to instruct your agents with AGENTS.md

Use Markdown to give plain text structure

Markdown element	Example
Heading	`# Project plan`
Bullet list	`- interview notes`
Link	`[instructions](instructions.md)`

How to instruct your agents with AGENTS.md

Why files?

Chat is transient; files are persistent
Other agents can use the files
Other people can access the files

How to instruct your agents with AGENTS.md

Context management

Files help you manage what gets put in the context window.

How to instruct your agents with AGENTS.md

The real prompt

Every time you send a prompt, the Large Language Model (LLM) receives about 20-30k words of other instructions, including the whole history of the current chat.

How to instruct your agents with AGENTS.md

When the chat history is too long

How to instruct your agents with AGENTS.md

AGENTS.md is 'permanent context'

Place AGENTS.md in a folder
Codex or Antigravity automatically load it when they work in the folder
Subfolders can have their own AGENTS.md
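The folder rule can be pictured as a lookup from the project root down to the folder the agent is working in. The sketch below is an assumption about the general idea only. Each tool has its own loading rules, so check its documentation before relying on this. The function name `collect_agents_files` is made up.

```python
from pathlib import Path


def collect_agents_files(project_root: Path, working_dir: Path) -> list[Path]:
    """Return AGENTS.md files from the project root down to the working folder.

    Hypothetical illustration; real tools may load these files differently.
    """
    root = project_root.resolve()
    current = working_dir.resolve()
    if root != current and root not in current.parents:
        raise ValueError("working_dir must be inside project_root")

    # Walk upwards from the working folder until the root is reached.
    folders = [current]
    while folders[-1] != root:
        folders.append(folders[-1].parent)

    found = []
    for folder in reversed(folders):  # root first, most specific folder last
        candidate = folder / "AGENTS.md"
        if candidate.is_file():
            found.append(candidate)
    return found
```

Listing the most specific file last mirrors the intuition that instructions in a subfolder refine the project-wide ones.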

How to instruct your agents with AGENTS.md

Claude workaround

Claude uses CLAUDE.md
Put it next to AGENTS.md
Put @./AGENTS.md as the only text in CLAUDE.md

How to instruct your agents with AGENTS.md

What to put in AGENTS.md

Include	Why it helps
What the folder is for	Keeps the project purpose visible
Naming conventions	Makes new files consistent
Expected outputs	Tells Codex what to create
Things to avoid	Reduces accidental changes
Privacy or sharing notes	Makes handling expectations explicit

How to instruct your agents with AGENTS.md

Take advantage of "progressive disclosure"

1. Models can only keep about 300 instructions "in mind"
2. Agents can use tools to load files
3. Point AGENTS.md at files with more instructions

How to instruct your agents with AGENTS.md

My "new project" set up

AGENT-INSTRUCTIONS

CHANGELOG

TASK-LOG

AGENTS.md

How to instruct your agents with AGENTS.md

Time to explore

Have a look at the AGENTS.md site

AGENTS.md

Explore the sample project

Open project-management-sample-project in Codex and ask it to create an HTML report on the project status.

From AGENTS.md to SKILL.md

From AGENTS.md to SKILL.md

What is a skill?

Agent Skills are a lightweight, open format for extending AI agent capabilities with specialized knowledge and workflows.

AgentSkills.io

From AGENTS.md to SKILL.md

How does a skill work?

A skill is a folder with instruction files in it. By convention, the top-level file is called SKILL.md, and the agent loads its metadata into its context window.
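To make "loads its metadata" concrete: a SKILL.md typically begins with a short front-matter block between `---` lines, giving the skill a name and a description, followed by the full instructions. The sketch below returns only that block. It is a simplification that assumes plain `key: value` lines. Real front matter is YAML and is better read with a YAML parser. `read_skill_metadata` is a hypothetical name.

```python
from pathlib import Path


def read_skill_metadata(skill_folder: Path) -> dict:
    """Return the 'key: value' front matter of SKILL.md, ignoring the body."""
    lines = (Path(skill_folder) / "SKILL.md").read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter found
    metadata = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of front matter; the instructions below are skipped
        key, separator, value = line.partition(":")
        if separator:
            metadata[key.strip()] = value.strip()
    return metadata
```

Only this small summary needs to sit in the context window all the time. The longer instructions can be loaded when the agent decides the skill is relevant.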

From AGENTS.md to SKILL.md

How does an agent use a skill?

1. You can "install" a skill
2. You can point the agent at the skill link or folder and tell it to use it

From AGENTS.md to SKILL.md

SKILL.md vs AGENTS.md

AGENTS.md

Placed in project folder
Automatically loaded by agent when working in the folder

SKILL.md

Placed in its own folder
Agent loads it on request or when it thinks it needs it

Lunch - 1 hour

Trying and making skills

Explore some skills

Trying and making skills

Where do Skills live in Codex?

Trying and making skills

How do you call a skill?

Trying and making skills

Use a skill

Install Matt Pocock's TEACH skill from skills/productivity/teach in the mattpocock/skills repository on GitHub

Note: This way is very inefficient and 'token expensive'; it is easier to download the skill.

Trying and making skills

Create a skill

Agents using tools

Three types of tools

Shell commands
- cd
- ls
- mkdir
CLI tools
- git
- gh
- wrangler
API services
- Lab notebooks
- Canvas

Agents using tools

Two tools for today

GitHub
- keeping records
- versioning
- sharing
Cloudflare
- building websites
- disseminating

Tracking data with Git and GitHub

Why do you need git?

Track versions of your documents.

Tracking data with Git and GitHub

Prerequisites

1. Create a GitHub account
2. Install git
3. Install the GitHub CLI tool, gh

Tracking data with Git and GitHub

Agents can help you

Step 1

Ask Codex to help you set up git and the GitHub CLI.

Step 2

Ask Codex to make the current sample folder a git repo and push it to GitHub as a private repository.
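Behind a request like Step 2, the agent typically runs a handful of git and gh commands. The sketch below lists one plausible sequence and only prints it unless `dry_run` is switched off. It assumes git and gh are installed and that `gh auth login` has already been done. The repository name and the helper names are placeholders, and an agent may well choose different commands.

```python
import subprocess


def git_setup_commands(repo_name, message="Initial commit"):
    """Return one plausible command sequence for a new private GitHub repo."""
    return [
        ["git", "init"],
        ["git", "add", "."],
        ["git", "commit", "-m", message],
        # Creates the private repository on GitHub from the current folder and pushes to it.
        ["gh", "repo", "create", repo_name, "--private", "--source=.", "--push"],
    ]


def run_commands(commands, folder, dry_run=True):
    """Print each command; execute it in the folder only when dry_run is False."""
    for command in commands:
        print("$", " ".join(command))
        if not dry_run:
            subprocess.run(command, cwd=folder, check=True)


run_commands(git_setup_commands("sample-project"), folder=".")
```

Reading the printed commands before allowing them to run is a useful habit: it is the same review you give the agent when it asks for permission.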

Break

Putting it all together - Working with data

Putting it all together - Working with data

Getting started

Open research-data-for-replication in Codex

Putting it all together - Working with data

Setting up a project

1. Ask Codex to make it a git repository
2. Ask Codex to help you create AGENTS.md with instructions
3. Ask Codex to analyse the data

Putting it all together - Working with data

Dissemination with Cloudflare (optional)

Your task is to publish a simple free website with your results using Cloudflare Pages. Codex can help you. You should be able to draw on everything you have learned today to get there.

Agents and Your Data: Big questions

Big questions

Security, safety and privacy
Cost
Learning

Agents and Your Data: Big questions

Three questions about safety

Safe use of your computer
Secure tools built
Data protection

Agents and Your Data: Big questions

Safe use of your computer

Permissions

Secure software

Ask for security reviews
Keep secrets in a secret manager
Be mindful about architecture and dependencies

This requires a focus on typical security practices. Codex can help, but make it consider what it does, search, and review.

Data protection

Spend time learning about the data policies of your project, research group and university. There are no blanket permissions.

Agents and Your Data: Big questions

Final question - Cost / Benefit

How much are you willing to invest in AI-assisted research?

Money
Time
Effort to learn and keep up

Agents and Your Data: Big questions

What does research look like in the age of AI agents?

Thank you

Dominik Lukeš (Dominik.Lukes@oerc.ox.ac.uk)

dominiks-handouts.pages.dev/oa8c