Introduction
Using AI Agents for Reproducible Research
Dominik Lukeš (Dominik.Lukes@oerc.ox.ac.uk)
Introduction
Introduction
Workshop plan
- 9:30-10:00 - Getting ready
- 10:00-11:00 - How agents work
- 11:00-11:15 - Your first agent task
- 11:15-11:30 - Break
- 11:30-12:30 - Learning to instruct your agent with AGENTS.md and SKILLS.md
- 12:30-13:30 - Lunch
- 13:30-14:15 - Make your own skill
- 14:15-15:00 - Using CLI tools (Git, GitHub)
- 15:00-15:15 - Break
- 15:15-16:00 - Reproduce a paper
- 16:00-16:30 - Dissemination and implications
Introduction
Getting to know each other
Introduction
About me
Dominik Lukes Dominik.Lukes@oerc.ox.ac.uk AI Competency Centre
Research in linguistics, metaphor and discourse analysis
AI Trends Tracking
Introduction
About you
- Research
- Discipline
- Methods
- Computer use
- Mac/Linux/Windows
- CLI experience
- Coding tools experience
- AI Use
- ChatGPT or other chatbots
- Codex or other agents
- Other tools
Introduction
About the workshop
- First time
- Responding to urgent need
- Constantly changing field
- Unsettled methods
Getting ready for the rest of the day
Getting ready for the rest of the day
Making sure you have Codex
Getting ready for the rest of the day
Making sure you have workshop files
Getting ready for the rest of the day
Questions?
What and why of agents?
What and why of agents?
What was the last time you used ChatGPT?
What and why of agents?
I haven't ChatGPTChatGPT regularly in months and not at all last week.
What and why of agents?
I have been using Codex every day
What and why of agents?
What this means
- I am still using AI
- I am using the same model power in a different context
- ChatGPT is no longer my main interface to the large language model
What and why of agents?
Codex
What and why of agents?
Linux only? VS Code plugin
Chatbot vs agent
Agent
Model using tools in a loop to achieve complex tasks that require multiple steps without user intervention.
Chatbot
Chatbot answering questions based on prompts and documents (sometimes called agent e.g. Customer service agent)
What and why of agents?
How it works: chatbot vs agent
Chatbot
- Prompt
- Documents
- Web-search
- Model responds
Agent
- User types prompt
- Model makes plan
- Model calls tools (search, code, command)
- Computer runs tools and sends results to model
- Model changes plan
- Model responds
What and why of agents?
ChatGPT v Codex
ChatGPT
- Web-based app
- Uses basic tools
- Works on uploaded files
Codex
- Desktop app
- Uses tools in local sandbox
- Works on local files
What and why of agents?
ChatGPT and Codex
ChatGPT
- Powered by GPT-5
- Login with SSO
- Usage limits
- Available to all
Codex
- Powered by GPT-5
- Login with SSO
- Usage limits
- Available after gate
What and why of agents?
Why that matters
- The work is more transparent
- Outputs are easier to retrieve later
- Files stay in the normal workflow
- The model can act through tools on the computer
What and why of agents?
A step up in utility
What Codex Is
What Codex Is
What exactly is Codex?
- General computer productivity tool
- Desktop app
- Works on the local machine
- Acts through tools on that machine
- Powered by the GPT-5.4 family
What Codex Is
What tools does it use?
- Run commands
- Open and inspect files
- Create files and folders
- Run computer code
What Codex Is
Folders, projects, and sandboxes
- Codex works in folders
- A project is really a folder
- A sandbox limits what it can touch
- Permissions matter
What Codex Is
ChatGPT and Codex comparison
| ChatGPT | Codex | |
|---|---|---|
| Main form | Cloud-based app | Desktop app |
| How you control it | By chatting | By chatting |
| Model power | GPT-5 family | GPT-5 family |
| Where chat history lives | In the cloud app | In the Codex app, but important project state should still be saved into files |
| Where working files live | Cloud or uploaded into the cloud app | Local machine |
| What tools it can use | Limited by ChatGPT Code Interpreter | Anything you can do on your computer with the keyboard |
| Costs | Flat fee with some usage limits | Fully usage based pricing |
What Codex Is
Alternatives to Codex
- Claude Code
- Cursor
- Google Antigravity
Benefits of working locally
Benefits of working locally
Where the work lives
- The work is saved to a folder on your machine
- Outputs become part of your normal file structure
- The chat is not the main archive
Benefits of working locally
Benefits of local work
- Retrieval
- Portability across tools
- Transparency
Benefits of working locally
What gets sent to the model
| Task | What may be enough |
|---|---|
| Organise | File name, size, dates, header, excerpt |
| Summarise | Part of the file or the whole file |
| Interpret | Part of the file or the whole file |
Benefits of working locally
Trade-offs of local work
| Aspect | Advantage | Trade-off |
|---|---|---|
| File handling | Files stay in your normal folder structure | You need to manage that structure properly |
| Retrieval | Outputs are easier to find later | You need to save and name things well |
| Scale | Easier to work across many local files | You need to know what folder you are giving it access to |
| Privacy | Some tasks can stay mostly local | Some tasks still require remote model processing |
| Persistence | The work is tied to your project folder | It does not automatically follow you to every machine |
Benefits of working locally
Some other nice features
Benefits of working locally
Some other nice features
you can queue or steer messages
you can have multiple tasks running
context window is visible
dictation anywhere
Five Things You Can Do with Codex
Five Things You Can Do with Codex
Five things you can do with Codex
- Catalogue and organise data
- Manage projects and create project outputs
- Build and maintain a note system
- Create small tools, websites, and dissemination outputs
- Set up and control your computer
Five Things You Can Do with Codex
Catalogue and organise your data
Five Things You Can Do with Codex
Example: organising a downloads folder
- Ask what is in the folder
- Check names, dates, sizes, and duplicates
- Ask for suggested organising schemes
- Rename or move files after review
Five Things You Can Do with Codex
Manage projects and create project outputs
Five Things You Can Do with Codex
Research project outputs
- Trackers
- Timelines
- Literature-review tracker
- Dashboards
- Fieldwork or admin timeline
Five Things You Can Do with Codex
Research project example
- Read milestones and task lists from a project folder
- Build a literature-review tracker
- Turn notes into a dashboard or timeline
Five Things You Can Do with Codex
Build and maintain a note system
Five Things You Can Do with Codex
Notes example
- Design a note structure in Markdown
- Add new notes into the same system
- Link notes to each other
- Summarise material into reusable research notes
Five Things You Can Do with Codex
Create small tools, websites, and dissemination outputs
Five Things You Can Do with Codex
From script to website
- Start from a prompt and a folder of files
- Generate a script, tool, or web page
- Keep the files locally so you can keep editing them
- Publish later if you want to
Five Things You Can Do with Codex
Set up and control your computer
Five Things You Can Do with Codex
Setup example
- Install a utility or CLI tool
- Configure it on the machine
- Keep a log of what changed
- Repeat the same setup in another project folder
What do you need to know about agents?
What do you need to know about agents?
Every AI app is two things.
Model + orchestration.
The orchestration is now called the harness.
What do you need to know about agents?
The model and the harness, and what each does.
HARNESS · on your computer
does things
- parses model output
- formats text and tables
- opens, creates, edits files
- runs shell commands
- loads files into context
- sandboxing & permissions
- shows the user what's going on
- often via complex CLI tools
- works inside a folder
MODEL · in the cloud
thinks
- follows instructions
- writes commands
- understands text and images
- makes plans, reasons
- keeps things in context
- can read screenshots
- can emit click coordinates
- no persistent memory
- string in, string out
What do you need to know about agents?
On your computer · in the cloud.
YOUR COMPUTER
YOUR FOLDER
- harness — parses, runs, displays
- your files
- terminal · browser · tools
- files survive sessions · audit trail · privacy
THE CLOUD
A SERVER
- model — stateless · text in, text out
- no access to your files
- apps / no memory
What do you need to know about agents?
Most "intelligence" is tool-use.
It is also true of us. We just stopped noticing.
What do you need to know about agents?
~300 instructions in mind.
- Context windows: 200k–1M tokens
- Models can keep ~300 instructions in mind and act on them
What do you need to know about agents?
Models are now updated every six weeks.
Not retrained from scratch. Fine-tuned variants. The model moves under you.
The agentic loop
The agentic loop
The agentic loop.
The agentic loop
Step 1 — the harness assembles the prompt.
The prompt is not just what you typed.
The agentic loop
What the prompt is made of.
- system prompt — sets the agent's personality and constraints
- prior turns of this conversation
- tool definitions — names, descriptions, argument schemas
- AGENTS.md from the current folder, if present
- files the user attached or the model previously asked for
- the user's latest message
The agentic loop
Step 2 — the model responds to the prompt.
The model is generating tokens one at a time. Reasoning is just earlier tokens.
The agentic loop
A response can be:
- a chat reply — just text for the user
- a tool call — a structured invocation the harness will recognise
- reasoning followed by either of the above
The agentic loop
Step 3 — the harness parses the response.
- Pattern-match for tool-call markers
- Dispatch to the right handler
- Sandbox dangerous calls
- Ask the user for permission when needed
- This is plain code · never an AI decision
The agentic loop
Step 4 — append, repeat.
- Tool result becomes the next conversation message
- Model sees it as if a colleague had replied
- Or: response was just chat — render it, wait for the user
- The loop closes either way
The agentic loop
List a folder.
The agentic loop
Visualise data — AGENTS.md as project rule.
The agentic loop
Check Gmail — with authentication.
The agentic loop
Research and write a report.
What else do you need to know?
What else do you need to know?
What do you need to know about computers in the age of AI?
Question: Who uses git to manage versions of your work?
What else do you need to know?
Do you need to know any code?
What else do you need to know?
History of software abstractions
- Machine code: 89 06 D1 E0 8B 07
- Assembler: movl (%rdi), %eax sall $1, %eax movl %eax, (%rsi)
- C: int b = 42; int ptr_b = &b; int a = (ptr_b) * 2;
- Python: a = b * 2
- LLM: Write code that multiplies all values a user enters by 2
What else do you need to know?
But actually most of the code we write is
jsimport ffi from 'ffi-napi';
import ref from 'ref-napi';
// 1. Define explicit types using the ref library
const intPtr = ref.refType(ref.types.int);
// 2. Load the compiled native C library and declare the function signature
const lib = ffi.Library('./libmultiply', {
'double_value': ['void', [intPtr, intPtr]] // returns void, takes two int pointers
});
// 3. Allocate physical memory buffers to act as pointers
const b_ptr = ref.alloc(ref.types.int, 42); // b = 42
const a_ptr = ref.alloc(ref.types.int, 0); // a = 0 (allocated space for result)
// 4. Pass the raw memory pointers to the compiled C function
lib.double_value(b_ptr, a_ptr);
// 5. Dereference the pointer to retrieve the value from physical memory
const a = a_ptr.deref();
console.log(`Value at pointer a_ptr: ${a}`); // Outputs: 84
What else do you need to know?
It's hard to read code for humans
There’s a subtle reason that programmers always want to throw away the code and start over. The reason is that they think the old code is a mess. And here is the interesting observation: they are probably wrong. The reason that they think the old code is a mess is because of a cardinal, fundamental law of programming:
It’s harder to read code than to write it.
- Joel Spolsky, 2000, Things You Should Never Do, Part I – Joel on Software
What else do you need to know?
It's easy to read code for Large Language Models
For Large Language Models reading and writing code are perfectly symmetrical skills just like reading and writing emails for humans.
What else do you need to know?
But you still have to learn something because of...
What else do you need to know?
Law of Leaky abstractions
All non-trivial abstractions, to some degree, are leaky.
- Joel Spolsky, The Law of Leaky Abstractions)
What else do you need to know?
Leaky abstractions
- Abstractions hide underlying complexity and make it seem that things just work as one thing.
- But no abstraction is perfect. The complexity leaks, and if you have some understanding of it, you can deal with it better.
- Abstractions are all around us. Words, objects, machines, concepts, theories.
What else do you need to know? · What do you need to know about computers?
What do you need to know about computers?
What else do you need to know?
What do you need to know about computers?
Step 1: Ask the LLM to make you software
Write a tool that will analyse the word frequencies in a text and display them in a graph.
Step 2: Get result
Question: Why is "the" not the most frequent word?
Explore puzzle
Abstraction you have to know: Typical word frequency analyser software includes "stopwords".
What else do you need to know? · Leaky abstractions are not just a problem for computers
Leaky abstractions are not just a problem for computers
What else do you need to know?
Leaky abstractions are not just a problem for computers
What else do you need to know?
The knowledge needed today
Knowledge you don't need
- Complex algorithms
- Syntax of computer languages
- Variables, functions, etc.
- How to use developer tools
Knowledge you do need
- How your computer works
- Principles of software architecture
- Developer tools landscape
- How AI agents work
- What is possible with software
What else do you need to know?
Some useful things to know
Computer
File system details
Plain text / Markdown vs Rich Text
CLI tools
Pipes and deamons
Development
Libraries
Git and GitHub
Content vs display
What an API can do for you
Hosting (Cloudflare)
Software architecture
When you need a database
Refactoring
Modularity
Security concerns
Agents
Model v harness
Model tool use
Model capabilities
What else do you need to know?
Big lesson from the agent revolution
It is now worth learning more about computers at every level because you can now do something about it.
What else do you need to know?
Today
First step on the journey.
Your "first" agent task
Your "first" agent task
The assignment
Organise a messy folder full of files.
Your "first" agent task
What to do:
- 1Unzip sample folder on your computer
- 2Open
messy-research-archivein Codex - 3Ask Codex to analyse it
Your "first" agent task
Debrief
How did it go? Lessons learned?
Your "first" agent task
What permissions did you give Codex?
Break - 15 minutes
How to instruct your agents with AGENTS.md
How to instruct your agents with AGENTS.md
Instructions and outputs belong in files and not in chat
AGENTS.md- Plan files
- Notes and reports
How to instruct your agents with AGENTS.md
Plain text is best
How to instruct your agents with AGENTS.md · Use Markdown to give plaint text structure
Use Markdown to give plaint text structure
| Markdown element | Example |
|---|---|
| Heading | # Project plan |
| Bullet list | - interview notes |
| Link | instructions |
How to instruct your agents with AGENTS.md
Why files?
- Chat is transient files are persistent
- Other agents can use the files
- Other people can access the files
How to instruct your agents with AGENTS.md · Context management
Context management
Files help you manage what gets put in the context window must manage the context window.
How to instruct your agents with AGENTS.md · The real prompt
The real prompt
Every time you send a prompt, the Large Language Model (LLM) receives about 20-30k words of other instructions including the whole history of the current chat.
How to instruct your agents with AGENTS.md · When the chat history is too long
When the chat history is too long
How to instruct your agents with AGENTS.md
When the chat history is too long
How to instruct your agents with AGENTS.md
AGENTS.md is 'permanent context'
- Place AGENTS.md in a folder
- Codex or Antigravity automatically load it when they work in the folder
- Subfolders can have their own AGENTS.md
How to instruct your agents with AGENTS.md
Claude workaround
- Claude uses CLAUDE.md
- Put it next to AGENTS.md
- Put
@./AGENTS.mdas the only text in CLAUDE.md
How to instruct your agents with AGENTS.md · What to put in AGENTS.md
What to put in AGENTS.md
| Include | Why it helps |
|---|---|
| What the folder is for | Keeps the project purpose visible |
| Naming conventions | Makes new files consistent |
| Expected outputs | Tells Codex what to create |
| Things to avoid | Reduces accidental changes |
| Privacy or sharing notes | Makes handling expectations explicit |
How to instruct your agents with AGENTS.md
Take advantage of "progressive disclosure"
- 1Models can only keep about 300 instructions "in mind"
- 2Agents can use tools to load files
- 3Point AGENTS.md at files with more instructions
How to instruct your agents with AGENTS.md
My "new project" set up
How to instruct your agents with AGENTS.md · Time to explore
Time to explore
How to instruct your agents with AGENTS.md
Time to explore
Have a look at AGENTS.md site
Explore the sample project
Open project-management-sample-project in Codex and ask to create an html report on the project status.
From AGENTS.md to SKILL.md
From AGENTS.md to SKILL.md · What is a skill?
What is a skill?
Agent Skills are a lightweight, open format for extending AI agent capabilities with specialized knowledge and workflows.
From AGENTS.md to SKILL.md
How does a skill work?
Skill is a folder with instruction files in it. By convention the top file is called SKILL.md and the agent loads its metadata into its context window.
From AGENTS.md to SKILL.md
How does an agent use a skill?
- 1You can "install" a skill
- 2You can point the agent at the skill link or folder and tell it to use it
From AGENTS.md to SKILL.md
SKILL.md vs AGENTS.md
AGENTS.md
Placed in project folder
Automatically loaded by agent when working in the folder
SKILL.md
Placed in its own folder
Agent loads it on request or when it thinks it needs it
Lunch - 1 hour
Trying and making skills
Trying and making skills
Where do Skills live in Codex?
Trying and making skills
How do you call a skill?
Trying and making skills
Use a skill
Install Matt Pocock's TEACH skill from skills/skills/productivity/teach at main · mattpocock/skills · GitHub
Note: This way is very inefficient and 'token expensive', it is easier to download the skill.
Trying and making skills
Create a skill
Agents using tools
Agents using tools
Three types of tools
- Shell commands
- cd
- ls
- mkdir
- CLI tools
- git
- gh
- wrangler
- API services
- Lab notebooks
- Canvas
Agents using tools
Two tools for today
- GitHub
- keeping records
- versioning
- sharing
- Cloudflare
- building websites
- disseminating
Tracking data with Git and Github
Tracking data with Git and Github
Why do you need git?
Track versions of your documents.
Tracking data with Git and Github
Prerequisites
- 1Create a GitHub account
- 2Install git
- 3Install GitHub CLI tool gh
Tracking data with Git and Github · Agents can help you
Agents can help you
Tracking data with Git and Github
Agents can help you
Step 1
Ask Codex to help you set up git and the GitHub CLI.
Step 2
Ask Codex to make a sample current folder a git repo and push it to GitHub as a private repository.
Break
Putting it all together - Working with data
Putting it all together - Working with data · Getting started
Getting started
Open research-data-for-replication in Codex
Putting it all together - Working with data
Setting up a project
- 1Ask to make it a git repository
- 2Ask Codex to help you create AGENTS.md with instructions
- 3Ask Codex to analyse the data
Putting it all together - Working with data
Dissemination with Cloudflare (optional)
Your task is to publish a simple free website with your results using Cloudflare Pages. Codex can help you. You should be able to take all that you learned and be able to achieve the result.
Agents and Your Data: Big questions
Agents and Your Data: Big questions
Big questions
- Security, safety and privacy
- Cost
- Learning
Agents and Your Data: Big questions
Three questions about safety
- Safe use of your computer
- Secure tools built
- Data protection
Agents and Your Data: Big questions · Safe use of your computer
Safe use of your computer
Agents and Your Data: Big questions
Safe use of your computer
Permissions
Secure software
- Ask for security reviews
- Keep secrets in a secret manager
- Be mindful about architecture and dependencies
This requires focus on typical security practices. Codex can help but force it consider what it does, search and review.
Data protection
Spend time learning about the data policies of your project, research group and university. There are no blank permissions.
Agents and Your Data: Big questions
Final question - Cost / Benefit
How much are you willing to invest in AI-assisted research?
- Money
- Time
- Effort to learn and keep up
Agents and Your Data: Big questions
What does research look like in the age of AI agents?
Thank you
Dominik Lukeš (Dominik.Lukes@oerc.ox.ac.uk)