WebArena: A Realistic Web Environment for Building Autonomous Agents

WebArena is a standalone, self-hostable web environment for building autonomous agents


Website | Paper

Overview

Roadmap

  • In-house end-to-end evaluation. We are working on an API that accepts predicted actions from any interface and then returns the subsequent observation.
  • Support more agents with different prompting mechanisms such as ASH.

News

  • [8/4/2023] Added instructions and Docker resources for hosting your own WebArena environment. Check out this page for details.
  • [7/29/2023] Added a well-commented script to walk through the environment setup.

Install

# Python 3.10+
conda create -n webarena python=3.10; conda activate webarena
pip install -r requirements.txt
playwright install
pip install -e .

# optional, dev only
pip install -e ".[dev]"
mypy --install-types --non-interactive browser_env agents evaluation_harness
pip install pre-commit
pre-commit install

Quick Walkthrough

Check out this script for a quick walkthrough of how to set up the browser environment and interact with it using the demo sites we host. This script is for educational purposes only; to perform reproducible experiments, please check out the next section. In a nutshell, using WebArena is very similar to using OpenAI Gym. The following code snippet shows how to interact with the environment.

import random

from browser_env import ScriptBrowserEnv, create_id_based_action

# init the environment
env = ScriptBrowserEnv(
    headless=False,
    observation_type="accessibility_tree",
    current_viewport_only=True,
    viewport_size={"width": 1280, "height": 720},
)
# prepare the environment for a configuration defined in a json file
config_file = "config_files/0.json"
obs, info = env.reset(options={"config_file": config_file})
# get the text observation (e.g., html, accessibility tree) through obs["text"]

# create an action that clicks a random element id
element_id = random.randint(0, 1000)
action = create_id_based_action(f"click [{element_id}]")

# take the action
obs, _, terminated, _, info = env.step(action)
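
Building on the snippet above, a minimal interaction loop could look like the sketch below. It is illustrative only: it assumes the click [<id>] action-string format accepted by create_id_based_action and a gym-style close() method, and it simply stops once the environment reports termination.

# a minimal, illustrative interaction loop (not the agent used in the paper)
for _ in range(10):
    element_id = random.randint(0, 1000)
    action = create_id_based_action(f"click [{element_id}]")
    obs, _, terminated, _, info = env.step(action)
    print(obs["text"][:200])  # peek at the textual observation
    if terminated:
        break
env.close()  # assumes the usual gym-style cleanup method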

End-to-end Evaluation

  1. Set up the standalone environment. Please check out this page for details.

  2. Configure the URLs for each website.

export SHOPPING="<your_shopping_site_domain>:7770"
export SHOPPING_ADMIN="<your_e_commerce_cms_domain>:7780/admin"
export REDDIT="<your_reddit_domain>:9999"
export GITLAB="<your_gitlab_domain>:8023"
export MAP="<your_map_domain>:3000"
export WIKIPEDIA="<your_wikipedia_domain>:8888/wikipedia_en_all_maxi_2022-05/A/User:The_other_Kiwix_guy/Landing"
export HOMEPAGE="<your_homepage_domain>:4399" # this is a placeholder

You are encouraged to update the environment variables in the GitHub workflow to ensure the correctness of the unit tests.
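
As a quick sanity check, you may want to confirm that every URL is set and reachable before running experiments. The snippet below is a small illustration (not part of the repository) and assumes the sites respond over plain HTTP.

# sanity-check the configured site URLs (illustrative helper, not part of the repo)
import os
import urllib.request

for name in ["SHOPPING", "SHOPPING_ADMIN", "REDDIT", "GITLAB", "MAP", "WIKIPEDIA", "HOMEPAGE"]:
    url = os.environ.get(name)
    if not url:
        print(f"{name}: not set")
        continue
    if not url.startswith("http"):
        url = "http://" + url
    try:
        status = urllib.request.urlopen(url, timeout=10).status
        print(f"{name}: HTTP {status}")
    except Exception as exc:
        print(f"{name}: unreachable ({exc})")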

  3. Generate the config files for each test example
python scripts/generate_test_data.py

You will see *.json files generated in the config_files folder. Each file contains the configuration for one test example.
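
If you want to inspect a generated example programmatically, the snippet below loads one config and prints a few fields; the field names shown are illustrative of typical entries, and the generated files themselves remain the authoritative schema.

# peek at one generated config file (field names here are illustrative)
import json

with open("config_files/0.json") as f:
    config = json.load(f)

print(config.get("intent"))     # the natural-language task instruction
print(config.get("start_url"))  # the page the episode starts from
print(config.get("sites"))      # which website(s) the task involves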

  4. Obtain the auto-login cookies for all websites
mkdir -p ./.auth
python browser_env/auto_login.py
  5. Set your OpenAI API key: export OPENAI_API_KEY=your_key. A valid OpenAI API key starts with sk-.

  6. Launch the evaluation

# p_cot_id_actree_2s.json is the reasoning agent prompt we used in the paper
python run.py \
  --instruction_path agent/prompts/jsons/p_cot_id_actree_2s.json \
  --test_start_idx 0 \
  --test_end_idx 1 \
  --model gpt-3.5-turbo \
  --result_dir <your_result_dir>

This command runs the first example with the GPT-3.5 reasoning agent. The trajectory will be saved in <your_result_dir>/0.html.

Develop Your Prompt-based Agent

  1. Define the prompts. We provide two baseline agents whose corresponding prompts are listed here. Each prompt is a dictionary with the following keys:
prompt = {
  "intro": <The overall guideline which includes the task description, available action, hint and others>,
  "examples": [
    (
      example_1_observation,
      example_1_response
    ),
    (
      example_2_observation,
      example_2_response
    ),
    ...
  ],
  "template": <How to organize different information such as observation, previous action, instruction, url>,
  "meta_data": {
    "observation": <Which observation space the agent uses>,
    "action_type": <Which action space the agent uses>,
    "keywords": <The keywords used in the template, the program will later enumerate all keywords in the template to see if all of them are correctly replaced with the content>,
    "prompt_constructor": <Which prompt construtor is in used, the prompt constructor will construct the input feed to an LLM and extract the action from the generation, more details below>,
    "action_splitter": <Inside which splitter can we extract the action, used by the prompt constructor>
    }
  }
  2. Implement the prompt constructor. An example prompt constructor using Chain-of-thought/ReAct style reasoning is here. The prompt constructor is a class with the following methods (a minimal sketch is given after this list):
  • construct: constructs the input fed to an LLM
  • _extract_action: given the generation from an LLM, extracts the phrase that corresponds to the action
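
The sketch below shows what such a class might look like. It is a simplified illustration, not the repository's implementation: the class name, the template placeholder names, and the default action splitter are all assumptions made for the example.

import re

class MyPromptConstructor:
    """A minimal, illustrative prompt constructor (hypothetical)."""

    def __init__(self, prompt: dict, action_splitter: str = "```"):
        self.prompt = prompt
        self.action_splitter = action_splitter

    def construct(self, observation: str, url: str, previous_action: str, intent: str) -> str:
        # assemble the intro, the few-shot examples, and the current state into one input
        examples = "\n\n".join(
            f"OBSERVATION:\n{obs}\nRESPONSE: {resp}"
            for obs, resp in self.prompt["examples"]
        )
        # the placeholder names below are illustrative; real templates define their own
        current = self.prompt["template"].format(
            observation=observation,
            url=url,
            previous_action=previous_action,
            objective=intent,
        )
        return f"{self.prompt['intro']}\n\n{examples}\n\n{current}"

    def _extract_action(self, generation: str) -> str:
        # pull out the text wrapped inside the action splitter, e.g. ```click [42]```
        pattern = re.escape(self.action_splitter) + r"(.*?)" + re.escape(self.action_splitter)
        match = re.search(pattern, generation, re.DOTALL)
        if match is None:
            raise ValueError(f"Cannot extract an action from: {generation!r}")
        return match.group(1).strip()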

Citation

If you use our environment or data, please cite our paper:

@article{zhou2023webarena,
  title={WebArena: A Realistic Web Environment for Building Autonomous Agents},
  author={Zhou, Shuyan and Xu, Frank F and Zhu, Hao and Zhou, Xuhui and Lo, Robert and Sridhar, Abishek and Cheng, Xianyi and Bisk, Yonatan and Fried, Daniel and Alon, Uri and others},
  journal={arXiv preprint arXiv:2307.13854},
  year={2023}
}