diff --git a/README.md b/README.md
index e236edb..634f2dd 100644
--- a/README.md
+++ b/README.md
@@ -1,17 +1,23 @@
-[](https://www.python.org/downloads/release/python-3109/)
-[](https://pre-commit.com/)
-
-[](https://mypy-lang.org/)
-[](https://beartype.readthedocs.io)
-
# WebArena: A Realistic Web Environment for Building Autonomous Agents
-[[Website]](https://webarena.dev/)
-[[Paper]](https://arxiv.org/pdf/2307.13854.pdf)
+
+WebArena is a standalone, self-hostable web environment for building autonomous agents
+
+
+
+
+
-> WebArena is a standalone, self-hostable web environment for building autonomous agents
-> **Note** This README is still under constructions. Stay tuned!
 
 ## News
 * [8/4/2023] Added the instructions and the docker resources to host your own WebArena Environment. Check out [this page](environment_docker/README.md) for details.
 * [7/29/2023] Added [a well commented script](minimal_example.py) to walk through the environment setup.
@@ -25,7 +31,7 @@
 pip install -e .
 # optional, dev only
 pip install -e ".[dev]"
-mypy --install-types --non-interactive browser_env
+mypy --install-types --non-interactive browser_env agents evaluation_harness
 pip install pre-commit
 pre-commit install
 ```
@@ -33,11 +39,70 @@ pre-commit install
 Check out [this script](minimal_example.py) for a quick walkthrough on how to set up the environment and interact with it.
 
 ## To Reproduce Our Results
-* Setup the `environ` as described in the quick walkthrough
-* `python scripts/generate_test_data.py` will generate individual config file for each test example in [config_files](config_files)
-* `bash prepare.sh` to obtain the auto-login cookies for all websites
-* export OPENAI_API_KEY=your_key
-* `python run.py --instruction_path agent/prompts/jsons/p_cot_id_actree_2s.json --test_start_idx 0 --test_end_idx 1 --model gpt-3.5-turbo --result_dir your_result_dir` to run the first example with GPT-3.5 reasoning agent. The trajectory will be saved in `your_result_dir/0.html`
+1. Configurate the urls for each website, in the following example, we use the demo websites we host as an example. You can replace the URLs with your own websites if you [host your own WebArena environment](./environment_docker/).
+```bash
+export SHOPPING="http://ec2-3-131-244-37.us-east-2.compute.amazonaws.com:7770"
+export SHOPPING_ADMIN="http://ec2-3-131-244-37.us-east-2.compute.amazonaws.com:7780/admin"
+export REDDIT="http://ec2-3-131-244-37.us-east-2.compute.amazonaws.com:9999"
+export GITLAB="http://ec2-3-131-244-37.us-east-2.compute.amazonaws.com:8023"
+export MAP="http://ec2-3-131-244-37.us-east-2.compute.amazonaws.com:3000"
+export WIKIPEDIA="http://ec2-3-131-244-37.us-east-2.compute.amazonaws.com:8888/wikipedia_en_all_maxi_2022-05/A/User:The_other_Kiwix_guy/Landing"
+export HOMEPAGE="PASS" # this is a placeholder
+```
+
+2. Generate config file for each test example
+```bash
+python scripts/generate_test_data.py
+```
+You will see `*.json` files generated in [config_files](./config_files) folder. Each file contains the configuration for one test example.
+
+3. Obtain the auto-login cookies for all websites
+```
+bash prepare.sh
+```
+4. export `OPENAI_API_KEY=your_key`, a valid OpenAI API key starts with `sk-`
+
+5. Launch the evaluation
+```bash
+python run.py \
+  --instruction_path agent/prompts/jsons/p_cot_id_actree_2s.json \ # this is the reasoning agent prompt we used in the paper
+  --test_start_idx 0 \
+  --test_end_idx 1 \
+  --model gpt-3.5-turbo \
+  --result_dir{action['raw_prediction']}{repr(action)}{action2str(action, action_set_tag, node_content)}{_config_str}\n"
+ task_id = _config["task_id"]
+
+        self.action_set_tag = action_set_tag
+
+        self.render_file = open(
+            Path(result_dir) / f"render_{task_id}.html", "a+"
+        )
+        self.render_file.truncate(0)
+        # write init template
+        self.render_file.write(HTML_TEMPLATE.format(body=f"{_config_str}"))
+        self.render_file.read()
+        self.render_file.flush()
+
+    def render(
+        self,
+        action: Action,
+        state_info: StateInfo,
+        meta_data: dict[str, Any],
+        render_screenshot: bool = False,
+    ) -> None:
+        """Render the trajectory"""
+        # text observation
+        observation = state_info["observation"]
+        text_obs = observation["text"]
+        info = state_info["info"]
+ new_content = f"{text_obs}{action['raw_prediction']}{repr(action)}{action2str(action, action_set_tag, node_content)}{_config_str}\n"
- task_id = _config["task_id"]
-
-        self.action_set_tag = action_set_tag
-
-        self.render_file = open(
-            Path(result_dir) / f"render_{task_id}.html", "a+"
-        )
-        self.render_file.truncate(0)
-        # write init template
-        self.render_file.write(HTML_TEMPLATE.format(body=f"{_config_str}"))
-        self.render_file.read()
-        self.render_file.flush()
-
-    def render(
-        self,
-        action: Action,
-        state_info: StateInfo,
-        meta_data: dict[str, Any],
-        render_screenshot: bool = False,
-    ) -> None:
-        """Render the trajectory"""
-        # text observation
-        observation = state_info["observation"]
-        text_obs = observation["text"]
-        info = state_info["info"]
- new_content = f"{text_obs}