diff --git a/README.md b/README.md
index a1386aa..0650635 100644
--- a/README.md
+++ b/README.md
@@ -24,7 +24,7 @@
 ![Overview](media/overview.png)
 
 ## Update on 12/5/2024
-> [!IMPORTANT] 
+> [!IMPORTANT]
 > This repository hosts the *canonical* implementation of WebArena to reproduce the results reported in the paper. The web navigation infrastructure has been significantly enhanced by [AgentLab](https://github.com/ServiceNow/AgentLab/), introducing several key features: (1) support for parallel experiments using [BrowserGym](https://github.com/ServiceNow/BrowserGym), (2) integration of popular web navigation benchmarks (e.g., VisualWebArena) within a unified framework, (3) unified leaderboard reporting, and (4) improved handling of environment edge cases. We strongly recommend using this framework for your experiments.
 
 ## News
diff --git a/WEBARENA_DEPLOYMENT_GUIDE.md b/WEBARENA_DEPLOYMENT_GUIDE.md
index 57c3b22..895de0d 100644
--- a/WEBARENA_DEPLOYMENT_GUIDE.md
+++ b/WEBARENA_DEPLOYMENT_GUIDE.md
@@ -38,6 +38,8 @@ The WebArena deployment consists of two main components:
 3. **User Data**: Copy the entire contents of `webarena-map-backend-boot-init.yaml` into the "User data" field during instance launch.
 
 4. **Key Pair**: Select or create an SSH key pair for access.
+   - **Important**: Save the private key file (`.pem`) securely as you'll need it for both backend and frontend instances
+   - If using AWS CLI, you can create a key pair with: `aws ec2 create-key-pair --key-name webarena-key --query 'KeyMaterial' --output text > webarena-key.pem && chmod 600 webarena-key.pem`
 
 5. **Launch the instance** and note the **Instance ID** and **Public IP**.
 
@@ -99,6 +101,8 @@ curl "http://:5000/route/v1/driving/-79.9959,40.4406;-79.9,40.45?over
 ```
 
 3. **Key Pair**: Use the same SSH key pair as the backend server.
+   - **Critical**: Ensure you have access to the private key file from Step 1
+   - If you don't have the key, you'll need to terminate and relaunch the instance with a new key pair
 
 4. **Launch the instance** and note the **Instance ID** and **Public IP**.
 
@@ -275,10 +279,32 @@ cat /home/ubuntu/openstreetmap-website/config/settings.yml | grep -A5 -B5 nomina
 - Consider using VPC and private subnets for backend services
 - Rotate any AWS credentials used during setup
 
+## Resource Cleanup
+
+When you're done with testing, clean up AWS resources to avoid ongoing charges:
+
+```bash
+# Get instance IDs
+aws ec2 describe-instances --region us-east-2 --filters "Name=tag:Name,Values=webarena-*" --query 'Reservations[*].Instances[*].[InstanceId,Tags[?Key==`Name`].Value|[0],State.Name]' --output table
+
+# Terminate instances
+aws ec2 terminate-instances --region us-east-2 --instance-ids
+
+# Release Elastic IP (optional, but saves costs)
+aws ec2 describe-addresses --region us-east-2 --query 'Addresses[*].[AllocationId,PublicIp]' --output table
+aws ec2 release-address --region us-east-2 --allocation-id
+
+# Delete security groups (optional)
+aws ec2 delete-security-group --region us-east-2 --group-id
+
+# Delete key pair (optional)
+aws ec2 delete-key-pair --region us-east-2 --key-name webarena-key
+```
+
 ## Support
 
 If you encounter issues:
 1. Check the troubleshooting section above
 2. Review logs: `/var/log/webarena-map-bootstrap.log` on backend
 3. Verify all configuration changes were applied correctly
-4. Ensure both instances are in the same AWS region for optimal performance
\ No newline at end of file
+4. Ensure both instances are in the same AWS region for optimal performance
diff --git a/environment_docker/README.md b/environment_docker/README.md
index 5391bda..af51cee 100644
--- a/environment_docker/README.md
+++ b/environment_docker/README.md
@@ -199,7 +199,7 @@ Then run the tile server:
 docker run --volume=osm-data:/data/database/ --volume=osm-tiles:/data/tiles/ -p 8080:80 --detach=true overv/openstreetmap-tile-server run
 ```
 
-Now, inside the file `webarena/openstreetmap-website/vendor/assets/leaflet/leaflet.osm.js`, change `http://ogma.lti.cs.cmu.edu:8080/tile/{z}/{x}/{y}.png` to `http://:8080/tile/{z}/{x}/{y}.png` 
+Now, inside the file `webarena/openstreetmap-website/vendor/assets/leaflet/leaflet.osm.js`, change `http://ogma.lti.cs.cmu.edu:8080/tile/{z}/{x}/{y}.png` to `http://:8080/tile/{z}/{x}/{y}.png`
 
 > [!NOTE]
 > By default, the `url` in `TileLayer` and `Mapnik` is set to `"http://ogma.lti.cs.cmu.edu:8080/tile/{z}/{x}/{y}.png"`. You replace it with `https://tile.openstreetmap.org/{z}/{x}/{y}.png` (the official link) as a way to test in case you run into issues during the setup.
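
Review note on the `environment_docker/README.md` hunk: the tile URL change it describes is a plain string substitution, so it could be scripted rather than edited by hand. The following is only a sketch under stated assumptions: `rewrite_tile_url` is a hypothetical helper, not something in the repository, and `203.0.113.10` is a documentation-range placeholder for your own tile server address. It runs against a throwaway file so that it is safe to try.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of the repository): swap the default tile host
# for your own server in a file such as leaflet.osm.js.
#   $1 = path to the file to rewrite
#   $2 = host name or IP of your tile server (assumption: it serves on port 8080)
rewrite_tile_url() {
  local file=$1 host=$2
  local old='http://ogma.lti.cs.cmu.edu:8080/tile/'
  local new="http://${host}:8080/tile/"
  # Use '|' as the sed delimiter so the slashes in the URLs need no escaping.
  sed "s|${old}|${new}|g" "$file" > "${file}.tmp"
  mv "${file}.tmp" "$file"
}

# Demonstrate on a temporary file instead of the real asset.
demo=$(mktemp)
echo 'url: "http://ogma.lti.cs.cmu.edu:8080/tile/{z}/{x}/{y}.png"' > "$demo"
rewrite_tile_url "$demo" "203.0.113.10"
cat "$demo"
```

To apply it for real, the first argument would be `webarena/openstreetmap-website/vendor/assets/leaflet/leaflet.osm.js`, the path named in the hunk.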
diff --git a/webarena-map-backend-boot-init.yaml b/webarena-map-backend-boot-init.yaml
index 6dbca17..c5391e9 100644
--- a/webarena-map-backend-boot-init.yaml
+++ b/webarena-map-backend-boot-init.yaml
@@ -41,25 +41,25 @@ runcmd:
   # Wait for package locks to be released
   - while fuser /var/lib/dpkg/lock-frontend >/dev/null 2>&1; do echo "Waiting for dpkg lock..."; sleep 5; done
   - while fuser /var/lib/apt/lists/lock >/dev/null 2>&1; do echo "Waiting for apt lock..."; sleep 5; done
-  
+
   # Enable and start Docker with retries
   - systemctl enable docker
   - systemctl start docker
   - sleep 10
-  
+
   # Add ubuntu user to docker group
   - usermod -aG docker ubuntu
-  
+
   # Create necessary directories
   - mkdir -p /opt/osm_dump /opt/osrm /var/lib/docker/volumes
   - mkdir -p /root/logs
-  
+
   # Install AWS CLI v2 (awscli package not available in Ubuntu 24.04)
   - curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o /tmp/awscliv2.zip
   - unzip /tmp/awscliv2.zip -d /tmp/
   - /tmp/aws/install
   - rm -rf /tmp/awscliv2.zip /tmp/aws
-  
+
   # Configure AWS CLI for S3 access (no credentials needed for public buckets)
   - mkdir -p /root/.aws
   - |
@@ -68,19 +68,19 @@ runcmd:
     region = us-east-2
     output = json
     EOF
-  
+
   # Create a comprehensive bootstrap script that runs in background
   - |
     cat > /root/bootstrap.sh << 'EOF'
     #!/bin/bash
     set -euo pipefail
     exec > >(tee -a /var/log/webarena-map-bootstrap.log) 2>&1
-    
+
     echo "$(date): Starting WebArena map server bootstrap"
     echo "$(date): System info: $(uname -a)"
     echo "$(date): Available memory: $(free -h)"
     echo "$(date): Available disk space: $(df -h)"
-    
+
     # Check if we have enough disk space (need at least 200GB free)
     AVAILABLE_GB=$(df / | awk 'NR==2 {print int($4/1024/1024)}')
     echo "$(date): Available disk space: ${AVAILABLE_GB}GB"
@@ -88,7 +88,7 @@ runcmd:
       echo "$(date): ERROR: Insufficient disk space. Need at least 200GB, have ${AVAILABLE_GB}GB"
       exit 1
     fi
-    
+
     # Function to retry commands with exponential backoff
     retry() {
       local n=1
@@ -108,7 +108,7 @@ runcmd:
         }
       done
     }
-    
+
     # Function to monitor background processes
     monitor_extraction() {
       local pid=$1
@@ -127,79 +127,79 @@ runcmd:
         return $exit_code
       fi
     }
-    
+
     # Download and extract data with retries and parallel processing where safe
     echo "$(date): Starting data downloads..."
-    
+
     # Download all files first (can be done in parallel)
     echo "$(date): Downloading OSM tile server data..."
     retry aws s3 cp --no-sign-request s3://webarena-map-server-data/osm_tile_server.tar /root/osm_tile_server.tar &
     DOWNLOAD_TILE_PID=$!
-    
+
     echo "$(date): Downloading Nominatim data..."
     retry aws s3 cp --no-sign-request s3://webarena-map-server-data/nominatim_volumes.tar /root/nominatim_volumes.tar &
     DOWNLOAD_NOM_PID=$!
-    
+
     echo "$(date): Downloading OSM dump..."
     retry aws s3 cp --no-sign-request s3://webarena-map-server-data/osm_dump.tar /root/osm_dump.tar &
     DOWNLOAD_DUMP_PID=$!
-    
+
     echo "$(date): Downloading OSRM routing data..."
     retry aws s3 cp --no-sign-request s3://webarena-map-server-data/osrm_routing.tar /root/osrm_routing.tar &
     DOWNLOAD_OSRM_PID=$!
-    
+
     # Wait for all downloads to complete
     echo "$(date): Waiting for downloads to complete..."
     monitor_extraction $DOWNLOAD_TILE_PID "OSM tile server download"
     monitor_extraction $DOWNLOAD_NOM_PID "Nominatim download"
     monitor_extraction $DOWNLOAD_DUMP_PID "OSM dump download"
     monitor_extraction $DOWNLOAD_OSRM_PID "OSRM routing download"
-    
+
     echo "$(date): All downloads completed. Starting extractions..."
-    
+
     # Extract files sequentially to avoid memory issues and clean up immediately
     echo "$(date): Extracting OSM tile server data..."
     tar -C /var/lib/docker/volumes -xf /root/osm_tile_server.tar
     rm -f /root/osm_tile_server.tar # Clean up immediately to save space
     echo "$(date): ✅ OSM tile server data extracted and cleaned up"
-    
+
     echo "$(date): Extracting Nominatim data..."
     tar -C /var/lib/docker/volumes -xf /root/nominatim_volumes.tar
     rm -f /root/nominatim_volumes.tar # Clean up immediately to save space
     echo "$(date): ✅ Nominatim data extracted and cleaned up"
-    
+
     echo "$(date): Extracting OSM dump..."
     tar -C /opt/osm_dump -xf /root/osm_dump.tar
     rm -f /root/osm_dump.tar # Clean up immediately to save space
     echo "$(date): ✅ OSM dump extracted and cleaned up"
-    
+
     echo "$(date): Extracting OSRM routing data..."
     tar -C /opt/osrm -xf /root/osrm_routing.tar
     rm -f /root/osrm_routing.tar # Clean up immediately to save space
     echo "$(date): ✅ OSRM routing data extracted and cleaned up"
-    
+
     # Verify extracted data
     echo "$(date): Verifying extracted data..."
     ls -la /var/lib/docker/volumes/ | head -20
     ls -la /opt/osm_dump/ | head -10
     ls -la /opt/osrm/ | head -10
-    
+
     # Pull Docker images
     echo "$(date): Pulling Docker images..."
     docker pull overv/openstreetmap-tile-server
     docker pull mediagis/nominatim:4.2
     docker pull ghcr.io/project-osrm/osrm-backend:v5.27.1
-    
+
     # Start containers with restart policies and proper resource limits
     echo "$(date): Starting tile server..."
     docker run --name tile --restart unless-stopped \
       --memory=2g --memory-swap=4g \
       --volume=osm-data:/data/database/ --volume=osm-tiles:/data/tiles/ \
       -p 8080:80 -d overv/openstreetmap-tile-server run
-    
+
     # Wait a bit for tile server to initialize
     sleep 30
-    
+
     echo "$(date): Starting Nominatim geocoding server..."
     docker run --name nominatim --restart unless-stopped \
       --memory=4g --memory-swap=8g \
@@ -210,53 +210,53 @@ runcmd:
       --volume=nominatim-data:/var/lib/postgresql/14/main \
       --volume=nominatim-flatnode:/nominatim/flatnode \
       -p 8085:8080 -d mediagis/nominatim:4.2 /app/start.sh
-    
+
     # Wait for Nominatim to initialize
     sleep 60
-    
+
     echo "$(date): Starting OSRM routing servers..."
-    
+
     # Start OSRM car routing
     docker run --name osrm-car --restart unless-stopped \
       --memory=1g --memory-swap=2g \
       --volume=/opt/osrm/car:/data -p 5000:5000 -d \
       ghcr.io/project-osrm/osrm-backend:v5.27.1 osrm-routed --algorithm mld /data/us-northeast-latest.osrm
-    
+
     # Start OSRM bike routing
     docker run --name osrm-bike --restart unless-stopped \
       --memory=1g --memory-swap=2g \
       --volume=/opt/osrm/bike:/data -p 5001:5000 -d \
       ghcr.io/project-osrm/osrm-backend:v5.27.1 osrm-routed --algorithm mld /data/us-northeast-latest.osrm
-    
+
     # Start OSRM foot routing
     docker run --name osrm-foot --restart unless-stopped \
       --memory=1g --memory-swap=2g \
       --volume=/opt/osrm/foot:/data -p 5002:5000 -d \
       ghcr.io/project-osrm/osrm-backend:v5.27.1 osrm-routed --algorithm mld /data/us-northeast-latest.osrm
-    
+
     echo "$(date): All services started. Waiting for initialization..."
     sleep 120
-    
+
     echo "$(date): Verifying service health..."
     docker ps --format "table {{.Names}}\t{{.Image}}\t{{.Status}}\t{{.Ports}}"
-    
+
     # Test service endpoints
     echo "$(date): Testing service endpoints..."
-    
+
     # Test tile server
     if curl -f -s -o /dev/null "http://localhost:8080/tile/0/0/0.png"; then
       echo "$(date): ✅ Tile server is responding"
     else
       echo "$(date): ❌ Tile server is not responding"
     fi
-    
+
     # Test Nominatim
     if curl -f -s -o /dev/null "http://localhost:8085/search?q=test&format=json&limit=1"; then
       echo "$(date): ✅ Nominatim is responding"
     else
       echo "$(date): ❌ Nominatim is not responding"
     fi
-    
+
     # Test OSRM services
     for service in car bike foot; do
       port=$((5000 + $(echo "car bike foot" | tr ' ' '\n' | grep -n $service | cut -d: -f1) - 1))
@@ -266,9 +266,9 @@ runcmd:
         echo "$(date): ❌ OSRM $service routing is not responding"
       fi
     done
-    
+
     # All tar files already cleaned up during extraction
-    
+
     # Final status report
     echo "$(date): Bootstrap completed!"
     echo "$(date): Final service status:"
@@ -277,17 +277,17 @@ runcmd:
     df -h
     echo "$(date): Memory usage:"
     free -h
-    
+
     echo "$(date): Services are available at:"
     echo " - Tile server: http://$(curl -s http://169.254.169.254/latest/meta-data/public-ipv4):8080/tile/{z}/{x}/{y}.png"
     echo " - Geocoding: http://$(curl -s http://169.254.169.254/latest/meta-data/public-ipv4):8085/"
     echo " - OSRM Car: http://$(curl -s http://169.254.169.254/latest/meta-data/public-ipv4):5000/"
     echo " - OSRM Bike: http://$(curl -s http://169.254.169.254/latest/meta-data/public-ipv4):5001/"
     echo " - OSRM Foot: http://$(curl -s http://169.254.169.254/latest/meta-data/public-ipv4):5002/"
-    
+
     echo "$(date): Bootstrap script completed successfully!"
     EOF
-  
+
   # Make bootstrap script executable and run it in background
   - chmod +x /root/bootstrap.sh
   - nohup /root/bootstrap.sh > /var/log/webarena-map-bootstrap.log 2>&1 &
@@ -308,4 +308,4 @@ final_message: |
   Services will be available at:
   - Tiles: http://:8080/tile/{z}/{x}/{y}.png
   - Geocoding: http://:8085/
-  - Routing: http://:5000 (car), :5001 (bike), :5002 (foot)
\ No newline at end of file
+  - Routing: http://:5000 (car), :5001 (bike), :5002 (foot)
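
Review note on the OSRM health-check loop in `bootstrap.sh`: the port is derived from the position of the profile name in the list `car bike foot` (via `tr`, `grep -n`, and `cut`), which yields 5000, 5001, and 5002. One possible way to make that mapping explicit is sketched below. It is not what this PR does, and `osrm_port` is a hypothetical helper name introduced only for illustration. The port numbers are taken from the `-p 5000:5000`, `-p 5001:5000`, and `-p 5002:5000` flags in the `docker run` commands.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: map an OSRM profile name to the host port that the
# bootstrap script publishes for it (car -> 5000, bike -> 5001, foot -> 5002).
osrm_port() {
  case "$1" in
    car)  echo 5000 ;;
    bike) echo 5001 ;;
    foot) echo 5002 ;;
    *)    echo "unknown OSRM profile: $1" >&2; return 1 ;;
  esac
}

# Print the mapping for the three profiles used in the loop.
for service in car bike foot; do
  echo "$service -> $(osrm_port "$service")"
done
```

Inside the existing loop, `port=$(osrm_port "$service")` would stand in for the arithmetic expression. Unlike the `grep -n` lookup, an unknown profile name then fails loudly instead of producing an unintended port.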