Improve deployment guide and fix formatting

- Add key pair management guidance based on deployment experience
- Add resource cleanup section for cost management
- Fix trailing whitespace and end-of-file formatting issues
- All pre-commit checks now pass

Co-authored-by: openhands <openhands@all-hands.dev>
openhands 2025-09-02 12:40:18 +00:00
parent 3adbc3cf94
commit 79fc3d8303
4 changed files with 72 additions and 46 deletions

@ -24,7 +24,7 @@
![Overview](media/overview.png)
## Update on 12/5/2024
> [!IMPORTANT]
> This repository hosts the *canonical* implementation of WebArena to reproduce the results reported in the paper. The web navigation infrastructure has been significantly enhanced by [AgentLab](https://github.com/ServiceNow/AgentLab/), introducing several key features: (1) support for parallel experiments using [BrowserGym](https://github.com/ServiceNow/BrowserGym), (2) integration of popular web navigation benchmarks (e.g., VisualWebArena) within a unified framework, (3) unified leaderboard reporting, and (4) improved handling of environment edge cases. We strongly recommend using this framework for your experiments.
## News

@ -38,6 +38,8 @@ The WebArena deployment consists of two main components:
3. **User Data**: Copy the entire contents of `webarena-map-backend-boot-init.yaml` into the "User data" field during instance launch.
4. **Key Pair**: Select or create an SSH key pair for access.
- **Important**: Save the private key file (`.pem`) securely; you will need it to access both the backend and frontend instances.
- If using AWS CLI, you can create a key pair with: `aws ec2 create-key-pair --key-name webarena-key --query 'KeyMaterial' --output text > webarena-key.pem && chmod 600 webarena-key.pem`
5. **Launch the instance** and note the **Instance ID** and **Public IP**.
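
As a minimal sketch of the key handling described above, the helper below checks that a private key file exists and is not readable by other users before it is passed to `ssh`. The function name `check_key_perms` is hypothetical, the check is stricter than `ssh` itself requires (it accepts only mode 600 or 400), and it assumes GNU `stat` as found on Linux.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): succeed only if the key
# file exists and its mode is 600 or 400.
check_key_perms() {
  local key_file=$1
  [ -f "$key_file" ] || { echo "missing: $key_file" >&2; return 1; }
  local mode
  # GNU stat (Linux). On BSD/macOS the equivalent is `stat -f %Lp`.
  mode=$(stat -c %a "$key_file")
  case "$mode" in
    600|400) return 0 ;;
    *) echo "mode $mode is too permissive; run: chmod 600 $key_file" >&2
       return 1 ;;
  esac
}
```

A typical use would be `check_key_perms webarena-key.pem && ssh -i webarena-key.pem ubuntu@<PUBLIC_IP>`, so that a permissions problem is reported before the connection attempt.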
@ -99,6 +101,8 @@ curl "http://<PUBLIC_IP>:5000/route/v1/driving/-79.9959,40.4406;-79.9,40.45?over
```
3. **Key Pair**: Use the same SSH key pair as the backend server.
- **Critical**: Ensure you have access to the private key file from Step 1
- If you don't have the key, you'll need to terminate and relaunch the instance with a new key pair
4. **Launch the instance** and note the **Instance ID** and **Public IP**.
@ -275,10 +279,32 @@ cat /home/ubuntu/openstreetmap-website/config/settings.yml | grep -A5 -B5 nomina
- Consider using VPC and private subnets for backend services
- Rotate any AWS credentials used during setup
## Resource Cleanup
When you're done with testing, clean up AWS resources to avoid ongoing charges:
```bash
# Get instance IDs
aws ec2 describe-instances --region us-east-2 --filters "Name=tag:Name,Values=webarena-*" --query 'Reservations[*].Instances[*].[InstanceId,Tags[?Key==`Name`].Value|[0],State.Name]' --output table
# Terminate instances
aws ec2 terminate-instances --region us-east-2 --instance-ids <FRONTEND_INSTANCE_ID> <BACKEND_INSTANCE_ID>
# Release Elastic IP (optional, but saves costs)
aws ec2 describe-addresses --region us-east-2 --query 'Addresses[*].[AllocationId,PublicIp]' --output table
aws ec2 release-address --region us-east-2 --allocation-id <ALLOCATION_ID>
# Delete security groups (optional)
aws ec2 delete-security-group --region us-east-2 --group-id <SECURITY_GROUP_ID>
# Delete key pair (optional)
aws ec2 delete-key-pair --region us-east-2 --key-name webarena-key
```
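
Because termination cannot be undone, it may help to validate the IDs before running the cleanup commands. The sketch below is one way to do that, assuming instance IDs have the usual shape of `i-` followed by 8 or 17 lowercase hex characters. The names `is_instance_id` and `print_terminate_cmd` are hypothetical, and the wrapper only prints the command for review rather than executing it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: accept only strings shaped like EC2 instance IDs
# ("i-" followed by 8 or 17 lowercase hex characters).
is_instance_id() {
  [[ $1 =~ ^i-([0-9a-f]{8}|[0-9a-f]{17})$ ]]
}

# Hypothetical wrapper: validate every ID, then print the terminate command
# instead of running it, so it can be reviewed first.
print_terminate_cmd() {
  local region=$1; shift
  [ "$#" -gt 0 ] || { echo "no instance IDs given" >&2; return 1; }
  local id
  for id in "$@"; do
    is_instance_id "$id" || { echo "not an instance ID: $id" >&2; return 1; }
  done
  echo "aws ec2 terminate-instances --region $region --instance-ids $*"
}
```

An unreplaced placeholder such as `<BACKEND_INSTANCE_ID>` fails the check, which catches a copy-paste slip before anything reaches AWS.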
## Support
If you encounter issues:
1. Check the troubleshooting section above
2. Review logs: `/var/log/webarena-map-bootstrap.log` on backend
3. Verify all configuration changes were applied correctly
4. Ensure both instances are in the same AWS region for optimal performance
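
For the log review step, a rough sketch of a triage helper is shown below. It classifies the bootstrap log using messages that the bootstrap script itself writes (`ERROR`, `is not responding`, and `Bootstrap script completed successfully!`). The function name `bootstrap_status` and the four status labels are hypothetical, and the classification is a heuristic rather than a definitive health check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize the state of the bootstrap from its log.
bootstrap_status() {
  local log_file=$1
  if [ ! -f "$log_file" ]; then
    echo "missing"        # log not created yet, or wrong path
  elif grep -q "ERROR" "$log_file"; then
    echo "failed"         # e.g. the insufficient disk space check
  elif grep -q "Bootstrap script completed successfully!" "$log_file"; then
    if grep -q "is not responding" "$log_file"; then
      echo "degraded"     # script finished, but an endpoint test failed
    else
      echo "done"
    fi
  else
    echo "running"        # no terminal message written yet
  fi
}
```

On the backend this could be run as `bootstrap_status /var/log/webarena-map-bootstrap.log`.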

@ -199,7 +199,7 @@ Then run the tile server:
docker run --volume=osm-data:/data/database/ --volume=osm-tiles:/data/tiles/ -p 8080:80 --detach=true overv/openstreetmap-tile-server run
```
Now, inside the file `webarena/openstreetmap-website/vendor/assets/leaflet/leaflet.osm.js`, change `http://ogma.lti.cs.cmu.edu:8080/tile/{z}/{x}/{y}.png` to `http://<public-url-to-your-tile-server>:8080/tile/{z}/{x}/{y}.png`
> [!NOTE]
> By default, the `url` in `TileLayer` and `Mapnik` is set to `"http://ogma.lti.cs.cmu.edu:8080/tile/{z}/{x}/{y}.png"`. If you run into issues during setup, you can temporarily replace it with `https://tile.openstreetmap.org/{z}/{x}/{y}.png` (the official link) to test.
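
The URL change can also be scripted. The sketch below is one possible approach, not the project's own tooling: `set_tile_host` is a hypothetical helper that rewrites every occurrence of the default tile host in a file, and it assumes the new host name contains no `|` or `&` characters (both are special in the `sed` expression used).

```shell
#!/usr/bin/env bash
# Hypothetical helper: point every default tile URL in a file at a new host.
set_tile_host() {
  local file=$1 new_host=$2
  # Dots are escaped so they match literally in the sed pattern.
  local old='http://ogma\.lti\.cs\.cmu\.edu:8080/tile/'
  local tmp
  tmp=$(mktemp)
  # Write to a temp file first; this avoids `sed -i`, whose syntax differs
  # between GNU and BSD sed. Copying back with cat keeps the file's mode.
  sed "s|${old}|http://${new_host}:8080/tile/|g" "$file" > "$tmp" \
    && cat "$tmp" > "$file"
  local rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

For example, `set_tile_host webarena/openstreetmap-website/vendor/assets/leaflet/leaflet.osm.js <public-url-to-your-tile-server>` would apply the edit described above.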

@ -41,25 +41,25 @@ runcmd:
# Wait for package locks to be released
- while fuser /var/lib/dpkg/lock-frontend >/dev/null 2>&1; do echo "Waiting for dpkg lock..."; sleep 5; done
- while fuser /var/lib/apt/lists/lock >/dev/null 2>&1; do echo "Waiting for apt lock..."; sleep 5; done
# Enable and start Docker with retries
- systemctl enable docker
- systemctl start docker
- sleep 10
# Add ubuntu user to docker group
- usermod -aG docker ubuntu
# Create necessary directories
- mkdir -p /opt/osm_dump /opt/osrm /var/lib/docker/volumes
- mkdir -p /root/logs
# Install AWS CLI v2 (awscli package not available in Ubuntu 24.04)
- curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o /tmp/awscliv2.zip
- unzip /tmp/awscliv2.zip -d /tmp/
- /tmp/aws/install
- rm -rf /tmp/awscliv2.zip /tmp/aws
# Configure AWS CLI for S3 access (no credentials needed for public buckets)
- mkdir -p /root/.aws
- |
@ -68,19 +68,19 @@ runcmd:
region = us-east-2
output = json
EOF
# Create a comprehensive bootstrap script that runs in background
- |
cat > /root/bootstrap.sh << 'EOF'
#!/bin/bash
set -euo pipefail
exec > >(tee -a /var/log/webarena-map-bootstrap.log) 2>&1
echo "$(date): Starting WebArena map server bootstrap"
echo "$(date): System info: $(uname -a)"
echo "$(date): Available memory: $(free -h)"
echo "$(date): Available disk space: $(df -h)"
# Check if we have enough disk space (need at least 200GB free)
AVAILABLE_GB=$(df / | awk 'NR==2 {print int($4/1024/1024)}')
echo "$(date): Available disk space: ${AVAILABLE_GB}GB"
@ -88,7 +88,7 @@ runcmd:
echo "$(date): ERROR: Insufficient disk space. Need at least 200GB, have ${AVAILABLE_GB}GB"
exit 1
fi
# Function to retry commands with exponential backoff
retry() {
local n=1
@ -108,7 +108,7 @@ runcmd:
}
done
}
# Function to monitor background processes
monitor_extraction() {
local pid=$1
@ -127,79 +127,79 @@ runcmd:
return $exit_code
fi
}
# Download and extract data with retries and parallel processing where safe
echo "$(date): Starting data downloads..."
# Download all files first (can be done in parallel)
echo "$(date): Downloading OSM tile server data..."
retry aws s3 cp --no-sign-request s3://webarena-map-server-data/osm_tile_server.tar /root/osm_tile_server.tar &
DOWNLOAD_TILE_PID=$!
echo "$(date): Downloading Nominatim data..."
retry aws s3 cp --no-sign-request s3://webarena-map-server-data/nominatim_volumes.tar /root/nominatim_volumes.tar &
DOWNLOAD_NOM_PID=$!
echo "$(date): Downloading OSM dump..."
retry aws s3 cp --no-sign-request s3://webarena-map-server-data/osm_dump.tar /root/osm_dump.tar &
DOWNLOAD_DUMP_PID=$!
echo "$(date): Downloading OSRM routing data..."
retry aws s3 cp --no-sign-request s3://webarena-map-server-data/osrm_routing.tar /root/osrm_routing.tar &
DOWNLOAD_OSRM_PID=$!
# Wait for all downloads to complete
echo "$(date): Waiting for downloads to complete..."
monitor_extraction $DOWNLOAD_TILE_PID "OSM tile server download"
monitor_extraction $DOWNLOAD_NOM_PID "Nominatim download"
monitor_extraction $DOWNLOAD_DUMP_PID "OSM dump download"
monitor_extraction $DOWNLOAD_OSRM_PID "OSRM routing download"
echo "$(date): All downloads completed. Starting extractions..."
# Extract files sequentially to avoid memory issues and clean up immediately
echo "$(date): Extracting OSM tile server data..."
tar -C /var/lib/docker/volumes -xf /root/osm_tile_server.tar
rm -f /root/osm_tile_server.tar # Clean up immediately to save space
echo "$(date): ✅ OSM tile server data extracted and cleaned up"
echo "$(date): Extracting Nominatim data..."
tar -C /var/lib/docker/volumes -xf /root/nominatim_volumes.tar
rm -f /root/nominatim_volumes.tar # Clean up immediately to save space
echo "$(date): ✅ Nominatim data extracted and cleaned up"
echo "$(date): Extracting OSM dump..."
tar -C /opt/osm_dump -xf /root/osm_dump.tar
rm -f /root/osm_dump.tar # Clean up immediately to save space
echo "$(date): ✅ OSM dump extracted and cleaned up"
echo "$(date): Extracting OSRM routing data..."
tar -C /opt/osrm -xf /root/osrm_routing.tar
rm -f /root/osrm_routing.tar # Clean up immediately to save space
echo "$(date): ✅ OSRM routing data extracted and cleaned up"
# Verify extracted data
echo "$(date): Verifying extracted data..."
ls -la /var/lib/docker/volumes/ | head -20
ls -la /opt/osm_dump/ | head -10
ls -la /opt/osrm/ | head -10
# Pull Docker images
echo "$(date): Pulling Docker images..."
docker pull overv/openstreetmap-tile-server
docker pull mediagis/nominatim:4.2
docker pull ghcr.io/project-osrm/osrm-backend:v5.27.1
# Start containers with restart policies and proper resource limits
echo "$(date): Starting tile server..."
docker run --name tile --restart unless-stopped \
--memory=2g --memory-swap=4g \
--volume=osm-data:/data/database/ --volume=osm-tiles:/data/tiles/ \
-p 8080:80 -d overv/openstreetmap-tile-server run
# Wait a bit for tile server to initialize
sleep 30
echo "$(date): Starting Nominatim geocoding server..."
docker run --name nominatim --restart unless-stopped \
--memory=4g --memory-swap=8g \
@ -210,53 +210,53 @@ runcmd:
--volume=nominatim-data:/var/lib/postgresql/14/main \
--volume=nominatim-flatnode:/nominatim/flatnode \
-p 8085:8080 -d mediagis/nominatim:4.2 /app/start.sh
# Wait for Nominatim to initialize
sleep 60
echo "$(date): Starting OSRM routing servers..."
# Start OSRM car routing
docker run --name osrm-car --restart unless-stopped \
--memory=1g --memory-swap=2g \
--volume=/opt/osrm/car:/data -p 5000:5000 -d \
ghcr.io/project-osrm/osrm-backend:v5.27.1 osrm-routed --algorithm mld /data/us-northeast-latest.osrm
# Start OSRM bike routing
docker run --name osrm-bike --restart unless-stopped \
--memory=1g --memory-swap=2g \
--volume=/opt/osrm/bike:/data -p 5001:5000 -d \
ghcr.io/project-osrm/osrm-backend:v5.27.1 osrm-routed --algorithm mld /data/us-northeast-latest.osrm
# Start OSRM foot routing
docker run --name osrm-foot --restart unless-stopped \
--memory=1g --memory-swap=2g \
--volume=/opt/osrm/foot:/data -p 5002:5000 -d \
ghcr.io/project-osrm/osrm-backend:v5.27.1 osrm-routed --algorithm mld /data/us-northeast-latest.osrm
echo "$(date): All services started. Waiting for initialization..."
sleep 120
echo "$(date): Verifying service health..."
docker ps --format "table {{.Names}}\t{{.Image}}\t{{.Status}}\t{{.Ports}}"
# Test service endpoints
echo "$(date): Testing service endpoints..."
# Test tile server
if curl -f -s -o /dev/null "http://localhost:8080/tile/0/0/0.png"; then
echo "$(date): ✅ Tile server is responding"
else
echo "$(date): ❌ Tile server is not responding"
fi
# Test Nominatim
if curl -f -s -o /dev/null "http://localhost:8085/search?q=test&format=json&limit=1"; then
echo "$(date): ✅ Nominatim is responding"
else
echo "$(date): ❌ Nominatim is not responding"
fi
# Test OSRM services
for service in car bike foot; do
port=$((5000 + $(echo "car bike foot" | tr ' ' '\n' | grep -n $service | cut -d: -f1) - 1))
@ -266,9 +266,9 @@ runcmd:
echo "$(date): ❌ OSRM $service routing is not responding"
fi
done
# All tar files already cleaned up during extraction
# Final status report
echo "$(date): Bootstrap completed!"
echo "$(date): Final service status:"
@ -277,17 +277,17 @@ runcmd:
df -h
echo "$(date): Memory usage:"
free -h
echo "$(date): Services are available at:"
echo " - Tile server: http://$(curl -s http://169.254.169.254/latest/meta-data/public-ipv4):8080/tile/{z}/{x}/{y}.png"
echo " - Geocoding: http://$(curl -s http://169.254.169.254/latest/meta-data/public-ipv4):8085/"
echo " - OSRM Car: http://$(curl -s http://169.254.169.254/latest/meta-data/public-ipv4):5000/"
echo " - OSRM Bike: http://$(curl -s http://169.254.169.254/latest/meta-data/public-ipv4):5001/"
echo " - OSRM Foot: http://$(curl -s http://169.254.169.254/latest/meta-data/public-ipv4):5002/"
echo "$(date): Bootstrap script completed successfully!"
EOF
# Make bootstrap script executable and run it in background
- chmod +x /root/bootstrap.sh
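# bootstrap.sh already tees its own output to /var/log/webarena-map-bootstrap.log,
# so discard nohup's copy here to avoid writing every line to the log twice
- nohup /root/bootstrap.sh > /dev/null 2>&1 &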
@ -308,4 +308,4 @@ final_message: |
Services will be available at:
- Tiles: http://<instance-ip>:8080/tile/{z}/{x}/{y}.png
- Geocoding: http://<instance-ip>:8085/
- Routing: http://<instance-ip>:5000 (car), :5001 (bike), :5002 (foot)