mirror of
https://github.com/web-arena-x/webarena.git
synced 2026-02-06 03:06:47 +00:00
Improve deployment guide and fix formatting
- Add key pair management guidance based on deployment experience
- Add resource cleanup section for cost management
- Fix trailing whitespace and end-of-file formatting issues
- All pre-commit checks now pass

Co-authored-by: openhands <openhands@all-hands.dev>
parent
3adbc3cf94
commit
79fc3d8303
@ -24,7 +24,7 @@
|
||||

|
||||
|
||||
## Update on 12/5/2024
|
||||
> [!IMPORTANT]
|
||||
> This repository hosts the *canonical* implementation of WebArena to reproduce the results reported in the paper. The web navigation infrastructure has been significantly enhanced by [AgentLab](https://github.com/ServiceNow/AgentLab/), introducing several key features: (1) support for parallel experiments using [BrowserGym](https://github.com/ServiceNow/BrowserGym), (2) integration of popular web navigation benchmarks (e.g., VisualWebArena) within a unified framework, (3) unified leaderboard reporting, and (4) improved handling of environment edge cases. We strongly recommend using this framework for your experiments.
|
||||
|
||||
## News
|
||||
|
||||
@ -38,6 +38,8 @@ The WebArena deployment consists of two main components:
|
||||
3. **User Data**: Copy the entire contents of `webarena-map-backend-boot-init.yaml` into the "User data" field during instance launch.
|
||||
|
||||
4. **Key Pair**: Select or create an SSH key pair for access.
|
||||
- **Important**: Save the private key file (`.pem`) securely. You will need it to access both the backend and frontend instances.
|
||||
- If using AWS CLI, you can create a key pair with: `aws ec2 create-key-pair --key-name webarena-key --query 'KeyMaterial' --output text > webarena-key.pem && chmod 600 webarena-key.pem`
|
||||
|
||||
5. **Launch the instance** and note the **Instance ID** and **Public IP**.
|
||||
|
||||
@ -99,6 +101,8 @@ curl "http://<PUBLIC_IP>:5000/route/v1/driving/-79.9959,40.4406;-79.9,40.45?over
|
||||
```
|
||||
|
||||
3. **Key Pair**: Use the same SSH key pair as the backend server.
|
||||
- **Critical**: Ensure you have access to the private key file from Step 1
|
||||
- If you don't have the key, you'll need to terminate and relaunch the instance with a new key pair
|
||||
|
||||
4. **Launch the instance** and note the **Instance ID** and **Public IP**.
|
||||
|
||||
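Because both instances share one key pair, the same private key file opens an SSH session on either of them. The following is a minimal sketch, assuming the default `ubuntu` login user that the bootstrap script and paths in this guide refer to; the function name `ssh_webarena` is hypothetical and not part of the repository.

```bash
# Hypothetical helper: connect to a WebArena instance with the shared key.
# Usage: ssh_webarena <path-to-private-key.pem> <PUBLIC_IP>
ssh_webarena() {
  local key="$1" host="$2"
  if [ ! -f "$key" ]; then
    echo "private key not found: $key" >&2
    return 1
  fi
  chmod 600 "$key"   # ssh rejects private keys that other users can read
  ssh -i "$key" "ubuntu@${host}"
}
```

For example, `ssh_webarena webarena-key.pem <PUBLIC_IP>` would work for the backend and the frontend alike, as long as both were launched with `webarena-key`.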
@ -275,10 +279,32 @@ cat /home/ubuntu/openstreetmap-website/config/settings.yml | grep -A5 -B5 nomina
|
||||
- Consider using VPC and private subnets for backend services
|
||||
- Rotate any AWS credentials used during setup
|
||||
|
||||
## Resource Cleanup
|
||||
|
||||
When you're done with testing, clean up AWS resources to avoid ongoing charges:
|
||||
|
||||
```bash
# Get instance IDs
aws ec2 describe-instances --region us-east-2 --filters "Name=tag:Name,Values=webarena-*" --query 'Reservations[*].Instances[*].[InstanceId,Tags[?Key==`Name`].Value|[0],State.Name]' --output table

# Terminate instances
aws ec2 terminate-instances --region us-east-2 --instance-ids <FRONTEND_INSTANCE_ID> <BACKEND_INSTANCE_ID>

# Release Elastic IP (optional, but saves costs)
aws ec2 describe-addresses --region us-east-2 --query 'Addresses[*].[AllocationId,PublicIp]' --output table
aws ec2 release-address --region us-east-2 --allocation-id <ALLOCATION_ID>

# Delete security groups (optional)
aws ec2 delete-security-group --region us-east-2 --group-id <SECURITY_GROUP_ID>

# Delete key pair (optional)
aws ec2 delete-key-pair --region us-east-2 --key-name webarena-key
```
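
A security group cannot be deleted while an instance still references it, so the delete step can fail if it runs immediately after `terminate-instances`. One way to sequence the cleanup is sketched below, assuming the AWS CLI is installed and configured. The function name `cleanup_webarena` and the `WEBARENA_REGION` variable are hypothetical and not part of the repository.

```bash
# Hypothetical helper: terminate the given instances, wait until they are
# gone, then delete one security group.
# Usage: cleanup_webarena <SECURITY_GROUP_ID> <INSTANCE_ID>...
cleanup_webarena() {
  local region="${WEBARENA_REGION:-us-east-2}"  # region used throughout this guide
  local sg_id="$1"
  shift
  if [ "$#" -eq 0 ]; then
    echo "usage: cleanup_webarena <SECURITY_GROUP_ID> <INSTANCE_ID>..." >&2
    return 1
  fi
  aws ec2 terminate-instances --region "$region" --instance-ids "$@" || return 1
  # Blocks until every listed instance reaches the "terminated" state.
  aws ec2 wait instance-terminated --region "$region" --instance-ids "$@" || return 1
  aws ec2 delete-security-group --region "$region" --group-id "$sg_id"
}
```

Calling `cleanup_webarena <SECURITY_GROUP_ID> <FRONTEND_INSTANCE_ID> <BACKEND_INSTANCE_ID>` runs the three steps in order and stops at the first failure.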
|
||||
|
||||
## Support
|
||||
|
||||
If you encounter issues:
|
||||
1. Check the troubleshooting section above
|
||||
2. Review logs: `/var/log/webarena-map-bootstrap.log` on backend
|
||||
3. Verify all configuration changes were applied correctly
|
||||
4. Ensure both instances are in the same AWS region for optimal performance
|
||||
|
||||
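When reviewing the backend log named in the support steps, it can help to reduce it to a single status. The sketch below is hedged: it keys on two messages that the bootstrap script echoes (`ERROR:` and `Bootstrap script completed successfully!`), and the function name `bootstrap_status` and its exit codes are hypothetical.

```bash
# Hypothetical helper: summarize bootstrap progress from the log file.
# Exit codes (an assumption of this sketch): 0 complete, 1 failed,
# 2 still in progress, 3 log not readable.
bootstrap_status() {
  local log="${1:-/var/log/webarena-map-bootstrap.log}"
  if [ ! -r "$log" ]; then
    echo "unknown: cannot read $log"
    return 3
  fi
  if grep -q "ERROR:" "$log"; then
    echo "failed"
    grep "ERROR:" "$log" | tail -n 1   # show the most recent error line
    return 1
  fi
  if grep -q "Bootstrap script completed successfully!" "$log"; then
    echo "complete"
    return 0
  fi
  echo "in progress"
  return 2
}
```

Run it on the backend instance as `bootstrap_status`, or pass a different path as the first argument if the log has been copied elsewhere.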
@ -199,7 +199,7 @@ Then run the tile server:
|
||||
docker run --volume=osm-data:/data/database/ --volume=osm-tiles:/data/tiles/ -p 8080:80 --detach=true overv/openstreetmap-tile-server run
|
||||
```
|
||||
|
||||
Now, inside the file `webarena/openstreetmap-website/vendor/assets/leaflet/leaflet.osm.js`, change `http://ogma.lti.cs.cmu.edu:8080/tile/{z}/{x}/{y}.png` to `http://<public-url-to-your-tile-server>:8080/tile/{z}/{x}/{y}.png`
|
||||
|
||||
> [!NOTE]
|
||||
> By default, the `url` in `TileLayer` and `Mapnik` is set to `"http://ogma.lti.cs.cmu.edu:8080/tile/{z}/{x}/{y}.png"`. You can replace it with `https://tile.openstreetmap.org/{z}/{x}/{y}.png` (the official link) to test whether the tile server is the source of any problems you run into during setup.
|
||||
|
||||
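The URL change described above for `leaflet.osm.js` can also be scripted. This is a minimal sketch, assuming the file still contains the default `ogma.lti.cs.cmu.edu` URL and that the new host is a plain hostname or IP address; the function name `set_tile_url` is hypothetical.

```bash
# Hypothetical helper: point the map frontend at your own tile server.
# Usage: set_tile_url <path-to-leaflet.osm.js> <tile-server-host>
set_tile_url() {
  local file="$1" host="$2"
  local old='http://ogma\.lti\.cs\.cmu\.edu:8080/tile/'
  local new="http://${host}:8080/tile/"
  # -i.bak keeps a backup copy and works with both GNU and BSD sed.
  sed -i.bak "s|${old}|${new}|g" "$file"
  # Print how many lines now reference the new host.
  grep -cF "$new" "$file"
}
```

For example, `set_tile_url webarena/openstreetmap-website/vendor/assets/leaflet/leaflet.osm.js <public-url-to-your-tile-server>` rewrites every occurrence and leaves the original in `leaflet.osm.js.bak`.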
@ -41,25 +41,25 @@ runcmd:
|
||||
# Wait for package locks to be released
|
||||
- while fuser /var/lib/dpkg/lock-frontend >/dev/null 2>&1; do echo "Waiting for dpkg lock..."; sleep 5; done
|
||||
- while fuser /var/lib/apt/lists/lock >/dev/null 2>&1; do echo "Waiting for apt lock..."; sleep 5; done
|
||||
|
||||
|
||||
# Enable and start Docker with retries
|
||||
- systemctl enable docker
|
||||
- systemctl start docker
|
||||
- sleep 10
|
||||
|
||||
|
||||
# Add ubuntu user to docker group
|
||||
- usermod -aG docker ubuntu
|
||||
|
||||
|
||||
# Create necessary directories
|
||||
- mkdir -p /opt/osm_dump /opt/osrm /var/lib/docker/volumes
|
||||
- mkdir -p /root/logs
|
||||
|
||||
|
||||
# Install AWS CLI v2 (awscli package not available in Ubuntu 24.04)
|
||||
- curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o /tmp/awscliv2.zip
|
||||
- unzip /tmp/awscliv2.zip -d /tmp/
|
||||
- /tmp/aws/install
|
||||
- rm -rf /tmp/awscliv2.zip /tmp/aws
|
||||
|
||||
|
||||
# Configure AWS CLI for S3 access (no credentials needed for public buckets)
|
||||
- mkdir -p /root/.aws
|
||||
- |
|
||||
@ -68,19 +68,19 @@ runcmd:
|
||||
region = us-east-2
|
||||
output = json
|
||||
EOF
|
||||
|
||||
|
||||
# Create a comprehensive bootstrap script that runs in background
|
||||
- |
|
||||
cat > /root/bootstrap.sh << 'EOF'
|
||||
#!/bin/bash
|
||||
set -euo pipefail
|
||||
exec > >(tee -a /var/log/webarena-map-bootstrap.log) 2>&1
|
||||
|
||||
|
||||
echo "$(date): Starting WebArena map server bootstrap"
|
||||
echo "$(date): System info: $(uname -a)"
|
||||
echo "$(date): Available memory: $(free -h)"
|
||||
echo "$(date): Available disk space: $(df -h)"
|
||||
|
||||
|
||||
# Check if we have enough disk space (need at least 200GB free)
|
||||
AVAILABLE_GB=$(df / | awk 'NR==2 {print int($4/1024/1024)}')
|
||||
echo "$(date): Available disk space: ${AVAILABLE_GB}GB"
|
||||
@ -88,7 +88,7 @@ runcmd:
|
||||
echo "$(date): ERROR: Insufficient disk space. Need at least 200GB, have ${AVAILABLE_GB}GB"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
|
||||
# Function to retry commands with exponential backoff
|
||||
retry() {
|
||||
local n=1
|
||||
@ -108,7 +108,7 @@ runcmd:
|
||||
}
|
||||
done
|
||||
}
|
||||
|
||||
|
||||
# Function to monitor background processes
|
||||
monitor_extraction() {
|
||||
local pid=$1
|
||||
@ -127,79 +127,79 @@ runcmd:
|
||||
return $exit_code
|
||||
fi
|
||||
}
|
||||
|
||||
|
||||
# Download and extract data with retries and parallel processing where safe
|
||||
echo "$(date): Starting data downloads..."
|
||||
|
||||
|
||||
# Download all files first (can be done in parallel)
|
||||
echo "$(date): Downloading OSM tile server data..."
|
||||
retry aws s3 cp --no-sign-request s3://webarena-map-server-data/osm_tile_server.tar /root/osm_tile_server.tar &
|
||||
DOWNLOAD_TILE_PID=$!
|
||||
|
||||
|
||||
echo "$(date): Downloading Nominatim data..."
|
||||
retry aws s3 cp --no-sign-request s3://webarena-map-server-data/nominatim_volumes.tar /root/nominatim_volumes.tar &
|
||||
DOWNLOAD_NOM_PID=$!
|
||||
|
||||
|
||||
echo "$(date): Downloading OSM dump..."
|
||||
retry aws s3 cp --no-sign-request s3://webarena-map-server-data/osm_dump.tar /root/osm_dump.tar &
|
||||
DOWNLOAD_DUMP_PID=$!
|
||||
|
||||
|
||||
echo "$(date): Downloading OSRM routing data..."
|
||||
retry aws s3 cp --no-sign-request s3://webarena-map-server-data/osrm_routing.tar /root/osrm_routing.tar &
|
||||
DOWNLOAD_OSRM_PID=$!
|
||||
|
||||
|
||||
# Wait for all downloads to complete
|
||||
echo "$(date): Waiting for downloads to complete..."
|
||||
monitor_extraction $DOWNLOAD_TILE_PID "OSM tile server download"
|
||||
monitor_extraction $DOWNLOAD_NOM_PID "Nominatim download"
|
||||
monitor_extraction $DOWNLOAD_DUMP_PID "OSM dump download"
|
||||
monitor_extraction $DOWNLOAD_OSRM_PID "OSRM routing download"
|
||||
|
||||
|
||||
echo "$(date): All downloads completed. Starting extractions..."
|
||||
|
||||
|
||||
# Extract files sequentially to avoid memory issues and clean up immediately
|
||||
echo "$(date): Extracting OSM tile server data..."
|
||||
tar -C /var/lib/docker/volumes -xf /root/osm_tile_server.tar
|
||||
rm -f /root/osm_tile_server.tar # Clean up immediately to save space
|
||||
echo "$(date): ✅ OSM tile server data extracted and cleaned up"
|
||||
|
||||
|
||||
echo "$(date): Extracting Nominatim data..."
|
||||
tar -C /var/lib/docker/volumes -xf /root/nominatim_volumes.tar
|
||||
rm -f /root/nominatim_volumes.tar # Clean up immediately to save space
|
||||
echo "$(date): ✅ Nominatim data extracted and cleaned up"
|
||||
|
||||
|
||||
echo "$(date): Extracting OSM dump..."
|
||||
tar -C /opt/osm_dump -xf /root/osm_dump.tar
|
||||
rm -f /root/osm_dump.tar # Clean up immediately to save space
|
||||
echo "$(date): ✅ OSM dump extracted and cleaned up"
|
||||
|
||||
|
||||
echo "$(date): Extracting OSRM routing data..."
|
||||
tar -C /opt/osrm -xf /root/osrm_routing.tar
|
||||
rm -f /root/osrm_routing.tar # Clean up immediately to save space
|
||||
echo "$(date): ✅ OSRM routing data extracted and cleaned up"
|
||||
|
||||
|
||||
# Verify extracted data
|
||||
echo "$(date): Verifying extracted data..."
|
||||
ls -la /var/lib/docker/volumes/ | head -20
|
||||
ls -la /opt/osm_dump/ | head -10
|
||||
ls -la /opt/osrm/ | head -10
|
||||
|
||||
|
||||
# Pull Docker images
|
||||
echo "$(date): Pulling Docker images..."
|
||||
docker pull overv/openstreetmap-tile-server
|
||||
docker pull mediagis/nominatim:4.2
|
||||
docker pull ghcr.io/project-osrm/osrm-backend:v5.27.1
|
||||
|
||||
|
||||
# Start containers with restart policies and proper resource limits
|
||||
echo "$(date): Starting tile server..."
|
||||
docker run --name tile --restart unless-stopped \
|
||||
--memory=2g --memory-swap=4g \
|
||||
--volume=osm-data:/data/database/ --volume=osm-tiles:/data/tiles/ \
|
||||
-p 8080:80 -d overv/openstreetmap-tile-server run
|
||||
|
||||
|
||||
# Wait a bit for tile server to initialize
|
||||
sleep 30
|
||||
|
||||
|
||||
echo "$(date): Starting Nominatim geocoding server..."
|
||||
docker run --name nominatim --restart unless-stopped \
|
||||
--memory=4g --memory-swap=8g \
|
||||
@ -210,53 +210,53 @@ runcmd:
|
||||
--volume=nominatim-data:/var/lib/postgresql/14/main \
|
||||
--volume=nominatim-flatnode:/nominatim/flatnode \
|
||||
-p 8085:8080 -d mediagis/nominatim:4.2 /app/start.sh
|
||||
|
||||
|
||||
# Wait for Nominatim to initialize
|
||||
sleep 60
|
||||
|
||||
|
||||
echo "$(date): Starting OSRM routing servers..."
|
||||
|
||||
|
||||
# Start OSRM car routing
|
||||
docker run --name osrm-car --restart unless-stopped \
|
||||
--memory=1g --memory-swap=2g \
|
||||
--volume=/opt/osrm/car:/data -p 5000:5000 -d \
|
||||
ghcr.io/project-osrm/osrm-backend:v5.27.1 osrm-routed --algorithm mld /data/us-northeast-latest.osrm
|
||||
|
||||
|
||||
# Start OSRM bike routing
|
||||
docker run --name osrm-bike --restart unless-stopped \
|
||||
--memory=1g --memory-swap=2g \
|
||||
--volume=/opt/osrm/bike:/data -p 5001:5000 -d \
|
||||
ghcr.io/project-osrm/osrm-backend:v5.27.1 osrm-routed --algorithm mld /data/us-northeast-latest.osrm
|
||||
|
||||
|
||||
# Start OSRM foot routing
|
||||
docker run --name osrm-foot --restart unless-stopped \
|
||||
--memory=1g --memory-swap=2g \
|
||||
--volume=/opt/osrm/foot:/data -p 5002:5000 -d \
|
||||
ghcr.io/project-osrm/osrm-backend:v5.27.1 osrm-routed --algorithm mld /data/us-northeast-latest.osrm
|
||||
|
||||
|
||||
echo "$(date): All services started. Waiting for initialization..."
|
||||
sleep 120
|
||||
|
||||
|
||||
echo "$(date): Verifying service health..."
|
||||
docker ps --format "table {{.Names}}\t{{.Image}}\t{{.Status}}\t{{.Ports}}"
|
||||
|
||||
|
||||
# Test service endpoints
|
||||
echo "$(date): Testing service endpoints..."
|
||||
|
||||
|
||||
# Test tile server
|
||||
if curl -f -s -o /dev/null "http://localhost:8080/tile/0/0/0.png"; then
|
||||
echo "$(date): ✅ Tile server is responding"
|
||||
else
|
||||
echo "$(date): ❌ Tile server is not responding"
|
||||
fi
|
||||
|
||||
|
||||
# Test Nominatim
|
||||
if curl -f -s -o /dev/null "http://localhost:8085/search?q=test&format=json&limit=1"; then
|
||||
echo "$(date): ✅ Nominatim is responding"
|
||||
else
|
||||
echo "$(date): ❌ Nominatim is not responding"
|
||||
fi
|
||||
|
||||
|
||||
# Test OSRM services
|
||||
for service in car bike foot; do
|
||||
port=$((5000 + $(echo "car bike foot" | tr ' ' '\n' | grep -n $service | cut -d: -f1) - 1))
|
||||
@ -266,9 +266,9 @@ runcmd:
|
||||
echo "$(date): ❌ OSRM $service routing is not responding"
|
||||
fi
|
||||
done
|
||||
|
||||
|
||||
# All tar files already cleaned up during extraction
|
||||
|
||||
|
||||
# Final status report
|
||||
echo "$(date): Bootstrap completed!"
|
||||
echo "$(date): Final service status:"
|
||||
@ -277,17 +277,17 @@ runcmd:
|
||||
df -h
|
||||
echo "$(date): Memory usage:"
|
||||
free -h
|
||||
|
||||
|
||||
echo "$(date): Services are available at:"
|
||||
echo " - Tile server: http://$(curl -s http://169.254.169.254/latest/meta-data/public-ipv4):8080/tile/{z}/{x}/{y}.png"
|
||||
echo " - Geocoding: http://$(curl -s http://169.254.169.254/latest/meta-data/public-ipv4):8085/"
|
||||
echo " - OSRM Car: http://$(curl -s http://169.254.169.254/latest/meta-data/public-ipv4):5000/"
|
||||
echo " - OSRM Bike: http://$(curl -s http://169.254.169.254/latest/meta-data/public-ipv4):5001/"
|
||||
echo " - OSRM Foot: http://$(curl -s http://169.254.169.254/latest/meta-data/public-ipv4):5002/"
|
||||
|
||||
|
||||
echo "$(date): Bootstrap script completed successfully!"
|
||||
EOF
|
||||
|
||||
|
||||
# Make bootstrap script executable and run it in background
|
||||
- chmod +x /root/bootstrap.sh
|
||||
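# bootstrap.sh already appends its output to /var/log/webarena-map-bootstrap.log via tee, so discard nohup's copy to avoid duplicated log lines
- nohup /root/bootstrap.sh > /dev/null 2>&1 &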
|
||||
@ -308,4 +308,4 @@ final_message: |
|
||||
Services will be available at:
|
||||
- Tiles: http://<instance-ip>:8080/tile/{z}/{x}/{y}.png
|
||||
- Geocoding: http://<instance-ip>:8085/
|
||||
- Routing: http://<instance-ip>:5000 (car), :5001 (bike), :5002 (foot)
|
||||
|
||||