detect OOM reaper action on p4-fusion (#51284)

Address #42385, and other issues that crop up, by wrapping `p4-fusion`
calls in a shell script that detects when `p4-fusion` is killed,
gathers resource usage stats about the killed process, reports the
death along with those stats, and exits with a non-zero return code so
that the error is picked up by the UI and shown in the repo list (see
[attached video](https://www.loom.com/share/b9ae6abb14dd4f3b9a6670708b22c8d0)).
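
As a rough illustration of the detection idea (not the committed wrapper), bash reports a child terminated by signal N as exit status 128 + N, and bash's `kill -l` turns such a status back into a signal name. The function name `run_and_report` is hypothetical; the real wrapper additionally inspects `wait`'s job notification and the tail of `p4-fusion`'s output.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: run a command in the background, wait for it, and
# if it appears to have been terminated by a signal, report which one.
# The exit status is passed through unchanged, so callers still see a failure.
run_and_report() {
  local pid code name
  "$@" &
  pid=$!
  wait "$pid"
  code=$?
  # bash encodes "terminated by signal N" as exit status 128 + N.
  # A program can also exit normally with such a status, so this is a heuristic.
  if ((code > 128)) && name=$(kill -l "$code" 2>/dev/null); then
    echo "$1 was killed by an external signal (SIG${name})" >&2
  fi
  return "$code"
}
```

The wrapper then exits with that same status, which is why the test plan below expects 137 (128 + 9) after `pkill -9`.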

To reduce friction, the shell script is named `p4-fusion` and the actual
`p4-fusion` binary executable is renamed `p4-fusion-binary`. If we want
to use different names, we will also need to modify the gitserver code
that runs the `p4-fusion` commands.

## Test plan

Build a gitserver Docker image, run a `p4-fusion` command in it, kill
the `p4-fusion-binary` process, see that the output ends with info about
the killed process, and see that the return code is non-zero.

Build a gitserver Docker image:
```
VERSION=dev IMAGE=sourcegraph/gitserver ./cmd/gitserver/build.sh
```

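Start a gitserver instance. `SRC_FRONTEND_INTERNAL` is referenced below
but not set, so it must already be defined in your shell: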
```
HOSTNAME=127.0.0.1:3178
GITSERVER_EXTERNAL_ADDR=127.0.0.1:3503
GITSERVER_ADDR=127.0.0.1:3503
SRC_REPOS_DIR=$HOME/.sourcegraph/repos_3
SRC_PROF_HTTP=127.0.0.1:3553
GITSERVER_INDEX=3
docker run \
--rm \
-e "GITSERVER_EXTERNAL_ADDR=${GITSERVER_EXTERNAL_ADDR}" \
-e "GITSERVER_ADDR=0.0.0.0:${HOSTNAME##*:}" \
-e "SRC_FRONTEND_INTERNAL=host.docker.internal:${SRC_FRONTEND_INTERNAL##*:}" \
-e "SRC_PROF_HTTP=0.0.0.0:${SRC_PROF_HTTP##*:}" \
-e "HOSTNAME=${HOSTNAME}" \
-p ${GITSERVER_ADDR}:${HOSTNAME##*:} \
-p ${SRC_PROF_HTTP}:${SRC_PROF_HTTP##*:} \
-v ${SRC_REPOS_DIR}:/data/repos \
--detach \
--name gitserver-${GITSERVER_INDEX} \
sourcegraph/gitserver
```
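
The `docker run` invocation derives each container-side port with the `${VAR##*:}` parameter expansion, which removes the longest prefix matching `*:`, i.e. everything up to and including the last colon. A tiny illustration (the function name `port_of` is hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the port part of a HOST:PORT string using the
# same ${VAR##*:} expansion as the docker run command above.
port_of() {
  local addr=$1
  printf '%s\n' "${addr##*:}"
}

port_of 127.0.0.1:3178   # prints 3178
```

An unset variable expands to an empty port, so every variable referenced in that command needs a value.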

Connect two terminals to it:

```
docker exec -it gitserver-3 bash
```

In one terminal, run these commands (get the admin password from
1Password):

```
P4PORT=perforce.sgdev.org:1666
P4USER=admin
export P4PORT P4USER
p4 login -a <<<"REDACTED PASSWORD"
p4-fusion \
--path //go/... \
--client "" \
--user "${P4USER}" \
--src /data/repos/go/.git \
--networkThreads 64 \
--printBatch 10 \
--port "${P4PORT}" \
--lookAhead 2000 \
--retries 10 \
--refresh 100000 \
--maxChanges 4000 \
--includeBinaries false \
--fsyncEnable true \
--noColor true
```

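In the other terminal, use `pkill -9 p4-fusion-binar` to end it. That's
not a typo: the Linux kernel stores only the first 15 characters of a
process's command name, and that truncated name is what `pkill` matches
by default. You could instead use `pkill -9 -f p4-fusion-binary`, but
that matches against the entire command line, so it's more dangerous:
for example, it would also kill the `process-stats-watcher.sh` sidecar,
whose arguments include `p4-fusion-binary`.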
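
On Linux the truncated name lives in `/proc/PID/comm`. The hypothetical helper below looks up PIDs by exact comparison against that truncated name, which avoids the whole-command-line matching of `-f`. It assumes a Linux `/proc` filesystem and is a sketch, not part of this change.

```shell
#!/usr/bin/env bash
# Hypothetical helper (Linux only): print the PIDs whose kernel-recorded
# command name equals NAME. The kernel keeps at most 15 characters in
# /proc/PID/comm, so NAME is truncated the same way before comparing;
# e.g. p4-fusion-binary is compared as p4-fusion-binar.
pids_by_comm() {
  local want=${1:0:15} f comm
  for f in /proc/[0-9]*/comm; do
    # a process may exit while we iterate; skip files that can no longer be read
    IFS= read -r comm 2>/dev/null <"$f" || continue
    if [[ $comm == "$want" ]]; then
      f=${f#/proc/}
      printf '%s\n' "${f%/comm}"
    fi
  done
}
```

For example, `pids_by_comm p4-fusion-binary` would list candidate PIDs to hand to `kill -9`, without touching the sidecar.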

In the first terminal:

- see the output end with something like:

> p4-fusion was killed by an external signal. At the time of its demise,
it had been running for 00:20, had used 0:01.00 CPU time, reserved
390.95m RAM and was using .14m.

- run `echo $?` to see the exit status of the `p4-fusion` command. It
should be 137 (128 + 9, the signal number of SIGKILL).

Success!

---------

Co-authored-by: Indradhanush Gupta <indradhanush.gupta@gmail.com>
Peter Guy 2023-05-17 07:26:53 -07:00 committed by GitHub
parent fad5c37d04
commit 018beee3ee
GPG Key ID: 4AEE18F83AFDEB23
10 changed files with 341 additions and 39 deletions


@@ -24,6 +24,7 @@ All notable changes to Sourcegraph are documented in this file.
- Permissions center statistics pane is added. Stats include numbers of queued jobs, users/repos with failed jobs, no permissions, and outdated permissions. [#50535](https://github.com/sourcegraph/sourcegraph/pull/50535)
- SCIM user provisioning support for Deactivate/Reactivation of users. [#50533](https://github.com/sourcegraph/sourcegraph/pull/50533)
- Login form can now be configured with ordering and limit of auth providers. [See docs](https://docs.sourcegraph.com/admin/auth/login_form). [#50586](https://github.com/sourcegraph/sourcegraph/pull/50586), [#50284](https://github.com/sourcegraph/sourcegraph/pull/50284) and [#50705](https://github.com/sourcegraph/sourcegraph/pull/50705)
- OOM reaper events affecting `p4-fusion` jobs on `gitserver` are better detected and handled. Error (non-zero) exit status is used, and the resource (CPU, memory) usage of the job process is appended to the job output so that admins can infer possible OOM activity and take steps to address it. [#51284](https://github.com/sourcegraph/sourcegraph/pull/51284)
- When creating a new batch change, spaces are automatically replaced with dashes in the name field. [#50825](https://github.com/sourcegraph/sourcegraph/pull/50825) and [#51071](https://github.com/sourcegraph/sourcegraph/pull/51071)
- Support for custom HTML injection behind an environment variable (`ENABLE_INJECT_HTML`). This allows users to enable or disable HTML customization as needed, which is now disabled by default. [#51400](https://github.com/sourcegraph/sourcegraph/pull/51400)
- Added the ability to block auto-indexing scheduling and inference via the `codeintel_autoindexing_exceptions` Postgres table. [#51578](https://github.com/sourcegraph/sourcegraph/pull/51578)


@@ -4,7 +4,7 @@
# ignores.
# Install p4 CLI (keep this up to date with cmd/server/Dockerfile)
FROM sourcegraph/alpine-3.14:213466_2023-04-17_5.0-bdda34a71619@sha256:6354a4ff578b685e36c8fbde81f62125ae0011b047fb2cc22d1b0de616b3c59a AS p4cli
FROM sourcegraph/alpine-3.14:213466_2023-04-17_5.0-bdda34a71619@sha256:6354a4ff578b685e36c8fbde81f62125ae0011b047fb2cc22d1b0de616b3c59a AS build
# hash provided in http://filehost.perforce.com/perforce/r22.2/bin.linux26x86_64/SHA256SUMS
# if the hash is not provided, calculate it by downloading the file and running `sha256sum` on it in Terminal
@@ -13,13 +13,9 @@ RUN echo "8bc10fca1c5a26262b4072deec76150a668581a9749d0504cd443084773d4fd0 /usr
chmod +x /usr/local/bin/p4 && \
sha256sum -c expected_hash
FROM sourcegraph/alpine-3.14:213466_2023-04-17_5.0-bdda34a71619@sha256:6354a4ff578b685e36c8fbde81f62125ae0011b047fb2cc22d1b0de616b3c59a AS p4-fusion
COPY p4-fusion-install-alpine.sh /p4-fusion-install-alpine.sh
RUN /p4-fusion-install-alpine.sh
FROM sourcegraph/alpine-3.14:213466_2023-04-17_5.0-bdda34a71619@sha256:6354a4ff578b685e36c8fbde81f62125ae0011b047fb2cc22d1b0de616b3c59a AS coursier
RUN wget -O coursier.gz https://github.com/coursier/coursier/releases/download/v2.1.0-RC4/cs-x86_64-pc-linux-static.gz && \
gzip -d coursier.gz && \
mv coursier /usr/local/bin/coursier && \
@@ -37,11 +33,12 @@ LABEL org.opencontainers.image.version=${VERSION}
LABEL com.sourcegraph.github.url=https://github.com/sourcegraph/sourcegraph/commit/${COMMIT_SHA}
RUN apk add --no-cache \
# Minimal version requirement to address vulnerabilities
# https://github.blog/2023-02-14-git-security-vulnerabilities-announced-3/
# Don't use alpine/edge, the git release on this segfaults
'git>=2.38.0' --repository=http://dl-cdn.alpinelinux.org/alpine/v3.17/main \
git-lfs \
git-p4 \
&& apk add --no-cache \
openssh-client \
# We require libstdc++ for p4-fusion
libstdc++ \
@@ -49,11 +46,15 @@ RUN apk add --no-cache \
python3 \
bash
COPY --from=p4cli /usr/local/bin/p4 /usr/local/bin/p4
COPY --from=build /usr/local/bin/p4 /usr/local/bin/p4
COPY --from=build /usr/local/bin/coursier /usr/local/bin/coursier
COPY --from=p4-fusion /usr/local/bin/p4-fusion /usr/local/bin/p4-fusion
COPY --from=coursier /usr/local/bin/coursier /usr/local/bin/coursier
# copy into place the p4-fusion binary and the wrapper shell script
# that facilitates better handling of the p4-fusion process being killed
# (for example, if the Docker host's OOM Reaper killed it)
COPY --from=build /usr/local/bin/p4-fusion /usr/local/bin/p4-fusion-binary
COPY p4-fusion-wrapper-detect-kill.sh /usr/local/bin/p4-fusion
COPY process-stats-watcher.sh /usr/local/bin/process-stats-watcher.sh
# This is a trick to include libraries required by p4,
# please refer to https://blog.tilander.org/docker-perforce/


@@ -29,10 +29,39 @@ read/written please use atomic filesystem patterns. This usually involves
heavy use of `os.Rename`. Search for existing uses of `os.Rename` to see
examples.
#### Scaling
## Scaling
gitserver's memory usage consists of short-lived git subprocesses.
This is an IO and compute heavy service since most Sourcegraph requests will trigger 1 or more git commands. As such we shard requests for a repo to a specific replica. This allows us to horizontally scale out the service.
The service is stateful (maintaining git clones). However, it only contains data mirrored from upstream code hosts.
## Perforce depots
Syncing of Perforce depots is accomplished by either `p4-fusion` or `git p4` (deprecated), both of which clone Perforce depots into Git repositories in `gitserver`.
### p4-fusion in development
To use `p4-fusion` while developing Sourcegraph, there are a couple of options.
#### Docker
[Run `gitserver` in a Docker container](https://docs.sourcegraph.com/dev/background-information/sg#run-gitserver-in-a-docker-container). This is the option that gives an experience closest to a deployed Sourcegraph instance, and will work for any platform/OS on which you're developing (running `sg start`).
#### Native binary executable
The `p4-fusion` native binary has been built on Linux and macOS, but is untested on Windows.
Read the [comprehensive instructions](https://docs.sourcegraph.com/dev/background-information/build_p4_fusion).
If you do go the native binary route, you may also want to use the wrapper shell script that detects when the process has been killed and outputs an error so that the calling process can handle it.
That wrapper shell script is `p4-fusion-wrapper-detect-kill.sh`, and in order to use it:
1. Rename the `p4-fusion` binary executable to `p4-fusion-binary` and move it to a location in the `PATH`.
1. Copy the shell script `p4-fusion-wrapper-detect-kill.sh` to a location in the `PATH`, renaming it `p4-fusion`.
1. Copy the shell script `process-stats-watcher.sh` to a location in the `PATH`.
1. Ensure all three of those are executable.
After those steps, when a native `gitserver` process runs `p4-fusion`, it runs the wrapper shell script, which in turn runs `p4-fusion-binary` and `process-stats-watcher.sh`.
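
The four steps above might be scripted roughly as follows. The function name and its `SRC_DIR`/`BIN_DIR` arguments are hypothetical: `SRC_DIR` is assumed to hold the built `p4-fusion` binary plus the two scripts, and `BIN_DIR` is assumed to already be on your `PATH`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: install the native p4-fusion binary and the
# kill-detecting wrapper so that running `p4-fusion` from BIN_DIR runs the wrapper.
install_p4_fusion_wrapper() {
  local src_dir=$1 bin_dir=$2
  # 1. the real binary, renamed so the wrapper can call it as p4-fusion-binary
  install -m 0755 "$src_dir/p4-fusion" "$bin_dir/p4-fusion-binary" || return 1
  # 2. the wrapper script, installed under the name gitserver invokes
  install -m 0755 "$src_dir/p4-fusion-wrapper-detect-kill.sh" "$bin_dir/p4-fusion" || return 1
  # 3. the resource-usage sidecar the wrapper launches
  install -m 0755 "$src_dir/process-stats-watcher.sh" "$bin_dir/process-stats-watcher.sh" || return 1
  # 4. `install -m 0755` has already made all three executable
}
```

The binary is copied before the wrapper so that the sketch still works when `SRC_DIR` and `BIN_DIR` are the same directory.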


@@ -1,8 +1,22 @@
#!/usr/bin/env bash
# We want to build multiple go binaries, so we use a custom build step on CI.
cd "$(dirname "${BASH_SOURCE[0]}")"/../..
set -ex
# the build process for the OSS gitserver is identical to the build process for the Enterprise gitserver
# pull some shenanigans up front so that we don't have to sprinkle "enterprise" all throughout the enterprise version
exedir=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
path="cmd/gitserver"
if [[ ${exedir} = */enterprise/cmd/gitserver ]]; then
# We want to build multiple go binaries, so we use a custom build step on CI.
cd "${exedir}"/../../.. || exit 1
path="enterprise/${path}"
else
# We want to build multiple go binaries, so we use a custom build step on CI.
cd "${exedir}"/../.. || exit 1
fi
### OSS and Enterprise builds should be identical after this point
OUTPUT=$(mktemp -d -t sgdockerbuild_XXXXXXX)
@@ -12,14 +26,16 @@ cleanup() {
trap cleanup EXIT
cp -a ./cmd/gitserver/p4-fusion-install-alpine.sh "$OUTPUT"
for f in p4-fusion-install-alpine.sh p4-fusion-wrapper-detect-kill.sh process-stats-watcher.sh; do
cp -a "./${path}/${f}" "${OUTPUT}"
done
if [[ "${DOCKER_BAZEL:-false}" == "true" ]]; then
./dev/ci/bazel.sh build //cmd/gitserver
out=$(./dev/ci/bazel.sh cquery //cmd/gitserver --output=files)
./dev/ci/bazel.sh build //${path}
out=$(./dev/ci/bazel.sh cquery //${path} --output=files)
cp "$out" "$OUTPUT"
docker build -f cmd/gitserver/Dockerfile -t "$IMAGE" "$OUTPUT" \
docker build -f ${path}/Dockerfile -t "$IMAGE" "$OUTPUT" \
--progress=plain \
--build-arg COMMIT_SHA \
--build-arg DATE \
@@ -33,10 +49,10 @@ export GOARCH=amd64
export GOOS=linux
export CGO_ENABLED=0
pkg="github.com/sourcegraph/sourcegraph/cmd/gitserver"
pkg="github.com/sourcegraph/sourcegraph/${path}"
go build -trimpath -ldflags "-X github.com/sourcegraph/sourcegraph/internal/version.version=$VERSION -X github.com/sourcegraph/sourcegraph/internal/version.timestamp=$(date +%s)" -buildmode exe -tags dist -o "$OUTPUT/$(basename $pkg)" "$pkg"
docker build -f cmd/gitserver/Dockerfile -t "$IMAGE" "$OUTPUT" \
docker build -f ${path}/Dockerfile -t "$IMAGE" "$OUTPUT" \
--progress=plain \
--build-arg COMMIT_SHA \
--build-arg DATE \
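
Stripped of the `cd` calls, the OSS/Enterprise switch at the top of both `build.sh` scripts is a glob match on the script's own directory. A sketch of just that decision (the function name `gitserver_path` is hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the build.sh switch: given the absolute directory of
# the build script, print the repo-relative gitserver path it should build.
gitserver_path() {
  local exedir=$1
  # unquoted right-hand side, so [[ == ]] performs a glob match
  if [[ $exedir == */enterprise/cmd/gitserver ]]; then
    printf '%s\n' "enterprise/cmd/gitserver"
  else
    printf '%s\n' "cmd/gitserver"
  fi
}
```

Every later step (copying the helper scripts, the Bazel targets, `docker build -f ${path}/Dockerfile`, and the Go package name) uses that one value, which is what lets the two scripts stay identical after the switch.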


@@ -0,0 +1,74 @@
#!/usr/bin/env bash
# shellcheck disable=SC2064,SC2207
# create a file to hold the output of p4-fusion
fusionout=$(mktemp || mktemp -t fusionout_XXXXXXXX)
# create a pipe to use for capturing output of p4-fusion
# so that it can be sent to stdout and also to a file for analyzing later
fusionpipe=$(mktemp || mktemp -t fusionpipe_XXXXXXXX)
rm -f "${fusionpipe}"
mknod "${fusionpipe}" p
tee <"${fusionpipe}" "${fusionout}" &
# create a file to hold the output of `wait`
waitout=$(mktemp || mktemp -t waitout_XXXXXXXX)
# create a file to hold the resource usage of the child process
stats=$(mktemp || mktemp -t resource_XXXXXXXX)
# make sure to clean up on exit
trap "rm -f \"${fusionout}\" \"${fusionpipe}\" \"${waitout}\" \"${stats}\"" EXIT
# launch p4-fusion in the background, sending all output to the pipe for capture and re-echoing
# depends on the p4-fusion binary executable being copied to p4-fusion-binary in the gitserver Dockerfile
p4-fusion-binary "${@}" >"${fusionpipe}" 2>&1 &
# capture the pid of the child process
fpid=$!
# start up a "sidecar" process to capture resource usage.
# it will terminate when the p4-fusion process terminates.
process-stats-watcher.sh "${fpid}" "p4-fusion-binary" >"${stats}" &
spid=$!
# Wait for the child process to finish
wait ${fpid} >"${waitout}" 2>&1
# capture the result of the wait, which is the result of the child process
# or the result of external action on the child process, like SIGKILL
waitcode=$?
# the sidecar process should have exited by now, but just in case, wait for it
wait "${spid}" >/dev/null 2>&1
[ ${waitcode} -eq 0 ] || {
# if the wait exit code indicates a problem,
# check to see if the child process was killed
killed=""
# if the process was killed with SIGKILL, the `wait` process will have generated a notification
grep -qs "Killed" "${waitout}" && killed=y
[ -z "${killed}" ] && {
# If the wait process did not generate an error message, check the process output.
# The process traps SIGINT and SIGTERM; uncaught signals will be displayed as "uncaught"
tail -5 "${fusionout}" | grep -Eqs "Signal Received:|uncaught target signal" && killed=y
}
[ -z "${killed}" ] || {
# include the signal if it's SIGINT, SIGTERM, or SIGKILL
# not guaranteed to work, but nice if we can include the info
signal="$(kill -l ${waitcode})"
[ -z "${signal}" ] || signal=" (SIG${signal})"
# get info if available from the sidecar process
rusage=""
[ -s "${stats}" ] && {
# expect the last (maybe only) line to be four fields:
# RSS VSZ ETIME TIME
x=($(tail -1 "${stats}"))
# NOTE: bash indexes from 0; zsh indexes from 1
[ ${#x[@]} -eq 4 ] && rusage=" At the time of its demise, it had been running for ${x[2]}, had used ${x[3]} CPU time, reserved ${x[1]} RAM and was using ${x[0]}."
}
echo "p4-fusion was killed by an external signal${signal}.${rusage}"
}
}
exit ${waitcode}
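
The output-capture part of the wrapper (a named pipe read by a background `tee`) can be exercised on its own. This is a sketch, not the committed code: it creates the FIFO with `mkfifo` inside a private `mktemp -d` directory, which avoids the gap between `rm` and `mknod` in the wrapper, and it waits for `tee` so the copied output is complete before returning. `run_tee_capture` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: run a command, echo its combined stdout/stderr to our
# stdout while also saving a copy to OUT_FILE, and return the command's status.
run_tee_capture() {
  local out_file=$1
  shift
  local dir pipe cmd_pid tee_pid code
  dir=$(mktemp -d) || return 1
  pipe="$dir/pipe"
  mkfifo "$pipe" || { rm -rf "$dir"; return 1; }
  # reader: tee copies everything from the FIFO to stdout and to the file
  tee "$out_file" <"$pipe" &
  tee_pid=$!
  # writer: both output streams of the command go into the FIFO
  "$@" >"$pipe" 2>&1 &
  cmd_pid=$!
  wait "$cmd_pid"
  code=$?
  # tee sees EOF once the writer exits; wait so no output is lost
  wait "$tee_pid"
  rm -rf "$dir"
  return "$code"
}
```

Both ends are started in the background because opening a FIFO blocks until the other end is opened too.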


@@ -0,0 +1,45 @@
#!/usr/bin/env bash
# shellcheck disable=SC2064,SC2207,SC2009
humanize() {
local num=${1}
[[ ${num} =~ ^[0-9][0-9]*$ ]] && num=$(bc <<<"scale=2;${num}/1024/1024")m
printf -- '%s' "${num}"
return 0
}
# read resource usage statistics for a process
# several times a second until it terminates
# at which point, output the most recent stats on stdout
# the output format is "RSS VSZ ETIME TIME"
# input is the pid of the process
pid="${1}"
# and its name, which is used to avoid tracking
# another process in case the original process completed,
# and another started up and got assigned the same pid
cmd="${2}"
unset rss vsz etime time
while true; do
# Alpine has a rather limited `ps`
# it does not limit output to just one process, even when specifying a pid
# so we need to filter the output by pid
x="$(ps -o pid,stat,rss,vsz,etime,time,comm,args "${pid}" | grep "^ *${pid} " | grep "${cmd}" | tail -1)"
[ -z "${x}" ] && break
IFS=" " read -r -a a <<<"$x"
# drop out of here if the process has died or become a zombie - no coming back from the dead
[[ "${a[1]}" =~ ^[ZXx] ]] && break
# only collect stats for processes that are active (running, sleeping, disk sleep, which is waiting for I/O to complete)
# but don't stop until it is really dead
[[ "${a[1]}" =~ ^[RSD] ]] && {
rss=${a[2]}
vsz=${a[3]}
etime=${a[4]}
time=${a[5]}
}
sleep 0.2
done
printf '%s %s %s %s' "$(humanize "${rss}")" "$(humanize "${vsz}")" "${etime}" "${time}"
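
On Linux, the same poll-until-exit loop can read memory figures straight from `/proc/PID/status` instead of parsing busybox `ps` output. This is a swapped-in technique, not what the committed script does: `VmRSS` and `VmSize` are reported there in kB, and the loop stops once the process disappears or becomes a zombie (`Z`) or dead (`X`) entry, mirroring the `^[ZXx]` check above. `watch_mem` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (Linux only): poll /proc/PID/status a few times a second
# until PID exits, then print the last observed "VmRSS VmSize" (both in kB).
watch_mem() {
  local pid=$1 key val rest state rss="" vsz=""
  while [[ -r /proc/$pid/status ]]; do
    state=""
    # stderr is redirected before the input so a vanished file fails quietly
    while read -r key val rest; do
      case $key in
        State:) state=$val ;;
        VmRSS:) rss=$val ;;
        VmSize:) vsz=$val ;;
      esac
    done 2>/dev/null <"/proc/$pid/status"
    # zombie or dead: there is no more resource usage to observe
    [[ $state == [ZX] ]] && break
    sleep 0.2
  done
  printf '%s %s\n' "${rss:-unknown}" "${vsz:-unknown}"
}
```

Like the committed watcher, it keeps the last values seen while the process was alive, because a zombie's status file no longer lists memory fields.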


@@ -4,7 +4,7 @@
# ignores.
# Install p4 CLI (keep this up to date with cmd/server/Dockerfile)
FROM sourcegraph/alpine-3.14:213466_2023-04-17_5.0-bdda34a71619@sha256:6354a4ff578b685e36c8fbde81f62125ae0011b047fb2cc22d1b0de616b3c59a AS p4cli
FROM sourcegraph/alpine-3.14:213466_2023-04-17_5.0-bdda34a71619@sha256:6354a4ff578b685e36c8fbde81f62125ae0011b047fb2cc22d1b0de616b3c59a AS build
# hash provided in http://filehost.perforce.com/perforce/r22.2/bin.linux26x86_64/SHA256SUMS
# if the hash is not provided, calculate it by downloading the file and running `sha256sum` on it in Terminal
@@ -13,13 +13,9 @@ RUN echo "8bc10fca1c5a26262b4072deec76150a668581a9749d0504cd443084773d4fd0 /usr
chmod +x /usr/local/bin/p4 && \
sha256sum -c expected_hash
FROM sourcegraph/alpine-3.14:213466_2023-04-17_5.0-bdda34a71619@sha256:6354a4ff578b685e36c8fbde81f62125ae0011b047fb2cc22d1b0de616b3c59a AS p4-fusion
COPY p4-fusion-install-alpine.sh /p4-fusion-install-alpine.sh
RUN /p4-fusion-install-alpine.sh
FROM sourcegraph/alpine-3.14:213466_2023-04-17_5.0-bdda34a71619@sha256:6354a4ff578b685e36c8fbde81f62125ae0011b047fb2cc22d1b0de616b3c59a AS coursier
RUN wget -O coursier.gz https://github.com/coursier/coursier/releases/download/v2.1.0-RC4/cs-x86_64-pc-linux-static.gz && \
gzip -d coursier.gz && \
mv coursier /usr/local/bin/coursier && \
@@ -43,7 +39,6 @@ RUN apk add --no-cache \
'git>=2.38.0' --repository=http://dl-cdn.alpinelinux.org/alpine/v3.17/main \
git-lfs \
git-p4 \
&& apk add --no-cache \
openssh-client \
# We require libstdc++ for p4-fusion
libstdc++ \
@@ -51,11 +46,16 @@ RUN apk add --no-cache \
python3 \
bash
COPY --from=p4cli /usr/local/bin/p4 /usr/local/bin/p4
COPY --from=build /usr/local/bin/p4 /usr/local/bin/p4
COPY --from=build /usr/local/bin/coursier /usr/local/bin/coursier
COPY --from=p4-fusion /usr/local/bin/p4-fusion /usr/local/bin/p4-fusion
COPY --from=coursier /usr/local/bin/coursier /usr/local/bin/coursier
# copy into place the p4-fusion binary and the wrapper shell script
# that facilitates better handling of the p4-fusion process being killed
# (for example, either because it exceeded gitLongCommandTimeout, or the Docker host's OOM Reaper killed it)
# actually, I'm not sure about gitLongCommandTimeout, because that may directly terminate the wrapper script.
COPY --from=build /usr/local/bin/p4-fusion /usr/local/bin/p4-fusion-binary
COPY p4-fusion-wrapper-detect-kill.sh /usr/local/bin/p4-fusion
COPY process-stats-watcher.sh /usr/local/bin/process-stats-watcher.sh
# This is a trick to include libraries required by p4,
# please refer to https://blog.tilander.org/docker-perforce/


@@ -1,8 +1,22 @@
#!/usr/bin/env bash
# We want to build multiple go binaries, so we use a custom build step on CI.
cd "$(dirname "${BASH_SOURCE[0]}")/../../.."
set -ex
# the build process for the OSS gitserver is identical to the build process for the Enterprise gitserver
# pull some shenanigans up front so that we don't have to sprinkle "enterprise" all throughout the enterprise version
exedir=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
path="cmd/gitserver"
if [[ ${exedir} = */enterprise/cmd/gitserver ]]; then
# We want to build multiple go binaries, so we use a custom build step on CI.
cd "${exedir}"/../../.. || exit 1
path="enterprise/${path}"
else
# We want to build multiple go binaries, so we use a custom build step on CI.
cd "${exedir}"/../.. || exit 1
fi
### OSS and Enterprise builds should be identical after this point
OUTPUT=$(mktemp -d -t sgdockerbuild_XXXXXXX)
@@ -12,14 +26,16 @@ cleanup() {
trap cleanup EXIT
cp -a ./enterprise/cmd/gitserver/p4-fusion-install-alpine.sh "$OUTPUT"
for f in p4-fusion-install-alpine.sh p4-fusion-wrapper-detect-kill.sh process-stats-watcher.sh; do
cp -a "./${path}/${f}" "${OUTPUT}"
done
if [[ "${DOCKER_BAZEL:-false}" == "true" ]]; then
./dev/ci/bazel.sh build //enterprise/cmd/gitserver
out=$(./dev/ci/bazel.sh cquery //enterprise/cmd/gitserver --output=files)
./dev/ci/bazel.sh build //${path}
out=$(./dev/ci/bazel.sh cquery //${path} --output=files)
cp "$out" "$OUTPUT"
docker build -f enterprise/cmd/gitserver/Dockerfile -t "$IMAGE" "$OUTPUT" \
docker build -f ${path}/Dockerfile -t "$IMAGE" "$OUTPUT" \
--progress=plain \
--build-arg COMMIT_SHA \
--build-arg DATE \
@@ -33,10 +49,10 @@ export GOARCH=amd64
export GOOS=linux
export CGO_ENABLED=0
pkg="github.com/sourcegraph/sourcegraph/enterprise/cmd/gitserver"
pkg="github.com/sourcegraph/sourcegraph/${path}"
go build -trimpath -ldflags "-X github.com/sourcegraph/sourcegraph/internal/version.version=$VERSION -X github.com/sourcegraph/sourcegraph/internal/version.timestamp=$(date +%s)" -buildmode exe -tags dist -o "$OUTPUT/$(basename $pkg)" "$pkg"
docker build -f enterprise/cmd/gitserver/Dockerfile -t "$IMAGE" "$OUTPUT" \
docker build -f ${path}/Dockerfile -t "$IMAGE" "$OUTPUT" \
--progress=plain \
--build-arg COMMIT_SHA \
--build-arg DATE \


@@ -0,0 +1,75 @@
#!/usr/bin/env bash
# shellcheck disable=SC2064,SC2207
# create a file to hold the output of p4-fusion
# TODO: consider recording/storing/capturing the file for logs display in the UI if there's a problem
fusionout=$(mktemp || mktemp -t fusionout_XXXXXXXX)
# create a pipe to use for capturing output of p4-fusion
# so that it can be sent to stdout and also to a file for analyzing later
fusionpipe=$(mktemp || mktemp -t fusionpipe_XXXXXXXX)
rm -f "${fusionpipe}"
mknod "${fusionpipe}" p
tee <"${fusionpipe}" "${fusionout}" &
# create a file to hold the output of `wait`
waitout=$(mktemp || mktemp -t waitout_XXXXXXXX)
# create a file to hold the resource usage of the child process
stats=$(mktemp || mktemp -t resource_XXXXXXXX)
# make sure to clean up on exit
trap "rm -f \"${fusionout}\" \"${fusionpipe}\" \"${waitout}\" \"${stats}\"" EXIT
# launch p4-fusion in the background, sending all output to the pipe for capture and re-echoing
# depends on the p4-fusion binary executable being copied to p4-fusion-binary in the gitserver Dockerfile
p4-fusion-binary "${@}" >"${fusionpipe}" 2>&1 &
# capture the pid of the child process
fpid=$!
# start up a "sidecar" process to capture resource usage.
# it will terminate when the p4-fusion process terminates.
process-stats-watcher.sh "${fpid}" "p4-fusion-binary" >"${stats}" &
spid=$!
# Wait for the child process to finish
wait ${fpid} >"${waitout}" 2>&1
# capture the result of the wait, which is the result of the child process
# or the result of external action on the child process, like SIGKILL
waitcode=$?
# the sidecar process should have exited by now, but just in case, wait for it
wait "${spid}" >/dev/null 2>&1
[ ${waitcode} -eq 0 ] || {
# if the wait exit code indicates a problem,
# check to see if the child process was killed
killed=""
# if the process was killed with SIGKILL, the `wait` process will have generated a notification
grep -qs "Killed" "${waitout}" && killed=y
[ -z "${killed}" ] && {
# If the wait process did not generate an error message, check the process output.
# The process traps SIGINT and SIGTERM; uncaught signals will be displayed as "uncaught"
tail -5 "${fusionout}" | grep -Eqs "Signal Received:|uncaught target signal" && killed=y
}
[ -z "${killed}" ] || {
# include the signal if it's SIGINT, SIGTERM, or SIGKILL
# not guaranteed to work, but nice if we can include the info
signal="$(kill -l ${waitcode})"
[ -z "${signal}" ] || signal=" (SIG${signal})"
# get info if available from the sidecar process
rusage=""
[ -s "${stats}" ] && {
# expect the last (maybe only) line to be four fields:
# RSS VSZ ETIME TIME
x=($(tail -1 "${stats}"))
# NOTE: bash indexes from 0; zsh indexes from 1
[ ${#x[@]} -eq 4 ] && rusage=" At the time of its demise, it had been running for ${x[2]}, had used ${x[3]} CPU time, reserved ${x[1]} RAM and was using ${x[0]}."
}
echo "p4-fusion was killed by an external signal${signal}.${rusage}"
}
}
exit ${waitcode}


@@ -0,0 +1,45 @@
#!/usr/bin/env bash
# shellcheck disable=SC2064,SC2207,SC2009
humanize() {
local num=${1}
[[ ${num} =~ ^[0-9][0-9]*$ ]] && num=$(bc <<<"scale=2;${num}/1024/1024")m
printf -- '%s' "${num}"
return 0
}
# read resource usage statistics for a process
# several times a second until it terminates
# at which point, output the most recent stats on stdout
# the output format is "RSS VSZ ETIME TIME"
# input is the pid of the process
pid="${1}"
# and its name, which is used to avoid tracking
# another process in case the original process completed,
# and another started up and got assigned the same pid
cmd="${2}"
unset rss vsz etime time
while true; do
# Alpine has a rather limited `ps`
# it does not limit output to just one process, even when specifying a pid
# so we need to filter the output by pid
x="$(ps -o pid,stat,rss,vsz,etime,time,comm,args "${pid}" | grep "^ *${pid} " | grep "${cmd}" | tail -1)"
[ -z "${x}" ] && break
IFS=" " read -r -a a <<<"$x"
# drop out of here if the process has died or become a zombie - no coming back from the dead
[[ "${a[1]}" =~ ^[ZXx] ]] && break
# only collect stats for processes that are active (running, sleeping, disk sleep, which is waiting for I/O to complete)
# but don't stop until it is really dead
[[ "${a[1]}" =~ ^[RSD] ]] && {
rss=${a[2]}
vsz=${a[3]}
etime=${a[4]}
time=${a[5]}
}
sleep 0.2
done
printf '%s %s %s %s' "$(humanize "${rss}")" "$(humanize "${vsz}")" "${etime}" "${time}"