mirror of
https://github.com/sourcegraph/sourcegraph.git
synced 2026-02-06 20:51:43 +00:00
detect OOM reaper action on p4-fusion (#51284)
Address #42385 and other issues that crop up by wrapping `p4-fusion` calls with a shell script that detects when `p4-fusion` is killed, gathers resource usage stats about the killed process, outputs info about the death along with those stats, and exits with a non-zero return code so that the error is picked up by the UI and shown in the repo list (see [attached video](https://www.loom.com/share/b9ae6abb14dd4f3b9a6670708b22c8d0)).

To reduce friction, the shell script is named `p4-fusion` and the actual `p4-fusion` binary executable is renamed `p4-fusion-binary`. If we want to use different names, we will also need to modify the gitserver code that runs the `p4-fusion` commands.

## Test plan

Build a gitserver Docker image, run a `p4-fusion` command in it, kill the `p4-fusion-binary` process, and check that the output ends with info about the killed process and that the return code is non-zero.

Build a gitserver Docker image:

```
VERSION=dev IMAGE=sourcegraph/gitserver ./cmd/gitserver/build.sh
```

Start a gitserver instance:

```
HOSTNAME=127.0.0.1:3178
GITSERVER_EXTERNAL_ADDR=127.0.0.1:3503
GITSERVER_ADDR=127.0.0.1:3503
SRC_REPOS_DIR=$HOME/.sourcegraph/repos_3
SRC_PROF_HTTP=127.0.0.1:3553
GITSERVER_INDEX=3
docker run \
  --rm \
  -e "GITSERVER_EXTERNAL_ADDR=${GITSERVER_EXTERNAL_ADDR}" \
  -e "GITSERVER_ADDR=0.0.0.0:${HOSTNAME##*:}" \
  -e "SRC_FRONTEND_INTERNAL=host.docker.internal:${SRC_FRONTEND_INTERNAL##*:}" \
  -e "SRC_PROF_HTTP=0.0.0.0:${SRC_PROF_HTTP##*:}" \
  -e "HOSTNAME=${HOSTNAME}" \
  -p ${GITSERVER_ADDR}:${HOSTNAME##*:} \
  -p ${SRC_PROF_HTTP}:${SRC_PROF_HTTP##*:} \
  -v ${SRC_REPOS_DIR}:/data/repos \
  --detach \
  --name gitserver-${GITSERVER_INDEX} \
  sourcegraph/gitserver
```

Connect two terminals to it:

```
docker exec -it gitserver-3 bash
```

In one terminal, run this command (get the admin password from 1Password):

```
P4PORT=perforce.sgdev.org:1666
P4USER=admin
export P4PORT P4USER
p4 login -a <<<"REDACTED PASSWORD"
p4-fusion \
  --path //go/... \
  --client "" \
  --user "${P4USER}" \
  --src /data/repos/go/.git \
  --networkThreads 64 \
  --printBatch 10 \
  --port "${P4PORT}" \
  --lookAhead 2000 \
  --retries 10 \
  --refresh 100000 \
  --maxChanges 4000 \
  --includeBinaries false \
  --fsyncEnable true \
  --noColor true
```

In the other terminal, use `pkill -9 p4-fusion-binar` to end it. That's not a typo: Linux stores only the first 15 characters of a process's name, and that truncated name is what `pkill` matches against by default. You could instead use `pkill -9 -f p4-fusion-binary`, but that matches against the entire command line, so it's more dangerous.

In the first terminal:

- see the output end with something like:
  > p4-fusion was killed by an external signal. At the time of its demise, it had been running for 00:20, had used 0:01.00 CPU time, reserved 390.95m RAM and was using .14m.
- run `echo $?` to see the return code of the `p4-fusion` command. It should be 137.

Success!

---------

Co-authored-by: Indradhanush Gupta <indradhanush.gupta@gmail.com>
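The expected return code of 137 comes from the shell convention that a child terminated by a signal is reported as 128 plus the signal number (SIGKILL is 9). That is the value the wrapper's `wait` captures and passes through with `exit ${waitcode}`. A minimal bash sketch of that behaviour, using `sleep` as a stand-in for `p4-fusion-binary` and `kill -9` as a stand-in for the OOM killer (`sigkill_exit_status` is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Hypothetical helper: start a stand-in child, SIGKILL it, and report
# the status that `wait` returns for it.
sigkill_exit_status() {
  sleep 30 &
  local pid=$!
  kill -9 "${pid}"
  # a signal-terminated child is reported as 128 + signal number
  wait "${pid}" 2>/dev/null
  echo $?
}

code=$(sigkill_exit_status)
echo "exit status: ${code}"              # exit status: 137
# bash's `kill -l` maps an exit status above 128 back to the signal name
echo "signal: SIG$(kill -l "${code}")"   # signal: SIGKILL
```

The same `kill -l ${waitcode}` lookup is how the wrapper adds its `(SIG...)` suffix to the "killed by an external signal" message.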
parent fad5c37d04
commit 018beee3ee
@ -24,6 +24,7 @@ All notable changes to Sourcegraph are documented in this file.
- Permissions center statistics pane is added. Stats include numbers of queued jobs, users/repos with failed jobs, no permissions, and outdated permissions. [#50535](https://github.com/sourcegraph/sourcegraph/pull/50535)
- SCIM user provisioning support for Deactivation/Reactivation of users. [#50533](https://github.com/sourcegraph/sourcegraph/pull/50533)
- Login form can now be configured with ordering and limit of auth providers. [See docs](https://docs.sourcegraph.com/admin/auth/login_form). [#50586](https://github.com/sourcegraph/sourcegraph/pull/50586), [#50284](https://github.com/sourcegraph/sourcegraph/pull/50284) and [#50705](https://github.com/sourcegraph/sourcegraph/pull/50705)
- OOM reaper events affecting `p4-fusion` jobs on `gitserver` are better detected and handled. Error (non-zero) exit status is used, and the resource (CPU, memory) usage of the job process is appended to the job output so that admins can infer possible OOM activity and take steps to address it. [#51284](https://github.com/sourcegraph/sourcegraph/pull/51284)
- When creating a new batch change, spaces are automatically replaced with dashes in the name field. [#50825](https://github.com/sourcegraph/sourcegraph/pull/50825) and [#51071](https://github.com/sourcegraph/sourcegraph/pull/51071)
- Support for custom HTML injection behind an environment variable (`ENABLE_INJECT_HTML`). This allows users to enable or disable HTML customization as needed, which is now disabled by default. [#51400](https://github.com/sourcegraph/sourcegraph/pull/51400)
- Added the ability to block auto-indexing scheduling and inference via the `codeintel_autoindexing_exceptions` Postgres table. [#51578](https://github.com/sourcegraph/sourcegraph/pull/51578)

@ -4,7 +4,7 @@
# ignores.

# Install p4 CLI (keep this up to date with cmd/server/Dockerfile)
FROM sourcegraph/alpine-3.14:213466_2023-04-17_5.0-bdda34a71619@sha256:6354a4ff578b685e36c8fbde81f62125ae0011b047fb2cc22d1b0de616b3c59a AS p4cli
FROM sourcegraph/alpine-3.14:213466_2023-04-17_5.0-bdda34a71619@sha256:6354a4ff578b685e36c8fbde81f62125ae0011b047fb2cc22d1b0de616b3c59a AS build

# hash provided in http://filehost.perforce.com/perforce/r22.2/bin.linux26x86_64/SHA256SUMS
# if the hash is not provided, calculate it by downloading the file and running `sha256sum` on it in Terminal
@ -13,13 +13,9 @@ RUN echo "8bc10fca1c5a26262b4072deec76150a668581a9749d0504cd443084773d4fd0 /usr
chmod +x /usr/local/bin/p4 && \
sha256sum -c expected_hash
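The build stage verifies the downloaded `p4` binary by writing the expected hash to a file and running `sha256sum -c expected_hash` (the start of that `RUN` is truncated in the hunk header, so the exact command is not shown). A minimal bash sketch of the same check-file pattern, assuming GNU coreutils or BusyBox `sha256sum`; `verify_sha256` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify_sha256 EXPECTED_HASH FILE
# Writes a one-line check file in the "<hash>  <path>" format that
# `sha256sum -c` reads, then lets sha256sum do the comparison.
verify_sha256() {
  local expected="$1" file="$2" list rc
  list=$(mktemp)
  # two spaces between hash and path is the standard text-mode format
  printf '%s  %s\n' "${expected}" "${file}" >"${list}"
  sha256sum -c "${list}"
  rc=$?
  rm -f "${list}"
  return "${rc}"
}

# To get the hash when it is not published, compute it yourself, e.g.:
#   sha256sum /usr/local/bin/p4 | cut -d ' ' -f 1
```

`sha256sum -c` exits non-zero on a mismatch, which is what makes a `RUN ... && sha256sum -c expected_hash` step fail the image build.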

FROM sourcegraph/alpine-3.14:213466_2023-04-17_5.0-bdda34a71619@sha256:6354a4ff578b685e36c8fbde81f62125ae0011b047fb2cc22d1b0de616b3c59a AS p4-fusion

COPY p4-fusion-install-alpine.sh /p4-fusion-install-alpine.sh
RUN /p4-fusion-install-alpine.sh

FROM sourcegraph/alpine-3.14:213466_2023-04-17_5.0-bdda34a71619@sha256:6354a4ff578b685e36c8fbde81f62125ae0011b047fb2cc22d1b0de616b3c59a AS coursier

RUN wget -O coursier.gz https://github.com/coursier/coursier/releases/download/v2.1.0-RC4/cs-x86_64-pc-linux-static.gz && \
gzip -d coursier.gz && \
mv coursier /usr/local/bin/coursier && \
@ -37,11 +33,12 @@ LABEL org.opencontainers.image.version=${VERSION}
LABEL com.sourcegraph.github.url=https://github.com/sourcegraph/sourcegraph/commit/${COMMIT_SHA}

RUN apk add --no-cache \
# Minimal version requirement to address vulnerabilities
# https://github.blog/2023-02-14-git-security-vulnerabilities-announced-3/
# Don't use alpine/edge, the git release on this segfaults
'git>=2.38.0' --repository=http://dl-cdn.alpinelinux.org/alpine/v3.17/main \
git-lfs \
git-p4 \
&& apk add --no-cache \
openssh-client \
# We require libstdc++ for p4-fusion
libstdc++ \
@ -49,11 +46,15 @@ RUN apk add --no-cache \
python3 \
bash

COPY --from=p4cli /usr/local/bin/p4 /usr/local/bin/p4
COPY --from=build /usr/local/bin/p4 /usr/local/bin/p4
COPY --from=build /usr/local/bin/coursier /usr/local/bin/coursier

COPY --from=p4-fusion /usr/local/bin/p4-fusion /usr/local/bin/p4-fusion

COPY --from=coursier /usr/local/bin/coursier /usr/local/bin/coursier
# copy into place the p4-fusion binary and the wrapper shell script
# that facilitates better handling of killing of the p4-fusion
# (for example, if the Docker host's OOM Reaper killed it)
COPY --from=build /usr/local/bin/p4-fusion /usr/local/bin/p4-fusion-binary
COPY p4-fusion-wrapper-detect-kill.sh /usr/local/bin/p4-fusion
COPY process-stats-watcher.sh /usr/local/bin/process-stats-watcher.sh

# This is a trick to include libraries required by p4,
# please refer to https://blog.tilander.org/docker-perforce/

@ -29,10 +29,39 @@ read/written please use atomic filesystem patterns. This usually involves
heavy use of `os.Rename`. Search for existing uses of `os.Rename` to see
examples.

#### Scaling
## Scaling

gitserver's memory usage consists of short-lived git subprocesses.

This is an IO- and compute-heavy service, since most Sourcegraph requests will trigger one or more git commands. As such, we shard requests for a repo to a specific replica. This allows us to scale the service out horizontally.

The service is stateful (maintaining git clones). However, it only contains data mirrored from upstream code hosts.
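To make "shard requests for a repo to a specific replica" concrete, here is an illustrative bash sketch of a stable repo-to-replica mapping. It uses `cksum` purely as a convenient deterministic hash, and `replica_for_repo` is a hypothetical name; this is not the function gitserver actually uses to choose a replica.

```shell
#!/usr/bin/env bash
# Illustrative only (not Sourcegraph's sharding function):
# replica_for_repo REPO_NAME REPLICA_COUNT -> index in [0, REPLICA_COUNT)
replica_for_repo() {
  local repo="$1" replicas="$2" sum
  # cksum prints "<crc> <byte count>"; keep only the CRC
  sum=$(printf '%s' "${repo}" | cksum | cut -d ' ' -f 1)
  echo $(( sum % replicas ))
}

replica_for_repo "github.com/sourcegraph/sourcegraph" 3
```

Because the hash depends only on the repo name, every request for that repo goes to the same replica, which is what lets each replica keep its own subset of clones on disk. A plain modulo reshuffles most repos when the replica count changes, which is one reason real systems often prefer consistent or rendezvous hashing.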

## Perforce depots

Syncing of Perforce depots is accomplished by either `p4-fusion` or `git p4` (deprecated), both of which clone Perforce depots into Git repositories in `gitserver`.

### p4-fusion in development

To use `p4-fusion` while developing Sourcegraph, there are a couple of options.

#### Docker

[Run `gitserver` in a Docker container](https://docs.sourcegraph.com/dev/background-information/sg#run-gitserver-in-a-docker-container). This option gives the experience closest to a deployed Sourcegraph instance and works on any platform/OS on which you develop (that is, run `sg start`).

#### Native binary executable

The `p4-fusion` native binary has been built on Linux and macOS, but is untested on Windows.

Read the [comprehensive instructions](https://docs.sourcegraph.com/dev/background-information/build_p4_fusion).

If you go the native binary route, you may also want to use the wrapper shell script that detects when the process has been killed and outputs an error so that the calling process can handle it.

That wrapper shell script is `p4-fusion-wrapper-detect-kill.sh`. To use it:

1. Rename the `p4-fusion` binary executable to `p4-fusion-binary` and move it to a location in the `PATH`.
1. Copy the shell script `p4-fusion-wrapper-detect-kill.sh` to a location in the `PATH`, renaming it `p4-fusion`.
1. Copy the shell script `process-stats-watcher.sh` to a location in the `PATH`.
1. Ensure all three of those are executable.

After those steps, when a native `gitserver` process runs `p4-fusion`, it runs the wrapper shell script, which in turn runs the `p4-fusion-binary` executable and the `process-stats-watcher.sh` script.
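The four steps above can be scripted. A minimal bash sketch, assuming you already have a natively built `p4-fusion` binary and a checkout containing the two scripts; `install_p4_fusion_wrapper` is a hypothetical helper name, and `install -m` is the standard coreutils/BSD `install` utility:

```shell
#!/usr/bin/env bash
# Hypothetical helper: install_p4_fusion_wrapper BINARY SCRIPT_DIR DEST_DIR
#   BINARY     path to a natively built p4-fusion executable
#   SCRIPT_DIR directory holding p4-fusion-wrapper-detect-kill.sh and process-stats-watcher.sh
#   DEST_DIR   a directory that is (or will be) on your PATH
install_p4_fusion_wrapper() {
  local binary="$1" script_dir="$2" dest="$3"
  mkdir -p "${dest}" || return 1
  # step 1: the real binary is installed under the name the wrapper calls
  install -m 0755 "${binary}" "${dest}/p4-fusion-binary" || return 1
  # step 2: the wrapper takes over the p4-fusion name that gitserver runs
  install -m 0755 "${script_dir}/p4-fusion-wrapper-detect-kill.sh" "${dest}/p4-fusion" || return 1
  # step 3: the sidecar keeps its own name, which the wrapper calls directly
  install -m 0755 "${script_dir}/process-stats-watcher.sh" "${dest}/process-stats-watcher.sh" || return 1
  # step 4 is covered by -m 0755 on each install
}
```

For example, `install_p4_fusion_wrapper ./build/p4-fusion/p4-fusion ./enterprise/cmd/gitserver "$HOME/bin"` (the binary path and destination are placeholders). If that directory comes before any other `p4-fusion` on your `PATH`, `command -v p4-fusion` should then resolve to the wrapper.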

@ -1,8 +1,22 @@
#!/usr/bin/env bash

# We want to build multiple go binaries, so we use a custom build step on CI.
cd "$(dirname "${BASH_SOURCE[0]}")"/../..
set -ex
# the build process for the OSS gitserver is identical to the build process for the Enterprise gitserver
# pull some shenanigans up front so that we don't have to sprinkle "enterprise" all throughout the enterprise version

exedir=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)

path="cmd/gitserver"

if [[ ${exedir} = */enterprise/cmd/gitserver ]]; then
# We want to build multiple go binaries, so we use a custom build step on CI.
cd "${exedir}"/../../.. || exit 1
path="enterprise/${path}"
else
# We want to build multiple go binaries, so we use a custom build step on CI.
cd "${exedir}"/../.. || exit 1
fi

### OSS and Enterprise builds should be identical after this point

OUTPUT=$(mktemp -d -t sgdockerbuild_XXXXXXX)

@ -12,14 +26,16 @@ cleanup() {

trap cleanup EXIT

cp -a ./cmd/gitserver/p4-fusion-install-alpine.sh "$OUTPUT"
for f in p4-fusion-install-alpine.sh p4-fusion-wrapper-detect-kill.sh process-stats-watcher.sh; do
cp -a "./${path}/${f}" "${OUTPUT}"
done

if [[ "${DOCKER_BAZEL:-false}" == "true" ]]; then
./dev/ci/bazel.sh build //cmd/gitserver
out=$(./dev/ci/bazel.sh cquery //cmd/gitserver --output=files)
./dev/ci/bazel.sh build //${path}
out=$(./dev/ci/bazel.sh cquery //${path} --output=files)
cp "$out" "$OUTPUT"

docker build -f cmd/gitserver/Dockerfile -t "$IMAGE" "$OUTPUT" \
docker build -f ${path}/Dockerfile -t "$IMAGE" "$OUTPUT" \
--progress=plain \
--build-arg COMMIT_SHA \
--build-arg DATE \
@ -33,10 +49,10 @@ export GOARCH=amd64
export GOOS=linux
export CGO_ENABLED=0

pkg="github.com/sourcegraph/sourcegraph/cmd/gitserver"
pkg="github.com/sourcegraph/sourcegraph/${path}"
go build -trimpath -ldflags "-X github.com/sourcegraph/sourcegraph/internal/version.version=$VERSION -X github.com/sourcegraph/sourcegraph/internal/version.timestamp=$(date +%s)" -buildmode exe -tags dist -o "$OUTPUT/$(basename $pkg)" "$pkg"

docker build -f cmd/gitserver/Dockerfile -t "$IMAGE" "$OUTPUT" \
docker build -f ${path}/Dockerfile -t "$IMAGE" "$OUTPUT" \
--progress=plain \
--build-arg COMMIT_SHA \
--build-arg DATE \

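The build script above serves both `cmd/gitserver` and `enterprise/cmd/gitserver` by deriving a single `path` value from the directory the script lives in; every later step (`//${path}` for Bazel, `${path}/Dockerfile`, and `pkg=".../${path}"`) reuses it. A minimal bash sketch of just that decision, with `gitserver_build_path` as a hypothetical helper name (the script inlines this logic):

```shell
#!/usr/bin/env bash
# Hypothetical helper: gitserver_build_path EXEDIR
# Mirrors build.sh: default to cmd/gitserver, and prefix "enterprise/"
# when the script's own directory ends in enterprise/cmd/gitserver.
gitserver_build_path() {
  local exedir="$1"
  local path="cmd/gitserver"
  # unquoted right-hand side inside [[ ]] is a glob pattern match
  if [[ ${exedir} = */enterprise/cmd/gitserver ]]; then
    path="enterprise/${path}"
  fi
  printf '%s\n' "${path}"
}

gitserver_build_path /src/sourcegraph/cmd/gitserver              # cmd/gitserver
gitserver_build_path /src/sourcegraph/enterprise/cmd/gitserver   # enterprise/cmd/gitserver
```

The same `path` also drives the loop that stages `p4-fusion-wrapper-detect-kill.sh` and `process-stats-watcher.sh` into the Docker build context, so each image picks up the copies that sit next to its own `build.sh`.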
74
cmd/gitserver/p4-fusion-wrapper-detect-kill.sh
Executable file
@ -0,0 +1,74 @@
#!/usr/bin/env bash
# shellcheck disable=SC2064,SC2207

# create a file to hold the output of p4-fusion
fusionout=$(mktemp || mktemp -t fusionout_XXXXXXXX)

# create a pipe to use for capturing output of p4-fusion
# so that it can be sent to stdout and also to a file for analyzing later
fusionpipe=$(mktemp || mktemp -t fusionpipe_XXXXXXXX)
rm -f "${fusionpipe}"
mknod "${fusionpipe}" p
tee <"${fusionpipe}" "${fusionout}" &

# create a file to hold the output of `wait`
waitout=$(mktemp || mktemp -t waitout_XXXXXXXX)

# create a file to hold the resource usage of the child process
stats=$(mktemp || mktemp -t resource_XXXXXXXX)

# make sure to clean up on exit
trap "rm -f \"${fusionout}\" \"${fusionpipe}\" \"${waitout}\" \"${stats}\"" EXIT

# launch p4-fusion in the background, sending all output to the pipe for capture and re-echoing
# depends on the p4-fusion binary executable being copied to p4-fusion-binary in the gitserver Dockerfile
p4-fusion-binary "${@}" >"${fusionpipe}" 2>&1 &

# capture the pid of the child process
fpid=$!

# start up a "sidecar" process to capture resource usage.
# it will terminate when the p4-fusion process terminates.
process-stats-watcher.sh "${fpid}" "p4-fusion-binary" >"${stats}" &
spid=$!

# Wait for the child process to finish
wait ${fpid} >"${waitout}" 2>&1

# capture the result of the wait, which is the result of the child process
# or the result of external action on the child process, like SIGKILL
waitcode=$?

# the sidecar process should have exited by now, but just in case, wait for it
wait "${spid}" >/dev/null 2>&1

[ ${waitcode} -eq 0 ] || {
  # if the wait exit code indicates a problem,
  # check to see if the child process was killed
  killed=""
  # if the process was killed with SIGKILL, the `wait` process will have generated a notification
  grep -qs "Killed" "${waitout}" && killed=y
  [ -z "${killed}" ] && {
    # If the wait process did not generate an error message, check the process output.
    # The process traps SIGINT and SIGTERM; uncaught signals will be displayed as "uncaught"
    tail -5 "${fusionout}" | grep -Eqs "Signal Received:|uncaught target signal" && killed=y
  }
  [ -z "${killed}" ] || {
    # include the signal if it's SIGINT, SIGTERM, or SIGKILL
    # not guaranteed to work, but nice if we can include the info
    signal="$(kill -l ${waitcode})"
    [ -z "${signal}" ] || signal=" (SIG${signal})"
    # get info if available from the sidecar process
    rusage=""
    [ -s "${stats}" ] && {
      # expect the last (maybe only) line to be four fields:
      # RSS VSZ ETIME TIME
      x=($(tail -1 "${stats}"))
      # NOTE: bash indexes from 0; zsh indexes from 1
      [ ${#x[@]} -eq 4 ] && rusage=" At the time of its demise, it had been running for ${x[2]}, had used ${x[3]} CPU time, reserved ${x[1]} RAM and was using ${x[0]}."
    }
    echo "p4-fusion was killed by an external signal${signal}.${rusage}"
  }
}

exit ${waitcode}
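The wrapper's capture technique (child output into a FIFO, `tee` reading the FIFO so output is both echoed live and saved for later inspection) can be shown in isolation. A minimal bash sketch, with `run_and_capture` as a hypothetical helper name; it uses `mkfifo` where the wrapper uses the equivalent `mknod ... p`, and, unlike the wrapper, it also waits for `tee` before reading the saved copy so the last lines are guaranteed to be flushed:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run_and_capture CMD [ARGS...]
# Echoes CMD's combined output live, keeps a copy in a temp file,
# and returns CMD's exit status (128 + N if it was killed by signal N).
run_and_capture() {
  local out fifodir tee_pid code
  out=$(mktemp)
  fifodir=$(mktemp -d)
  mkfifo "${fifodir}/pipe"
  # reader: blocks until a writer opens the FIFO, exits at EOF
  tee <"${fifodir}/pipe" "${out}" &
  tee_pid=$!
  # writer: the child, with stdout and stderr both sent into the FIFO
  "$@" >"${fifodir}/pipe" 2>&1 &
  wait "$!"
  code=$?
  # let tee drain the FIFO before inspecting the saved copy
  wait "${tee_pid}"
  echo "last line captured: $(tail -1 "${out}")"
  rm -rf "${out}" "${fifodir}"
  return "${code}"
}

run_and_capture sh -c 'echo hello; exit 3'
echo "status: $?"
```

Running it should print `hello`, then `last line captured: hello`, then `status: 3`. Inspecting the saved copy after the child exits is what lets the wrapper grep the last few lines for `Signal Received:` without interfering with the live output that gitserver reads.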
45
cmd/gitserver/process-stats-watcher.sh
Executable file
@ -0,0 +1,45 @@
#!/usr/bin/env bash
# shellcheck disable=SC2064,SC2207,SC2009

humanize() {
  local num=${1}
  [[ ${num} =~ ^[0-9][0-9]*$ ]] && num=$(bc <<<"scale=2;${num}/1024/1024")m
  printf -- '%s' "${num}"
  return 0
}

# read resource usage statistics for a process
# several times a second until it terminates
# at which point, output the most recent stats on stdout
# the output format is "RSS VSZ ETIME TIME"

# input is the pid of the process
pid="${1}"
# and its name, which is used to avoid tracking
# another process in case the original process completed,
# and another started up and got assigned the same pid
cmd="${2}"

unset rss vsz etime time

while true; do
  # Alpine has a rather limited `ps`
  # it does not limit output to just one process, even when specifying a pid
  # so we need to filter the output by pid
  x="$(ps -o pid,stat,rss,vsz,etime,time,comm,args "${pid}" | grep "^ *${pid} " | grep "${cmd}" | tail -1)"
  [ -z "${x}" ] && break
  IFS=" " read -r -a a <<<"$x"
  # drop out of here if the process has died or become a zombie - no coming back from the dead
  [[ "${a[1]}" =~ ^[ZXx] ]] && break
  # only collect stats for processes that are active (running, sleeping, disk sleep, which is waiting for I/O to complete)
  # but don't stop until it really is dead
  [[ "${a[1]}" =~ ^[RSD] ]] && {
    rss=${a[2]}
    vsz=${a[3]}
    etime=${a[4]}
    time=${a[5]}
  }
  sleep 0.2
done

printf '%s %s %s %s' "$(humanize "${rss}")" "$(humanize "${vsz}")" "${etime}" "${time}"
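Each loop iteration of the watcher splits one line of `ps -o pid,stat,rss,vsz,etime,time,comm,args` on spaces, so field 1 is the state code and fields 2 to 5 are RSS, VSZ, ETIME, and TIME. A minimal bash sketch of that per-line decision, with `classify_ps_line` as a hypothetical helper name; the sample lines below are made up for illustration, not captured from a real run:

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify_ps_line LINE
# Prints "dead" for zombie/dead states, "active RSS VSZ ETIME TIME" for
# running/sleeping/disk-sleep states, and "skip" for anything else
# (for example T, stopped), matching the checks in the watcher loop.
classify_ps_line() {
  local a
  # IFS=" " also trims leading spaces and collapses runs of spaces
  IFS=" " read -r -a a <<<"$1"
  if [[ "${a[1]}" =~ ^[ZXx] ]]; then
    echo "dead"
  elif [[ "${a[1]}" =~ ^[RSD] ]]; then
    echo "active ${a[2]} ${a[3]} ${a[4]} ${a[5]}"
  else
    echo "skip"
  fi
}

classify_ps_line "  42 S  1532 390000 00:20 0:01 p4-fusion-binar p4-fusion-binary --path //go/..."   # active 1532 390000 00:20 0:01
classify_ps_line "  42 Z     0      0 00:21 0:01 p4-fusion-binar [p4-fusion-binar]"                  # dead
classify_ps_line "  42 T  1532 390000 00:20 0:01 p4-fusion-binar p4-fusion-binary"                   # skip
```

In the real script, "skip" simply means "keep the last good sample and poll again", and "dead" ends the loop so the last active sample is what gets printed.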
@ -4,7 +4,7 @@
# ignores.

# Install p4 CLI (keep this up to date with cmd/server/Dockerfile)
FROM sourcegraph/alpine-3.14:213466_2023-04-17_5.0-bdda34a71619@sha256:6354a4ff578b685e36c8fbde81f62125ae0011b047fb2cc22d1b0de616b3c59a AS p4cli
FROM sourcegraph/alpine-3.14:213466_2023-04-17_5.0-bdda34a71619@sha256:6354a4ff578b685e36c8fbde81f62125ae0011b047fb2cc22d1b0de616b3c59a AS build

# hash provided in http://filehost.perforce.com/perforce/r22.2/bin.linux26x86_64/SHA256SUMS
# if the hash is not provided, calculate it by downloading the file and running `sha256sum` on it in Terminal
@ -13,13 +13,9 @@ RUN echo "8bc10fca1c5a26262b4072deec76150a668581a9749d0504cd443084773d4fd0 /usr
chmod +x /usr/local/bin/p4 && \
sha256sum -c expected_hash

FROM sourcegraph/alpine-3.14:213466_2023-04-17_5.0-bdda34a71619@sha256:6354a4ff578b685e36c8fbde81f62125ae0011b047fb2cc22d1b0de616b3c59a AS p4-fusion

COPY p4-fusion-install-alpine.sh /p4-fusion-install-alpine.sh
RUN /p4-fusion-install-alpine.sh

FROM sourcegraph/alpine-3.14:213466_2023-04-17_5.0-bdda34a71619@sha256:6354a4ff578b685e36c8fbde81f62125ae0011b047fb2cc22d1b0de616b3c59a AS coursier

RUN wget -O coursier.gz https://github.com/coursier/coursier/releases/download/v2.1.0-RC4/cs-x86_64-pc-linux-static.gz && \
gzip -d coursier.gz && \
mv coursier /usr/local/bin/coursier && \
@ -43,7 +39,6 @@ RUN apk add --no-cache \
'git>=2.38.0' --repository=http://dl-cdn.alpinelinux.org/alpine/v3.17/main \
git-lfs \
git-p4 \
&& apk add --no-cache \
openssh-client \
# We require libstdc++ for p4-fusion
libstdc++ \
@ -51,11 +46,16 @@ RUN apk add --no-cache \
python3 \
bash

COPY --from=p4cli /usr/local/bin/p4 /usr/local/bin/p4
COPY --from=build /usr/local/bin/p4 /usr/local/bin/p4
COPY --from=build /usr/local/bin/coursier /usr/local/bin/coursier

COPY --from=p4-fusion /usr/local/bin/p4-fusion /usr/local/bin/p4-fusion

COPY --from=coursier /usr/local/bin/coursier /usr/local/bin/coursier
# copy into place the p4-fusion binary and the wrapper shell script
# that facilitates better handling of killing of the p4-fusion
# (for example, either because it exceeded gitLongCommandTimeout, or the Docker host's OOM Reaper killed it)
# actually, I'm not sure about gitLongCommandTimeout, because that may directly terminate the wrapper script.
COPY --from=build /usr/local/bin/p4-fusion /usr/local/bin/p4-fusion-binary
COPY p4-fusion-wrapper-detect-kill.sh /usr/local/bin/p4-fusion
COPY process-stats-watcher.sh /usr/local/bin/process-stats-watcher.sh

# This is a trick to include libraries required by p4,
# please refer to https://blog.tilander.org/docker-perforce/

@ -1,8 +1,22 @@
#!/usr/bin/env bash

# We want to build multiple go binaries, so we use a custom build step on CI.
cd "$(dirname "${BASH_SOURCE[0]}")/../../.."
set -ex
# the build process for the OSS gitserver is identical to the build process for the Enterprise gitserver
# pull some shenanigans up front so that we don't have to sprinkle "enterprise" all throughout the enterprise version

exedir=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)

path="cmd/gitserver"

if [[ ${exedir} = */enterprise/cmd/gitserver ]]; then
# We want to build multiple go binaries, so we use a custom build step on CI.
cd "${exedir}"/../../.. || exit 1
path="enterprise/${path}"
else
# We want to build multiple go binaries, so we use a custom build step on CI.
cd "${exedir}"/../.. || exit 1
fi

### OSS and Enterprise builds should be identical after this point

OUTPUT=$(mktemp -d -t sgdockerbuild_XXXXXXX)

@ -12,14 +26,16 @@ cleanup() {

trap cleanup EXIT

cp -a ./enterprise/cmd/gitserver/p4-fusion-install-alpine.sh "$OUTPUT"
for f in p4-fusion-install-alpine.sh p4-fusion-wrapper-detect-kill.sh process-stats-watcher.sh; do
cp -a "./${path}/${f}" "${OUTPUT}"
done

if [[ "${DOCKER_BAZEL:-false}" == "true" ]]; then
./dev/ci/bazel.sh build //enterprise/cmd/gitserver
out=$(./dev/ci/bazel.sh cquery //enterprise/cmd/gitserver --output=files)
./dev/ci/bazel.sh build //${path}
out=$(./dev/ci/bazel.sh cquery //${path} --output=files)
cp "$out" "$OUTPUT"

docker build -f enterprise/cmd/gitserver/Dockerfile -t "$IMAGE" "$OUTPUT" \
docker build -f ${path}/Dockerfile -t "$IMAGE" "$OUTPUT" \
--progress=plain \
--build-arg COMMIT_SHA \
--build-arg DATE \
@ -33,10 +49,10 @@ export GOARCH=amd64
export GOOS=linux
export CGO_ENABLED=0

pkg="github.com/sourcegraph/sourcegraph/enterprise/cmd/gitserver"
pkg="github.com/sourcegraph/sourcegraph/${path}"
go build -trimpath -ldflags "-X github.com/sourcegraph/sourcegraph/internal/version.version=$VERSION -X github.com/sourcegraph/sourcegraph/internal/version.timestamp=$(date +%s)" -buildmode exe -tags dist -o "$OUTPUT/$(basename $pkg)" "$pkg"

docker build -f enterprise/cmd/gitserver/Dockerfile -t "$IMAGE" "$OUTPUT" \
docker build -f ${path}/Dockerfile -t "$IMAGE" "$OUTPUT" \
--progress=plain \
--build-arg COMMIT_SHA \
--build-arg DATE \

75
enterprise/cmd/gitserver/p4-fusion-wrapper-detect-kill.sh
Executable file
@ -0,0 +1,75 @@
#!/usr/bin/env bash
# shellcheck disable=SC2064,SC2207

# create a file to hold the output of p4-fusion
# TODO: consider recording/storing/capturing the file for logs display in the UI if there's a problem
fusionout=$(mktemp || mktemp -t fusionout_XXXXXXXX)

# create a pipe to use for capturing output of p4-fusion
# so that it can be sent to stdout and also to a file for analyzing later
fusionpipe=$(mktemp || mktemp -t fusionpipe_XXXXXXXX)
rm -f "${fusionpipe}"
mknod "${fusionpipe}" p
tee <"${fusionpipe}" "${fusionout}" &

# create a file to hold the output of `wait`
waitout=$(mktemp || mktemp -t waitout_XXXXXXXX)

# create a file to hold the resource usage of the child process
stats=$(mktemp || mktemp -t resource_XXXXXXXX)

# make sure to clean up on exit
trap "rm -f \"${fusionout}\" \"${fusionpipe}\" \"${waitout}\" \"${stats}\"" EXIT

# launch p4-fusion in the background, sending all output to the pipe for capture and re-echoing
# depends on the p4-fusion binary executable being copied to p4-fusion-binary in the gitserver Dockerfile
p4-fusion-binary "${@}" >"${fusionpipe}" 2>&1 &

# capture the pid of the child process
fpid=$!

# start up a "sidecar" process to capture resource usage.
# it will terminate when the p4-fusion process terminates.
process-stats-watcher.sh "${fpid}" "p4-fusion-binary" >"${stats}" &
spid=$!

# Wait for the child process to finish
wait ${fpid} >"${waitout}" 2>&1

# capture the result of the wait, which is the result of the child process
# or the result of external action on the child process, like SIGKILL
waitcode=$?

# the sidecar process should have exited by now, but just in case, wait for it
wait "${spid}" >/dev/null 2>&1

[ ${waitcode} -eq 0 ] || {
  # if the wait exit code indicates a problem,
  # check to see if the child process was killed
  killed=""
  # if the process was killed with SIGKILL, the `wait` process will have generated a notification
  grep -qs "Killed" "${waitout}" && killed=y
  [ -z "${killed}" ] && {
    # If the wait process did not generate an error message, check the process output.
    # The process traps SIGINT and SIGTERM; uncaught signals will be displayed as "uncaught"
    tail -5 "${fusionout}" | grep -Eqs "Signal Received:|uncaught target signal" && killed=y
  }
  [ -z "${killed}" ] || {
    # include the signal if it's SIGINT, SIGTERM, or SIGKILL
    # not guaranteed to work, but nice if we can include the info
    signal="$(kill -l ${waitcode})"
    [ -z "${signal}" ] || signal=" (SIG${signal})"
    # get info if available from the sidecar process
    rusage=""
    [ -s "${stats}" ] && {
      # expect the last (maybe only) line to be four fields:
      # RSS VSZ ETIME TIME
      x=($(tail -1 "${stats}"))
      # NOTE: bash indexes from 0; zsh indexes from 1
      [ ${#x[@]} -eq 4 ] && rusage=" At the time of its demise, it had been running for ${x[2]}, had used ${x[3]} CPU time, reserved ${x[1]} RAM and was using ${x[0]}."
    }
    echo "p4-fusion was killed by an external signal${signal}.${rusage}"
  }
}

exit ${waitcode}
45
enterprise/cmd/gitserver/process-stats-watcher.sh
Executable file
@ -0,0 +1,45 @@
#!/usr/bin/env bash
# shellcheck disable=SC2064,SC2207,SC2009

humanize() {
  local num=${1}
  [[ ${num} =~ ^[0-9][0-9]*$ ]] && num=$(bc <<<"scale=2;${num}/1024/1024")m
  printf -- '%s' "${num}"
  return 0
}

# read resource usage statistics for a process
# several times a second until it terminates
# at which point, output the most recent stats on stdout
# the output format is "RSS VSZ ETIME TIME"

# input is the pid of the process
pid="${1}"
# and its name, which is used to avoid tracking
# another process in case the original process completed,
# and another started up and got assigned the same pid
cmd="${2}"

unset rss vsz etime time

while true; do
  # Alpine has a rather limited `ps`
  # it does not limit output to just one process, even when specifying a pid
  # so we need to filter the output by pid
  x="$(ps -o pid,stat,rss,vsz,etime,time,comm,args "${pid}" | grep "^ *${pid} " | grep "${cmd}" | tail -1)"
  [ -z "${x}" ] && break
  IFS=" " read -r -a a <<<"$x"
  # drop out of here if the process has died or become a zombie - no coming back from the dead
  [[ "${a[1]}" =~ ^[ZXx] ]] && break
  # only collect stats for processes that are active (running, sleeping, disk sleep, which is waiting for I/O to complete)
  # but don't stop until it really is dead
  [[ "${a[1]}" =~ ^[RSD] ]] && {
    rss=${a[2]}
    vsz=${a[3]}
    etime=${a[4]}
    time=${a[5]}
  }
  sleep 0.2
done

printf '%s %s %s %s' "$(humanize "${rss}")" "$(humanize "${vsz}")" "${etime}" "${time}"