One of the most common causes of a stuck environment build is a runaway or stuck process.
To ensure data consistency, the deployment flow tries to terminate running processes gracefully. Sometimes this does not work, and the deployment ends up waiting forever on a blocking process. Two simple commands run over SSH can get things moving again without having to wait for our support team to intervene.
As an example, let’s assume a cron process is stuck. The demo application contains one blocker.sh script:
#!/bin/sh
sleep 3600
which is configured as a cron that runs every 5 minutes in .platform.app.yaml:
blocker:
    spec: '*/5 * * * *'
    cmd: '/bin/bash /app/blocker.sh'
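For context, cron jobs in .platform.app.yaml live under the top-level crons key, so the full block would typically look something like this (a sketch based on the snippet above; the name and path are just the ones from this example):
crons:
    blocker:
        # Run every 5 minutes
        spec: '*/5 * * * *'
        # The script that blocks for an hour
        cmd: '/bin/bash /app/blocker.sh'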
Now, this process is blocking our new deployment, and the log is stuck at:
Redeploying environment main
Preparing deployment
Closing services router and app
The first thing to do is to check whether you can connect to the environment over SSH (this should work most of the time). If the SSH connection is successful, run ps fuxa.
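If you use the Platform.sh CLI, the whole check might look something like this (a sketch; it assumes the CLI is installed and the project is linked, and main is simply the environment from this walkthrough):
# Connect to the stuck environment over SSH
platform ssh -e main

# Once connected, list all running processes as a tree
ps fuxa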
The output will be a list of processes, similar to this one:
web@app.0:~$ ps fuxa
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 153 0.9 0.0 197656 33176 ? Sl 15:05 0:00 /usr/bin/python2.7 /etc/platform/commands/notify
web 159 0.0 0.0 271388 26620 ? Sl 15:05 0:00 \_ /usr/bin/python2.7 /etc/platform/commands/notify
web 162 0.0 0.0 4280 740 ? S 15:05 0:00 \_ /bin/dash -c /bin/bash /app/blocker.sh
web 165 0.0 0.0 12920 2740 ? S 15:05 0:00 \_ /bin/bash /app/blocker.sh
web 166 0.0 0.0 7580 668 ? S 15:05 0:00 \_ sleep 3600
root 1 0.0 0.0 15816 1096 ? Ss+ 15:02 0:00 init [2]
root 74 0.0 0.0 4204 1132 ? Ss 15:02 0:00 runsvdir -P /etc/service log: ...................................................................................................................
root 80 0.0 0.0 4052 704 ? Ss 15:02 0:00 \_ runsv tideways
root 81 0.0 0.0 4052 696 ? Ss 15:02 0:00 \_ runsv ssh
root 90 0.0 0.0 72104 5608 ? S 15:02 0:00 | \_ /usr/sbin/sshd -D
root 167 0.0 0.0 94876 6472 ? Ss 15:06 0:00 | \_ sshd: web [priv]
web 173 0.0 0.0 94876 3668 ? S 15:06 0:00 | \_ sshd: web@pts/0
web 174 0.0 0.0 21768 3852 pts/0 Ss 15:06 0:00 | \_ -bash
web 190 0.0 0.0 37448 3172 pts/0 R+ 15:06 0:00 | \_ ps fuxa
root 82 0.0 0.0 4052 700 ? Ss 15:02 0:00 \_ runsv nginx
root 115 0.0 0.0 36984 6684 ? S 15:02 0:00 | \_ nginx: master process /usr/sbin/nginx -g daemon off; error_log /var/log/error.log; -c /etc/nginx/nginx.conf
web 121 0.0 0.0 45460 11560 ? S 15:02 0:00 | \_ nginx: worker process
root 83 0.0 0.0 4052 752 ? Ss 15:02 0:00 \_ runsv newrelic
root 84 0.0 0.0 4052 644 ? Ss 15:02 0:00 \_ runsv idmapd
root 87 0.0 0.0 23348 2152 ? S 15:02 0:00 | \_ /usr/sbin/rpc.idmapd -f -C -p /run/rpc_pipefs
root 85 0.0 0.0 4052 700 ? Ss 15:02 0:00 \_ runsv app
web 111 0.0 0.0 359464 30288 ? Ss 15:02 0:00 \_ php-fpm: master process (/etc/php/7.2-zts/fpm/php-fpm.conf)
web 116 0.0 0.0 12932 296 ? S 15:02 0:00 \_ /bin/bash /etc/platform/start-app
web 117 0.0 0.0 7584 656 ? S 15:02 0:00 \_ tee -a /var/log/app.log
You can probably already see that our stuck process is here:
web 162 0.0 0.0 4280 740 ? S 15:05 0:00 \_ /bin/dash -c /bin/bash /app/blocker.sh
web 165 0.0 0.0 12920 2740 ? S 15:05 0:00 \_ /bin/bash /app/blocker.sh
web 166 0.0 0.0 7580 668 ? S 15:05 0:00 \_ sleep 3600
If you have trouble locating the stuck process, it’s generally found nested under the notify process in the tree:
/usr/bin/python2.7 /etc/platform/commands/notify
The notify process is a special process that monitors the container state.
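If the tree is long, you can also narrow things down from the shell, for example (a sketch; the script name and the PID 153 are simply the ones from this example output):
# Print the PIDs and full command lines of anything matching the script name
pgrep -af blocker.sh

# Or list the children of the notify process (PID 153 in the output above)
ps f --ppid 153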
The important thing in the above output is the number listed in the second column, after the web user: this is the process ID (PID), a unique identifier for each running process. Since the stuck process won’t stop on its own, it needs to be stopped forcefully. This can be done with the kill -9 command, followed by the list of process IDs you want to stop.
Therefore, to stop the cron in our example, we’d need to run:
kill -9 162 165 166
Be careful: there might be more processes blocking the deployment! Inspect the process list carefully (all application processes will run under the web user) and repeat the previous command for all of them.
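When the stuck processes all share a recognizable command line, pkill can save you from copying PIDs by hand; a minimal sketch, assuming the script name from this example:
# Send SIGKILL to every process whose full command line matches "blocker.sh"
# (-9 is SIGKILL, -f matches the full command line rather than just the process name)
pkill -9 -f blocker.sh

# Verify that nothing matching is still running
pgrep -af blocker.sh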
Once done, the SSH connection will be terminated and you’ll see this friendly message:
Message from bot@platform.sh at 15:09:36:
This container is being dematerialized. See you on the other side.
Your previously stuck deployment will now continue.
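Once the environment comes back up, you can follow the resumed deployment from the CLI if you prefer; a sketch, again assuming the Platform.sh CLI and the environment name from this example:
# Show the log of the most recent activity on the environment
platform activity:log -e main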
Note: if the SSH connection cannot be established, you will need to open a support ticket.