Screen

I usually use screen to manage parallel tasks so I can keep track of the command I used for each task. Of course, you should also save those commands locally or in a notebook somewhere, because if your machine restarts you will lose all your screens at once. Sometimes I have more than 10 screens and I lose track of them. Usually I run screen -r to list all the screens so I know the exact name of the one I'd like to attach to. Recently, I ran into a situation where screen -r would just hang. The screens whose names I could remember were no problem: I could attach to them with screen -r abc. So what is going on?

After diagnosing with AI, it turned out that one of my screens was in the T state, meaning it was stopped. I do remember stopping jobs inside that screen, but I don't remember stopping the screen itself, and I don't know how I would have done it. Anyway, since screen -ls was also hanging, here are two helpful commands you can use to see what is going on.

The first one is ls -la /run/screen/S-$USER. It shows the full names of all the screens you have started. That way, if the screen you want to go back to is OK (not in the T state, etc.), you can see its full name and reattach with screen -r abc.
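If you only want the names, here is a minimal sketch that prints them straight from the socket directory without calling screen at all, so it cannot hang on a stopped session. The function name list_screen_sockets is made up for illustration, and the default path is simply the one from my machine; your system may keep the sockets somewhere else (for example under $SCREENDIR if you set it).

```shell
# Print screen session names (PID.name) from the socket directory.
# The directory is an assumption; pass a different one as the first argument.
list_screen_sockets() {
  dir=${1:-/run/screen/S-$USER}
  for s in "$dir"/*; do
    [ -e "$s" ] || continue        # directory empty or missing: nothing to print
    printf '%s\n' "${s##*/}"       # strip the directory part, keep PID.name
  done
}
```

Each printed name can be passed to screen -r as is.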

Of course we also want to identify the root cause. This is a snippet of code that AI asked me to run:

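# For each screen socket, print the PID, session name, process state, and wait channel.
for s in "/run/screen/S-$USER"/*; do
  p=${s##*/}; p=${p%%.*}                           # PID is the part of the name before the first dot
  st=$(cut -d' ' -f3 "/proc/$p/stat" 2>/dev/null)  # third field of stat is the state (S, T, ...)
  wc=$(cat "/proc/$p/wchan" 2>/dev/null)           # kernel function the process is waiting in
  printf '%-8s %-22s %-4s %s\n' "$p" "${s##*/}" "${st:-GONE}" "$wc"
done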

This retrieves the PID of each screen and prints its state. In my case, all of my screens were in S/do_select (sleeping on the socket, waiting for a client) except one, which was in T/do_signal_stop. This means the screen daemon was hit with a stop signal (SIGSTOP or SIGTSTP) and is suspended. What to do? You just need to resume it with kill -CONT $PID. Note that the PID is the number at the front of the screen name, added when the screen was created.
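The check-and-resume steps above can be rolled into one small sketch. The function names stat_state and resume_stopped_screens are hypothetical, the socket directory is again the one from my machine, and it assumes a Linux /proc. It sends SIGCONT to every screen process it finds in state T, so read it before running it.

```shell
# Read one /proc/PID/stat line on stdin and print the state letter.
# The command name sits in parentheses and may contain spaces,
# so drop everything through the last ") " before taking the first field.
stat_state() {
  sed -e 's/^.*) //' -e 's/ .*$//'
}

# Send SIGCONT to every screen process in the socket directory that is stopped.
resume_stopped_screens() {
  dir=${1:-/run/screen/S-$USER}                   # assumed socket directory
  for s in "$dir"/*; do
    [ -e "$s" ] || continue
    name=${s##*/}
    pid=${name%%.*}
    case $pid in ''|*[!0-9]*) continue ;; esac    # skip names without a numeric PID prefix
    [ -r "/proc/$pid/stat" ] || continue          # process is gone
    st=$(stat_state < "/proc/$pid/stat")
    if [ "$st" = T ]; then
      echo "resuming $name (PID $pid)"
      kill -CONT "$pid"
    fi
  done
}
```

Calling resume_stopped_screens with no arguments uses the default directory. It prints one line per screen it resumes and nothing if none were stopped.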

After resuming it, screen -r and screen -ls work again without hanging.
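To check this without risking another stuck terminal, you can wrap the command in a time limit. This is a rough sketch that assumes the timeout utility from GNU coreutils is installed. The helper name probe is made up, and 5 seconds is an arbitrary choice.

```shell
# Run a command with a time limit and report whether it hung.
# Usage: probe SECONDS COMMAND [ARGS...]
probe() {
  secs=$1; shift
  timeout "$secs" "$@"
  rc=$?
  # GNU timeout exits with 124 when the time limit was reached.
  if [ "$rc" -eq 124 ]; then
    echo "still hanging: '$*' did not finish within ${secs}s" >&2
  fi
  return "$rc"
}

# Example: probe 5 screen -ls
```

I only look for exit status 124 here, because any other status comes from the wrapped command itself and does not indicate a hang.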

So screen has created a lot of problems for me so far. Sometimes it gets stuck, and pressing Ctrl+A followed by Q gets you out of it.

I haven’t decided whether to move to tmux.

Huan Fan /
Published under (CC) BY-NC-SA in categories notes  tagged with unix 