Saturday, August 3, 2013

More on Linux Threads

Got Linux thread names working in LLDB. "thread list" now displays the proper thread name, and the name is updated after calls to pthread_setname_np(), etc. Still need thread events, but that's a bit lower priority right now.
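For anyone who wants to poke at thread names outside of a debugger, here is a minimal sketch in Python (not the LLDB code, which is C++) that reads each thread's name from procfs. The function name read_thread_names and the proc_root parameter are made up for illustration; the only thing taken from procfs itself is the "/proc/[pid]/task/[tid]/comm" layout.

```python
import os


def read_thread_names(pid, proc_root="/proc"):
    """Return {tid: name} for every thread of `pid`, read from procfs.

    `proc_root` is a hypothetical knob so the function can be pointed
    at a fake directory tree when testing.
    """
    task_dir = os.path.join(proc_root, str(pid), "task")
    names = {}
    for entry in os.listdir(task_dir):
        if not entry.isdigit():
            continue  # only numeric entries are thread ids
        comm_path = os.path.join(task_dir, entry, "comm")
        try:
            with open(comm_path) as f:
                # The kernel terminates the name with a newline.
                names[int(entry)] = f.read().rstrip("\n")
        except OSError:
            # The thread may have exited between listdir() and open().
            continue
    return names
```

Calling read_thread_names(os.getpid()) on a Linux box should return one entry per thread in the current process, keyed by thread id.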

Couple of interesting notes & questions.

1. I initially implemented this by reading the "/proc/[pid]/task/[tid]/comm" file. Matt Kopec pointed out this could be read from "/proc/[tid]/comm" as well, even though "/proc/[tid]" isn't visible using ls in the terminal. This directory existing makes sense, as threads are just light-weight processes; I had just never thought or read about it anywhere before. (Although to be fair, Pierre-Loup said he mentioned it to me at some point.)

2. For the curious, "/proc/self" has process granularity. I.e., I read "/proc/self/comm" from a background thread and got the name of the process, not of the thread.

3. The "man proc" page for "/proc/[pid]/task" has this warning:
In a multithreaded process, the contents of the /proc/[pid]/task directory are not available if the main thread has already terminated (typically by calling pthread_exit(3)).

If anyone knows a system where this is true, I'd love to hear about it.

4. Gdb uses the libthread_db library to get notifications about new threads, and it looks like this is quite the doozy to set up and get running. Some great (and the only, other than the source? :) info on that here:

LLDB doesn't use libthread_db though - it uses signals: ptrace reports each new thread as a SIGTRAP stop carrying a PTRACE_EVENT_CLONE event. Source code can be found in ProcessMonitor.cpp if you search for the "case (SIGTRAP | (PTRACE_EVENT_CLONE << 8))" statement in ProcessMonitor::MonitorSIGTRAP().
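To make that case expression a bit less cryptic, here is a small Python sketch of how the same value shows up in a raw waitpid() status. It relies on the ptrace(2) man page's statement that for these events "status>>8 == (SIGTRAP | (PTRACE_EVENT_CLONE<<8))". The helper name is_clone_event is hypothetical, and this only illustrates the bit layout; it is not a description of how LLDB itself obtains the value.

```python
import signal

# Value of PTRACE_EVENT_CLONE in <linux/ptrace.h>; hard-coded here
# because Python's standard library does not expose it.
PTRACE_EVENT_CLONE = 3


def is_clone_event(status):
    """Return True if a raw waitpid() status is a ptrace clone-event stop."""
    # A "stopped" status has 0x7f in its low byte (what WIFSTOPPED checks
    # on Linux); this is assumed rather than queried from the platform.
    stopped = (status & 0xFF) == 0x7F
    expected = int(signal.SIGTRAP) | (PTRACE_EVENT_CLONE << 8)
    return stopped and (status >> 8) == expected
```

With SIGTRAP equal to 5, as it is on Linux, a clone-event stop has the raw status 0x3057f, while an ordinary SIGTRAP stop is 0x57f and is rejected.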

My question would be: why on earth go through all the trouble to use libthread_db if signals will work just as well?

There is an intriguing note in the libthread_db post where he mentions accessing thread local data:

Now you can use the library

At this point, you’ve done enough setup to be able to dlsym search for and call various functions to iterate over the threads in a remote process, to be notified asynchronously when threads are created or destroyed, and to access thread local data if you want to.

Now that could be incredibly useful... but from what I can tell, gdb doesn't use this feature. Getting to TLS data in gdb (unless I've missed something) is a bit of a pain in the backside.

I'm going to put these on the back burner for now and start trying to track down some stack tracing bugs. Which means diving in and trying to understand CIEs and FDEs:

Good times!


  1. 3.
    As far as I know, this is possible on Linux. If you create some background / worker threads and forget to join them before returning from the main method, they run without a main thread. I don't know how long this is possible. I learned this some time ago by making this stupid mistake and getting some segfaults from the background threads.

    1. If you step out of the main() function when you return, you should wind up in _exit() where it calls the exit_group syscall. This is supposed to terminate all threads in the process:

      Calling pthread_exit() on the main thread doesn't though. It sets the state of the main thread to zombie (check /proc/[pid]/status) and the other threads keep running. From my tests with 64-bit Ubuntu 12.04, the /proc/[pid]/task directories are still all there.

      Guess I should poke through the kernel source history and see if this has been changed at some point. The other possibility is that the warning is for a different OS...

  2. Interestingly enough, exiting the main thread sure seems to confuse gdb. With gdb 7.6, I get the following after calling pthread_exit() on the main thread:

    (gdb) info threads
    [New Thread 0x7ffff7fd2740 (LWP 26340)]
    Id Target Id Frame
    4 Thread 0x7ffff7fd2740 (LWP 26340) "mainthrd" (Exiting) Couldn't get registers: No such process.
    (gdb) c
    Couldn't get registers: No such process.

    procfs tells me my other threads are still there and running just fine.

  3. From the gdb faq:

    "GDB itself does not know how to decode "thread control blocks" maintained by glibc and considered to be glibc private implementation detail. It uses (part of glibc) to help it do so."