We deployed our Docker container on Triton and everything worked well. Then, suddenly, there is an issue today: our application is slow or has stopped working. And our container has no debugging tools installed. So, what should we do?
Rebooting is not a solution
Explore a Triton container a bit
Once again, I started an nginx container on Triton.
Once the container is running, start a shell inside it with `docker exec`. Then explore the file system a bit.
/native?
What the hell is /native? Look around a bit more. Maybe try to find popular Unix tools in there? Aha! From `awk` to `zcat`, everything is there. So is /native another Unix? Right! /native is a window into Triton’s native operating system (SmartOS), and it ships the popular Unix tools.
Explore!
The good old Unix tools live in /native/.
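Here is a small sketch for taking inventory of what /native offers. It assumes the native tools sit in `usr/bin`, `usr/sbin` and `sbin` below /native; `NATIVE_ROOT` is a hypothetical override, mainly useful for trying the function outside Triton.

```shell
# Sketch: list the executables the native OS exposes under /native.
# Assumption: native tools live in these subdirectories of NATIVE_ROOT.
NATIVE_ROOT="${NATIVE_ROOT:-/native}"

list_native_tools() {
  local sub dir f
  for sub in usr/bin usr/sbin sbin; do
    dir="$NATIVE_ROOT/$sub"
    [ -d "$dir" ] || continue            # skip directories that don't exist
    for f in "$dir"/*; do
      # print only regular, executable files, by base name
      [ -f "$f" ] && [ -x "$f" ] && printf '%s\n' "${f##*/}"
    done
  done | sort -u                          # de-duplicate names found in several dirs
}
```

For example, `list_native_tools | grep -x -e awk -e zcat -e iostat -e dtrace` checks for a few familiar names.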
Let’s use /native
Now that we know /native exists, let’s use it. First, let’s deploy a misbehaving container. I’ve prepared a ‘bad container’ example; the code is here. Let’s start the `gamlerhart/waste-io` container.
This container has an issue. Let’s find out what the problem is. Again, we `docker exec` into the container. If we try to run `iostat`, it isn’t there. However, we can use /native: add the binary directories under /native (such as /native/usr/bin) to the PATH, and `iostat` becomes available.
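A minimal sketch of that PATH change as a POSIX shell function. The three directories are an assumption about where the native binaries live. The function appends them, so the container’s own Linux tools still win whenever a name exists in both places, and it skips directories that are already on the PATH.

```shell
# Sketch: make the native tools (iostat, dtrace, ...) callable by name.
# Assumption: the native binaries live in these /native subdirectories.
add_native_to_path() {
  local dir
  for dir in /native/usr/bin /native/usr/sbin /native/sbin; do
    case ":$PATH:" in
      *":$dir:"*) ;;                 # already on PATH, skip
      *) PATH="$PATH:$dir" ;;        # append so Linux tools keep priority
    esac
  done
  export PATH
}
```

After `add_native_to_path`, something like `iostat -xn 1 5` (illumos flags: extended statistics, descriptive device names, five one-second samples) shows the IO load.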
Aha! Our container burns quite a bit of IO. However, we want to know more, and for that we need tracing. Let’s use DTrace. Yep, `dtrace` is in /native too. I’m not a DTrace pro, but there are tons of materials online (dtrace.org, guide, examples). We’ll use the `lx-syscall` provider (LinuX system calls). `dtrace -ln 'lx-syscall:::'` lists the lx-syscall probes. `dtrace -n 'lx-syscall::: { @num[execname, pid, probefunc] = count(); }'` enables the probes and counts the calls, grouped by program name, process ID and probe function (the system call name). Stop tracing with `Ctrl+C`; dtrace then prints the counts.
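The two dtrace commands can be wrapped like this. It is a sketch under two assumptions: `dtrace` lives at `/native/usr/sbin/dtrace` (override with `DTRACE`), and `lx-syscall` offers separate `entry` and `return` probes like the native `syscall` provider. Under the second assumption, restricting the match to `entry` counts each call once instead of twice.

```shell
# Sketch: count Linux system calls per program, PID and syscall via DTrace.
# Assumptions: dtrace lives at /native/usr/sbin/dtrace (override with DTRACE),
# and the lx-syscall provider has separate entry/return probes.
DTRACE="${DTRACE:-/native/usr/sbin/dtrace}"

# Print the D program. Matching only :::entry counts each call once;
# the bare lx-syscall::: pattern would also match the return probes.
syscall_count_script() {
  printf '%s\n' 'lx-syscall:::entry { @num[execname, pid, probefunc] = count(); }'
}

# List all lx-syscall probes (-l lists, -n selects by probe name).
list_lx_syscall_probes() {
  "$DTRACE" -ln 'lx-syscall:::'
}

# Run the aggregation; press Ctrl+C to stop and print the counts.
count_lx_syscalls() {
  "$DTRACE" -n "$(syscall_count_script)"
}
```

Run `count_lx_syscalls`, let the container work for a few seconds, then press `Ctrl+C`.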
We see that our node process makes tons of `write`, `read`, `open`, `unlink` and `futex` system calls, so the node process is the culprit. But what is this program doing? Which files is it opening? Let’s narrow the probe down. `lx-syscall::open:entry` fires only on the `open` syscall, the predicate `/execname == "node"/` keeps only events from node, and `@num[copyinstr(arg0)] = count()` groups the calls by file name. (`arg0` of `open` is a pointer to the path in the traced process’s memory, so `copyinstr()` copies the string into DTrace.)
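A sketch of that narrower probe, parameterized on the program name, with the same assumption that `dtrace` lives at `/native/usr/sbin/dtrace`. The `openat` variant is an addition for runtimes and C libraries that open files through `openat(2)`, whose path is the second argument; `dtrace -ln 'lx-syscall::openat:'` shows whether that probe exists on your host.

```shell
# Sketch: count which files a given program opens, using lx-syscall.
# Assumption: dtrace lives at /native/usr/sbin/dtrace (override with DTRACE).
DTRACE="${DTRACE:-/native/usr/sbin/dtrace}"

# D program for one executable name (default: node).
# open(2) takes the path as arg0; copyinstr() copies it out of user memory.
open_files_script() {
  local prog="${1:-node}"
  printf 'lx-syscall::open:entry /execname == "%s"/ { @num[copyinstr(arg0)] = count(); }\n' "$prog"
}

# Variant for openat(2), whose path is the second argument (arg1).
openat_files_script() {
  local prog="${1:-node}"
  printf 'lx-syscall::openat:entry /execname == "%s"/ { @num[copyinstr(arg1)] = count(); }\n' "$prog"
}

# Trace until Ctrl+C, then print the per-file counts.
trace_open_files() {
  "$DTRACE" -n "$(open_files_script "${1:-node}")"
}
```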
Ah, tons of temporary files. Hmm… well, where in the node program do we create these temp files? With DTrace’s `ustack()`/`jstack()` actions we can get the program’s stack trace.
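The stack variant could look like this, again assuming `dtrace` at `/native/usr/sbin/dtrace`. `jstack()` behaves like `ustack()` but reserves string space per frame, which a ustack helper can use to name JavaScript frames when one is available; pass `ustack` to get plain native frames.

```shell
# Sketch: aggregate the user-level stacks that lead to open(2) in a program.
# Assumption: dtrace lives at /native/usr/sbin/dtrace (override with DTRACE).
DTRACE="${DTRACE:-/native/usr/sbin/dtrace}"

# $1: program name (default node); $2: stack action, ustack or jstack.
open_stack_script() {
  local prog="${1:-node}" action="${2:-jstack}"
  case "$action" in
    ustack|jstack) ;;
    *) echo "unsupported stack action: $action" >&2; return 1 ;;
  esac
  printf 'lx-syscall::open:entry /execname == "%s"/ { @stacks[%s()] = count(); }\n' "$prog" "$action"
}

# Trace until Ctrl+C; each distinct stack is printed with its count.
trace_open_stacks() {
  local script
  script=$(open_stack_script "$@") || return 1
  "$DTRACE" -n "$script"
}
```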
Ah… no luck. The `open` syscall is issued from node’s event loop, so the stack doesn’t lead us back to our own JavaScript code (looking into node with DTrace is another topic). However, we now know that the node program creates tons of `tmp/io-file` files, and with this information we can hopefully fix that bug.
Sherlock ‘dtrace’ Holmes
You know DTrace well?
You know DTrace well and can’t find many probes? Because you can only use DTrace within the Triton container, fewer probes are available. And we cannot go into the global zone of Joyent’s public cloud service. On a private Triton deployment we could, but that’s a topic for another time and another blog post.