Debug the NebulaGraph Processes Inside the Docker Container
Requirements
During development and testing, we often use the vesoft-inc/nebula-docker-compose repository to deploy NebulaGraph. To keep the Docker image of each NebulaGraph service as small as possible, none of the tools commonly used for development are installed in the images, not even the Vim editor.
This makes it difficult to locate problems inside the containers, because it is cumbersome to install the essential tools every time we need to get something done. In fact, there is another way to debug a process inside a container, without changing the contents of the container or installing any tools in it.
This technique, the sidecar pattern, is commonly used in Kubernetes (K8s) environments. The principle is quite simple: start a second container that shares the PID and network namespaces of the container we are debugging. From that debug container, which has everything we need installed, we can then see the processes and the network stack of the original container.
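To check which namespaces two processes actually share, one option is to compare the symbolic links that Linux exposes under /proc/<pid>/ns for every process. The following is only a minimal sketch: the helper names ns_id and same_ns are hypothetical, and it assumes that readlink is available in the debug image.

```shell
# ns_id LINK: print the namespace identifier that a /proc/<pid>/ns/* symlink
# points to, for example "net:[4026532280]".
ns_id() {
    readlink "$1"
}

# same_ns LINK_A LINK_B: return 0 when both symlinks name the same namespace,
# 1 when they differ, and 2 when either link cannot be read.
same_ns() {
    _ns_a=$(ns_id "$1") || return 2
    _ns_b=$(ns_id "$2") || return 2
    [ "$_ns_a" = "$_ns_b" ]
}

# Example, run inside a debug container (PID 1 is assumed to be the target):
#   same_ns /proc/1/ns/net /proc/$$/ns/net && echo shared || echo separate
```

Two processes are in the same namespace exactly when the identifiers match. Reading the links of another process may require sufficient privileges in the debug container.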
Demonstration
Here's how to do it.
Start by deploying a NebulaGraph cluster locally with Docker Compose, using the repository mentioned above. For a detailed tutorial, see the README file in the repository. After the deployment, start all the services and check their status as follows.
$ docker-compose up -d
Creating network "nebula-docker-compose_nebula-net" with the default driver
Creating nebula-docker-compose_metad1_1 ... done
Creating nebula-docker-compose_metad2_1 ... done
Creating nebula-docker-compose_metad0_1 ... done
Creating nebula-docker-compose_storaged2_1 ... done
Creating nebula-docker-compose_storaged1_1 ... done
Creating nebula-docker-compose_storaged0_1 ... done
Creating nebula-docker-compose_graphd_1 ... done
$ docker-compose ps
Name                                Command                          State                   Ports
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
nebula-docker-compose_graphd_1      ./bin/nebula-graphd --flag ...   Up (health: starting)   0.0.0.0:32907->13000/tcp, 0.0.0.0:32906->13002/tcp, 0.0.0.0:3699->3699/tcp
nebula-docker-compose_metad0_1      ./bin/nebula-metad --flagf ...   Up (health: starting)   0.0.0.0:32898->11000/tcp, 0.0.0.0:32896->11002/tcp, 45500/tcp, 45501/tcp
nebula-docker-compose_metad1_1      ./bin/nebula-metad --flagf ...   Up (health: starting)   0.0.0.0:32895->11000/tcp, 0.0.0.0:32894->11002/tcp, 45500/tcp, 45501/tcp
nebula-docker-compose_metad2_1      ./bin/nebula-metad --flagf ...   Up (health: starting)   0.0.0.0:32899->11000/tcp, 0.0.0.0:32897->11002/tcp, 45500/tcp, 45501/tcp
nebula-docker-compose_storaged0_1   ./bin/nebula-storaged --fl ...   Up (health: starting)   0.0.0.0:32901->12000/tcp, 0.0.0.0:32900->12002/tcp, 44500/tcp, 44501/tcp
nebula-docker-compose_storaged1_1   ./bin/nebula-storaged --fl ...   Up (health: starting)   0.0.0.0:32903->12000/tcp, 0.0.0.0:32902->12002/tcp, 44500/tcp, 44501/tcp
nebula-docker-compose_storaged2_1   ./bin/nebula-storaged --fl ...   Up (health: starting)   0.0.0.0:32905->12000/tcp, 0.0.0.0:32904->12002/tcp, 44500/tcp, 44501/tcp
Next, we walk through two scenarios, first the PID namespace and then the network namespace. First of all, we need a handy image for debugging. There is no need to build one ourselves, since this is only a demonstration, so let's pick a ready-made image from Docker Hub. If this image later turns out not to be good enough, we can maintain a nebula-debug image and install all the debugging tools we want in it. Here, we use a community image, nicolaka/netshoot. Let's pull it to the local host.
$ docker pull nicolaka/netshoot
$ docker images
REPOSITORY               TAG       IMAGE ID       CREATED        SIZE
vesoft/nebula-graphd     nightly   c67fe54665b7   36 hours ago   282MB
vesoft/nebula-storaged   nightly   5c77dbcdc507   36 hours ago   288MB
vesoft/nebula-console    nightly   f3256c99eda1   36 hours ago   249MB
vesoft/nebula-metad      nightly   5a78d3e3008f   36 hours ago   288MB
nicolaka/netshoot        latest    6d7e8891c980   2 months ago   352MB
Let's see what is going to happen if we run the image.
$ docker run --rm -ti nicolaka/netshoot bash
bash-5.0# ps
PID USER TIME COMMAND
1 root 0:00 bash
8 root 0:00 ps
bash-5.0#
As shown above, this container does not have any NebulaGraph process. Let's add a few parameters to it and see what happens.
$ docker run --rm -ti --pid container:nebula-docker-compose_metad0_1 --cap-add sys_admin nicolaka/netshoot bash
bash-5.0# ps
PID USER TIME COMMAND
1 root 0:03 ./bin/nebula-metad --flagfile=./etc/nebula-metad.conf --daemonize=false --meta_server_addrs=172.28.1.1:45500,172.28.1.2:45500,172.28.1.3:45500 --local_ip=172.28.1.1 --ws_ip=172.28.1.1 --port=45500 --data_path=/data/meta --log_dir=/logs --v=15 --minloglevel=0
452 root 0:00 bash
459 root 0:00 ps
bash-5.0# ls -al /proc/1/net/
total 0
dr-xr-xr-x 6 root root 0 Sep 18 07:17 .
dr-xr-xr-x 9 root root 0 Sep 18 06:55 ..
-r--r--r-- 1 root root 0 Sep 18 07:18 anycast6
-r--r--r-- 1 root root 0 Sep 18 07:18 arp
dr-xr-xr-x 2 root root 0 Sep 18 07:18 bonding
-r--r--r-- 1 root root 0 Sep 18 07:18 dev
...
-r--r--r-- 1 root root 0 Sep 18 07:18 sockstat
-r--r--r-- 1 root root 0 Sep 18 07:18 sockstat6
-r--r--r-- 1 root root 0 Sep 18 07:18 softnet_stat
dr-xr-xr-x 2 root root 0 Sep 18 07:18 stat
-r--r--r-- 1 root root 0 Sep 18 07:18 tcp
-r--r--r-- 1 root root 0 Sep 18 07:18 tcp6
-r--r--r-- 1 root root 0 Sep 18 07:18 udp
-r--r--r-- 1 root root 0 Sep 18 07:18 udp6
-r--r--r-- 1 root root 0 Sep 18 07:18 udplite
-r--r--r-- 1 root root 0 Sep 18 07:18 udplite6
-r--r--r-- 1 root root 0 Sep 18 07:18 unix
-r--r--r-- 1 root root 0 Sep 18 07:18 xfrm_stat
This time it's a bit different. We can see the metad0 process, which has PID 1. Now that the process is visible, we can easily work on it, for example by attaching to it with GDB. I don't have an image with the NebulaGraph binary file handy, so I'll leave the exploration to you.
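As a starting point for that exploration, the target PID can be looked up by command name instead of being read off the ps output. This is a rough sketch under stated assumptions: pids_by_comm is a hypothetical helper, it relies on the Linux /proc/<pid>/comm files, and the optional second argument exists only so the function can be pointed at another directory for testing.

```shell
# pids_by_comm NAME [PROC_DIR]: print the PID of every process whose command
# name (the content of /proc/<pid>/comm) equals NAME, one PID per line.
pids_by_comm() {
    _name=$1
    _proc=${2:-/proc}
    for _f in "$_proc"/[0-9]*/comm; do
        # Skip entries that vanished or that we are not allowed to read.
        [ -r "$_f" ] || continue
        if [ "$(cat "$_f")" = "$_name" ]; then
            _d=${_f%/comm}      # strip the trailing "/comm"
            echo "${_d##*/}"    # keep only the PID directory name
        fi
    done
}

# Example, assuming a debug image that ships gdb (not verified here for
# nicolaka/netshoot):
#   gdb -p "$(pids_by_comm nebula-metad)"
```

In the shared PID namespace shown above, pids_by_comm nebula-metad would be expected to print 1. Attaching a debugger may also need additional capabilities, depending on the Docker configuration.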
As we can see, the PID namespace is now shared thanks to the --pid container:<container_name|id> option. Next, since we sometimes need to capture packets, let's check whether we can also see the network status. Run the following command.
bash-5.0# netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
Nothing is listed. That is not what we expected: the metad0 process is running, so there should be at least a few listening sockets. The reason is that only the PID namespace is shared so far, and the debug container still has its own network namespace. To see the network namespace of the original container, we need one more option. Run the following command to restart the debug container.
$ docker run --rm -ti --pid container:nebula-docker-compose_metad0_1 --network container:nebula-docker-compose_metad0_1 --cap-add sys_admin nicolaka/netshoot bash
bash-5.0# netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 172.28.1.1:11000 0.0.0.0:* LISTEN -
tcp 0 0 172.28.1.1:11002 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:45500 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:45501 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.11:33249 0.0.0.0:* LISTEN -
udp 0 0 127.0.0.11:51929 0.0.0.0:* -
This time the output differs from the previous one. With the --network container:nebula-docker-compose_metad0_1 option, we can check the connections in the metad0 container, capture packets, and debug.
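Since the same container name appears twice in that command, it can be convenient to wrap it in a small shell function. The following is a sketch, not part of the nebula-docker-compose repository: the function name nebula_debug and the DOCKER override variable are hypothetical, while the docker options are the ones used above.

```shell
# nebula_debug CONTAINER [IMAGE]: start a throwaway debug container that
# shares the PID and network namespaces of CONTAINER.
# Set DOCKER=echo to print the arguments instead of running docker.
nebula_debug() {
    if [ -z "${1:-}" ]; then
        echo "usage: nebula_debug CONTAINER [IMAGE]" >&2
        return 2
    fi
    _target=$1
    _image=${2:-nicolaka/netshoot}
    ${DOCKER:-docker} run --rm -ti \
        --pid "container:$_target" \
        --network "container:$_target" \
        --cap-add sys_admin \
        "$_image" bash
}

# Example:
#   nebula_debug nebula-docker-compose_metad0_1
```

Passing a second argument swaps in another debug image, for example a self-maintained nebula-debug image.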
Summary
The key to being able to debug a container environment without installing extra tools in it is to run another container and have it share the PID/network namespaces with the original container. Some people in the community have even developed tools based on this method and made them easier to use. For more information, see Docker-debug.