Debug the Nebula Graph Processes Inside the Docker Container

Yee
2020-10-27

Requirements

During development and testing, we often deploy Nebula Graph with the vesoft-inc/nebula-docker-compose repository. To keep the Docker image of each Nebula Graph service as small as possible, none of the tools commonly used for development are installed in the images, not even the editor Vim.

This makes it difficult to locate problems inside the containers, because installing the essential tools every time is cumbersome. In fact, there is another way to debug a process inside a container, without changing the contents of the container or installing any tools in it.

This technique, the sidecar pattern, is commonly used in Kubernetes (K8s) environments. The principle is quite simple: start a second container that shares the PID and network namespaces of the container we are debugging. From this debug container, which has everything we need installed, we can then see the processes and the network state of the original container.
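On Linux, each process exposes its namespaces as links under /proc/<pid>/ns/, and two processes share a namespace when the corresponding links resolve to the same identifier. The following is a minimal sketch of a check built on that fact. The function names same_namespace_id and shares_namespace are made up for illustration, and reading the links of another process may require extra privileges.

```shell
#!/bin/sh
# Compare two namespace identifiers as printed by readlink on
# /proc/<pid>/ns/<type>, for example "net:[4026531992]".
# Equal, non-empty identifiers mean the namespace is shared.
same_namespace_id() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Report whether the current shell shares namespace <type> with <pid>.
# Illustrative helper: an unreadable link is reported as "not shared".
shares_namespace() {
    ns_type="$1"
    pid="$2"
    mine=$(readlink "/proc/self/ns/$ns_type" 2>/dev/null)
    theirs=$(readlink "/proc/$pid/ns/$ns_type" 2>/dev/null)
    if same_namespace_id "$mine" "$theirs"; then
        echo "$ns_type namespace: shared with PID $pid"
    else
        echo "$ns_type namespace: NOT shared with PID $pid (or not readable)"
    fi
}

# Example calls, to be run inside a debug container:
#   shares_namespace pid 1
#   shares_namespace net 1
```

Run inside a debug container, the two example calls would indicate whether the PID and network namespaces of the process with PID 1 are actually shared.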

Demonstration

Here’s how to do it.

Start by deploying a Nebula Graph cluster locally with Docker Compose, as mentioned above. For a detailed tutorial, see the README file in the repository. Then start all the services and check their status as follows.

$ docker-compose up -d
Creating network "nebula-docker-compose_nebula-net" with the default driver
Creating nebula-docker-compose_metad1_1 ... done
Creating nebula-docker-compose_metad2_1 ... done
Creating nebula-docker-compose_metad0_1 ... done
Creating nebula-docker-compose_storaged2_1 ... done
Creating nebula-docker-compose_storaged1_1 ... done
Creating nebula-docker-compose_storaged0_1 ... done
Creating nebula-docker-compose_graphd_1    ... done
$ docker-compose ps
              Name                             Command                       State                                            Ports
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
nebula-docker-compose_graphd_1      ./bin/nebula-graphd --flag ...   Up (health: starting)   0.0.0.0:32907->13000/tcp, 0.0.0.0:32906->13002/tcp, 0.0.0.0:3699->3699/tcp
nebula-docker-compose_metad0_1      ./bin/nebula-metad --flagf ...   Up (health: starting)   0.0.0.0:32898->11000/tcp, 0.0.0.0:32896->11002/tcp, 45500/tcp, 45501/tcp
nebula-docker-compose_metad1_1      ./bin/nebula-metad --flagf ...   Up (health: starting)   0.0.0.0:32895->11000/tcp, 0.0.0.0:32894->11002/tcp, 45500/tcp, 45501/tcp
nebula-docker-compose_metad2_1      ./bin/nebula-metad --flagf ...   Up (health: starting)   0.0.0.0:32899->11000/tcp, 0.0.0.0:32897->11002/tcp, 45500/tcp, 45501/tcp
nebula-docker-compose_storaged0_1   ./bin/nebula-storaged --fl ...   Up (health: starting)   0.0.0.0:32901->12000/tcp, 0.0.0.0:32900->12002/tcp, 44500/tcp, 44501/tcp
nebula-docker-compose_storaged1_1   ./bin/nebula-storaged --fl ...   Up (health: starting)   0.0.0.0:32903->12000/tcp, 0.0.0.0:32902->12002/tcp, 44500/tcp, 44501/tcp
nebula-docker-compose_storaged2_1   ./bin/nebula-storaged --fl ...   Up (health: starting)   0.0.0.0:32905->12000/tcp, 0.0.0.0:32904->12002/tcp, 44500/tcp, 44501/tcp

Next, we go through the scenarios one by one, from the process namespace to the network namespace. First, we need a handy image for debugging. Since this is only a demonstration, there is no need to build one ourselves; a well-packaged image from Docker Hub will do. If this image later turns out to be insufficient, we can maintain a nebula-debug image and install all the debugging tools we want in it. Here, we use a community solution, nicolaka/netshoot. Let’s pull the image to the local host.

$ docker pull nicolaka/netshoot
$ docker images
REPOSITORY               TAG                 IMAGE ID            CREATED             SIZE
vesoft/nebula-graphd     nightly             c67fe54665b7        36 hours ago        282MB
vesoft/nebula-storaged   nightly             5c77dbcdc507        36 hours ago        288MB
vesoft/nebula-console    nightly             f3256c99eda1        36 hours ago        249MB
vesoft/nebula-metad      nightly             5a78d3e3008f        36 hours ago        288MB
nicolaka/netshoot        latest              6d7e8891c980        2 months ago        352MB

Let’s see what happens if we run the image.

$ docker run --rm -ti nicolaka/netshoot bash
bash-5.0# ps
PID   USER    TIME  COMMAND
    1 root      0:00 bash
    8 root      0:00 ps
bash-5.0#

As shown above, this container does not have any Nebula Graph process. Let’s add a few parameters to it and see what happens.

$ docker run --rm -ti --pid container:nebula-docker-compose_metad0_1 --cap-add sys_admin nicolaka/netshoot bash
bash-5.0# ps
PID   USER    TIME  COMMAND
    1 root      0:03 ./bin/nebula-metad --flagfile=./etc/nebula-metad.conf --daemonize=false --meta_server_addrs=172.28.1.1:45500,172.28.1.2:45500,172.28.1.3:45500 --local_ip=172.28.1.1 --ws_ip=172.28.1.1 --port=45500 --data_path=/data/meta --log_dir=/logs --v=15 --minloglevel=0
  452 root      0:00 bash
  459 root      0:00 ps
bash-5.0# ls -al /proc/1/net/
total 0
dr-xr-xr-x    6 root     root             0 Sep 18 07:17 .
dr-xr-xr-x    9 root     root             0 Sep 18 06:55 ..
-r--r--r--    1 root     root             0 Sep 18 07:18 anycast6
-r--r--r--    1 root     root             0 Sep 18 07:18 arp
dr-xr-xr-x    2 root     root             0 Sep 18 07:18 bonding
-r--r--r--    1 root     root             0 Sep 18 07:18 dev
...
-r--r--r--    1 root     root             0 Sep 18 07:18 sockstat
-r--r--r--    1 root     root             0 Sep 18 07:18 sockstat6
-r--r--r--    1 root     root             0 Sep 18 07:18 softnet_stat
dr-xr-xr-x    2 root     root             0 Sep 18 07:18 stat
-r--r--r--    1 root     root             0 Sep 18 07:18 tcp
-r--r--r--    1 root     root             0 Sep 18 07:18 tcp6
-r--r--r--    1 root     root             0 Sep 18 07:18 udp
-r--r--r--    1 root     root             0 Sep 18 07:18 udp6
-r--r--r--    1 root     root             0 Sep 18 07:18 udplite
-r--r--r--    1 root     root             0 Sep 18 07:18 udplite6
-r--r--r--    1 root     root             0 Sep 18 07:18 unix
-r--r--r--    1 root     root             0 Sep 18 07:18 xfrm_stat

This time the result is different. We can see the metad0 process, which has PID 1. Now that the process is visible, we can easily work on it, for example by attaching GDB to it. I don’t have an image with the Nebula Graph binary file handy, so I’ll leave the exploration to you.
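As a starting point for that exploration, here is a rough sketch that only prints a docker run command for attaching GDB to PID 1 in the shared PID namespace. It rests on assumptions: the image name nebula-debug is hypothetical (an image of our own with GDB installed, as nicolaka/netshoot may not ship it), the helper name gdb_attach_cmd is made up, and --cap-add SYS_PTRACE is assumed to be the capability GDB needs in order to attach.

```shell
#!/bin/sh
# Print (not run) a command that attaches GDB to PID 1 of a target container.
# Assumptions: "nebula-debug" is a hypothetical image with gdb installed;
# SYS_PTRACE is assumed to be the capability required for attaching.
gdb_attach_cmd() {
    target="$1"                  # name or ID of the container to debug
    image="${2:-nebula-debug}"   # hypothetical debug image
    printf 'docker run --rm -ti --pid container:%s --cap-add SYS_PTRACE %s gdb -p 1\n' \
        "$target" "$image"
}

gdb_attach_cmd nebula-docker-compose_metad0_1
```

Printing the command rather than running it makes it easy to review and adjust the flags before starting the container.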

We have seen that the PID namespace is shared by setting --pid container:<container_name|id>. Next, since we might sometimes need to capture packets, let’s check whether we can also see the network status. Run the following command.

bash-5.0# netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      PID/Program name

Nothing is listed. That is not what we expected: the metad0 process is running, so there should be at least one listening socket. The reason is that the debug container still has its own network namespace. To see the network namespace of the original container, we need to add one more option and restart the debug container with the following command.

$ docker run --rm -ti --pid container:nebula-docker-compose_metad0_1 --network container:nebula-docker-compose_metad0_1 --cap-add sys_admin nicolaka/netshoot bash
bash-5.0# netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 172.28.1.1:11000        0.0.0.0:*               LISTEN      -
tcp        0      0 172.28.1.1:11002        0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:45500           0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:45501           0.0.0.0:*               LISTEN      -
tcp        0      0 127.0.0.11:33249        0.0.0.0:*               LISTEN      -
udp        0      0 127.0.0.11:51929        0.0.0.0:*                           -

This time the output differs from the previous one. With the --network container:nebula-docker-compose_metad0_1 option, we can check the connections in the metad0 container, capture packets, and debug.
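For the packet capture step, a minimal sketch could look like the following. It only prints a tcpdump command meant to be run inside the debug container. It assumes that tcpdump is available in the debug image, takes port 45500 from the netstat output above, and uses a made-up helper name (capture_cmd) and a made-up output path.

```shell
#!/bin/sh
# Print (not run) a tcpdump command for use inside the debug container.
# Assumptions: tcpdump exists in the debug image; the default port 45500
# comes from the netstat output; the output path is only an example.
capture_cmd() {
    port="${1:-45500}"             # TCP port to filter on
    out="${2:-/tmp/metad0.pcap}"   # example capture file
    printf 'tcpdump -i any -nn -w %s tcp port %s\n' "$out" "$port"
}

capture_cmd
```

Because the debug container is started with --rm, a file written inside it disappears when the container exits, so mounting a host directory with -v and writing the capture there is probably the safer choice.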

Summary

The key to debugging a containerized environment without installing extra tools in it is to run another container that shares the PID and network namespaces of the original container. Some people in the community have even built easier-to-use tools on top of this method. For more information, see Docker-debug.
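To avoid retyping the long command for each service, the options used in this post can be wrapped in a small helper. This is only a sketch: the function name debug_cmd and the DEBUG_IMAGE variable are made up for illustration, and the helper prints the command instead of running it.

```shell
#!/bin/sh
# Build the sidecar-style debug command for any container.
# DEBUG_IMAGE (illustrative variable) defaults to the image used in this post.
DEBUG_IMAGE="${DEBUG_IMAGE:-nicolaka/netshoot}"

debug_cmd() {
    target="$1"   # name or ID of the container to debug
    if [ -z "$target" ]; then
        echo "usage: debug_cmd <container_name|id>" >&2
        return 1
    fi
    # Share both the PID and the network namespace of the target container.
    printf 'docker run --rm -ti --pid container:%s --network container:%s --cap-add sys_admin %s bash\n' \
        "$target" "$target" "$DEBUG_IMAGE"
}

# Print the command for one of the storaged containers deployed above.
debug_cmd nebula-docker-compose_storaged0_1
```

To start the debug container directly, the printed command can be pasted into a terminal or passed to eval.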
