Distributed Services With Go: Unable to pass readiness/liveness checks. (page 210, 215)

I’m not quite sure what’s going on here, but I’m unable to have to containers successfully complete the Readiness/Liveness checks. I’m implemented the changes to distributed.go, and agent.go. All tests pass.

But when I run the helm chart, keep getting failed health checks.

I realize there are many things that may be amiss here, just wondering if anyone has experience anything? Manually running k exec -it proglog-0 -- /bin/grpc_health_probe -addr=localhost:8400 naturally results in a timeout error… I’ve run the code from the books repo as is and can’t get it to pass either.

k describe pod prodlog-0

Name:         proglog-0
Namespace:    default
Priority:     0
Node:         kind-control-plane/172.18.0.2
Start Time:   Thu, 23 Sep 2021 03:55:49 +0800
Labels:       app.kubernetes.io/instance=proglog
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=proglog
              app.kubernetes.io/version=1.16.0
              controller-revision-hash=proglog-65749f8b9d
              helm.sh/chart=proglog-0.1.0
              statefulset.kubernetes.io/pod-name=proglog-0
Annotations:  <none>
Status:       Running
IP:           10.244.0.18
IPs:
  IP:           10.244.0.18
Controlled By:  StatefulSet/proglog
Init Containers:
  proglog-config-init:
    Container ID:  containerd://69322bb81a5b8edef4d762d3ee92a10f2b0b03afbe84d068eb8f364d85f9d28e
    Image:         busybox
    Image ID:      docker.io/library/busybox@sha256:52f73a0a43a16cf37cd0720c90887ce972fe60ee06a687ee71fb93a7ca601df7
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
      -c
      ID=$(echo $HOSTNAME | rev | cut -d- -f1 | rev)
      cat > /var/run/proglog/config.yaml <<EOD
      data-dir: /var/run/proglog/data
      rpc-port: 8400
      bind-addr: "$HOSTNAME.proglog.default.svc.cluster.local:8401"
      bootstrap: $([ $ID = 0 ] && echo true || echo false )
      $([ $ID != 0 ] && echo 'start-join-addrs: "proglog-0.proglog.default.svc.cluster.local:8401"')
      EOD
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 23 Sep 2021 03:55:50 +0800
      Finished:     Thu, 23 Sep 2021 03:55:50 +0800
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/proglog from datadir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hn7xl (ro)
Containers:
  proglog:
    Container ID:  containerd://bcdba20976cfeba4bf399cacd804c09846a2b48686f48cb6d90f7de3bf5088d9
    Image:         github.com/adamwoolhether/proglog:0.0.1
    Image ID:      sha256:e3c4f8d58ba4c3f1c6da1013a19bb0d4684db66de5d8d9c9a2f10334981f775f
    Ports:         8400/TCP, 8401/TCP
    Host Ports:    0/TCP, 0/TCP
    Args:
      --config-file=/var/run/proglog/config.yaml
    State:          Running
      Started:      Thu, 23 Sep 2021 03:57:11 +0800
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 23 Sep 2021 03:56:31 +0800
      Finished:     Thu, 23 Sep 2021 03:57:11 +0800
    Ready:          False
    Restart Count:  2
    Liveness:       exec [/bin/grpc_health_probe -addr=:8400] delay=10s timeout=1s period=10s #success=1 #failure=3
    Readiness:      exec [/bin/grpc_health_probe -addr=:8400] delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /var/run/proglog from datadir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hn7xl (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  datadir:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  datadir-proglog-0
    ReadOnly:   false
  kube-api-access-hn7xl:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  101s                default-scheduler  Successfully assigned default/proglog-0 to kind-control-plane
  Normal   Pulled     100s                kubelet            Container image "busybox" already present on machine
  Normal   Created    100s                kubelet            Created container proglog-config-init
  Normal   Started    100s                kubelet            Started container proglog-config-init
  Warning  Unhealthy  80s                 kubelet            Liveness probe failed: timeout: failed to connect service ":8400" within 1s
  Normal   Pulled     59s (x2 over 100s)  kubelet            Container image "github.com/adamwoolhether/proglog:0.0.1" already present on machine
  Normal   Created    59s (x2 over 100s)  kubelet            Created container proglog
  Normal   Started    59s (x2 over 100s)  kubelet            Started container proglog
  Warning  Unhealthy  30s (x2 over 60s)   kubelet            Readiness probe failed: timeout: failed to connect service ":8400" within 1s
  Warning  Unhealthy  20s (x6 over 90s)   kubelet            Readiness probe failed:
  Warning  Unhealthy  20s (x5 over 70s)   kubelet            Liveness probe failed:
  Normal   Killing    20s (x2 over 60s)   kubelet            Container proglog failed liveness probe, will be restarted

─❯ k logs proglog-0

2021-09-22T19:55:50.819Z [INFO]  raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:proglog-0 Address:proglog-0.proglog.default.svc.cluster.local:8400}]"
2021-09-22T19:55:50.819Z [INFO]  raft: entering follower state: follower="Node at 10.244.0.18:8400 [Follower]" leader=
2021-09-22T19:55:52.443Z [WARN]  raft: heartbeat timeout reached, starting election: last-leader=
2021-09-22T19:55:52.443Z [INFO]  raft: entering candidate state: node="Node at 10.244.0.18:8400 [Candidate]" term=1163
2021-09-22T19:55:52.446Z [DEBUG] raft: votes: needed=1
2021-09-22T19:55:52.446Z [DEBUG] raft: vote granted: from=proglog-0 term=1163 tally=1
2021-09-22T19:55:52.446Z [INFO]  raft: election won: tally=1
2021-09-22T19:55:52.446Z [INFO]  raft: entering leader state: leader="Node at 10.244.0.18:8400 [Leader]"
2021-09-22T19:55:52.446Z [ERROR] raft: failed to commit logs: error=EOF
2021-09-22T19:55:52.446Z [INFO]  raft: entering follower state: follower="Node at 10.244.0.18:8400 [Follower]" leader=
2021/09/22 19:55:53 [INFO] serf: EventMemberJoin: proglog-0 10.244.0.18
2021-09-22T19:55:54.019Z [WARN]  raft: heartbeat timeout reached, starting election: last-leader=
2021-09-22T19:55:54.019Z [INFO]  raft: entering candidate state: node="Node at 10.244.0.18:8400 [Candidate]" term=1164
2021-09-22T19:55:54.021Z [DEBUG] raft: votes: needed=1
2021-09-22T19:55:54.021Z [DEBUG] raft: vote granted: from=proglog-0 term=1164 tally=1
2021-09-22T19:55:54.021Z [INFO]  raft: election won: tally=1
2021-09-22T19:55:54.021Z [INFO]  raft: entering leader state: leader="Node at 10.244.0.18:8400 [Leader]"
2021-09-22T19:55:54.021Z [ERROR] raft: failed to commit logs: error=EOF
2021-09-22T19:55:54.021Z [INFO]  raft: entering follower state: follower="Node at 10.244.0.18:8400 [Follower]" leader=
2021-09-22T19:55:55.891Z [WARN]  raft: heartbeat timeout reached, starting election: last-leader=
2021-09-22T19:55:55.891Z [INFO]  raft: entering candidate state: node="Node at 10.244.0.18:8400 [Candidate]" term=1165
2021-09-22T19:55:55.895Z [DEBUG] raft: votes: needed=1
2021-09-22T19:55:55.895Z [DEBUG] raft: vote granted: from=proglog-0 term=1165 tally=1
2021-09-22T19:55:55.895Z [INFO]  raft: election won: tally=1
2021-09-22T19:55:55.895Z [INFO]  raft: entering leader state: leader="Node at 10.244.0.18:8400 [Leader]"
2021-09-22T19:55:55.895Z [ERROR] raft: failed to commit logs: error=EOF
2021-09-22T19:55:55.896Z [INFO]  raft: entering follower state: follower="Node at 10.244.0.18:8400 [Follower]" leader=
2021-09-22T19:55:57.301Z [WARN]  raft: heartbeat timeout reached, starting election: last-leader=
2021-09-22T19:55:57.301Z [INFO]  raft: entering candidate state: node="Node at 10.244.0.18:8400 [Candidate]" term=1166
2021-09-22T19:55:57.303Z [DEBUG] raft: votes: needed=1
2021-09-22T19:55:57.303Z [DEBUG] raft: vote granted: from=proglog-0 term=1166 tally=1
2021-09-22T19:55:57.303Z [INFO]  raft: election won: tally=1
2021-09-22T19:55:57.303Z [INFO]  raft: entering leader state: leader="Node at 10.244.0.18:8400 [Leader]"
2021-09-22T19:55:57.303Z [ERROR] raft: failed to commit logs: error=EOF
2021-09-22T19:55:57.303Z [INFO]  raft: entering follower state: follower="Node at 10.244.0.18:8400 [Follower]" leader=
2021-09-22T19:55:59.303Z [WARN]  raft: heartbeat timeout reached, starting election: last-leader=
2021-09-22T19:55:59.303Z [INFO]  raft: entering candidate state: node="Node at 10.244.0.18:8400 [Candidate]" term=1167
2021-09-22T19:55:59.306Z [DEBUG] raft: votes: needed=1
2021-09-22T19:55:59.306Z [DEBUG] raft: vote granted: from=proglog-0 term=1167 tally=1
2021-09-22T19:55:59.306Z [INFO]  raft: election won: tally=1
2021-09-22T19:55:59.306Z [INFO]  raft: entering leader state: leader="Node at 10.244.0.18:8400 [Leader]"
2021-09-22T19:55:59.306Z [ERROR] raft: failed to commit logs: error=EOF
2021-09-22T19:55:59.306Z [INFO]  raft: entering follower state: follower="Node at 10.244.0.18:8400 [Follower]" leader=
2021-09-22T19:56:01.033Z [WARN]  raft: heartbeat timeout reached, starting election: last-leader=
2021-09-22T19:56:01.033Z [INFO]  raft: entering candidate state: node="Node at 10.244.0.18:8400 [Candidate]" term=1168
2021-09-22T19:56:01.035Z [DEBUG] raft: votes: needed=1
2021-09-22T19:56:01.035Z [DEBUG] raft: vote granted: from=proglog-0 term=1168 tally=1
2021-09-22T19:56:01.035Z [INFO]  raft: election won: tally=1
2021-09-22T19:56:01.035Z [INFO]  raft: entering leader state: leader="Node at 10.244.0.18:8400 [Leader]"
2021-09-22T19:56:01.035Z [ERROR] raft: failed to commit logs: error=EOF
2021-09-22T19:56:01.035Z [INFO]  raft: entering follower state: follower="Node at 10.244.0.18:8400 [Follower]" leader=
2021-09-22T19:56:02.440Z [WARN]  raft: heartbeat timeout reached, starting election: last-leader=
2021-09-22T19:56:02.440Z [INFO]  raft: entering candidate state: node="Node at 10.244.0.18:8400 [Candidate]" term=1169
2021-09-22T19:56:02.443Z [DEBUG] raft: votes: needed=1
2021-09-22T19:56:02.443Z [DEBUG] raft: vote granted: from=proglog-0 term=1169 tally=1
2021-09-22T19:56:02.443Z [INFO]  raft: election won: tally=1
2021-09-22T19:56:02.443Z [INFO]  raft: entering leader state: leader="Node at 10.244.0.18:8400 [Leader]"
2021-09-22T19:56:02.443Z [ERROR] raft: failed to commit logs: error=EOF
2021-09-22T19:56:02.443Z [INFO]  raft: entering follower state: follower="Node at 10.244.0.18:8400 [Follower]" leader=
2021-09-22T19:56:03.970Z [WARN]  raft: heartbeat timeout reached, starting election: last-leader=
2021-09-22T19:56:03.970Z [INFO]  raft: entering candidate state: node="Node at 10.244.0.18:8400 [Candidate]" term=1170
2021-09-22T19:56:03.972Z [DEBUG] raft: votes: needed=1
2021-09-22T19:56:03.972Z [DEBUG] raft: vote granted: from=proglog-0 term=1170 tally=1
2021-09-22T19:56:03.973Z [INFO]  raft: election won: tally=1
2021-09-22T19:56:03.973Z [INFO]  raft: entering leader state: leader="Node at 10.244.0.18:8400 [Leader]"
2021-09-22T19:56:03.973Z [ERROR] raft: failed to commit logs: error=EOF
2021-09-22T19:56:03.973Z [INFO]  raft: entering follower state: follower="Node at 10.244.0.18:8400 [Follower]" leader=
2021-09-22T19:56:05.408Z [WARN]  raft: heartbeat timeout reached, starting election: last-leader=
2021-09-22T19:56:05.408Z [INFO]  raft: entering candidate state: node="Node at 10.244.0.18:8400 [Candidate]" term=1171
2021-09-22T19:56:05.410Z [DEBUG] raft: votes: needed=1
2021-09-22T19:56:05.410Z [DEBUG] raft: vote granted: from=proglog-0 term=1171 tally=1
2021-09-22T19:56:05.410Z [INFO]  raft: election won: tally=1
2021-09-22T19:56:05.410Z [INFO]  raft: entering leader state: leader="Node at 10.244.0.18:8400 [Leader]"
2021-09-22T19:56:05.410Z [ERROR] raft: failed to commit logs: error=EOF
2021-09-22T19:56:05.410Z [INFO]  raft: entering follower state: follower="Node at 10.244.0.18:8400 [Follower]" leader=
2021-09-22T19:56:06.805Z [WARN]  raft: heartbeat timeout reached, starting election: last-leader=
2021-09-22T19:56:06.805Z [INFO]  raft: entering candidate state: node="Node at 10.244.0.18:8400 [Candidate]" term=1172
2021-09-22T19:56:06.808Z [DEBUG] raft: votes: needed=1
2021-09-22T19:56:06.809Z [DEBUG] raft: vote granted: from=proglog-0 term=1172 tally=1
2021-09-22T19:56:06.809Z [INFO]  raft: election won: tally=1
2021-09-22T19:56:06.809Z [INFO]  raft: entering leader state: leader="Node at 10.244.0.18:8400 [Leader]"
2021-09-22T19:56:06.809Z [ERROR] raft: failed to commit logs: error=EOF
2021-09-22T19:56:06.809Z [INFO]  raft: entering follower state: follower="Node at 10.244.0.18:8400 [Follower]" leader=
2021-09-22T19:56:07.827Z [WARN]  raft: heartbeat timeout reached, starting election: last-leader=
2021-09-22T19:56:07.827Z [INFO]  raft: entering candidate state: node="Node at 10.244.0.18:8400 [Candidate]" term=1173
2021-09-22T19:56:07.831Z [DEBUG] raft: votes: needed=1
2021-09-22T19:56:07.831Z [DEBUG] raft: vote granted: from=proglog-0 term=1173 tally=1
2021-09-22T19:56:07.831Z [INFO]  raft: election won: tally=1
2021-09-22T19:56:07.831Z [INFO]  raft: entering leader state: leader="Node at 10.244.0.18:8400 [Leader]"
2021-09-22T19:56:07.831Z [ERROR] raft: failed to commit logs: error=EOF
2021-09-22T19:56:07.831Z [INFO]  raft: entering follower state: follower="Node at 10.244.0.18:8400 [Follower]" leader=
2021-09-22T19:56:09.536Z [WARN]  raft: heartbeat timeout reached, starting election: last-leader=
2021-09-22T19:56:09.537Z [INFO]  raft: entering candidate state: node="Node at 10.244.0.18:8400 [Candidate]" term=1174
2021-09-22T19:56:09.539Z [DEBUG] raft: votes: needed=1
2021-09-22T19:56:09.539Z [DEBUG] raft: vote granted: from=proglog-0 term=1174 tally=1
2021-09-22T19:56:09.539Z [INFO]  raft: election won: tally=1
2021-09-22T19:56:09.539Z [INFO]  raft: entering leader state: leader="Node at 10.244.0.18:8400 [Leader]"
2021-09-22T19:56:09.539Z [ERROR] raft: failed to commit logs: error=EOF
2021-09-22T19:56:09.539Z [INFO]  raft: entering follower state: follower="Node at 10.244.0.18:8400 [Follower]" leader=
2021-09-22T19:56:11.076Z [WARN]  raft: heartbeat timeout reached, starting election: last-leader=
2021-09-22T19:56:11.076Z [INFO]  raft: entering candidate state: node="Node at 10.244.0.18:8400 [Candidate]" term=1175
2021-09-22T19:56:11.080Z [DEBUG] raft: votes: needed=1
2021-09-22T19:56:11.080Z [DEBUG] raft: vote granted: from=proglog-0 term=1175 tally=1
2021-09-22T19:56:11.080Z [INFO]  raft: election won: tally=1
2021-09-22T19:56:11.080Z [INFO]  raft: entering leader state: leader="Node at 10.244.0.18:8400 [Leader]"
2021-09-22T19:56:11.080Z [ERROR] raft: failed to commit logs: error=EOF
2021-09-22T19:56:11.080Z [INFO]  raft: entering follower state: follower="Node at 10.244.0.18:8400 [Follower]" leader=
2021-09-22T19:56:12.467Z [WARN]  raft: heartbeat timeout reached, starting election: last-leader=
2021-09-22T19:56:12.467Z [INFO]  raft: entering candidate state: node="Node at 10.244.0.18:8400 [Candidate]" term=1176
2021-09-22T19:56:12.470Z [DEBUG] raft: votes: needed=1
2021-09-22T19:56:12.470Z [DEBUG] raft: vote granted: from=proglog-0 term=1176 tally=1
2021-09-22T19:56:12.470Z [INFO]  raft: election won: tally=1
2021-09-22T19:56:12.470Z [INFO]  raft: entering leader state: leader="Node at 10.244.0.18:8400 [Leader]"
2021-09-22T19:56:12.470Z [ERROR] raft: failed to commit logs: error=EOF
2021-09-22T19:56:12.470Z [INFO]  raft: entering follower state: follower="Node at 10.244.0.18:8400 [Follower]" leader=
2021-09-22T19:56:13.829Z [WARN]  raft: heartbeat timeout reached, starting election: last-leader=
2021-09-22T19:56:13.830Z [INFO]  raft: entering candidate state: node="Node at 10.244.0.18:8400 [Candidate]" term=1177
2021-09-22T19:56:13.834Z [DEBUG] raft: votes: needed=1
2021-09-22T19:56:13.834Z [DEBUG] raft: vote granted: from=proglog-0 term=1177 tally=1
2021-09-22T19:56:13.834Z [INFO]  raft: election won: tally=1
2021-09-22T19:56:13.834Z [INFO]  raft: entering leader state: leader="Node at 10.244.0.18:8400 [Leader]"
2021-09-22T19:56:13.834Z [ERROR] raft: failed to commit logs: error=EOF
2021-09-22T19:56:13.834Z [INFO]  raft: entering follower state: follower="Node at 10.244.0.18:8400 [Follower]" leader=
2021-09-22T19:56:15.340Z [WARN]  raft: heartbeat timeout reached, starting election: last-leader=
2021-09-22T19:56:15.341Z [INFO]  raft: entering candidate state: node="Node at 10.244.0.18:8400 [Candidate]" term=1178
2021-09-22T19:56:15.343Z [DEBUG] raft: votes: needed=1
2021-09-22T19:56:15.343Z [DEBUG] raft: vote granted: from=proglog-0 term=1178 tally=1
2021-09-22T19:56:15.343Z [INFO]  raft: election won: tally=1
2021-09-22T19:56:15.343Z [INFO]  raft: entering leader state: leader="Node at 10.244.0.18:8400 [Leader]"
2021-09-22T19:56:15.343Z [ERROR] raft: failed to commit logs: error=EOF
2021-09-22T19:56:15.343Z [INFO]  raft: entering follower state: follower="Node at 10.244.0.18:8400 [Follower]" leader=
2021-09-22T19:56:16.451Z [WARN]  raft: heartbeat timeout reached, starting election: last-leader=
2021-09-22T19:56:16.451Z [INFO]  raft: entering candidate state: node="Node at 10.244.0.18:8400 [Candidate]" term=1179
2021-09-22T19:56:16.455Z [DEBUG] raft: votes: needed=1
2021-09-22T19:56:16.455Z [DEBUG] raft: vote granted: from=proglog-0 term=1179 tally=1
2021-09-22T19:56:16.455Z [INFO]  raft: election won: tally=1
2021-09-22T19:56:16.455Z [INFO]  raft: entering leader state: leader="Node at 10.244.0.18:8400 [Leader]"
2021-09-22T19:56:16.455Z [ERROR] raft: failed to commit logs: error=EOF
2021-09-22T19:56:16.455Z [INFO]  raft: entering follower state: follower="Node at 10.244.0.18:8400 [Follower]" leader=
2021-09-22T19:56:18.422Z [WARN]  raft: heartbeat timeout reached, starting election: last-leader=
2021-09-22T19:56:18.422Z [INFO]  raft: entering candidate state: node="Node at 10.244.0.18:8400 [Candidate]" term=1180
2021-09-22T19:56:18.425Z [DEBUG] raft: votes: needed=1
2021-09-22T19:56:18.425Z [DEBUG] raft: vote granted: from=proglog-0 term=1180 tally=1
2021-09-22T19:56:18.425Z [INFO]  raft: election won: tally=1
2021-09-22T19:56:18.425Z [INFO]  raft: entering leader state: leader="Node at 10.244.0.18:8400 [Leader]"
2021-09-22T19:56:18.425Z [ERROR] raft: failed to commit logs: error=EOF
2021-09-22T19:56:18.425Z [INFO]  raft: entering follower state: follower="Node at 10.244.0.18:8400 [Follower]" leader=
2021-09-22T19:56:19.704Z [WARN]  raft: heartbeat timeout reached, starting election: last-leader=
2021-09-22T19:56:19.704Z [INFO]  raft: entering candidate state: node="Node at 10.244.0.18:8400 [Candidate]" term=1181
2021-09-22T19:56:19.706Z [DEBUG] raft: votes: needed=1
2021-09-22T19:56:19.706Z [DEBUG] raft: vote granted: from=proglog-0 term=1181 tally=1
2021-09-22T19:56:19.706Z [INFO]  raft: election won: tally=1
2021-09-22T19:56:19.706Z [INFO]  raft: entering leader state: leader="Node at 10.244.0.18:8400 [Leader]"
2021-09-22T19:56:19.706Z [ERROR] raft: failed to commit logs: error=EOF
2021-09-22T19:56:19.706Z [INFO]  raft: entering follower state: follower="Node at 10.244.0.18:8400 [Follower]" leader=
2021-09-22T19:56:21.301Z [WARN]  raft: heartbeat timeout reached, starting election: last-leader=
2021-09-22T19:56:21.301Z [INFO]  raft: entering candidate state: node="Node at 10.244.0.18:8400 [Candidate]" term=1182
2021-09-22T19:56:21.303Z [DEBUG] raft: votes: needed=1
2021-09-22T19:56:21.303Z [DEBUG] raft: vote granted: from=proglog-0 term=1182 tally=1
2021-09-22T19:56:21.304Z [INFO]  raft: election won: tally=1
2021-09-22T19:56:21.304Z [INFO]  raft: entering leader state: leader="Node at 10.244.0.18:8400 [Leader]"
2021-09-22T19:56:21.304Z [ERROR] raft: failed to commit logs: error=EOF
2021-09-22T19:56:21.304Z [INFO]  raft: entering follower state: follower="Node at 10.244.0.18:8400 [Follower]" leader=
2021-09-22T19:56:23.026Z [WARN]  raft: heartbeat timeout reached, starting election: last-leader=
2021-09-22T19:56:23.026Z [INFO]  raft: entering candidate state: node="Node at 10.244.0.18:8400 [Candidate]" term=1183
2021-09-22T19:56:23.029Z [DEBUG] raft: votes: needed=1
2021-09-22T19:56:23.029Z [DEBUG] raft: vote granted: from=proglog-0 term=1183 tally=1
2021-09-22T19:56:23.029Z [INFO]  raft: election won: tally=1
2021-09-22T19:56:23.029Z [INFO]  raft: entering leader state: leader="Node at 10.244.0.18:8400 [Leader]"
2021-09-22T19:56:23.029Z [ERROR] raft: failed to commit logs: error=EOF
2021-09-22T19:56:23.029Z [INFO]  raft: entering follower state: follower="Node at 10.244.0.18:8400 [Follower]" leader=
2021-09-22T19:56:24.189Z [WARN]  raft: heartbeat timeout reached, starting election: last-leader=
2021-09-22T19:56:24.189Z [INFO]  raft: entering candidate state: node="Node at 10.244.0.18:8400 [Candidate]" term=1184
2021-09-22T19:56:24.192Z [DEBUG] raft: votes: needed=1
2021-09-22T19:56:24.192Z [DEBUG] raft: vote granted: from=proglog-0 term=1184 tally=1
2021-09-22T19:56:24.192Z [INFO]  raft: election won: tally=1
2021-09-22T19:56:24.192Z [INFO]  raft: entering leader state: leader="Node at 10.244.0.18:8400 [Leader]"
2021-09-22T19:56:24.192Z [ERROR] raft: failed to commit logs: error=EOF
2021-09-22T19:56:24.192Z [INFO]  raft: entering follower state: follower="Node at 10.244.0.18:8400 [Follower]" leader=
2021-09-22T19:56:25.624Z [WARN]  raft: heartbeat timeout reached, starting election: last-leader=
2021-09-22T19:56:25.624Z [INFO]  raft: entering candidate state: node="Node at 10.244.0.18:8400 [Candidate]" term=1185
2021-09-22T19:56:25.628Z [DEBUG] raft: votes: needed=1
2021-09-22T19:56:25.628Z [DEBUG] raft: vote granted: from=proglog-0 term=1185 tally=1
2021-09-22T19:56:25.628Z [INFO]  raft: election won: tally=1
2021-09-22T19:56:25.628Z [INFO]  raft: entering leader state: leader="Node at 10.244.0.18:8400 [Leader]"
2021-09-22T19:56:25.628Z [ERROR] raft: failed to commit logs: error=EOF
2021-09-22T19:56:25.628Z [INFO]  raft: entering follower state: follower="Node at 10.244.0.18:8400 [Follower]" leader=
2021-09-22T19:56:26.645Z [WARN]  raft: heartbeat timeout reached, starting election: last-leader=
2021-09-22T19:56:26.645Z [INFO]  raft: entering candidate state: node="Node at 10.244.0.18:8400 [Candidate]" term=1186
2021-09-22T19:56:26.648Z [DEBUG] raft: votes: needed=1
2021-09-22T19:56:26.648Z [DEBUG] raft: vote granted: from=proglog-0 term=1186 tally=1
2021-09-22T19:56:26.648Z [INFO]  raft: election won: tally=1
2021-09-22T19:56:26.648Z [INFO]  raft: entering leader state: leader="Node at 10.244.0.18:8400 [Leader]"
2021-09-22T19:56:26.648Z [ERROR] raft: failed to commit logs: error=EOF
2021-09-22T19:56:26.648Z [INFO]  raft: entering follower state: follower="Node at 10.244.0.18:8400 [Follower]" leader=
2021-09-22T19:56:27.897Z [WARN]  raft: heartbeat timeout reached, starting election: last-leader=
2021-09-22T19:56:27.897Z [INFO]  raft: entering candidate state: node="Node at 10.244.0.18:8400 [Candidate]" term=1187
2021-09-22T19:56:27.900Z [DEBUG] raft: votes: needed=1
2021-09-22T19:56:27.900Z [DEBUG] raft: vote granted: from=proglog-0 term=1187 tally=1
2021-09-22T19:56:27.900Z [INFO]  raft: election won: tally=1
2021-09-22T19:56:27.900Z [INFO]  raft: entering leader state: leader="Node at 10.244.0.18:8400 [Leader]"
2021-09-22T19:56:27.900Z [ERROR] raft: failed to commit logs: error=EOF
2021-09-22T19:56:27.900Z [INFO]  raft: entering follower state: follower="Node at 10.244.0.18:8400 [Follower]" leader=

Hey adamwoolhether

Were you able to fix this?

I have the same problem, where the first instance becomes a candidate and is promoted as a leader, but then fails with [ERROR] raft: failed to commit logs: error=EOF, and becomes a follower again.

The application eventually times out and the pod errors out.

1 Like

To anybody who is still interested in the solution, there are multiple bugs that need to be fixed for this to work.

Bug 1
*DistributedLog.Close() does not close the log that is used for the raft logStore. It only closes the log used to store the user data. As a result, the index file corresponding to the raft log does not get truncated to its actual size (remeber that we increase its size to MaxIndexBytes before memory mapping). This can be fixed by explicitly closing the raft log immediately after raft.Shutdown() in *DistributedLog.Close().

Bug 2
In *Index.Write(), the very first if condition checks whether there is enough space available to write one more index entry. If not, it bails out with EOF. The trouble with this is that by this time, the record has already been written to the store (i.e, we write to the store first and then update the index file). This EOF gets propagated all the way to *Log.Append() and therefore the append fails entirely having already written the record to the store!!. What should have happened in this case is that a new segment should have been created and the record should have been appended to that new segment. Unfortunately, in *Log.Append(), the call to l.newSegment() is in the wrong place. It should have been called before actually appending the record to the active segment, not after it.

Bug 3
After the changes made in the “Advertise Raft on the Fully Qualified Domain Name” section, the address specified in the liveness and readiness probes do not work. The addresses have to be modified to use the fqdn as well as shown below:

        readinessProbe:
          exec:
            command:
            - /bin/sh
            - -c
            - |-
              /bin/grpc_health_probe -addr=$HOSTNAME.proglog.{{.Release.Namespace}}.svc.cluster.local:{{.Values.rpcPort}}
          initialDelaySeconds: 10
        livenessProbe:
          exec:
            command:
            - /bin/sh
            - -c
            - |-
              /bin/grpc_health_probe -addr=$HOSTNAME.proglog.{{.Release.Namespace}}.svc.cluster.local:{{.Values.rpcPort}}
          initialDelaySeconds: 10

See this commit for the all the changes I did to fix these bugs - Fix bugs in book. · varunbpatil/proglog@2ecd0d7 · GitHub

3 Likes

Hello @travisjeffery, Thanks for your fix in advance.
I could run cluster and get running 3 pods with your codes.

$ kubectl get pods
NAME        READY   STATUS    RESTARTS   AGE
proglog-0   1/1     Running   0          19m
proglog-1   1/1     Running   0          19m
proglog-2   1/1     Running   0          19m

But, I could not success GetServersRequest using cmd/getservers/main.go in original codes of DeployLocally
The error was like this

# in client side
2022/05/16 16:22:25 rpc error: code = Unavailable desc = connection closed before server preface received
exit status 1
# in server side - port-forwarding 
$ kubectl port-forward pod/proglog-0 8400
Forwarding from 127.0.0.1:8400 -> 8400
Forwarding from [::1]:8400 -> 8400
Handling connection for 8400
E0516 16:32:31.170225  830740 portforward.go:406] an error occurred forwarding 8400 -> 8400: error forwarding port 8400 to pod 111bac84f7d9e8eff738413ffeb7d13d4a17508ee01da91a49762d693064df36, uid : failed to execute portforward in network namespace "/var/run/netns/cni-6fee86ea-c8b6-af08-f69b-208e49a8a0df": failed to connect to localhost:8400 inside namespace "111bac84f7d9e8eff738413ffeb7d13d4a17508ee01da91a49762d693064df36", IPv4: dial tcp4 127.0.0.1:8400: connect: connection refused IPv6 dial tcp6 [::1]:8400: connect: connection refused 
E0516 16:32:31.170528  830740 portforward.go:234] lost connection to pod

Solved.
For someone who may have the same issue, check the link below

1 Like

Thanks to everyone for the fixes above.

For those who are using Apple Silicon hardware, here’s a small tweak to Dockerfile to parameterize grpc_health_probe architecture (linux-arm64):

RUN GRPC_HEALTH_PROBE_VERSION=v0.4.12 && \
    GHP_ARCH=linux-arm64 && \
    wget -qO/go/bin/grpc_health_probe \
    https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-${GHP_ARCH} && \
    chmod +x /go/bin/grpc_health_probe