Rob Stewart

stewartcircle.com

CKA Prep - Troubleshooting

2022-08-19 6 min read Kubernetes

This post is part of a series which contains my study notes for the Certified Kubernetes Administrator (CKA) exam.

Note: Unless specifically indicated, text and examples in this post all come directly from the official Kubernetes documentation. I attempted to locate and extract the relevant portions of the kubernetes.io documentation that applied to the exam objective. However, I encourage you to do your own reading. I cannot guarantee that I got all of the important sections.

Troubleshooting

The Exam Curriculum breaks down the fifth exam topic into the following objectives:

Evaluate cluster and node logging

Relevant search terms for Kubernetes Documentation: logging

Concepts

  • Run kubectl get nodes to check the nodes in the cluster.
  • Run kubectl -n kube-system get pods to check the pods in the kube-system namespace.
  • Run kubectl -n kube-system logs <pod-name> to view the logs for a pod.
  • Check kubelet logs by running journalctl --unit=kubelet on a system with systemd installed. If systemd is not installed, the components that are not running in containers will log to /var/log.
  • Pod logs are usually in /var/log/pods.
  • Check syslog for issues with specific components, for example the kube-apiserver:
    cat /var/log/syslog | grep kube-apiserver
    
  • Check the container logs directly from the container runtime using crictl ps and then crictl logs.
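Putting those checks together, a rough sequence might look like the following (the pod name and container ID are placeholders):

    # cluster and control plane health
    kubectl get nodes
    kubectl -n kube-system get pods
    kubectl -n kube-system logs <pod-name>

    # kubelet logs on a systemd-based node
    journalctl --unit=kubelet --since "1 hour ago"

    # container logs straight from the container runtime
    crictl ps
    crictl logs <container-id>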


Understand how to Monitor Applications

Relevant search terms for Kubernetes Documentation: monitor

Concepts

If the metrics server has been deployed on the Kubernetes cluster, the Horizontal Pod Autoscaler can pull metrics data to scale pods based on resource utilization. The kubectl top command can then be used to determine which pods are using the most memory and CPU resources on the cluster.
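For example, with the metrics server installed, commands along these lines show the heaviest consumers (the --sort-by flag orders the output by the named resource):

    kubectl top nodes
    kubectl top pods --all-namespaces --sort-by=memory
    kubectl top pods --all-namespaces --sort-by=cpu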


Manage Container Stdout & Stderr Logs

Relevant search terms for Kubernetes Documentation: logs

Concepts

  • To view the logs from a container running in a pod, use the kubectl logs command.

    kubectl logs <Pod-Name>
    
  • If there is more than one container running in the pod, you need to specify the container when running the kubectl logs command.

    kubectl logs <Pod-Name> -c <Container-Name>
    
  • “You can use kubectl logs --previous to retrieve logs from a previous instantiation of a container.”

    kubectl logs <Pod-Name> -c <Container-Name> --previous
    
  • “You can use kubectl logs to view logs from a pod that is part of a deployment.”

    kubectl logs deploy/<Deployment-Name>
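
Beyond the basic forms above, a few extra kubectl logs flags are often handy (the pod name is a placeholder):

    kubectl logs <Pod-Name> -f                 # stream (follow) new log lines
    kubectl logs <Pod-Name> --tail=50          # only the last 50 lines
    kubectl logs <Pod-Name> --since=1h         # only the last hour of logs
    kubectl logs <Pod-Name> --all-containers   # logs from every container in the pod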
    


Troubleshoot Application Failure

Relevant search terms for Kubernetes Documentation: troubleshoot, debug pods

Concepts

  • “My Pods are pending with event message FailedScheduling”

    • “If the scheduler cannot find any node where a Pod can fit, the Pod remains unscheduled until a place can be found. An Event is produced each time the scheduler fails to find a place for the Pod. You can use kubectl to view the events for a Pod; for example:”

        kubectl describe pod frontend | grep -A 9999999999 Events
      
    • “In general, if a Pod is pending with a message of this type, there are several things to try:”

      • “Add more nodes to the cluster.”
      • “Terminate unneeded Pods to make room for pending Pods.”
      • “Check that the Pod is not larger than all the nodes. For example, if all the nodes have a capacity of cpu: 1, then a Pod with a request of cpu: 1.1 will never be scheduled.”
      • “Check for node taints. If most of your nodes are tainted, and the new Pod does not tolerate that taint, the scheduler only considers placements onto the remaining nodes that don’t have that taint.”
    • You can check node capacities and amounts allocated with the kubectl describe nodes command.

  • “My container is terminated”

    • “Your container might get terminated because it is resource-starved. To check whether a container is being killed because it is hitting a resource limit, call kubectl describe pod <Pod_Name> on the Pod of interest.”
    • If the Pod was terminated previously, the termination reason will be indicated in the output of the describe command.
    • For example, you might find that the Pod was terminated due to excessive memory utilization. “Your next step might be to check the application code for a memory leak. If you find that the application is behaving how you expect, consider setting a higher memory limit (and possibly request) for that container.”
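
As a sketch, the checks above might translate to commands like these (reusing frontend as the example pod name):

    # events for a pending pod
    kubectl describe pod frontend | grep -A 9999999999 Events

    # warning events in the current namespace
    kubectl get events --field-selector type=Warning

    # node capacity versus what is already allocated
    kubectl describe nodes | grep -A 8 "Allocated resources"

    # termination reason (for example, OOMKilled) for a restarted container
    kubectl describe pod frontend | grep -A 5 "Last State"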


Troubleshoot Cluster Component Failure

Relevant search terms for Kubernetes Documentation: troubleshoot cluster

Concepts

  • Check the status of all the nodes in the cluster by running kubectl get nodes.

  • To get detailed information about the health of the cluster, run kubectl cluster-info dump.

  • “As with Pods, you can use kubectl describe node and kubectl get node -o yaml to retrieve detailed information about nodes.”

  • Reviewing Kubernetes Logs

    • “Here are the locations of the relevant log files. On systemd-based systems, you may need to use journalctl instead of examining log files.”

      NOTE: Depending on the Kubernetes deployment approach, some Kubernetes components may be running as static pods. In this case, the logs will be in /var/log/pods.

    • Control Plane Nodes

      • “/var/log/kube-apiserver.log - API Server, responsible for serving the API”
      • “/var/log/kube-scheduler.log - Scheduler, responsible for making scheduling decisions”
      • “/var/log/kube-controller-manager.log - a component that runs most Kubernetes built-in controllers, with the notable exception of scheduling (the kube-scheduler handles scheduling)”
    • Worker Nodes

      • “/var/log/kubelet.log - logs from the kubelet, responsible for running containers on the node”
      • “/var/log/kube-proxy.log - logs from kube-proxy, which is responsible for directing traffic to Service endpoints”
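
On a kubeadm-provisioned cluster (the setup typically used for these exercises; other installers differ) the control plane components run as static pods rather than writing to the log files listed above, so in practice the checks tend to look more like this (node and pod names are placeholders):

    kubectl get nodes
    kubectl cluster-info dump | less

    # control plane components running as static pods
    kubectl -n kube-system get pods
    kubectl -n kube-system logs kube-apiserver-<node-name>

    # static pod manifests and pod log directories on the node itself
    ls /etc/kubernetes/manifests/
    ls /var/log/pods/

    # kubelet on a systemd-based node
    systemctl status kubelet
    journalctl -u kubelet --since "1 hour ago"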


Troubleshoot Networking

Relevant search terms for Kubernetes Documentation: troubleshoot

Concepts

  • Service Troubleshooting

    • Make sure that the containerPort in the Pod spec matches the targetPort in the service.
    • Verify that the selector in the service matches the labels in the Pod spec.
    • Check to confirm that the service endpoints are correct.
    • Verify that kube-proxy is running.
  • Review the cluster troubleshooting tips outlined above.

  • DNS Troubleshooting

    • Verify the /etc/resolv.conf file in a pod to confirm that the DNS configuration is correct.

    • Perform a DNS lookup from a pod using the nslookup kubernetes.default command to confirm that DNS is resolving correctly.

    • Check the kube-system namespace to confirm that the coredns pods are running. Often the coredns pods are part of a deployment.

    • If the coredns pods are running, check their logs for errors.

    • Check to confirm that the Kubernetes DNS Service is running in the kube-system namespace.

      Note: “The service name is kube-dns for both CoreDNS and kube-dns deployments.”

    • Verify that the Pods are exposed to the service as endpoints by running kubectl get endpoints kube-dns --namespace=kube-system.

    • Enable logging for the coredns pods by modifying the coredns configmap using the command kubectl -n kube-system edit configmap coredns and adding the log plugin to the Corefile.

    • After logging has been enabled, make some queries and view the logs.

    • CoreDNS must be able to list service and endpoint related resources to properly resolve service names; these permissions come from the cluster role bound to the coredns service account.

    • Check the system:coredns cluster role to confirm that it grants the correct permissions.

    • Confirm that you are in the correct namespace when you are querying DNS. “DNS queries that don’t specify a namespace are limited to the pod’s namespace. If the namespace of the pod and service differ, the DNS query must include the namespace of the service.”
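
As a rough end-to-end sketch of the service and DNS checks above (my-service and dnsutils are placeholder names; dnsutils is any pod that has nslookup available, and the k8s-app labels are the ones used by typical kubeadm deployments):

    # service troubleshooting: ports, selectors, endpoints, kube-proxy
    kubectl describe service my-service
    kubectl get pods -l <label-from-the-service-selector> -o wide
    kubectl get endpoints my-service
    kubectl -n kube-system get pods -l k8s-app=kube-proxy

    # DNS troubleshooting
    kubectl exec -it dnsutils -- cat /etc/resolv.conf
    kubectl exec -it dnsutils -- nslookup kubernetes.default
    kubectl -n kube-system get pods -l k8s-app=kube-dns
    kubectl -n kube-system logs -l k8s-app=kube-dns
    kubectl -n kube-system get service kube-dns
    kubectl get endpoints kube-dns --namespace=kube-system
    kubectl describe clusterrole system:coredns
    kubectl -n kube-system edit configmap coredns   # add the "log" plugin to the Corefile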


And that’s a wrap for this topic.

Return to the series overview page