Troubleshooting Guide

Common issues and solutions for the homelab infrastructure.

Flux GitOps Issues

Flux Not Syncing Changes

Symptoms:
- Changes pushed to Git but not applied to the cluster
- Kustomizations stuck in "Unknown" state

Diagnosis:

# Check Flux system status
flux get kustomizations

# Check source controller
flux get sources git

# View source controller logs
kubectl logs -n flux-system -l app=source-controller

Solutions:

Force reconciliation:

flux reconcile source git flux-system
flux reconcile kustomization flux-system

Check Git repository access:
```
flux get sources git -o yaml
```
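
When changes are pushed but not applied, it helps to confirm which commit Flux has actually fetched. The sketch below compares the local Git commit with the revision recorded on the `flux-system` GitRepository. The function name `flux_revision_matches` is hypothetical, and it assumes the local checkout tracks the same branch that Flux watches.

```shell
# Hypothetical helper: succeed when the revision fetched by Flux contains
# the given local commit (default: HEAD).
flux_revision_matches() {
  want="$(git rev-parse "${1:-HEAD}")" || return 2
  have="$(kubectl get gitrepository flux-system -n flux-system \
    -o jsonpath='{.status.artifact.revision}')" || return 2
  echo "git:  $want"
  echo "flux: $have"
  # Substring match, because Flux prefixes the commit with the branch name
  case "$have" in
    *"$want"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: flux_revision_matches || flux reconcile source git flux-system
```

If the two revisions differ after a forced reconciliation, the source controller logs shown earlier are the next place to look.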

HelmRelease Failures

Symptoms:
- HelmRelease in "Failed" state
- Applications not deploying

Diagnosis:

# Check HelmRelease status
flux get helmreleases -A

# Get detailed error information
kubectl describe helmrelease <release-name> -n <namespace>

# Check helm controller logs
kubectl logs -n flux-system -l app=helm-controller

Solutions:

Check chart repository:
```
flux get sources helm
```

Suspend and resume:

flux suspend helmrelease <release-name> -n <namespace>
flux resume helmrelease <release-name> -n <namespace>
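
The output of `kubectl describe` can be long. As a rough sketch, the status conditions of a HelmRelease can be printed one per line with a JSONPath template. The function name `helmrelease_conditions` is hypothetical.

```shell
# Hypothetical helper: print type, status, reason and message for each
# status condition of a HelmRelease, tab-separated.
helmrelease_conditions() {
  # $1 = release name, $2 = namespace
  kubectl get helmrelease "$1" -n "$2" \
    -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{"\t"}{.message}{"\n"}{end}'
}

# Usage: helmrelease_conditions <release-name> <namespace>
```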

Database Connection Issues

PostgreSQL Connection Problems

Symptoms:
- Applications unable to connect to database
- Connection timeouts or authentication failures

Diagnosis:

# Check PostgreSQL pod status
kubectl get pods -n postgresql

# View PostgreSQL logs
kubectl logs -n postgresql statefulset/postgresql

# Test connection from within cluster
kubectl run -it --rm debug --image=postgres:16 --restart=Never -- psql -h postgresql.postgresql.svc.cluster.local -U postgres

Solutions:

Check service and endpoints:

kubectl get svc -n postgresql
kubectl get endpoints -n postgresql

Verify secrets:

kubectl get secrets -n postgresql
kubectl describe onepassworditem -n postgresql
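
To separate "server unreachable" from "authentication failure", a credential-free readiness probe can be run first. The sketch below uses `pg_isready` from the same `postgres:16` image. The function name `pg_ready`, the pod name `pg-ready`, and port 5432 are assumptions.

```shell
# Hypothetical helper: run pg_isready from a throwaway pod. It reports
# whether the server accepts connections and needs no password.
pg_ready() {
  host="${1:-postgresql.postgresql.svc.cluster.local}"
  kubectl run pg-ready --rm -i --restart=Never --image=postgres:16 -- \
    pg_isready -h "$host" -p 5432 -t 5
}

# Usage: pg_ready
```

If `pg_isready` reports that the server is accepting connections while applications still fail, the problem is more likely in the credentials than in the network path.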

Redis Connection Issues

Symptoms:
- Cache misses or connection errors
- Applications reporting Redis unavailability

Diagnosis:

# Check Redis pod status
kubectl get pods -n redis

# Test Redis connection
kubectl run -it --rm debug --image=redis:7 --restart=Never -- redis-cli -h redis-master.redis.svc.cluster.local ping
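
The reply to `ping` already narrows the problem down: `PONG` means the server answers, while a `NOAUTH` error means it is reachable but requires a password. A small sketch of that interpretation follows. The function name `classify_redis_reply` and its labels are hypothetical.

```shell
# Hypothetical helper: classify the text printed by `redis-cli ... ping`.
classify_redis_reply() {
  case "$1" in
    *NOAUTH*) echo "auth-required" ;;  # reachable, but a password is needed
    *PONG*)   echo "ok" ;;             # reachable and answering
    *)        echo "other" ;;          # anything else: inspect the raw text
  esac
}

# Usage (captures the output of the debug pod instead of displaying it):
# reply="$(kubectl run redis-debug --rm -i --restart=Never --image=redis:7 -- \
#   redis-cli -h redis-master.redis.svc.cluster.local ping 2>&1)"
# classify_redis_reply "$reply"
```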

Networking Problems

Cloudflare Tunnel Issues

Symptoms:
- External services not accessible
- Tunnel showing as disconnected

Diagnosis:

# Check cloudflared pod status
kubectl get pods -n cloudflared

# View tunnel logs
kubectl logs -n cloudflared deployment/cloudflared-cloudflare-tunnel

Solutions:

Restart tunnel:

kubectl rollout restart -n cloudflared deployment/cloudflared-cloudflare-tunnel

Check tunnel configuration:
```
kubectl get configmap -n cloudflared
```
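
A restart only helps if the new pods actually become ready. A minimal sketch that restarts a Deployment and then waits for the rollout to complete is shown below. The function name `restart_and_wait` and the two-minute timeout are assumptions.

```shell
# Hypothetical helper: restart a Deployment, then wait for the rollout.
restart_and_wait() {
  # $1 = namespace, $2 = deployment name
  kubectl rollout restart -n "$1" "deployment/$2" &&
    kubectl rollout status -n "$1" "deployment/$2" --timeout=2m
}

# Usage: restart_and_wait cloudflared cloudflared-cloudflare-tunnel
```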

DNS Resolution Problems

Symptoms:
- Services not resolving by name
- External DNS records not created

Diagnosis:

# Check external-dns logs
kubectl logs -n external-dns deployment/external-dns

# Test DNS resolution
kubectl run -it --rm debug --image=busybox --restart=Never -- nslookup auth.kjho.me

Solutions:

Check external-dns configuration:

kubectl get service -A | grep external-dns

Verify Cloudflare credentials:

kubectl describe secret -n external-dns
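
When a specific record is missing, filtering the external-dns logs for that hostname can show whether it was ever processed. This is only a sketch: the function name `external_dns_log_for` and the default one-hour window are assumptions.

```shell
# Hypothetical helper: show recent external-dns log lines that mention a
# hostname (fixed-string, case-insensitive match).
external_dns_log_for() {
  # $1 = hostname, $2 = look-back window (default: 1h)
  kubectl logs -n external-dns deployment/external-dns --since="${2:-1h}" |
    grep -i -F -- "$1"
}

# Usage: external_dns_log_for auth.kjho.me 6h
```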

Storage Issues

Longhorn Volume Problems

Symptoms:
- PVCs stuck in "Pending" state
- Application pods failing to start due to volume mount issues

Diagnosis:

# Check PVC status
kubectl get pvc -A

# Check Longhorn system
kubectl get pods -n longhorn-system

# Check storage classes
kubectl get storageclass

Solutions:

Check Longhorn UI:
- Open the Longhorn dashboard at its configured URL
- Review volume and node status

Restart Longhorn components:

kubectl rollout restart -n longhorn-system daemonset/longhorn-manager
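
On a busy cluster the full PVC list is noisy. As a sketch, the Pending claims can be filtered out and described individually, since the events at the end of `kubectl describe` usually state why provisioning is blocked. Both function names are hypothetical, and the filter assumes the default column layout of `kubectl get pvc -A`.

```shell
# Hypothetical helper: list PVCs whose STATUS column is Pending,
# printed as "namespace/name".
pending_pvcs() {
  kubectl get pvc -A --no-headers |
    awk '$3 == "Pending" { print $1 "/" $2 }'
}

# Hypothetical helper: describe each Pending PVC (includes recent events).
describe_pending_pvcs() {
  pending_pvcs | while IFS=/ read -r ns name; do
    kubectl describe pvc "$name" -n "$ns" </dev/null
  done
}

# Usage: describe_pending_pvcs
```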

Authentication Issues

Authentik Problems

Symptoms:
- Unable to access Authentik UI
- Authentication flows failing
- Database connection errors

Diagnosis:

# Check Authentik pods
kubectl get pods -n authentik

# View Authentik logs
kubectl logs -n authentik deployment/authentik-server
kubectl logs -n authentik deployment/authentik-worker

# Check database initialization
kubectl get jobs -n authentik

Solutions:

Restart Authentik components:

kubectl rollout restart -n authentik deployment/authentik-server
kubectl rollout restart -n authentik deployment/authentik-worker

Check database connectivity:

kubectl exec -it -n authentik deployment/authentik-server -- python manage.py check --database default

1Password Sync Issues

Symptoms:
- Secrets not syncing from 1Password
- OnePasswordItem resources in error state

Diagnosis:

# Check 1Password Connect status
kubectl get pods -n 1password-connect

# Check OnePasswordItem status
kubectl get onepassworditem -A

# View operator logs
kubectl logs -n 1password-connect deployment/onepassword-connect-operator

Solutions:

Restart 1Password Connect:

kubectl rollout restart -n 1password-connect deployment/onepassword-connect

Check credentials:

kubectl describe secret -n 1password-connect
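
To find which items failed to sync, each OnePasswordItem can be checked for a matching Secret. The sketch below assumes the operator creates a Secret with the same name and namespace as the OnePasswordItem. The function name `missing_op_secrets` is hypothetical.

```shell
# Hypothetical helper: print "namespace/name" for every OnePasswordItem
# that has no Secret of the same name in its namespace.
missing_op_secrets() {
  kubectl get onepassworditem -A --no-headers \
    -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name |
    while read -r ns name; do
      kubectl get secret "$name" -n "$ns" >/dev/null 2>&1 </dev/null ||
        echo "$ns/$name"
    done
}

# Usage: missing_op_secrets
```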

General Debugging Commands

Cluster Health

# Node status
kubectl get nodes -o wide

# Resource usage
kubectl top nodes
kubectl top pods -A

# Events
kubectl get events -A --sort-by=.metadata.creationTimestamp
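
Field selectors can narrow these listings to the entries that usually matter. A minimal sketch follows, with hypothetical function names. Note that a pod in CrashLoopBackOff still has phase `Running`, so it will not appear in the first listing.

```shell
# Hypothetical helper: pods whose phase is neither Running nor Succeeded
# (for example Pending or Failed).
unhealthy_pods() {
  kubectl get pods -A \
    --field-selector=status.phase!=Running,status.phase!=Succeeded
}

# Hypothetical helper: Warning events only, oldest first.
warning_events() {
  kubectl get events -A --field-selector type=Warning \
    --sort-by=.lastTimestamp
}

# Usage: unhealthy_pods; warning_events
```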

Pod Debugging

# Pod status and details
kubectl get pods -A -o wide
kubectl describe pod <pod-name> -n <namespace>

# Container logs
kubectl logs <pod-name> -n <namespace> -c <container-name>
kubectl logs <pod-name> -n <namespace> --previous

# Open a shell in the pod (use /bin/sh if the image has no bash)
kubectl exec -it <pod-name> -n <namespace> -- /bin/bash

Network Debugging

# Test connectivity between pods
kubectl run -it --rm debug --image=busybox --restart=Never -- wget -qO- http://service-name.namespace.svc.cluster.local

# Check service endpoints
kubectl get endpoints -n <namespace>

# Network policies
kubectl get networkpolicies -A
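
A Service with no ready backends lists no addresses in its Endpoints object, which is a common cause of connection failures. A sketch of that check follows. The function name `service_has_endpoints` is hypothetical.

```shell
# Hypothetical helper: succeed and print the backend IPs when a Service has
# at least one ready endpoint; fail when it has none.
service_has_endpoints() {
  # $1 = service name, $2 = namespace
  ips="$(kubectl get endpoints "$1" -n "$2" \
    -o jsonpath='{.subsets[*].addresses[*].ip}')" || return 2
  [ -n "$ips" ] && echo "$1.$2: $ips"
}

# Usage: service_has_endpoints <service-name> <namespace> || echo "no ready pods"
```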

Emergency Procedures

Complete Cluster Reset

Warning: Destructive operation. Only use this in development or when the cluster is completely broken.

# From deployment machine
ansible-playbook -i provisioning/k3s-inventory.ini provisioning/k3s-wipe.yml
ansible-playbook -i provisioning/k3s-inventory.ini provisioning/k3s-bootstrap.yml
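
Because the wipe is irreversible, it may be worth putting a typed confirmation in front of it. A minimal sketch follows. The function name `confirm_wipe` and the confirmation word are assumptions, not part of the provisioning playbooks.

```shell
# Hypothetical guard: succeed only when the operator types "wipe".
confirm_wipe() {
  printf 'Type "wipe" to destroy the cluster: ' >&2
  read -r answer
  [ "$answer" = "wipe" ]
}

# Usage:
# confirm_wipe &&
#   ansible-playbook -i provisioning/k3s-inventory.ini provisioning/k3s-wipe.yml &&
#   ansible-playbook -i provisioning/k3s-inventory.ini provisioning/k3s-bootstrap.yml
```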

Service Rollback

# Roll back a HelmRelease to the previous revision
flux suspend helmrelease <release-name> -n <namespace>
helm rollback <release-name> -n <namespace>
flux resume helmrelease <release-name> -n <namespace>

# Or revert Git commit
git revert <commit-hash>
git push origin main
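
The suspend, rollback, and resume steps can be wrapped so that the HelmRelease is resumed even when the rollback fails. This is a sketch under assumptions: the function name `rollback_release` is hypothetical, and the Helm release is assumed to share its name with the HelmRelease, as in the commands above. An optional revision number from `helm history <release-name> -n <namespace>` can be passed as the third argument.

```shell
# Hypothetical helper: suspend Flux, roll back with Helm, always resume.
rollback_release() {
  # $1 = release name, $2 = namespace, $3 = revision (optional)
  flux suspend helmrelease "$1" -n "$2" || return 1
  helm rollback "$1" ${3:+"$3"} -n "$2"
  rc=$?
  # Resume regardless, so the release is never left suspended
  flux resume helmrelease "$1" -n "$2"
  return "$rc"
}

# Usage: rollback_release <release-name> <namespace> [revision]
```

Once resumed, Flux reconciles toward whatever Git declares, so reverting the commit remains the durable fix.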

📁 Related Files:

- Discord Notifications - Alert configuration
- Database Configurations - PostgreSQL and Redis
- Network Configurations - Cloudflare and DNS