TekOnline

Diagnosing and Fixing High CPU Usage in Authentik Stack

The Problem

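Recently, we noticed that our Authentik authentication stack was experiencing unusually high CPU usage, with the authentik-server-1 container consuming up to 178.62% CPU. Docker reports CPU usage relative to a single core, so a value above 100% means the container was keeping more than one core busy. This was causing performance issues and risked destabilizing the host.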

Initial Investigation

When we first checked the container stats, we observed:

  • authentik-server-1: 178.62% CPU usage
  • authentik-redis-1: 0.50% CPU usage
  • authentik-worker-1: 0.22% CPU usage
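
A quick way to spot the offending container is to filter the output of docker stats. The helper below is a minimal sketch: the function name high_cpu and the threshold of 100 are illustrative choices, not part of our original setup.

```shell
# Print the names of containers whose CPU usage exceeds a threshold (percent).
# Reads "name cpu%" lines on stdin, one container per line.
high_cpu() {
    threshold="$1"
    awk -v limit="$threshold" '
        {
            cpu = $2
            sub(/%$/, "", cpu)                 # strip the trailing percent sign
            if (cpu + 0 > limit + 0) print $1  # compare as numbers
        }'
}

# Usage (requires a running Docker daemon):
#   docker stats --no-stream --format '{{.Name}} {{.CPUPerc}}' | high_cpu 100
```

With the figures listed above, only authentik-server-1 would be reported.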

Root Causes Identified

After investigating the logs and system state, we identified several issues:

  1. Redis Storage Issues
    • Redis was showing “No space left on device” errors
    • Continuous background saving attempts were failing
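    • Redis kept retrying the failed save; since the Redis container itself was nearly idle, the extra load most likely showed up in the server container as it kept working against a failing Redis instance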
  2. Memory Overcommit Warning
    • System was showing warnings about memory overcommit being disabled
    • This can cause performance issues and failures in Redis operations
  3. Resource Limits
    • No resource limits were set on the containers
    • This allowed containers to consume excessive system resources
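
The first two symptoms can be checked from the Redis container logs. The sketch below counts matching log lines; the function name scan_redis_log is hypothetical, and the overcommit pattern is deliberately loose because the exact wording of that warning differs between Redis versions.

```shell
# Count log lines that mention the two symptoms described above.
# Reads log text on stdin and prints a one-line summary.
scan_redis_log() {
    awk '
        /No space left on device/  { disk++ }   # failed writes / background saves
        tolower($0) ~ /overcommit/ { oc++ }     # memory overcommit warning
        END { printf "disk_full=%d overcommit=%d\n", disk, oc }'
}

# Usage (requires a running Docker daemon):
#   docker logs --tail 500 authentik-redis-1 2>&1 | scan_redis_log
```

Running df -h and docker system df alongside this shows how much disk space is left on the host and how much of it Docker is using.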

The Solution

We implemented several fixes to resolve these issues:

1. Enable Memory Overcommit

sudo sysctl vm.overcommit_memory=1

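Setting vm.overcommit_memory to 1 tells the kernel to always allow memory allocations. Redis recommends this so that the fork it uses for background saves does not fail when memory is tight. A value set this way applies immediately but is lost on the next reboot.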
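
To keep the setting across reboots, it can be written to a sysctl configuration file. This is a sketch rather than what we originally ran: the function name persist_overcommit and the file name 99-redis-overcommit.conf are assumptions, and any .conf file under /etc/sysctl.d should work.

```shell
# Write the overcommit setting to a sysctl drop-in file so it survives reboots.
# The default path is a hypothetical choice; pass another path as the first argument.
persist_overcommit() {
    conf="${1:-/etc/sysctl.d/99-redis-overcommit.conf}"
    printf 'vm.overcommit_memory = 1\n' > "$conf"
}

# Usage (from a root shell):
#   persist_overcommit && sysctl --system
```

sysctl --system reloads all sysctl configuration files, so the value is applied without a reboot.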

2. Clean Up Docker Resources

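We stopped and removed the existing containers, then pruned unused volumes to resolve the storage issues. Be careful with the last command: docker volume prune deletes volumes that are not attached to any container. Once the containers are removed, and depending on the Docker version and how the volumes are defined, that can include the PostgreSQL data, so take a backup first.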

docker stop authentik-server-1 authentik-worker-1 authentik-redis-1 authentik-postgresql-1
docker rm authentik-server-1 authentik-worker-1 authentik-redis-1 authentik-postgresql-1
docker volume prune -f
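
Before running the commands above, it is worth dumping the database so the prune cannot cost you your Authentik configuration. The sketch below assumes the container name used above and that the database user and database name are both authentik; check the PG_USER and PG_DB values in your own .env file. The function name and the default output file are hypothetical.

```shell
# Dump the Authentik database to a file before removing containers and volumes.
# Assumptions: container authentik-postgresql-1, user "authentik", database "authentik".
backup_authentik_db() {
    out="${1:-authentik-backup.sql}"
    docker exec authentik-postgresql-1 pg_dump -U authentik authentik > "$out" \
        && [ -s "$out" ]    # succeed only if the dump file is non-empty
}

# Usage (while the postgresql container is still running):
#   backup_authentik_db /root/authentik-backup.sql
```

Copy the dump off the affected host if the disk is already full, since the backup itself needs free space.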

3. Implement Resource Limits

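We updated the docker-compose configuration to include resource limits for all containers and lowered the log level of the server and worker to warning. Only the added keys are shown below; the rest of each service definition stays as it was. Note that the legacy docker-compose v1 tool ignores the deploy section unless it is run with --compatibility, while the current docker compose plugin applies it directly.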

services:
  postgresql:
    deploy:
      resources:
        limits:
          cpus: '1'
          memory: 1G
        reservations:
          cpus: '0.5'
          memory: 512M

  redis:
    deploy:
      resources:
        limits:
          cpus: '0.5'
          memory: 512M
        reservations:
          cpus: '0.2'
          memory: 256M

  server:
    deploy:
      resources:
        limits:
          cpus: '1'
          memory: 1G
        reservations:
          cpus: '0.5'
          memory: 512M
    environment:
      AUTHENTIK_LOG_LEVEL: warning

  worker:
    deploy:
      resources:
        limits:
          cpus: '1'
          memory: 1G
        reservations:
          cpus: '0.5'
          memory: 512M
    environment:
      AUTHENTIK_LOG_LEVEL: warning
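
After recreating the stack with docker compose up -d, you can confirm that the limits were actually applied by inspecting a container. docker inspect reports the CPU limit in billionths of a CPU and the memory limit in bytes, so the sketch below converts both to readable units. The function name format_limits is an illustrative choice.

```shell
# Convert "NanoCpus Memory" (as printed by docker inspect) into readable units.
# A value of 0 in either field means no limit is set.
format_limits() {
    awk '{ printf "cpus=%.2f memory=%dMiB\n", $1 / 1000000000, $2 / 1048576 }'
}

# Usage (requires a running Docker daemon):
#   docker inspect --format '{{.HostConfig.NanoCpus}} {{.HostConfig.Memory}}' \
#       authentik-server-1 | format_limits
```

For the server container configured above, this should report one CPU and 1024MiB; if it reports zeros, the deploy section is not being applied.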

Key Improvements

  1. Resource Management
    • Added CPU and memory limits to prevent resource exhaustion
    • Set resource reservations to ensure minimum performance
    • Reduced logging overhead by setting log level to warning
  2. Storage Optimization
    • Cleaned up Redis storage issues
    • Removed unnecessary volumes
    • Enabled proper memory overcommit settings
  3. Performance Monitoring
    • Adding resource limits makes it easier to monitor container performance
    • Reduced logging overhead helps with system performance
    • Better resource allocation prevents container resource contention

Conclusion

These changes have significantly improved the performance and stability of our Authentik stack. The high CPU usage has been resolved, and the system is now operating within defined resource limits. Regular monitoring and maintenance will help prevent similar issues in the future.

