GPU Monitoring and Fan Control System

A GPU monitoring and fan control system for AMD GPUs on Linux. It combines real-time desktop monitoring, profile-based fan control, a web interface, and system integration (systemd service and desktop autostart).

Features

🖥️ Desktop Monitoring

  • Real-time GPU monitoring with PyQt5 desktop overlay
  • System tray integration for minimal footprint monitoring
  • Configurable display modes (overlay, tray, full dashboard)
  • Multiple GPU support with automatic detection

🌡️ Advanced Fan Control

  • Multiple temperature curves (Silent, Balanced, Performance, Custom)
  • Profile-based control with hotkey switching
  • Safety limits and automatic fallback modes
  • Manual override capabilities

🌐 Web Interface

  • Remote monitoring via web browser
  • Real-time charts and historical data
  • Mobile-responsive design
  • API endpoints for integration

📊 Data Logging & Analysis

  • Historical data storage with SQLite
  • Performance analytics and trend analysis
  • Export capabilities (CSV, JSON)
  • Alert system for temperature thresholds

🔧 System Integration

  • Systemd service for automatic startup
  • Configuration management with JSON profiles
  • Autostart integration for desktop environments
  • Permission handling and error recovery

Installation

Prerequisites

# Install required packages
sudo apt update
sudo apt install python3 python3-pip python3-venv
sudo apt install python3-pyqt5 python3-pyqt5.qtopengl
sudo apt install python3-matplotlib python3-flask

Setup

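# Create project directory
mkdir -p ~/gpu_monitoring_system
cd ~/gpu_monitoring_system

# Copy the project files (including requirements.txt and setup.sh) into this directory before continuing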

# Create virtual environment
python3 -m venv venv
source venv/bin/activate

# Install Python dependencies
pip install -r requirements.txt

# Run setup script
./setup.sh

Usage

Desktop Monitor

# Start desktop monitoring overlay
python3 gpu_monitor_desktop.py

# Start with system tray
python3 gpu_monitor_desktop.py --tray

# Start full dashboard
python3 gpu_monitor_desktop.py --dashboard

Fan Control

# Start fan control with default profile
python3 gpu_fan_controller.py

# Start with specific profile
python3 gpu_fan_controller.py --profile performance

# Start with custom configuration
python3 gpu_fan_controller.py --config custom.json

Web Interface

# Start web server
python3 web_interface.py

# Access at http://localhost:5000

System Service

# Install as system service
sudo ./install_service.sh

# Start service
sudo systemctl start gpu-monitoring

# Enable auto-start
sudo systemctl enable gpu-monitoring

Configuration

Fan Control Profiles

Create custom fan control profiles in config/fan_profiles.json:

{
  "silent": {
    "name": "Silent",
    "description": "Quiet operation at the cost of higher temperatures",
    "curve": {
      "min_temp": 40,
      "max_temp": 65,
      "min_pwm": 120,
      "max_pwm": 220
    },
    "safety": {
      "emergency_temp": 85,
      "emergency_pwm": 255
    }
  },
  "performance": {
    "name": "Performance",
    "description": "Maximum cooling for high performance",
    "curve": {
      "min_temp": 35,
      "max_temp": 55,
      "min_pwm": 180,
      "max_pwm": 255
    },
    "safety": {
      "emergency_temp": 80,
      "emergency_pwm": 255
    }
  }
}
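
One plausible reading of the curve and safety fields is a linear ramp between the two curve endpoints, clamped outside that range, with the safety block taking over at emergency_temp. The sketch below assumes that interpretation. The function name pwm_for_temperature is hypothetical, and the actual logic in gpu_fan_controller.py may differ.

```python
def pwm_for_temperature(profile: dict, temp_c: float) -> int:
    """Map a temperature to a PWM value (0-255) for one profile.

    Assumed behavior (not confirmed by the project): linear interpolation
    between (min_temp, min_pwm) and (max_temp, max_pwm), clamped outside
    that range, with the safety block overriding at emergency_temp.
    """
    curve = profile["curve"]
    safety = profile["safety"]

    # The safety override takes priority over the curve.
    if temp_c >= safety["emergency_temp"]:
        return safety["emergency_pwm"]

    # Clamp below and above the curve's temperature range.
    if temp_c <= curve["min_temp"]:
        return curve["min_pwm"]
    if temp_c >= curve["max_temp"]:
        return curve["max_pwm"]

    # Fraction of the way from min_temp to max_temp.
    span = curve["max_temp"] - curve["min_temp"]
    fraction = (temp_c - curve["min_temp"]) / span
    pwm = curve["min_pwm"] + fraction * (curve["max_pwm"] - curve["min_pwm"])
    return round(pwm)


# The "silent" profile from the example above (name and description omitted).
silent = {
    "curve": {"min_temp": 40, "max_temp": 65, "min_pwm": 120, "max_pwm": 220},
    "safety": {"emergency_temp": 85, "emergency_pwm": 255},
}
```

Under this reading, the silent profile holds PWM 120 up to 40 °C, reaches 170 at 52.5 °C, stays at 220 from 65 °C, and jumps to 255 at 85 °C.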

Monitoring Configuration

Configure monitoring settings in config/monitoring.json:

{
  "update_interval": 1.0,
  "display_mode": "overlay",
  "show_gpu_load": true,
  "show_temperature": true,
  "show_fan_speed": true,
  "show_power": true,
  "show_vram": true,
  "alerts": {
    "enabled": true,
    "temp_warning": 75,
    "temp_critical": 85,
    "power_warning": 200
  }
}
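
To illustrate how the alerts block might be evaluated, here is a minimal sketch. It assumes temperatures are in Celsius and power in watts (matching the units listed under GPU Metrics), that a threshold triggers when the reading is greater than or equal to it, and that alerts are skipped entirely when enabled is false. The function name alert_level and the level names are hypothetical.

```python
import json


def alert_level(config: dict, temp_c: float, power_w: float) -> str:
    """Return "ok", "warning", or "critical" for one reading.

    Assumption: the comparison rules here are illustrative, not taken
    from the project's code.
    """
    alerts = config.get("alerts", {})
    if not alerts.get("enabled", False):
        return "ok"
    # Critical temperature outranks every warning.
    if temp_c >= alerts["temp_critical"]:
        return "critical"
    if temp_c >= alerts["temp_warning"] or power_w >= alerts["power_warning"]:
        return "warning"
    return "ok"


# The alerts block from the example configuration above.
EXAMPLE_CONFIG = json.loads("""
{
  "alerts": {
    "enabled": true,
    "temp_warning": 75,
    "temp_critical": 85,
    "power_warning": 200
  }
}
""")
```

With the example thresholds, a reading of 76 °C at 100 W is classified as a warning, and 85 °C or above is classified as critical regardless of power draw.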

Monitoring Data

The system collects and stores the following data:

GPU Metrics

  • Temperature: Core temperature in Celsius
  • Load: GPU utilization percentage
  • Fan Speed: RPM and PWM percentage
  • Power: Current power draw in watts
  • VRAM: Used and total memory
  • Clocks: Core and memory clock speeds
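
On Linux, the amdgpu driver exposes several of these metrics as plain-text files in a hwmon directory under sysfs. The sketch below reads a few of them and converts the raw units. It is an illustration of the sysfs interface, not the project's own reader: the function name read_hwmon_metrics and the returned keys are hypothetical, and some files (for example power1_average) are absent on certain GPUs or kernel versions, so missing files yield None.

```python
from pathlib import Path
from typing import Optional


def _read_int(path: Path) -> Optional[int]:
    """Read a single integer from a sysfs file, or None if unavailable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def read_hwmon_metrics(hwmon_dir: str) -> dict:
    """Read temperature, fan, and power values from one hwmon directory."""
    base = Path(hwmon_dir)
    temp = _read_int(base / "temp1_input")      # millidegrees Celsius
    rpm = _read_int(base / "fan1_input")        # revolutions per minute
    pwm = _read_int(base / "pwm1")              # duty cycle, 0-255
    power = _read_int(base / "power1_average")  # microwatts
    return {
        "temperature_c": None if temp is None else temp / 1000,
        "fan_rpm": rpm,
        "fan_pwm": pwm,
        "fan_percent": None if pwm is None else round(pwm * 100 / 255, 1),
        "power_w": None if power is None else power / 1_000_000,
    }
```

A hwmon directory can be located with glob.glob("/sys/class/drm/card*/device/hwmon/hwmon*"), which matches the path used in the Troubleshooting section.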

System Metrics

  • CPU Usage: Overall system load
  • Memory Usage: System RAM utilization
  • Disk Usage: Storage space monitoring
  • Network: Bandwidth usage
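
As a rough illustration of SQLite storage with CSV export, the sketch below logs samples to a table and dumps them in timestamp order. The table name gpu_samples, its columns, and the function names are all hypothetical; the project's actual schema is not documented here.

```python
import csv
import io
import sqlite3

COLUMNS = ["ts", "temperature_c", "fan_pwm", "power_w"]


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the sample database. Hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS gpu_samples (
               ts REAL NOT NULL,
               temperature_c REAL,
               fan_pwm INTEGER,
               power_w REAL
           )"""
    )
    return conn


def log_sample(conn, ts, temperature_c, fan_pwm, power_w) -> None:
    """Insert one sample; the 'with' block commits the transaction."""
    with conn:
        conn.execute(
            "INSERT INTO gpu_samples VALUES (?, ?, ?, ?)",
            (ts, temperature_c, fan_pwm, power_w),
        )


def export_csv(conn) -> str:
    """Return all samples as CSV text, oldest first, with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(COLUMNS)
    writer.writerows(
        conn.execute(
            "SELECT ts, temperature_c, fan_pwm, power_w "
            "FROM gpu_samples ORDER BY ts"
        )
    )
    return buffer.getvalue()
```

Passing a file path to open_store instead of the default ":memory:" would persist samples across restarts.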

Web Interface Features

Dashboard

  • Real-time metric display
  • Historical charts with configurable time ranges
  • System health overview
  • Alert status and history

Charts

  • Temperature trends over time
  • Fan speed and PWM curves
  • Power consumption patterns
  • GPU utilization history

Configuration

  • Fan profile management
  • Alert threshold configuration
  • Display settings
  • Data export options

API Endpoints

Monitoring Data

  • GET /api/status - Current GPU status
  • GET /api/history - Historical data
  • GET /api/metrics - All available metrics

Fan Control

  • POST /api/fan/profile - Set fan profile
  • POST /api/fan/manual - Manual fan control
  • GET /api/fan/status - Current fan status

System

  • GET /api/system - System information
  • POST /api/alerts - Configure alerts
  • GET /api/logs - System logs
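
The endpoints above can be called from any HTTP client. The sketch below uses only the Python standard library. The request and response formats are not documented in this README, so the JSON body {"profile": ...} and the assumption that responses are JSON are guesses to be checked against web_interface.py. The helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # default address from the Usage section


def build_status_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a GET request for the current GPU status."""
    return urllib.request.Request(f"{base_url}/api/status", method="GET")


def build_profile_request(
    profile: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build a POST request that selects a fan profile.

    Assumption: the server accepts a JSON body of the form
    {"profile": "<name>"}; the real payload may differ.
    """
    body = json.dumps({"profile": profile}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/fan/profile",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 5.0):
    """Send a request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the web server running, send(build_status_request()) would fetch the current status, and send(build_profile_request("performance")) would request a profile switch.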

Troubleshooting

Permission Issues

If you encounter permission errors with GPU monitoring:

# Check GPU permissions
ls -la /sys/class/drm/card*/device/hwmon/

# Add user to video group (log out and back in for the change to take effect)
sudo usermod -a -G video "$USER"

# Or run with sudo for fan control
sudo python3 gpu_fan_controller.py

Missing Dependencies

# Install missing PyQt5
pip install PyQt5 PyQt5-sip

# Install missing matplotlib
pip install matplotlib

# Install missing Flask
pip install flask

Service Issues

# Check service status
sudo systemctl status gpu-monitoring

# View service logs
sudo journalctl -u gpu-monitoring -f

# Restart service
sudo systemctl restart gpu-monitoring

Development

Adding New GPU Support

  1. Update gpu_detector.py with new GPU detection logic
  2. Add temperature sensor paths for new GPU models
  3. Test with python3 test_gpu_detection.py

Custom Fan Curves

  1. Create new profile in config/fan_profiles.json
  2. Test with python3 gpu_fan_controller.py --profile new_profile
  3. Validate temperature response and stability
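
Before testing a new profile on real hardware, it can help to sanity-check its values. The sketch below applies a few rules that are assumptions rather than the project's own validation: PWM values fall in the 0-255 range used by the example profiles, the curve rises with temperature, and the emergency temperature sits above the top of the curve. The function name validate_profile is hypothetical.

```python
def validate_profile(profile: dict) -> list:
    """Return a list of problems found in one fan profile (empty if none)."""
    errors = []
    curve = profile.get("curve", {})
    safety = profile.get("safety", {})

    # Check that every expected key is present before comparing values.
    for key in ("min_temp", "max_temp", "min_pwm", "max_pwm"):
        if key not in curve:
            errors.append(f"curve.{key} is missing")
    for key in ("emergency_temp", "emergency_pwm"):
        if key not in safety:
            errors.append(f"safety.{key} is missing")
    if errors:
        return errors

    if curve["min_temp"] >= curve["max_temp"]:
        errors.append("curve.min_temp must be below curve.max_temp")
    if curve["min_pwm"] > curve["max_pwm"]:
        errors.append("curve.min_pwm must not exceed curve.max_pwm")

    # PWM duty cycle values are assumed to use the 0-255 range.
    pwm_fields = (
        ("curve.min_pwm", curve["min_pwm"]),
        ("curve.max_pwm", curve["max_pwm"]),
        ("safety.emergency_pwm", safety["emergency_pwm"]),
    )
    for name, value in pwm_fields:
        if not 0 <= value <= 255:
            errors.append(f"{name} must be between 0 and 255")

    if safety["emergency_temp"] <= curve["max_temp"]:
        errors.append("safety.emergency_temp should be above curve.max_temp")
    return errors
```

An empty list means the profile passed these checks; it does not guarantee that the curve keeps a particular GPU cool under load, so step 3 still applies.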

Web Interface Extensions

  1. Add new routes in web_interface.py
  2. Create templates in templates/ directory
  3. Add static assets in static/ directory

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for your changes
  5. Submit a pull request

Support

For support and questions:

  • Create an issue on GitHub
  • Check the troubleshooting section
  • Review the configuration examples