# TexTeller Docker Deployment Guide

This guide explains how to deploy TexTeller using Docker with NVIDIA GPU support (optimized for RTX 5080).

## Prerequisites

1. **NVIDIA Driver**: Install NVIDIA driver version 525 or later
2. **NVIDIA Container Toolkit**: Required for GPU access in Docker containers
3. **Docker**: Version 20.10 or later
4. **Docker Compose**: Version 1.29 or later (or use `docker compose` v2)
5. **Pre-downloaded Model**: Model should be in `~/.cache/huggingface/hub/models--OleehyO--TexTeller/`

## Setup NVIDIA Container Toolkit

If you haven't installed the NVIDIA Container Toolkit:

```bash
# Add the package repository
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

# Install nvidia-container-toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Restart Docker
sudo systemctl restart docker
```

## Quick Start

The easiest way to deploy is using the provided deployment script:

```bash
# Run all checks and deploy
./deploy.sh deploy

# Or check system requirements first
./deploy.sh check

# View available commands
./deploy.sh
```

## Build and Run

### Using the Deployment Script (Recommended)

```bash
# Full deployment (checks, build, and start)
./deploy.sh deploy

# Just build the image
./deploy.sh build

# Start/stop the service
./deploy.sh start
./deploy.sh stop

# View logs
./deploy.sh logs

# Check status
./deploy.sh status
```

### Using Docker Compose

```bash
# Build and start the service
docker-compose up -d

# View logs
docker-compose logs -f

# Stop the service
docker-compose down
```

### Using Docker directly

```bash
# Build the image
docker build -t texteller:latest .

# Run the container
docker run -d \
  --name texteller-server \
  --gpus '"device=0"' \
  -p 8001:8001 \
  -v ~/.cache/huggingface/hub/models--OleehyO--TexTeller:/root/.cache/huggingface/hub/models--OleehyO--TexTeller:ro \
  -e CUDA_VISIBLE_DEVICES=0 \
  texteller:latest
```

## API Usage

The server accepts JSON requests with either base64-encoded images or image URLs at the `/predict` endpoint.

### Using base64-encoded image

```bash
# Example with base64 image
curl -X POST http://localhost:8001/predict \
  -H "Content-Type: application/json" \
  -d '{
    "image_base64": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA..."
  }'
```
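Typing a base64 string by hand is impractical, so in practice you build the payload from a file. Here is a minimal sketch, assuming GNU coreutils `base64` (for `-w 0`) and a hypothetical local file `equation.png`; it sends raw base64 without the data-URI prefix, matching the Python client example below:

```bash
# Sketch: encode a local image and POST it to the server.
# Assumes GNU coreutils base64 (-w 0 disables line wrapping) and that the
# server accepts raw base64 without the data-URI prefix, as in the Python
# client example below. equation.png is a hypothetical test image.
printf '{"image_base64": "%s"}' "$(base64 -w 0 equation.png)" > payload.json
curl -X POST http://localhost:8001/predict \
  -H "Content-Type: application/json" \
  -d @payload.json
```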
### Using image URL

```bash
# Example with image URL
curl -X POST http://localhost:8001/predict \
  -H "Content-Type: application/json" \
  -d '{
    "image_url": "https://example.com/math_equation.png"
  }'
```

### Python client example

```python
import requests
import base64

# Method 1: Using base64
with open("equation.png", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode()

response = requests.post(
    "http://localhost:8001/predict",
    json={"image_base64": image_base64}
)
print(response.json())

# Method 2: Using URL
response = requests.post(
    "http://localhost:8001/predict",
    json={"image_url": "https://example.com/math_equation.png"}
)
print(response.json())
```

Or use the provided test script:

```bash
# Test with a local image
python examples/test_server.py path/to/equation.png

# Test with both a local image and a URL
python examples/test_server.py path/to/equation.png https://example.com/formula.png
```

### Response format

Success response:

```json
{
  "result": "\\frac{a}{b} = c"
}
```

Error response:

```json
{
  "error": "Failed to decode image"
}
```

## Configuration

You can configure the service by modifying environment variables in `docker-compose.yml`:

- `CUDA_VISIBLE_DEVICES`: GPU device ID (default: 0)
- `RAY_NUM_REPLICAS`: Number of Ray Serve replicas (default: 1)
- `RAY_NCPU_PER_REPLICA`: CPUs per replica (default: 4)
- `RAY_NGPU_PER_REPLICA`: GPUs per replica (default: 1)

## Monitoring

```bash
# Check container status
docker ps

# View real-time logs
docker-compose logs -f texteller

# Check GPU usage
nvidia-smi

# Check container resource usage
docker stats texteller-server
```

## Troubleshooting

### GPU not detected

```bash
# Verify NVIDIA runtime is available
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
```

### Port already in use

Change the host side of the port mapping in `docker-compose.yml`:

```yaml
ports:
  - "8080:8001"  # Host port 8080 -> Container port 8001
```

### Model not found

Ensure the model is downloaded to the correct location:

```bash
ls -la ~/.cache/huggingface/hub/models--OleehyO--TexTeller/
```

## Performance Notes

- **RTX 5080**: Optimized for CUDA 12.8 with cuDNN 9
- **Memory**: Container requires ~4-6GB GPU memory (RTX 5080 has 16GB)
- **Throughput**: ~10-20 images/second depending on image complexity
- **Startup time**: ~30-60 seconds for model loading

## Advanced Configuration

### Multiple GPUs

To use multiple GPUs, modify `docker-compose.yml`:

```yaml
environment:
  - CUDA_VISIBLE_DEVICES=0,1
  - RAY_NUM_REPLICAS=2
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          device_ids: ['0', '1']
          capabilities: [gpu]
```

### Production deployment

For production, consider:

1. Using a reverse proxy (nginx/traefik) for SSL/TLS
2. Adding authentication middleware
3. Implementing rate limiting
4. Setting up monitoring (Prometheus/Grafana)
5. Using orchestration (Kubernetes) for scaling
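Whichever deployment path you choose, a quick end-to-end smoke test confirms the GPU is reachable, the container is running, and the endpoint answers. Below is a minimal sketch under the defaults used in this guide (container name `texteller-server`, port 8001); `equation.png` is a hypothetical local test image:

```bash
#!/usr/bin/env bash
# Smoke test sketch: assumes the container name, port, and image tag used in
# this guide; equation.png is a hypothetical local test image.
set -euo pipefail

# 1. GPU visible to Docker?
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi > /dev/null \
  && echo "GPU: OK"

# 2. Container running?
[ -n "$(docker ps --filter name=texteller-server --filter status=running -q)" ] \
  && echo "Container: OK"

# 3. Endpoint answering? (-f makes curl fail on HTTP errors, -d @- reads stdin)
printf '{"image_base64": "%s"}' "$(base64 -w 0 equation.png)" \
  | curl -sf -X POST http://localhost:8001/predict \
      -H "Content-Type: application/json" -d @- \
  && echo "Endpoint: OK"
```

With `set -e`, the script stops at the first failing check, so the last "OK" printed tells you which stage to debug with the Troubleshooting section above.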