TexTeller Docker Deployment Guide

This guide explains how to deploy TexTeller using Docker with NVIDIA GPU support (optimized for RTX 5080).

Prerequisites

  1. NVIDIA Driver: Install NVIDIA driver version 525 or later
  2. NVIDIA Container Toolkit: Required for GPU access in Docker containers
  3. Docker: Version 20.10 or later
  4. Docker Compose: Version 1.29 or later (or use docker compose v2)
  5. Pre-downloaded Model: The TexTeller model should already be present in ~/.cache/huggingface/hub/models--OleehyO--TexTeller/ (see the quick check after this list)
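
A quick way to verify these prerequisites before building (a minimal sketch; exact version output will vary by system):

# GPU and driver visible?
nvidia-smi

# Docker and Compose versions
docker --version
docker compose version   # or: docker-compose --version

# Model already downloaded?
ls ~/.cache/huggingface/hub/models--OleehyO--TexTeller/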

Setup NVIDIA Container Toolkit

If you haven't installed the NVIDIA Container Toolkit:

# Add the package repository
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

# Install nvidia-container-toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Restart Docker
sudo systemctl restart docker
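
To confirm the toolkit is wired into Docker, you can run a CUDA base image with GPU access (the same check used in the Troubleshooting section below):

# Should print the same GPU table as nvidia-smi on the host
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi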

Quick Start

The easiest way to deploy is using the provided deployment script:

# Run all checks and deploy
./deploy.sh deploy

# Or check system requirements first
./deploy.sh check

# View available commands
./deploy.sh

Build and Run

# Full deployment (checks, build, and start)
./deploy.sh deploy

# Just build the image
./deploy.sh build

# Start/stop the service
./deploy.sh start
./deploy.sh stop

# View logs
./deploy.sh logs

# Check status
./deploy.sh status

Using Docker Compose

# Build and start the service
docker-compose up -d

# View logs
docker-compose logs -f

# Stop the service
docker-compose down

Using Docker directly

# Build the image
docker build -t texteller:latest .

# Run the container
docker run -d \
  --name texteller-server \
  --gpus '"device=0"' \
  -p 8001:8001 \
  -v ~/.cache/huggingface/hub/models--OleehyO--TexTeller:/root/.cache/huggingface/hub/models--OleehyO--TexTeller:ro \
  -e CUDA_VISIBLE_DEVICES=0 \
  texteller:latest

API Usage

The server accepts JSON requests with either base64-encoded images or image URLs at the /predict endpoint.

Using base64-encoded image

# Example with base64 image
curl -X POST http://localhost:8001/predict \
  -H "Content-Type: application/json" \
  -d '{
    "image_base64": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA..."
  }'

Using image URL

# Example with image URL
curl -X POST http://localhost:8001/predict \
  -H "Content-Type: application/json" \
  -d '{
    "image_url": "https://example.com/math_equation.png"
  }'

Python client example

import requests
import base64

# Method 1: Using base64
with open("equation.png", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode()

response = requests.post(
    "http://localhost:8001/predict",
    json={"image_base64": image_base64}
)
print(response.json())

# Method 2: Using URL
response = requests.post(
    "http://localhost:8001/predict",
    json={"image_url": "https://example.com/math_equation.png"}
)
print(response.json())

Or use the provided test script:

# Test with a local image
python examples/test_server.py path/to/equation.png

# Test with both local and URL
python examples/test_server.py path/to/equation.png https://example.com/formula.png

Response format

Success response:

{
  "result": "\\frac{a}{b} = c"
}

Error response:

{
  "error": "Failed to decode image"
}
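
In scripts you can branch on whichever key is present; a minimal sketch using jq (assumes jq and GNU base64 are installed and a local equation.png exists):

# Print the LaTeX result, or the error message if the request failed
resp=$(curl -s -X POST http://localhost:8001/predict \
  -H "Content-Type: application/json" \
  -d "{\"image_base64\": \"$(base64 -w0 equation.png)\"}")
echo "$resp" | jq -r '.result // .error'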

Configuration

You can configure the service by modifying environment variables in docker-compose.yml:

  • CUDA_VISIBLE_DEVICES: GPU device ID (default: 0)
  • RAY_NUM_REPLICAS: Number of Ray Serve replicas (default: 1)
  • RAY_NCPU_PER_REPLICA: CPUs per replica (default: 4)
  • RAY_NGPU_PER_REPLICA: GPUs per replica (default: 1)
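
If you run the container directly instead of through Compose, the same variables can be passed as -e flags; a sketch based on the docker run command shown earlier (the values here are just the defaults):

docker run -d \
  --name texteller-server \
  --gpus '"device=0"' \
  -p 8001:8001 \
  -v ~/.cache/huggingface/hub/models--OleehyO--TexTeller:/root/.cache/huggingface/hub/models--OleehyO--TexTeller:ro \
  -e CUDA_VISIBLE_DEVICES=0 \
  -e RAY_NUM_REPLICAS=1 \
  -e RAY_NCPU_PER_REPLICA=4 \
  -e RAY_NGPU_PER_REPLICA=1 \
  texteller:latest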

Monitoring

# Check container status
docker ps

# View real-time logs
docker-compose logs -f texteller

# Check GPU usage
nvidia-smi

# Check container resource usage
docker stats texteller-server

Troubleshooting

GPU not detected

# Verify NVIDIA runtime is available
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
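
If that check fails even though nvidia-smi works on the host, the toolkit may not be registered with Docker. On recent toolkit versions this can usually be fixed as follows (a sketch, assuming nvidia-ctk is on the PATH):

# Register the NVIDIA runtime with Docker, then restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker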

Port already in use

Change the port mapping in docker-compose.yml:

ports:
  - "8080:8000"  # Host port 8080 -> Container port 8000

Model not found

Ensure the model is downloaded to the correct location:

ls -la ~/.cache/huggingface/hub/models--OleehyO--TexTeller/
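
If the directory is missing or empty, one way to pre-download the model is with the Hugging Face CLI (a sketch; assumes the repo id OleehyO/TexTeller and the default cache location expected by the container):

# Download into ~/.cache/huggingface/hub/models--OleehyO--TexTeller
pip install -U "huggingface_hub[cli]"
huggingface-cli download OleehyO/TexTeller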

Performance Notes

  • RTX 5080: Optimized for CUDA 12.8 with cuDNN 9
  • Memory: Container requires ~4-6GB GPU memory (RTX 5080 has 16GB)
  • Throughput: ~10-20 images/second depending on image complexity
  • Startup time: ~30-60 seconds for model loading
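
Actual throughput depends on image size and formula complexity; a rough way to sanity-check it on your own hardware (a sketch; assumes the server is running, GNU base64, and a local equation.png):

# Time 20 sequential requests against the running server
IMG=$(base64 -w0 equation.png)
time for i in $(seq 1 20); do
  curl -s -X POST http://localhost:8001/predict \
    -H "Content-Type: application/json" \
    -d "{\"image_base64\": \"$IMG\"}" > /dev/null
done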

Advanced Configuration

Multiple GPUs

To use multiple GPUs, modify docker-compose.yml:

environment:
  - CUDA_VISIBLE_DEVICES=0,1
  - RAY_NUM_REPLICAS=2
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          device_ids: ['0', '1']
          capabilities: [gpu]

Production deployment

For production, consider:

  1. Using a reverse proxy (nginx/traefik) for SSL/TLS
  2. Adding authentication middleware
  3. Implementing rate limiting
  4. Setting up monitoring (Prometheus/Grafana)
  5. Using orchestration (Kubernetes) for scaling