# TexTeller Docker Deployment Guide

This guide explains how to deploy TexTeller using Docker with NVIDIA GPU support (optimized for the RTX 5080).
## Prerequisites

1. **NVIDIA Driver**: Install NVIDIA driver version 525 or later
2. **NVIDIA Container Toolkit**: Required for GPU access in Docker containers
3. **Docker**: Version 20.10 or later
4. **Docker Compose**: Version 1.29 or later (or use `docker compose` v2)
5. **Pre-downloaded Model**: Model should be in `~/.cache/huggingface/hub/models--OleehyO--TexTeller/`

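Before building, the prerequisites above can be verified with a few quick commands (generic checks, not part of the deployment script):

```bash
# Driver installed and GPU visible on the host
nvidia-smi

# Docker and Compose versions
docker --version
docker compose version   # or: docker-compose --version

# NVIDIA Container Toolkit present
nvidia-ctk --version

# Model already in the Hugging Face cache
ls -d ~/.cache/huggingface/hub/models--OleehyO--TexTeller/
```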
## Setup NVIDIA Container Toolkit

If you haven't installed the NVIDIA Container Toolkit:
```bash
# Add the package repository
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

# Install nvidia-container-toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Restart Docker
sudo systemctl restart docker
```
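As a hedged sanity check after the restart, the NVIDIA runtime should now appear in Docker's runtime list (exact output varies by Docker version):

```bash
# The nvidia runtime should be listed alongside runc
docker info | grep -i runtimes
```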
## Quick Start

The easiest way to deploy is using the provided deployment script:
```bash
# Run all checks and deploy
./deploy.sh deploy

# Or check system requirements first
./deploy.sh check

# View available commands
./deploy.sh
```
## Build and Run

### Using the Deployment Script (Recommended)
```bash
# Full deployment (checks, build, and start)
./deploy.sh deploy

# Just build the image
./deploy.sh build

# Start/stop the service
./deploy.sh start
./deploy.sh stop

# View logs
./deploy.sh logs

# Check status
./deploy.sh status
```
### Using Docker Compose
```bash
# Build and start the service
docker-compose up -d

# View logs
docker-compose logs -f

# Stop the service
docker-compose down
```
### Using Docker directly
```bash
# Build the image
docker build -t texteller:latest .

# Run the container
docker run -d \
  --name texteller-server \
  --gpus '"device=0"' \
  -p 8001:8001 \
  -v ~/.cache/huggingface/hub/models--OleehyO--TexTeller:/root/.cache/huggingface/hub/models--OleehyO--TexTeller:ro \
  -e CUDA_VISIBLE_DEVICES=0 \
  texteller:latest
```
## API Usage

The server accepts JSON requests with either base64-encoded images or image URLs at the `/predict` endpoint.

### Using base64-encoded image
```bash
# Example with base64 image
curl -X POST http://localhost:8001/predict \
  -H "Content-Type: application/json" \
  -d '{
    "image_base64": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA..."
  }'
```
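The base64 string above is truncated for readability. A minimal sketch for building the full request from a local file, assuming GNU `base64` (the `-w0` flag disables line wrapping; omit it on macOS). It sends the raw base64 string without the `data:image/png;base64,` prefix, matching the Python example further below:

```bash
# Encode a local image and POST it in one request (assumes equation.png exists)
IMG_B64=$(base64 -w0 equation.png)
curl -X POST http://localhost:8001/predict \
  -H "Content-Type: application/json" \
  -d "{\"image_base64\": \"$IMG_B64\"}"
```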
### Using image URL
```bash
# Example with image URL
curl -X POST http://localhost:8001/predict \
  -H "Content-Type: application/json" \
  -d '{
    "image_url": "https://example.com/math_equation.png"
  }'
```
### Python client example
```python
import requests
import base64

# Method 1: Using base64
with open("equation.png", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode()

response = requests.post(
    "http://localhost:8001/predict",
    json={"image_base64": image_base64}
)
print(response.json())

# Method 2: Using URL
response = requests.post(
    "http://localhost:8001/predict",
    json={"image_url": "https://example.com/math_equation.png"}
)
print(response.json())
```
Or use the provided test script:
```bash
# Test with a local image
python examples/test_server.py path/to/equation.png

# Test with both local and URL
python examples/test_server.py path/to/equation.png https://example.com/formula.png
```
### Response format

Success response:

```json
{
  "result": "\\frac{a}{b} = c"
}
```

Error response:

```json
{
  "error": "Failed to decode image"
}
```
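When scripting against the API, the two response shapes can be collapsed into a single value; a minimal sketch assuming `jq` is installed:

```bash
# Print the LaTeX result, or the error message if the request failed
curl -s -X POST http://localhost:8001/predict \
  -H "Content-Type: application/json" \
  -d '{"image_url": "https://example.com/math_equation.png"}' \
  | jq -r '.result // .error'
```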
## Configuration

You can configure the service by modifying environment variables in `docker-compose.yml` (the same variables can also be overridden with `docker run -e`, as shown after this list):

- `CUDA_VISIBLE_DEVICES`: GPU device ID (default: 0)
- `RAY_NUM_REPLICAS`: Number of Ray Serve replicas (default: 1)
- `RAY_NCPU_PER_REPLICA`: CPUs per replica (default: 4)
- `RAY_NGPU_PER_REPLICA`: GPUs per replica (default: 1)

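A hedged run-time equivalent: the same variables passed as `-e` flags on the direct `docker run` invocation shown earlier (the values here are just the documented defaults):

```bash
docker run -d \
  --name texteller-server \
  --gpus '"device=0"' \
  -p 8001:8001 \
  -v ~/.cache/huggingface/hub/models--OleehyO--TexTeller:/root/.cache/huggingface/hub/models--OleehyO--TexTeller:ro \
  -e CUDA_VISIBLE_DEVICES=0 \
  -e RAY_NUM_REPLICAS=1 \
  -e RAY_NCPU_PER_REPLICA=4 \
  -e RAY_NGPU_PER_REPLICA=1 \
  texteller:latest
```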
## Monitoring
```bash
# Check container status
docker ps

# View real-time logs
docker-compose logs -f texteller

# Check GPU usage
nvidia-smi

# Check container resource usage
docker stats texteller-server
```
## Troubleshooting

### GPU not detected

```bash
# Verify NVIDIA runtime is available
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
```
### Port already in use

Change the host-side port mapping in `docker-compose.yml`:

```yaml
ports:
  - "8080:8001"  # Host port 8080 -> container port 8001
```
### Model not found

Ensure the model is downloaded to the correct location:

```bash
ls -la ~/.cache/huggingface/hub/models--OleehyO--TexTeller/
```
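If the directory is missing, one way to fetch the model into the standard Hugging Face cache is the `huggingface-cli` tool. The repo id below is inferred from the cache directory name, so treat this as a hedged suggestion rather than the project's official download step:

```bash
pip install -U "huggingface_hub[cli]"
huggingface-cli download OleehyO/TexTeller
```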
## Performance Notes

- **RTX 5080**: Optimized for CUDA 12.8 with cuDNN 9
- **Memory**: Container requires ~4-6 GB GPU memory (RTX 5080 has 16 GB)
- **Throughput**: ~10-20 images/second depending on image complexity
- **Startup time**: ~30-60 seconds for model loading

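These figures depend on hardware and image content. A rough, hedged way to sanity-check latency on your own setup (assumes the server is running, `equation.png` exists, and GNU `base64` as in the encoding example above); sequential requests will understate the peak throughput quoted here:

```bash
# Send 20 requests back to back and time the loop
IMG_B64=$(base64 -w0 equation.png)
time for i in $(seq 1 20); do
  curl -s -X POST http://localhost:8001/predict \
    -H "Content-Type: application/json" \
    -d "{\"image_base64\": \"$IMG_B64\"}" > /dev/null
done
```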
## Advanced Configuration

### Multiple GPUs

To use multiple GPUs, modify `docker-compose.yml`:
```yaml
environment:
  - CUDA_VISIBLE_DEVICES=0,1
  - RAY_NUM_REPLICAS=2
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          device_ids: ['0', '1']
          capabilities: [gpu]
```
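After restarting the service, a quick hedged check that both GPUs are visible from inside the container (the service name `texteller` matches the one used in the logging examples; the NVIDIA runtime normally makes `nvidia-smi` available in the container):

```bash
# List the GPUs the container can see
docker-compose exec texteller nvidia-smi -L
```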
### Production deployment

For production, consider:

1. Using a reverse proxy (nginx/traefik) for SSL/TLS
2. Adding authentication middleware
3. Implementing rate limiting
4. Setting up monitoring (Prometheus/Grafana)
5. Using orchestration (Kubernetes) for scaling