TexTeller Docker Deployment Guide
This guide explains how to deploy TexTeller using Docker with NVIDIA GPU support (optimized for RTX 5080).
Prerequisites
- NVIDIA Driver: Install NVIDIA driver version 525 or later
- NVIDIA Container Toolkit: Required for GPU access in Docker containers
- Docker: Version 20.10 or later
- Docker Compose: Version 1.29 or later (or use docker compose v2)
- Pre-downloaded Model: Model should be in ~/.cache/huggingface/hub/models--OleehyO--TexTeller/
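The checks below are a quick way to confirm these prerequisites on the host (a minimal sketch; the model cache path matches the location listed above):
# Driver, Docker, and Compose versions
nvidia-smi
docker --version
docker compose version
# Confirm the model cache exists
ls ~/.cache/huggingface/hub/ | grep TexTeller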
Setup NVIDIA Container Toolkit
If you haven't installed the NVIDIA Container Toolkit:
# Add the package repository
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
# Install nvidia-container-toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
# Restart Docker
sudo systemctl restart docker
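If containers still cannot see the GPU after restarting Docker, newer toolkit versions may also need the runtime registered with Docker explicitly (an extra step that is not required on every setup):
# Register the NVIDIA runtime with Docker, then restart again
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker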
Quick Start
The easiest way to deploy is using the provided deployment script:
# Run all checks and deploy
./deploy.sh deploy
# Or check system requirements first
./deploy.sh check
# View available commands
./deploy.sh
Build and Run
Using the Deployment Script (Recommended)
# Full deployment (checks, build, and start)
./deploy.sh deploy
# Just build the image
./deploy.sh build
# Start/stop the service
./deploy.sh start
./deploy.sh stop
# View logs
./deploy.sh logs
# Check status
./deploy.sh status
Using Docker Compose
# Build and start the service
docker-compose up -d
# View logs
docker-compose logs -f
# Stop the service
docker-compose down
Using Docker directly
# Build the image
docker build -t texteller:latest .
# Run the container
docker run -d \
--name texteller-server \
--gpus '"device=0"' \
-p 8001:8001 \
-v ~/.cache/huggingface/hub/models--OleehyO--TexTeller:/root/.cache/huggingface/hub/models--OleehyO--TexTeller:ro \
-e CUDA_VISIBLE_DEVICES=0 \
texteller:latest
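Once the logs show the model has finished loading (see Startup time under Performance Notes), a quick smoke test can be sent from the host. This is a sketch: equation.png stands in for any small formula image, and the raw base64 payload matches the Python client example further down.
# Encode a local image and POST it to the /predict endpoint
IMAGE_B64=$(base64 -w0 equation.png)
curl -s -X POST http://localhost:8001/predict \
  -H "Content-Type: application/json" \
  -d "{\"image_base64\": \"${IMAGE_B64}\"}"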
API Usage
The server accepts JSON requests with either base64-encoded images or image URLs at the /predict endpoint.
Using base64-encoded image
# Example with base64 image
curl -X POST http://localhost:8001/predict \
-H "Content-Type: application/json" \
-d '{
"image_base64": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA..."
}'
Using image URL
# Example with image URL
curl -X POST http://localhost:8001/predict \
-H "Content-Type: application/json" \
-d '{
"image_url": "https://example.com/math_equation.png"
}'
Python client example
import requests
import base64
# Method 1: Using base64
with open("equation.png", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode()

response = requests.post(
    "http://localhost:8001/predict",
    json={"image_base64": image_base64}
)
print(response.json())

# Method 2: Using URL
response = requests.post(
    "http://localhost:8001/predict",
    json={"image_url": "https://example.com/math_equation.png"}
)
print(response.json())
Or use the provided test script:
# Test with a local image
python examples/test_server.py path/to/equation.png
# Test with both local and URL
python examples/test_server.py path/to/equation.png https://example.com/formula.png
Response format
Success response:
{
"result": "\\frac{a}{b} = c"
}
Error response:
{
"error": "Failed to decode image"
}
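In shell scripts the two response shapes can be handled together with jq, printing the LaTeX result or falling back to the error message (assumes jq is installed; endpoint and field names as documented above):
# Print the result, or the error message if the request failed
curl -s -X POST http://localhost:8001/predict \
  -H "Content-Type: application/json" \
  -d '{"image_url": "https://example.com/math_equation.png"}' \
  | jq -r '.result // .error'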
Configuration
You can configure the service by modifying environment variables in docker-compose.yml:
- CUDA_VISIBLE_DEVICES: GPU device ID (default: 0)
- RAY_NUM_REPLICAS: Number of Ray Serve replicas (default: 1)
- RAY_NCPU_PER_REPLICA: CPUs per replica (default: 4)
- RAY_NGPU_PER_REPLICA: GPUs per replica (default: 1)
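When running without Compose, the same variables can be passed with -e flags instead; this sketch reuses the docker run invocation from above with the default values:
docker run -d \
  --name texteller-server \
  --gpus '"device=0"' \
  -p 8001:8001 \
  -v ~/.cache/huggingface/hub/models--OleehyO--TexTeller:/root/.cache/huggingface/hub/models--OleehyO--TexTeller:ro \
  -e CUDA_VISIBLE_DEVICES=0 \
  -e RAY_NUM_REPLICAS=1 \
  -e RAY_NCPU_PER_REPLICA=4 \
  -e RAY_NGPU_PER_REPLICA=1 \
  texteller:latest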
Monitoring
# Check container status
docker ps
# View real-time logs
docker-compose logs -f texteller
# Check GPU usage
nvidia-smi
# Check container resource usage
docker stats texteller-server
Troubleshooting
GPU not detected
# Verify NVIDIA runtime is available
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
Port already in use
Change the port mapping in docker-compose.yml:
ports:
  - "8080:8001"  # Host port 8080 -> Container port 8001
Model not found
Ensure the model is downloaded to the correct location:
ls -la ~/.cache/huggingface/hub/models--OleehyO--TexTeller/
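If the directory is missing, the model can usually be fetched into the default cache with the Hugging Face CLI (the repo id OleehyO/TexTeller is inferred from the cache path above; requires the huggingface_hub CLI):
pip install -U "huggingface_hub[cli]"
huggingface-cli download OleehyO/TexTeller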
Performance Notes
- RTX 5080: Optimized for CUDA 12.8 with cuDNN 9
- Memory: Container requires ~4-6GB GPU memory (RTX 5080 has 16GB)
- Throughput: ~10-20 images/second depending on image complexity
- Startup time: ~30-60 seconds for model loading
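To see how much of the ~4-6GB memory estimate is actually in use while the server handles requests, nvidia-smi's standard query flags can be run on the host:
# Report used vs. total GPU memory
nvidia-smi --query-gpu=memory.used,memory.total --format=csv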
Advanced Configuration
Multiple GPUs
To use multiple GPUs, modify docker-compose.yml:
environment:
  - CUDA_VISIBLE_DEVICES=0,1
  - RAY_NUM_REPLICAS=2
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          device_ids: ['0', '1']
          capabilities: [gpu]
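After restarting the service, it is worth confirming that both devices are visible inside the container (container name as used throughout this guide):
# List the GPUs the container can see
docker exec texteller-server nvidia-smi -L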
Production deployment
For production, consider:
- Using a reverse proxy (nginx/traefik) for SSL/TLS
- Adding authentication middleware
- Implementing rate limiting
- Setting up monitoring (Prometheus/Grafana)
- Using orchestration (Kubernetes) for scaling