# TexTeller Docker Deployment Guide

This guide explains how to deploy TexTeller using Docker with NVIDIA GPU support (optimized for the RTX 5080).
## Prerequisites

1. **NVIDIA Driver**: Install NVIDIA driver version 525 or later
2. **NVIDIA Container Toolkit**: Required for GPU access in Docker containers
3. **Docker**: Version 20.10 or later
4. **Docker Compose**: Version 1.29 or later (or use `docker compose` v2)
5. **Pre-downloaded Model**: Model should be in `~/.cache/huggingface/hub/models--OleehyO--TexTeller/`

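Before building, the prerequisites above can be verified with a few quick commands (generic checks, not part of the deployment script):

```bash
# Driver installed and GPU visible on the host
nvidia-smi

# Docker and Compose versions
docker --version
docker compose version   # or: docker-compose --version

# NVIDIA Container Toolkit present
nvidia-ctk --version

# Model already in the Hugging Face cache
ls -d ~/.cache/huggingface/hub/models--OleehyO--TexTeller/
```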
## Setup NVIDIA Container Toolkit

If you haven't installed the NVIDIA Container Toolkit:
```bash
# Add the package repository
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

# Install nvidia-container-toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Restart Docker
sudo systemctl restart docker
```
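As a hedged sanity check after the restart, the NVIDIA runtime should now appear in Docker's runtime list (exact output varies by Docker version):

```bash
# The nvidia runtime should be listed alongside runc
docker info | grep -i runtimes
```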
## Quick Start

The easiest way to deploy is using the provided deployment script:
```bash
# Run all checks and deploy
./deploy.sh deploy

# Or check system requirements first
./deploy.sh check

# View available commands
./deploy.sh
```
## Build and Run

### Using the Deployment Script (Recommended)
```bash
# Full deployment (checks, build, and start)
./deploy.sh deploy

# Just build the image
./deploy.sh build

# Start/stop the service
./deploy.sh start
./deploy.sh stop

# View logs
./deploy.sh logs

# Check status
./deploy.sh status
```
### Using Docker Compose
```bash
# Build and start the service
docker-compose up -d

# View logs
docker-compose logs -f

# Stop the service
docker-compose down
```
### Using Docker directly
```bash
# Build the image
docker build -t texteller:latest .

# Run the container
docker run -d \
  --name texteller-server \
  --gpus '"device=0"' \
  -p 8001:8001 \
  -v ~/.cache/huggingface/hub/models--OleehyO--TexTeller:/root/.cache/huggingface/hub/models--OleehyO--TexTeller:ro \
  -e CUDA_VISIBLE_DEVICES=0 \
  texteller:latest
```
## API Usage

The server accepts JSON requests with either base64-encoded images or image URLs at the `/predict` endpoint.

### Using base64-encoded image
```bash
# Example with base64 image
curl -X POST http://localhost:8001/predict \
  -H "Content-Type: application/json" \
  -d '{
    "image_base64": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA..."
  }'
```
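The base64 string above is truncated for readability. A minimal sketch for building the full request from a local file, assuming GNU `base64` (the `-w0` flag disables line wrapping; omit it on macOS). It sends the raw base64 string without the `data:image/png;base64,` prefix, matching the Python example further below:

```bash
# Encode a local image and POST it in one request (assumes equation.png exists)
IMG_B64=$(base64 -w0 equation.png)
curl -X POST http://localhost:8001/predict \
  -H "Content-Type: application/json" \
  -d "{\"image_base64\": \"$IMG_B64\"}"
```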
### Using image URL
```bash
# Example with image URL
curl -X POST http://localhost:8001/predict \
  -H "Content-Type: application/json" \
  -d '{
    "image_url": "https://example.com/math_equation.png"
  }'
```
### Python client example
```python
import requests
import base64

# Method 1: Using base64
with open("equation.png", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode()

response = requests.post(
    "http://localhost:8001/predict",
    json={"image_base64": image_base64}
)
print(response.json())

# Method 2: Using URL
response = requests.post(
    "http://localhost:8001/predict",
    json={"image_url": "https://example.com/math_equation.png"}
)
print(response.json())
```
Or use the provided test script:
```bash
# Test with a local image
python examples/test_server.py path/to/equation.png

# Test with both local and URL
python examples/test_server.py path/to/equation.png https://example.com/formula.png
```
### Response format

Success response:

```json
{
  "result": "\\frac{a}{b} = c"
}
```

Error response:

```json
{
  "error": "Failed to decode image"
}
```
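When scripting against the API, the two response shapes can be collapsed into a single value; a minimal sketch assuming `jq` is installed:

```bash
# Print the LaTeX result, or the error message if the request failed
curl -s -X POST http://localhost:8001/predict \
  -H "Content-Type: application/json" \
  -d '{"image_url": "https://example.com/math_equation.png"}' \
  | jq -r '.result // .error'
```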
## Configuration

You can configure the service by modifying environment variables in `docker-compose.yml` (the same variables can also be overridden with `docker run -e`, as shown after this list):

- `CUDA_VISIBLE_DEVICES`: GPU device ID (default: 0)
- `RAY_NUM_REPLICAS`: Number of Ray Serve replicas (default: 1)
- `RAY_NCPU_PER_REPLICA`: CPUs per replica (default: 4)
- `RAY_NGPU_PER_REPLICA`: GPUs per replica (default: 1)

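A hedged run-time equivalent: the same variables passed as `-e` flags on the direct `docker run` invocation shown earlier (the values here are just the documented defaults):

```bash
docker run -d \
  --name texteller-server \
  --gpus '"device=0"' \
  -p 8001:8001 \
  -v ~/.cache/huggingface/hub/models--OleehyO--TexTeller:/root/.cache/huggingface/hub/models--OleehyO--TexTeller:ro \
  -e CUDA_VISIBLE_DEVICES=0 \
  -e RAY_NUM_REPLICAS=1 \
  -e RAY_NCPU_PER_REPLICA=4 \
  -e RAY_NGPU_PER_REPLICA=1 \
  texteller:latest
```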
## Monitoring
```bash
# Check container status
docker ps

# View real-time logs
docker-compose logs -f texteller

# Check GPU usage
nvidia-smi

# Check container resource usage
docker stats texteller-server
```
## Troubleshooting

### GPU not detected

```bash
# Verify NVIDIA runtime is available
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
```
### Port already in use

Change the host-side port mapping in `docker-compose.yml`:

```yaml
ports:
  - "8080:8001"  # Host port 8080 -> container port 8001
```
### Model not found

Ensure the model is downloaded to the correct location:

```bash
ls -la ~/.cache/huggingface/hub/models--OleehyO--TexTeller/
```
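If the directory is missing, one way to fetch the model into the standard Hugging Face cache is the `huggingface-cli` tool. The repo id below is inferred from the cache directory name, so treat this as a hedged suggestion rather than the project's official download step:

```bash
pip install -U "huggingface_hub[cli]"
huggingface-cli download OleehyO/TexTeller
```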
## Performance Notes

- **RTX 5080**: Optimized for CUDA 12.8 with cuDNN 9
- **Memory**: Container requires ~4-6 GB GPU memory (RTX 5080 has 16 GB)
- **Throughput**: ~10-20 images/second depending on image complexity
- **Startup time**: ~30-60 seconds for model loading

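These figures depend on hardware and image content. A rough, hedged way to sanity-check latency on your own setup (assumes the server is running, `equation.png` exists, and GNU `base64` as in the encoding example above); sequential requests will understate the peak throughput quoted here:

```bash
# Send 20 requests back to back and time the loop
IMG_B64=$(base64 -w0 equation.png)
time for i in $(seq 1 20); do
  curl -s -X POST http://localhost:8001/predict \
    -H "Content-Type: application/json" \
    -d "{\"image_base64\": \"$IMG_B64\"}" > /dev/null
done
```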
## Advanced Configuration

### Multiple GPUs

To use multiple GPUs, modify `docker-compose.yml`:
```yaml
environment:
  - CUDA_VISIBLE_DEVICES=0,1
  - RAY_NUM_REPLICAS=2
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          device_ids: ['0', '1']
          capabilities: [gpu]
```
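After restarting the service, a quick hedged check that both GPUs are visible from inside the container (the service name `texteller` matches the one used in the logging examples; the NVIDIA runtime normally makes `nvidia-smi` available in the container):

```bash
# List the GPUs the container can see
docker-compose exec texteller nvidia-smi -L
```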
### Production deployment

For production, consider:

1. Using a reverse proxy (nginx/traefik) for SSL/TLS
2. Adding authentication middleware
3. Implementing rate limiting
4. Setting up monitoring (Prometheus/Grafana)
5. Using orchestration (Kubernetes) for scaling