# TexTeller Docker Deployment Guide
This guide explains how to deploy TexTeller using Docker with NVIDIA GPU support (optimized for RTX 5080).
## Prerequisites
1. **NVIDIA Driver**: Install NVIDIA driver version 525 or later
2. **NVIDIA Container Toolkit**: Required for GPU access in Docker containers
3. **Docker**: Version 20.10 or later
4. **Docker Compose**: Version 1.29 or later (or use `docker compose` v2)
5. **Pre-downloaded Model**: Model should be in `~/.cache/huggingface/hub/models--OleehyO--TexTeller/`
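A quick way to verify these prerequisites from the command line (the model path is the default Hugging Face cache location assumed throughout this guide):
```bash
# Check NVIDIA driver, Docker, and Compose versions
nvidia-smi --query-gpu=driver_version,name --format=csv,noheader
docker --version
docker compose version || docker-compose --version
# Confirm the model is present in the local Hugging Face cache
ls ~/.cache/huggingface/hub/models--OleehyO--TexTeller/ >/dev/null \
  && echo "model found" || echo "model missing"
```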
## Setup NVIDIA Container Toolkit
If you haven't installed the NVIDIA Container Toolkit:
```bash
# Add the package repository
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
# Install nvidia-container-toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
# Restart Docker
sudo systemctl restart docker
```
## Quick Start
The easiest way to deploy is using the provided deployment script:
```bash
# Run all checks and deploy
./deploy.sh deploy
# Or check system requirements first
./deploy.sh check
# View available commands
./deploy.sh
```
## Build and Run
### Using the Deployment Script (Recommended)
```bash
# Full deployment (checks, build, and start)
./deploy.sh deploy
# Just build the image
./deploy.sh build
# Start/stop the service
./deploy.sh start
./deploy.sh stop
# View logs
./deploy.sh logs
# Check status
./deploy.sh status
```
### Using Docker Compose
```bash
# Build and start the service
docker-compose up -d
# View logs
docker-compose logs -f
# Stop the service
docker-compose down
```
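For reference, a minimal `docker-compose.yml` consistent with the commands in this guide might look like the sketch below; the repository's actual file may differ, so treat this as illustrative only:
```yaml
services:
  texteller:
    build: .
    image: texteller:latest
    container_name: texteller-server
    ports:
      - "8001:8001"
    volumes:
      - ~/.cache/huggingface/hub/models--OleehyO--TexTeller:/root/.cache/huggingface/hub/models--OleehyO--TexTeller:ro
    environment:
      - CUDA_VISIBLE_DEVICES=0
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0']
              capabilities: [gpu]
```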
### Using Docker directly
```bash
# Build the image
docker build -t texteller:latest .
# Run the container
docker run -d \
  --name texteller-server \
  --gpus '"device=0"' \
  -p 8001:8001 \
  -v ~/.cache/huggingface/hub/models--OleehyO--TexTeller:/root/.cache/huggingface/hub/models--OleehyO--TexTeller:ro \
  -e CUDA_VISIBLE_DEVICES=0 \
  texteller:latest
```
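Once the container is running, a quick smoke test confirms the server is reachable before you send real requests. An empty body is expected to produce an error response rather than a result; the point is that the endpoint answers at all:
```bash
# Follow startup logs until the server reports it is ready (Ctrl+C to exit)
docker logs -f texteller-server
# An empty request should return an error body, proving the endpoint is up
curl -s -X POST http://localhost:8001/predict \
  -H "Content-Type: application/json" \
  -d '{}'
```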
## API Usage
The server accepts JSON requests with either base64-encoded images or image URLs at the `/predict` endpoint.
### Using base64-encoded image
```bash
# Example with base64 image
curl -X POST http://localhost:8001/predict \
  -H "Content-Type: application/json" \
  -d '{
    "image_base64": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA..."
  }'
```
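Because a real base64 string is too long to type by hand, you can assemble the payload from a local file. This sketch assumes GNU `base64` and `jq` are installed, and sends raw base64 as in the Python example below:
```bash
# Encode equation.png and build the JSON body with jq
curl -X POST http://localhost:8001/predict \
  -H "Content-Type: application/json" \
  -d "$(jq -n --arg img "$(base64 -w0 equation.png)" '{image_base64: $img}')"
```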
### Using image URL
```bash
# Example with image URL
curl -X POST http://localhost:8001/predict \
  -H "Content-Type: application/json" \
  -d '{
    "image_url": "https://example.com/math_equation.png"
  }'
```
### Python client example
```python
import requests
import base64

# Method 1: Using base64
with open("equation.png", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode()

response = requests.post(
    "http://localhost:8001/predict",
    json={"image_base64": image_base64},
)
print(response.json())

# Method 2: Using URL
response = requests.post(
    "http://localhost:8001/predict",
    json={"image_url": "https://example.com/math_equation.png"},
)
print(response.json())
```
Or use the provided test script:
```bash
# Test with a local image
python examples/test_server.py path/to/equation.png
# Test with both local and URL
python examples/test_server.py path/to/equation.png https://example.com/formula.png
```
### Response format
Success response:
```json
{
  "result": "\\frac{a}{b} = c"
}
```
Error response:
```json
{
  "error": "Failed to decode image"
}
```
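A client should branch on whichever key is present. A minimal sketch:
```python
import requests

response = requests.post(
    "http://localhost:8001/predict",
    json={"image_url": "https://example.com/math_equation.png"},
)
payload = response.json()
if "result" in payload:
    print("LaTeX:", payload["result"])  # e.g. \frac{a}{b} = c
else:
    print("Server error:", payload.get("error", "unknown"))
```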
## Configuration
You can configure the service by modifying environment variables in `docker-compose.yml`:
- `CUDA_VISIBLE_DEVICES`: GPU device ID (default: 0)
- `RAY_NUM_REPLICAS`: Number of Ray Serve replicas (default: 1)
- `RAY_NCPU_PER_REPLICA`: CPUs per replica (default: 4)
- `RAY_NGPU_PER_REPLICA`: GPUs per replica (default: 1)
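In `docker-compose.yml` these live under the service's `environment` key; written out with the defaults listed above:
```yaml
environment:
  - CUDA_VISIBLE_DEVICES=0
  - RAY_NUM_REPLICAS=1
  - RAY_NCPU_PER_REPLICA=4
  - RAY_NGPU_PER_REPLICA=1
```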
## Monitoring
```bash
# Check container status
docker ps
# View real-time logs
docker-compose logs -f texteller
# Check GPU usage
nvidia-smi
# Check container resource usage
docker stats texteller-server
```
## Troubleshooting
### GPU not detected
```bash
# Verify NVIDIA runtime is available
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
```
### Port already in use
Change the port mapping in `docker-compose.yml`:
```yaml
ports:
  - "8080:8001"  # Host port 8080 -> Container port 8001
```
### Model not found
Ensure the model is downloaded to the correct location:
```bash
ls -la ~/.cache/huggingface/hub/models--OleehyO--TexTeller/
```
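If the directory is missing, the model can be fetched into the default cache location with the Hugging Face CLI (provided by `pip install -U huggingface_hub`):
```bash
# Downloads into ~/.cache/huggingface/hub/models--OleehyO--TexTeller/ by default
huggingface-cli download OleehyO/TexTeller
```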
## Performance Notes
- **RTX 5080**: Optimized for CUDA 12.8 with cuDNN 9
- **Memory**: Container requires ~4-6GB GPU memory (RTX 5080 has 16GB)
- **Throughput**: ~10-20 images/second depending on image complexity
- **Startup time**: ~30-60 seconds for model loading
## Advanced Configuration
### Multiple GPUs
To use multiple GPUs, modify `docker-compose.yml`:
```yaml
environment:
  - CUDA_VISIBLE_DEVICES=0,1
  - RAY_NUM_REPLICAS=2
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          device_ids: ['0', '1']
          capabilities: [gpu]
```
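After restarting the service, you can confirm that both GPUs are visible inside the container (service name `texteller` as in the monitoring commands above):
```bash
# Should list both GPU 0 and GPU 1
docker compose exec texteller nvidia-smi
```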
### Production deployment
For production, consider:
1. Using a reverse proxy (nginx/traefik) for SSL/TLS (see the sketch after this list)
2. Adding authentication middleware
3. Implementing rate limiting
4. Setting up monitoring (Prometheus/Grafana)
5. Using orchestration (Kubernetes) for scaling
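As a sketch of item 1, the reverse proxy can run as an additional Compose service in front of TexTeller; the config and certificate paths below are placeholders you would supply yourself:
```yaml
services:
  nginx:
    image: nginx:stable
    ports:
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro  # your TLS + proxy_pass config
      - ./certs:/etc/ssl:ro                    # your certificates
    depends_on:
      - texteller
```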