# TexTeller Docker Deployment Guide

This guide explains how to deploy TexTeller using Docker with NVIDIA GPU support (optimized for RTX 5080).

## Prerequisites

1. **NVIDIA Driver**: Install NVIDIA driver version 525 or later
2. **NVIDIA Container Toolkit**: Required for GPU access in Docker containers
3. **Docker**: Version 20.10 or later
4. **Docker Compose**: Version 1.29 or later (or use `docker compose` v2)
5. **Pre-downloaded Model**: Model should be in `~/.cache/huggingface/hub/models--OleehyO--TexTeller/`

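As a quick sanity check before building, the commands below verify each prerequisite (a minimal sketch; adjust paths if your model cache lives elsewhere):

```bash
# Driver version (should report 525 or later)
nvidia-smi --query-gpu=driver_version --format=csv,noheader

# Docker and Compose versions
docker --version
docker compose version || docker-compose --version

# Pre-downloaded model files
ls ~/.cache/huggingface/hub/models--OleehyO--TexTeller/
```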
## Set Up the NVIDIA Container Toolkit

If you haven't installed the NVIDIA Container Toolkit yet:

```bash
# Add the package repository
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

# Install nvidia-container-toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Restart Docker
sudo systemctl restart docker
```
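If the installation and Docker restart succeeded, the NVIDIA runtime should appear in Docker's runtime list (a quick check; the exact output varies by Docker version):

```bash
# Look for "nvidia" among the registered runtimes
docker info | grep -i runtimes
```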
## Quick Start

The easiest way to deploy is using the provided deployment script:

```bash
# Run all checks and deploy
./deploy.sh deploy

# Or check system requirements first
./deploy.sh check

# View available commands
./deploy.sh
```
## Build and Run

### Using the Deployment Script (Recommended)

```bash
# Full deployment (checks, build, and start)
./deploy.sh deploy

# Just build the image
./deploy.sh build

# Start/stop the service
./deploy.sh start
./deploy.sh stop

# View logs
./deploy.sh logs

# Check status
./deploy.sh status
```
### Using Docker Compose

```bash
# Build and start the service
docker-compose up -d

# View logs
docker-compose logs -f

# Stop the service
docker-compose down
```
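If you use the Compose v2 plugin mentioned in the prerequisites, the same workflow uses the `docker compose` subcommand instead:

```bash
docker compose up -d
docker compose logs -f
docker compose down
```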
### Using Docker directly

```bash
# Build the image
docker build -t texteller:latest .

# Run the container
docker run -d \
  --name texteller-server \
  --gpus '"device=0"' \
  -p 8001:8001 \
  -v ~/.cache/huggingface/hub/models--OleehyO--TexTeller:/root/.cache/huggingface/hub/models--OleehyO--TexTeller:ro \
  -e CUDA_VISIBLE_DEVICES=0 \
  texteller:latest
```
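Whichever method you use, it is worth confirming that the container came up and watching the startup logs, since model loading takes a little while (see Performance Notes below):

```bash
# Confirm the container is running
docker ps --filter name=texteller-server

# Follow startup logs until the model has loaded
docker logs -f texteller-server
```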
## API Usage

The server accepts JSON requests at the `/predict` endpoint, containing either a base64-encoded image or an image URL.

### Using base64-encoded image

```bash
# Example with base64 image
curl -X POST http://localhost:8001/predict \
  -H "Content-Type: application/json" \
  -d '{
    "image_base64": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA..."
  }'
```

### Using image URL

```bash
# Example with image URL
curl -X POST http://localhost:8001/predict \
  -H "Content-Type: application/json" \
  -d '{
    "image_url": "https://example.com/math_equation.png"
  }'
```
### Python client example

```python
import requests
import base64

# Method 1: Using base64
with open("equation.png", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode()

response = requests.post(
    "http://localhost:8001/predict",
    json={"image_base64": image_base64}
)
print(response.json())

# Method 2: Using URL
response = requests.post(
    "http://localhost:8001/predict",
    json={"image_url": "https://example.com/math_equation.png"}
)
print(response.json())
```

Or use the provided test script:

```bash
# Test with a local image
python examples/test_server.py path/to/equation.png

# Test with both local and URL
python examples/test_server.py path/to/equation.png https://example.com/formula.png
```
### Response format

Success response:

```json
{
  "result": "\\frac{a}{b} = c"
}
```

Error response:

```json
{
  "error": "Failed to decode image"
}
```
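Putting the request and response together, here is a one-shot shell test that encodes a local image and prints only the recognized LaTeX. It is a sketch that assumes GNU `base64` and `jq` are installed and that the server accepts raw base64 as in the Python example above:

```bash
# Encode equation.png, send it to the server, and extract the "result" field
curl -s -X POST http://localhost:8001/predict \
  -H "Content-Type: application/json" \
  -d "{\"image_base64\": \"$(base64 -w0 equation.png)\"}" \
  | jq -r '.result'
```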
## Configuration

You can configure the service by modifying environment variables in `docker-compose.yml`:

- `CUDA_VISIBLE_DEVICES`: GPU device ID (default: 0)
- `RAY_NUM_REPLICAS`: Number of Ray Serve replicas (default: 1)
- `RAY_NCPU_PER_REPLICA`: CPUs per replica (default: 4)
- `RAY_NGPU_PER_REPLICA`: GPUs per replica (default: 1)

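If you run the container directly with `docker run` instead of Compose, the same variables can be overridden on the command line (illustrative values, mirroring the run command above):

```bash
docker run -d \
  --name texteller-server \
  --gpus '"device=0"' \
  -p 8001:8001 \
  -v ~/.cache/huggingface/hub/models--OleehyO--TexTeller:/root/.cache/huggingface/hub/models--OleehyO--TexTeller:ro \
  -e CUDA_VISIBLE_DEVICES=0 \
  -e RAY_NUM_REPLICAS=1 \
  -e RAY_NCPU_PER_REPLICA=4 \
  -e RAY_NGPU_PER_REPLICA=1 \
  texteller:latest
```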
## Monitoring

```bash
# Check container status
docker ps

# View real-time logs
docker-compose logs -f texteller

# Check GPU usage
nvidia-smi

# Check container resource usage
docker stats texteller-server
```
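For continuous GPU monitoring while requests are being served, `nvidia-smi` can also poll utilization and memory once per second:

```bash
# Print GPU utilization and memory usage every second (Ctrl+C to stop)
nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv -l 1
```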
## Troubleshooting

### GPU not detected

```bash
# Verify NVIDIA runtime is available
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
```
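If that check fails even though the driver works on the host, re-registering the NVIDIA runtime with Docker and restarting the daemon often helps (this assumes a recent nvidia-container-toolkit that ships the `nvidia-ctk` tool):

```bash
# Re-register the NVIDIA runtime with Docker, then restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```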
### Port already in use

Change the port mapping in `docker-compose.yml`:

```yaml
ports:
  - "8080:8001"  # Host port 8080 -> Container port 8001
```
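To find out which process currently holds the port before changing the mapping (one option, assuming `ss` from iproute2 is available):

```bash
# List the listener bound to port 8001, including the owning process
ss -ltnp | grep ':8001'
```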
### Model not found

Ensure the model is downloaded to the correct location:

```bash
ls -la ~/.cache/huggingface/hub/models--OleehyO--TexTeller/
```
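If the directory is missing, the model can be pre-downloaded with the Hugging Face CLI. The repository ID below is inferred from the cache directory name, so treat it as an assumption and adjust it if the project documents a different one:

```bash
# Install the Hugging Face CLI and download the model into the default cache
pip install -U "huggingface_hub[cli]"
huggingface-cli download OleehyO/TexTeller
```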
## Performance Notes

- **RTX 5080**: Optimized for CUDA 12.8 with cuDNN 9
- **Memory**: Container requires ~4-6GB GPU memory (RTX 5080 has 16GB)
- **Throughput**: ~10-20 images/second depending on image complexity
- **Startup time**: ~30-60 seconds for model loading
## Advanced Configuration

### Multiple GPUs

To use multiple GPUs, modify `docker-compose.yml`:

```yaml
environment:
  - CUDA_VISIBLE_DEVICES=0,1
  - RAY_NUM_REPLICAS=2
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          device_ids: ['0', '1']
          capabilities: [gpu]
```
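After restarting with this configuration, you can confirm that both GPUs are visible inside the container (this assumes `nvidia-smi` is available in the container, which the NVIDIA runtime normally provides):

```bash
# List the GPUs the container can see
docker exec texteller-server nvidia-smi -L
```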
### Production deployment

For production, consider:

1. Using a reverse proxy (nginx/traefik) for SSL/TLS
2. Adding authentication middleware
3. Implementing rate limiting
4. Setting up monitoring (Prometheus/Grafana)
5. Using orchestration (Kubernetes) for scaling