三洋三洋
2024-03-18 15:48:04 +00:00
parent 5d089b5a7f
commit 74341c7e8a
6 changed files with 330 additions and 118 deletions

README.md

@@ -1,31 +1,46 @@
<div align="center">
<h1><img src="./assets/fire.svg" width=30, height=30>
𝚃𝚎𝚡𝚃𝚎𝚕𝚕𝚎𝚛 <img src="./assets/fire.svg" width=30, height=30> </h1>
<p align="center">
English | <a href="./assets/README_zh.md">中文版本</a>
</p>
<p align="center">
<img src="./assets/web_demo.gif" alt="TexTeller_demo" width=800>
</p>
<h1>
<img src="./assets/fire.svg" width=30, height=30>
𝚃𝚎𝚡𝚃𝚎𝚕𝚕𝚎𝚛
<img src="./assets/fire.svg" width=30, height=30>
</h1>
<p align="center">
English | <a href="./assets/README_zh.md">中文</a>
</p>
<p align="center">
<img src="./assets/web_demo.gif" alt="TexTeller_demo" width=800>
</p>
</div>
TexTeller is a ViT-based model designed for end-to-end formula recognition. It can recognize formulas in natural images and convert them into LaTeX-style formulas.
TexTeller is an end-to-end formula recognition model based on ViT, capable of converting images into corresponding LaTeX formulas.
TexTeller is trained on a larger dataset of image-formula pairs (a 550K dataset available [here](https://huggingface.co/datasets/OleehyO/latex-formulas)) and **exhibits superior generalization ability and higher accuracy compared to [LaTeX-OCR](https://github.com/lukas-blecher/LaTeX-OCR)**, which uses approximately 100K data points. This larger dataset enables TexTeller to cover most usage scenarios more effectively (**excluding scanned images and handwritten formulas**).
> A TexTeller checkpoint trained on a 5.5M dataset will be released soon.
TexTeller was trained with ~~550K~~ 7.5M image-formula pairs (dataset available [here](https://huggingface.co/datasets/OleehyO/latex-formulas)). Compared to [LaTeX-OCR](https://github.com/lukas-blecher/LaTeX-OCR), which used a 100K dataset, TexTeller has **stronger generalization abilities** and **higher accuracy**, covering most use cases (**except for scanned images and handwritten formulas**).
## Prerequisites
> ~~We will soon release a TexTeller checkpoint trained on a 7.5M dataset~~
## 🔄 Change Log
* 📮[2024-03-24] TexTeller 2.0 released! The training data for TexTeller 2.0 has been increased to 7.5M (about **15 times more** than TexTeller 1.0, with improved data quality). TexTeller 2.0 demonstrates **superior performance** on the test set, especially in recognizing rare symbols, complex multi-line formulas, and matrices.
## 🔑 Prerequisites
python=3.10
pytorch
> Note: Only CUDA versions >= 12.0 have been fully tested, so we recommend using CUDA >= 12.0.
> Note: Only CUDA versions >= 12.0 have been fully tested, so it is recommended to use CUDA version >= 12.0
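A minimal environment setup sketch, assuming conda is used; the exact PyTorch install command depends on your platform and CUDA version (see the PyTorch site):

```bash
# Sketch only: adjust the wheel index (cu121 here) to match your CUDA version.
conda create -n texteller python=3.10 -y
conda activate texteller
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
```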
## Getting Started
## 🖼 About Rendering LaTeX as Images
* **Install XeLaTeX** and ensure `xelatex` can be called directly from the command line.
* To ensure correct rendering of the predicted formulas, **include the following packages** in your `.tex` file:
```tex
\usepackage{multirow,multicol,amsmath,amsfonts,amssymb,mathtools,bm,mathrsfs,wasysym,amsbsy,upgreek,mathalfa,stmaryrd,dsfont,amsthm}
```
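For reference, a minimal standalone `.tex` wrapper might look like the sketch below; the document class and layout are up to you:

```tex
\documentclass{article}
\usepackage{multirow,multicol,amsmath,amsfonts,amssymb,mathtools,bm,mathrsfs,wasysym,amsbsy,upgreek,mathalfa,stmaryrd,dsfont,amsthm}
\begin{document}
% Paste a predicted formula here, e.g.:
\[ \int_0^\infty e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2} \]
\end{document}
```

Compile it with `xelatex formula.tex` to check that the prediction renders.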
## 🚀 Getting Started
1. Clone the repository:
@@ -33,13 +48,13 @@ pytorch
git clone https://github.com/OleehyO/TexTeller
```
2. After [pytorch installation](https://pytorch.org/get-started/locally/#start-locally), install the required packages:
2. After [installing pytorch](https://pytorch.org/get-started/locally/#start-locally), install the project's dependencies:
```bash
pip install -r requirements.txt
```
3. Navigate to the `TexTeller/src` directory and run the following command to perform inference:
3. Enter the `TexTeller/src` directory and run the following command in the terminal to start inference:
```bash
python inference.py -img "/path/to/image.{jpg,png}"
@@ -47,87 +62,104 @@ pytorch
# e.g. python inference.py -img "./img.jpg" -cuda
```
> Checkpoints will be downloaded on your first run.
> The first time you run it, the required checkpoints will be downloaded from Hugging Face.
## Web Demo
## 🌐 Web Demo
You can also run the web demo by navigating to the `TexTeller/src` directory and running the following command:
To start the web demo, first enter the `TexTeller/src` directory and then run the following command:
```bash
./start_web.sh
```
Then go to `http://localhost:8501` in your browser to run TexTeller on the web.
Then, enter `http://localhost:8501` in your browser to see the web demo.
> You can change the default settings in `start_web.sh`, such as running inference on a GPU (e.g. `USE_CUDA=True`) or increasing the number of beams (e.g. `NUM_BEAM=3`) for higher accuracy.
> You can change the default configuration of `start_web.sh`, for example, to use a GPU for inference (e.g. `USE_CUDA=True`) or to increase the number of beams (e.g. `NUM_BEAM=3`) to achieve higher accuracy.
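As a sketch, one way to override these settings for a single run, assuming `start_web.sh` reads them as environment variables (the app reads `USE_CUDA` and `NUM_BEAM` from its environment; if the script sets them unconditionally, edit those lines in the script instead):

```bash
# Illustrative only: variable names come from start_web.sh and the web app.
USE_CUDA=True NUM_BEAM=3 ./start_web.sh
```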
## API
**NOTE:** If you want to directly render the prediction results as images on the web (for example, to check if the prediction is correct), you need to ensure [xelatex is correctly installed](https://github.com/OleehyO/TexTeller?tab=readme-ov-file#Rendering-Predicted-Results)
We use [ray serve](https://github.com/ray-project/ray) to provide a simple API for using TexTeller in your own projects. To start the server, navigate to the `TexTeller/src` directory and run the following command:
## 📡 API Usage
We use [ray serve](https://github.com/ray-project/ray) to provide an API interface for TexTeller, allowing you to integrate TexTeller into your own projects. To start the server, you first need to enter the `TexTeller/src` directory and then run the following command:
```bash
python server.py # default settings
```
You can pass the following arguments to the `server.py` script to customize inference settings (e.g. `python server.py --use_gpu` to enable GPU inference):
You can pass the following arguments to `server.py` to change the server's inference settings (e.g. `python server.py --use_gpu` to enable GPU inference):
| Argument | Description |
| Parameter | Description |
| --- | --- |
| `-ckpt` | Path to the checkpoint file to load, default is TexTeller pretrained model. |
| `-tknz` | Path to the tokenizer, default is TexTeller tokenizer. |
| `-port` | Port number to run the server on, *default is 8000*. |
| `--use_gpu` | Whether to use GPU for inference. |
| `--num_beams` | Number of beams to use for beam search decoding, *default is 1*. |
| `--num_replicas` | Number of replicas to run the server on, *default is 1*. You can use this to get higher throughput. |
| `--ncpu_per_replica` | Number of CPU cores to use per replica, *default is 1*. |
| `--ngpu_per_replica` | Number of GPUs to use per replica, *default is 1*. You can set this to a value between 0 and 1 to run multiple replicas on a single GPU (if `--num_replicas 2` and `--ngpu_per_replica 0.7`, then 2 GPUs are required). |
| `-ckpt` | The path to the weights file, *default is TexTeller's pretrained weights*.|
| `-tknz` | The path to the tokenizer, *default is TexTeller's tokenizer*.|
| `-port` | The server's service port, *default is 8000*. |
| `--use_gpu` | Whether to use GPU for inference, *default is CPU*. |
| `--num_beams` | The number of beams for beam search, *default is 1*. |
| `--num_replicas` | The number of service replicas to run on the server, *default is 1 replica*. You can use more replicas to achieve greater throughput.|
| `--ncpu_per_replica` | The number of CPU cores used per service replica, *default is 1*. |
| `--ngpu_per_replica` | The number of GPUs used per service replica, *default is 1*. You can set this to a value between 0 and 1 to run multiple service replicas on one GPU, thereby improving GPU utilization. (Note: if `--num_replicas` is 2 and `--ngpu_per_replica` is 0.7, then 2 GPUs must be available.) |
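For example, to serve two replicas that share GPU capacity, using the flags listed above:

```bash
# Two replicas at 0.7 GPU each -> requires 2 GPUs in total.
python server.py --use_gpu --num_replicas 2 --ngpu_per_replica 0.7
```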
> A client demo can be found in `TexTeller/client/demo.py`; you can refer to `demo.py` to send requests to the server.
> A client demo can be found at `TexTeller/client/demo.py`; you can refer to `demo.py` to send requests to the server.
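For illustration, a request might look like the hypothetical sketch below; the actual route and payload format are defined by `server.py` and shown in `client/demo.py`, so the `/predict` path and the `img` field name here are assumptions:

```python
# Hypothetical client sketch -- see client/demo.py for the real request
# format; the endpoint path and form field name are assumptions.
import requests

with open("./img.jpg", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/predict",  # port matches the -port default
        files={"img": f},
    )
print(resp.text)  # the predicted LaTeX formula
```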
## Training
## 🏋️‍♂️ Training
### Dataset
We provide a dataset example in `TexTeller/src/models/ocr_model/train/dataset`, and you can place your own images in the `images` directory and annotate the corresponding formula for each image in `formulas.jsonl`.
We provide an example dataset in the `TexTeller/src/models/ocr_model/train/dataset` directory. You can place your own images in the `images` directory and annotate each image with its corresponding formula in `formulas.jsonl`.
After the dataset is ready, you should **change the `DIR_URL` variable** in `.../dataset/loader.py` to the path of your dataset.
After preparing your dataset, you need to **change the `DIR_URL` variable to your own dataset's path** in `.../dataset/loader.py`.
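As a sketch, a dataset directory might look like the following; the exact JSONL keys are an assumption, so check the bundled example under `.../train/dataset` for the real schema:

```
dataset/
├── images/
│   ├── 0.jpg
│   └── 1.jpg
└── formulas.jsonl    # one JSON object per line, e.g.
                      # {"img_name": "0.jpg", "formula": "E = mc^2"}
```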
### Retrain the tokenizer
### Retraining the Tokenizer
If you are using a different dataset, you may need to retrain the tokenizer to match your specific vocabulary. After setting up the dataset, you can do this by:
If you are using a different dataset, you might need to retrain the tokenizer to obtain a different dictionary. After configuring your dataset, you can train your own tokenizer with the following command:
1. Change the line `new_tokenizer.save_pretrained('./your_dir_name')` in `TexTeller/src/models/tokenizer/train.py` to your desired output directory name.
> To use a different vocabulary size, you should modify the `VOCAB_SIZE` parameter in the `TexTeller/src/models/globals.py`.
1. In `TexTeller/src/models/tokenizer/train.py`, change `new_tokenizer.save_pretrained('./your_dir_name')` to your custom output directory
> If you want to use a different dictionary size (default is 10k tokens), you need to change the `VOCAB_SIZE` variable in `TexTeller/src/models/globals.py`
2. Run the following command **in the `TexTeller/src` directory**:
2. **In the `TexTeller/src` directory**, run the following command:
```bash
python -m models.tokenizer.train
```
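After training, the tokenizer saved in your output directory can be loaded back by path. A sketch, assuming it is run from `TexTeller/src` and that the `get_tokenizer` helper used elsewhere in the codebase accepts a local directory:

```python
# Sketch: load the retrained tokenizer from its output directory.
# Treating get_tokenizer as path-compatible is an assumption here.
from models.ocr_model.model.TexTeller import TexTeller

tokenizer = TexTeller.get_tokenizer('./your_dir_name')
```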
### Train the model
### Training the Model
To train the model, you can run the following command **in the `TexTeller/src` directory**:
To train the model, you need to run the following command in the `TexTeller/src` directory:
```bash
python -m models.ocr_model.train.train
```
You can set your own tokenizer and checkpoint path (or fine-tune the default model checkpoint, if you keep the same model architecture and do not use your own tokenizer) in `TexTeller/src/models/ocr_model/train/train.py`.
> Please refer to `train.py` for more details.
You can set your own tokenizer and checkpoint paths in `TexTeller/src/models/ocr_model/train/train.py` (refer to `train.py` for more information). If you are using the same architecture and dictionary as TexTeller, you can also fine-tune TexTeller's default weights with your own dataset.
Model architecture and training hyperparameters can be adjusted in `TexTeller/src/globals.py` and `TexTeller/src/models/ocr_model/train/train_args.py`.
In `TexTeller/src/globals.py` and `TexTeller/src/models/ocr_model/train/train_args.py`, you can change the model's architecture and training hyperparameters.
> We use the [Hugging Face Transformers](https://github.com/huggingface/transformers) library for model training, so you can find more details about the training hyperparameters in their [documentation](https://huggingface.co/docs/transformers/v4.32.1/main_classes/trainer#transformers.TrainingArguments).
> Our training scripts use the [Hugging Face Transformers](https://github.com/huggingface/transformers) library, so you can refer to their [documentation](https://huggingface.co/docs/transformers/v4.32.1/main_classes/trainer#transformers.TrainingArguments) for more details and configurations on training parameters.
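For example, a few of the Transformers `TrainingArguments` knobs one might tune in `train_args.py` (the values below are illustrative, not the project's defaults):

```python
from transformers import TrainingArguments

# Illustrative hyperparameters -- not the repository's defaults.
training_args = TrainingArguments(
    output_dir="./train_result",
    per_device_train_batch_size=32,
    learning_rate=5e-5,
    num_train_epochs=10,
    fp16=True,                # mixed precision on supported GPUs
    logging_steps=100,
    save_strategy="epoch",
)
```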
## To-Do
## 🚧 Limitations
- [ ] Train our model with a larger amount of data (5.5M samples, soon to be released).
* Some complex multi-line scenarios are not well handled (e.g., long formulas mixed with matrices)
- [ ] Inference acceleration.
* Does not support scanned images and PDF document recognition
* Does not support handwritten formulas
## 📅 Plans
- [x] ~~Train the model with a larger dataset (7.5M samples, coming soon)~~
- [ ] Recognition of scanned images
- [ ] PDF document recognition + Support for English and Chinese scenarios
- [ ] Inference acceleration
- [ ] ...
## Acknowledgements
## 💖 Acknowledgments
Thanks to [LaTeX-OCR](https://github.com/lukas-blecher/LaTeX-OCR) which has brought me a lot of inspiration, and [im2latex-100K](https://zenodo.org/records/56198#.V2px0jXT6eA) which enriches our dataset.
## ⭐️ Stargazers over time
[![Stargazers over time](https://starchart.cc/OleehyO/TexTeller.svg?variant=adaptive)](https://starchart.cc/OleehyO/TexTeller)

assets/README_zh.md

@@ -1,24 +1,28 @@
<div align="center">
<h1><img src="./fire.svg" width=30, height=30>
𝚃𝚎𝚡𝚃𝚎𝚕𝚕𝚎𝚛 <img src="./fire.svg" width=30, height=30> </h1>
<p align="center">
<a href="../README.md">English</a> | 中文版本
</p>
<p align="center">
<img src="./web_demo.gif" alt="TexTeller_demo" width=800>
</p>
<h1>
<img src="./fire.svg" width=30, height=30>
𝚃𝚎𝚡𝚃𝚎𝚕𝚕𝚎𝚛
<img src="./fire.svg" width=30, height=30>
</h1>
<p align="center">
<a href="../README.md">English</a> | 中文
</p>
<p align="center">
<img src="./web_demo.gif" alt="TexTeller_demo" width=800>
</p>
</div>
TexTeller is an end-to-end formula recognition model based on ViT, capable of converting images into the corresponding LaTeX formulas.
TexTeller was trained with 550K image-formula pairs (the dataset is available [here](https://huggingface.co/datasets/OleehyO/latex-formulas)). Compared to [LaTeX-OCR](https://github.com/lukas-blecher/LaTeX-OCR), which used a 100K dataset, TexTeller has **stronger generalization abilities** and **higher accuracy**, covering most use cases (**except for scanned images and handwritten formulas**).
TexTeller was trained with ~~550K~~ 7.5M image-formula pairs (the dataset is available [here](https://huggingface.co/datasets/OleehyO/latex-formulas)). Compared to [LaTeX-OCR](https://github.com/lukas-blecher/LaTeX-OCR), which used a 100K dataset, TexTeller has **stronger generalization abilities** and **higher accuracy**, covering most use cases (**except for scanned images and handwritten formulas**).
> We will soon release a TexTeller checkpoint trained on a 5.5M dataset.
> ~~We will soon release a TexTeller checkpoint trained on a 7.5M dataset~~
## Prerequisites
## 🔄 Change Log
* 📮[2024-03-24] TexTeller 2.0 released! The training data for TexTeller 2.0 has been increased to 7.5M (about **15 times more** than TexTeller 1.0, with improved data quality). TexTeller 2.0 demonstrates **superior performance** on the test set, especially in recognizing rare symbols, complex multi-line formulas, and matrices.
## 🔑 Prerequisites
python=3.10
@@ -26,7 +30,17 @@ pytorch
> Note: Only CUDA versions >= 12.0 have been fully tested, so it is best to use CUDA >= 12.0.
## Getting Started
## 🖼 About Rendering LaTeX as Images
* **Install XeLaTeX** and make sure `xelatex` can be called directly from the command line.
* To ensure the predicted formulas render correctly, **include the following packages** in your `.tex` file:
```tex
\usepackage{multirow,multicol,amsmath,amsfonts,amssymb,mathtools,bm,mathrsfs,wasysym,amsbsy,upgreek,mathalfa,stmaryrd,dsfont,amsthm}
```
## 🚀 Getting Started
1. Clone this repository:
@@ -50,7 +64,7 @@ pytorch
> The first time you run it, the required checkpoints will be downloaded from Hugging Face.
## FAQ: Unable to Connect to Hugging Face
## ❓ FAQ: Unable to Connect to Hugging Face
By default, the model weights are downloaded from Hugging Face. **If your remote server cannot connect to Hugging Face**, you can load them with the following commands:
@@ -78,7 +92,7 @@ pytorch
2. Upload this directory to the remote server, and in `TexTeller/src/models/ocr_model/utils/metrics.py` change `evaluate.load('google_bleu')` to `evaluate.load('your/dir/path/google_bleu.py')`.
## Web Demo
## 🌐 Web Demo
To start the web demo, first enter the `TexTeller/src` directory and then run the following command:
@@ -90,7 +104,9 @@ pytorch
> You can change the default configuration of `start_web.sh`, for example, to use a GPU for inference (e.g. `USE_CUDA=True`) or to increase the number of beams (e.g. `NUM_BEAM=3`) to achieve higher accuracy.
## API
**NOTE:** If you want to render the prediction results as images directly on the web page (for example, to check whether a prediction is correct), you need to make sure [xelatex is correctly installed](https://github.com/OleehyO/TexTeller?tab=readme-ov-file#Rendering-Predicted-Results).
## 📡 API Usage
We use [ray serve](https://github.com/ray-project/ray) to provide an API interface for TexTeller, allowing you to integrate TexTeller into your own projects. To start the server, first enter the `TexTeller/src` directory and then run the following command:
@@ -100,28 +116,28 @@ python server.py # default settings
You can pass the following arguments to `server.py` to change the server's inference settings (e.g. `python server.py --use_gpu` to enable GPU inference):
| Argument | Description |
| Parameter | Description |
| --- | --- |
| `-ckpt` | Path to the checkpoint file to load, default is TexTeller pretrained model. |
| `-tknz` | Path to the tokenizer, default is TexTeller tokenizer. |
| `-port` | Port number to run the server on, *default is 8000*. |
| `--use_gpu` | Whether to use GPU for inference. |
| `--num_beams` | Number of beams to use for beam search decoding, *default is 1*. |
| `--num_replicas` | Number of replicas to run the server on, *default is 1*. You can use this to get higher throughput. |
| `--ncpu_per_replica` | Number of CPU cores to use per replica, *default is 1*. |
| `--ngpu_per_replica` | Number of GPUs to use per replica, *default is 1*. You can set this to a value between 0 and 1 to run multiple replicas on a single GPU (if `--num_replicas 2` and `--ngpu_per_replica 0.7`, then 2 GPUs are required). |
| `-ckpt` | The path to the weights file, *default is TexTeller's pretrained weights*. |
| `-tknz` | The path to the tokenizer, *default is TexTeller's tokenizer*. |
| `-port` | The server's service port, *default is 8000*. |
| `--use_gpu` | Whether to use GPU for inference, *default is CPU*. |
| `--num_beams` | The number of beams for beam search, *default is 1*. |
| `--num_replicas` | The number of service replicas to run on the server, *default is 1*. You can use more replicas to achieve greater throughput. |
| `--ncpu_per_replica` | The number of CPU cores used per service replica, *default is 1*. |
| `--ngpu_per_replica` | The number of GPUs used per service replica, *default is 1*. You can set this to a value between 0 and 1 to run multiple service replicas on one GPU, thereby improving GPU utilization. (Note: if `--num_replicas` is 2 and `--ngpu_per_replica` is 0.7, then 2 GPUs must be available.) |
> A client demo can be found at `TexTeller/client/demo.py`; you can refer to `demo.py` to send requests to the server.
## Training
## 🏋️‍♂️ Training
### Dataset
We provide an example dataset in the `TexTeller/src/models/ocr_model/train/dataset` directory. You can place your own images in the `images` directory and annotate each image with its corresponding formula in `formulas.jsonl`.
After preparing your dataset, you need to change the **`DIR_URL` variable in `.../dataset/loader.py` to your own dataset's path**.
### Retrain the tokenizer
### Retraining the Tokenizer
If you use a different dataset, you may need to retrain the tokenizer to obtain a different vocabulary. After configuring your dataset, you can train your own tokenizer with the following command:
@@ -134,7 +150,7 @@ python server.py # default settings
python -m models.tokenizer.train
```
### Train the model
### Training the Model
To train the model, run the following command in the `TexTeller/src` directory:
@@ -148,14 +164,38 @@ python -m models.ocr_model.train.train
> Our training scripts use the [Hugging Face Transformers](https://github.com/huggingface/transformers) library, so you can refer to their [documentation](https://huggingface.co/docs/transformers/v4.32.1/main_classes/trainer#transformers.TrainingArguments) for more details and configuration options for the training parameters.
## To-Do
## 🚧 Limitations
- [ ] Train the model with a larger dataset (5.5M samples, coming soon)
* Some formulas with many fine details cannot be recognized with 100% accuracy
* Some large, complex multi-line formulas are not recognized well (e.g., long formulas mixed with matrices)
> In such cases, you can try splitting a large multi-line formula into several smaller sub-formulas for recognition.
* Scanned images and PDF document recognition are not supported
* Handwritten formulas are not supported
## 📅 Plans
- [x] ~~Train the model with a larger dataset (7.5M samples, coming soon)~~
- [ ] Recognition of scanned images
- [ ] PDF document recognition + support for English and Chinese scenarios
- [ ] Inference acceleration
- [ ] ...
## Acknowledgements
## 💖 Acknowledgments
Thanks to [LaTeX-OCR](https://github.com/lukas-blecher/LaTeX-OCR) which has brought me a lot of inspiration, and [im2latex-100K](https://zenodo.org/records/56198#.V2px0jXT6eA) which enriches our dataset.
## ⭐️ Stargazers over time
[![Stargazers over time](https://starchart.cc/OleehyO/TexTeller.svg?variant=adaptive)](https://starchart.cc/OleehyO/TexTeller)

requirements.txt

@@ -7,4 +7,6 @@ ray[serve]
accelerate
tensorboardX
nltk
python-multipart
pdf2image

src/models/globals.py

@@ -3,7 +3,7 @@ IMAGE_MEAN = 0.9545467
IMAGE_STD = 0.15394445
# Vocabulary size for TexTeller
VOCAB_SIZE = 10000
VOCAB_SIZE = 15000
# Fixed size for input image for TexTeller
FIXED_IMG_SIZE = 448
@@ -12,7 +12,7 @@ FIXED_IMG_SIZE = 448
IMG_CHANNELS = 1 # grayscale image
# Max number of tokens for embedding
MAX_TOKEN_SIZE = 512
MAX_TOKEN_SIZE = 1024
# Scaling ratio for random resizing when training
MAX_RESIZE_RATIO = 1.15

src/models/ocr_model/model/TexTeller.py

@@ -17,7 +17,7 @@ from transformers import (
class TexTeller(VisionEncoderDecoderModel):
    REPO_NAME = 'OleehyO/TexTeller'
    REPO_NAME = '/home/lhy/code/TexTeller/src/models/ocr_model/train/train_result/TexTellerv2/checkpoint-356000'

    def __init__(self, decoder_path=None, tokenizer_path=None):
        encoder = ViTModel(ViTConfig(
            image_size=FIXED_IMG_SIZE,

src/web.py

@@ -2,13 +2,65 @@ import os
import io
import base64
import tempfile
import time
import subprocess
import shutil
import streamlit as st
from PIL import Image
from PIL import Image, ImageChops
from pathlib import Path
from pdf2image import convert_from_path
from models.ocr_model.utils.inference import inference
from models.ocr_model.model.TexTeller import TexTeller
html_string = '''
<h1 style="color: black; text-align: center;">
<img src="https://slackmojis.com/emojis/429-troll/download" width="50">
TexTeller
<img src="https://slackmojis.com/emojis/429-troll/download" width="50">
</h1>
'''
suc_gif_html = '''
<h1 style="color: black; text-align: center;">
<img src="https://slackmojis.com/emojis/90621-clapclap-e/download" width="50">
<img src="https://slackmojis.com/emojis/90621-clapclap-e/download" width="50">
<img src="https://slackmojis.com/emojis/90621-clapclap-e/download" width="50">
</h1>
'''
fail_gif_html = '''
<h1 style="color: black; text-align: center;">
<img src="https://slackmojis.com/emojis/51439-allthethings_intensifies/download" >
<img src="https://slackmojis.com/emojis/51439-allthethings_intensifies/download" >
<img src="https://slackmojis.com/emojis/51439-allthethings_intensifies/download" >
</h1>
'''
tex = r'''
\documentclass{{article}}
\usepackage[
    left=1in,    % left margin
    right=1in,   % right margin
    top=1in,     % top margin
    bottom=1in,  % bottom margin
    paperwidth=40cm,  % page width
    paperheight=40cm  % page height (A4 is used here only as a reference)
]{{geometry}}
\usepackage[utf8]{{inputenc}}
\usepackage{{multirow,multicol,amsmath,amsfonts,amssymb,mathtools,bm,mathrsfs,wasysym,amsbsy,upgreek,mathalfa,stmaryrd,mathrsfs,dsfont,amsthm,amsmath,multirow}}
\begin{{document}}
{formula}
\pagenumbering{{gobble}}
\end{{document}}
'''
@st.cache_resource
def get_model():
    return TexTeller.from_pretrained(os.environ['CHECKPOINT_DIR'])
@@ -18,24 +70,74 @@ def get_model():
def get_tokenizer():
    return TexTeller.get_tokenizer(os.environ['TOKENIZER_DIR'])

def get_image_base64(img_file):
    # Encode an uploaded image as a base64 PNG string for inline HTML display.
    buffered = io.BytesIO()
    img_file.seek(0)
    img = Image.open(img_file)
    img.save(buffered, format="PNG")
    return base64.b64encode(buffered.getvalue()).decode()
def rendering(formula: str, out_img_path: Path) -> bool:
    # Fill the LaTeX template with the predicted formula and compile it
    # with xelatex; returns True if compilation succeeded.
    build_dir = out_img_path / 'build'
    build_dir.mkdir(exist_ok=True, parents=True)
    f = build_dir / 'formula.tex'
    f.touch(exist_ok=True)
    f.write_text(tex.format(formula=formula))
    p = subprocess.Popen([
        'xelatex',
        f'-output-directory={build_dir}',
        '-interaction=nonstopmode',
        '-halt-on-error',
        f'{f}'
    ])
    p.communicate()
    return p.returncode == 0
def pdf_to_pngbytes(pdf_path):
    # Rasterize the first page of the compiled PDF and crop away the margins.
    images = convert_from_path(pdf_path, first_page=1, last_page=1)
    trimmed_image = trim(images[0])
    png_image_bytes = io.BytesIO()
    trimmed_image.save(png_image_bytes, format='PNG')
    png_image_bytes.seek(0)
    return png_image_bytes
def trim(im):
    # Crop the image to the bounding box of everything that differs from the
    # background color, which is sampled from the top-left pixel.
    bg = Image.new(im.mode, im.size, im.getpixel((0, 0)))
    diff = ImageChops.difference(im, bg)
    diff = ImageChops.add(diff, diff, 2.0, -100)
    bbox = diff.getbbox()
    if bbox:
        return im.crop(bbox)
    return im
model = get_model()
tokenizer = get_tokenizer()

# check if xelatex is installed
xelatex_installed = os.system('which xelatex > /dev/null 2>&1') == 0

if "start" not in st.session_state:
    st.session_state["start"] = 1
    if xelatex_installed:
        st.toast('Hooray!', icon='🎉')
        time.sleep(0.5)
        st.toast("Xelatex has been detected.", icon='')
    else:
        st.error('xelatex is not installed. Please install it before using TexTeller.')
# ============================ pages =============================== #
html_string = '''
<h1 style="color: orange; text-align: center;">
✨ TexTeller ✨
</h1>
'''
st.markdown(html_string, unsafe_allow_html=True)
if "start" not in st.session_state:
st.balloons()
st.session_state["start"] = 1
uploaded_file = st.file_uploader("", type=['jpg', 'png', 'pdf'])
uploaded_file = st.file_uploader("", type=['jpg', 'png'])

if xelatex_installed:
    st.caption('🥳 Xelatex has been detected; the rendered image will be displayed on the web page.')
else:
    st.caption('😭 Xelatex is not detected; please check the resulting latex code by yourself, or check ... to have your xelatex setup ready.')
if uploaded_file:
    img = Image.open(uploaded_file)
@@ -44,13 +146,6 @@ if uploaded_file:
    png_file_path = os.path.join(temp_dir, 'image.png')
    img.save(png_file_path, 'PNG')

    def get_image_base64(img_file):
        buffered = io.BytesIO()
        img_file.seek(0)
        img = Image.open(img_file)
        img.save(buffered, format="PNG")
        return base64.b64encode(buffered.getvalue()).decode()

    img_base64 = get_image_base64(uploaded_file)
    st.markdown(f"""
@@ -62,7 +157,8 @@ if uploaded_file:
display: block;
margin-left: auto;
margin-right: auto;
max-width: 700px;
max-width: 500px;
max-height: 500px;
}}
</style>
<div class="centered-container">
@@ -71,7 +167,6 @@ if uploaded_file:
</div>
""", unsafe_allow_html=True)
st.write("")
st.write("")
with st.spinner("Predicting..."):
@@ -83,11 +178,54 @@ if uploaded_file:
            True if os.environ['USE_CUDA'] == 'True' else False,
            int(os.environ['NUM_BEAM'])
        )[0]
        if not xelatex_installed:
            st.markdown(fail_gif_html, unsafe_allow_html=True)
            st.warning('Unable to find xelatex to render the image. Please check the prediction results yourself.', icon="🤡")
            txt = st.text_area(
                ":red[Predicted formula]",
                TeXTeller_result,
                height=150,
            )
        else:
            is_successed = rendering(TeXTeller_result, Path(temp_dir))
            if is_successed:
                # st.code(TeXTeller_result, language='latex')
                # st.subheader(':rainbow[Predict] :sunglasses:', divider='rainbow')
                st.subheader(':sunglasses:', divider='gray')
                st.latex(TeXTeller_result)
                st.code(TeXTeller_result, language='latex')
                st.success('Done!')
                img_base64 = get_image_base64(pdf_to_pngbytes(Path(temp_dir) / 'build' / 'formula.pdf'))
                st.markdown(suc_gif_html, unsafe_allow_html=True)
                st.success('Successfully rendered!', icon="")
                txt = st.text_area(
                    ":red[Predicted formula]",
                    TeXTeller_result,
                    height=150,
                )
                # st.latex(TeXTeller_result)
                st.markdown(f"""
                <style>
                .centered-container {{
                    text-align: center;
                }}
                .centered-image {{
                    display: block;
                    margin-left: auto;
                    margin-right: auto;
                    max-width: 500px;
                    max-height: 500px;
                }}
                </style>
                <div class="centered-container">
                    <img src="data:image/png;base64,{img_base64}" class="centered-image" alt="Input image">
                </div>
                """, unsafe_allow_html=True)
            else:
                st.markdown(fail_gif_html, unsafe_allow_html=True)
                st.error('Rendering failed. You can try using a higher-resolution image or splitting the multi-line formula into single lines for better results.', icon="")
                txt = st.text_area(
                    ":red[Predicted formula]",
                    TeXTeller_result,
                    height=150,
                )
    shutil.rmtree(temp_dir)
# ============================ pages =============================== #