diff --git a/README.md b/README.md
index f56fe81..4da93a7 100644
--- a/README.md
+++ b/README.md
@@ -59,8 +59,8 @@ python=3.10
 
    ```bash
    python inference.py -img "/path/to/image.{jpg,png}" 
-   # use -cuda option to enable GPU inference
-   #+e.g. python inference.py -img "./img.jpg" -cuda
+   # use --inference-mode option to enable GPU (cuda or mps) inference
+   #+e.g. python inference.py -img "./img.jpg" --inference-mode cuda
    ```
 
 > [!NOTE]
@@ -76,9 +76,6 @@ Go to the `TexTeller/src` directory and run the following command:
 
 Enter `http://localhost:8501` in a browser to view the web demo.
 
-> [!TIP]
-> You can change the default configuration of `start_web.sh`, for example, to use GPU for inference (e.g. `USE_CUDA=True`) or to increase the number of beams (e.g. `NUM_BEAM=3`) to achieve higher accuracy.
-
 > [!NOTE]
 > If you are Windows user, please run the `start_web.bat` file instead.
 
@@ -124,14 +121,12 @@ We use [ray serve](https://github.com/ray-project/ray) to provide an API interfa
 python server.py # default settings
 ```
 
-You can pass the following arguments to `server.py` to change the server's inference settings (e.g. `python server.py --use_gpu` to enable GPU inference):
-
 | Parameter | Description |
 | --- | --- |
 | `-ckpt` | The path to the weights file, *default is TexTeller's pretrained weights*.|
 | `-tknz` | The path to the tokenizer, *default is TexTeller's tokenizer*.|
 | `-port` | The server's service port, *default is 8000*. |
-| `--use_gpu` | Whether to use GPU for inference, *default is CPU*. |
+| `--inference-mode` | The device used for inference: `cpu`, `cuda`, or `mps`, *default is `cpu`*. |
 | `--num_beams` | The number of beams for beam search, *default is 1*. |
 | `--num_replicas` | The number of service replicas to run on the server, *default is 1 replica*. You can use more replicas to achieve greater throughput.|
 | `--ncpu_per_replica` | The number of CPU cores used per service replica, *default is 1*. |
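The table above documents the reworked server flags. For reference, a request against the `/predict` route that `server.py` binds (see its hunk further down) might look like the sketch below. The port matches the `-port` default of 8000, but the multipart field name (`img`) and the payload format are assumptions, since the `Ingress` handler is not part of this patch.

```python
# Hypothetical client sketch; the "img" field name is an assumption, not defined in this diff.
import requests

server_url = "http://127.0.0.1:8000/predict"   # -port defaults to 8000 in server.py
with open("./img.jpg", "rb") as f:             # placeholder image path
    res = requests.post(server_url, files={"img": f})
print(res.text)
```
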
diff --git a/assets/README_zh.md b/assets/README_zh.md
index 2e83116..b946d49 100644
--- a/assets/README_zh.md
+++ b/assets/README_zh.md
@@ -46,18 +46,20 @@ python=3.10
    ```bash
    git clone https://github.com/OleehyO/TexTeller
    ```
+
 2. [安装pytorch](https://pytorch.org/get-started/locally/#start-locally)
 3. 安装本项目的依赖包:
 
    ```bash
    pip install -r requirements.txt
    ```
+
 4. 进入 `TexTeller/src`目录,在终端运行以下命令进行推理:
 
    ```bash
    python inference.py -img "/path/to/image.{jpg,png}" 
-   # use -cuda option to enable GPU inference
-   #+e.g. python inference.py -img "./img.jpg" -cuda
+   # use --inference-mode option to enable GPU (cuda or mps) inference
+   #+e.g. python inference.py -img "./img.jpg" --inference-mode cuda
    ```
 
 > [!NOTE]
@@ -72,11 +74,13 @@ python=3.10
    ```bash
    pip install -U "huggingface_hub[cli]"
    ```
+
 2. 在能连接Hugging Face的机器上下载模型权重:
 
    ```bash
    huggingface-cli download OleehyO/TexTeller --include "*.json" "*.bin" "*.txt" --repo-type model --local-dir "your/dir/path"
    ```
+
 3. 把包含权重的目录上传远端服务器,然后把 `TexTeller/src/models/ocr_model/model/TexTeller.py`中的 `REPO_NAME = 'OleehyO/TexTeller'`修改为 `REPO_NAME = 'your/dir/path'`
 
 如果你还想在训练模型时开启evaluate,你需要提前下载metric脚本并上传远端服务器:
@@ -86,6 +90,7 @@ python=3.10
    ```bash
    huggingface-cli download evaluate-metric/google_bleu --repo-type space --local-dir "your/dir/path"
    ```
+
 2. 把这个目录上传远端服务器,并在 `TexTeller/src/models/ocr_model/utils/metrics.py`中把 `evaluate.load('google_bleu')`改为 `evaluate.load('your/dir/path/google_bleu.py')`
 
 ## 🌐 网页演示
@@ -98,9 +103,6 @@ python=3.10
 
 在浏览器里输入 `http://localhost:8501`就可以看到web demo
 
-> [!TIP]
-> 你可以改变 `start_web.sh`的默认配置, 例如使用GPU进行推理(e.g. `USE_CUDA=True`) 或者增加beams的数量(e.g. `NUM_BEAM=3`)来获得更高的精确度
-
 > [!NOTE]
 > 对于Windows用户, 请运行 `start_web.bat`文件.
 
@@ -133,7 +135,7 @@ python infer_det.py
 在进行**公式检测后**, `TexTeller/src`目录下运行以下命令
 
 ```shell
-rec_infer_from_crop_imgs.py
+python rec_infer_from_crop_imgs.py
 ```
 
 会基于上一步公式检测的结果,对裁剪出的所有公式进行批量识别,将识别结果在 `TexTeller/src/results`中保存为txt文件。
@@ -143,20 +145,18 @@
 我们使用[ray serve](https://github.com/ray-project/ray)来对外提供一个TexTeller的API接口,通过使用这个接口,你可以把TexTeller整合到自己的项目里。要想启动server,你需要先进入 `TexTeller/src`目录然后运行以下命令:
 
 ```bash
-python server.py # default settings
+python server.py
 ```
 
-你可以给 `server.py`传递以下参数来改变server的推理设置(e.g. `python server.py --use_gpu` 来启动GPU推理):
-
-| 参数                 | 描述 |
-| ---------------------- | -------------------------------------------------------------------------------------------------------------------------------------------- |
-| `-ckpt`              | 权重文件的路径,*默认为TexTeller的预训练权重*。 |
-| `-tknz`              | 分词器的路径,*默认为TexTeller的分词器*。 |
-| `-port`              | 服务器的服务端口,*默认是8000*。 |
-| `--use_gpu`          | 是否使用GPU推理,*默认为CPU*。 |
-| `--num_beams`        | beam search的beam数量,*默认是1*。 |
-| `--num_replicas`     | 在服务器上运行的服务副本数量,*默认1个副本*。你可以使用更多的副本来获取更大的吞吐量。 |
-| `--ncpu_per_replica` | 每个服务副本所用的CPU核心数,*默认为1*。 |
+| 参数 | 描述 |
+| - | - |
+| `-ckpt` | 权重文件的路径,*默认为TexTeller的预训练权重*。 |
+| `-tknz` | 分词器的路径,*默认为TexTeller的分词器*。 |
+| `-port` | 服务器的服务端口,*默认是8000*。 |
+| `--inference-mode` | 推理所用的设备,可选 cpu、cuda 或 mps,*默认为 cpu*。 |
+| `--num_beams` | beam search的beam数量,*默认是1*。 |
+| `--num_replicas` | 在服务器上运行的服务副本数量,*默认1个副本*。你可以使用更多的副本来获取更大的吞吐量。 |
+| `--ncpu_per_replica` | 每个服务副本所用的CPU核心数,*默认为1*。 |
 | `--ngpu_per_replica` | 每个服务副本所用的GPU数量,*默认为1*。你可以把这个值设置成 0~1之间的数,这样会在一个GPU上运行多个服务副本来共享GPU,从而提高GPU的利用率。(注意,如果 --num_replicas 2, --ngpu_per_replica 0.7, 那么就必须要有2个GPU可用) |
 
 > [!NOTE]
diff --git a/assets/image/README_zh/1712901497354.png b/assets/image/README_zh/1712901497354.png
deleted file mode 100644
index 1487d47..0000000
Binary files a/assets/image/README_zh/1712901497354.png and /dev/null differ
diff --git a/requirements.txt b/requirements.txt
index 780156f..c5f254c 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,13 +1,15 @@
 transformers
 datasets
 evaluate
-streamlit
 opencv-python
 ray[serve]
 accelerate
 tensorboardX
 nltk
 python-multipart
-augraphy
-onnxruntime
\ No newline at end of file
+augraphy
+onnxruntime
+
+streamlit==1.30
+streamlit-paste-button
diff --git a/src/inference.py b/src/inference.py
index 2f4127a..2818497 100644
--- a/src/inference.py
+++ b/src/inference.py
@@ -18,10 +18,16 @@ if __name__ == '__main__':
         help='path to the input image'
     )
     parser.add_argument(
-        '-cuda',
-        default=False,
-        action='store_true',
-        help='use cuda or not'
+        '--inference-mode',
+        type=str,
+        default='cpu',
+        help='Inference mode, select one of cpu, cuda, or mps'
+    )
+    parser.add_argument(
+        '--num-beam',
+        type=int,
+        default=1,
+        help='number of beams for beam search decoding'
     )
     args = parser.parse_args()
 
@@ -33,6 +39,6 @@ if __name__ == '__main__':
     img = cv.imread(args.img)
 
     print('Inference...')
-    res = latex_inference(latex_rec_model, tokenizer, [img], args.cuda)
+    res = latex_inference(latex_rec_model, tokenizer, [img], inf_mode=args.inference_mode, num_beams=args.num_beam)
     res = to_katex(res[0])
     print(res)
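With the reworked `inference.py` arguments above, typical invocations look like this (the image path is a placeholder):

```bash
python inference.py -img "./img.jpg"                                     # CPU, single beam (defaults)
python inference.py -img "./img.jpg" --inference-mode cuda               # NVIDIA GPU
python inference.py -img "./img.jpg" --inference-mode mps --num-beam 3   # Apple Silicon, 3 beams
```
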
diff --git a/src/models/ocr_model/utils/inference.py b/src/models/ocr_model/utils/inference.py
index cc34101..ec16676 100644
--- a/src/models/ocr_model/utils/inference.py
+++ b/src/models/ocr_model/utils/inference.py
@@ -14,7 +14,7 @@ def inference(
     model: TexTeller, 
     tokenizer: RobertaTokenizerFast,
     imgs_path: Union[List[str], List[np.ndarray]],
-    use_cuda: bool,
+    inf_mode: str = 'cpu',
     num_beams: int = 1,
 ) -> List[str]:
     model.eval()
@@ -26,9 +26,8 @@ def inference(
     imgs = inference_transform(imgs)
     pixel_values = torch.stack(imgs)
 
-    if use_cuda:
-        model = model.to('cuda')
-        pixel_values = pixel_values.to('cuda')
+    model = model.to(inf_mode)
+    pixel_values = pixel_values.to(inf_mode)
 
     generate_config = GenerationConfig(
         max_new_tokens=MAX_TOKEN_SIZE,
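`inference()` now forwards the mode string straight to `model.to(...)`, so an unavailable device raises at runtime. A small guard such as the sketch below (a hypothetical helper, not part of this patch) could let callers fall back to CPU gracefully:

```python
import torch

def resolve_inference_mode(requested: str = "cpu") -> str:
    """Hypothetical helper: map the requested mode to one available on this machine."""
    if requested == "cuda" and torch.cuda.is_available():
        return "cuda"
    if requested == "mps" and torch.backends.mps.is_available():
        return "mps"
    return "cpu"  # fall back when the requested accelerator is missing
```
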
diff --git a/src/rec_infer_from_crop_imgs.py b/src/rec_infer_from_crop_imgs.py
index b51e92c..1cacbc6 100644
--- a/src/rec_infer_from_crop_imgs.py
+++ b/src/rec_infer_from_crop_imgs.py
@@ -11,22 +11,28 @@ if __name__ == '__main__':
     os.chdir(Path(__file__).resolve().parent)
     parser = argparse.ArgumentParser()
     parser.add_argument(
         '-img_dir',
         type=str,
         default="./subimages",
         help='path to the directory containing input images'
     )
     parser.add_argument(
         '-output_dir',
         type=str,
         default="./results",
         help='path to the output directory for storing recognition results'
     )
     parser.add_argument(
-        '-cuda',
-        default=False,
-        action='store_true',
-        help='use cuda or not'
+        '--inference-mode',
+        type=str,
+        default='cpu',
+        help='Inference mode, select one of cpu, cuda, or mps'
+    )
+    parser.add_argument(
+        '--num-beam',
+        type=int,
+        default=1,
+        help='number of beams for beam search decoding'
     )
     args = parser.parse_args()
 
@@ -46,7 +52,7 @@ if __name__ == '__main__':
 
         if img is not None:
             print(f'Inference for {filename}...')
-            res = latex_inference(latex_rec_model, tokenizer, [img], args.cuda)
+            res = latex_inference(latex_rec_model, tokenizer, [img], inf_mode=args.inference_mode, num_beams=args.num_beam)
             res = to_katex(res[0])
 
             # Save the recognition result to a text file
diff --git a/src/server.py b/src/server.py
index 11adfa3..520d908 100644
--- a/src/server.py
+++ b/src/server.py
@@ -23,8 +23,8 @@
 parser.add_argument('--num_replicas', type=int, default=1)
 parser.add_argument('--ncpu_per_replica', type=float, default=1.0)
 parser.add_argument('--ngpu_per_replica', type=float, default=0.0)
-parser.add_argument('--use_cuda', action='store_true', default=False)
-parser.add_argument('--num_beam', type=int, default=1)
+parser.add_argument('--inference-mode', type=str, default='cpu')
+parser.add_argument('--num_beams', type=int, default=1)
 args = parser.parse_args()
 
-if args.ngpu_per_replica > 0 and not args.use_cuda:
+if args.ngpu_per_replica > 0 and args.inference_mode != 'cuda':
@@ -43,18 +43,21 @@ class TexTellerServer:
         self,
         checkpoint_path: str,
         tokenizer_path: str,
-        use_cuda: bool = False,
-        num_beam: int = 1
+        inf_mode: str = 'cpu',
+        num_beams: int = 1
     ) -> None:
         self.model = TexTeller.from_pretrained(checkpoint_path)
         self.tokenizer = TexTeller.get_tokenizer(tokenizer_path)
-        self.use_cuda = use_cuda
-        self.num_beam = num_beam
+        self.inf_mode = inf_mode
+        self.num_beams = num_beams
 
-        self.model = self.model.to('cuda') if use_cuda else self.model
+        self.model = self.model.to(inf_mode) if inf_mode != 'cpu' else self.model
 
     def predict(self, image_nparray) -> str:
-        return inference(self.model, self.tokenizer, [image_nparray], self.use_cuda, self.num_beam)[0]
+        return inference(
+            self.model, self.tokenizer, [image_nparray],
+            inf_mode=self.inf_mode, num_beams=self.num_beams
+        )[0]
 
 
 @serve.deployment()
@@ -78,7 +81,11 @@ if __name__ == '__main__':
     tknz_dir = args.tokenizer_dir
 
     serve.start(http_options={"port": args.server_port})
-    texteller_server = TexTellerServer.bind(ckpt_dir, tknz_dir, use_cuda=args.use_cuda, num_beam=args.num_beam)
+    texteller_server = TexTellerServer.bind(
+        ckpt_dir, tknz_dir,
+        inf_mode=args.inference_mode,
+        num_beams=args.num_beams
+    )
     ingress = Ingress.bind(texteller_server)
     ingress_handle = serve.run(ingress, route_prefix="/predict")
 
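With the argument changes above, a GPU deployment of the API server would be launched along these lines (the values are illustrative):

```bash
# Two replicas sharing one GPU (2 x 0.5), beam search with 3 beams
python server.py --inference-mode cuda --num_beams 3 --num_replicas 2 --ngpu_per_replica 0.5
```
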
diff --git a/src/start_web.bat b/src/start_web.bat
index fd521e4..e235cca 100644
--- a/src/start_web.bat
+++ b/src/start_web.bat
@@ -3,8 +3,6 @@ SETLOCAL ENABLEEXTENSIONS
 
 set CHECKPOINT_DIR=default
 set TOKENIZER_DIR=default
-set USE_CUDA=False REM True or False (case-sensitive)
-set NUM_BEAM=1
 
 streamlit run web.py
 
diff --git a/src/start_web.sh b/src/start_web.sh
index 450dff2..6ec8f7b 100755
--- a/src/start_web.sh
+++ b/src/start_web.sh
@@ -3,7 +3,5 @@ set -exu
 
 export CHECKPOINT_DIR="default"
 export TOKENIZER_DIR="default"
-export USE_CUDA=False # True or False (case-sensitive)
-export NUM_BEAM=1
 
 streamlit run web.py
diff --git a/src/web.py b/src/web.py
index 9b53a59..379a609 100644
--- a/src/web.py
+++ b/src/web.py
@@ -6,16 +6,22 @@ import shutil
 
 import streamlit as st
 from PIL import Image
+from streamlit_paste_button import paste_image_button as pbutton
 
 from models.ocr_model.utils.inference import inference
 from models.ocr_model.model.TexTeller import TexTeller
 from utils import to_katex
 
+st.set_page_config(
+    page_title="TexTeller",
+    page_icon="🧮"
+)
+
 html_string = '''
Input image ({img.height}✖️{img.width})
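The new `streamlit-paste-button` dependency and the `pbutton` import suggest the web demo gains a paste-from-clipboard input. The widget is typically wired up as in the sketch below (based on that package's documented usage; the label and handling here are assumptions, not code from this patch):

```python
import streamlit as st
from streamlit_paste_button import paste_image_button as pbutton

paste_result = pbutton("📋 Paste an image")   # renders a paste button on the page
if paste_result.image_data is not None:       # a PIL image once something is pasted
    st.image(paste_result.image_data)
```
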