Update README
10
README.md
@@ -160,11 +160,13 @@ If you are using a different dataset, you might need to retrain the tokenizer to
 
 ### Training the Model
 
-To train the model, you need to run the following command in the `TexTeller/src` directory:
+1. Modify `num_processes` in `src/train_config.yaml` to match the number of GPUs available for training (default is 1).
 
-```bash
-python -m models.ocr_model.train.train
-```
+2. In the `TexTeller/src` directory, run the following command:
+
+```bash
+accelerate launch --config_file ./train_config.yaml -m models.ocr_model.train.train
+```
 
 You can set your own tokenizer and checkpoint paths in `TexTeller/src/models/ocr_model/train/train.py` (refer to `train.py` for more information). If you are using the same architecture and dictionary as TexTeller, you can also fine-tune TexTeller's default weights with your own dataset.
 
@@ -198,11 +198,13 @@ python server.py
 
 ### Training the Model
 
-To train the model, run the following command in the `TexTeller/src` directory:
+1. Set `num_processes` in `src/train_config.yaml` to the number of GPUs used for training (default is 1).
 
-```bash
-python -m models.ocr_model.train.train
-```
+2. In the `TexTeller/src` directory, run the following command:
+
+```bash
+accelerate launch --config_file ./train_config.yaml -m models.ocr_model.train.train
+```
 
 You can set your own tokenizer and checkpoint paths in `TexTeller/src/models/ocr_model/train/train.py` (see `train.py`). If you use the same architecture and the same dictionary as TexTeller, you can also fine-tune TexTeller's default weights on your own dataset.
 
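Note on step 1 of the new instructions: editing `num_processes` by hand is easy to forget when moving between machines. A minimal sketch of automating it follows, assuming `train_config.yaml` keeps `num_processes` as a top-level `key: value` line, as Accelerate config files usually do. The helper name `set_num_processes` and the sample config text are hypothetical and are not taken from the repository.

```python
import re


def set_num_processes(config_text: str, gpu_count: int) -> str:
    """Return config_text with the top-level num_processes value replaced.

    Hypothetical helper: it assumes a flat `num_processes: <value>` line
    and leaves every other line, including trailing comments, untouched.
    """
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    # Match the key at the start of a line and capture it with its spacing.
    pattern = re.compile(r"^(num_processes:[ \t]*)\S+", flags=re.MULTILINE)
    new_text, replaced = pattern.subn(
        lambda m: f"{m.group(1)}{gpu_count}", config_text, count=1
    )
    if replaced == 0:
        raise KeyError("num_processes not found in config")
    return new_text


# Illustrative config text only; the real train_config.yaml may differ.
sample = "distributed_type: MULTI_GPU\nnum_processes: 1\n"
updated = set_num_processes(sample, 4)
print(updated)
```

To apply it, read `src/train_config.yaml`, pass its text through the helper with the GPU count for the machine, and write the result back before running the `accelerate launch` command from step 2.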