Update README
@@ -209,11 +209,12 @@ python server.py
 | `-ckpt` | The path to the weights file, *default is TexTeller's pretrained weights*. |
 | `-tknz` | The path to the tokenizer, *default is TexTeller's tokenizer*. |
 | `-port` | The server's service port, *default is 8000*. |
-| `--inference-mode` | Whether to use the GPU (cuda or mps) for inference, *default is CPU*. |
+| `--inference-mode` | Whether to use "cuda" or "mps" for inference, *default is "cpu"*. |
 | `--num_beams` | The number of beams for beam search, *default is 1*. |
 | `--num_replicas` | The number of service replicas to run on the server, *default is 1 replica*. You can use more replicas to achieve greater throughput. |
 | `--ncpu_per_replica` | The number of CPU cores used per service replica, *default is 1*. |
 | `--ngpu_per_replica` | The number of GPUs used per service replica, *default is 1*. You can set this value between 0 and 1 to run multiple service replicas on one GPU and share it, thereby improving GPU utilization. (Note: if --num_replicas is 2 and --ngpu_per_replica is 0.7, then 2 GPUs must be available.) |
+| `-onnx` | Perform inference using ONNX Runtime, *disabled by default*. |
 
 > [!NOTE]
 > A client demo can be found at `src/client/demo.py`; you can refer to `demo.py` to send requests to the server.
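The availability note on `--ngpu_per_replica` follows from rounding the total fractional GPU demand up to whole devices. A minimal sketch of that arithmetic (the helper `gpus_required` is hypothetical, for illustration only, and not part of the TexTeller codebase — it assumes ceiling allocation of `num_replicas * ngpu_per_replica`):

```python
import math

def gpus_required(num_replicas: int, ngpu_per_replica: float) -> int:
    """Whole GPUs needed to schedule all replicas (illustrative assumption:
    total fractional demand is rounded up to full devices)."""
    return math.ceil(num_replicas * ngpu_per_replica)

# The example from the table's note: 2 replicas at 0.7 GPU each -> 1.4 -> 2 GPUs.
print(gpus_required(2, 0.7))   # → 2
# Two replicas at 0.5 GPU each fit on a single GPU.
print(gpus_required(2, 0.5))   # → 1
```

So fractional values only save hardware when the per-replica fractions pack into fewer whole devices.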
@@ -247,11 +247,12 @@ python server.py
 | `-ckpt` | The path to the weights file, *default is TexTeller's pretrained weights*. |
 | `-tknz` | The path to the tokenizer, *default is TexTeller's tokenizer*. |
 | `-port` | The server's service port, *default is 8000*. |
-| `--inference-mode` | Whether to use the GPU (cuda or mps) for inference, *default is CPU*. |
+| `--inference-mode` | Use "cuda" or "mps" for inference, *default is "cpu"*. |
 | `--num_beams` | The number of beams for beam search, *default is 1*. |
 | `--num_replicas` | The number of service replicas to run on the server, *default is 1 replica*. You can use more replicas to achieve greater throughput. |
 | `--ncpu_per_replica` | The number of CPU cores used per service replica, *default is 1*. |
 | `--ngpu_per_replica` | The number of GPUs used per service replica, *default is 1*. You can set this value between 0 and 1 to run multiple service replicas on one GPU and share it, thereby improving GPU utilization. (Note: if --num_replicas is 2 and --ngpu_per_replica is 0.7, then 2 GPUs must be available.) |
+| `-onnx` | Perform inference using ONNX Runtime, *disabled by default*. |
 
 > [!NOTE]
 > A client demo can be found at `TexTeller/client/demo.py`; you can refer to `demo.py` to send requests to the server.