Deprecation warning?
Can you clarify whether this is a remnant, or whether you do in fact intend to remove ctranslate2 as an "engine" in the future and, if so, roughly when?
Before I spend an inordinate amount of time revising my scripts to use ctranslate2 for my RAG application: did you find out that it's not actually faster and/or higher quality than the other options out there? Just curious, because as far as I know ct2 is still superior in a lot of ways. Anyway, here's the source code that prompted my question:
"""
logger.warning(
"deprecated: ct2 inference is deprecated and will be removed in the future."
)
"""
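In the meantime, if the warning is only noise in a script's output, one option is a standard `logging` filter that drops just that record. This is a minimal sketch, not something the library documents: the logger name `infinity_emb` is an assumption to check against the source, and `DropCt2Deprecation` is a hypothetical helper name.

```python
import logging


class DropCt2Deprecation(logging.Filter):
    """Drop only the ct2 deprecation record; let everything else through."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False discards the record, True keeps it.
        return "ct2 inference is deprecated" not in record.getMessage()


# Assumption: the warning is emitted on a logger with this exact name.
# A filter attached to a logger only sees records logged on that logger
# itself, not on its children, so verify the name before relying on this.
LOGGER_NAME = "infinity_emb"

logger = logging.getLogger(LOGGER_NAME)
logger.addFilter(DropCt2Deprecation())
```

This only hides the message; it does not change whether the ct2 engine is kept or removed.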
Hey, would you mind opening that on GitHub as an issue?
I am considering keeping it, if it remains low-maintenance.
- Currently, CTranslate2 only delivers 1/4 of the performance of my optimizations in torch. This is on my RTX 3060 laptop with CUDA.
CT2:
Server Software: uvicorn
Server Hostname: 127.0.0.1
Server Port: 7997
Document Path: /embeddings
Document Length: Variable
Concurrency Level: 10
Time taken for tests: 60.003 seconds
Complete requests: 10
Failed requests: 0
Total transferred: 20780760 bytes
Total body sent: 7196080
HTML transferred: 20779460 bytes
Requests per second: 0.17 [#/sec] (mean)
Time per request: 60002.693 [ms] (mean)
Time per request: 6000.269 [ms] (mean, across all concurrent requests)
Transfer rate: 338.21 [Kbytes/sec] received
               117.12 kb/s sent
               455.33 kb/s total
Connection Times (ms)
              min    mean[+/-sd]  median     max
Connect:        0      1     0.5       1       2
Processing:  4770  27597 18316.0   29418   55234
Waiting:     4766  27596 18316.4   29414   55233
Total:       4770  27599 18316.0   29419   55235
TORCH:
Document Length: Variable
Concurrency Level: 10
Time taken for tests: 7.417 seconds
Complete requests: 10
Failed requests: 0
Total transferred: 20781050 bytes
Total body sent: 7196080
HTML transferred: 20779750 bytes
Requests per second: 1.35 [#/sec] (mean)
Time per request: 7417.268 [ms] (mean)
Time per request: 741.727 [ms] (mean, across all concurrent requests)
Transfer rate: 2736.05 [Kbytes/sec] received
               947.44 kb/s sent
               3683.49 kb/s total
Connection Times (ms)
              min    mean[+/-sd]  median     max
Connect:        0      1     0.2       1       1
Processing:   555   3341  2295.0    3627    6862
Waiting:      553   3339  2295.6    3626    6861
Total:        556   3342  2295.2    3628    6863
Percentage of the requests served within a certain time (ms)
50% 3628
66% 4489
75% 5336
80% 6088
90% 6863
95% 6863
98% 6863
99% 6863
100% 6863 (longest request)
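As a quick sanity check on the two summaries above, the ratio can be computed straight from the reported figures. The sketch below copies three lines from each run and parses them. `parse_ab_summary` is a hypothetical helper, not part of ApacheBench or infinity. With only 10 requests per run, treat the result as a rough indication rather than a precise measurement.

```python
import re

# Lines copied verbatim from the two ApacheBench summaries above.
CT2_SUMMARY = """\
Time taken for tests: 60.003 seconds
Complete requests: 10
Requests per second: 0.17 [#/sec] (mean)
"""

TORCH_SUMMARY = """\
Time taken for tests: 7.417 seconds
Complete requests: 10
Requests per second: 1.35 [#/sec] (mean)
"""


def parse_ab_summary(text: str) -> dict[str, float]:
    """Map each 'Label: <number> ...' line to its first numeric value."""
    fields: dict[str, float] = {}
    for line in text.splitlines():
        match = re.match(r"\s*([^:]+):\s+([0-9.]+)", line)
        if match:
            # setdefault keeps the first occurrence of a repeated label,
            # e.g. the two "Time per request" lines in full ab output.
            fields.setdefault(match.group(1).strip(), float(match.group(2)))
    return fields


ct2_run = parse_ab_summary(CT2_SUMMARY)
torch_run = parse_ab_summary(TORCH_SUMMARY)

speedup_by_rps = torch_run["Requests per second"] / ct2_run["Requests per second"]
speedup_by_wall_time = ct2_run["Time taken for tests"] / torch_run["Time taken for tests"]

print(f"torch vs ct2, by requests/sec: {speedup_by_rps:.1f}x")
print(f"torch vs ct2, by wall time:    {speedup_by_wall_time:.1f}x")
```

On these numbers the script prints 7.9x and 8.1x, so in this particular run the gap is closer to 8x than 4x.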
@ctranslate2-4you If you want, you can remove the deprecation warning from infinity via a pull request, and I'll approve it. I'm always looking for contributors who want to get familiar with the repo.