Problems with transcription into a specific language #9

Open
DimaSmirnoff27 opened this issue Dec 21, 2024 · 6 comments

@DimaSmirnoff27

Hi!
I installed the package and ran the Low-Level sample code, but transcription fails when I explicitly set the language in the parameters with ->withLanguage('uk'). If I remove ->withLanguage('uk'), transcription works fine with automatic language detection.
I also tried the High-Level sample code; the problem persists whenever the language is explicitly specified.

The whisper log shows:
[2024-12-21 12:45:37] whisper.error: whisper_lang_id: unknown language ;h���' []

My environment:
Laravel 11.31 inside a Docker container (image php:8.3.3-fpm)
libffi-dev and the PHP ffi extension are installed inside the container
I tried both the medium and small models; the result is the same

@CodeWithKyrian
Owner

Can you provide a full example of the script?

@DimaSmirnoff27
Author

I investigated the problem with Xdebug. Everything seems fine on the PHP side; the encoding problem appears at the FFI stage or later.

Here is my code; it runs inside a Laravel command:

    // Initialize context with a model
    $modelsStoragePath = Storage::disk('models_storage')->path('');
    $audioPath = Storage::disk('uploads')->path('/1/input.mp3');

    Whisper::setLogger(new WhisperLogger(STDOUT));
    $modelPath = ModelLoader::loadModel('medium', $modelsStoragePath);
    $contextParams = WhisperContextParameters::default();
    $ctx = new WhisperContext($modelPath, $contextParams);

    // Create state and set parameters
    $state = $ctx->createState();
    $fullParams = WhisperFullParams::default()
        ->withNThreads(4)
        ->withLanguage('en');

    // Transcribe audio
    $pcm = readAudio($audioPath);
    $state->full($pcm, $fullParams);

    // Process segments
    $numSegments = $state->nSegments();
    for ($i = 0; $i < $numSegments; $i++) {
        $segment = $state->getSegmentText($i);
        $startTimestamp = $state->getSegmentStartTime($i);
        $endTimestamp = $state->getSegmentEndTime($i);

        printf(
            "[%s - %s]: %s\n",
            toTimestamp($startTimestamp),
            toTimestamp($endTimestamp),
            $segment
        );
    }

DimaSmirnoff27 closed this as not planned on Dec 21, 2024
@DimaSmirnoff27
Author

I found the cause of the problem.
The method \Codewithkyrian\Whisper\WhisperFullParams::toCStruct() builds the C parameter struct. Everything is fine up to return $params;, but back in \Codewithkyrian\Whisper\WhisperState::full(), which calls toCStruct(), the language field in the parameters is already corrupted: char* language: b“ÿÿ”.

Most likely the garbage collector is the culprit: when I set owned = false on line 326, everything worked: $language = $ffi->new("char[$len]", false);

This isn't a proper fix for my problem, but at least I've found the cause.

@CodeWithKyrian
Owner

Hi @DimaSmirnoff27,

Thanks so much for taking the time to dig into this issue and sharing the details. You've saved me tons of work, for real.

I’ve been trying to replicate the error on my end but haven’t had any luck so far. I tested on my Mac, a Linux aarch64 VM, an Intel Linux VM, and even a PHP container (serversideup), but everything seems to work fine in those environments. It’s been a bit frustrating since not being able to reproduce the issue makes it tough to track down and fix.

Would you mind sharing the base Docker image you’re using? It’d help me recreate your exact setup and hopefully uncover what’s going wrong. Thanks again for your patience and for flagging this.

@CodeWithKyrian
Owner

Also, the fact that your workaround (making the CData non-owned) fixes it does hint at a GC issue. Even so, the error message in the logs doesn't reflect the string that was actually passed in, so there might be more going on here.

One thing to keep in mind is that making it non-owned (owned = false) means you'll have to free the allocated memory manually. Since the params are used until the transcription process ends, one possible workaround is to keep a list of references in the class and dispose of them manually in __destruct().
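The reference-list idea could look roughly like the sketch below. It assumes PHP 8.1+ with the FFI extension enabled; UnmanagedStrings and make() are hypothetical names, not part of whisper.php, and wiring the buffers into WhisperFullParams::toCStruct() is left out because the library internals are not shown in this thread. Each string is copied into a NUL-terminated char buffer allocated with owned = false, so PHP never frees it on its own, and every buffer is released in __destruct().

```php
<?php
// Hypothetical helper (not part of whisper.php): non-owned C strings whose
// lifetime is tied to this object instead of to a local variable.
final class UnmanagedStrings
{
    private \FFI $ffi;

    /** @var \FFI\CData[] buffers that must stay valid until transcription ends */
    private array $buffers = [];

    public function __construct()
    {
        // An empty cdef is enough for allocating plain C types such as char[].
        $this->ffi = \FFI::cdef();
    }

    /** Copy a PHP string into a NUL-terminated, non-owned char buffer. */
    public function make(string $value): \FFI\CData
    {
        $len = strlen($value);
        // owned = false: PHP's GC will not free this memory automatically.
        $buf = $this->ffi->new('char[' . ($len + 1) . ']', false);
        if ($len > 0) {
            \FFI::memcpy($buf, $value, $len);
        }
        $buf[$len] = "\0"; // explicit terminator for the C side
        $this->buffers[] = $buf; // keep a reference so it can be freed later
        return $buf;
    }

    public function __destruct()
    {
        // Non-owned memory is never released automatically, so free it here.
        foreach ($this->buffers as $buf) {
            \FFI::free($buf);
        }
        $this->buffers = [];
    }
}

$strings = new UnmanagedStrings();
$language = $strings->make('uk');
echo \FFI::string($language), PHP_EOL; // prints "uk"
```

An alternative design is to keep owned buffers (the default owned = true) in the same array: PHP then frees them when the holding object is destroyed, so no manual FFI::free() is needed, provided that object outlives the transcription call.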

That said, without being able to replicate the issue myself, it’s hard to pin this down completely or test a proper fix.

@DimaSmirnoff27
Author

Here is a copy of my project; it contains only a test command for transcription. The README describes how to start the Docker container and run the command.

https://gitlab.com/dima.smirnoff27/stt-converter
