TimestampGranularities is not working properly #59

Open
danielrahn opened this issue Sep 23, 2024 · 2 comments

Comments

@danielrahn

Currently only segment timestamps are returned; no other granularity works.
Other granularities can be set in the client and are supported by the API, but the multipart form data sent by the client appears to be incorrect.

package main

import (
	"context"
	"io"
	"log"
	"os"

	"github.com/openai/openai-go"
)

func main() {
	client := openai.NewClient()

	file, err := os.Open("audio.m4a")
	if err != nil {
		log.Fatalf("open audio file: %v", err)
	}
	defer file.Close()

	// Request both word-level and segment-level timestamps.
	transcription, err := client.Audio.Transcriptions.New(context.Background(), openai.AudioTranscriptionNewParams{
		File:           openai.F[io.Reader](file),
		Model:          openai.F(openai.AudioModelWhisper1),
		ResponseFormat: openai.F(openai.AudioTranscriptionNewParamsResponseFormatVerboseJSON),
		TimestampGranularities: openai.F([]openai.AudioTranscriptionNewParamsTimestampGranularity{
			openai.AudioTranscriptionNewParamsTimestampGranularityWord,
			openai.AudioTranscriptionNewParamsTimestampGranularitySegment,
		}),
	})
	if err != nil {
		log.Fatalf("create transcription: %v", err)
	}

	// Only "segments" comes back; "words" is missing from the response.
	log.Printf("words.IsMissing(): %t\n", transcription.JSON.ExtraFields["words"].IsMissing())
	// words.IsMissing(): true
	log.Printf("segments.IsMissing(): %t\n", transcription.JSON.ExtraFields["segments"].IsMissing())
	// segments.IsMissing(): false
}

Form data sent by the Go client (relevant parts):

--3c013d85d22e8db1236e9831f28251c99e205b0326f97909758cea53ad82
Content-Disposition: form-data; name="timestamp_granularities.0"

word
--3c013d85d22e8db1236e9831f28251c99e205b0326f97909758cea53ad82
Content-Disposition: form-data; name="timestamp_granularities.1"

segment
--3c013d85d22e8db1236e9831f28251c99e205b0326f97909758cea53ad82--

It looks like the client sets the wrong field names: timestamp_granularities.0 and timestamp_granularities.1 instead of timestamp_granularities[]. The equivalent request via curl works without problems.

curl https://api.openai.com/v1/audio/transcriptions \
  -H "Authorization: Bearer X" \
  -H "Content-Type: multipart/form-data" \
  -F file="@audio.mp4" \
  -F "timestamp_granularities[]=word" \
  -F "timestamp_granularities[]=segment" \
  -F model="whisper-1" \
  -F response_format="verbose_json"

Form data sent by curl (relevant parts):

--------------------------ivB13eW5FxAXMC0FGfs777
Content-Disposition: form-data; name="timestamp_granularities[]"

word
--------------------------ivB13eW5FxAXMC0FGfs777
Content-Disposition: form-data; name="timestamp_granularities[]"

segment
--------------------------ivB13eW5FxAXMC0FGfs777

Response (abbreviated):

{
  "task": "transcribe",
  "segments": [],
  "words": []
}
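
To illustrate the difference between the two requests, here is a minimal sketch that uses only the Go standard library (mime/multipart) to write the granularities as repeated timestamp_granularities[] fields, the way curl does, and then parses the body back to list the field names. The helper names buildGranularityForm and formFieldNames are hypothetical and are not part of openai-go; this is not the library's fix, only a demonstration of the field naming the API appears to expect.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime"
	"mime/multipart"
)

// buildGranularityForm (hypothetical helper) writes one form field per
// granularity, repeating the field name "timestamp_granularities[]" as the
// working curl request does.
func buildGranularityForm(granularities []string) (*bytes.Buffer, string, error) {
	body := &bytes.Buffer{}
	w := multipart.NewWriter(body)
	for _, g := range granularities {
		if err := w.WriteField("timestamp_granularities[]", g); err != nil {
			return nil, "", err
		}
	}
	// Close writes the terminating boundary.
	if err := w.Close(); err != nil {
		return nil, "", err
	}
	return body, w.FormDataContentType(), nil
}

// formFieldNames (hypothetical helper) parses a multipart body and returns
// the form field names in the order they appear.
func formFieldNames(body io.Reader, contentType string) ([]string, error) {
	_, params, err := mime.ParseMediaType(contentType)
	if err != nil {
		return nil, err
	}
	r := multipart.NewReader(body, params["boundary"])
	var names []string
	for {
		part, err := r.NextPart()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		names = append(names, part.FormName())
	}
}

func main() {
	body, contentType, err := buildGranularityForm([]string{"word", "segment"})
	if err != nil {
		panic(err)
	}
	names, err := formFieldNames(body, contentType)
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```

Running formFieldNames against a body captured from the Go client would make the timestamp_granularities.0 / timestamp_granularities.1 naming visible in the same way.
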
@RobertCraigie
Collaborator

Thanks for the detailed report, we're investigating.

@Mau-MD

Mau-MD commented Dec 17, 2024

+1. The following snippet returns only segment timestamps, not word-level ones. The same request via curl works fine.

	transcriptionParams := openai.AudioTranscriptionNewParams{
		File:           openai.F(reader),
		Model:          openai.F(openai.AudioModelWhisper1),
		Language:       openai.F("en"),
		Prompt:         openai.F(prompt),
		ResponseFormat: openai.F(openai.AudioResponseFormatVerboseJSON),
		TimestampGranularities: openai.F([]openai.AudioTranscriptionNewParamsTimestampGranularity{
			openai.AudioTranscriptionNewParamsTimestampGranularityWord,
		}),
	}
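
One way to confirm which granularities actually came back is to decode the raw verbose_json body directly. The sketch below assumes the response carries a "words" array whose entries have "word", "start", and "end" keys; the type and function names are hypothetical, and the sample payloads are made up for illustration rather than taken from a real response.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// wordTimestamp mirrors the assumed shape of one entry in "words".
type wordTimestamp struct {
	Word  string  `json:"word"`
	Start float64 `json:"start"`
	End   float64 `json:"end"`
}

// verboseTranscription holds only the fields needed for this check.
type verboseTranscription struct {
	Words    []wordTimestamp   `json:"words"`
	Segments []json.RawMessage `json:"segments"`
}

// hasWordTimestamps (hypothetical helper) reports whether the raw
// verbose_json body contains at least one word-level timestamp.
func hasWordTimestamps(raw []byte) (bool, error) {
	var t verboseTranscription
	if err := json.Unmarshal(raw, &t); err != nil {
		return false, err
	}
	return len(t.Words) > 0, nil
}

func main() {
	// Made-up payloads: one with word timestamps, one with segments only.
	withWords := []byte(`{"words":[{"word":"hi","start":0,"end":0.5}],"segments":[]}`)
	segmentsOnly := []byte(`{"segments":[{}]}`)

	for _, raw := range [][]byte{withWords, segmentsOnly} {
		ok, err := hasWordTimestamps(raw)
		if err != nil {
			panic(err)
		}
		fmt.Println("has word timestamps:", ok)
	}
}
```
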
