Add auto transcription script transcribe
marcuswhybrow committed Apr 12, 2024
1 parent 6caeab7 commit 6cb6aeb
Showing 2 changed files with 52 additions and 58 deletions.
62 changes: 4 additions & 58 deletions README.md
@@ -76,66 +76,12 @@ archival purposes, and portability to other projects.

# AI Transcription

Perhaps a fully automated process will be forthcoming, but for now I'm
manually running commands to transcribe audio files with AI, and copying the
result by hand into existing markdown files inside of `./assets/todo/`. The
following commands are all available in the nix dev shell (which you can enter
using `nix develop` if you're not using direnv).
`flake.nix` packages a `bash` script named `transcribe`. It downloads the source audio of any file in `./assets`, transcribes it, then updates the asset with the transcription, and updates the frontmatter data to reflect this change.

First I pick a file from `./assets/todo` that doesn't have a transcript. Say
the filename begins with the date 2022-02-02. I copy the URL from its
frontmatter key `source.url` and use `yt-dlp` to download the audio stream
and output the result to a file with that date as its name:
1. Argument #1 is the markdown file to transcribe and update.
2. Argument #2 is your name, to log in the asset's metadata.

```bash
yt-dlp -x "https://website.com/some-video-or-audio-file-url" -o 2022-02-02
nix run github:marcuswhybrow/ray-peat-rodeo#transcribe -- ./assets/todo/2024-10-12-example.md "Marcus Whybrow"
```

Sometimes the output file will be called `2022-02-02.opus` or some other
extension, sometimes it will have no extension. Let's assume it's `.opus`.

I then ask Whisper AI to transcribe the audio file and output a JSON file
describing the results. I believe it's faster to tell Whisper it's an English
language conversation:

```bash
whisper --language English --output_format json 2022-02-02.opus
```

This takes a while, and a great while on old laptops. But once it's done you
should have a file in the same directory called `2022-02-02.json`. Whisper has
many output formats, but I've chosen JSON for its flexibility in the next step.

The closest format whisper can output is `txt`. But this has no timestamp data
in the output text. I'd like to pepper in timestamps (which whisper knows
about) every minute or so into the resulting output. And I want them to adhere
to our custom markdown extension format: `[h:mm:ss]` e.g. `[1:23:45]`. The
square brackets are important.

So I call a custom tool written for this project that reads the JSON, outputting
text in the way I've just described. I use shell redirection (`>>`) to append
that result to the end of the markdown file I started with:

```bash
whisper-json2md 2022-02-02.json >> ./assets/todo/2022-02-02-example.md
```
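
The `[h:mm:ss]` format and the roughly one-per-minute spacing described above can be sketched in shell. This is only an illustration under assumptions: `format_timestamp` is a hypothetical helper, the segment start times are made up, and the real `whisper-json2md` tool (a Go program) may work differently.

```shell
# Hypothetical helper: format seconds as [h:mm:ss]. Any fraction is dropped.
# The body runs in a subshell so its variables do not leak to the caller.
format_timestamp() (
  total=${1%.*}        # "5025.48" -> "5025"
  total=${total:-0}    # ".48" -> "0"
  printf '[%d:%02d:%02d]\n' \
    $(( total / 3600 )) $(( total % 3600 / 60 )) $(( total % 60 ))
)

format_timestamp 5025    # prints [1:23:45]

# Emit a timestamp only when at least 60 seconds have passed since the last one.
last=-60
for start in 0.0 12.4 61.2 75.0 130.9; do    # made-up segment start times
  secs=${start%.*}
  if [ $(( secs - last )) -ge 60 ]; then
    format_timestamp "$start"
    last=$secs
  fi
done
```

Shell arithmetic is integer-only, which is why the sketch truncates the fractional seconds before formatting.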

Then I have a look at this markdown file, and check it out in the browser
(which would be https://localhost:8000/example in this example).

Finally I update the frontmatter to reflect the new state of this asset.
I add the following:

```yaml
transcription:
  date: 2024-04-10 # today's date
  author: Whisper AI
  kind: auto-generated

added:
  date: 2024-04-10
  author: Marcus Whybrow # or your name instead
```
When the website is deployed this metadata makes sure everything looks right,
and the appropriate descriptions and details are available.
48 changes: 48 additions & 0 deletions flake.nix
@@ -50,6 +50,14 @@
cp -r ./internal/assets/* ./build/assets
mv ./build $out
'';

meta = {
  description = "Takes a Whisper AI JSON file as its first argument & outputs markdown to stdout, appropriate to append to a Ray Peat Rodeo markdown file.";
  homepage = "https://github.com/marcuswhybrow/ray-peat-rodeo";
  maintainers = [
    "Marcus Whybrow <[email protected]>"
  ];
};
};

whisper-json2md = pkgs.buildGoApplication {
@@ -65,6 +73,42 @@
'';
};

transcribe = pkgs.writeScriptBin "transcribe" ''
  set -o xtrace
  asset_path="$1"
  author="$2"
  asset_name=$(basename "$asset_path")
  source_url=$(${pkgs.yq-go}/bin/yq ".source.url | select(.)" "$asset_path")
  tmp_dir_audio=$(mktemp --directory)
  audio_path="$tmp_dir_audio/$asset_name"
  ${pkgs.yt-dlp}/bin/yt-dlp -x "$source_url" -o "$audio_path"
  audio_name_actual=$(ls -AU "$tmp_dir_audio" | head -1)
  audio_path_actual="$tmp_dir_audio/$audio_name_actual"
  ls "$tmp_dir_audio"
  tmp_dir_json=$(mktemp --directory)
  ${pkgs.openai-whisper}/bin/whisper --language English --output_format json --output_dir "$tmp_dir_json" "$audio_path_actual"
  json_name=$(ls -AU "$tmp_dir_json" | head -1)
  json_path="$tmp_dir_json/$json_name"
  today=$(date +"%Y-%m-%d")
  yq="${pkgs.yq-go}/bin/yq --front-matter process --inplace"
  $yq ".transcription.date = \"$today\"" "$asset_path"
  $yq ".transcription.author = \"Whisper AI\"" "$asset_path"
  $yq ".transcription.kind = \"auto-generated\"" "$asset_path"
  $yq ".added.author = \"$author\"" "$asset_path"
  $yq ".added.date = \"$today\"" "$asset_path"
  ${inputs.self.packages.x86_64-linux.whisper-json2md}/bin/whisper-json2md "$json_path" >> "$asset_path"
  # rm -r "$tmp_dir_audio"
  # rm -r "$tmp_dir_json"
'';

default = build;
};

@@ -131,6 +175,10 @@

# Custom tool to convert Whisper JSON output to our markdown format
inputs.self.packages.x86_64-linux.whisper-json2md

# Convenience bash script using yt-dlp, whisper & whisper-json2md to
# transcribe and update assets with a `source.url` in the frontmatter.
inputs.self.packages.x86_64-linux.transcribe
];
};
});
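
The `transcribe` script locates the yt-dlp and Whisper outputs with `ls -AU ... | head -1`, since the output file name is not known in advance. As a hedged alternative, here is a minimal sketch of the same lookup using a shell glob. `first_file_in` is a hypothetical helper, not part of this commit. Unlike `ls -A`, a plain glob skips dotfiles, and it returns entries in sorted order rather than directory order.

```shell
# Hypothetical helper (not in the commit): print the first entry in a directory.
first_file_in() {
  for candidate in "$1"/*; do
    # In an empty directory the glob stays literal, so confirm the path exists.
    if [ -e "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Usage with a throwaway directory standing in for the yt-dlp output directory.
demo_dir=$(mktemp -d)
touch "$demo_dir/2022-02-02.opus"
first_file_in "$demo_dir"
rm -r "$demo_dir"
```

The non-zero return on an empty directory gives the caller a way to stop early if the download produced nothing.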