Working end-to-end transcription integrations. #1774


Merged · 10 commits · Apr 22, 2025

Conversation

boazsender (Contributor):

Fixes from webhook debugging work. Follow this test plan to verify:

Local testing guide

The following instructions will let you test the scraping half of the implementation. The webhook half of the implementation needs to be deployed to the development project for an end-to-end test.

To perform this test you will need:

  • An AssemblyAI API key in your local environment (Ask Nathan Sanders or Matt Victor for this)
  • IAM permissions for the maple dev project in the Google Cloud Console (Ask Matt King for this)
  • A logged in and working firebase-admin session
  • A local copy of github.com/codeforboston/maple with installed deps

Instructions

  1. Set up an ngrok tunnel for the local copy of the transcription webhook
    1. Get an account at https://dashboard.ngrok.com
    2. Connect your ngrok account: ngrok config add-authtoken your-auth-token
    3. Start the local function emulator with yarn run dev:functions
    4. Open an ngrok tunnel: ngrok http http://localhost:5001
  2. Find a hearing that has a video within the last 8 days. In this example we’ll use hearing-5091.
  3. Modify functions/src/events/scrapeEvents.ts to create a local testing state (a sketch of these edits appears after these instructions)
    1. Hardcode const EventId = 5091 into the beginning of getEvent
    2. Change the webhook endpoint in the assembly api call to the ngrok endpoint: https://put-your-ngrok-tunnel-id-here.ngrok-free.app/demo-dtp/us-central1/scrapeHearings-0
  4. Run the firebase admin console on the dev project with yarn firebase-admin -e dev console
    1. From inside the admin repl, run the following commands with the id of your hearing (using 5091 in this example):
      1. const resp = await db.collection("events").doc("hearing-5091").get()
      2. const data = resp.data()
      3. JSON.stringify(data)
    2. Copy the output of JSON.stringify(data) without the surrounding quotes (so you have a JS object) and save it somewhere for the next step
    3. Exit the repl with .exit
  5. Start your local firebase emulator with yarn dev:functions
    1. While that emulator is running, open a second shell
      1. Connect to the local firebase admin with yarn firebase-admin -e local console
      2. Load the data into your local instance with await db.collection('events').doc('hearing-5091').set(data) where data is the js object on your clipboard. It should look something like await db.collection('events').doc('hearing-5091').set({"startsAt":{"_seconds":1741881600,"_nanoseconds":0},"id":"hearing-5091","type":"hearing","content":{"EventDate":"2025-03-13T12:00:00","HearingAgendas":[{"DocumentsInAgenda":[],"Topic":"Annual Hearing on a Potential Modification of the Health Care Cost Growth Benchmark","StartTime":"2025-03-13T12:00:00","EndTime":"2025-03-13T15:00:00"}],"RescheduledHearing":null,"StartTime":"2025-03-13T12:00:00","EventId":5091,"HearingHost":{"Details":"http://malegislature.gov/api/GeneralCourts/194/Committees/J24","CommitteeCode":"J24","GeneralCourtNumber":194},"Name":"Joint Committee on Health Care Financing","Description":"Joint Committee on Health Care Financing & Health Policy Commission Public Hearing on a Potential Modification of the CY 2026 Health Care Cost Growth Benchmark","Location":{"AddressLine2":null,"AddressLine1":"24 Beacon Street","State":"MA","ZipCode":"02133","City":"Boston","LocationName":"Gardner Auditorium"},"Status":"Completed"},"fetchedAt":{"_seconds":1741967707,"_nanoseconds":728000000}})
      3. Exit the repl with .exit
    2. In the local emulator UI, edit hearing-5091 (http://localhost:3010/firestore/data/events/hearing-5091)
      1. Delete startsAt and fetchedAt, and replace them with timestamp-type fields of the same property names, valued at the current time.
    3. Back in the yarn dev:functions emulator, run scrapeHearings()
    4. Check the local hearing-5091 document with
      1. const resp = await db.collection("events").doc("hearing-5091").get()
      2. resp.data()
    5. Look for the following props on the hearing-5091 doc:
      1. videoURL: location of the video hosted by the MA Legislature
      2. videoFetchedAt: timestamp of when the video url was fetched
      3. videoAssemblyId: id of the transcript in the Assembly SaaS API
  6. Wait for the assembly job to finish (5-10 minutes usually)
  7. Check the transcriptions collection and look for a new transcript for hearing-5091:
    1. https://console.cloud.google.com/firestore/databases/-default-/data/panel/transcriptions/17c91397-c023-4f28-a621-4cef45c70749?authuser=1&project=digital-testimony-dev
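
For reference, here is a sketch of the step 3 modifications to functions/src/events/scrapeEvents.ts. The getEvent function and the assembly.transcripts.submit call are from this PR; the surrounding structure is paraphrased and may not match the actual file exactly:

// functions/src/events/scrapeEvents.ts - local testing state only, do not commit
async function getEvent(/* ... */) {
  const EventId = 5091 // step 3.1: hardcode the test hearing

  // ... existing scraping logic ...

  // step 3.2: point the AssemblyAI webhook at your ngrok tunnel
  const transcript = await assembly.transcripts.submit({
    audio_url: maybeVideoURL,
    webhook_url:
      "https://put-your-ngrok-tunnel-id-here.ngrok-free.app/demo-dtp/us-central1/scrapeHearings-0"
  })

  // ...
}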


Mephistic (Collaborator) left a comment:

There's still an open question around what transcription data we should actually store - that doesn't necessarily have to be part of this PR, but we should get at least a tentative answer/plan on that before we run any backfill

@@ -158,20 +158,6 @@ class HearingScraper extends EventScraper<HearingListItem, Hearing> {
const hearing = Hearing.check(eventData)
Mephistic (Collaborator):

Double-checking here - how does this handle the first time a hearing is scraped? If there is no hearing in the db, eventData would be undefined - does Hearing.check blow up (and prevent us from returning the non-transcription related event data)?

boazsender (Contributor, Author):

I'm not sure. I think I was drawing inspiration from this prior line:

const content = HearingContent.check(await api.getHearing(EventId))

How would you want to see this done differently?

I can also try some things out if nothing comes to mind.

Mephistic (Collaborator):

I think the chief value of that HearingContent.check there is about guarding against changes to the external Mass Legislature API (i.e. if the data they return deviates too much from what we expect, don't risk polluting our database - instead, throw an error to force a dev to investigate).

For our purposes here:

If eventInDb.exists() is false (or equivalently AFAIK, if eventData is undefined), it should mean that we are scraping the present hearing for the first time. This means that we have definitely not scraped the hearing's video yet and it's valid to try scraping videos for it. The simplest, most robust solution is to just add this case to the logic of when to scrape video:

Currently, we scrape video only if:

  • there is a pre-existing hearing with this EventId in our DB that does not have a videoUrl set
  • AND
  • the hearing in question is within our scraping time window

We should instead scrape video only if:

  • EITHER (we have not scraped this hearing before) OR (there is a pre-existing hearing in our DB with this EventId that does not have a videoUrl set)
  • AND
  • the hearing in question is within our scraping time window
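
In code, that revised condition might look something like this (a hedged sketch; the names come from this discussion and the real shapes may differ):

const notScrapedBefore = !eventInDb.exists
const existsWithoutVideo = eventInDb.exists && !eventData?.videoURL

// startsAt: the hearing's start time, taken from the freshly scraped content
const shouldScrapeVideo =
  (notScrapedBefore || existsWithoutVideo) && withinCutoff(startsAt)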

boazsender (Contributor, Author):

Updating this logic. Leaving the .check in there since that was there prior, and I think is still needed for the first run on a hearing. Do you agree?

boazsender (Contributor, Author):

wait nevermind, I conflated two different .checks. Will leave the first and remove the second.

@@ -191,25 +177,32 @@ class HearingScraper extends EventScraper<HearingListItem, Hearing> {
maybeVideoURL = firstVideoSource.src

transcript = await assembly.transcripts.submit({
Mephistic (Collaborator):

Now that we've worked out the kinks here, could we refactor getEvent a bit?

I feel like there are at least two functions here that could be broken out to make this more readable:

  • shouldScrapeVideo (handles fetching any previous event + checking for existingVideo + the date cutoff logic, returns a boolean yes/no)
  • getHearingVideoUrl (does the JSDOM wrangling, returns a string | undefined with the videoUrl)
  • And maybe submitTranscription (takes a videoUrl, sends the AssemblyAI request + saves the webhookAuth in the DB, returns transcriptId)

Which would leave the new code as something like:

if (shouldScrapeVideo(EventId)) {
  const maybeVideoUrl = getHearingVideoUrl(EventId)
  if (maybeVideoUrl) {
      const transcriptId = await submitTranscription(maybeVideoUrl)
      // add video/transcription data to Event
  }
}

boazsender (Contributor, Author):

Sure thing, I can take a pass at this.

})

await db
Mephistic (Collaborator):

Ideally, I think we shouldn't actually save the hearing event to Firestore in getEvent - the abstract class already handles that for us. The webhookAuth is in a subcollection (which wouldn't be caught by the abstract class) and is the exception to that rule - this should be fine since Firestore lets you write to subcollections of documents that don't exist yet.

boazsender (Contributor, Author):

What would you like me to do here differently?

Mephistic (Collaborator):

When we scrape video, we can simply conditionally add the relevant video fields to the return object of getEvent (those fields being videoURL, videoFetchedAt, and videoAssemblyId). So long as they're present on the returned object, the code in the abstract class will handle the work of saving the events to the DB. See the line writer.set(db.doc(`/events/${event.id}`), event, { merge: true }) in run() at the top of this file.

As it stands, I think this would save the event twice - first in getEvent() and then again in run().
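
A rough sketch of the suggested shape (hedged; videoURL/videoFetchedAt/videoAssemblyId are the fields named in the test plan, the rest is illustrative):

// inside getEvent(), after scraping
const event = { /* ...the usual scraped hearing fields... */ } as Hearing

if (maybeVideoUrl && transcript) {
  event.videoURL = maybeVideoUrl
  event.videoFetchedAt = Timestamp.now() // or new Date(); see the Timestamp discussion below
  event.videoAssemblyId = transcript.id
}

return event // run() persists it via writer.set(..., { merge: true })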

boazsender (Contributor, Author):

Ok I think I understand now, let me know if I got it right in my next commit

if (tokenInDbData) {
return hashedToken === tokenInDbData.videoAssemblyWebhookToken
}
return false
}
)
console.log("authenticatedEventsInDb", authenticatedEventsInDb)

if (authenticatedEventsInDb) {
try {
await db
Mephistic (Collaborator):

We should start thinking about what parts of the response we actually want to save here - there are a lot of fields in the example response you posted in Slack, but the only ones that look potentially relevant are:

  • id
  • text
  • utterances
  • words
  • audio_url (though we already have it on the hearing event, it would still be helpful here)
  • audio_duration (not sure what we'd need it for though, but it's conceivably useful)
  • confidence

I'm most interested in text/utterances/words - since those will take up >99% of the data size (and words likely takes up >90%). We should chat with Matt V and design about the desired functionality, but I don't think we need all three. I goofed around with some of the example data: it looks like utterances gives us mostly sensible, mostly accurate divisions of the content, and text is good to have as a fallback.

IMO words is likely unnecessary - I believe it would only be useful if we either:

  • Find ourselves dissatisfied with the breakpoints/speaker split inherent to utterances and want to devise our own
  • Want to visually identify specific words that Assembly flagged as low confidence

Want to get @mvictor55's input on the desired functionality here before dropping the axe though (and @mertbagt's take on what we actually need for the front-end). If we do cut words, we should also probably cut the words array in utterances.

boazsender (Contributor, Author):

Yes, would love to know what you think @mvictor55 and @mertbagt. I can proceed accordingly based on what you three align on.

@@ -48,7 +52,7 @@ export const transcription = functions.https.onRequest(async (req, res) => {

authenticatedEventsInDb.forEach(async d => {
Mephistic (Collaborator):

Just double-checking - is it possible that the firebase function will exit before the async function in the forEach completes?

If so, it may be worth switching to a transaction for the writes here - something like

const batch = db.batch()

batch.set(
  db.collection("transcriptions").doc(transcript.id),
  { _timestamp: new Date(), ...transcript }
)

authenticatedEventsInDb.forEach(doc => {
  batch.update(doc.ref, {["x-maple-webhook"]: null})
})

await batch.commit()

boazsender (Contributor, Author):

Smart, will do.

return false
}
if (!eventData.videoFetchedAt) {
return withinCutoff(new Date(eventData.StartTime))
Mephistic (Collaborator):

Why did this change from Hearing.check(eventData).startsAt.toDate()? I don't think eventData has a StartTime field.

boazsender (Contributor, Author):

I think I misunderstood an earlier review comment. I'll change this in the next commit.

if (req.body.status === "completed") {
const transcript = await assembly.transcripts.get(req.body.transcript_id)
if (transcript && transcript.webhook_auth) {
const maybeEventInDb = await db
.collection("events")
.where("videoAssemblyId", "==", transcript.id)
.get()

if (maybeEventInDb.docs.length) {
const authenticatedEventsInDb = maybeEventInDb.docs.filter(
Mephistic (Collaborator):

Just double-checking here - does this successfully filter out requests where the tokens don't match?

Now that I think about it, I would expect the async filter function to return a Promise that resolves to a boolean instead of a boolean - and since the Promise itself would always be truthy, this check would always return true and never filter anything

boazsender (Contributor, Author):

I believe so based on return hashedToken === tokenInDbData.videoAssemblyWebhookToken

Mephistic (Collaborator):

As long as we've manually tested with a mismatched token and ensured the request was rejected, I'll drop this - but I still believe the async filter may be an issue - sample code below, see also https://stackoverflow.com/a/71600833:

const test = [1, 2, 3, 4]

// sync filter - result is [2, 4]
test.filter(e => e % 2 == 0)

// async filter - result is [1, 2, 3, 4]
test.filter(async e => e % 2 == 0)

text,
timestamp: new Date(),
audio_url,
words
Mephistic (Collaborator):

As per our discussion the other day, we don't want words in the main transcriptions document - just in the subcollection.

boazsender (Contributor, Author):

Oy of course. Fixing in the next commit.

transcriptionInDb.set({
id,
text,
timestamp: new Date(),
Mephistic (Collaborator):

Should this be _timestamp to match the timestamp below? If they're actually different timestamps, could we name them accordingly?

Mephistic (Collaborator):

Also, come to think of it, should this be a Timestamp instead of a Date?
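
For reference, a minimal sketch of the distinction (assuming the firebase-admin SDK; Firestore converts a Date to a Timestamp on write anyway, but writing a Timestamp keeps the stored type explicit):

import { Timestamp } from "firebase-admin/firestore"

const createdAt = Timestamp.now()               // Firestore-native timestamp
const fromDate = Timestamp.fromDate(new Date()) // equivalent conversion from a Date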

boazsender (Contributor, Author):

Sure thing, updating in next commit.

words
})

transcriptionInDb
Mephistic (Collaborator):

I think these .set() calls return Promises, so we should either await them all or write them in a transaction batch.

I don't have a strong opinion on which - firestore shouldn't need the root document to exist to create the subcollections, and worst case if one of the writes fails, we'll need to re-run the webhook anyway and that will overwrite any partially saved values.
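
A hedged sketch of the await option, mirroring the document shapes in this diff (the second write's payload is paraphrased):

await transcriptionInDb.set({ id, text, timestamp: new Date(), audio_url, words })

await transcriptionInDb
  .collection("timestamps")
  .doc("utterances")
  .set({ utterances })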

boazsender (Contributor, Author):

adding awaits in next commit


transcriptionInDb
.collection("timestamps")
.doc("utterances")
Mephistic (Collaborator):

Hmm... Based on our discussion the other day, I was expecting two separate sub-collections (utterances and words) where each doc represents one utterance/word. I'm not locked in on that approach necessarily - just wanted to clarify.

I am onboard with storing all of the utterances as one doc for now given how we'll be using it (we can always re-structure without re-scraping later if we find we need the flexibility).

I do have two concerns here, especially re: words:

  • Indexing - I'd rather not index anything with the words since we're not planning to query on them. I don't think Firestore gives us the ability to index only a subset of the documents in a collection.
    • Given our planned initial query pattern (e.g. grab the only utterances doc for a given transcription), I'm fine with just ensuring we don't index anything on utterances/words for launch (which may require adding single-field exemptions) - we can always go back and re-index if we change our query patterns. If that's not in this PR, it should come before we backfill existing hearings.
  • Document Size - there's a stated max size of 1 MB for a Firestore document, and I'm worried stuffing all of the words into one document could hit that limit.
    • For reference, the JSON of the hearing at https://prodarchivevideo.blob.core.windows.net/video/2024/Hearings/Joint/June/26_1.mp4 is ~1.1MB with both words and utterances.words included, and ~110KB without either words array included for a ~45 minute video. Reasonably assuming that splits out to 500KB for words, that could become an issue for hearings over ~90 minutes.
    • Have we tested this with a much longer (3/4+ hour) hearing? It's possible Firestore compacts this by default and this is a false alarm, but I'd rather know now so we can adjust how we store words.
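
For reference, the single-field exemption mentioned above could look something like this in firestore.indexes.json (a hedged sketch; the collection group and field path are illustrative):

{
  "indexes": [],
  "fieldOverrides": [
    {
      "collectionGroup": "utterances",
      "fieldPath": "start",
      "indexes": []
    }
  ]
}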

boazsender (Contributor, Author):

Happy to make a doc for each word and utterance. Is there a more clever/efficient/Firestore-y way that you would suggest, other than iterating over the arrays I already have and making a doc for each object in that loop?

Mephistic (Collaborator):

Not really AFAIK - one caveat would be that we should probably use the BulkWriter (since some of these videos will have dozens of utterances and tens of thousands of words). I don't think we should need sequential ids on these subcollection docs since the startTimestamp should provide a natural ordering, but I'll leave that to your best judgment.

If making words work proves problematic in any way, give me a shout in Slack and we can try another solution (whether trying a different data structure, just dumping the words json into storage and dealing with it later, or ignoring words and just re-scraping if we find we need them).
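
A minimal BulkWriter sketch under those assumptions (one doc per utterance, autogenerated ids; names follow this thread and may differ in the actual code):

const writer = db.bulkWriter()

utterances.forEach(utterance => {
  const ref = db
    .collection("transcriptions")
    .doc(transcript.id)
    .collection("utterances")
    .doc() // autogenerated id; order by `start` when querying
  writer.set(ref, utterance)
})

await writer.close() // flushes and awaits every pending write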

boazsender (Contributor, Author):

I switched to one doc per utterance and word. I'm not sure if there is a more efficient way to do all the db transactions together, so I left everything else as is.


}

const batch = db.batch()
batch.set(db.collection("transcriptions").doc(transcript.id), {
Mephistic (Collaborator):

Given that we've already saved the transcription once at this point, can this just be a batch.update to update the _timestamp? (And come to think of it, now that we renamed the other timestamp to createdAt, should this be updatedAt?)

boazsender (Contributor, Author):

RE the async filter, I haven't tested this with a mismatched token, but I see your point. Seems like Firestore doesn't have a findUnique API. Any suggestions for this? I'm worried about working off of maybeEventInDb[0], but maybe I'm overthinking it? It's not like we would end up with two docs with the same videoAssemblyId.

boazsender (Contributor, Author):

RE the timestamp, I think we can actually just delete that batch. I don't think we need both a createdAt and a _timestamp. My bad for leaving it in.

Mephistic (Collaborator):

Yeah, I wouldn't expect the Firestore API itself to help us with the async filter issue.

It looks like the easiest way to do that is to first map the events to something like a Promise<Event|null> that resolves to the event when it's valid and resolves to null when the event is invalid, then filter out the nulls afterwards.

Then you can do something like:

async function getValidEvent(event: Event): Promise<Event|null> {
  // whatever async/await logic
}

const authenticatedEvents = (await Promise.all(events.map(getValidEvent))).filter(e => e)

I also wouldn't turn down something more iterative like:

const authenticatedEvents = []

docs.forEach(async doc => {
  const otherDoc = await getOtherDoc(whatever)
  
  if (isValid(doc, otherDoc)) {
    authenticatedEvents.push(doc)
  }
})

boazsender (Contributor, Author):

I think the iterative approach is more readable. I'll do that.

audio_url
})

if (utterances) {
Mephistic (Collaborator):

Given the expected scale of utterances (i.e. several dozen per document), I believe this is fine.

Firestore does generally caution against using sequential ids because it can lead to hotspotting - but given that we'll be disabling indexes for both utterances and words subcollections and our only initial query plan is to "fetchAll" the utterances subcollection, I don't think it makes much of a difference here.

If this does end up causing an issue for utterances, we can always switch to an autogenerated documentId and just use ordering on start in queries to maintain sequential order.
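
e.g. a hedged sketch of that query:

const snap = await db
  .collection("transcriptions")
  .doc(transcriptId)
  .collection("utterances")
  .orderBy("start")
  .get()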

Mephistic (Collaborator):

Come to think of it, this would have a potential problem if we ever re-transcribe the same document (e.g. with different settings) - the utterance divisions and start times could change and would only partially overwrite any existing data.

We don't have any plans for that right now - I just want to note a caveat that we need to delete any existing transcriptions/utterances/words in the DB before re-transcribing (should that ever prove necessary).

boazsender (Contributor, Author):

I'm just going to go ahead and preemptively change this. If I just remove ${utterance.start} from the doc path, that should hand over the ID generation to Firestore, correct?

Mephistic (Collaborator):

You can test in the firebase-admin repl to be sure, but I think you might also need the more explicit method for a subcollection autogenerated id: e.g.

 db.collection("transcriptions").doc(`${transcript.id}`).collection("utterances").doc()

boazsender (Contributor, Author):

That makes sense. Reading this line of my last commit back and realizing it wouldn't work.

await writer.close()
}

if (words) {
Mephistic (Collaborator):

Given that there are tens of thousands of words per document and we're using a sequential doc id, I'm worried this could be problematically slow for a webhook.

e.g.:

  • Firestore default is 500 writes / second for new collections or those with sequential indexes.
  • ~50,000 words in an average hearing / 500 = ~100 seconds to write all the word docs to firestore.
  • The AssemblyAI webhook response timeout is 10 seconds, and they retry up to 10 times before failing permanently.
  • Therefore, we could potentially see a ton of repeated firestore writes before failing anyway.

Switching to auto-generated doc ids might help to avoid the sequential index issue (there's no documented writes/second cap on those), and we could also partition multiple words into one doc to cut down on doc count (e.g. 1000 words per doc as an in-between for 1 word per doc and all words in one doc).
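
A hedged sketch of that partitioning idea (1000 words per doc, autogenerated ids; all names illustrative):

const CHUNK_SIZE = 1000
const writer = db.bulkWriter()

for (let i = 0; i < words.length; i += CHUNK_SIZE) {
  const ref = db
    .collection("transcriptions")
    .doc(transcript.id)
    .collection("words")
    .doc() // autogenerated id avoids the sequential-index write cap
  writer.set(ref, { offset: i, words: words.slice(i, i + CHUNK_SIZE) })
}

await writer.close()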

Mephistic (Collaborator):

That said, this is getting more complicated than it's worth for a "nice-to-have" feature. How does the following sound for a rollout plan?

  • Don't save words to the DB at all in this PR - only save the transcription metadata and the utterances
  • Merge + Deploy this and have the flow run against a small subset of hearings on the Dev environment (IMO the default behavior is safe since it should only scrape recent hearings and there are only a few of those at any given time).
  • While we get a few hearings to run through the flow, I can tweak my hacky test front-end for transcriptions to support the new utterances subcollection format.
  • We'll have Matt V + a few others look at the transcribed hearings and confirm that we only need utterances
    • I suspect the visual UI will help get a better feel for the accuracy of the diarization than we've previously gotten by skimming example data.

At this point, we'll either have confirmation that we're fine with just utterances, or will have a better sense for how to organize words to get some use out of it.

  • If we're fine with just utterances, we can just run the backfill to transcribe all past hearings
  • If we find utterances not good enough for V1, we should have a quick chat
    • V1 isn't supposed to make promises around transcription accuracy, so anything complicated is likely out of scope here, but it's possible we'll be fine with just the steps I mentioned in the above comments (to effectively hack together pseudo-utterances out of the words data).

This approach should let us unblock this PR and get a better sense for how well our current approach plays with real data. Thoughts?

boazsender (Contributor, Author):

Sounds very sensible to me, I was starting to think the same thing. Pulling words out in the next commit.

const authenticatedEventIds = [] as string[]
const hashedToken = sha256(String(req.headers["x-maple-webhook"]))

maybeEventsInDb.docs.forEach(async doc => {
Mephistic (Collaborator):

Sorry, my bad here based on the last comment - given that the forEach function is async, I don't think these calls are guaranteed to complete before moving to the authenticatedEventIds.length === 1 check.

The map-predicate approach is likely the better option because it awaits on all the async checks before proceeding - though a less async iterative approach like the following might still work:

const authenticatedEventIds = []
for (const docIndex in maybeEventsInDb.docs) {
  const doc = maybeEventsInDb.docs[docIndex]
  const tokenDocInDb = await db.whatever()
 
  if (isAuthed(doc, tokenDocInDb)) {
    authenticatedEventIds.push(doc.id)
  }
}

// No extra async, so authenticatedEventIds is now guaranteed to have all ids

boazsender (Contributor, Author):

for...in is more like what I already wrote, so going with that.

// Delete the hashed webhook auth token from our db now that
// we're done.
authenticatedEventIds.forEach(async docId => {
await db.doc(docId).set({ ["x-maple-webhook"]: null })
Mephistic (Collaborator):

Come to think of it, are we actually saving this x-maple-webhook key to Firestore such that this would null it? I thought we stored the token in the private sub-collection when we first submit the transcription request (and we're now a lot more restrictive in the fields we actually save to the main transcription document).

boazsender (Contributor, Author):

oops, good catch, let me fix that.


// Delete the hashed webhook auth token from our db now that
// we're done.
authenticatedEventIds.forEach(async docId => {
Mephistic (Collaborator):

nit: I think this async forEach could run into the same problem as the other one that was just fixed - there's no guarantee the async function will complete before the request ends.

Mephistic (Collaborator) left a comment:

LGTM!

I think there might still be some useful improvements to be made around the error handling in the webhook (e.g. different response codes if auth fails), but I don't think it's worth blocking this PR.

Mephistic merged commit b28e44d into codeforboston:main on Apr 22, 2025.
7 of 8 checks passed