Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/Luma Ai Video Generation Driver #1199

Draft
wants to merge 11 commits into
base: dev
Choose a base branch
from

Conversation

william-price01
Copy link
Contributor

@william-price01 william-price01 commented Sep 23, 2024

Describe your changes

Added Luma AI as a driver, tool, task, to generate videos.

Issue ticket number and link

Add Luma Ai as a driver

@william-price01 william-price01 changed the title Finished Video Generation Driver Feature/Luma Ai Video Generation Driver Sep 24, 2024
Comment on lines +13 to +16
value: The video binary data.
mime_type: The video MIME type.
resolution: The resolution of the video (e.g., 1920x1080).
duration: Duration of the video in seconds.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which of these do you feel confident you can have in the initial version (for example, it sounded like a target duration may not be something easy to get)? Are there other params that the providers expose that we should be taking into consideration? For example, I would anticipate framerate to be a popular parameter so that we can have everyone's favorite NTSC 59.94!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to somehow include most of the things Jason mentioned, I don't believe that there is a way to bring audio with these videos, so probably no subtitles for initial version.


@property
def mime_type(self) -> str:
return "video/mp4" # Or make this flexible based on the video format
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds like our loaders support .ogg and .webm, can we point to the same loc for both or is there a reason to keep them separate?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should already have the same mime-type umbrella problem for image/audio, if its not in the framework already, check with collin on the preferred approach.

Also, should this be an actual attribute instead of a @property?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@property is correct, just needs to be updated to something like this.

duration: Duration of the video in seconds.
"""

aspect_ratio: tuple[int, int] = field(default=(16, 9), kw_only=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

open question: do providers enumerate possible aspect ratios, or let you do weirdo things

Comment on lines +28 to +29
def to_text(self) -> str:
raise NotImplementedError("VideoArtifact cannot be converted to text.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

other not-so-texty artifacts generate a string that describes the parameters of the artifact, is that what we want here, too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was more of a testing method, while getting it to actually work.

Comment on lines +248 to +249
"DreamMachineVideoGenerationDriver",
"BaseVideoGenerationDriver",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

am I the only one at this damned company that likes my lists alphabetized? Not for this PR, but dang

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with james, but don't reorder this list in this PR. (Generally mixing refactoring or formatting with features makes it difficult for reviewers to see what is actually changing)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BaseVideoGenerationDriver should be placed before DreamMachineVideoGenerationDriver for circular dependency (and alphabetical) reasons.

Comment on lines +32 to +33
response = self.client.generations.create(prompt=prompt, **self.params)
generation = response
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of assigning to an alias, why not assign directly to generation?

Comment on lines +37 to +38
if not generation.id:
raise Exception("Generation ID not found in the response")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this in the while loop? Is this something that should be caught either before or after the dreaming begins?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, for testing purposes, pyright was giving me a hard time for not checking if the id exists.

Comment on lines +44 to +45
if not video_url:
raise Exception("Video URL not found in the generation response")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If no URL shows up, is that indicative of something bad, like it "completed" but was somehow a failure? Can we convey what the situation is to the user?

video_url = generation.assets.video
if not video_url:
raise Exception("Video URL not found in the generation response")
video_binary = self._download_video(video_url)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

downloading feels like it could wrong in a lot of other ways (retries, timeouts, etc.)

else:
raise Exception(f"Video generation failed with status: {status}")

def _download_video(self, video_url: str) -> bytes:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

videos are big and take a long time to come down. Do we...

  1. ...already have this functionality somewhere else in the framework?
  2. ...need to take into consideration fails on memory, disk, retries, timeout?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

id be fine with putting this functionality directly on VideoArtifact as a "lazy load" type feature. the alternative is creating a persistence driver for everything, which we dont have, but we have the ArtifactFileOutputMixin available.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that the Driver should probably not be the one to do this. I don't know about putting it in VideoArtifact either -- feels like too much responsibility.

In-fact, we probably shouldn't make any assumptions that the user wants us to download it. Maybe they're fine receiving a URL that they watch in their browser. Maybe this should fall onto a Loader?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be the same idea as a lazy load. VideoArtifact contains a reference to the video somewhere.

Comment on lines +25 to +26
def get_aspect_ratio(self) -> tuple[int, int]:
return self.aspect_ratio
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dont need this, aspect_ratio can be directly accessed

Comment on lines +28 to +29
def to_text(self) -> str:
raise NotImplementedError("VideoArtifact cannot be converted to text.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use an approach similar to AudioArtifact for this

Comment on lines +23 to +28
client: LumaAI = field(
default=Factory(
lambda self: import_optional_dependency("lumaai").LumaAI(auth_token=self.api_key), takes_self=True
),
kw_only=True,
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use the lazy_property approach for this

),
kw_only=True,
)
params: dict[str, Any] = field(default={}, kw_only=True, metadata={"serializable": True})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use factory=dict here, this will return a single mutable object which is no good

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is important

Comment on lines +35 to +41
while status in ["dreaming", "queued"]:
time.sleep(5)
if not generation.id:
raise Exception("Generation ID not found in the response")

generation = self.client.generations.get(generation.id)
status = generation.state
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try to use the tenacity library for this here. i know this is the same approach as the cloud driver but we should try to clean this pattern up

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually not sure about this. Retrying (tenacity) feels different than polling (what we're doing here).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tenacity has built in conditions for if_condition or if_not_condition. its not just for retrying on errors

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't doubt it can be used for non-exception things, but all of the examples are for exception retrying which makes me think that's its primary purpose.

Do you have an example of how tenacity might be used for polling?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something like

from tenacity import retry, retry_if_result, stop_after_attempt, wait_fixed

@retry(
    retry=retry_if_result(lambda result: result is None),
    stop=stop_after_attempt(3),
    wait=wait_fixed(5),
)
def call_api(url_to_poll: str) -> Optional[str]:
    response = requests.get(url_to_poll)
    if response.status_code != 200:
        return None
    return response.text

Comment on lines +47 to +53
def save_video_artifact(self, artifact: VideoArtifact) -> None:
if self.output_file:
outfile = self.output_file
elif self.output_dir:
outfile = os.path.join(self.output_dir, artifact.name + ".mp4")
else:
raise ValueError("No output_file or output_dir specified.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this method? _write_to_file should be fine

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason, it would call the to_text method inside of the base artifact class which the Video Artifact is unable to be converted to text in the same way.

Copy link
Member

@vachillo vachillo Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to override to_bytes on the video artifact

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, that makes sense! Thank you!



@define
class PromptVideoGenerationTask(BaseVideoGenerationTask):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont think these need to be two different classes. the input can be different types instead

"description": "Generates a video from text prompts.",
"schema": Schema(
{
Literal("prompt", description=BaseVideoGenerationTool.PROMPT_DESCRIPTION): str,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be inlined



@define
class PromptVideoGenerationTool(BaseVideoGenerationTool):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesnt need to be two classes, just inherit from BaseTool and make VideoGenerationTool

),
},
)
def generate_video(self, params: dict[str, dict[str, str]]) -> VideoArtifact | ErrorArtifact:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is just typed as params: dict

Copy link
Contributor

@dylanholmes dylanholmes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't quite get a full review in yet, but overall the approach seems fine. You are following existing patterns (e.g. image generation) which I like.

I'll try to review a little more later today


@property
def mime_type(self) -> str:
return "video/mp4" # Or make this flexible based on the video format
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should already have the same mime-type umbrella problem for image/audio, if its not in the framework already, check with collin on the preferred approach.

Also, should this be an actual attribute instead of a @property?

Comment on lines +248 to +249
"DreamMachineVideoGenerationDriver",
"BaseVideoGenerationDriver",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with james, but don't reorder this list in this PR. (Generally mixing refactoring or formatting with features makes it difficult for reviewers to see what is actually changing)

Copy link
Member

@collindutter collindutter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great start! Have not looked at the entire thing but left some initial feedback.


@property
def mime_type(self) -> str:
return "video/mp4" # Or make this flexible based on the video format
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a format field similar to AudioArtifact and use that here.


@property
def mime_type(self) -> str:
return "video/mp4" # Or make this flexible based on the video format
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@property is correct, just needs to be updated to something like this.

Comment on lines +248 to +249
"DreamMachineVideoGenerationDriver",
"BaseVideoGenerationDriver",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BaseVideoGenerationDriver should be placed before DreamMachineVideoGenerationDriver for circular dependency (and alphabetical) reasons.


return result
else:
raise Exception("Failed to run text to video generation")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also caught my eye, but we do it in BasePromptDriver which is probably where this came from. RuntimeError would probably be a better candidate here.

generation = response
status = generation.state
while status in ["dreaming", "queued"]:
time.sleep(5)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sleep time should be configurable as something like poll_interval

raise Exception("Generation ID not found in the response")

generation = self.client.generations.get(generation.id)
status = generation.state
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can get rid of status -- just use generation.state

Comment on lines +35 to +41
while status in ["dreaming", "queued"]:
time.sleep(5)
if not generation.id:
raise Exception("Generation ID not found in the response")

generation = self.client.generations.get(generation.id)
status = generation.state
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually not sure about this. Retrying (tenacity) feels different than polling (what we're doing here).


generation = self.client.generations.get(generation.id)
status = generation.state
if status == "completed":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the exa library provide any types for this so we can do something like status == exa.COMPLETED

if status == "completed":
video_url = generation.assets.video
if not video_url:
raise Exception("Video URL not found in the generation response")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flip conditional as mentioned above. Also raise a ValueError

else:
raise Exception(f"Video generation failed with status: {status}")

def _download_video(self, video_url: str) -> bytes:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that the Driver should probably not be the one to do this. I don't know about putting it in VideoArtifact either -- feels like too much responsibility.

In-fact, we probably shouldn't make any assumptions that the user wants us to download it. Maybe they're fine receiving a URL that they watch in their browser. Maybe this should fall onto a Loader?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants