
How to train a layer before the first layer of existing models (e.g. gemma) using torchtune? #2766

Answered by pbontrager
GeorgeCarpenter asked this question in Q&A

So you want to add an image encoder layer? The standard way to do this is to create an Early Fusion model. You can define your custom model with a builder function like this:

import torch.nn as nn
from torchtune.modules.model_fusion import EarlyFusionModel  # import path may vary by torchtune version

def my_mm_model(...):
	# wrap an existing text-only decoder with a simple image encoder
	decoder = existingLLM(...)
	# any module that maps pixels to the decoder's embedding dim works;
	# a single Conv2d is the simplest "layer before the first layer"
	encoder = nn.Conv2d(3, embed_dim, ...)
	return EarlyFusionModel(
		decoder,
		{"image": encoder},
		encoder_tokens={"image": 128256},
		decoder_trainable=True,
		encoder_trainable=True,
	)
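
The encoder_tokens={"image": 128256} mapping is what ties the two modules together: roughly speaking, wherever that token id shows up in the tokenized text, the decoder's embedding for it gets replaced by the encoder's output before the decoder layers run. A minimal plain-PyTorch sketch of that idea (the embedding size and token id here are illustrative, not taken from a real Gemma config):

import torch

embed_dim = 2048          # assumed decoder embedding size (illustrative)
image_token_id = 128256   # matches encoder_tokens={"image": 128256} above

tokens = torch.tensor([[7, image_token_id, 42, 3]])   # [bsz, seq_len]
text_embeds = torch.randn(1, 4, embed_dim)            # decoder token embeddings
image_embeds = torch.randn(1, 1, embed_dim)           # image encoder output

# Early fusion: overwrite the embedding at each image-token position with the
# encoder output, then feed the fused sequence to the decoder layers.
fused = text_embeds.clone()
fused[tokens == image_token_id] = image_embeds.reshape(-1, embed_dim)

You also need a transform that turns raw samples into token ids and image tensors for the model: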

from torchtune.modules.transforms import Transform  # import path may vary by torchtune version

class MyTransform(MyModelTokenizer, Transform):
	def __init__(self, ...):
		super().__init__(...)
		self.image_transform = MyImageTransform(...)

	def __call__(self, sample, inference=False):
		# collect and transform every image attached to the sample's messages
		images = []
		for message in sample["messages"]:
			for image in message.get_media():
				images.append(self.image_transform(image))
		# ...then tokenize the messages and return the token ids together
		# with `images` (sketch of the remaining steps)
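
MyImageTransform is left undefined above; a minimal, hypothetical stand-in (plain PIL/torchvision, not a torchtune class) that produces the [3, H, W] float tensor the Conv2d encoder expects could look like:

import torch
from PIL import Image
import torchvision.transforms.functional as F

class MyImageTransform:
	def __init__(self, size: int = 224):
		self.size = size

	def __call__(self, image: Image.Image) -> torch.Tensor:
		# resize and convert to a [3, size, size] float tensor in [0, 1]
		image = image.convert("RGB").resize((self.size, self.size))
		return F.to_tensor(image)

How the stacked images reach the encoder at forward time depends on the collate function and recipe you use, so check the torchtune multimodal recipes and docs for the exact keys they expect.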

Answer selected by GeorgeCarpenter