cluster by embedding #70

shuishen112 · 2024-07-24T22:20:00Z

Add instruction clustering using sentence-transformer encodings.

sordonia · 2024-08-05T20:32:50Z

projects/modular_llm/get_clusters.py

+elif args.encoding == "embedding":
+    model = SentenceTransformer(args.model)
+
+# load the dataset


Remove the `# load the dataset` comment.

sordonia · 2024-08-05T20:33:50Z

projects/modular_llm/get_clusters.py

+# load the dataset
+
+
+def get_orca_dataset():


Can you create a single function, get_dataset(dataset_name)?

sordonia · 2024-08-05T20:34:07Z

projects/modular_llm/get_clusters.py

+
+def get_flan_dataset():
+
+    flan = FlanModule(


For this you don't need to use the module; just load_dataset(dataset_name) should work.
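
The two suggestions above (a single `get_dataset(dataset_name)` and a direct `load_dataset` call) could be combined roughly as follows. This is a minimal sketch, not the code in this PR: it assumes the dataset name resolves through Hugging Face `datasets.load_dataset`, and the `split` parameter and the injectable `loader` argument are hypothetical additions for illustration and testing.

```python
def get_dataset(dataset_name, split="train", loader=None):
    """Load a dataset by name without going through a data module.

    `loader` is a hypothetical hook (not part of the PR) so the function
    can be exercised without network access. When it is omitted, the
    Hugging Face `datasets.load_dataset` function is used.
    """
    if loader is None:
        # Imported lazily so this module can be imported without `datasets`.
        from datasets import load_dataset

        loader = load_dataset
    # One code path for every dataset: the name is the only thing that varies.
    return loader(dataset_name, split=split)
```

With this shape, `get_orca_dataset()` and `get_flan_dataset()` would become two calls to the same function with different names, assuming both datasets are available under a name that `load_dataset` accepts.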

sordonia · 2024-08-05T20:35:41Z

projects/modular_llm/get_clusters.py

+        enumerate(train_dataloader), total=len(train_dataloader), desc="dataset"
+    ):
+        if "source" in batch:
+            embedding = get_text_encode(batch["source"], model)


Can you add a command-line argument that specifies which field of the dataset to use? For example:

--text_column_name "source"
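
A sketch of how such a flag might be wired in, assuming the script parses its options with `argparse`. Only `--text_column_name`, its example value `"source"`, and the name `get_text_encode` come from the review; the parser layout, the body of `get_text_encode`, the `encode_batch` helper, and the assumption that `model` exposes an `encode` method (as `SentenceTransformer` does) are illustrative.

```python
import argparse


def build_parser():
    """Build a parser with the flag requested in the review (layout is assumed)."""
    parser = argparse.ArgumentParser(description="Cluster instructions by embedding.")
    parser.add_argument(
        "--text_column_name",
        type=str,
        default="source",
        help="Name of the dataset field whose text is embedded.",
    )
    return parser


def get_text_encode(texts, model):
    # Assumed body: the PR's actual helper is not shown in the review.
    return model.encode(texts)


def encode_batch(batch, text_column_name, model):
    """Embed the configured column of a batch instead of a hard-coded "source"."""
    if text_column_name not in batch:
        # Fail loudly rather than silently skipping batches that lack the column.
        raise KeyError(f"column {text_column_name!r} not found in batch")
    return get_text_encode(batch[text_column_name], model)
```

Raising on a missing column is a design choice in this sketch; the diff above guards with `if "source" in batch:` instead, which skips such batches.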

cluster by embedding

2b595c1

matheper added a commit that referenced this pull request Jul 31, 2024

Added get_clusters from #70

5b2e3cd

add embed_dim, rm nomic

3febab8

shuishen112 force-pushed the get_cluster_by_embedding branch from 9ed42e9 to 3febab8 Compare August 4, 2024 14:14

sordonia reviewed Aug 5, 2024

sordonia closed this Sep 5, 2024

sordonia deleted the get_cluster_by_embedding branch February 13, 2025 20:50
