Releases: Sefaria/LLM

v1.3.2

18 Jul 08:34

v1.3.1

18 Jul 07:47

app-v1.3.1 (2024-07-18)

Bug Fixes

v1.3.0

18 Jul 07:44

app-v1.3.0 (2024-07-18)

Features

  • add Hebrew sentencizer (6c63f70)
  • basic style_guide.py (b870f91)
  • generalize clusterer more so it can take any algorithm that requires n clusters to be optimized (d049a2c)
  • integrate style guide into prompt generator (f28ed79)
  • style guide works. yay. (49fbd31)
  • style guide works. yay. (6a5aa7a)

Bug Fixes

  • update code to latest version of openai (13b6833)

v1.2.5

20 Jun 11:21

app-v1.2.5 (2024-06-20)

Bug Fixes

  • add lang param to get_topics() (64cc947)
  • auto-correct inner quotation marks in JSON strings (3214246)
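
As an illustration of the quotation-mark fix listed above, here is a minimal sketch of one way to escape unescaped inner double quotes in JSON-like model output before parsing it. The function name `escape_inner_quotes` and the lookahead heuristic are assumptions made for this example; the code in commit 3214246 may work differently.

```python
import json


def escape_inner_quotes(raw: str) -> str:
    """Escape double quotes that appear inside JSON string values.

    Heuristic (an assumption of this sketch): a quote closes a string only
    if the next non-whitespace character is ',', ':', '}' or ']', or if the
    input ends. Any other quote inside a string is treated as an inner quote.
    """
    out = []
    in_string = False
    i, n = 0, len(raw)
    while i < n:
        ch = raw[i]
        if not in_string:
            if ch == '"':
                in_string = True  # opening quote of a key or value
            out.append(ch)
        elif ch == "\\" and i + 1 < n:
            # Keep an existing escape sequence untouched.
            out.append(ch)
            out.append(raw[i + 1])
            i += 1
        elif ch == '"':
            # Look ahead to decide whether this quote is structural.
            j = i + 1
            while j < n and raw[j] in " \t\r\n":
                j += 1
            if j == n or raw[j] in ",:}]":
                in_string = False
                out.append(ch)
            else:
                out.append('\\"')  # inner quote: escape it
        else:
            out.append(ch)
        i += 1
    return "".join(out)


broken = '{"title": "The "Golden" Rule", "lang": "en"}'
fixed = json.loads(escape_inner_quotes(broken))
```

The heuristic misclassifies an inner quote that is directly followed by a comma or colon (for example `"He said "hi", then left"`), so it is best treated as a best-effort repair step rather than a full parser.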

chart-1.1.8

20 Jun 11:38

chart-v1.1.8 (2024-06-20)

v1.2.4

17 Jun 11:27

app-v1.2.4 (2024-06-17)

v1.2.3

17 Jun 11:08

app-v1.2.3 (2024-06-17)

v1.2.2

17 Jun 10:05

app-v1.2.2 (2024-06-17)

v1.2.1

17 Jun 08:29

app-v1.2.1 (2024-06-17)

Bug Fixes

  • remove dependency on sefaria project from uniqueness_of_source.py (67c64c1)

v1.2.0

17 Jun 07:03

app-v1.2.0 (2024-06-17)

Features

  • improve summary so it can potentially know if text isn't relevant to topic (47a5959)
  • add basic metric.py file which can determine questions answered by a given source (b3dd118)
  • add cluster caching (eeef931)
  • add curated_topic.py to llm interface (bba5c9f)
  • add dogs (0981a29)
  • add embedding distance function (979c259)
  • add embeddings model and cache for it (d067091)
  • add file to create good and bad curation datasets (ce0ce92)
  • add first version of summarize and embed algo (4cf63bc)
  • add function to try to derive useful context for a source (4a7890e)
  • add get_by_xml_list function (a539ecd)
  • add input files for source curation (d1e8d6a)
  • add metric to find curated topics that are good based on how distinct the sources are (68233c0)
  • add script to translate a bunch of stuff (c8536dd)
  • add sqlite caching for calls to basic langchain (8f61dca)
  • allow translating a specific version (d206497)
  • also print text when translation fails (3bf9ccf)
  • differentiate titles that are repetitious (b3cf36c)
  • export topic pages (346e9fa)
  • export topic pages (12c4bd9)
  • failed attempt to calculate pair-wise difference between embeddings (13bc16d)
  • finalize title deduplication prompt (d868ec4)
  • gather sources pipeline basically working. (31fa7c9)
  • Guide: Question Generator for Learning Guide (064e219)
  • improve clustering by using affinity propagation to cluster noise and break up large clusters, then use cluster summary cosine distance to merge very similar clusters (4893fd2)
  • improve description writing (9317de8)
  • improve export of good bad datasets (3d70721)
  • improve importing and instantiating CurateTopic (bfce0bf)
  • improve random seed setting. improve summary so it can potentially know if text isn't relevant to topic (33a0535)
  • introduce pipeline arch for gathering sources (c94236b)
  • move core logic of clustering to cluster.py (cf41ee8)
  • optimize threshold for merging similar cluster summaries. control verbosity. increase affinity propagation iterations, although I don't have data to indicate this helps... (3beebc5)
  • push question extractor (ad60a6f)
  • refactor and improve clustering optimization (3957aeb)
  • save output to file (4a1673a)
  • switch to basic langchain impl of voyage ai to use caching (af6e6e6)
  • temp solution to avoid generating descriptions for certain topics (55ff05f)
  • wait 10 min on rate limit error (8a947c1)
  • WIP summarize questions (874e66f)
  • write get_or_generate_topic_description() (354e550)
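
Several of the features above rely on an embedding distance and on merging clusters whose summaries are very close in cosine distance. The following is a minimal sketch of that idea in plain Python. The names `cosine_distance` and `similar_pairs`, and the threshold value in the usage line, are hypothetical and chosen for this example only; they are not taken from the repository.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def similar_pairs(
    embeddings: Sequence[Sequence[float]], threshold: float
) -> List[Tuple[int, int]]:
    """List index pairs whose embeddings are within `threshold` of each other.

    In a clustering pipeline, such pairs would be candidates for merging.
    """
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(embeddings), 2)
        if cosine_distance(a, b) <= threshold
    ]


# Hypothetical cluster-summary embeddings: the first two point almost the same way.
summaries = [[1.0, 0.0], [2.0, 0.1], [0.0, 1.0]]
merge_candidates = similar_pairs(summaries, threshold=0.05)
```

Cosine distance ignores vector length, so two summaries with the same direction but different magnitudes count as identical. The merge threshold is a tuning parameter, which is consistent with the changelog entry about optimizing it.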

Bug Fixes

  • add more slugs to blacklist (9fffef9)
  • bugs in new generalized cluster.py (9e2c51b)
  • check that match is not None (82a178c)
  • don't calc stdev if other clusters are <= 1 (68e4729)
  • don't cluster noise if there is none! (ce82db4)
  • don't forget to strip output before checking word count (244401e)
  • fix imports (a833c8a)
  • installation command of LLM interface package (0903894)
  • move back to white list for generating topic descs (1e8d36f)
  • only output generated source first time it's generated (af86a6b)
  • pass verbose down (d98ffbc)
  • summarize clusters in parallel (5a07f6f)
  • undo ability for uniqueness of source to say if topic doesn't apply to text (e2f94e8)