
Dataset Creation using LLMs! #96

Open
Sepideh-Ahmadian opened this issue Oct 19, 2024 · 12 comments
Labels: documentation (Improvements or additions to documentation), experiment, literature-review (Summary of the paper related to the work)


@Sepideh-Ahmadian (Member)

We’re so happy to have you on board with the LADy project, Calder! We use the issue pages for many purposes, but we really enjoy noting good articles and our findings on every aspect of the project.

We can use this issue page to compile all our findings about LLMs for data generation. A great article to start with is "On LLMs-Driven Synthetic Data Generation, Curation, and Evaluation: A Survey", which you can also find in the team’s article repository.

The key questions we’re exploring are: Which language models perform best in data creation (considering the domain and the task at hand), and what are their advantages and disadvantages? As you go through the suggested paper and similar ones, feel free to add and suggest articles in both the Google Doc and here.

Once we've covered the research, we’ll dive into Q1, as mentioned by Hossein in today’s session, where we’ll test the LLMs on our gathered dataset.

If you have any questions, feel free to ask here and mention either me or Hossein!

@hosseinfani added the documentation, literature-review, and experiment labels on Oct 19, 2024
@CalderJohnson (Member)

Sounds great! I'll delve into the literature you've found, as well as any other papers that catch my eye, and summarize their points relevant to our goal in the Google Doc.

@CalderJohnson (Member)

In reference to my report in the Teams channel, here is my fork with my current implementation for you to take a look at:

my fork

Next week (or perhaps the week after, as I have a number of midterms next week) I'll be adding options to the DSG pipeline to evaluate LLM performance on explicit aspect reviews, as we discussed in today's meeting :)

@hosseinfani (Member)

By @CalderJohnson

Preliminary work on the DSG (Implicit Dataset Generation) pipeline.
Hello all,

I've created a preliminary pipeline for the generation of an implicit aspect dataset. I've done the following work:
I modified semeval.py and review.py to be able to take in optional arguments for reviews containing implicit aspects. The semeval loader can now load XML reviews that have NULL target values.

I modified the Review object to have an additional attribute, a boolean array named implicit. A value of True at implicit[i] indicates that the corresponding aos[i] refers to an implicit aspect. I then modified get_aos() to return an aspect term of "null" when retrieving the aos associated with an implicit aspect.
I created a pipeline with a filtering stage that only keeps reviews with implicit aspects.
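For reference, a minimal sketch of the Review change described above, assuming a simplified constructor and aos representation (the actual class in review.py has more fields and may store aos differently):

```python
class Review:
    def __init__(self, sentences, aos, implicit=None):
        self.sentences = sentences  # tokenized sentences of the review
        self.aos = aos              # list of (aspect, opinion, sentiment) triples
        # implicit[i] == True means aos[i] refers to an implicit (NULL-target) aspect
        self.implicit = implicit if implicit is not None else [False] * len(aos)

    def get_aos(self):
        # replace the aspect term with "null" when the aspect is implicit
        return [("null" if imp else aspect, opinion, sentiment)
                for (aspect, opinion, sentiment), imp in zip(self.aos, self.implicit)]
```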

It then has a generation stage that leverages GPT-4o-mini to label each review with a fitting term corresponding to the implied aspect.
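A rough sketch of these two stages, assuming the OpenAI Python client and illustrative prompt wording (the function names here are hypothetical, not the actual ones in the pipeline):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def filter_implicit(reviews):
    # filtering stage: keep only reviews containing at least one implicit aspect
    return [r for r in reviews if any(r.implicit)]

def label_implicit_aspect(review_text):
    # generation stage: ask GPT-4o-mini for the aspect term implied by the review
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Label the review with the single aspect term it implicitly discusses."},
            {"role": "user", "content": f"Review: {review_text}\nImplied aspect term:"},
        ],
    )
    return response.choices[0].message.content.strip()
```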

Of course, this is a rough implementation, but it will serve as a baseline from which we can further tune/narrow our prompt structure, LLM choice, and other generation hyperparameters, as well as extend it to datasets other than semeval.

I've attached a screenshot of the labels generated from the toy dataset; the LLM-generated aspect term is contained in the aos field. So far, accuracy is promising, but improvements certainly need to be made.

[Screenshot: LLM-generated aspect labels for the toy dataset]

@hosseinfani (Member)

@CalderJohnson
Thank you very much, this is very nice.
Just a quick note: in the code, there is an option where we specify how to treat a raw review in the dataset:

'doctype': 'snt', # 'rvw' # if 'rvw': review => [[review]] else if 'snt': review => [[subreview1], [subreview2], ...]'
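As a toy illustration of the two treatments (a sketch only; LADy's actual preprocessing of raw reviews may differ):

```python
import re

def treat_review(raw_review, doctype='snt'):
    # 'rvw': the whole review is kept as one unit        => [[review]]
    # 'snt': the review is split into per-sentence units => [[s1], [s2], ...]
    if doctype == 'rvw':
        return [[raw_review]]
    sentences = re.split(r'(?<=[.!?])\s+', raw_review.strip())
    return [[s] for s in sentences if s]

# treat_review("The food was great. Service was slow.")
# => [['The food was great.'], ['Service was slow.']]
```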

Also, can we use None instead of "null"?

@CalderJohnson (Member) commented Nov 1, 2024

I've been modifying my preliminary pipeline to make components more modular and to incorporate multiple LLMs for the upcoming experiment. I've also resolved the None/"null" issue Dr. Fani mentioned above.

A challenge I've encountered is that the main Python interface to Google's Gemini (an LLM whose effectiveness we planned to test for this task) requires Python 3.9 to work (see the API reference).

I was wondering if there's a specific reason we are keeping LADy on Python 3.8. If there is, I can circumvent this by making the request directly against Google's REST API with Python's requests library. If not, I'll try updating and see if the pipeline still works. It would also be nice to have modern Python features like the match statement.
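If we do end up calling the REST API directly, a sketch along these lines should work; the endpoint version, model name, and GEMINI_API_KEY variable are assumptions to be checked against the current API reference:

```python
import os
import requests

def query_gemini(prompt, model="gemini-1.5-flash"):
    # assumed v1beta generateContent endpoint; verify against the API reference
    url = (f"https://generativelanguage.googleapis.com/v1beta/models/"
           f"{model}:generateContent?key={os.environ['GEMINI_API_KEY']}")
    payload = {"contents": [{"parts": [{"text": prompt}]}]}
    response = requests.post(url, json=payload, timeout=60)
    response.raise_for_status()
    return response.json()["candidates"][0]["content"]["parts"][0]["text"]
```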

@Sepideh-Ahmadian (Member, Author)

Thank you, @CalderJohnson, for your update.

Currently, all the libraries used in LADy are based on Python 3.8. If we switch to Python 3.9, I think we will face a series of version compatibility issues.

@CalderJohnson (Member)

This is true, although most libraries maintain backwards compatibility. I will try creating a new environment with the newest version of Python and of any libraries that need to be updated, and see if the pipeline still runs. If I run into compatibility issues, I'll just query Gemini's API manually.

@Sepideh-Ahmadian (Member, Author)

Sounds like a good plan!

@CalderJohnson (Member)

Just an update: I've created the scaffolding for the experiment (coded the evaluation metrics, etc.), and ran it on the one model I have currently set up (gpt-4o-mini).

Graphed the results here (only tested on the toy example so far): results chart

They look poor, but I believe this is due to the way I checked whether the predictions were similar to the ground truth. My threshold for similarity may have been too high, as it didn't pick up on similarities such as "food" and "chicken" (which are usually fairly dissimilar words, but in the context of a restaurant should be treated as roughly synonymous).

Next steps are to improve the way I measure similarity and, of course, to integrate more models to test.
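One direction for the similarity measurement is an embedding-based soft match; a minimal sketch assuming the sentence-transformers package (the model name and threshold are placeholders to tune):

```python
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def is_soft_match(predicted, gold, threshold=0.5):
    # embed both aspect terms and compare them by cosine similarity
    embeddings = encoder.encode([predicted, gold], convert_to_tensor=True)
    return util.cos_sim(embeddings[0], embeddings[1]).item() >= threshold
```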

Also, let me know if I should use more/different evaluation metrics.

@CalderJohnson (Member)

As you can see in the chart, "precision" and "exact matches" are the same, so my evaluator only counted a prediction as correct when the wording was identical. I'll be working on changing this and getting an updated (more accurate) chart together.

@Sepideh-Ahmadian (Member, Author)

Thank you @CalderJohnson for the update. We can also test the top five results.
I have an idea: I recently reviewed an article discussing data augmentation methods, particularly an alternative to synonym replacement. While synonym replacement may inadvertently shift sentiment, using hypernyms instead can help the model generalize terms without altering context. By providing examples like 'primate' as a hypernym for 'human' or 'food' as a hypernym for 'chicken', we can encourage the model to recognize broader domain categories. This strategy could reduce instances where contextually accurate predictions are incorrectly marked as errors.
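For example, a small sketch of how such a hypernym check could look, assuming NLTK's WordNet corpus is available (this illustrates the idea; it is not the method from the article):

```python
from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

def is_hypernym_of(general_term, specific_term):
    # True if some noun sense of general_term lies on a hypernym path of
    # some noun sense of specific_term, e.g. 'food' for 'chicken'
    general_synsets = set(wn.synsets(general_term, pos=wn.NOUN))
    for synset in wn.synsets(specific_term, pos=wn.NOUN):
        for path in synset.hypernym_paths():
            if general_synsets & set(path):
                return True
    return False

# is_hypernym_of('food', 'chicken')  # True: chicken (the meat) is a kind of food
```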

@CalderJohnson (Member)

Good to know! I'll keep this in mind if I'm unable to get good results from comparing the embeddings alone.
