Use Sarcasm v2 Dataset #1

mubaris · 2017-09-28T01:59:52Z

Sarcasm v2 is a better dataset for this project, since it has both the parent comment and the reply. Apply this dataset to improve the predictions.

cagdasgerede · 2017-09-30T21:24:38Z

v2 is a single CSV file. I could write a Python function to convert that file into the format sklearn.datasets.load_files expects.

For example, for the following data points:

Corpus,Label,ID,Quote
GEN,sarc,GEN_sarc_0000,First off, That's grade A USDA approved Liberalism in a nutshell.
GEN,notsarc,GEN_notsarc_1136,First

Programmatically, I can create a file GEN_sarc_0000.txt which contains "First off, That's grade A USDA approved Liberalism in a nutshell." and a file GEN_notsarc_1136.txt which contains "First".
Then, I can put the files into the container/sarc and container/notsarc folders respectively.

This way the current data loading can work as it is.
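
A minimal sketch of that conversion, assuming the CSV is properly quoted (so commas inside a quote do not split the field) and has the Label, ID, and Quote columns shown in the example. The function name `csv_to_load_files_layout` and the `container` directory are illustrative, not part of the project:

```python
import csv
from pathlib import Path


def csv_to_load_files_layout(csv_path, container_dir, text_column="Quote"):
    """Write each CSV row to container_dir/<Label>/<ID>.txt.

    Returns a dict mapping each label to the number of files written.
    Assumes the header contains the columns Label, ID and `text_column`.
    """
    container = Path(container_dir)
    counts = {}
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # One sub-folder per label, e.g. container/sarc and container/notsarc.
            label_dir = container / row["Label"]
            label_dir.mkdir(parents=True, exist_ok=True)
            target = label_dir / (row["ID"] + ".txt")
            target.write_text(row[text_column], encoding="utf-8")
            counts[row["Label"]] = counts.get(row["Label"], 0) + 1
    return counts
```

Since `sklearn.datasets.load_files` uses sub-folder names as category labels, pointing it at the resulting directory should yield `sarc` and `notsarc` as the two categories.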

What do you think about this approach?

mubaris · 2017-10-01T02:04:54Z

The v2 dataset has both Quote and Reply columns. That's why it's better than v1. If we have both the parent comment and the reply, I think our bot will have better accuracy.

Please don't go with the method you proposed.

cagdasgerede · 2017-10-01T21:33:57Z

It sounds like you are describing a more substantial change. What are the steps for achieving what you propose? Since you labeled this as hacktoberfest, could you provide some more direction?

cromagnonninja · 2017-12-06T06:37:55Z

Can I work on this issue? What exactly are the problems or concerns regarding this issue at the moment?

mubaris · 2017-12-08T09:31:08Z

@Bhanu1911

Current Method - We generate features from a single text field to train the models.

Desired Method - The v2 dataset provides 2 text fields: a quote and the reply to it. We want to build new models based on these 2 inputs.

Hope this helps

cromagnonninja · 2017-12-08T11:27:21Z

Basically this means we have to start from the ground up - we now have to train a model for the replies too, if I'm not wrong? (I'll study the code and see how you trained the first time around.) Plan of action:

Split the csv file into two parts, quote and reply.
Train and test models on both parts after the split.
Configure the bot to send only those replies that all algorithms classify as sarcastic with reasonably high confidence.
I believe that'll be the way to go?
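
The last step of that plan could look roughly like this. It is only a sketch: it assumes each trained model can report a probability that a reply is sarcastic, and both the function name and the 0.8 threshold are made up for illustration.

```python
def all_models_agree(sarcasm_probabilities, threshold=0.8):
    """Return True only if every model's sarcasm probability meets the threshold."""
    if not sarcasm_probabilities:
        # With no model outputs there is nothing to agree on, so stay silent.
        return False
    return all(p >= threshold for p in sarcasm_probabilities)
```

The bot would then reply only when `all_models_agree(...)` is true for the probabilities collected from every classifier.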

cromagnonninja · 2017-12-08T11:56:05Z

Could you guide me as to how you created the dataset?

mubaris · 2017-12-08T13:52:27Z

@Bhanu1911 What I was thinking is a little different.

Train the model with 2 inputs - quote and reply.
To decide whether a comment on Reddit is sarcastic, we consider the comment (reply) and its parent comment (quote).

This makes sense because sarcasm is context-based. Having a comment and its parent comment will be more accurate than having a single comment alone.
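
One possible way to feed both fields to a model, sketched under assumptions: the tokenizer, the `quote__`/`reply__` prefixes, and the function names below are hypothetical and not taken from the project. The idea is to keep a separate feature namespace per field, so a word in the parent comment is distinguished from the same word in the reply.

```python
import re
from collections import Counter

# Very simple tokenizer: lowercase words, digits and apostrophes.
TOKEN_PATTERN = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def two_input_features(quote, reply):
    """Count tokens from each field under its own prefix."""
    features = Counter()
    for token in tokenize(quote):
        features["quote__" + token] += 1
    for token in tokenize(reply):
        features["reply__" + token] += 1
    return dict(features)
```

Dicts of this shape can be turned into a feature matrix with scikit-learn's `DictVectorizer` and then passed to the same kind of classifiers used for a single text field.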

mubaris · 2017-12-08T13:54:04Z

I think the source gives enough background about how they created the dataset - Sarcasm v2

cromagnonninja · 2017-12-08T13:59:02Z

I meant how did you partition the dataset?

mubaris added hacktoberfest enhancement labels Sep 28, 2017

mubaris added KWoC help wanted and removed hacktoberfest labels Nov 30, 2017
