Paper
Character-level Convolutional Networks for Text Classification
Introduction
In text classification, most models have treated words as the basic building blocks. This research explores the application of convolutional neural networks (ConvNets) to text classification by treating text as a sequence of characters rather than words. Comparing against traditional word-based models such as bag-of-words and n-grams, the paper argues that character-level models can work without knowledge of words or their syntactic or semantic structure and achieve competitive or state-of-the-art results on large datasets.
Main Problem
There is a need for simple, easy-to-implement text classification methods in NLP that do not depend on linguistic preprocessing.
The authors were motivated by the challenge of creating a text classification model that operates without prior knowledge of linguistic structures such as words or phrases. They aimed to demonstrate that character-level convolutional networks can perform competitively with traditional word-based models. Additionally, they sought to address the scalability of text classification models on large datasets and to explore the benefits of applying ConvNets, which have been successful in image and speech recognition, to text classification.
Illustrative Example
Not mentioned
Input
A text sample from one of the provided datasets (a news article or a review), treated as a sequence of characters.
Output
The class label of the text sample, according to the given classification task.
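The input encoding behind this setup can be sketched in a few lines. In the paper, each character is one-hot encoded against a fixed alphabet (70 characters in the paper) and the text is truncated or padded to a fixed length (1014 in the paper); characters outside the alphabet become all-zero vectors. The toy alphabet and length below are illustrative assumptions, not the paper's exact configuration:

```python
# Minimal sketch of character quantization: one-hot encode each character
# against a fixed alphabet, truncating/padding the text to a fixed length.
# Toy alphabet and length for illustration (the paper uses 70 chars, length 1014).

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 "  # toy alphabet (assumption)
CHAR_TO_IDX = {c: i for i, c in enumerate(ALPHABET)}
MAX_LEN = 16  # toy fixed input length (assumption)

def quantize(text, max_len=MAX_LEN):
    """Return a max_len x len(ALPHABET) one-hot matrix for `text`.

    Out-of-alphabet characters and padding positions stay as all-zero rows,
    mirroring the paper's treatment of unknown characters as zero vectors.
    """
    matrix = [[0] * len(ALPHABET) for _ in range(max_len)]
    for pos, ch in enumerate(text.lower()[:max_len]):
        idx = CHAR_TO_IDX.get(ch)
        if idx is not None:
            matrix[pos][idx] = 1
    return matrix

m = quantize("Hello, world!")
print(len(m), len(m[0]))           # 16 37
print(sum(sum(row) for row in m))  # number of in-alphabet characters
```

This fixed-size matrix is what the convolutional layers consume, which is why no tokenizer, vocabulary, or embedding table is needed.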
Motivation
The authors were motivated by the idea of changing the perspective on the text classification task. Inspired by successful methods in other domains such as signal processing and vision, they proposed treating characters, not words, as the building blocks of language. In addition, there is a need for scalable text classification models that work across many languages, since most of them can be encoded at the character level. Working at the character level can also reduce preprocessing effort, since misspellings and emojis can be learned directly.
Related works and their gaps
In previous work, ConvNets were applied to discrete or distributed embeddings of words, without any knowledge of the syntactic or semantic structure of the language.
Some prior work has operated at the character level, but only with linear classifiers.
Previous literature focused on word-based models for text classification, where predefined word embeddings or n-gram features were common. This paper seeks to fill the gap by investigating whether character-level models can perform comparably to or better than word-level models. It also addresses the limitation of relying on word embeddings and syntactic structures by treating text as a raw signal composed of characters.
Contribution of this paper
The main contributions of the paper are:
The introduction and demonstration of character-level convolutional networks for text classification, showing that knowledge of words or prior syntactic structure is not necessary for effective classification.
A thorough empirical comparison of character-level ConvNets against traditional methods like bag-of-words, n-grams, and word-based deep learning models.
Proposed methods
Not provided in this summary
Experiments
Datasets:
AG’s News: News articles classification.
DBPedia: Ontology classification.
Yelp Reviews: Sentiment analysis with both full rating and polarity classification.
Amazon Reviews: Full and polarity classification.
Yahoo! Answers: Topic classification of QA data.
Tasks:
Topic classification and sentiment analysis.
Models:
Word-based ConvNets and LSTMs.
Implementation
Not mentioned
Gaps in this work
The study is limited to only two languages, and no comprehensive comparison across various language families has been considered. Additionally, the domains (primarily reviews) are restricted, meaning the sentence structures and word frequencies where the model performs well may not generalize to other domains. Focusing on the character level makes it difficult to interpret the results, as CNNs capture only local N-grams, failing to account for long-range dependencies. Moreover, the model does not capture domain-specific features, such as important keywords or terminology.
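The "local n-grams" limitation noted above can be made concrete: a 1-D convolution with kernel size k sees only k consecutive characters, and stacking conv/pool layers grows the receptive field only gradually. The layer stack below is loosely inspired by the paper's design (kernel-7 convolutions interleaved with kernel-3, stride-3 max-pooling) but is an illustrative assumption, not the exact architecture:

```python
# Receptive field (in input characters) of stacked 1-D conv/pool layers,
# using the standard formula: rf grows by (kernel - 1) * cumulative stride.

def receptive_field(layers):
    """`layers` is a list of (kernel_size, stride) tuples, input to output."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the view of the input
        jump *= stride             # stride compounds across layers
    return rf

# Illustrative stack: three kernel-7 convolutions, each followed by
# kernel-3, stride-3 max-pooling (an assumption, not the paper's exact net).
stack = [(7, 1), (3, 3), (7, 1), (3, 3), (7, 1), (3, 3)]
print(receptive_field(stack))  # 105
```

Even this deep stack covers only on the order of a hundred characters per output unit, which illustrates why such models struggle with long-range dependencies across a document.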
@Sepideh-Ahmadian
I'm thinking of reproducing this work, or something like it, using backtranslation; that is, testing whether classifying the original text works better than classifying its translation, its backtranslation, or an augmentation of them. What do you think?
That sounds interesting @hosseinfani.
However, I was wondering: if we observe differences between various versions of a sentence, how can we interpret them, given that we are working at the character level?
I mean, how can we define a research question that is interpretable within this research framework?