This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

Training on texts with different lengths #292

Open
Magdiel3 opened this issue May 25, 2020 · 0 comments

@Magdiel3

How should I handle variation in text length (the number of words on each line of the training file)? Is it okay to train with these differences as they are, or should I normalize the text lengths first?

I am working on matching words to the text that best fits them (e.g. relating the word "electronics" to texts that mention or are about this topic). I'm training with trainMode 0, using each text as the data and the name of its source as the label. Text lengths range from 1 to 700 words (median 74 words, standard deviation 96 words).
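
For reference, length statistics like these can be computed with a short script. This is only a minimal sketch, assuming one example per line, whitespace-separated words, and labels marked with a `__label__` prefix (the prefix, the function names, and the sample lines are illustrative assumptions, not taken from my actual data):

```python
import statistics

# Assumed label marker; change it to match your training file format.
LABEL_PREFIX = "__label__"


def line_lengths(lines, label_prefix=LABEL_PREFIX):
    """Return the number of non-label words on each non-empty line."""
    lengths = []
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        words = [t for t in tokens if not t.startswith(label_prefix)]
        lengths.append(len(words))
    return lengths


def summarize(lengths):
    """Summarize word counts: min, max, median, population std deviation."""
    return {
        "min": min(lengths),
        "max": max(lengths),
        "median": statistics.median(lengths),
        "std": statistics.pstdev(lengths),
    }


if __name__ == "__main__":
    # Hypothetical sample lines standing in for a real training file.
    sample = [
        "resistors and capacitors on a board __label__source_a",
        "football __label__source_b",
        "the match ended in a draw __label__source_b",
    ]
    print(summarize(line_lengths(sample)))
```

To run it on a real file, pass the open file handle instead of `sample`, for example `line_lengths(open("train.txt", encoding="utf-8"))`, where `train.txt` is a placeholder name.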
