Topic Model

Step 1:

ArrayList<Pipe> pipeList = new ArrayList<Pipe>();

This list is going to hold a set of pipes. Pipes are used for things like lowercasing, tokenizing, removing stop words and mapping tokens to features, and they are applied in the order they are added.
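Conceptually, a pipe chain passes each document through every stage in order, and each stage's output becomes the next stage's input. The sketch below shows only that idea using the Java standard library. `PipeChainSketch` and `runAll` are hypothetical names, and this is not MALLET's `Pipe` API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical, stdlib-only stand-in for a serial pipe chain (not MALLET's Pipe API).
class PipeChainSketch {
    // Applies each stage in list order; the output of one stage is the input of the next.
    static Object runAll(List<Function<Object, Object>> pipes, Object input) {
        Object current = input;
        for (Function<Object, Object> pipe : pipes) {
            current = pipe.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        List<Function<Object, Object>> pipes = new ArrayList<>();
        pipes.add(o -> ((String) o).toLowerCase(Locale.ROOT));   // like a lowercase pipe
        pipes.add(o -> List.of(((String) o).split("\\s+")));     // like a whitespace tokenizer pipe
        System.out.println(runAll(pipes, "Hello Topic Models")); // prints [hello, topic, models]
    }
}
```

The data changes type from stage to stage (text, then tokens, then features), so the sketch passes values around as `Object`. MALLET handles this by having every pipe transform an `Instance`.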

Step 2:

// Read dependent files
ClassLoader classLoader = TopicModel.class.getClassLoader();
File en_txt = new File(classLoader.getResource("en.txt").getFile()); // stop words in English
File ap_txt = new File(classLoader.getResource("ap.txt").getFile()); // example input file
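`getResource(...).getFile()` only gives a usable path when the resource is a plain file on disk. Inside a JAR the URL points into the archive, and characters such as spaces come back URL-encoded (`%20`). If you need the word list yourself, one option is to read the resource as a stream instead. The sketch below does this with a hypothetical `StopwordLoaderSketch` class and assumes that `en.txt` holds one stop word per line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

class StopwordLoaderSketch {
    // Reads one stop word per line (assumed file format), skipping blank lines.
    static Set<String> readStopwords(Reader source) {
        Set<String> words = new HashSet<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    // Opens a classpath resource as a stream, which also works when it is packaged in a JAR.
    static Set<String> readStopwordResource(String name) {
        InputStream in = StopwordLoaderSketch.class.getClassLoader().getResourceAsStream(name);
        if (in == null) {
            throw new IllegalArgumentException("Resource not found: " + name);
        }
        return readStopwords(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        Set<String> stop = readStopwords(new StringReader("the\na\n\nof\n"));
        System.out.println(stop.size()); // prints 3
    }
}
```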

Step 3:

Create your pipes

// Pipes: lowercase, tokenize, remove stopwords, map to features
pipeList.add( new CharSequenceLowercase() );
pipeList.add( new CharSequence2TokenSequence(Pattern.compile("\\p{L}[\\p{L}\\p{P}]+\\p{L}")) );
pipeList.add( new TokenSequenceRemoveStopwords(en_txt, "UTF-8", false, false, false) );
pipeList.add( new TokenSequence2FeatureSequence() );
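To make the four pipes concrete, here is a plain-Java sketch of what they do to a single string. It lowercases the text, extracts tokens with the same regular expression, drops stop words, and replaces each remaining word with an integer index from a growing dictionary. `PipelineSketch` and its `alphabet` map are hypothetical. The map plays roughly the role of MALLET's feature alphabet, but this is an illustration and not MALLET's code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PipelineSketch {
    // Same token pattern as the CharSequence2TokenSequence pipe above.
    static final Pattern TOKEN = Pattern.compile("\\p{L}[\\p{L}\\p{P}]+\\p{L}");

    // alphabet: hypothetical word -> index dictionary, indices assigned on first sight.
    static List<Integer> process(String text, Set<String> stopwords, Map<String, Integer> alphabet) {
        String lower = text.toLowerCase(Locale.ROOT);       // 1. lowercase
        List<Integer> features = new ArrayList<>();
        Matcher m = TOKEN.matcher(lower);                     // 2. tokenize
        while (m.find()) {
            String token = m.group();
            if (stopwords.contains(token)) {                  // 3. remove stop words
                continue;
            }
            int id = alphabet.computeIfAbsent(token, t -> alphabet.size()); // 4. map to feature index
            features.add(id);
        }
        return features;
    }

    public static void main(String[] args) {
        Map<String, Integer> alphabet = new HashMap<>();
        System.out.println(process("The cat sat on the mat", Set.of("the"), alphabet)); // prints [0, 1, 2]
    }
}
```

The token pattern needs at least three characters (a letter, then one or more letters or punctuation marks, then a letter). Short words like "on" therefore never become tokens, while in-word punctuation such as the apostrophe in "cat's" is kept.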

Step 4:

Read your input data

InstanceList instances = new InstanceList(new SerialPipes(pipeList));

Reader fileReader = new InputStreamReader(new FileInputStream(ap_txt), "UTF-8");
instances.addThruPipe(new CsvIterator(fileReader, Pattern.compile("^(\\S*)[\\s,]*(\\S*)[\\s,]*(.*)$"),
        3, 2, 1)); // regex groups holding the data, label and name fields

/* Don't freak out. All that's happening here is reading each input line into an instance. */
// Example input: AP881218-0003 	X	A 16-year-old student
// So the regex extracts the ID (group 1), a label (group 2) and the text (group 3).
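`CsvIterator` receives the group numbers for the data, label and name fields, in that order (3, 2, 1). To see what each group captures, the sketch below applies the same pattern with plain `java.util.regex`. `CsvLineSketch` is a hypothetical helper and is not part of MALLET.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CsvLineSketch {
    // Same line pattern that is passed to CsvIterator above.
    static final Pattern LINE = Pattern.compile("^(\\S*)[\\s,]*(\\S*)[\\s,]*(.*)$");

    // Returns {name, label, data}, i.e. groups 1, 2 and 3; null if the line does not match.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] f = parse("AP881218-0003\tX\tA 16-year-old student");
        System.out.println(f[0] + " | " + f[1] + " | " + f[2]); // prints AP881218-0003 | X | A 16-year-old student
    }
}
```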

Step 5:

Create a model object and specify the number of topics that we want.

// Create a model with 100 topics, alpha_t = 0.01, beta_w = 0.01.
// Note that the alpha argument (1.0) is passed as the sum over all topics,
// so alpha_t = 1.0 / 100 = 0.01, while the beta argument (0.01) is the
// parameter for a single dimension of the Dirichlet prior.
int numTopics = 100;
ParallelTopicModel model = new ParallelTopicModel(numTopics, 1.0, 0.01);
model.addInstances(instances);
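The hyperparameter values in the comment follow directly from the constructor arguments. Take K = numTopics = 100 and write the summed alpha argument (1.0) as alpha_sum; both symbols are notation introduced here. The alpha sum is spread evenly across the topics, and beta is used as given for every word type:

```latex
\alpha_t = \frac{\alpha_{\mathrm{sum}}}{K} = \frac{1.0}{100} = 0.01 \quad \text{for each topic } t,
\qquad
\beta_w = 0.01 \quad \text{for each word type } w
```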

Step 6: This is an internal step. Don't worry about it. Just create the threads and run the estimation.

Create threads and just execute
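With MALLET's `ParallelTopicModel`, this step usually consists of three calls: `setNumThreads(...)`, `setNumIterations(...)` and `estimate()`. Note that `estimate()` declares `IOException`, and the values you pass are up to you. The general idea behind the threading is to give each worker thread its own slice of the documents. The plain-Java sketch below illustrates only that idea. It splits the documents into contiguous ranges and runs a placeholder task, summing document lengths, on a thread pool. It does no sampling and is not MALLET's implementation, and `ThreadSplitSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ThreadSplitSketch {
    // Splits [0, numDocs) into numThreads contiguous ranges {start, end}.
    static List<int[]> partition(int numDocs, int numThreads) {
        List<int[]> ranges = new ArrayList<>();
        for (int i = 0; i < numThreads; i++) {
            int start = (int) ((long) i * numDocs / numThreads);
            int end = (int) ((long) (i + 1) * numDocs / numThreads);
            ranges.add(new int[] { start, end });
        }
        return ranges;
    }

    // Runs a placeholder task (summing document lengths) on each range in parallel.
    static long totalTokens(int[] docLengths, int numThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (int[] r : partition(docLengths.length, numThreads)) {
                Callable<Long> task = () -> {
                    long sum = 0;
                    for (int d = r[0]; d < r[1]; d++) {
                        sum += docLengths[d];
                    }
                    return sum;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : parts) {
                total += f.get(); // wait for every worker to finish
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Contiguous ranges keep each thread's documents independent during a pass. Any state the workers share then has to be combined after they finish, which the sketch mimics by summing the per-range results.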
