This node creates a word cloud from text data coming from one of three sources: data in a Modeler data object, a single URL whose text is parsed using a CSS selector, or a local .txt file. The extension has options for text cleaning: removing punctuation, numbers, and English stop words. It also has options for displaying the word cloud.
The first tab for this node is Text Source.
First, choose the source of the text data: Modeler, Web, or Local file. The source is selected using the radio buttons on the left side.
For Modeler Data, select one or more columns from the dataset containing text. If one column is selected, an option is available for creating a word cloud for each row of text in that column. This option is enabled by clicking the check box below the field selector. If that box is unchecked, or if multiple columns are selected, the rows of each column are concatenated into a single string and one word cloud is created per column.
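The column and row rules above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not the extension's R code: the function name `build_cloud_texts`, the label format, and the single-space separator used for concatenation are all assumptions.

```python
def build_cloud_texts(rows, columns, per_row=False):
    """Return (label, text) pairs, one pair per word cloud.

    rows    -- list of dicts, one dict per data row (hypothetical layout)
    columns -- names of the selected text columns
    per_row -- mirrors the check box; only honored for a single column
    """
    if per_row and len(columns) == 1:
        col = columns[0]
        # One word cloud per row of the single selected column.
        return [(f"{col}[{i}]", str(row[col])) for i, row in enumerate(rows)]
    # Otherwise: concatenate each column into one string, one cloud per column.
    # Joining with a space is an assumption made for this sketch.
    return [(col, " ".join(str(row[col]) for row in rows)) for col in columns]


rows = [
    {"review": "great product", "title": "Loved it"},
    {"review": "poor battery", "title": "Meh"},
]
print(build_cloud_texts(rows, ["review"], per_row=True))
print(build_cloud_texts(rows, ["review", "title"]))
```

Note that the per-row flag is ignored when more than one column is selected, which matches the description above.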
For Web Text, enter the URL containing the text for analysis and the appropriate CSS selector. If you are unfamiliar with CSS Selectors, I recommend a tool like http://selectorgadget.com/ to help find the correct HTML elements.
Local Text source files must be in .txt format. One word cloud is generated per text file.
The second tab for this node is Display & Save Options.
The tab contains options for text preparation, defaulting to removing punctuation, numbers, and (‘english’) stop words from the text.
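To make the three default cleaning steps concrete, here is a minimal Python sketch. It is not the extension's implementation (which runs in R): the function name `clean_text`, the tiny stop word list, and the lowercasing step are assumptions made for illustration.

```python
import re
import string

# A deliberately small, illustrative stop word list (assumption);
# a real English stop word list is much longer.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "it"}


def clean_text(text, remove_punct=True, remove_numbers=True, remove_stop=True):
    """Return the list of words left after the selected cleaning steps."""
    text = text.lower()  # lowercasing is an assumption of this sketch
    if remove_punct:
        # Drop every ASCII punctuation character.
        text = text.translate(str.maketrans("", "", string.punctuation))
    if remove_numbers:
        # Drop runs of digits.
        text = re.sub(r"\d+", "", text)
    words = text.split()
    if remove_stop:
        words = [w for w in words if w not in STOP_WORDS]
    return words


print(clean_text("The 3 cats, and the dog!"))
```

Each flag corresponds to one of the options on the tab, so unchecking an option is equivalent to passing `False` for that step.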
The word cloud display parameters control the color palette, which is taken from the RColorBrewer package for R. This section also sets the minimum frequency a word needs in order to appear in the word cloud, the maximum number of words to display, and the percentage of words that are rotated. You can also print the words with their respective frequencies by checking the box in this section.
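The interaction between the minimum frequency and the maximum number of words can be illustrated with a short Python sketch. The function name `select_words` and its default values are hypothetical, and the order of operations (filter by frequency first, then keep the most frequent words) is an assumption about how such a filter typically works, not a statement about the extension's R code.

```python
from collections import Counter


def select_words(words, min_freq=1, max_words=100):
    """Return (word, count) pairs that would be eligible for display."""
    counts = Counter(words)
    # most_common() sorts by descending count.
    kept = [(w, n) for w, n in counts.most_common() if n >= min_freq]
    # Cap the result at the maximum number of words.
    return kept[:max_words]


words = "data model data text data model cloud".split()
for word, count in select_words(words, min_freq=2, max_words=10):
    print(word, count)
```

The returned pairs are also what a "print words with frequencies" option would list.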
The Save option will create a .png file containing the word cloud(s) generated by the node. If multiple files are created (for multiple word clouds) then a value at the end of the file name will increment for each file. The width and height values in this section are in inches.
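As a rough illustration of the save behavior, the Python sketch below generates incrementing file names and converts a size in inches to pixels. The exact suffix format used by the node is not documented here, so the `name1.png`, `name2.png` pattern, the helper names, and the 72 dpi default are all assumptions.

```python
def output_names(base_name, n_clouds):
    """Return one .png file name per word cloud (hypothetical naming scheme)."""
    # Strip a trailing ".png" so the counter goes before the extension.
    stem = base_name[:-4] if base_name.lower().endswith(".png") else base_name
    if n_clouds == 1:
        return [stem + ".png"]
    # Multiple clouds: append an incrementing value to the file name.
    return [f"{stem}{i}.png" for i in range(1, n_clouds + 1)]


def inches_to_pixels(width_in, height_in, dpi=72):
    """Convert a size in inches to pixels at an assumed resolution."""
    return round(width_in * dpi), round(height_in * dpi)


print(output_names("cloud.png", 3))
print(inches_to_pixels(7, 5))
```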
- IBM SPSS Modeler v16 or later
- ‘R Essentials for SPSS Modeler’ plugin: Download here
- R 2.15.x or R 3.1 (Use this link to find the correct version)
- Download the extension: Download
- Close IBM SPSS Modeler. Save the .cfe file in the CDB directory, located by default on Windows in "C:\ProgramData\IBM\SPSS\Modeler\version\CDB" or under your IBM SPSS Modeler installation directory. Note: this is a hidden directory, so you need to type it in manually or copy/paste the file path.
- Restart IBM SPSS Modeler. The node will now appear in the Output palette.
The R packages will be installed the first time the node is used as long as an Internet connection is available.
- Find a PDF with the documentation of this extension in the Documentation directory
- There is a sample available in the Example directory
- Greg Filla (gdfilla)