6. Creating DataFrame
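There are four ways of creating a data frame: from an array of rows or columns, from a matrix, by reading a file, or by loading one of the built-in datasets.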
The easiest and most straightforward way of creating a DataFrame is to pass all the data as an array of arrays to the `fromRows:` or `fromColumns:` message. Here is an example of initializing a DataFrame with rows:

```smalltalk
df := DataFrame fromRows: #(
   ('Barcelona' 1.609 true)
   ('Dubai' 2.789 true)
   ('London' 8.788 false)).
```

The same data frame can be created from an array of columns:
```smalltalk
df := DataFrame fromColumns: #(
   ('Barcelona' 'Dubai' 'London')
   (1.609 2.789 8.788)
   (true true false)).
```

Since the names of rows and columns are not provided, they are initialized with their default values: `(1 to: self numberOfRows)` and `(1 to: self numberOfColumns)`. Both `rowNames` and `columnNames` can be changed at any time by passing an array of new names to the corresponding accessor. The array passed to `rowNames:` must have as many elements as there are rows, and the array passed to `columnNames:` as many as there are columns.
```smalltalk
df columnNames: #(City Population BeenThere).
df rowNames: #(A B C).
```

You can convert this data frame to a pretty-printed table that can be copied and pasted into letters, blog posts, and tutorials (such as this one) by sending it the `asStringTable` message:
```
   |  City       Population  BeenThere
---+----------------------------------
A  |  Barcelona       1.609       true
B  |  Dubai           2.789       true
C  |  London          8.788      false
```
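Putting the steps so far together, here is a minimal sketch that builds the data frame, checks that the new name arrays match its dimensions before renaming, and prints the table to the Transcript. The size check and the use of `Transcript` are illustrative choices for a Playground session, not something the DataFrame API requires.

```smalltalk
| df rowNames columnNames |
df := DataFrame fromRows: #(
   ('Barcelona' 1.609 true)
   ('Dubai' 2.789 true)
   ('London' 8.788 false)).

rowNames := #(A B C).
columnNames := #(City Population BeenThere).

"Name arrays must match the dimensions of the data frame, so check before renaming"
(rowNames size = df numberOfRows and: [ columnNames size = df numberOfColumns ])
   ifTrue: [
      df rowNames: rowNames.
      df columnNames: columnNames ].

"Print the pretty-printed table to the Transcript"
Transcript show: df asStringTable; cr.
```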
By its nature, DataFrame is similar to a matrix. It works like a table of values, supports matrix accessors such as `at:at:` and `at:at:put:`, and in some cases can be treated like a matrix. Some classes provide tabular data in matrix format, for example the `TabularWorksheet` class of the Tabular package, which is used for reading XLSX files. To initialize a DataFrame from a matrix of values, use the `fromMatrix:` message:
```smalltalk
matrix := Matrix
   rows: 3 columns: 3
   contents:
      #('Barcelona' 1.609 true
        'Dubai' 2.789 true
        'London' 8.788 false).

df := DataFrame fromMatrix: matrix.
```

Once again, the names of rows and columns are set to their default values.
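The matrix accessors `at:at:` and `at:at:put:` mentioned above can be used to read and update individual cells. This is a minimal sketch that assumes both messages take the row index first and the column index second, as they do for `Matrix`.

```smalltalk
| df |
df := DataFrame fromRows: #(
   ('Barcelona' 1.609 true)
   ('Dubai' 2.789 true)
   ('London' 8.788 false)).

"Read one cell (assumed argument order: row index, then column index)"
Transcript show: (df at: 2 at: 1) printString; cr.

"Overwrite one cell in the third row, third column"
df at: 3 at: 3 put: true.
Transcript show: (df at: 3 at: 3) printString; cr.
```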
In most real-world scenarios, the data is stored in a file or a database. Support for database connections will be added in future releases. Right now, DataFrame provides methods for loading data from the two most common file formats: CSV and XLSX.
```smalltalk
DataFrame fromCSV: 'path/to/your/file.csv'.
DataFrame fromXLSX: 'path/to/your/file.xlsx'.
```

Since JSON does not store data as a table, it is not possible to read such a file directly into a DataFrame. However, you can parse JSON using NeoJSON or any other library, construct an array of rows, and pass it to the `fromRows:` message, as described above.
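The JSON workaround can be sketched as follows. The JSON string and its keys (`city`, `population`, `beenThere`) are hypothetical, and the sketch assumes NeoJSON is loaded in the image, with its default behaviour of parsing JSON arrays into `Array`s and JSON objects into `Dictionary`s with string keys.

```smalltalk
| json records rows df |
"Hypothetical input: a JSON array of objects that all share the same keys"
json := '[
  {"city": "Barcelona", "population": 1.609, "beenThere": true},
  {"city": "Dubai", "population": 2.789, "beenThere": true},
  {"city": "London", "population": 8.788, "beenThere": false}
]'.

"NeoJSON (assumed to be loaded) turns the array into an Array of Dictionaries"
records := NeoJSONReader fromString: json.

"Build one row per record, fixing the column order explicitly"
rows := records collect: [ :each |
   Array
      with: (each at: 'city')
      with: (each at: 'population')
      with: (each at: 'beenThere') ].

df := DataFrame fromRows: rows.
df columnNames: #(City Population BeenThere).
```

Extracting the values by key, rather than iterating over each dictionary, keeps the column order stable because dictionaries do not guarantee any ordering of their keys.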
DataFrame provides several well-known datasets for you to play with. They are compact and can be loaded with a single message. At this point, three datasets can be loaded this way: the Iris flower dataset, a simplified Boston Housing dataset, and the Restaurant tipping dataset.
```smalltalk
DataFrame loadIris.
DataFrame loadHousing.
DataFrame loadTips.
```
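A loaded dataset can be stored in a variable and explored like any other data frame. This minimal sketch assumes that `loadIris` answers a DataFrame instance; since the exact column names of the bundled file are not listed here, the code prints them rather than assuming them.

```smalltalk
| iris |
"Assumed: loadIris answers a DataFrame instance"
iris := DataFrame loadIris.

"Print the dimensions of the dataset"
Transcript
   show: iris numberOfRows printString , ' rows, ';
   show: iris numberOfColumns printString , ' columns';
   cr.

"Print the column names instead of assuming them"
Transcript show: iris columnNames printString; cr.
```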