Importing Data

David Wright edited this page May 1, 2018 · 6 revisions

Meta.Numerics.Data allows you to import data from comma-separated-value (CSV) format or JSON dictionary format, or to build tables programmatically by manipulating table objects directly. This tutorial explains how.

How do I read in data from a CSV file?

First, let's make a CSV file to import. Copy and paste the following text into a file named test.csv in your program's working directory:

Id, Name, Sex, Birthdate, Height, Weight, Result
1, John, M, 1970-01-02, 190.0, 75.0, True
2, Mary, F, 1980-02-03, 155.0, 40.0, True
3, Luke, M, 1990-03-04, 180.0, 60.0, False

(If you prefer, you can enter the data into a spreadsheet and use the spreadsheet's save-as-CSV functionality.) Now use FrameTable's static FromCsv method to import the data:

using System;
using System.IO;
using Meta.Numerics.Data;

FrameTable data;
using (TextReader reader = File.OpenText("test.csv")) {
    data = FrameTable.FromCsv(reader);
}

Console.WriteLine($"Imported CSV file with {data.Rows.Count} rows.");
Console.WriteLine("The names and types of the columns are:");
foreach (FrameColumn column in data.Columns) {
    Console.WriteLine($"  {column.Name} of type {column.StorageType}");
}

Notice that the name of each column was read from the first row and the type of each column was inferred from the text.
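
Once a table is imported, the inferred column types determine how values can be read back. The following is a minimal sketch, assuming that indexing the table by column name and calling As<double>() exposes the column as a typed read-only list; check this against the Meta.Numerics.Data API reference for your version. The CSV text is held in a string (written without padding spaces) so the example does not depend on a file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Meta.Numerics.Data;

// The same records as test.csv, kept in memory for this sketch.
string csvText =
    "Id,Name,Sex,Birthdate,Height,Weight,Result\n" +
    "1,John,M,1970-01-02,190.0,75.0,True\n" +
    "2,Mary,F,1980-02-03,155.0,40.0,True\n" +
    "3,Luke,M,1990-03-04,180.0,60.0,False\n";

FrameTable inMemory;
using (TextReader reader = new StringReader(csvText)) {
    // FromCsv accepts any TextReader, so a StringReader works as well as a file.
    inMemory = FrameTable.FromCsv(reader);
}

// Assumption: table["Height"].As<double>() returns the column as typed values.
IReadOnlyList<double> heights = inMemory["Height"].As<double>();
double totalHeight = 0.0;
foreach (double height in heights) {
    totalHeight += height;
}
Console.WriteLine($"Mean height over {heights.Count} rows: {totalHeight / heights.Count}");
```

Because FromCsv only needs a TextReader, the same pattern is convenient for unit tests that should not touch the file system.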

What if my CSV is at a web endpoint?

That's only a little bit more complicated. Here is some code that fetches the well-known Titanic data set into a frame table.

using System;
using System.IO;
using System.Net;
using Meta.Numerics.Data;

FrameTable titanic;
Uri url = new Uri("https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv");
WebRequest request = WebRequest.Create(url);
using (WebResponse response = request.GetResponse()) {
    using (StreamReader reader = new StreamReader(response.GetResponseStream())) {
        titanic = FrameTable.FromCsv(reader);
    }
}

Note that this CSV is significantly larger and more complicated than our previous example. That it parses without error shows that Meta.Numerics can deal with escaped commas, missing values, and other complications.

Why do I have to handle the file-reading and web-requesting?

I wish we could provide overloads that handled this for you, but these APIs are not part of .NET Standard 1.1.
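
If you find yourself repeating the reader boilerplate, you can wrap it in your own helpers. The sketch below is one possible shape; ReadCsvFile and ReadCsvUrlAsync are hypothetical names, not part of Meta.Numerics, and the web variant assumes your target framework provides HttpClient. The file variant is exercised against a temporary file so the example runs on its own.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;
using Meta.Numerics.Data;

// Hypothetical helper (not a Meta.Numerics API): read a CSV file by path.
static FrameTable ReadCsvFile(string path) {
    using (TextReader reader = File.OpenText(path)) {
        return FrameTable.FromCsv(reader);
    }
}

// Hypothetical helper (not a Meta.Numerics API): read a CSV from a web endpoint.
// Not called below, to keep the sketch independent of network access.
static async Task<FrameTable> ReadCsvUrlAsync(HttpClient client, Uri url) {
    using (Stream stream = await client.GetStreamAsync(url))
    using (StreamReader reader = new StreamReader(stream)) {
        return FrameTable.FromCsv(reader);
    }
}

// Write a small CSV to a temporary file, then import it through the helper.
string tempPath = Path.Combine(Path.GetTempPath(), "meta-numerics-example.csv");
File.WriteAllText(tempPath,
    "Id,Name,Height\n" +
    "1,John,190.0\n" +
    "2,Mary,155.0\n" +
    "3,Luke,180.0\n");

FrameTable fromFile = ReadCsvFile(tempPath);
Console.WriteLine($"Imported {fromFile.Rows.Count} rows from {tempPath}.");
```

The web helper takes the HttpClient as a parameter so that a single client instance can be shared across requests, which is the usual recommendation for that class.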

What about JSON data?

Use a JSON deserializer to produce a collection of dictionaries, then use FrameTable's FromDictionaries method. Here is an example that gets some JSON data from the web, deserializes it using the popular Newtonsoft.Json library, and creates a frame table from the output, all in just a few lines of code.

using System;
using System.Collections.Generic;
using System.Net;
using Meta.Numerics.Data;
using Newtonsoft.Json;

Uri jsonUrl = new Uri("https://raw.githubusercontent.com/dcwuser/metanumerics/master/Examples/Data/example.json");
string input;
// WebClient is disposable, so release it once the download completes.
using (WebClient client = new WebClient()) {
    input = client.DownloadString(jsonUrl);
}
List<Dictionary<string,object>> output = JsonConvert.DeserializeObject<List<Dictionary<string,object>>>(input);
FrameTable jsonExample = FrameTable.FromDictionaries(output);

This also illustrates that you can use WebClient instead of WebRequest to get data from a web endpoint.

Why do I have to go get a JSON parser?

We didn't want to write our own JSON parser (others have done that job better than we could), nor did we want Meta.Numerics to depend on any particular JSON parsing package (that causes endless versioning issues).
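
Since FromDictionaries only needs a collection of dictionaries, the records do not have to come from JSON at all. As a minimal sketch, the example below builds the dictionaries by hand in place of a deserializer's output; the record contents are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using Meta.Numerics.Data;

// Hand-built records standing in for what a JSON deserializer would produce.
// Each dictionary is one row; each key is a column name.
List<Dictionary<string, object>> records = new List<Dictionary<string, object>> {
    new Dictionary<string, object> { { "Id", 1 }, { "Name", "John" }, { "Height", 190.0 } },
    new Dictionary<string, object> { { "Id", 2 }, { "Name", "Mary" }, { "Height", 155.0 } }
};

FrameTable fromRecords = FrameTable.FromDictionaries(records);
Console.WriteLine($"Built a table with {fromRecords.Rows.Count} rows and {fromRecords.Columns.Count} columns.");
```

Any parser that can produce this shape (one dictionary per record, keyed by field name) can feed a frame table in the same way.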

What about nulls?

Let's edit our example CSV file to leave one (or more) of the cells empty:

Id, Name, Sex, Birthdate, Height, Weight, Result
1, John, M, 1970-01-02, 190.0, 75.0, True
2, Mary, F, 1980-02-03, 155.0,     , True
3,     , M, 1990-03-04, 180.0, 60.0, False

Now re-run the code we wrote above to import test.csv. When Meta.Numerics imports the modified file, the values in the empty cells will be null. A column of a value type such as double that contains missing values will be stored as Nullable<T> instead of T. (Columns of reference types such as string don't need to change their column types to support null values.) So Meta.Numerics.Data handles nulls for all types of data in a way that integrates with the .NET Framework's Nullable system.
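
To make the consequence for calling code concrete, here is a minimal sketch that counts the missing entries in a nullable column. It assumes that indexing the table by column name and calling As<double?>() exposes the values as nullable doubles; treat that access pattern as an assumption to confirm against the API reference. The CSV is a reduced version of the example above, held in a string.

```csharp
using System;
using System.IO;
using Meta.Numerics.Data;

// A reduced version of the edited test.csv: Mary's weight and Luke's name are empty.
string csvWithGaps =
    "Id,Name,Weight,Result\n" +
    "1,John,75.0,True\n" +
    "2,Mary,,True\n" +
    "3,,60.0,False\n";

FrameTable withNulls;
using (TextReader reader = new StringReader(csvWithGaps)) {
    withNulls = FrameTable.FromCsv(reader);
}

Console.WriteLine($"Weight is stored as {withNulls["Weight"].StorageType}");

// Assumption: As<double?>() yields null for each empty cell.
int missingCount = 0;
double knownTotal = 0.0;
foreach (double? weight in withNulls["Weight"].As<double?>()) {
    if (weight.HasValue) {
        knownTotal += weight.Value;
    } else {
        missingCount++;
    }
}
Console.WriteLine($"{missingCount} missing weight(s); total of known weights is {knownTotal}");
```

Checking HasValue before reading Value is the standard Nullable<T> idiom and avoids an InvalidOperationException on missing entries.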

How do I construct a data table programmatically?

Use the AddColumn and AddRow methods to define a schema and add rows. Here is a programmatic reconstruction of our test data set (with missing values):

// Define the schema.
FrameTable table = new FrameTable();
table.AddColumn<int>("Id");
table.AddColumn<string>("Name");
table.AddColumn<string>("Sex");
table.AddColumn<DateTime>("Birthdate");
table.AddColumn<double>("Height");
table.AddColumn<double?>("Weight");
table.AddColumn<bool>("Result");
            
// Add rows as arrays of objects.
table.AddRow(1, "John", "M", DateTime.Parse("1970-01-02"), 190.0, 75.0, true);
table.AddRow(2, "Mary", "F", DateTime.Parse("1980-02-03"), 155.0, null, true);

// Add a row using a dictionary. This is more verbose, but very clear.
table.AddRow(new Dictionary<string,object>(){
    {"Id", 3},
    {"Name", null},
    {"Sex", "M"},
    {"Birthdate", DateTime.Parse("1990-03-04")},
    {"Height", 180.0},
    {"Weight", 60.0},
    {"Result", false}
});
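
To check what a programmatically built table contains, you can walk its rows. The sketch below rebuilds a two-column subset of the table and prints each row; it assumes that a row can be indexed by column name and returns the cell as an object, which should be confirmed against the FrameRow documentation.

```csharp
using System;
using Meta.Numerics.Data;

// A two-column subset of the schema above, enough to show a missing value.
FrameTable small = new FrameTable();
small.AddColumn<string>("Name");
small.AddColumn<double?>("Weight");
small.AddRow("John", 75.0);
small.AddRow("Mary", null);

// Assumption: row["ColumnName"] returns the cell value as an object (null if missing).
foreach (FrameRow row in small.Rows) {
    object weight = row["Weight"];
    string shown = (weight == null) ? "(missing)" : weight.ToString();
    Console.WriteLine($"{row["Name"]}: {shown}");
}
```

Declaring Weight as double? up front is what allows the second AddRow call to pass null for that cell.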

What's Next?

Now that you have some frame tables full of data, learn how to manipulate them by reading Manipulating Data.
