Importing Data
Meta.Numerics.Data allows you to import data from comma-separated-value (CSV) format, JSON dictionary format, or programmatically via direct manipulation of table objects. This tutorial explains how.
First, let's make a CSV file to import. Copy and paste the following text into a file named test.csv in your program's working directory:
Id, Name, Sex, Birthdate, Height, Weight, Result
1, John, M, 1970-01-02, 190.0, 75.0, True
2, Mary, F, 1980-02-03, 155.0, 40.0, True
3, Luke, M, 1990-03-04, 180.0, 60.0, False
(If you prefer, you can enter the data into a spreadsheet and use the spreadsheet's save-as-CSV functionality.) Now use FrameTable's static FromCsv method to import the data:
using System;
using System.IO;
using Meta.Numerics.Data;
FrameTable data;
using (TextReader reader = File.OpenText("test.csv")) {
    data = FrameTable.FromCsv(reader);
}
Console.WriteLine($"Imported CSV file with {data.Rows.Count} rows.");
Console.WriteLine("The names and types of the columns are:");
foreach (FrameColumn column in data.Columns) {
    Console.WriteLine($" {column.Name} of type {column.StorageType}");
}
Notice that the name of each column was read from the first row and the type of each column was inferred from the text.
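Meta.Numerics performs this inference for you, and this tutorial doesn't spell out its exact rules. To make the idea concrete, here is a hypothetical sketch in plain C# (it makes no Meta.Numerics calls) of one way a column's type could be inferred: try int, then double, then DateTime in yyyy-MM-dd form, then bool, and pick the first type that every non-blank cell parses as, falling back to string. The candidate types, their order, and the date format are assumptions chosen to fit test.csv, not the library's actual algorithm.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical illustration of column type inference; not Meta.Numerics's actual rules.
static Type InferColumnType(IEnumerable<string> cells) {
    // Trim the padding after each comma and skip blank cells, so a missing value
    // does not rule out any candidate type.
    List<string> values = cells.Select(c => c.Trim()).Where(c => c.Length > 0).ToList();
    // A column with no values at all gives no evidence, so keep it as text.
    if (values.Count == 0) return typeof(string);

    CultureInfo invariant = CultureInfo.InvariantCulture;
    // Try the narrowest candidates first; the first type that fits every value wins.
    if (values.All(v => int.TryParse(v, NumberStyles.Integer, invariant, out _))) return typeof(int);
    if (values.All(v => double.TryParse(v, NumberStyles.Float, invariant, out _))) return typeof(double);
    if (values.All(v => DateTime.TryParseExact(v, "yyyy-MM-dd", invariant, DateTimeStyles.None, out _))) return typeof(DateTime);
    if (values.All(v => bool.TryParse(v, out _))) return typeof(bool);
    return typeof(string);
}

// Cells from the Height and Result columns of test.csv.
Console.WriteLine(InferColumnType(new[] { " 190.0", " 155.0", " 180.0" })); // System.Double
Console.WriteLine(InferColumnType(new[] { " True", " True", " False" }));   // System.Boolean
```

Trying narrower types first is what lets a column of whole numbers such as Id come out as int, while Height, whose values contain decimal points, falls through to double.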
Reading CSV data from the web is only a little more complicated. Here is some code that fetches the well-known Titanic data set into a frame table:
using System.Net;
FrameTable titanic;
Uri url = new Uri("https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv");
WebRequest request = WebRequest.Create(url);
using (WebResponse response = request.GetResponse()) {
    using (StreamReader reader = new StreamReader(response.GetResponseStream())) {
        titanic = FrameTable.FromCsv(reader);
    }
}
This CSV file is significantly larger and more complicated than the one in our previous example, and parsing it successfully shows that Meta.Numerics can deal with escaped commas, missing values, and other issues.
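The escaped commas come from quoted fields: in the Titanic file, passenger names such as "Braund, Mr. Owen Harris" are wrapped in double quotes because they contain a comma. Below is a minimal sketch of the usual quoting convention (a quoted field may contain commas, and a doubled quote inside it stands for one literal quote). SplitCsvLine is a hypothetical standalone helper, not Meta.Numerics's parser, and for brevity it does not handle line breaks inside quoted fields.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: split one CSV line into fields, honoring double-quoted fields.
static List<string> SplitCsvLine(string line) {
    List<string> fields = new List<string>();
    StringBuilder current = new StringBuilder();
    bool inQuotes = false;
    for (int i = 0; i < line.Length; i++) {
        char c = line[i];
        if (inQuotes) {
            if (c == '"') {
                if (i + 1 < line.Length && line[i + 1] == '"') {
                    // A doubled quote inside a quoted field is one literal quote.
                    current.Append('"');
                    i++;
                } else {
                    // A lone quote closes the quoted field.
                    inQuotes = false;
                }
            } else {
                // Inside quotes, commas are ordinary characters.
                current.Append(c);
            }
        } else if (c == '"') {
            inQuotes = true;
        } else if (c == ',') {
            // An unquoted comma ends the current field.
            fields.Add(current.ToString());
            current.Clear();
        } else {
            current.Append(c);
        }
    }
    fields.Add(current.ToString());
    return fields;
}

foreach (string field in SplitCsvLine("1,0,3,\"Braund, Mr. Owen Harris\",male,22")) {
    Console.WriteLine($"[{field}]");
}
```

Blank fields come back as empty strings; a full parser would record those as missing values.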
We wish we could provide FromCsv overloads that open files or URLs for you, but the necessary APIs are not part of .NET Standard 1.1.
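You can still write such a convenience method in your own code. The sketch below is a hypothetical helper, OpenCsvSource (not part of Meta.Numerics), that returns a TextReader over either a local file or an http(s) URL, so the result can be handed to FrameTable.FromCsv just like the readers above. It uses HttpClient, the recommended replacement for WebRequest and WebClient, which are marked obsolete starting with .NET 6. For simplicity it downloads the whole response into memory.

```csharp
using System;
using System.IO;
using System.Net.Http;

// Hypothetical helper (not part of Meta.Numerics): open a TextReader over a local
// file or an http(s) URL, so either can be passed to FrameTable.FromCsv.
static TextReader OpenCsvSource(string location) {
    if (Uri.TryCreate(location, UriKind.Absolute, out Uri uri) &&
        (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps)) {
        using HttpClient client = new HttpClient();
        // Blocking on the task keeps the sketch simple; async code should await instead.
        string text = client.GetStringAsync(uri).GetAwaiter().GetResult();
        return new StringReader(text);
    }
    // Anything else is treated as a path on the local file system.
    return File.OpenText(location);
}
```

With this helper, the first example could open its reader with OpenCsvSource("test.csv"), and the Titanic URL string could be passed the same way. If you download many files, reusing a single HttpClient instance is better than creating one per call.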
To import JSON data, use a JSON deserializer to produce a collection of dictionaries, then pass it to FrameTable's FromDictionaries method. Here is an example that gets some JSON data from the web, deserializes it using the popular Newtonsoft.Json library, and creates a frame table from the output, all in just a few lines of code:
using System.Collections.Generic;
using Newtonsoft.Json;
Uri jsonUrl = new Uri("https://raw.githubusercontent.com/dcwuser/metanumerics/master/Examples/Data/example.json");
WebClient client = new WebClient();
string input = client.DownloadString(jsonUrl);
List<Dictionary<string,object>> output = JsonConvert.DeserializeObject<List<Dictionary<string,object>>>(input);
FrameTable jsonExample = FrameTable.FromDictionaries(output);
This also illustrates that you can use WebClient instead of WebRequest to get data from a web endpoint.
We didn't want to write our own JSON parser (others have done that job better than we could), nor did we want Meta.Numerics to depend on any particular JSON parsing package (that causes endless versioning issues).
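One consequence is that any JSON library will do, as long as you end up with dictionaries of ordinary .NET values. As a hedged sketch, here is the same idea with System.Text.Json, which ships with .NET Core 3.0 and later. It deserializes values as JsonElement, so the hypothetical ToPlainValue helper converts each one to a string, long, double, bool, or null, much as Newtonsoft.Json represents integers as long and other numbers as double. ParseJsonRows is likewise a hypothetical name, and nested objects and arrays are rejected for simplicity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical helper: turn a JsonElement into a plain .NET value.
static object ToPlainValue(JsonElement element) {
    switch (element.ValueKind) {
        case JsonValueKind.String: return element.GetString();
        // Numbers without a fractional part or exponent become long; the rest become double.
        case JsonValueKind.Number: return element.TryGetInt64(out long whole) ? (object)whole : element.GetDouble();
        case JsonValueKind.True: return true;
        case JsonValueKind.False: return false;
        case JsonValueKind.Null: return null;
        default: throw new NotSupportedException($"Nested JSON ({element.ValueKind}) is not handled in this sketch.");
    }
}

// Hypothetical helper: parse a JSON array of flat objects into a list of dictionaries.
static List<Dictionary<string, object>> ParseJsonRows(string json) {
    List<Dictionary<string, JsonElement>> raw = JsonSerializer.Deserialize<List<Dictionary<string, JsonElement>>>(json);
    return raw.Select(row => row.ToDictionary(pair => pair.Key, pair => ToPlainValue(pair.Value))).ToList();
}
```

The resulting list has the same shape as the Newtonsoft.Json output above, so it can be passed to FrameTable.FromDictionaries in the same way. One difference: Newtonsoft.Json converts ISO 8601 date strings to DateTime by default, while this sketch leaves them as strings.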
Let's edit our example CSV file to leave one (or more) of the cells empty:
Id, Name, Sex, Birthdate, Height, Weight, Result
1, John, M, 1970-01-02, 190.0, 75.0, True
2, Mary, F, 1980-02-03, 155.0, , True
3, , M, 1990-03-04, 180.0, 60.0, False
Now re-run the same code we wrote above to import test.csv. When Meta.Numerics imports the modified file, the values in the missing cells will be null. Columns of structure types like double that contain missing values will have type Nullable<T> instead of T. (Columns of reference types like string don't need to change their column types to support null values.) So Meta.Numerics.Data handles nulls gracefully for all types of data, in a way that integrates seamlessly with the .NET Framework's Nullable system.
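If Nullable<T> is unfamiliar, the following plain C# sketch shows the standard .NET tools for working with such values: HasValue, LINQ aggregates that skip nulls, and the ?? operator. It makes no Meta.Numerics calls; the two arrays merely stand in for the Weight and Name columns of the edited file.

```csharp
using System;
using System.Linq;

// Stand-ins for two columns of the edited test.csv:
// Mary's weight and the third row's name are missing.
double?[] weights = { 75.0, null, 60.0 };
string[] names = { "John", "Mary", null };

// HasValue distinguishes a present value from a missing one.
int missingWeights = weights.Count(w => !w.HasValue);

// LINQ's Average overload for double? ignores nulls and returns a double?.
double? meanWeight = weights.Average();

// ?? substitutes a placeholder when a reference-type value is null.
string[] displayNames = names.Select(n => n ?? "(unknown)").ToArray();

Console.WriteLine($"{missingWeights} missing weight(s); mean of the others = {meanWeight}");
Console.WriteLine(string.Join(", ", displayNames));
```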
To build a frame table programmatically, use the AddColumn and AddRow methods to define a schema and add rows. Here is a programmatic reconstruction of our test data set (with missing values):
// Define the schema.
FrameTable table = new FrameTable();
table.AddColumn<int>("Id");
table.AddColumn<string>("Name");
table.AddColumn<string>("Sex");
table.AddColumn<DateTime>("Birthdate");
table.AddColumn<double>("Height");
table.AddColumn<double?>("Weight");
table.AddColumn<bool>("Result");
// Add rows as arrays of objects.
table.AddRow(1, "John", "M", DateTime.Parse("1970-01-02"), 190.0, 75.0, true);
table.AddRow(2, "Mary", "F", DateTime.Parse("1980-02-03"), 155.0, null, true);
// Add a row using a dictionary. This is more verbose, but very clear.
table.AddRow(new Dictionary<string,object>(){
    {"Id", 3},
    {"Name", null},
    {"Sex", "M"},
    {"Birthdate", DateTime.Parse("1990-03-04")},
    {"Height", 180.0},
    {"Weight", 60.0},
    {"Result", false}
});
Now that you have some frame tables full of data, learn how to manipulate them by reading Manipulating Data.