NAME

Catmandu::Stat - Catmandu modules for working with statistical data

SYNOPSIS

# Calculate statistics on the availability of the ISBN fields in the dataset
cat data.json | catmandu convert JSON to Stat --fields isbn

# Preprocess data and calculate statistics
catmandu convert MARC to Stat --fix 'marc_map(020a,isbn)' --fields isbn < data.mrc

# Or in fix files

# Calculate the mean of foo. E.g. foo => [1,2,3,4]
stat_mean(foo)  # foo => '2.5'

# Calculate the median of foo. E.g. foo => [1,2,3,4]
stat_median(foo)  # foo => '2.5'

# Calculate the standard deviation of foo. E.g. foo => [1,2,3,4]
stat_stddev(foo)  # foo => '1.12'

# Calculate the variance of foo. E.g. foo => [1,2,3,4]
stat_variance(foo)  # foo => '1.25'
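The example values for foo => [1,2,3,4] match the population variance and standard deviation, which divide by n rather than n - 1. The squared deviations from the mean 2.5 sum to 5, and 5 / 4 = 1.25, whose square root rounds to 1.12. The Python sketch below reproduces these numbers for illustration only. The function names mirror the fix names but are hypothetical, not Catmandu code. The use of population formulas is inferred from the example output, not taken from the module source.

```python
import math

# Hypothetical Python counterparts of the stat_* fixes, for illustration only.
# Variance and standard deviation use the population formulas (divide by n),
# which is what reproduces the synopsis values for [1, 2, 3, 4].

def stat_mean(values):
    return sum(values) / len(values)

def stat_median(values):
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2:                      # odd length: the middle element
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2  # even length: mean of the two middle elements

def stat_variance(values):
    # Population variance: mean of the squared deviations from the mean.
    m = stat_mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)

def stat_stddev(values):
    return math.sqrt(stat_variance(values))

if __name__ == "__main__":
    foo = [1, 2, 3, 4]
    print(stat_mean(foo), stat_median(foo), stat_variance(foo), round(stat_stddev(foo), 2))
```

Running the script prints the four synopsis values: 2.5, 2.5, 1.25 and 1.12.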

MODULES

EXAMPLES

The Catmandu::Stat distribution includes a CSV file of Sacramento crime data from January 2006, "t/SacramentocrimeJanuary2006.csv", which is also available at http://samplecsvs.s3.amazonaws.com/SacramentocrimeJanuary2006.csv

To view statistics on the fields in this file, type:

$ catmandu convert CSV to Stat < t/SacramentocrimeJanuary2006.csv

| name          | count | zeros | zeros% | min | max | mean | variance | stdev | uniq~ | uniq% | entropy   |
|---------------|-------|-------|--------|-----|-----|------|----------|-------|-------|-------|-----------|
| #             | 7584  |       |        |     |     |      |          |       |       |       |           |
| address       | 7584  | 0     | 0.0    | 1   | 1   | 1    | 0.0      | 0.0   | 5425  | 71.5  | 12.4/12.4 |
| beat          | 7584  | 0     | 0.0    | 1   | 1   | 1    | 0.0      | 0.0   | 20    | 0.3   | 4.3/12.9  |
| cdatetime     | 7584  | 0     | 0.0    | 1   | 1   | 1    | 0.0      | 0.0   | 5071  | 66.9  | 12.3/12.3 |
| crimedescr    | 7584  | 0     | 0.0    | 1   | 1   | 1    | 0.0      | 0.0   | 305   | 4.0   | 5.6/12.6  |
| district      | 7584  | 0     | 0.0    | 1   | 1   | 1    | 0.0      | 0.0   | 6     | 0.1   | 2.6/12.9  |
| grid          | 7584  | 0     | 0.0    | 1   | 1   | 1    | 0.0      | 0.0   | 537   | 7.1   | 7.8/9.9   |
| latitude      | 7584  | 0     | 0.0    | 1   | 1   | 1    | 0.0      | 0.0   | 5288  | 69.7  | 12.4/12.4 |
| longitude     | 7584  | 0     | 0.0    | 1   | 1   | 1    | 0.0      | 0.0   | 5295  | 69.8  | 12.4/12.4 |
| ucr_ncic_code | 7584  | 0     | 0.0    | 1   | 1   | 1    | 0.0      | 0.0   | 88    | 1.2   | 4.1/12.9  |

The file has 7584 rows, and every field from address to ucr_ncic_code contains a value in each row. Each field has exactly one value per row (the CSV file contains no arrays), so min, max and mean are all 1 and the variance is 0. There are 5492 unique addresses in the CSV file; the uniq~ column reports an approximate count. The district field has the lowest entropy: it has only 6 distinct values, each shared by many rows.
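To make the uniq and entropy columns more concrete, here is a minimal Python sketch that computes a few comparable per-field numbers from CSV rows. It rests on several assumptions that are not confirmed by the source. It assumes one value per field per row. It assumes count means the number of non-empty values and zeros means the number of rows without a value. It assumes uniq% is uniq divided by count, which matches 5425 / 7584 = 71.5% for address. It assumes entropy is the Shannon entropy in bits of the value distribution. The function field_stats is hypothetical. The sketch computes an exact unique count rather than the table's estimate and does not reproduce the second number in the entropy column.

```python
import csv
import io
import math
from collections import Counter

# Hypothetical sketch, not the Catmandu::Exporter::Stat implementation.
# Assumes each row holds at most one value per field.

def field_stats(rows, field):
    # Treat missing (None) and empty ("") cells as "no value".
    values = [row.get(field) for row in rows if row.get(field) not in (None, "")]
    counts = Counter(values)
    n = len(values)
    # Shannon entropy in bits: -sum(p * log2(p)) over the distinct values.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values()) if n else 0.0
    return {
        "count": n,                    # rows with a value for this field
        "zeros": len(rows) - n,        # rows without a value
        "uniq": len(counts),           # exact number of distinct values
        "uniq%": 100.0 * len(counts) / n if n else 0.0,
        "entropy": entropy,
    }

if __name__ == "__main__":
    # Small inline sample; the last row has no beat value.
    sample = "district,beat\n1,1A\n1,1B\n2,2A\n2,2A\n3,\n"
    rows = list(csv.DictReader(io.StringIO(sample)))
    for name in ("district", "beat"):
        print(name, field_stats(rows, name))
```

A field whose rows share a few values, like district, gets a low entropy. A field where almost every row is distinct, like address, gets an entropy close to log2 of its number of values.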

AUTHOR

Patrick Hochstenbach, <patrick.hochstenbach at ugent.be>

LICENSE AND COPYRIGHT

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.
