csv2wiki: An open source program to convert rows in a CSV file to pages in a wiki.
csv2wiki requires Python 3. So far, only MediaWiki (v1.28.2 or later) is supported, but it would not be hard to extend this to support other wikis.
Basic usage, from the current directory:
$ python3 -m csv2wiki -c CONFIG_FILE --csv=CSV [OPTIONS]
Or run from a different directory by using PYTHONPATH:
$ PYTHONPATH=<install_dir> python3 -m csv2wiki ...etc...
You can also just run csv2wiki directly, after installing it as a package:
$ pip3 install -e .
Run python3 -m csv2wiki --help to see complete usage. In summary:
You create a config file that is specific to the particular CSV and destination wiki. The config file contains various parameters for the conversion: the wiki URL and login information, which columns in the CSV should be included, a template for naming the resulting wiki pages, etc. Then you run the script at the command line, passing the config file with the -c option and the CSV file with the --csv option.
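To make the row-to-page idea concrete, here is a minimal Python sketch of what a page-naming template does: each data row yields one page title built from selected columns. The template syntax (Python str.format with 1-based column fields such as {c1}) and the helper name page_titles are hypothetical, chosen only for illustration; the real config format is documented in csv2wiki --help.

```python
import csv
import io


def page_titles(csv_text, template):
    """Yield one title per data row by filling a HYPOTHETICAL template.

    Columns are exposed as c1, c2, ... (1-based); this field naming is an
    illustrative assumption, not csv2wiki's actual template syntax.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    for row in reader:
        fields = {f"c{i}": value.strip() for i, value in enumerate(row, start=1)}
        yield template.format(**fields)


sample = "Id,Name,Animal\n101,Alice,Cat\n102,Bob,Dog\n"
print(list(page_titles(sample, "{c2} ({c1})")))  # ['Alice (101)', 'Bob (102)']
```

Because every title names a distinct wiki page, the columns used in the template should together be unique across rows; checking that is the job of the find-unique-columns helper described below.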
The main thing to pay attention to is the sec_map field in the config file. That field's value is essentially a small domain-specific language (DSL) for mapping the flat structure of a CSV's columns to the nested structure of sections, subsections, etc. in a wiki page. The documentation for sec_map (see csv2wiki --help) is thorough; we recommend that while reading it you also have at hand an example value for the field, such as the one in the sample config file in the MacArthur repository.
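The general idea of folding flat columns into nested sections can be sketched in Python: each column is assigned a path of section titles, and the depth within that path determines the MediaWiki heading level (== for a top-level section, === for a subsection, and so on). The (section_path, column) mapping format and the render_sections name are hypothetical; this is not the sec_map syntax, which is documented in csv2wiki --help.

```python
def render_sections(row, mapping):
    """Render one CSV row as nested MediaWiki sections.

    row:     list of cell values for one CSV row
    mapping: HYPOTHETICAL format: a list of (section_path, column) pairs in
             output order, where section_path is a tuple of titles such as
             ("History", "Early years") and column is a 1-based column number.
    """
    lines = []
    opened = ()  # the section path whose headings were emitted most recently
    for path, column in mapping:
        # Count the leading titles this path shares with the open path;
        # those headings are already in place and are not repeated.
        common = 0
        while (common < min(len(path), len(opened))
               and path[common] == opened[common]):
            common += 1
        # Emit headings for the remaining levels: depth 0 -> "==", depth 1 -> "===".
        for depth in range(common, len(path)):
            marks = "=" * (depth + 2)
            lines.append(f"{marks} {path[depth]} {marks}")
        opened = path
        value = row[column - 1].strip()
        if value:  # skip empty cells rather than emitting blank content
            lines.append(value)
    return "\n".join(lines)


row = ["Widget", "Founded 1990", "Grew fast", "Acme Corp"]
mapping = [
    (("Summary",), 1),
    (("History", "Early years"), 2),
    (("History", "Later years"), 3),
    (("Owner",), 4),
]
print(render_sections(row, mapping))
```

Here the shared "History" heading is emitted once, with "Early years" and "Later years" nested beneath it as === subsections, which is the kind of flat-to-nested folding the paragraph above describes.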
See the bug tracker for known issues; pull requests are welcome.
csv2wiki is free software, distributed under the GNU Affero General Public License version 3.
Accompanying csv2wiki are two helper programs:
-
find-unique-columns helps you quickly figure out which columns (or combinations of columns) offer unique values across all rows in the spreadsheet. Run
$ python3 find-unique-columns --help
to see usage. For example, if you run it on the accompanying test data spreadsheet, test-input.csv, like this:
$ python3 find-unique-columns -g 2,4,6 -g 3,4 -g 6,1 -g 2,6 -g 3,6 -g 2,3 -s "-" test-input.csv
the output will show all individually unique columns, the three unique combinations of columns (unique when the separator is included, that is) from among the six combinations requested, and the maximum cell length found across all rows for each column represented:
Individual columns that are unique across all rows:
  1. (max len: 4) Identifying Number
  5. (max len: 8) Ridiculously Unique Random String
Unique combination: 2-3
  2. (max len: 13) Not-Quite-Unique Name
  3. (max len: 15) Non-Unique Animal
Unique combination: 2-4-6
  2. (max len: 13) Not-Quite-Unique Name
  4. (max len: 11) Non-Unique Vegetable
  6. (max len: 28) Something That's The Same In Every Row
Unique combination: 6-1
  6. (max len: 28) Something That's The Same In Every Row
  1. (max len: 4) Identifying Number
-
mwiki-sak, the "MediaWiki Swiss Army Knife", offers command-line programmatic access to MediaWiki via the MediaWiki API. As of this writing, it can list all pages in the wiki and delete pages by name. Run
$ python3 mwiki-sak --help
for more information.
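The check that find-unique-columns performs can be sketched in a few lines of Python. This is a hypothetical re-implementation for illustration, not the tool's code; the helper names is_unique and max_len are made up. A column group counts as unique when joining its cells with the separator yields a distinct string for every row, so the separator matters: with no separator, "ab" + "c" and "a" + "bc" both become "abc". Column numbers are 1-based, as in the example output above.

```python
import csv
import io


def is_unique(rows, columns, sep="-"):
    """True if the 1-based columns, joined with sep, give a distinct key for every row."""
    keys = [sep.join(row[c - 1] for c in columns) for row in rows]
    return len(set(keys)) == len(keys)


def max_len(rows, column):
    """Length of the longest cell in a 1-based column (0 if there are no rows)."""
    return max((len(row[column - 1]) for row in rows), default=0)


text = "Id,First,Last\n1,ab,c\n2,a,bc\n"
rows = list(csv.reader(io.StringIO(text)))[1:]  # drop the header row

print(is_unique(rows, [1]))              # True
print(is_unique(rows, [2, 3], sep=""))   # False: both rows give "abc"
print(is_unique(rows, [2, 3], sep="-"))  # True: "ab-c" vs "a-bc"
print(max_len(rows, 2))                  # 2
```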
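To give a feel for the MediaWiki API access that mwiki-sak provides, here is a minimal Python sketch of listing every page, using the API's list=allpages query and merging the returned continue parameters into the next request until none remain. It is not mwiki-sak's code: the function names are hypothetical, the endpoint URL is a placeholder, and the HTTP call is injectable so the pagination logic can be exercised without a live wiki. Deleting pages additionally requires logging in and obtaining a CSRF token, which this sketch omits.

```python
import json
import urllib.parse
import urllib.request


def fetch_json(api_url, params):
    """GET api_url with the given query parameters and decode the JSON reply."""
    query = urllib.parse.urlencode(params)
    with urllib.request.urlopen(f"{api_url}?{query}") as resp:
        return json.load(resp)


def list_all_pages(api_url, fetch=fetch_json):
    """Return the titles of all pages in the main namespace, following continuation."""
    params = {"action": "query", "list": "allpages", "aplimit": "max",
              "format": "json", "formatversion": "2"}
    titles = []
    while True:
        reply = fetch(api_url, params)
        titles.extend(page["title"] for page in reply["query"]["allpages"])
        if "continue" not in reply:
            return titles
        # The reply carries the parameters (e.g. apcontinue) for the next batch.
        params = {**params, **reply["continue"]}


# Hypothetical endpoint; replace with your wiki's api.php URL.
# print(list_all_pages("https://wiki.example.org/w/api.php"))
```

Passing a stand-in for fetch (for instance, one that returns canned replies) lets the continuation loop be checked offline before pointing it at a real wiki.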
Other tools that may be useful when working with CSV files:
-
csvkit is a great suite of tools for manipulating (chopping, filtering, joining, etc.) CSV files.
-
xsv, written in Rust, is also "a command line program for indexing, slicing, analyzing, splitting and joining CSV files."
-
csv-scope is for quick exploration of a CSV file.
-
tsv-utils is eBay's command-line toolset for working with large tabular data files (filtering, sampling, statistics, joins, etc.). Despite the name, it handles CSV-style files too, since you can specify any delimiter you want; it doesn't have to be TAB.
-
csvs-to-sqlite turns one or more CSV files into a SQLite database.