> cat people.csv # These are just some sample data. Lines with comments start # with a hash or with semicolon ; The first non-empty line that is not a comment contains names ; of the columns in the file. The other non-empty lines contain ; the data. id,name,age,sallary,department 1,John Doe,52,2500,sales 2,Jane Doe,49,2300,marketing 3,Foo Bar,35,3000,it 4,Enoch Root,70,4000,it > cat departments.csv: # These are some sample departments id,name sales,Sales department marketing,Marketing guys and girls it,IT Crowd > cat data.csv | where 'age < 50' | join departments.csv as dep | select name, sallary, dep.name Jane Doe,2300,Marketing guys and girls Foo Bar,3000,IT Crows > cat data.csv | aggregate department, avg(sallary) sales,2500 marketing,2300 it,3500
Do you use CVS files as a part of your workflow? If your workflow is automated, you probably know that any changed to the source of the data is a nightmare. Adding or removing columns leads to updating all scripts.
CSVql is designed for situations, where you only need one-time processing, such as when the data being processed are not the final data and need not to be stored. The queries are performed only once, for example to generate reports. In this case, the benefits of using a databse do not apply, as most of the processing is only done once. However, using a database makes the workflow dependent on the database server being installed and configured properly wherever the workflow is used.
While CSVql replicates some functionality of standard Unix tools such as grep, cut or paste, CSVql-bases scripts are easier to use and maintain thanks to its column-based design. All the standard Unix tools use numbers to address the columns. This is fine, until you change the columns in the original source of data.
-
Perl 5.8
-
Text::CSV
-
Natural Docs for building documentation