
How to use

blue-monk edited this page Oct 6, 2021 · 2 revisions

Prelude

Show version

$ ./csvdiff.py --version

Get Help

$ ./csvdiff.py -h

Usages

From here on, the examples are run in the appendix/csv_samples/ directory using its sample data.

Sample data

In these examples, the key consists of the columns at index 0 and index 2.

  • sample_lhs.csv

    head1, head2, head3, head4, head5
    key1-2, value1-2, key2-2, value2-2, 20201224T035908
    key1-3, value1-3, key2-3, value2-3, 20201224T180527
    key1-4, value1-4, key2-4, value2-4, 20201225T104851
    key1-5, value1-5, key2-5, value2-5, 20201225T142142
    
  • sample_rhs.csv

    head1, head2, head3, head4, head5
    key1-1, value1-1, key2-1, value2-1, 20210108T142358
    key1-2, value1-3, key2-2, value2-z, 20210108T174216
    key1-4, value1-4, key2-4, value2-4, 20210109T090245
    key1-5, value1-v, key2-5, value2-5, 20210109T111231
    

🌴 Usage 1: Minimum required arguments

The minimum required arguments are:

  • Two files to compare
  • Key column indices (-k option (--matching-keys))
    • Index is 0 based
    • Multiple columns can be specified separated by commas
    • In this example, index 0 and index 2 are specified
    • If the key consists of only the column at index 0, the -k option can be omitted

With only the minimum arguments, the report shows just the count of each kind of line and the corresponding line numbers.

$ ../../src/csvdiff2/csvdiff.py sample_lhs.csv sample_rhs.csv -k 0,2

============ Report ============

● Count & Row number
same lines           : 0
left side only    (<): 1 :-- Row Numbers      -->: [3]
right side only   (>): 1 :-- Row Numbers      -->: [2]
with differences  (!): 3 :-- Row Number Pairs -->: [(2, 3), (4, 4), (5, 5)]

Report description

  • same lines: Number of lines whose key exists in both files and whose contents are identical
  • left side only: Number of lines that exist only in the left-hand file, and their line numbers
  • right side only: Number of lines that exist only in the right-hand file, and their line numbers
  • with differences: Number of lines whose key exists in both files but whose contents differ, and their line number pairs (left, right)
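The four categories can be illustrated with a minimal Python sketch. It is not csvdiff's implementation; it assumes the header is on line 1, that key values are unique, and that rows are already parsed into lists of strings. The names `classify` and `Report` are hypothetical.

```python
from typing import NamedTuple


# Hypothetical types and helper for illustration; not taken from csvdiff.
class Report(NamedTuple):
    same: int          # key in both files, identical rows
    left_only: list    # 1-based line numbers in the left file
    right_only: list   # 1-based line numbers in the right file
    different: list    # (left line, right line) pairs


def classify(lhs_rows, rhs_rows, key_indices=(0,)):
    """Match data rows by composite key and sort them into report categories.

    lhs_rows / rhs_rows exclude the header, so rows[0] is on line 2.
    Keys are assumed unique: a repeated key overwrites the earlier row.
    """
    def index_by_key(rows):
        return {tuple(row[i] for i in key_indices): (line_no, row)
                for line_no, row in enumerate(rows, start=2)}

    left, right = index_by_key(lhs_rows), index_by_key(rhs_rows)
    same, different = 0, []
    for key, (l_no, l_row) in left.items():
        if key in right:
            r_no, r_row = right[key]
            if l_row == r_row:
                same += 1
            else:
                different.append((l_no, r_no))
    left_only = [no for key, (no, _) in left.items() if key not in right]
    right_only = [no for key, (no, _) in right.items() if key not in left]
    return Report(same, left_only, right_only, different)


# Data rows of sample_lhs.csv and sample_rhs.csv (header omitted).
lhs = [
    ["key1-2", "value1-2", "key2-2", "value2-2", "20201224T035908"],
    ["key1-3", "value1-3", "key2-3", "value2-3", "20201224T180527"],
    ["key1-4", "value1-4", "key2-4", "value2-4", "20201225T104851"],
    ["key1-5", "value1-5", "key2-5", "value2-5", "20201225T142142"],
]
rhs = [
    ["key1-1", "value1-1", "key2-1", "value2-1", "20210108T142358"],
    ["key1-2", "value1-3", "key2-2", "value2-z", "20210108T174216"],
    ["key1-4", "value1-4", "key2-4", "value2-4", "20210109T090245"],
    ["key1-5", "value1-v", "key2-5", "value2-5", "20210109T111231"],
]
print(classify(lhs, rhs, key_indices=(0, 2)))
# Report(same=0, left_only=[3], right_only=[2], different=[(2, 3), (4, 4), (5, 5)])
```

The result matches the counts and row numbers in the Usage 1 report.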

Caution

If a key column holds numbers that are not zero-padded, specify the maximum number of digits after a colon (:). For example, if the column at index 0 holds numbers of up to 6 digits, specify it as follows.

$ ../../src/csvdiff2/csvdiff.py sample_lhs.csv sample_rhs.csv -k 0:6,2
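Why the digit count can matter is easy to see in a short Python sketch, under the assumption (not confirmed on this page) that key values are compared as strings, for example when rows are ordered by key: unpadded numbers then order incorrectly, and zero-padding to a fixed width fixes that. The helpers `parse_key_spec` and `normalize_key` are hypothetical; they only mimic the INDEX[:DIGITS] notation used with -k.

```python
# Illustration only: assumes keys are compared as strings (not confirmed for csvdiff).
# parse_key_spec and normalize_key are hypothetical helpers.

def parse_key_spec(spec):
    """Parse a -k style spec such as "0:6,2" into (index, digits) pairs.

    digits is None when no ":DIGITS" suffix is given.
    """
    pairs = []
    for item in spec.split(","):
        index, _, digits = item.partition(":")
        pairs.append((int(index), int(digits) if digits else None))
    return pairs


def normalize_key(row, key_spec):
    """Build a key tuple, zero-padding columns that have a digit count."""
    return tuple(row[i].zfill(width) if width else row[i] for i, width in key_spec)


# As plain strings, "10" sorts before "9", which breaks numeric ordering.
print(sorted(["9", "10", "100"]))                        # ['10', '100', '9']
# Zero-padded to a fixed width, string order matches numeric order.
print(sorted(s.zfill(6) for s in ["9", "10", "100"]))    # ['000009', '000010', '000100']

spec = parse_key_spec("0:6,2")
print(spec)                                              # [(0, 6), (2, None)]
print(normalize_key(["42", "x", "k"], spec))             # ('000042', 'k')
```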

🌴 Usage 2: Show the contents of lines with differences

To view the contents of lines with differences, use the -d option (--show-difference-only).

$ ../../src/csvdiff2/csvdiff.py sample_lhs.csv sample_rhs.csv -k 0,2 -d

============ Report ============

● Differences
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
sample_lhs.csv                                                        sample_rhs.csv                                                       Column indices with difference
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                                                   >  2 ['key1-1', 'value1-1', 'key2-1', 'value2-1', '20210108T142358']
2 ['key1-2', 'value1-2', 'key2-2', 'value2-2', '20201224T035908']  !  3 ['key1-2', 'value1-3', 'key2-2', 'value2-z', '20210108T174216']  @ [1, 3, 4]
3 ['key1-3', 'value1-3', 'key2-3', 'value2-3', '20201224T180527']  <
4 ['key1-4', 'value1-4', 'key2-4', 'value2-4', '20201225T104851']  !  4 ['key1-4', 'value1-4', 'key2-4', 'value2-4', '20210109T090245']  @ [4]
5 ['key1-5', 'value1-5', 'key2-5', 'value2-5', '20201225T142142']  !  5 ['key1-5', 'value1-v', 'key2-5', 'value2-5', '20210109T111231']  @ [1, 4]

  • Differences are indicated by the following DIFF-MARKs

    • ! : There is a difference
    • < : Exists only on the left side
    • > : Exists only on the right side
  • The number displayed before each CSV row is its line number in the actual file

    • Line numbers are 1-based
  • For rows with differences, the indices of the differing columns are displayed after @

    • Column indices are 0-based
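The @ list amounts to a column-by-column comparison of the two matched rows. The Python sketch below is an illustration, not csvdiff's code; it assumes both rows have the same number of columns (`zip` stops at the shorter row), and `differing_columns` is a hypothetical name.

```python
# Hypothetical helper illustrating the "@ [...]" column list; not csvdiff's code.
def differing_columns(lhs_row, rhs_row):
    """Return the 0-based indices of columns whose values differ."""
    return [i for i, (l, r) in enumerate(zip(lhs_row, rhs_row)) if l != r]


left = ["key1-2", "value1-2", "key2-2", "value2-2", "20201224T035908"]   # lhs line 2
right = ["key1-2", "value1-3", "key2-2", "value2-z", "20210108T174216"]  # rhs line 3
print(differing_columns(left, right))  # [1, 3, 4]
```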

To also see the counts and line numbers shown in Usage 1, add the -c option (--show-count). Short options can be combined, as in -dc below.

$ ../../src/csvdiff2/csvdiff.py sample_lhs.csv sample_rhs.csv -k 0,2 -dc

============ Report ============

● Differences
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
sample_lhs.csv                                                        sample_rhs.csv                                                       Column indices with difference
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                                                   >  2 ['key1-1', 'value1-1', 'key2-1', 'value2-1', '20210108T142358']
2 ['key1-2', 'value1-2', 'key2-2', 'value2-2', '20201224T035908']  !  3 ['key1-2', 'value1-3', 'key2-2', 'value2-z', '20210108T174216']  @ [1, 3, 4]
3 ['key1-3', 'value1-3', 'key2-3', 'value2-3', '20201224T180527']  <
4 ['key1-4', 'value1-4', 'key2-4', 'value2-4', '20201225T104851']  !  4 ['key1-4', 'value1-4', 'key2-4', 'value2-4', '20210109T090245']  @ [4]
5 ['key1-5', 'value1-5', 'key2-5', 'value2-5', '20201225T142142']  !  5 ['key1-5', 'value1-v', 'key2-5', 'value2-5', '20210109T111231']  @ [1, 4]

● Count & Row number
same lines           : 0
left side only    (<): 1 :-- Row Numbers      -->: [3]
right side only   (>): 1 :-- Row Numbers      -->: [2]
with differences  (!): 3 :-- Row Number Pairs -->: [(2, 3), (4, 4), (5, 5)]

🌴 Usage 3: Ignore columns

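To exclude columns from the comparison, use the -i option (--ignore-columns). Multiple column indices can be specified, separated by commas. In this example, the column at index 4 (the timestamp) is ignored, so the pair of lines 4 and 4, which differed only in that column, is now counted as a same line.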

$ ../../src/csvdiff2/csvdiff.py sample_lhs.csv sample_rhs.csv -k 0,2 -dc -i 4

============ Report ============

● Differences
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
sample_lhs.csv                                                        sample_rhs.csv                                                       Column indices with difference
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                                                   >  2 ['key1-1', 'value1-1', 'key2-1', 'value2-1', '20210108T142358']
2 ['key1-2', 'value1-2', 'key2-2', 'value2-2', '20201224T035908']  !  3 ['key1-2', 'value1-3', 'key2-2', 'value2-z', '20210108T174216']  @ [1, 3]
3 ['key1-3', 'value1-3', 'key2-3', 'value2-3', '20201224T180527']  <
5 ['key1-5', 'value1-5', 'key2-5', 'value2-5', '20201225T142142']  !  5 ['key1-5', 'value1-v', 'key2-5', 'value2-5', '20210109T111231']  @ [1]

● Count & Row number
same lines           : 1
left side only    (<): 1 :-- Row Numbers      -->: [3]
right side only   (>): 1 :-- Row Numbers      -->: [2]
with differences  (!): 2 :-- Row Number Pairs -->: [(2, 3), (5, 5)]
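Ignoring a column can be thought of as removing it from the per-column comparison. In this hypothetical Python sketch (not csvdiff's implementation), line 4 of each sample file differs only at index 4, so ignoring that index leaves no differing columns and the pair becomes a same line.

```python
# Hypothetical helper: per-column comparison that skips ignored indices.
def differing_columns(lhs_row, rhs_row, ignore=frozenset()):
    """Return 0-based indices of differing columns, skipping ignored ones."""
    return [i for i, (l, r) in enumerate(zip(lhs_row, rhs_row))
            if i not in ignore and l != r]


line4_lhs = ["key1-4", "value1-4", "key2-4", "value2-4", "20201225T104851"]
line4_rhs = ["key1-4", "value1-4", "key2-4", "value2-4", "20210109T090245"]

print(differing_columns(line4_lhs, line4_rhs))              # [4]
print(differing_columns(line4_lhs, line4_rhs, ignore={4}))  # [] (no difference left)
```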

🌴 Usage 4: Show all lines

To show all lines, including lines with no differences, use the -a option (--show-all-lines).

$ ../../src/csvdiff2/csvdiff.py sample_lhs.csv sample_rhs.csv -k 0,2 -ac -i 4

============ Report ============

● All
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
sample_lhs.csv                                                        sample_rhs.csv                                                       Column indices with difference
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                                                   >  2 ['key1-1', 'value1-1', 'key2-1', 'value2-1', '20210108T142358']
2 ['key1-2', 'value1-2', 'key2-2', 'value2-2', '20201224T035908']  !  3 ['key1-2', 'value1-3', 'key2-2', 'value2-z', '20210108T174216']  @ [1, 3]
3 ['key1-3', 'value1-3', 'key2-3', 'value2-3', '20201224T180527']  <
4 ['key1-4', 'value1-4', 'key2-4', 'value2-4', '20201225T104851']     4 ['key1-4', 'value1-4', 'key2-4', 'value2-4', '20210109T090245']
5 ['key1-5', 'value1-5', 'key2-5', 'value2-5', '20201225T142142']  !  5 ['key1-5', 'value1-v', 'key2-5', 'value2-5', '20210109T111231']  @ [1]

● Count & Row number
same lines           : 1
left side only    (<): 1 :-- Row Numbers      -->: [3]
right side only   (>): 1 :-- Row Numbers      -->: [2]
with differences  (!): 2 :-- Row Number Pairs -->: [(2, 3), (5, 5)]

Lines with no differences ('same lines') do not have a DIFF-MARK.


🌴 Usage 5: In vertical style

To display the report in vertical style, use the -v option (--vertical-style).

$ ../../src/csvdiff2/csvdiff.py sample_lhs.csv sample_rhs.csv -k 0,2 -ac -i 4 -v

============ Report ============

● All
--------------------------------------------------------------------------------
L sample_lhs.csv
R sample_rhs.csv
--------------------------------------------------------------------------------
> R 2 ['key1-1', 'value1-1', 'key2-1', 'value2-1', '20210108T142358']
! @ [1, 3]
  L 2 ['key1-2', 'value1-2', 'key2-2', 'value2-2', '20201224T035908']
  R 3 ['key1-2', 'value1-3', 'key2-2', 'value2-z', '20210108T174216']
< L 3 ['key1-3', 'value1-3', 'key2-3', 'value2-3', '20201224T180527']
=
  L 4 ['key1-4', 'value1-4', 'key2-4', 'value2-4', '20201225T104851']
  R 4 ['key1-4', 'value1-4', 'key2-4', 'value2-4', '20210109T090245']
! @ [1]
  L 5 ['key1-5', 'value1-5', 'key2-5', 'value2-5', '20201225T142142']
  R 5 ['key1-5', 'value1-v', 'key2-5', 'value2-5', '20210109T111231']

● Count & Row number
same lines           : 1
left side only    (<): 1 :-- Row Numbers      -->: [3]
right side only   (>): 1 :-- Row Numbers      -->: [2]
with differences  (!): 2 :-- Row Number Pairs -->: [(2, 3), (5, 5)]

  • Differences are indicated by the following DIFF-MARKs
    • = : No difference
    • ! : There is a difference
    • < : Exists only on the left side
    • > : Exists only on the right side
  • Unlike the horizontal report, lines with no difference are also marked, with the DIFF-MARK =
  • L marks the first (left-hand) file specified, and R marks the second (right-hand) file
  • For rows with differences, the indices of the differing columns are displayed after @
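The entry layout of the vertical report can be sketched in Python. This hypothetical `vertical_entry` function only reproduces the entry format seen in the example output (mark, optional @ list, then indented L/R lines); it is not taken from csvdiff and leaves out the file header section and the count summary.

```python
# Hypothetical formatter mimicking the vertical-style entries; not csvdiff's code.
def vertical_entry(mark, left=None, right=None, diff_cols=None):
    """Format one entry of a vertical-style report.

    left / right are (line_number, row) tuples, or None for a missing side.
    diff_cols is the list of differing column indices shown after "@".
    """
    present = [(tag, side) for tag, side in (("L", left), ("R", right))
               if side is not None]
    if len(present) == 1:
        # One-sided entries (< or >) put the mark and the row on one line.
        tag, (line_no, row) = present[0]
        return f"{mark} {tag} {line_no} {row}"
    header = mark if diff_cols is None else f"{mark} @ {diff_cols}"
    body = [f"  {tag} {line_no} {row}" for tag, (line_no, row) in present]
    return "\n".join([header, *body])


print(vertical_entry(">", right=(2, ["key1-1", "value1-1", "key2-1", "value2-1", "20210108T142358"])))
print(vertical_entry(
    "!",
    left=(2, ["key1-2", "value1-2", "key2-2", "value2-2", "20201224T035908"]),
    right=(3, ["key1-2", "value1-3", "key2-2", "value2-z", "20210108T174216"]),
    diff_cols=[1, 3],
))
```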

🌴 Usage 6: Treat key items as unique

Use the -u option (--unique-key) to detect duplicate values in key columns that should be unique. Without this option, the comparison proceeds even if duplicate keys exist; with it, processing stops when a duplicate key is detected.
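A minimal Python sketch of the stop-on-duplicate behavior described above. The function `check_unique_keys`, the `DuplicateKeyError` class, and the message text are hypothetical; csvdiff's actual message and exit behavior are not documented on this page.

```python
# Hypothetical duplicate-key check; names and message are illustrative only.
class DuplicateKeyError(Exception):
    """Raised when a key that should be unique appears more than once."""


def check_unique_keys(rows, key_indices, first_line=2):
    """Raise DuplicateKeyError at the first repeated composite key.

    rows excludes the header, so rows[0] is on line first_line.
    """
    seen = {}
    for line_no, row in enumerate(rows, start=first_line):
        key = tuple(row[i] for i in key_indices)
        if key in seen:
            raise DuplicateKeyError(
                f"key {key} on line {line_no} duplicates line {seen[key]}")
        seen[key] = line_no


rows = [
    ["key1-2", "value1-2", "key2-2"],
    ["key1-3", "value1-3", "key2-3"],
    ["key1-2", "value1-x", "key2-2"],  # same key (columns 0 and 2) as line 2
]
try:
    check_unique_keys(rows, key_indices=(0, 2))
except DuplicateKeyError as e:
    print(e)  # key ('key1-2', 'key2-2') on line 4 duplicates line 2
```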


🌴 Other usages

If the result does not look as expected, check the settings in effect, including the detected CSV format, with the -x option (--show-context-from-arguments).

$ ../../src/csvdiff2/csvdiff.py sample_lhs.csv sample_rhs.csv -k 0,2 -ac -i 4 -x

============ Report ============

● Context
File Path on the Left-Hand Side: /path/to/sample_lhs.csv
File Path on the Right-Hand Side : /path/to/sample_rhs.csv
Matching Key Indices: [MatchingKeyInfo(0, '<not specified>'), MatchingKeyInfo(2, '<not specified>')]
Matching Key Is Unique?: False
Column Indices to Ignore: [4]
with Header?: True
Report Style: Two facing (Horizontal)
Show Count?: True
Show Difference Only?: False
Show All?: True
Show Context?: True
CSV Sniffing Size: 4096
--- csv analysis conditions ---
Forces Individual Specified Conditions?: False
column_separator_for_lhs: ,
column_separator_for_rhs: ,
line_separator_for_lhs: 0d0a
line_separator_for_rhs: 0d0a
quote_char_for_lhs: "
quote_char_for_rhs: "
skips_space_after_column_separator_for_lhs: True
skips_space_after_column_separator_for_rhs: True

● All
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
sample_lhs.csv                                                        sample_rhs.csv                                                       Column indices with difference
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                                                   >  2 ['key1-1', 'value1-1', 'key2-1', 'value2-1', '20210108T142358']
2 ['key1-2', 'value1-2', 'key2-2', 'value2-2', '20201224T035908']  !  3 ['key1-2', 'value1-3', 'key2-2', 'value2-z', '20210108T174216']  @ [1, 3]
3 ['key1-3', 'value1-3', 'key2-3', 'value2-3', '20201224T180527']  <
4 ['key1-4', 'value1-4', 'key2-4', 'value2-4', '20201225T104851']     4 ['key1-4', 'value1-4', 'key2-4', 'value2-4', '20210109T090245']
5 ['key1-5', 'value1-5', 'key2-5', 'value2-5', '20201225T142142']  !  5 ['key1-5', 'value1-v', 'key2-5', 'value2-5', '20210109T111231']  @ [1]

● Count & Row number
same lines           : 1
left side only    (<): 1 :-- Row Numbers      -->: [3]
right side only   (>): 1 :-- Row Numbers      -->: [2]
with differences  (!): 2 :-- Row Number Pairs -->: [(2, 3), (5, 5)]
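Several of the csv analysis conditions in the context output (column separator, quote character, whether spaces after the separator are skipped, and the 4096 sniffing size) resemble what Python's standard `csv.Sniffer` detects. Assuming csvdiff works along similar lines, which this page does not confirm, the detection step might look like the sketch below; `detect_and_read` and its candidate delimiter list are hypothetical. `csv.Sniffer` does not detect the line separator, so that part of the context is outside this sketch.

```python
import csv
import io

# Assumption: mirrors "CSV Sniffing Size: 4096" in the -x output.
SNIFF_SIZE = 4096


def detect_and_read(text, candidates=",;\t|"):
    """Guess the CSV format from the first SNIFF_SIZE characters, then parse.

    Hypothetical helper. Returns (dialect, rows); raises csv.Error if no
    delimiter can be determined.
    """
    dialect = csv.Sniffer().sniff(text[:SNIFF_SIZE], delimiters=candidates)
    rows = list(csv.reader(io.StringIO(text, newline=""), dialect))
    return dialect, rows


sample = (
    "head1, head2, head3, head4, head5\r\n"
    "key1-2, value1-2, key2-2, value2-2, 20201224T035908\r\n"
    "key1-3, value1-3, key2-3, value2-3, 20201224T180527\r\n"
)
dialect, rows = detect_and_read(sample)
print(repr(dialect.delimiter), repr(dialect.quotechar), dialect.skipinitialspace)
# ',' '"' True
print(rows[1])
# ['key1-2', 'value1-2', 'key2-2', 'value2-2', '20201224T035908']
```

Because `skipinitialspace` is detected as true, the space after each comma in the sample files is not part of the parsed values.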