Skip to content

Commit

Permalink
update README
Browse files Browse the repository at this point in the history
  • Loading branch information
streamnsight committed Jun 10, 2022
1 parent 93a9084 commit 8cc7484
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
# Generate Fake Credit Card Transaction Data, Including Fraudulent Transactions

Note: Version v1.0 behavior has changed in such a way that it runs much faster, however transaction files are chunked, so that several files get generated per profile. If your downstream process expects 1 file per profile, please checkout the v0.5 release branch `release/v0.5`.

## General Usage

In this version, the general usage has changed:
Expand Down Expand Up @@ -27,7 +29,7 @@ This version is modified from the version v0.5 to parallelize the work using `mu

Because of the way it parallelize the work (chunking transaction generation by chunking the customer list), there will be multiple transaction files generated per profile. Also not that if the number of customers is small, there may be empty files (i.e. files where no customer in the chunk matched the profile). This is expected.

With standard profiles, it was benchmarked as generating ~95MB/thread/min. With a 64cores/128threads AMD E3, I was able to generate 1.4TB of data, 4.5B transactions, in just under 2h, as opposed to days when running the previous version.
With standard profiles, it was benchmarked as generating ~95MB/thread/min. With a 64 cores/128 threads AMD E3, I was able to generate 1.4TB of data, 4.5B transactions, in just under 2h, as opposed to days when running the previous versions.

The generation code is originally based on code by [Josh Plotkin](https://github.com/joshplotkin/data_generation). Change log of modifications to original code are below.

Expand Down

0 comments on commit 8cc7484

Please sign in to comment.