make release-tag: Merge branch 'master' into stable
csala committed Nov 13, 2020
2 parents 2d79e8e + 09a6704 commit 6d0343a
Showing 13 changed files with 338 additions and 95 deletions.
13 changes: 4 additions & 9 deletions .travis.yml
@@ -1,15 +1,10 @@
# Config file for automatic testing at travis-ci.org
dist: trusty
dist: bionic
language: python
python:
- 3.8
- 3.7
- 3.6
- 3.5

matrix:
include:
- python: 3.7
dist: xenial
sudo: required

# Command to install dependencies
install: pip install -U tox-travis codecov
@@ -28,5 +23,5 @@ deploy:
local-dir: docs/_build/html
target-branch: gh-pages
on:
branch: master
branch: stable
python: 3.6
24 changes: 24 additions & 0 deletions HISTORY.md
@@ -1,5 +1,29 @@
# History

## v0.2.2 - 2020-11-13

In this release we introduce several minor improvements to make CTGAN more versatile and
properly support new types of data, such as categorical NaN values, as well as conditional
sampling and features to save and load models.

Additionally, the dependency ranges and Python versions have been updated to support
up-to-date runtimes.

Many thanks @fealho @leix28 @csala @oregonpillow and @lurosenb for working on making this release possible!

### Improvements

* Drop Python 3.5 support - [Issue #79](https://github.com/sdv-dev/CTGAN/issues/79) by @fealho
* Support NaN values in categorical variables - [Issue #78](https://github.com/sdv-dev/CTGAN/issues/78) by @fealho
* Sample synthetic data conditioning on a discrete column - [Issue #69](https://github.com/sdv-dev/CTGAN/issues/69) by @leix28
* Support recent versions of pandas - [Issue #57](https://github.com/sdv-dev/CTGAN/issues/57) by @csala
* Easy solution for restoring original dtypes - [Issue #26](https://github.com/sdv-dev/CTGAN/issues/26) by @oregonpillow

### Bugs fixed

* Loss to nan - [Issue #73](https://github.com/sdv-dev/CTGAN/issues/73) by @fealho
* Swapped the sklearn utils testing import statement - [Issue #53](https://github.com/sdv-dev/CTGAN/issues/53) by @lurosenb

## v0.2.1 - 2020-01-27

Minor version including changes to ensure the logs are properly printed and
2 changes: 1 addition & 1 deletion Makefile
@@ -39,7 +39,7 @@ install-test: clean-build clean-pyc ## install the package and test dependencies

.PHONY: test
test: ## run tests quickly with the default Python
python -m pytest --basetemp=${ENVTMPDIR} --cov=ctgan
python -m pytest --cov=ctgan

.PHONY: lint
lint: ## check style with flake8 and isort
55 changes: 43 additions & 12 deletions README.md
@@ -3,8 +3,9 @@
<i>An open source project from Data to AI Lab at MIT.</i>
</p>

[![Development Status](https://img.shields.io/badge/Development%20Status-2%20--%20Pre--Alpha-yellow)](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha)
[![PyPI Shield](https://img.shields.io/pypi/v/ctgan.svg)](https://pypi.python.org/pypi/ctgan)
[![Travis CI Shield](https://travis-ci.org/sdv-dev/CTGAN.svg?branch=master)](https://travis-ci.org/sdv-dev/CTGAN)
[![Travis CI Shield](https://travis-ci.com/sdv-dev/CTGAN.svg?branch=master)](https://travis-ci.com/sdv-dev/CTGAN)
[![Downloads](https://pepy.tech/badge/ctgan)](https://pepy.tech/project/ctgan)
[![Coverage Status](https://codecov.io/gh/sdv-dev/CTGAN/branch/master/graph/badge.svg)](https://codecov.io/gh/sdv-dev/CTGAN)

@@ -15,7 +16,7 @@ Implementation of our NeurIPS paper [Modeling Tabular data using Conditional GAN
CTGAN is a GAN-based data synthesizer that can generate synthetic tabular data with high fidelity.

* License: [MIT](https://github.com/sdv-dev/CTGAN/blob/master/LICENSE)
* Documentation: https://sdv-dev.github.io/CTGAN
* Development Status: [Pre-Alpha](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha)
* Homepage: https://github.com/sdv-dev/CTGAN

## Overview
@@ -36,7 +37,7 @@ we develop a new model called CTGAN. Several major differences make CTGAN outper

## Requirements

**CTGAN** has been developed and tested on [Python 3.5, 3.6 and 3.7](https://www.python.org/downloads/)
**CTGAN** has been developed and tested on [Python 3.6, 3.7 and 3.8](https://www.python.org/downloads/)

## Install from PyPI

@@ -49,7 +50,7 @@ pip install ctgan
This will pull and install the latest stable release from [PyPI](https://pypi.org/).

If you want to install from source or contribute to the project please read the
[Contributing Guide](https://sdv-dev.github.io/CTGAN/contributing.html#get-started).
[Contributing Guide](CONTRIBUTING.rst).

# Data Format

@@ -171,22 +172,46 @@ data generated by the model.
| 20.9853 | Private | 120637 | ... | 40.0238 | United-States | <=50K |
| ... | ... | ... | ... | ... | ... | ... |

## 3. Generate synthetic data conditioning on one column

**NOTE**: CTGAN does not distinguish between float and integer columns, which means that it will
sample float values in all cases. If integer values are required, the outputted float values
must be rounded to integers in a later step, outside of CTGAN.
In the CTGAN model, we have a conditional vector. By setting the conditional vector, we increase
the probability of getting one value in one discrete column.

For example, the following code **increases the probability** of workclass = " Private".

```python
samples = ctgan.sample(1000, 'workclass', ' Private')
```

**Note that this code does not guarantee workclass=" Private"**
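
Because the condition only raises the probability, rows with other values can still appear. If every returned row must match, one option is to filter the samples and draw again until enough rows are kept. The sketch below illustrates that idea under stated assumptions: `stand_in_sampler` is a hypothetical replacement for `ctgan.sample` that returns plain dictionaries, and `sample_matching` is an illustrative helper, not part of CTGAN.

```python
import random


def stand_in_sampler(n, column, value, rng):
    """Hypothetical stand-in for ctgan.sample(n, column, value).

    It returns the requested value most of the time, but not always,
    mimicking a condition that only raises the probability.
    """
    other_values = [' State-gov', ' Self-emp-inc']
    rows = []
    for _ in range(n):
        if rng.random() < 0.8:
            rows.append({column: value})
        else:
            rows.append({column: rng.choice(other_values)})
    return rows


def sample_matching(sampler, n, column, value, max_rounds=100):
    """Keep drawing batches until n rows carry the requested value."""
    kept = []
    for _ in range(max_rounds):
        batch = sampler(n, column, value)
        kept.extend(row for row in batch if row[column] == value)
        if len(kept) >= n:
            return kept[:n]
    raise RuntimeError('not enough matching rows after max_rounds batches')


rng = random.Random(0)
rows = sample_matching(
    lambda n, column, value: stand_in_sampler(n, column, value, rng),
    1000, 'workclass', ' Private')
```

With a real pandas DataFrame of samples, the filtering step would instead be a boolean mask such as `samples[samples['workclass'] == ' Private']`.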

## 4. Save and load the synthesizer

To save a trained ctgan synthesizer, use

```python
ctgan.save(path_to_a_folder)
```

To restore a saved synthesizer, use

```python
ctgan = CTGANSynthesizer()
ctgan.fit(data, discrete_columns, epochs=0, load_path=path_to_a_folder)
```

**Please make sure the saved model and the loaded model are for the
same dataset.**

# Join our community

1. If you would like to try more dataset examples, please have a look at the [examples folder](
https://github.com/sdv-dev/CTGAN/tree/master/examples) of the repository. Please contact us
if you have a usage example that you would want to share with the community.
2. If you want to contribute to the project code, please head to the [Contributing Guide](
https://sdv-dev.github.io/CTGAN/contributing.html#get-started) for more details about how to do it.
CONTRIBUTING.rst) for more details about how to do it.
3. If you have any doubts, feature requests or detect an error, please [open an issue on github](
https://github.com/sdv-dev/CTGAN/issues)
4. Also do not forget to check the [project documentation site](https://sdv-dev.github.io/CTGAN/)!


# Citing TGAN

@@ -204,6 +229,8 @@ If you use CTGAN, please cite the following work:
```

# Related Projects
Please note that these libraries are external contributions and are not maintained nor supervised by
the MIT DAI-Lab team.

## R interface for CTGAN

@@ -212,5 +239,9 @@ of **CTGAN** to **R** users.

More details can be found in the corresponding repository: https://github.com/kasaai/ctgan

Please note that this package is an external contribution and is not maintained nor supervised by
the MIT DAI-Lab team.
## CTGAN Server CLI

A package to easily deploy **CTGAN** onto a remote server. This package is developed by Timothy Pillow @oregonpillow.

More details can be found in the corresponding repository: https://github.com/oregonpillow/ctgan-server-cli

2 changes: 1 addition & 1 deletion ctgan/__init__.py
@@ -4,7 +4,7 @@

__author__ = 'MIT Data To AI Lab'
__email__ = '[email protected]'
__version__ = '0.2.1'
__version__ = '0.2.2.dev4'

from ctgan.demo import load_demo
from ctgan.synthesizer import CTGANSynthesizer
31 changes: 29 additions & 2 deletions ctgan/__main__.py
@@ -20,6 +20,16 @@ def _parse_args():
    parser.add_argument('-n', '--num-samples', type=int,
                        help='Number of rows to sample. Defaults to the training data size')

    parser.add_argument('--save', default=None, type=str,
                        help='A filename to save the trained synthesizer.')
    parser.add_argument('--load', default=None, type=str,
                        help='A filename to load a trained synthesizer.')

    parser.add_argument("--sample_condition_column", default=None, type=str,
                        help="Select a discrete column name.")
    parser.add_argument("--sample_condition_column_value", default=None, type=str,
                        help="Specify the value of the selected discrete column.")

    parser.add_argument('data', help='Path to training data')
    parser.add_argument('output', help='Path of the output file')

@@ -34,13 +44,30 @@ def main():
else:
data, discrete_columns = read_csv(args.data, args.metadata, args.header, args.discrete)

model = CTGANSynthesizer()
if args.load:
model = CTGANSynthesizer.load(args.load)
else:
model = CTGANSynthesizer()
model.fit(data, discrete_columns, args.epochs)

if args.save is not None:
model.save(args.save)

num_samples = args.num_samples or len(data)
sampled = model.sample(num_samples)

if args.sample_condition_column is not None:
assert args.sample_condition_column_value is not None

sampled = model.sample(
num_samples,
args.sample_condition_column,
args.sample_condition_column_value)

if args.tsv:
write_tsv(sampled, args.metadata, args.output)
else:
sampled.to_csv(args.output, index=False)


if __name__ == "__main__":
main()
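
The change above checks with an `assert` that a condition value accompanies the condition column. Assertions are skipped when Python runs with `-O`, so a stricter alternative is to report the problem through the parser. The following is a minimal sketch of that alternative, not the code in this commit: `build_parser` and `parse_condition` are hypothetical helpers, only the two option names come from the diff, and rejecting a value given without a column is an extra assumption.

```python
import argparse


def build_parser():
    """Parser holding only the two conditional sampling options."""
    parser = argparse.ArgumentParser(description='Conditional sampling options (sketch)')
    parser.add_argument('--sample_condition_column', default=None, type=str)
    parser.add_argument('--sample_condition_column_value', default=None, type=str)
    return parser


def parse_condition(argv):
    """Return (column, value); exit with a usage error if only one is given."""
    parser = build_parser()
    args = parser.parse_args(argv)
    has_column = args.sample_condition_column is not None
    has_value = args.sample_condition_column_value is not None
    if has_column != has_value:
        # parser.error prints the usage message and raises SystemExit(2).
        parser.error('--sample_condition_column and --sample_condition_column_value '
                     'must be given together')
    return args.sample_condition_column, args.sample_condition_column_value


column, value = parse_condition(
    ['--sample_condition_column', 'workclass',
     '--sample_condition_column_value', ' Private'])
```

Passing only one of the two options makes `parse_condition` exit with a usage message instead of an `AssertionError` traceback.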
6 changes: 6 additions & 0 deletions ctgan/conditional.py
@@ -96,3 +96,9 @@ def sample_zero(self, batch):
vec[i, pick + self.interval[col, 0]] = 1

return vec

    def generate_cond_from_condition_column_info(self, condition_info, batch):
        vec = np.zeros((batch, self.n_opt), dtype='float32')
        id = self.interval[condition_info["discrete_column_id"]][0] + condition_info["value_id"]
        vec[:, id] = 1
        return vec
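
To make the indexing in the new method concrete, here is a small dependency-free sketch of the same idea using plain lists instead of NumPy. It is an illustration under assumptions, not the library code: `build_condition_vector` is a hypothetical name, and `interval[i][0]` is assumed to hold the offset of discrete column `i` inside a vector of length `n_opt`.

```python
def build_condition_vector(interval, n_opt, condition_info, batch):
    """Build `batch` identical one-hot rows for one category of one discrete column."""
    # Offset of the chosen column plus the index of the chosen category.
    position = interval[condition_info['discrete_column_id']][0] + condition_info['value_id']
    row = [0.0] * n_opt
    row[position] = 1.0
    # Every row in the batch carries the same condition.
    return [list(row) for _ in range(batch)]


# Hypothetical layout: column 0 has 3 categories (offset 0), column 1 has 2 (offset 3).
interval = [(0, 3), (3, 2)]
vec = build_condition_vector(
    interval, n_opt=5,
    condition_info={'discrete_column_id': 1, 'value_id': 0},
    batch=4)
```

Here the first category of column 1 maps to position 3, so each of the four rows has a single 1 in that slot.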