Handle public and private CSVs #218

Merged
merged 29 commits into from
Jan 16, 2025
afc07fc
handle public and private CSV in CLI... but nothing downstream
mccalluc Jan 13, 2025
f4ec7a3
coverage
mccalluc Jan 13, 2025
492c4d6
More readable CLI help
mccalluc Jan 13, 2025
5858129
fake data into separate function
mccalluc Jan 13, 2025
2b95650
warn -> error
mccalluc Jan 13, 2025
25083dd
warn -> error
mccalluc Jan 13, 2025
c76ac79
add private and public component params
mccalluc Jan 13, 2025
9a2ecc3
add explanation in UI
mccalluc Jan 13, 2025
7f6cc46
add cards to organize first tab
mccalluc Jan 13, 2025
b900e05
stub where the warning message will go
mccalluc Jan 14, 2025
c398c9a
warning message about column mismatch
mccalluc Jan 14, 2025
64c01a3
better formating on list
mccalluc Jan 14, 2025
327395d
linting
mccalluc Jan 14, 2025
9c2439d
make the "Define analysis" button conditional
mccalluc Jan 14, 2025
3a0de4e
fix label in end-to-end
mccalluc Jan 14, 2025
c280b0b
reformat for readability
mccalluc Jan 14, 2025
43ef859
match -> mismatch
mccalluc Jan 14, 2025
5bbcc91
read either public or private
mccalluc Jan 14, 2025
9712374
move out content of simulation card
mccalluc Jan 14, 2025
0893d72
Different simulation card if public CSV
mccalluc Jan 14, 2025
630d5a6
fix renaming bugs
mccalluc Jan 14, 2025
cb394fb
add test to fix coverage; use "Optional"
mccalluc Jan 14, 2025
3f4b11b
factor mock data generation out of make_accuracy_histogram
mccalluc Jan 14, 2025
2cbc826
public and private previews
mccalluc Jan 14, 2025
38160f5
start testing conditional display for public vs private
mccalluc Jan 15, 2025
48c1239
nb reads public or private
mccalluc Jan 15, 2025
6f25df7
also make plot title conditional
mccalluc Jan 15, 2025
a4ca970
factor out shared descriptions
mccalluc Jan 15, 2025
df1a17d
missing f on f-string
mccalluc Jan 15, 2025
factor out shared descriptions
mccalluc committed Jan 15, 2025

commit a4ca970475e718b52921f9a2907413acda592dcf
26 changes: 12 additions & 14 deletions dp_wizard/app/dataset_panel.py
@@ -3,7 +3,12 @@
 
 from shiny import ui, reactive, render, Inputs, Outputs, Session
 
-from dp_wizard.utils.argparse_helpers import get_cli_info
+from dp_wizard.utils.argparse_helpers import (
+    get_cli_info,
+    PUBLIC_TEXT,
+    PRIVATE_TEXT,
+    PUBLIC_PRIVATE_TEXT,
+)
 from dp_wizard.utils.csv_helper import get_csv_names_mismatch
 from dp_wizard.app.components.outputs import (
     output_code_sample,
@@ -30,19 +35,12 @@ def dataset_ui():
         ui.card(
             ui.card_header("Input CSVs"),
             ui.markdown(
-                """
-Choose **Public CSV** if you have a public data set, and are curious how
-DP can be applied: The preview visualizations will use your public data.
-
-Choose **Private CSV** if you only have a private data set, and want to
-make a release from it: The preview visualizations will only use
-simulated data, and apart from the headers, the private CSV is not
-read until the release.
-
-Choose both **Public CSV** and **Private CSV** if you have two files
-with the same structure. Perhaps the public CSV is older and no longer
-sensitive. Preview visualizations will be made with the public data,
-but the release will be made with private data.
+                f"""
+Choose **Public CSV** {PUBLIC_TEXT}
+
+Choose **Private CSV** {PRIVATE_TEXT}
+
+Choose both **Public CSV** and **Private CSV** {PUBLIC_PRIVATE_TEXT}
 """
             ),
             ui.row(
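The change to this file keeps each explanation in one place and interpolates it wherever it is needed. A minimal stdlib-only sketch of that pattern follows; the constants are abbreviated stand-ins for the ones in argparse_helpers.py, and `render_choices` is a hypothetical helper that does not appear in the PR.

```python
# Abbreviated stand-ins for the shared constants in argparse_helpers.py.
PUBLIC_TEXT = "if you have a public data set, and are curious how DP can be applied."
PRIVATE_TEXT = "if you only have a private data set, and want to make a release."


def render_choices(verb: str, public_label: str, private_label: str) -> str:
    """Hypothetical helper: build one explanation from the shared fragments."""
    return (
        f"{verb} {public_label} {PUBLIC_TEXT}\n\n"
        f"{verb} {private_label} {PRIVATE_TEXT}\n"
    )


# The UI and the CLI differ only in the verb and in how the option is labeled.
ui_text = render_choices("Choose", "**Public CSV**", "**Private CSV**")
cli_text = render_choices("Use", '"--public_csv"', '"--private_csv"')
```

With this arrangement, rewording an explanation means editing one constant, and the UI text and CLI help cannot drift apart.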
29 changes: 17 additions & 12 deletions dp_wizard/utils/argparse_helpers.py
@@ -15,24 +15,29 @@ def _existing_csv_type(arg: str) -> Path:
     return path
 
 
+PUBLIC_TEXT = """if you have a public data set, and are curious how
+DP can be applied: The preview visualizations will use your public data."""
+PRIVATE_TEXT = """if you only have a private data set, and want to
+make a release from it: The preview visualizations will only use
+simulated data, and apart from the headers, the private CSV is not
+read until the release."""
+PUBLIC_PRIVATE_TEXT = """if you have two CSVs
+with the same structure. Perhaps the public CSV is older and no longer
+sensitive. Preview visualizations will be made with the public data,
+but the release will be made with private data."""
+
+
 def _get_arg_parser():
     parser = argparse.ArgumentParser(
         formatter_class=argparse.RawDescriptionHelpFormatter,
         description="DP Wizard makes it easier to get started with "
         "Differential Privacy.",
         epilog="""
-Use "--public_csv" if you have a public data set, and are curious how
-DP can be applied: The preview visualizations will use your public data.
-
-Use "--private_csv" if you only have a private data set, and want to
-make a release from it: The preview visualizations will only use
-simulated data, and apart from the headers, the private CSV is not
-read until the release.
-
-Use "--public_csv" and "--private_csv" together if you have two CSVs
-with the same structure. Perhaps the public CSV is older and no longer
-sensitive. Preview visualizations will be made with the public data,
-but the release will be made with private data.
+Use "--public_csv" {PUBLIC_TEXT}
+
+Use "--private_csv" {PRIVATE_TEXT}
+
+Use "--public_csv" and "--private_csv" together {PUBLIC_PRIVATE_TEXT}
 """,
     )
     parser.add_argument(
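Note that in this commit the `epilog` argument is still a plain triple-quoted string, so the `{PUBLIC_TEXT}` placeholders would show up literally in the `--help` output. The next commit in the list, df1a17d ("missing f on f-string"), appears to address this. The sketch below illustrates the difference under stated assumptions: the constant is abbreviated, and `make_parser` and the `prog` name are hypothetical, not taken from the PR.

```python
import argparse

PUBLIC_TEXT = "if you have a public data set."  # abbreviated stand-in


def make_parser(interpolate: bool) -> argparse.ArgumentParser:
    """Hypothetical helper: build a parser with or without the f prefix."""
    if interpolate:
        epilog = f'Use "--public_csv" {PUBLIC_TEXT}'
    else:
        # Plain string: the braces are kept as literal characters.
        epilog = 'Use "--public_csv" {PUBLIC_TEXT}'
    return argparse.ArgumentParser(
        prog="example",  # hypothetical program name, for stable output
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog=epilog,
    )


broken_help = make_parser(interpolate=False).format_help()
fixed_help = make_parser(interpolate=True).format_help()
```

Here `broken_help` still contains the raw placeholder, while `fixed_help` contains the interpolated sentence. `RawDescriptionHelpFormatter` leaves the epilog's line breaks as written, which matters because the shared constants span several lines.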