Initial rename to HATS #390

Merged: 4 commits, Sep 19, 2024
Changes from 3 commits
4 changes: 2 additions & 2 deletions .copier-answers.yml
@@ -15,9 +15,9 @@ include_benchmarks: true
include_docs: true
include_notebooks: true
mypy_type_checking: basic
package_name: hipscat_import
package_name: hats_import
project_license: BSD
project_name: hipscat-import
project_name: hats-import
project_organization: astronomy-commons
python_versions:
- '3.9'
2 changes: 1 addition & 1 deletion .github/pull_request_template.md
@@ -26,7 +26,7 @@ If it fixes an open issue, please link to the issue here. If this PR closes an i


## Code Quality
- [ ] I have read the [Contribution Guide](https://hipscat-import.readthedocs.io/en/stable/guide/contributing.html) and [LINCC Frameworks Code of Conduct](https://lsstdiscoveryalliance.org/programs/lincc-frameworks/code-conduct/)
- [ ] I have read the [Contribution Guide](https://hats-import.readthedocs.io/en/stable/guide/contributing.html) and [LINCC Frameworks Code of Conduct](https://lsstdiscoveryalliance.org/programs/lincc-frameworks/code-conduct/)
- [ ] My code follows the code style of this project
- [ ] My code builds (or compiles) cleanly without any errors or warnings
- [ ] My code contains relevant comments and necessary documentation
2 changes: 1 addition & 1 deletion .github/workflows/publish-to-pypi.yml
@@ -32,7 +32,7 @@ jobs:
python -m pip install --upgrade pip
pip install .
- name: Create lock requirements file
run: pip list --format=freeze --exclude "hipscat-import" > requirements.txt
run: pip list --format=freeze --exclude "hats-import" > requirements.txt
- name: Install dev dependencies
run: pip install .[dev]
- name: Run unit tests with pytest
2 changes: 1 addition & 1 deletion .github/workflows/testing-and-coverage.yml
@@ -31,7 +31,7 @@ jobs:
if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
- name: Run unit tests with pytest
run: |
python -m pytest tests --cov=hipscat_import --cov-report=xml
python -m pytest tests --cov=hats_import --cov-report=xml
- name: Run dask-on-ray tests with pytest
run: |
python -m pytest tests --use_ray
24 changes: 12 additions & 12 deletions README.md
@@ -1,33 +1,33 @@
<img src="https://github.com/astronomy-commons/lsdb/blob/main/docs/lincc-logo.png?raw=true" width="300" height="100">

# hipscat-import
# hats-import

[![Template](https://img.shields.io/badge/Template-LINCC%20Frameworks%20Python%20Project%20Template-brightgreen)](https://lincc-ppt.readthedocs.io/en/stable/)

[![PyPI](https://img.shields.io/pypi/v/hipscat-import?color=blue&logo=pypi&logoColor=white)](https://pypi.org/project/hipscat-import/)
[![Conda](https://img.shields.io/conda/vn/conda-forge/hipscat-import.svg?color=blue&logo=condaforge&logoColor=white)](https://anaconda.org/conda-forge/hipscat-import)
[![PyPI](https://img.shields.io/pypi/v/hats-import?color=blue&logo=pypi&logoColor=white)](https://pypi.org/project/hats-import/)
[![Conda](https://img.shields.io/conda/vn/conda-forge/hats-import.svg?color=blue&logo=condaforge&logoColor=white)](https://anaconda.org/conda-forge/hats-import)

[![GitHub Workflow Status](https://img.shields.io/github/actions/workflow/status/astronomy-commons/hipscat-import/smoke-test.yml)](https://github.com/astronomy-commons/hipscat-import/actions/workflows/smoke-test.yml)
[![codecov](https://codecov.io/gh/astronomy-commons/hipscat-import/branch/main/graph/badge.svg)](https://codecov.io/gh/astronomy-commons/hipscat-import)
[![Read the Docs](https://img.shields.io/readthedocs/hipscat-import)](https://hipscat-import.readthedocs.io/)
[![GitHub Workflow Status](https://img.shields.io/github/actions/workflow/status/astronomy-commons/hats-import/smoke-test.yml)](https://github.com/astronomy-commons/hats-import/actions/workflows/smoke-test.yml)
[![codecov](https://codecov.io/gh/astronomy-commons/hats-import/branch/main/graph/badge.svg)](https://codecov.io/gh/astronomy-commons/hats-import)
[![Read the Docs](https://img.shields.io/readthedocs/hats-import)](https://hats-import.readthedocs.io/)

## HiPSCat import - Utility for ingesting large survey data into HiPSCat structure.
## HATS import - Utility for ingesting large survey data into HATS structure.

Check out our [ReadTheDocs site](https://hipscat-import.readthedocs.io/en/stable/)
Check out our [ReadTheDocs site](https://hats-import.readthedocs.io/en/stable/)
for more information on partitioning, installation, and contributing.

See related projects:

* HiPSCat ([on GitHub](https://github.com/astronomy-commons/hipscat))
([on ReadTheDocs](https://hipscat.readthedocs.io/en/stable/))
* HATS ([on GitHub](https://github.com/astronomy-commons/hats))
([on ReadTheDocs](https://hats.readthedocs.io/en/stable/))
* LSDB ([on GitHub](https://github.com/astronomy-commons/lsdb))
([on ReadTheDocs](https://lsdb.readthedocs.io/en/stable/))

## Contributing

[![GitHub issue custom search in repo](https://img.shields.io/github/issues-search/astronomy-commons/hipscat-import?color=purple&label=Good%20first%20issues&query=is%3Aopen%20label%3A%22good%20first%20issue%22)](https://github.com/astronomy-commons/hipscat-import/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22)
[![GitHub issue custom search in repo](https://img.shields.io/github/issues-search/astronomy-commons/hats-import?color=purple&label=Good%20first%20issues&query=is%3Aopen%20label%3A%22good%20first%20issue%22)](https://github.com/astronomy-commons/hats-import/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22)

See the [contribution guide](https://hipscat-import.readthedocs.io/en/stable/guide/contributing.html)
See the [contribution guide](https://hats-import.readthedocs.io/en/stable/guide/contributing.html)
for complete installation instructions and contribution best practices.

## Acknowledgements
6 changes: 3 additions & 3 deletions benchmarks/asv.conf.json
@@ -3,9 +3,9 @@
// you know what you are doing.
"version": 1,
// The name of the project being benchmarked.
"project": "hipscat-import",
"project": "hats-import",
// The project's homepage.
"project_url": "https://github.com/astronomy-commons/hipscat-import",
"project_url": "https://github.com/astronomy-commons/hats-import",
// The URL or local path of the source code repository for the
// project being benchmarked.
"repo": "..",
@@ -32,7 +32,7 @@
// variable.
"environment_type": "virtualenv",
// the base URL to show a commit for the project.
"show_commit_url": "https://github.com/astronomy-commons/hipscat-import/commit/",
"show_commit_url": "https://github.com/astronomy-commons/hats-import/commit/",
// The Pythons you'd like to test against. If not provided, defaults
// to the current version of Python used to run `asv`.
"pythons": [
4 changes: 2 additions & 2 deletions benchmarks/benchmarks.py
@@ -3,8 +3,8 @@

import numpy as np

from hipscat_import.catalog.resume_plan import ResumePlan
from hipscat_import.catalog.sparse_histogram import SparseHistogram
from hats_import.catalog.resume_plan import ResumePlan
from hats_import.catalog.sparse_histogram import SparseHistogram


class BinningSuite:
32 changes: 16 additions & 16 deletions docs/catalogs/arguments.rst
@@ -9,7 +9,7 @@ A minimal arguments block will look something like:

.. code-block:: python

from hipscat_import.catalog.arguments import ImportArguments
from hats_import.catalog.arguments import ImportArguments

args = ImportArguments(
sort_columns="ObjectID",
@@ -25,8 +25,8 @@ A minimal arguments block will look something like:
More details on each of these parameters are provided in sections below.

For the curious, see the API documentation for
:py:class:`hipscat_import.catalog.arguments.ImportArguments`, and its superclass
:py:class:`hipscat_import.runtime_arguments.RuntimeArguments`.
:py:class:`hats_import.catalog.arguments.ImportArguments`, and its superclass
:py:class:`hats_import.runtime_arguments.RuntimeArguments`.

Pipeline setup
-------------------------------------------------------------------------------
@@ -52,7 +52,7 @@ to the pipeline, ignoring the above arguments. This would look like:
.. code-block:: python

from dask.distributed import Client
from hipscat_import.pipeline import pipeline_with_client
from hats_import.pipeline import pipeline_with_client

args = ... # ImportArguments()
with Client('scheduler:port') as client:
@@ -63,7 +63,7 @@

.. code-block:: python

from hipscat_import.pipeline import pipeline
from hats_import.pipeline import pipeline

def import_pipeline():
args = ...
@@ -88,14 +88,14 @@ files are found, we will restore the pipeline's previous progress.

If you want to start the pipeline from scratch you can simply set `resume=False`.
Alternatively, go to the temp directory you've specified and remove any intermediate
files created by the previous runs of the ``hipscat-import`` pipeline. You should also
files created by the previous runs of the ``hats-import`` pipeline. You should also
remove the output directory if it has any content. Setting ``resume=False`` performs these
cleaning operations automatically for you.
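What that automatic cleanup amounts to can be sketched with the standard library. The directory layout below is an assumption for illustration only; the real pipeline manages its own intermediate file names under ``tmp_dir``:

```python
import shutil
from pathlib import Path


def clear_resume_state(tmp_dir: str, output_dir: str) -> None:
    """Start the next run from scratch: drop intermediate files and any
    partial output. Illustrative sketch, equivalent in spirit to passing
    resume=False; not the pipeline's actual cleanup code."""
    tmp_path = Path(tmp_dir)
    if tmp_path.exists():
        # Empty the temp directory but keep the directory itself.
        for child in tmp_path.iterdir():
            if child.is_dir():
                shutil.rmtree(child)
            else:
                child.unlink()
    out_path = Path(output_dir)
    if out_path.exists():
        shutil.rmtree(out_path)
```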

Reading input files
-------------------------------------------------------------------------------

Catalog import reads through a list of files and converts them into a hipscatted catalog.
Catalog import reads through a list of files and converts them into a hats-sharded catalog.

Which files?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -134,7 +134,7 @@ to parse a whitespace separated file. Otherwise, you can use a short string to
specify an existing file reader type e.g. ``file_reader="csv"``.
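For illustration, the short-string form can be thought of as a lookup into a registry of reader classes. The registry and the class bodies below are hypothetical stand-ins, not the actual ``hats_import.catalog.file_readers`` API:

```python
# Hypothetical sketch of short-string reader resolution. These classes are
# placeholders, not the real hats-import file readers.
class CsvReader:
    """Stand-in for a CSV input reader."""


class ParquetReader:
    """Stand-in for a Parquet input reader."""


_READERS = {"csv": CsvReader, "parquet": ParquetReader}


def get_file_reader(spec):
    """Pass reader instances through unchanged; resolve short strings
    like "csv" to a new instance of the registered class."""
    if isinstance(spec, str):
        return _READERS[spec]()
    return spec
```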

You can find the full API documentation for
:py:class:`hipscat_import.catalog.file_readers.InputReader`
:py:class:`hats_import.catalog.file_readers.InputReader`

.. code-block:: python

@@ -206,18 +206,18 @@ Which fields?

Specify the ``ra_column`` and ``dec_column`` for the dataset.

There are two fields that we require in order to make a valid hipscatted
There are two fields that we require in order to make a valid hats-sharded
catalog, the right ascension and declination. At this time, this is the only
supported system for celestial coordinates.
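Right ascension is conventionally in [0, 360) degrees and declination in [-90, 90] degrees; a minimal sanity check (our illustration only, not part of the pipeline) could look like:

```python
def validate_radec(ra: float, dec: float) -> None:
    """Raise if a coordinate pair, in degrees, is outside conventional
    bounds. Illustrative helper; the real pipeline has its own handling."""
    if not 0.0 <= ra < 360.0:
        raise ValueError(f"ra out of range: {ra}")
    if not -90.0 <= dec <= 90.0:
        raise ValueError(f"dec out of range: {dec}")
```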

If you're importing data that has previously been hipscatted, you may use
``use_hipscat_index = True``. This will use that previously compused hipscat spatial
If you're importing data that has previously been hats-sharded, you may use
``use_healpix_29 = True``. This will use that previously compused hats spatial
index as the position, instead of ra/dec.

Healpix order and thresholds
-------------------------------------------------------------------------------

When creating a new catalog through the hipscat-import process, we try to
When creating a new catalog through the hats-import process, we try to
create partitions with approximately the same number of rows per partition.
This isn't perfect, because the sky is uneven, but we still try to create
smaller-area pixels in more dense areas, and larger-area pixels in less dense
@@ -322,18 +322,18 @@ How?
You may want to tweak parameters of the final catalog output, and we have helper
arguments for a few of those.

``add_hipscat_index`` - ``bool`` - whether or not to add the hipscat spatial index
as a column in the resulting catalog. The ``_hipscat_index`` field is designed to make many
``add_healpix_29`` - ``bool`` - whether or not to add the hats spatial index
as a column in the resulting catalog. The ``_healpix_29`` field is designed to make many
dask operations more performant, but if you do not intend to publish your dataset
and do not intend to use dask, then you can suppress generation of this column to
save a little space in your final disk usage.

The ``_hipscat_index`` uses a high healpix order and a uniqueness counter to create
The ``_healpix_29`` uses a high healpix order and a uniqueness counter to create
values that can order all points in the sky, according to a nested healpix scheme.
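The "high order pixel plus uniqueness counter" idea can be illustrated by packing a nested-scheme pixel number into the high bits of one integer. The bit widths here are made up for the sketch and are not the actual ``_healpix_29`` layout:

```python
COUNTER_BITS = 20  # illustrative width only; not the real bit layout


def make_spatial_index(pixel: int, counter: int) -> int:
    """Pack a nested-scheme healpix pixel number and a per-pixel counter
    into one integer, so sorting orders points by pixel, then counter."""
    assert 0 <= counter < (1 << COUNTER_BITS), "counter overflows its field"
    return (pixel << COUNTER_BITS) | counter


def unpack_spatial_index(index: int):
    """Recover the (pixel, counter) pair from a packed index."""
    return index >> COUNTER_BITS, index & ((1 << COUNTER_BITS) - 1)
```

Because the pixel occupies the high bits, ordinary integer comparison groups points by pixel and breaks ties with the counter.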

``sort_columns`` - ``str`` - column for survey identifier, or other sortable column.
If sorting by multiple columns, they should be comma-separated.
If ``add_hipscat_index=True``, this sorting will be used to resolve the
If ``add_healpix_29=True``, this sorting will be used to resolve the
index counter within the same higher-order pixel space.
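The comma-separated form and its tie-breaking semantics can be sketched in plain Python (an illustration of the sorting behavior, not the pipeline's implementation):

```python
def sort_key_columns(sort_columns):
    """Split a comma-separated sort_columns string into column names."""
    return [name.strip() for name in sort_columns.split(",") if name.strip()]


# Rows sort by the first column, with later columns breaking ties.
rows = [
    {"obj": 2, "mjd": 1.0},
    {"obj": 1, "mjd": 2.0},
    {"obj": 1, "mjd": 1.0},
]
columns = sort_key_columns("obj, mjd")
ordered = sorted(rows, key=lambda row: tuple(row[c] for c in columns))
```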

``use_schema_file`` - ``str`` - path to a parquet file with schema metadata.
6 changes: 3 additions & 3 deletions docs/catalogs/public/allwise.rst
@@ -32,9 +32,9 @@ Example import

import pandas as pd

import hipscat_import.pipeline as runner
from hipscat_import.catalog.arguments import ImportArguments
from hipscat_import.catalog.file_readers import CsvReader
import hats_import.pipeline as runner
from hats_import.catalog.arguments import ImportArguments
from hats_import.catalog.file_readers import CsvReader

# Load the column names and types from a side file.
type_frame = pd.read_csv("allwise_types.csv")
6 changes: 3 additions & 3 deletions docs/catalogs/public/neowise.rst
@@ -32,9 +32,9 @@ Example import

import pandas as pd

import hipscat_import.pipeline as runner
from hipscat_import.catalog.arguments import ImportArguments
from hipscat_import.catalog.file_readers import CsvReader
import hats_import.pipeline as runner
from hats_import.catalog.arguments import ImportArguments
from hats_import.catalog.file_readers import CsvReader

# Load the column names and types from a side file.
type_frame = pd.read_csv("neowise_types.csv")
6 changes: 3 additions & 3 deletions docs/catalogs/public/panstarrs.rst
@@ -30,9 +30,9 @@ Example import of objects (otmo)

import pandas as pd

import hipscat_import.pipeline as runner
from hipscat_import.catalog.arguments import ImportArguments
from hipscat_import.catalog.file_readers import CsvReader
import hats_import.pipeline as runner
from hats_import.catalog.arguments import ImportArguments
from hats_import.catalog.file_readers import CsvReader

# Load the column names and types from a side file.
type_frame = pd.read_csv("ps1_otmo_types.csv")
4 changes: 2 additions & 2 deletions docs/catalogs/public/sdss.rst
@@ -64,8 +64,8 @@ Example import

.. code-block:: python

from hipscat_import.catalog.arguments import ImportArguments
import hipscat_import.pipeline as runner
from hats_import.catalog.arguments import ImportArguments
import hats_import.pipeline as runner

args = ImportArguments(
output_artifact_name="sdss_dr16q",
6 changes: 3 additions & 3 deletions docs/catalogs/public/tic.rst
@@ -30,9 +30,9 @@ Example import

import pandas as pd

import hipscat_import.pipeline as runner
from hipscat_import.catalog.arguments import ImportArguments
from hipscat_import.catalog.file_readers import CsvReader
import hats_import.pipeline as runner
from hats_import.catalog.arguments import ImportArguments
from hats_import.catalog.file_readers import CsvReader

type_frame = pd.read_csv("tic_types.csv")
type_map = dict(zip(type_frame["name"], type_frame["type"]))
6 changes: 3 additions & 3 deletions docs/catalogs/public/zubercal.rst
@@ -32,9 +32,9 @@ Challenges with this data set

.. code-block:: python

import hipscat_import.pipeline as runner
from hipscat_import.catalog.arguments import ImportArguments
from hipscat_import.catalog.file_readers import ParquetReader
import hats_import.pipeline as runner
from hats_import.catalog.arguments import ImportArguments
from hats_import.catalog.file_readers import ParquetReader
import pyarrow.parquet as pq
import pyarrow as pa
import re
10 changes: 5 additions & 5 deletions docs/catalogs/temp_files.rst
@@ -1,7 +1,7 @@
Temporary files and disk usage
===============================================================================

This page aims to characterize intermediate files created by the hipscat-import
This page aims to characterize intermediate files created by the hats-import
catalog creation process. Most users are going to be ok with setting the ``tmp_dir``
and not thinking much more about it.

@@ -90,7 +90,7 @@ Some more explanation:
What's happening when
-------------------------------------------------------------------------------

The hipscat-import catalog creation process generates a lot of temporary files. Some find this
The hats-import catalog creation process generates a lot of temporary files. Some find this
surprising, so we try to provide a narrative of what's happening and why.

Planning stage
@@ -196,10 +196,10 @@ final catalog can be very different from the on-disk size of the input files.

In our internal testing, we converted a number of different kinds of catalogs,
and share some of the results with you, to give some suggestion of the disk requirements
you may face when converting your own catalogs to hipscat format.
you may face when converting your own catalogs to hats format.

============= =============== =========== =============== =========================
Catalog Input size (-h) Input size Hipscatted size Ratio
Catalog Input size (-h) Input size HATS size Ratio
============= =============== =========== =============== =========================
allwise 1.2T 1196115700 310184460 0.26 (a lot smaller)
neowise 3.9T 4177447284 4263269112 1.02 (about the same)
@@ -213,4 +213,4 @@ Notes:
- allwise, neowise, and tic were all originally compressed CSV files.
- sdss was originally a series of fits files
- zubercal was originally 500k parquet files, and is reduced in the example to
around 70k hipscat parquet files.
around 70k hats parquet files.
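The Ratio column is simply output bytes divided by input bytes; using the allwise and neowise byte counts from the table above:

```python
def size_ratio(input_bytes, hats_bytes):
    """On-disk size of the converted catalog relative to its input."""
    return round(hats_bytes / input_bytes, 2)


# Byte counts taken from the table above.
allwise = size_ratio(1196115700, 310184460)   # 0.26, a lot smaller
neowise = size_ratio(4177447284, 4263269112)  # 1.02, about the same
```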
8 changes: 4 additions & 4 deletions docs/conf.py
@@ -14,10 +14,10 @@
# -- Project information -----------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information

project = "hipscat-import"
project = "hats-import"
copyright = "2023, LINCC Frameworks"
author = "LINCC Frameworks"
release = version("hipscat-import")
release = version("hats-import")
# for example take major/minor
version = ".".join(release.split(".")[:2])

@@ -80,8 +80,8 @@
## lets us suppress the copy button on select code blocks.
copybutton_selector = "div:not(.no-copybutton) > div.highlight > pre"

# Cross-link hipscat documentation from the API reference:
# Cross-link hats documentation from the API reference:
# https://docs.readthedocs.io/en/stable/guides/intersphinx.html
intersphinx_mapping = {
"hipscat": ("http://hipscat.readthedocs.io/en/stable/", None),
"hats": ("http://hats.readthedocs.io/en/stable/", None),
}
4 changes: 2 additions & 2 deletions docs/guide/contact.rst
@@ -6,7 +6,7 @@ We at LINCC Frameworks pride ourselves on being a friendly bunch!
If you're encountering issues, have some gnarly dataset, have ideas for
making our products better, or pretty much anything else, reach out!

* Open an issue in our github repo for hipscat-import
* https://github.com/astronomy-commons/hipscat-import/issues/new
* Open an issue in our github repo for hats-import
* https://github.com/astronomy-commons/hats-import/issues/new
* If you're on LSSTC slack, so are we!
`#lincc-frameworks-qa <https://lsstc.slack.com/archives/C062LG1AK1S>`_
2 changes: 1 addition & 1 deletion docs/guide/contributing.rst
@@ -1,4 +1,4 @@
Contributing to hipscat-import
Contributing to hats-import
===============================================================================

Find (or make) a new GitHub issue
Expand Down
4 changes: 2 additions & 2 deletions docs/guide/dask_on_ray.rst
@@ -8,7 +8,7 @@ See more on Ray's site:

https://docs.ray.io/en/latest/ray-more-libs/dask-on-ray.html

How to use in hipscat-import pipelines
How to use in hats-import pipelines
-------------------------------------------------------------------------------

Install ray
@@ -27,7 +27,7 @@ You should also disable ray when you're done, just to clean things up.
from dask.distributed import Client
from ray.util.dask import disable_dask_on_ray, enable_dask_on_ray

from hipscat_import.pipeline import pipeline_with_client
from hats_import.pipeline import pipeline_with_client

with ray.init(
num_cpus=args.dask_n_workers,