Merge branch 'main' into zarr_archive
martindurant committed Aug 25, 2023
2 parents 634eeb4 + a81f22a commit 2950deb
Showing 4 changed files with 11 additions and 12 deletions.
8 changes: 4 additions & 4 deletions docs/source/advanced.rst
@@ -7,7 +7,7 @@ Using Dask
Scanning and combining datasets can be computationally intensive and may
require a lot of bandwidth for some data formats. Where the target data
contains many input files, it makes sense to parallelise the job with
-dask and maybe disrtibuted the workload on a cluster to get additional
+dask and maybe distribute the workload on a cluster to get additional
CPUs and network performance.
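
For illustration, a minimal sketch of this pattern using ``dask.bag`` — the bucket URLs here are hypothetical, and the ``SingleHdf5ToZarr``/``MultiZarrToZarr`` calls assume the current kerchunk API:

.. code-block:: python

    import dask.bag as db
    import fsspec
    from kerchunk.combine import MultiZarrToZarr
    from kerchunk.hdf import SingleHdf5ToZarr

    # hypothetical input listing; any fsspec-readable URLs work
    urls = ["s3://bucket/2020-01.nc", "s3://bucket/2020-02.nc"]

    def scan(url):
        # scan one remote HDF5/NetCDF4 file and return its reference set
        with fsspec.open(url, mode="rb", anon=True) as f:
            return SingleHdf5ToZarr(f, url).translate()

    # each file scan is independent, so mapping over a bag parallelises trivially
    refs = db.from_sequence(urls, npartitions=len(urls)).map(scan).compute()

    combined = MultiZarrToZarr(
        refs, concat_dims=["time"],
        remote_protocol="s3", remote_options={"anon": True},
    ).translate()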

Simple parallel
@@ -41,7 +41,7 @@ Tree reduction

In some cases, the combine process can itself be slow or memory hungry.
In such cases, it is useful to combine the single-file reference sets in
-batches (which reducec a lot of redundancy between them) and then
+batches (which reduce a lot of redundancy between them) and then
combine the results of the batches. This technique is known as tree
reduction. An example of doing this by hand can be seen `here`_.
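
A minimal sketch of a two-level tree reduction, assuming a list ``refs`` of single-file reference sets as produced above (the batch size and combine arguments are illustrative):

.. code-block:: python

    from kerchunk.combine import MultiZarrToZarr

    def combine(refs_list):
        # merge one batch of reference sets along the time dimension
        return MultiZarrToZarr(
            refs_list, concat_dims=["time"], remote_protocol="s3"
        ).translate()

    batch_size = 10  # illustrative; tune to memory and inter-file redundancy
    # first level: combine within batches (independent, so parallelisable)
    batches = [combine(refs[i:i + batch_size])
               for i in range(0, len(refs), batch_size)]
    # second level: combine the much smaller batch outputs into one set
    final = combine(batches)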

@@ -106,13 +106,13 @@ Parquet Storage

JSON is very convenient as a storage format for references, because it is
simple, human-readable and ubiquitously supported. However, it is not the most
-efficient in terns of storage size of parsing speed. For python, in particular,
+efficient in terms of storage size of parsing speed. For python, in particular,
it comes with the added downside of repeated strings becoming separate python
string instances, greatly inflating memory footprint at load time.

To overcome these problems, and in particular keep down the memory use for the
end-user of kerchunked data, we can convert references to be stored in parquet,
-and use them with ``fsspec.implementations.reference.DRFererenceFileSystem``,
+and use them with ``fsspec.implementations.reference.ReferenceFileSystem``,
an alternative new implementation designed to work only with parquet input.
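
A sketch of that round trip, assuming ``kerchunk.df.refs_to_dataframe`` for the conversion and an fsspec version whose ``reference`` filesystem accepts parquet input; the names and options here are indicative, not definitive:

.. code-block:: python

    import fsspec
    from kerchunk.df import refs_to_dataframe

    # convert an in-memory/JSON reference set to parquet on disk
    refs_to_dataframe(combined, "refs.parquet")

    # open lazily: only the reference rows actually touched are read,
    # keeping the end-user's memory footprint small
    fs = fsspec.filesystem(
        "reference", fo="refs.parquet",
        remote_protocol="s3", remote_options={"anon": True},
    )
    mapper = fs.get_mapper("")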

The principle benefits of the parquet path are:
8 changes: 4 additions & 4 deletions docs/source/cases.rst
@@ -22,7 +22,7 @@ Discussion: https://github.com/fsspec/kerchunk/issues/78

Generator script: https://github.com/cgohlke/tifffile/blob/v2021.10.10/examples/earthbigdata.py

-Notebook: https://github.com/fsspec/kerchunk/raw/main/examples/earthbigdata.ipynb
+Notebook: https://nbviewer.org/github/fsspec/kerchunk/blob/main/examples/earthbigdata.ipynb

Solar Dynamics Observatory
--------------------------
@@ -34,7 +34,7 @@ Effective in-memory data size: 400GB
Notes: each wavelength filter is presented as a separate variable. The DATE-OBS of the nearest preceding 94A image
is used for other filters to maintain a single time axis for all variables.

-Notebook: https://github.com/fsspec/kerchunk/raw/main/examples/SDO.ipynb
+Notebook: https://nbviewer.org/github/fsspec/kerchunk/blob/main/examples/SDO.ipynb

National Water Model
--------------------
@@ -46,9 +46,9 @@ Effective in-memory size: 80TB
Notes: there are so many files, that dask and a tee reduction were required to aggregate the
metadata.

+Notebook: https://nbviewer.org/gist/rsignell-usgs/02da7d9257b4b26d84d053be1af2ceeb
+Generator notebook: https://nbviewer.org/gist/rsignell-usgs/ef435a53ac530a2843ce7e1d59f96e22

-Generator notebook: https://gist.github.com/rsignell-usgs/ef435a53ac530a2843ce7e1d59f96e22
-Notebook: https://nbviewer.org/gist/rsignell-usgs/02da7d9257b4b26d84d053be1af2ceeb

MUR SST
-------
Expand Down
2 changes: 1 addition & 1 deletion docs/source/test_example.rst
@@ -87,7 +87,7 @@ This is what a user of the generated dataset would do. This person does not need
Since the invocation for xarray to read this data is a little involved, we recommend
declaring the data set in an ``intake`` catalog. Alternatively, you might split the command
-into mlutiple lines by first constructing the filesystem or mapper (you will see this in some
+into multiple lines by first constructing the filesystem or mapper (you will see this in some
examples).
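
For reference, the one-shot invocation being described looks roughly like this, assuming a combined reference file ``combined.json`` pointing at anonymous s3 data:

.. code-block:: python

    import xarray as xr

    ds = xr.open_dataset(
        "reference://", engine="zarr",
        backend_kwargs={
            "consolidated": False,
            "storage_options": {
                "fo": "combined.json",            # the reference set
                "remote_protocol": "s3",          # where the real bytes live
                "remote_options": {"anon": True},
            },
        },
    )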

Note that, if the combining was done previously and saved to a JSON file, then the path to
5 changes: 2 additions & 3 deletions docs/source/tutorial.rst
@@ -8,7 +8,7 @@ Initially we create a pair of single file jsons for two ERA5 variables using ``K
Single file JSONs
-----------------

-The ``Kerchunk.hdf.SingleHdf5ToZarr`` method is used to create a single ``.json`` reference file for each file url passed to it. Here we use it to create a number of reference files for the ERA5 pubic dataset on `AWS <https://registry.opendata.aws/ecmwf-era5/>`__. We will compute a number of different times and variables to demonstrate different methods of combining them.
+The ``Kerchunk.hdf.SingleHdf5ToZarr`` method is used to create a single ``.json`` reference file for each file url passed to it. Here we use it to create a number of reference files for the ERA5 public dataset on `AWS <https://registry.opendata.aws/ecmwf-era5/>`__. We will compute a number of different times and variables to demonstrate different methods of combining them.

The Kerchunk package is still in a development phase and so changes frequently. Installing directly from the source code is recommended.
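
The single-file step looks roughly like the following sketch; the ERA5 object path is illustrative of the bucket's layout:

.. code-block:: python

    import json
    import fsspec
    from kerchunk.hdf import SingleHdf5ToZarr

    # one file of the public ERA5 dataset (illustrative path)
    u = "s3://era5-pds/2020/01/data/air_temperature_at_2_metres.nc"

    with fsspec.open(u, mode="rb", anon=True) as f:
        refs = SingleHdf5ToZarr(f, u, inline_threshold=300).translate()

    with open("air_temperature_202001.json", "w") as out:
        json.dump(refs, out)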

@@ -244,8 +244,7 @@ For more complex uses it is also possible to pass in a compiled ``regex`` functi
Here the ``new_dimension`` values have been populated by the compiled ``regex`` function ``ex`` which takes the file urls as input.

-To extract time information from file names, a custom function can be defined of the form ``(index, fs, var, fn) -> value`` to generate a valid ``datetime.datetime`` data type, typically using regular expressions. These datetime objects are then used to generate time coordinates through the
-``coo_dtypes`` argument in the ``MultiZarrToZarr`` function.
+To extract time information from file names, a custom function can be defined of the form ``(index, fs, var, fn) -> value`` to generate a valid ``datetime.datetime`` data type, typically using regular expressions. These datetime objects are then used to generate time coordinates through the ``coo_dtypes`` argument in the ``MultiZarrToZarr`` function.

Here's an example for file names following the pattern ``cgl_TOC_YYYYmmddHHMM_X21Y05_S3A_v1.1.0.json``:
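
A sketch of such a function for that pattern; the helper name ``fn_to_time``, the example timestamp and the surrounding ``MultiZarrToZarr`` arguments are illustrative assumptions:

.. code-block:: python

    import re
    from datetime import datetime
    from kerchunk.combine import MultiZarrToZarr

    def fn_to_time(index, fs, var, fn):
        # e.g. "cgl_TOC_202006221200_X21Y05_S3A_v1.1.0.json" -> 2020-06-22 12:00
        m = re.search(r"cgl_TOC_(\d{12})_", fn)
        return datetime.strptime(m.group(1), "%Y%m%d%H%M")

    mzz = MultiZarrToZarr(
        jsons,                          # the single-file reference sets
        concat_dims=["time"],
        coo_map={"time": fn_to_time},   # the (index, fs, var, fn) -> value hook
        coo_dtypes={"time": "M8[s]"},   # store the coordinate as datetime64
    )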

