diff --git a/.buildinfo b/.buildinfo new file mode 100644 index 00000000..29ff1265 --- /dev/null +++ b/.buildinfo @@ -0,0 +1,4 @@ +# Sphinx build info version 1 +# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done. +config: 2921a4c5ce816b9792230abd536acabd +tags: 645f666f9bcd5a90fca523b33c5a78b7 diff --git a/.nojekyll b/.nojekyll new file mode 100644 index 00000000..e69de29b diff --git a/README.md b/README.md new file mode 100644 index 00000000..ae1cf45a --- /dev/null +++ b/README.md @@ -0,0 +1,3 @@ +#GitHub Pages + +Last update of sphinx html documentation from [e54a92f](https://github.com/hynky1999/CmonCrawl/tree/e54a92f01e3b4b491b1ff54b4560467f57a318b0) diff --git a/_images/when_to_use.drawio.png b/_images/when_to_use.drawio.png new file mode 100644 index 00000000..3d561654 Binary files /dev/null and b/_images/when_to_use.drawio.png differ diff --git a/_sources/api.rst.txt b/_sources/api.rst.txt new file mode 100644 index 00000000..2ed3eeb5 --- /dev/null +++ b/_sources/api.rst.txt @@ -0,0 +1,14 @@ +API +=== + +.. autosummary:: + :recursive: + :toctree: generated + + + cmoncrawl + + + + + diff --git a/_sources/cli/cli.rst.txt b/_sources/cli/cli.rst.txt new file mode 100644 index 00000000..aa47bb46 --- /dev/null +++ b/_sources/cli/cli.rst.txt @@ -0,0 +1,48 @@ +.. _cli: + +Command Line Interface +====================== + +The command line interface is a simple wrapper around the library. + +It provides the two main functionalities: + +* `download` - Downloads samples of either :ref:`domain_record` or HTML from common crawl indexes +* `extract` - Downloads an HTML from Domain Record and extracts the content. It can also directly take the HTML and extract the data. + +Both functionalities are invoked using ``cmon`` followed by the functionality and the required arguments. +The ``cmon`` command also takes a few optional arguments: + +--verbosity + Verbosity level. 
Choices are [0, 1, 2], with 0 being the least verbose and 2 being the most verbose. Default is 1. + +--aws_profile + AWS profile to use for AWS calls (Athena, S3). If not provided, the default AWS profile will be used. + +Examples +-------- + +.. code-block:: bash + + # Download first 1000 domain records for example.com + cmon download --match_type=domain --limit=1000 dr_output record example.com + + # Download first 100 htmls for example.com + cmon download --match_type=domain --limit=100 html_output html example.com + + # Take the domain records downloaded using the first command and extracts them using your extractors + cmon extract config.json extracted_output dr_output/*.jsonl record + + # Take the htmls downloaded using the second command and extracts them using your extractors + cmon extract config.json extracted_output html_output/*.html html + + + + + + + + + + + diff --git a/_sources/cli/download.rst.txt b/_sources/cli/download.rst.txt new file mode 100644 index 00000000..9deb48ae --- /dev/null +++ b/_sources/cli/download.rst.txt @@ -0,0 +1,105 @@ +Command Line Download +===================== + +The download mode of the ``cmon`` command line tool serves to query and download from CommonCrawl indexes. +The following arguments are needed in this order: + +Positional arguments +-------------------- + +1. output - Path to output directory. + +2. {record,html} - Download mode: + + - record: Download record files from Common Crawl. + - html: Download HTML files from Common Crawl. + +3. urls - URLs to download, e.g. www.bcc.cz. + + +In html mode, the output directory will contain .html files, one +for each found URL. In record mode, the output directory will contain +``.jsonl`` files, each containing multiple domain records in JSON format. + + +Options +------- + +--limit LIMIT + Max number of URLs to download. + +--since SINCE + Start date in ISO format (e.g., 2020-01-01). + +--to TO + End date in ISO format (e.g., 2020-01-01). 
+ +--cc_server CC_SERVER + Common Crawl indexes to query. Must provide the whole URL (e.g., https://index.commoncrawl.org/CC-MAIN-2023-14-index). + +--max_retry MAX_RETRY + Max number of retries for a request. Increase this number when requests are failing. + +--sleep_base SLEEP_BASE + Base sleep time for exponential backoff in case of request failure. + +--max_requests_per_second MAX_REQUESTS_PER_SECOND + Max number of requests per second. + +--match_type MATCH_TYPE + One of exact, prefix, host, domain + Match type for the URL. Refer to cdx-api for more information. + See :py:class:`cmoncrawl.common.types.MatchType` for more information. + +--max_directory_size MAX_DIRECTORY_SIZE + Max number of files per directory. + +--filter_non_200 + Filter out non-200 status code. + +--aggregator AGGREGATOR + Aggregator to use for the query. + + - athena: Athena aggregator. Fastest, but requires AWS credentials with correct permissions. See :ref:`misc/athena:Athena` for more information. + - gateway: Gateway aggregator (default). Very slow, but no need for AWS config. + +--s3_bucket S3_BUCKET + S3 bucket to use for Athena aggregator. Only needed if using Athena aggregator. + + - If set the bucket will not be deleted after the query is done, allowing to reuse it for future queries. + - If not set, a temporary bucket will be created and deleted after the query is done. + +.. note:: + If you specify an S3 bucket, remember to delete it manually after you're done to avoid incurring unnecessary costs. + + +Record mode options +------------------- + +--max_crawls_per_file MAX_CRAWLS_PER_FILE + Max number of domain records per file output + +HTML mode options +----------------- + +--encoding ENCODING + Force usage of specified encoding if possible. + +--download_method DOWNLOAD_METHOD + Method for downloading warc files from Common Crawl, it only applies to HTML download. + + - api: Download from Common Crawl API Gateway. This is the default option. 
+ - s3: Download from Common Crawl S3 bucket. This is the fastest option, but requires AWS credentials with correct permissions. + + +Examples +-------- + + +.. code-block:: bash + + # Download first 1000 domain records for example.com + cmon download dr_output record --match_type=domain --limit=1000 example.com + + # Download first 100 htmls for example.com + cmon download html_output html --match_type=domain --limit=100 example.com diff --git a/_sources/cli/extract.rst.txt b/_sources/cli/extract.rst.txt new file mode 100644 index 00000000..634a836d --- /dev/null +++ b/_sources/cli/extract.rst.txt @@ -0,0 +1,90 @@ +Command line Extract +==================== + +The extract mode of the ``cmon`` command line tool serves to extract data from your downloaded files. +The following arguments are needed in this order: + +Positional arguments +-------------------- + +1. config_path - Path to the config file containing extraction rules. + +2. output_path - Path to the output directory. + +3. {record,html} - Extraction mode: + + - record: Extract data from jsonl (domain record) files. + - html: Extract data from HTML files. + +4. files - Files to extract data from (either HTML files or .jsonl files). + +To create a config file, see :ref:`extractor_config`. + +Both modes yield the same output format, which is a ``.jsonl`` file containing the extracted data, +one extraction per line. For each file, a new directory is created in the output directory, named after the +file. + +The files created by the download mode can be directly used with the appropriate mode +in the extraction. + +- If you have an HTML file, you can use the html mode to extract it. +- If you have domain records, you can use the record mode to extract them. +- If you have domain records that you acquired without using cmoncrawl, + +please refer to :ref:`domain_record_jsonl`, which describes how to create ``.jsonl`` files from your domain records, +which you can then use with the record mode.
+ +Optional arguments +------------------ + +--max_crawls_per_file MAX_CRAWLS_PER_FILE + Max number of extractions per file output. + +--max_directory_size MAX_DIRECTORY_SIZE + Max number of extraction files per directory. + +--n_proc N_PROC + Number of processes to use for extraction. The parallelization is on file level, + thus for a single file, it's useless to use more than one process. + +Record arguments +---------------- + +--max_retry MAX_RETRY + Max number of WARC download attempts. + +--download_method DOWNLOAD_METHOD + Method for downloading warc files from Common Crawl, it only applies to HTML download. + + - api: Download from Common Crawl API Gateway. This is the default option. + - s3: Download from Common Crawl S3 bucket. This is the fastest option, but requires AWS credentials with correct permissions. + +--sleep_base SLEEP_BASE + Base value for exponential backoff between failed requests. + +--max_requests_per_second MAX_REQUESTS_PER_SECOND + Max number of requests per second. + +Html arguments +-------------- + +--date DATE + Date of extraction of HTML files in ISO format (e.g., 2021-01-01). The default is today. + +--url URL + URL from which the HTML files were downloaded. By default, it will try to infer from the file content. + +Examples +-------- + +.. code-block:: bash + + # Take the domain records downloaded using the first command and extract them using your extractors + cmon extract config.json extracted_output dr_output/*.jsonl record --max_retry 100 --download_method=api --sleep_base 1.3 + + # Take the htmls downloaded using the second command and extract them using your extractors + cmon extract config.json extracted_output html_output/*.html html --date 2021-01-01 --url https://www.example.com + +When you build your extractors, you will appreciate that you can specify +the URL of the HTML file and the date of the extraction. This is because +this information is used during the extractor routing.
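The role of the URL and the date in routing can be sketched in plain Python. This is only a minimal illustration under stated assumptions, not the library's implementation: ``pick_extractor`` and ``ROUTES`` are hypothetical names, the route layout mirrors the one described in :ref:`extractor_config`, and the use of ``re.match``, strict date bounds, and falling through to the next route when no extractor fits are all assumptions.

```python
import re
from datetime import datetime
from typing import Any, Dict, List, Optional

# Hypothetical routes in the same shape as the "routes" key of the config file.
ROUTES: List[Dict[str, Any]] = [
    {
        "regexes": [r".*cmon\.cz.*"],
        "extractors": [
            {"name": "a_extractor", "to": "2010-01-01"},
            {"name": "a_extractor2", "since": "2010-01-01"},
        ],
    },
]


def pick_extractor(
    routes: List[Dict[str, Any]], url: str, crawl_date: datetime
) -> Optional[str]:
    """Return the name of the first extractor whose route and dates fit."""
    for route in routes:
        # A route is considered only if at least one of its regexes matches the URL.
        if not any(re.match(regex, url) for regex in route["regexes"]):
            continue
        for extractor in route["extractors"]:
            since = extractor.get("since")
            to = extractor.get("to")
            # Missing bounds are treated as open-ended (assumption: strict bounds).
            if since is not None and not datetime.fromisoformat(since) < crawl_date:
                continue
            if to is not None and not crawl_date < datetime.fromisoformat(to):
                continue
            return extractor["name"]
    # No route or extractor matched.
    return None


if __name__ == "__main__":
    print(pick_extractor(ROUTES, "http://www.cmon.cz", datetime(2012, 1, 1)))
    print(pick_extractor(ROUTES, "http://www.cmon.cz", datetime(2009, 6, 1)))
```

In this sketch, the values passed with ``--url`` and ``--date`` in html mode play the role of ``url`` and ``crawl_date``.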
\ No newline at end of file diff --git a/_sources/cli/index.rst.txt b/_sources/cli/index.rst.txt new file mode 100644 index 00000000..b0dac0fc --- /dev/null +++ b/_sources/cli/index.rst.txt @@ -0,0 +1,12 @@ +Command Line Interface +====================== + +.. toctree:: + :maxdepth: 2 + :caption: Contents: + + cli + download + extract + + diff --git a/_sources/extraction/config_file.rst.txt b/_sources/extraction/config_file.rst.txt new file mode 100644 index 00000000..38f341ec --- /dev/null +++ b/_sources/extraction/config_file.rst.txt @@ -0,0 +1,137 @@ +.. _extractor_config: + +Extractor config file +========================== +In many cases you will want to use more than a single extractor. +Imagine that you crawl two news websites which have completely different structures +and you want to extract the articles. You can achieve this by using the extractor config file. + +The extractor config file defines which extractor should be used for a given HTML file. +You can use the datetime of the crawl and the URL to specify which extractor should be used. + +Structure +--------- + +The structure is as follows: + +.. code-block:: json + + { + + "extractors_path": "Path to the extractors folder", + "routes": [ + { + "regexes": [".*"], + "extractors": [{ + "name": "my_extractor", + "since": "iso date string", + "to": "iso date string" + }, + { + "name": "my_extractor2" + } + ] + }, + { + "regexes": ["another_regex"], + "....": "...." + } + ] + } + + +The ``extractors_path`` is the path to the folder where the extractors are located. + +.. note:: + The extractors_path is relative to the current working directory. + + +The ``routes`` key defines a list of possible extractors and the conditions under which we route to them. Each route is a dictionary with the following keys: + +* ``regexes``: a list of regexes. At least one regex must match the URL for this route to be used. +* ``extractors``: a list of extractors that will be used to extract the data from the url.
The first extractor for which ``since`` < record_date < ``to`` is used. + + +Each extractor has the following keys: + +* ``name``: the name of the extractor. This is the name of the Python file without the .py extension; you can also set the NAME variable in the extractor file to override this. +* ``since`` [optional] : The starting crawl date for which the extractor is valid (e.g. 2009-01-01). +* ``to`` [optional] : The ending crawl date for which the extractor is valid. Format is the same as for ``since``. + +.. note:: + If ``since`` and ``to`` are not specified, the extractor will match all crawls for that route. + + +Example +------- + +Given the following folder structure: + +.. code-block:: text + + extractors/ + ├── a_extractor.py + ├── a_extractor2.py + └── b_extractor.py + +and the following config: + +.. code-block:: json + + { + + "extractors_path": "./extractors", + "routes": [ + { + "regexes": [".*cmon.cz.*"], + "extractors": [{ + "name": "a_extractor", + "to": "2010-01-01" + }, + { + "name": "a_extractor2", + "since": "2010-01-01" + } + ] + }, + { + "regexes": [".*cmon2.cz.*"], + "extractors": [{ + "name": "b_extractor" + } + ] + } + ] + } + +The following will happen: + +* A domain record with url http://www.cmon.cz, crawled in 2012, will be extracted using the a_extractor2.py extractor. +* A domain record with url http://www.cmon.cz, crawled in 2009, will be extracted using the a_extractor.py extractor. +* A domain record with url http://www.cmon2.cz, crawled in 2012, will be extracted using the b_extractor.py extractor. + + +`__init__.py` +------------- +You might want to put the common code of the extractors into +a common Python file. The problem is that during the execution, +the extractors directory is not in the Python path. To make it possible to add the extractors +directory, we also load the `__init__.py` file (but don't load extractors in it). + +Thus you can create an `__init__.py` file in the extractors directory with the following content: + +..
code-block:: python + + import sys + from pathlib import Path + sys.path.append(str(Path(__file__).parent)) + +which will add the extractors directory to the Python path. + + +Arbitrary Code Execution +------------------------ +.. warning:: + Since the router loads and executes all files in the extractors + directory, every .py file in this directory is executed. Thus + you should not put any untrusted files in this directory. \ No newline at end of file diff --git a/_sources/extraction/creating_extractor.rst.txt b/_sources/extraction/creating_extractor.rst.txt new file mode 100644 index 00000000..b72bd541 --- /dev/null +++ b/_sources/extraction/creating_extractor.rst.txt @@ -0,0 +1,99 @@ +.. _extractors: + +Extractor types +================ + +All the extractors you write must implement the :py:class:`cmoncrawl.processor.pipeline.extractor.IExtractor` class. +If you choose to implement it directly, you will have to implement the ``extract`` method. +In the method you are provided with the HTML page as a string and the crawl metadata. You then return the data you want to extract from the HTML as a dictionary, or None if you want +to discard the HTML. + +While the interface is simple, it doesn't handle encoding problems or filtering. +If you want to parse the HTML using ``bs4`` and then extract the data, you can use either: + +- :py:class:`cmoncrawl.processor.pipeline.extractor.BaseExtractor`, which parses the HTML using ``bs4`` and resolves encoding issues +- :py:class:`cmoncrawl.processor.pipeline.extractor.PageExtractor`, in which you just define the CSS selectors to use and the functions which transform the data from the selectors + +Extractor Definition +==================== +In order to register your extractor, you must define each extractor in +a separate file and you must assign an instance of the extractor in that file to a variable +named `extractor`. + +Example 1. +---------- + +..
code-block:: python + :caption: extractor.py + + # You can use the NAME variable to define the name; + # otherwise the name will be inherited from the file name + NAME='title_extractor' + + from typing import Any, Dict, Optional + + from cmoncrawl.processor.pipeline.extractor import IExtractor + from cmoncrawl.common.types import PipeMetadata + + class MyExtractor(IExtractor): + def extract(self, response: str, metadata: PipeMetadata) -> Optional[Dict[str, Any]]: + return {"title": "My title"} + + extractor = MyExtractor() + + +BaseExtractor +============= + +The `BaseExtractor` assumes you will want to use parsed HTML using +`BeautifulSoup `_. +Thus the only method you need to implement is the `extract_soup` method. + +Extraction +---------- + +- `extract_soup` method + +It takes a BeautifulSoup object and crawl metadata (see :py:class:`cmoncrawl.common.types.PipeMetadata`) and must return +a dictionary of extracted data or None if the page should not be extracted, for example if you haven't found all the data you need. + +Additionally, you might want to filter out the pages you don't want to +extract. For this, you have two options: + +Filtering +--------- + +- `filter_raw` method + +This method takes the raw HTML and crawl metadata and must return True if the page should be extracted or False otherwise. If you can +decide based on raw HTML, this is the most efficient way to filter pages, as no soup parsing will be done. + +- `filter_soup` method + +This method takes the BeautifulSoup object and crawl metadata and must return True if the page should be extracted or False otherwise. + + +Finally, your file must create the said extractor and name it `extractor`. + +Example 2. +---------- + +Here is an example of an extractor that will extract the title of the page. + +..
code-block:: python + :caption: ext.py + + from bs4 import BeautifulSoup + from cmoncrawl.processor.pipeline.extractor import BaseExtractor + from cmoncrawl.common.types import PipeMetadata + + class TitleExtractor(BaseExtractor): + def extract_soup(self, soup: BeautifulSoup, metadata: PipeMetadata) -> dict: + return {'title': soup.title.text} + + def filter_soup(self, soup: BeautifulSoup, metadata: PipeMetadata) -> bool: + return soup.title is not None + + extractor = TitleExtractor() + NAME='title' + +Now in :ref:`extractor_config` you would refer to this extractor as `title`. +If you didn't set the `NAME` variable, you would refer to it as `ext`. \ No newline at end of file diff --git a/_sources/extraction/index.rst.txt b/_sources/extraction/index.rst.txt new file mode 100644 index 00000000..b96f7317 --- /dev/null +++ b/_sources/extraction/index.rst.txt @@ -0,0 +1,15 @@ +Extraction +================= +In order to save space, you might want to extract the information from the +HTMLs directly, without saving the HTMLs themselves. The library does allow +you to do that. In this section, we will show you how you can define your +own extractors. + + +.. toctree:: + :maxdepth: 2 + :caption: Contents: + + creating_extractor + config_file + utils diff --git a/_sources/extraction/utils.rst.txt b/_sources/extraction/utils.rst.txt new file mode 100644 index 00000000..ae53bae7 --- /dev/null +++ b/_sources/extraction/utils.rst.txt @@ -0,0 +1,23 @@ +Extraction utils +================ + +The utilities for extraction are defined in :py:mod:`cmoncrawl.processor.extraction`. +They provide helper functions for both filtering and extraction.
+ + +Filtering +--------- + +- `must_exist_filter`: filter out the URLs that don't contain the CSS selector + +- `must_not_exist_filter`: filter out the URLs that contain the CSS selector + + +Extraction +---------- + +- `check_required`: Creates a function that checks if all the required fields are present in the extracted data + +- `chain_transforms`: Creates a function that chains multiple transformation functions; if any returns None, the chain is broken and None is returned. Especially useful with soup select etc. + +- `extract_transform`: Creates a function that extracts the data from the soup tag using the CSS selector and transforms it using your transformation functions. \ No newline at end of file diff --git a/_sources/generated/cmoncrawl.aggregator.athena_query.rst.txt b/_sources/generated/cmoncrawl.aggregator.athena_query.rst.txt new file mode 100644 index 00000000..2cedc183 --- /dev/null +++ b/_sources/generated/cmoncrawl.aggregator.athena_query.rst.txt @@ -0,0 +1,28 @@ +cmoncrawl.aggregator.athena\_query +================================== + +.. automodule:: cmoncrawl.aggregator.athena_query + + + + + + + + + + + + .. rubric:: Classes + + .. autoclass:: AthenaAggregator + :members: + + + + + + + + + diff --git a/_sources/generated/cmoncrawl.aggregator.base.rst.txt b/_sources/generated/cmoncrawl.aggregator.base.rst.txt new file mode 100644 index 00000000..0a9cbbbf --- /dev/null +++ b/_sources/generated/cmoncrawl.aggregator.base.rst.txt @@ -0,0 +1,29 @@ +cmoncrawl.aggregator.base +========================= + +.. automodule:: cmoncrawl.aggregator.base + + + + + + + + + + + + .. rubric:: Classes + + ..
autosummary:: + + IAggregator + + + + + + + + + diff --git a/_sources/generated/cmoncrawl.aggregator.gateway_query.rst.txt b/_sources/generated/cmoncrawl.aggregator.gateway_query.rst.txt new file mode 100644 index 00000000..8ed0934a --- /dev/null +++ b/_sources/generated/cmoncrawl.aggregator.gateway_query.rst.txt @@ -0,0 +1,26 @@ +cmoncrawl.aggregator.gateway\_query +=================================== + +.. automodule:: cmoncrawl.aggregator.gateway_query + + + + + + + + + + + + .. rubric:: Classes + + .. autoclass:: GatewayAggregator + :members: + + + + + + + diff --git a/_sources/generated/cmoncrawl.aggregator.rst.txt b/_sources/generated/cmoncrawl.aggregator.rst.txt new file mode 100644 index 00000000..155b5681 --- /dev/null +++ b/_sources/generated/cmoncrawl.aggregator.rst.txt @@ -0,0 +1,34 @@ +cmoncrawl.aggregator +==================== + +.. automodule:: cmoncrawl.aggregator + + + + + + + + + + + + + + + + + + + +.. rubric:: Modules + +.. autosummary:: + :toctree: + :recursive: + + cmoncrawl.aggregator.athena_query + cmoncrawl.aggregator.base + cmoncrawl.aggregator.gateway_query + cmoncrawl.aggregator.utils + diff --git a/_sources/generated/cmoncrawl.aggregator.utils.athena_query_maker.rst.txt b/_sources/generated/cmoncrawl.aggregator.utils.athena_query_maker.rst.txt new file mode 100644 index 00000000..c4b49a35 --- /dev/null +++ b/_sources/generated/cmoncrawl.aggregator.utils.athena_query_maker.rst.txt @@ -0,0 +1,36 @@ +cmoncrawl.aggregator.utils.athena\_query\_maker +=============================================== + +.. automodule:: cmoncrawl.aggregator.utils.athena_query_maker + + + + + + + + .. rubric:: Functions + + .. 
autosummary:: + + crawl_query + crawl_url_to_name + date_to_sql_format + get_name + prepare_athena_sql_query + prepare_athena_where_conditions + url_query_based_on_match_type + url_query_date_range + + + + + + + + + + + + + diff --git a/_sources/generated/cmoncrawl.aggregator.utils.helpers.rst.txt b/_sources/generated/cmoncrawl.aggregator.utils.helpers.rst.txt new file mode 100644 index 00000000..7e499a57 --- /dev/null +++ b/_sources/generated/cmoncrawl.aggregator.utils.helpers.rst.txt @@ -0,0 +1,38 @@ +cmoncrawl.aggregator.utils.helpers +================================== + +.. automodule:: cmoncrawl.aggregator.utils.helpers + + + + + + + + .. rubric:: Functions + + .. autosummary:: + + get_all_CC_indexes + log_after_retry + retrieve + unify_url_id + + + + + + + + + + .. rubric:: Exceptions + + .. autosummary:: + + DownloadError + + + + + diff --git a/_sources/generated/cmoncrawl.aggregator.utils.ndjson.rst.txt b/_sources/generated/cmoncrawl.aggregator.utils.ndjson.rst.txt new file mode 100644 index 00000000..318ceb86 --- /dev/null +++ b/_sources/generated/cmoncrawl.aggregator.utils.ndjson.rst.txt @@ -0,0 +1,29 @@ +cmoncrawl.aggregator.utils.ndjson +================================= + +.. automodule:: cmoncrawl.aggregator.utils.ndjson + + + + + + + + + + + + .. rubric:: Classes + + .. autosummary:: + + Decoder + + + + + + + + + diff --git a/_sources/generated/cmoncrawl.aggregator.utils.rst.txt b/_sources/generated/cmoncrawl.aggregator.utils.rst.txt new file mode 100644 index 00000000..0a57b6aa --- /dev/null +++ b/_sources/generated/cmoncrawl.aggregator.utils.rst.txt @@ -0,0 +1,33 @@ +cmoncrawl.aggregator.utils +========================== + +.. automodule:: cmoncrawl.aggregator.utils + + + + + + + + + + + + + + + + + + + +.. rubric:: Modules + +.. 
autosummary:: + :toctree: + :recursive: + + cmoncrawl.aggregator.utils.athena_query_maker + cmoncrawl.aggregator.utils.helpers + cmoncrawl.aggregator.utils.ndjson + diff --git a/_sources/generated/cmoncrawl.common.loggers.rst.txt b/_sources/generated/cmoncrawl.common.loggers.rst.txt new file mode 100644 index 00000000..5a1a373c --- /dev/null +++ b/_sources/generated/cmoncrawl.common.loggers.rst.txt @@ -0,0 +1,29 @@ +cmoncrawl.common.loggers +======================== + +.. automodule:: cmoncrawl.common.loggers + + + + + + + + .. rubric:: Functions + + .. autosummary:: + + setup_loggers + + + + + + + + + + + + + diff --git a/_sources/generated/cmoncrawl.common.rst.txt b/_sources/generated/cmoncrawl.common.rst.txt new file mode 100644 index 00000000..66aed275 --- /dev/null +++ b/_sources/generated/cmoncrawl.common.rst.txt @@ -0,0 +1,33 @@ +cmoncrawl.common +================ + +.. automodule:: cmoncrawl.common + + + + + + + + + + + + + + + + + + + +.. rubric:: Modules + +.. autosummary:: + :toctree: + :recursive: + + cmoncrawl.common.loggers + cmoncrawl.common.throttling + cmoncrawl.common.types + diff --git a/_sources/generated/cmoncrawl.common.throttling.rst.txt b/_sources/generated/cmoncrawl.common.throttling.rst.txt new file mode 100644 index 00000000..3ad24a6c --- /dev/null +++ b/_sources/generated/cmoncrawl.common.throttling.rst.txt @@ -0,0 +1,29 @@ +cmoncrawl.common.throttling +=========================== + +.. automodule:: cmoncrawl.common.throttling + + + + + + + + + + + + .. rubric:: Classes + + .. autosummary:: + + Throttler + + + + + + + + + diff --git a/_sources/generated/cmoncrawl.common.types.rst.txt b/_sources/generated/cmoncrawl.common.types.rst.txt new file mode 100644 index 00000000..7c469633 --- /dev/null +++ b/_sources/generated/cmoncrawl.common.types.rst.txt @@ -0,0 +1,53 @@ +cmoncrawl.common.types +====================== + +.. automodule:: cmoncrawl.common.types + + + + + + + + .. rubric:: Functions + + .. 
autosummary:: + + parse_timestamp + + + + + + .. rubric:: Classes + + .. autoclass:: DomainCrawl + :members: + + .. autoclass:: DomainRecord + :members: + + .. autoclass:: ExtractConfig + :members: + + .. autoclass:: ExtractorConfig + :members: + + .. autoclass:: MatchType + :members: + + .. autoclass:: PipeMetadata + :members: + + .. autoclass:: RetrieveResponse + :members: + + .. autoclass:: RoutesConfig + :members: + + + + + + + diff --git a/_sources/generated/cmoncrawl.config.rst.txt b/_sources/generated/cmoncrawl.config.rst.txt new file mode 100644 index 00000000..1889e925 --- /dev/null +++ b/_sources/generated/cmoncrawl.config.rst.txt @@ -0,0 +1,35 @@ +cmoncrawl.config +================ + +.. automodule:: cmoncrawl.config + + + + + + + + .. rubric:: Functions + + .. autosummary:: + + get_str_env + + + + + + .. rubric:: Classes + + .. autosummary:: + + Config + + + + + + + + + diff --git a/_sources/generated/cmoncrawl.integrations.commands.rst.txt b/_sources/generated/cmoncrawl.integrations.commands.rst.txt new file mode 100644 index 00000000..87dd0d61 --- /dev/null +++ b/_sources/generated/cmoncrawl.integrations.commands.rst.txt @@ -0,0 +1,33 @@ +cmoncrawl.integrations.commands +=============================== + +.. automodule:: cmoncrawl.integrations.commands + + + + + + + + .. rubric:: Functions + + .. autosummary:: + + add_args + add_subparsers + get_args + main + process_args + + + + + + + + + + + + + diff --git a/_sources/generated/cmoncrawl.integrations.download.rst.txt b/_sources/generated/cmoncrawl.integrations.download.rst.txt new file mode 100644 index 00000000..b6807f81 --- /dev/null +++ b/_sources/generated/cmoncrawl.integrations.download.rst.txt @@ -0,0 +1,43 @@ +cmoncrawl.integrations.download +=============================== + +.. automodule:: cmoncrawl.integrations.download + + + + + + + + .. rubric:: Functions + + .. 
autosummary:: + + add_args + add_mode_args + get_aggregator + get_download_downloader + run_download + url_download + url_download_prepare_router + url_download_prepare_streamer + + + + + + .. rubric:: Classes + + .. autosummary:: + + Aggregator + DownloadOutputFormat + + + + + + + + + diff --git a/_sources/generated/cmoncrawl.integrations.extract.rst.txt b/_sources/generated/cmoncrawl.integrations.extract.rst.txt new file mode 100644 index 00000000..a31203fc --- /dev/null +++ b/_sources/generated/cmoncrawl.integrations.extract.rst.txt @@ -0,0 +1,43 @@ +cmoncrawl.integrations.extract +============================== + +.. automodule:: cmoncrawl.integrations.extract + + + + + + + + .. rubric:: Functions + + .. autosummary:: + + add_args + add_mode_args + create_router + extract_from_files + get_domain_records_html + get_domain_records_json + get_extract_downloader + load_config + run_extract + + + + + + .. rubric:: Classes + + .. autosummary:: + + ExtractMode + + + + + + + + + diff --git a/_sources/generated/cmoncrawl.integrations.rst.txt b/_sources/generated/cmoncrawl.integrations.rst.txt new file mode 100644 index 00000000..114b3a79 --- /dev/null +++ b/_sources/generated/cmoncrawl.integrations.rst.txt @@ -0,0 +1,34 @@ +cmoncrawl.integrations +====================== + +.. automodule:: cmoncrawl.integrations + + + + + + + + + + + + + + + + + + + +.. rubric:: Modules + +.. autosummary:: + :toctree: + :recursive: + + cmoncrawl.integrations.commands + cmoncrawl.integrations.download + cmoncrawl.integrations.extract + cmoncrawl.integrations.utils + diff --git a/_sources/generated/cmoncrawl.integrations.utils.rst.txt b/_sources/generated/cmoncrawl.integrations.utils.rst.txt new file mode 100644 index 00000000..548934a9 --- /dev/null +++ b/_sources/generated/cmoncrawl.integrations.utils.rst.txt @@ -0,0 +1,35 @@ +cmoncrawl.integrations.utils +============================ + +.. automodule:: cmoncrawl.integrations.utils + + + + + + + + .. rubric:: Functions + + .. 
autosummary:: + + get_dao + + + + + + .. rubric:: Classes + + .. autosummary:: + + DAOname + + + + + + + + + diff --git a/_sources/generated/cmoncrawl.middleware.rst.txt b/_sources/generated/cmoncrawl.middleware.rst.txt new file mode 100644 index 00000000..79aab77f --- /dev/null +++ b/_sources/generated/cmoncrawl.middleware.rst.txt @@ -0,0 +1,32 @@ +cmoncrawl.middleware +==================== + +.. automodule:: cmoncrawl.middleware + + + + + + + + + + + + + + + + + + + +.. rubric:: Modules + +.. autosummary:: + :toctree: + :recursive: + + cmoncrawl.middleware.stompware + cmoncrawl.middleware.synchronized + diff --git a/_sources/generated/cmoncrawl.middleware.stompware.rst.txt b/_sources/generated/cmoncrawl.middleware.stompware.rst.txt new file mode 100644 index 00000000..c429f955 --- /dev/null +++ b/_sources/generated/cmoncrawl.middleware.stompware.rst.txt @@ -0,0 +1,31 @@ +cmoncrawl.middleware.stompware +============================== + +.. automodule:: cmoncrawl.middleware.stompware + + + + + + + + + + + + .. rubric:: Classes + + .. autoclass:: StompAggregator + :members: + + .. autoclass:: StompProcessor + :members: + + + + + + + + + diff --git a/_sources/generated/cmoncrawl.middleware.synchronized.rst.txt b/_sources/generated/cmoncrawl.middleware.synchronized.rst.txt new file mode 100644 index 00000000..7c3cb360 --- /dev/null +++ b/_sources/generated/cmoncrawl.middleware.synchronized.rst.txt @@ -0,0 +1,30 @@ +cmoncrawl.middleware.synchronized +================================= + +.. automodule:: cmoncrawl.middleware.synchronized + + + + + + + + .. rubric:: Functions + + .. 
autosummary:: + + extract + query_and_extract + + + + + + + + + + + + + diff --git a/_sources/generated/cmoncrawl.processor.dao.api.rst.txt b/_sources/generated/cmoncrawl.processor.dao.api.rst.txt new file mode 100644 index 00000000..32bc491d --- /dev/null +++ b/_sources/generated/cmoncrawl.processor.dao.api.rst.txt @@ -0,0 +1,28 @@ +cmoncrawl.processor.dao.api +=========================== + +.. automodule:: cmoncrawl.processor.dao.api + + + + + + + + + + + + .. rubric:: Classes + + .. autoclass:: CCAPIGatewayDAO + :members: + + + + + + + + + diff --git a/_sources/generated/cmoncrawl.processor.dao.base.rst.txt b/_sources/generated/cmoncrawl.processor.dao.base.rst.txt new file mode 100644 index 00000000..5cbedca0 --- /dev/null +++ b/_sources/generated/cmoncrawl.processor.dao.base.rst.txt @@ -0,0 +1,35 @@ +cmoncrawl.processor.dao.base +============================ + +.. automodule:: cmoncrawl.processor.dao.base + + + + + + + + + + + + .. rubric:: Classes + + .. autoclass:: ICC_Dao + :members: + + + + + + + .. rubric:: Exceptions + + .. autosummary:: + + DownloadError + + + + + diff --git a/_sources/generated/cmoncrawl.processor.dao.rst.txt b/_sources/generated/cmoncrawl.processor.dao.rst.txt new file mode 100644 index 00000000..9e52c4ad --- /dev/null +++ b/_sources/generated/cmoncrawl.processor.dao.rst.txt @@ -0,0 +1,33 @@ +cmoncrawl.processor.dao +======================= + +.. automodule:: cmoncrawl.processor.dao + + + + + + + + + + + + + + + + + + + +.. rubric:: Modules + +.. autosummary:: + :toctree: + :recursive: + + cmoncrawl.processor.dao.api + cmoncrawl.processor.dao.base + cmoncrawl.processor.dao.s3 + diff --git a/_sources/generated/cmoncrawl.processor.dao.s3.rst.txt b/_sources/generated/cmoncrawl.processor.dao.s3.rst.txt new file mode 100644 index 00000000..f6446166 --- /dev/null +++ b/_sources/generated/cmoncrawl.processor.dao.s3.rst.txt @@ -0,0 +1,28 @@ +cmoncrawl.processor.dao.s3 +========================== + +.. 
automodule:: cmoncrawl.processor.dao.s3 + + + + + + + + + + + + .. rubric:: Classes + + .. autoclass:: S3Dao + :members: + + + + + + + + + diff --git a/_sources/generated/cmoncrawl.processor.extraction.filters.rst.txt b/_sources/generated/cmoncrawl.processor.extraction.filters.rst.txt new file mode 100644 index 00000000..bafedb50 --- /dev/null +++ b/_sources/generated/cmoncrawl.processor.extraction.filters.rst.txt @@ -0,0 +1,30 @@ +cmoncrawl.processor.extraction.filters +====================================== + +.. automodule:: cmoncrawl.processor.extraction.filters + + + + + + + + .. rubric:: Functions + + .. autosummary:: + + must_exist_filter + must_not_exist_filter + + + + + + + + + + + + + diff --git a/_sources/generated/cmoncrawl.processor.extraction.rst.txt b/_sources/generated/cmoncrawl.processor.extraction.rst.txt new file mode 100644 index 00000000..13d6f460 --- /dev/null +++ b/_sources/generated/cmoncrawl.processor.extraction.rst.txt @@ -0,0 +1,32 @@ +cmoncrawl.processor.extraction +============================== + +.. automodule:: cmoncrawl.processor.extraction + + + + + + + + + + + + + + + + + + + +.. rubric:: Modules + +.. autosummary:: + :toctree: + :recursive: + + cmoncrawl.processor.extraction.filters + cmoncrawl.processor.extraction.utils + diff --git a/_sources/generated/cmoncrawl.processor.extraction.utils.rst.txt b/_sources/generated/cmoncrawl.processor.extraction.utils.rst.txt new file mode 100644 index 00000000..0721a0fc --- /dev/null +++ b/_sources/generated/cmoncrawl.processor.extraction.utils.rst.txt @@ -0,0 +1,39 @@ +cmoncrawl.processor.extraction.utils +==================================== + +.. automodule:: cmoncrawl.processor.extraction.utils + + + + + + + + .. rubric:: Functions + + .. 
autosummary:: + + all_same_transform + chain_transforms + check_required + combine_dicts + extract_transform + get_attribute_transform + get_tag_transform + get_tags_transform + get_text_list_transform + get_text_transform + transform + + + + + + + + + + + + + diff --git a/_sources/generated/cmoncrawl.processor.pipeline.downloader.rst.txt b/_sources/generated/cmoncrawl.processor.pipeline.downloader.rst.txt new file mode 100644 index 00000000..e3fb89dc --- /dev/null +++ b/_sources/generated/cmoncrawl.processor.pipeline.downloader.rst.txt @@ -0,0 +1,45 @@ +cmoncrawl.processor.pipeline.downloader +======================================= + +.. automodule:: cmoncrawl.processor.pipeline.downloader + + + + + + + + .. rubric:: Functions + + .. autosummary:: + + log_after_retry + + + + + + .. rubric:: Classes + + .. autoclass:: AsyncDownloader + :members: + + .. autoclass:: DownloaderLocalFiles + :members: + + .. autoclass:: DummyDownloader + :members: + + .. autoclass:: IDownloader + :members: + + .. autoclass:: WarcIterator + :members: + + + + + + + + diff --git a/_sources/generated/cmoncrawl.processor.pipeline.extractor.rst.txt b/_sources/generated/cmoncrawl.processor.pipeline.extractor.rst.txt new file mode 100644 index 00000000..436c75b5 --- /dev/null +++ b/_sources/generated/cmoncrawl.processor.pipeline.extractor.rst.txt @@ -0,0 +1,39 @@ +cmoncrawl.processor.pipeline.extractor +====================================== + +.. automodule:: cmoncrawl.processor.pipeline.extractor + + + + + + + + + + + + .. rubric:: Classes + + .. autoclass:: BaseExtractor + :members: + + .. autoclass:: DomainRecordExtractor + :members: + + .. autoclass:: HTMLExtractor + :members: + + .. autoclass:: IExtractor + :members: + + .. 
autoclass:: PageExtractor + :members: + + + + + + + + diff --git a/_sources/generated/cmoncrawl.processor.pipeline.pipeline.rst.txt b/_sources/generated/cmoncrawl.processor.pipeline.pipeline.rst.txt new file mode 100644 index 00000000..1f5b773d --- /dev/null +++ b/_sources/generated/cmoncrawl.processor.pipeline.pipeline.rst.txt @@ -0,0 +1,30 @@ +cmoncrawl.processor.pipeline.pipeline +===================================== + +.. automodule:: cmoncrawl.processor.pipeline.pipeline + + + + + + + + + + + + .. rubric:: Classes + + .. autosummary:: + + ProcessorPipeline + :members: + + + + + + + + + diff --git a/_sources/generated/cmoncrawl.processor.pipeline.router.rst.txt b/_sources/generated/cmoncrawl.processor.pipeline.router.rst.txt new file mode 100644 index 00000000..ae4c6a26 --- /dev/null +++ b/_sources/generated/cmoncrawl.processor.pipeline.router.rst.txt @@ -0,0 +1,34 @@ +cmoncrawl.processor.pipeline.router +=================================== + +.. automodule:: cmoncrawl.processor.pipeline.router + + + + + + + + + + + + .. rubric:: Classes + + .. autoclass:: IRouter + :members: + + .. autoclass:: Route + :members: + + .. autoclass:: Router + :members: + + + + + + + + + diff --git a/_sources/generated/cmoncrawl.processor.pipeline.rst.txt b/_sources/generated/cmoncrawl.processor.pipeline.rst.txt new file mode 100644 index 00000000..111dae15 --- /dev/null +++ b/_sources/generated/cmoncrawl.processor.pipeline.rst.txt @@ -0,0 +1,35 @@ +cmoncrawl.processor.pipeline +============================ + +.. automodule:: cmoncrawl.processor.pipeline + + + + + + + + + + + + + + + + + + + +.. rubric:: Modules + +.. 
autosummary:: + :toctree: + :recursive: + + cmoncrawl.processor.pipeline.downloader + cmoncrawl.processor.pipeline.extractor + cmoncrawl.processor.pipeline.pipeline + cmoncrawl.processor.pipeline.router + cmoncrawl.processor.pipeline.streamer + diff --git a/_sources/generated/cmoncrawl.processor.pipeline.streamer.rst.txt b/_sources/generated/cmoncrawl.processor.pipeline.streamer.rst.txt new file mode 100644 index 00000000..d9a11860 --- /dev/null +++ b/_sources/generated/cmoncrawl.processor.pipeline.streamer.rst.txt @@ -0,0 +1,38 @@ +cmoncrawl.processor.pipeline.streamer +===================================== + +.. automodule:: cmoncrawl.processor.pipeline.streamer + + + + + + + + + + + + .. rubric:: Classes + + .. autoclass:: BaseStreamerFile + :members: + + .. autoclass:: IStreamer + :members: + + .. autoclass:: MemoryStreamer + :members: + + .. autoclass:: StreamerFileHTML + :members: + + .. autoclass:: StreamerFileJSON + :members: + + + + + + + diff --git a/_sources/generated/cmoncrawl.processor.rst.txt b/_sources/generated/cmoncrawl.processor.rst.txt new file mode 100644 index 00000000..496089d3 --- /dev/null +++ b/_sources/generated/cmoncrawl.processor.rst.txt @@ -0,0 +1,33 @@ +cmoncrawl.processor +=================== + +.. automodule:: cmoncrawl.processor + + + + + + + + + + + + + + + + + + + +.. rubric:: Modules + +.. autosummary:: + :toctree: + :recursive: + + cmoncrawl.processor.dao + cmoncrawl.processor.extraction + cmoncrawl.processor.pipeline + diff --git a/_sources/generated/cmoncrawl.rst.txt b/_sources/generated/cmoncrawl.rst.txt new file mode 100644 index 00000000..ded007d6 --- /dev/null +++ b/_sources/generated/cmoncrawl.rst.txt @@ -0,0 +1,36 @@ +cmoncrawl +========= + +.. automodule:: cmoncrawl + + + + + + + + + + + + + + + + + + + +.. rubric:: Modules + +.. 
autosummary:: + :toctree: + :recursive: + + cmoncrawl.aggregator + cmoncrawl.common + cmoncrawl.config + cmoncrawl.integrations + cmoncrawl.middleware + cmoncrawl.processor + diff --git a/_sources/index.rst.txt b/_sources/index.rst.txt new file mode 100644 index 00000000..ff1fa92a --- /dev/null +++ b/_sources/index.rst.txt @@ -0,0 +1,27 @@ +.. CommonCrawl Extractor documentation master file, created by + sphinx-quickstart on Tue Nov 8 17:40:35 2022. + You can adapt this file completely to your liking, but it should at least + contain the root `toctree` directive. + +Welcome to CommonCrawl Extractor's documentation! +================================================= + +.. toctree:: + :maxdepth: 3 + :caption: Contents: + + usage + cli/index + extraction/index + prog_guide/index + misc/index + api + + + +Indices and tables +================== + +* :ref:`genindex` +* :ref:`modindex` +* :ref:`search` diff --git a/_sources/misc/athena.rst.txt b/_sources/misc/athena.rst.txt new file mode 100644 index 00000000..6ef9101b --- /dev/null +++ b/_sources/misc/athena.rst.txt @@ -0,0 +1,87 @@ +Athena +====== + +AWS Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. +Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run. +Athena is easy to use. Simply point to your data in Amazon S3, define the schema, and start querying using standard SQL. +Most results are delivered within seconds. With Athena, there’s no need for complex ETL jobs to prepare your data for analysis. +This makes it easy for anyone with SQL skills to quickly analyze large-scale datasets. + +Prerequisites +------------- + +In order to use the athena module, you must have an AWS account with the following permissions: + +.. 
code-block:: json + + { + "Version": "2012-10-17", + "Statement": [ + { + "Sid": "CommoncrawlDB", + "Effect": "Allow", + "Action": [ + "athena:CreateDataCatalog", + "glue:BatchCreatePartition", + "athena:StartQueryExecution", + "glue:CreateTable", + "glue:CreateDatabase", + "glue:GetTable", + "glue:GetTables", + "glue:GetDatabase", + "glue:GetDatabases", + "glue:UpdateTable", + "glue:UpdatePartition", + "glue:GetPartition", + "glue:GetPartitions", + "athena:GetQueryExecution", + "athena:ListTableMetadata", + "s3:GetBucketLocation", + "s3:DescribeJob" + ], + "Resource": "*" + }, + { + "Sid": "ResultsBucket", + "Effect": "Allow", + "Action": "s3:ListBucket", + "Resource": "arn:aws:s3:::cmoncrawl-testbucket" + }, + { + "Sid": "ResultsBucket-objects", + "Effect": "Allow", + "Action": [ + "s3:PutObject", + "s3:GetObject", + "s3:DeleteObject" + ], + "Resource": "arn:aws:s3:::cmoncrawl-testbucket/*" + }, + { + "Sid": "CommoncrawlBucket", + "Effect": "Allow", + "Action": [ + "s3:GetObject", + "s3:ListBucket" + ], + "Resource": [ + "arn:aws:s3:::commoncrawl/*", + "arn:aws:s3:::commoncrawl" + ] + } + ] + } + +Caching +------- +If you provide a bucket name when initializing the :py:class:`cmoncrawl.aggregator.athena_query.AthenaAggregator`, +the results of the query will be cached in the bucket. Whenever you make the same query the results will be reused. +This means that the bucket is not automatically cleaned up and it's your responsibility to do so. + +If you don't provide a bucket name, the results will not be cached and a randomly generated bucket will be used and deleted +after the query is finished. + + + + + diff --git a/_sources/misc/domain_record.rst.txt b/_sources/misc/domain_record.rst.txt new file mode 100644 index 00000000..79beb977 --- /dev/null +++ b/_sources/misc/domain_record.rst.txt @@ -0,0 +1,55 @@ +.. 
_domain_record: + +Domain Record +============= + +By domain record we refer to a structure that contains the information +about how to download a crawl of a URL. It contains the following: + +* **url**: the url to crawl +* **filename**: the warc filename +* **offset**: the offset in the warc file +* **length**: the length of the html crawl +* **digest** [optional]: the digest of the html crawl +* **encoding** [optional]: the encoding of the html crawl +* **timestamp** [optional]: the timestamp of the crawl + + +.. _domain_record_jsonl: + +Domain Record JSONL format +========================== + +In order to use your own domain records with the extract mode of the cli, +you must format them into the following json format: + +.. code-block:: json + + { + "domain_record": + { + "url": "http://example.com", + "filename": "crawl.warc.gz", + "offset": 123, + "length": 456, + "digest": "sha1:1234567890abcdef", + "encoding": "utf-8", + "timestamp": "2018-01-01T00:00:00Z" + }, + "additional_info": + { + "key1": "value1", + "key2": "value2" + } + } + +Each such json must be on a separate line in a file. +You don't have to provide all the fields, only ``url``, ``filename``, +``offset`` and ``length`` are required. +The Athena SQL keys are: +``u.url, cc.warc_filename, cc.warc_record_offset, cc.warc_record_length, cc.content_digest, cc.fetch_time`` + + +The ``additional_info`` field is optional and can contain any additional +information. It will be added to extracted fields as is. It's useful +when, for example, you want to record which set the url belongs to. \ No newline at end of file diff --git a/_sources/misc/index.rst.txt b/_sources/misc/index.rst.txt new file mode 100644 index 00000000..0c2f4284 --- /dev/null +++ b/_sources/misc/index.rst.txt @@ -0,0 +1,9 @@ +Miscellaneous +================= + +.. 
toctree:: + :maxdepth: 2 + :caption: Contents: + + athena + domain_record diff --git a/_sources/modules.rst.txt b/_sources/modules.rst.txt new file mode 100644 index 00000000..08a26361 --- /dev/null +++ b/_sources/modules.rst.txt @@ -0,0 +1,6 @@ +docs +==== + +.. toctree:: + :maxdepth: 4 + diff --git a/_sources/prog_guide/index.rst.txt b/_sources/prog_guide/index.rst.txt new file mode 100644 index 00000000..5e0f3329 --- /dev/null +++ b/_sources/prog_guide/index.rst.txt @@ -0,0 +1,20 @@ +Programming Guide +================= + +This section of the documentation is for people who want to use the +``cmoncrawl`` library to create their own extraction pipeline in python. +It allows you to take full advantage of the ``cmoncrawl`` library, unlike +the command line utility, which is limited to a few options. + +.. note:: + You probably don't need to read this if you just want to use the utility. + This is for people who want to create their own extraction pipeline. + + + +.. toctree:: + :maxdepth: 2 + :caption: Contents: + + overview + practice diff --git a/_sources/prog_guide/overview.rst.txt b/_sources/prog_guide/overview.rst.txt new file mode 100644 index 00000000..f9f2364c --- /dev/null +++ b/_sources/prog_guide/overview.rst.txt @@ -0,0 +1,96 @@ +How to extract from Common Crawl (theory) +========================================= + +The process of getting one parsed web page from CommonCrawl can be described as a pipeline. + +1. Query CommonCrawl to find a link to a file that contains the web page we want. +2. Download the file +3. Choose a parser for the web page +4. Filter out the web page if it does not match the conditions +5. Extract fields from the page +6. Save the fields to a file + + + +The first step is handled by `Aggregator` while the rest is handled by `Processor`. + +======================= +1. 
Querying CommonCrawl +======================= +what WARC File how + `WARC `_ is a file format that is used for storing multitudes of web resources.
+ In our case these files contain a bunch of downloaded web pages and their metadata. + It's possible to get only part of the file by specifying the offset in the file and the length of the part we want. +what + Common Crawl Index +how + A CommonCrawl index is a collection which maps crawled urls to the WARC files which contain the crawl of that url. + +Every month CommonCrawl releases a new index which contains all links to web pages that were crawled that month. + +.. warning:: + It is important to understand that even if the index was released in a certain month, it can contain links to web pages that might be older. + +Thus in order to download a page we query the index to get the link to the respective WARC file, and the offset and length of the page. +Since there are multiple indexes we should query all of them to make sure we don't miss the page. +With the link to the WARC file, the offset and the length we can continue to the next step. + +All this is handled by :py:class:`cmoncrawl.aggregator.gateway_query.GatewayAggregator`. But for basic use you will not need to use it directly. + + +===================== +2. Downloading a file +===================== +The Processor node then takes the url and related information from the queue and downloads the appropriate WARC file. +This step is handled by :py:class:`cmoncrawl.processor.pipeline.downloader.AsyncDownloader`. +It simply downloads and extracts the page from the WARC file. For downloading we use two data access objects (DAOs, :py:class:`cmoncrawl.processor.dao.base.ICC_Dao`): + +- :py:class:`cmoncrawl.processor.dao.s3.S3Dao`, which downloads the file from AWS S3 directly +- :py:class:`cmoncrawl.processor.dao.api.CCAPIGatewayDAO`, which downloads the file through CommonCrawl API Gateway. + + +=================== +3. Choose extractor +=================== +Once the page is downloaded we first need to choose an extractor for it. +Extractors are dynamically loaded based on definitions in :ref:`extractor_config`.
+All loaded extractors are then matched against the url and crawl date, and the first match is used. +This functionality is handled by :py:class:`cmoncrawl.processor.pipeline.router.Router`. + +For development of extractors refer to :ref:`extractors`. + + +============================= +4. Filtering out the web page +============================= + +Once the extractor is chosen the filtering function is used to either drop or pass a page. +To filter, you can use either :py:meth:`cmoncrawl.processor.pipeline.extractor.BaseExtractor.filter_raw` for +filtering based on raw html pages (fast), or wait for conversion to soup and then filter using +:py:meth:`cmoncrawl.processor.pipeline.extractor.BaseExtractor.filter_soup` (slow). + +=============================== +5. Extract fields from the page +=============================== + +The extracting function defined by the extractor is used to extract the fields from the page. +Just extract the values and return them in a dict. + + +============== +6. File saving +============== +With the fields extracted we need to save them to a file. +By default the fields are saved in a json file. +The way the file is saved is defined by streamers. +All of the currently implemented streamers are derived from :py:class:`cmoncrawl.processor.pipeline.streamer.BaseStreamerFile`, +which defines how the files are saved, while the content parsing is left to the derived classes. + +Currently we support 2 streamers: + +- JSON (:py:class:`cmoncrawl.processor.pipeline.streamer.StreamerFileJSON`), which creates a json per line output, and outputs all extracted data +- HTML (:py:class:`cmoncrawl.processor.pipeline.streamer.StreamerFileHTML`), which creates a html file (assuming the html is defined in extracted data['html']).
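The json per line output mentioned above can be illustrated with a minimal, standalone sketch. It deliberately does not use ``cmoncrawl`` and is not how ``StreamerFileJSON`` is implemented; ``write_jsonl``, ``read_jsonl`` and the sample field names are hypothetical and chosen only for illustration.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable, List


def write_jsonl(path: Path, records: Iterable[Dict[str, Any]]) -> int:
    """Write each extracted dict as one JSON object per line; return the count."""
    count = 0
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            # json.dumps never emits raw newlines, so one record == one line
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


def read_jsonl(path: Path) -> List[Dict[str, Any]]:
    """Read the file back, skipping blank lines."""
    with path.open("r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Because every record occupies exactly one line, such a file can be appended to and processed line by line without loading the whole output into memory.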
+ +If you want to debug you might want to use :py:class:`cmoncrawl.processor.pipeline.streamer.MemoryStreamer` which outputs the data to memory instead of a file. + +If you would like a different format you can create your own streamer by inheriting from :py:class:`cmoncrawl.processor.pipeline.streamer.IStreamer` and then changing pipeline creation to use your new outstreamer. \ No newline at end of file diff --git a/_sources/prog_guide/practice.rst.txt b/_sources/prog_guide/practice.rst.txt new file mode 100644 index 00000000..07aa9339 --- /dev/null +++ b/_sources/prog_guide/practice.rst.txt @@ -0,0 +1,150 @@ +.. _custom_pipeline: + +How to extract from Common Crawl (practice) +=========================================== + +Since we now know which steps we need to take in order to extract data from Common Crawl and +how they map to ``cmoncrawl`` primitives, let's now see how to do it in practice. + + +Pipeline +-------- +We already know how to get the domain records and we also know how to download, extract and save the data. +The pipeline allows us to combine all but the first step into a single object that can be used to extract data from Common Crawl. + +To create a pipeline simply initialize :py:class:`cmoncrawl.processor.pipeline.pipeline.ProcessorPipeline` with Downloader, Router and Streamer. +You can then call its :py:meth:`cmoncrawl.processor.pipeline.pipeline.ProcessorPipeline.process_domain_record` method with the query and it will run the whole pipeline for a single domain record. + + +.. note:: + The exceptions are not handled by the pipeline and are passed to the caller, to handle them as you wish. + +Simultaneous querying and extracting +------------------------------------ + +Now all we need to resolve is how to effectively connect querying the index and downloading/extracting (pipeline) the data.
+One way is to query the index and, whenever we get a domain record, pass it to the pipeline; this is exactly how +:py:func:`cmoncrawl.middleware.synchronized.query_and_extract` works. This works great when we use the Gateway DAO, +as querying the index takes about the same time as downloading/extracting. This is how we can do it: + +.. code-block:: python + :caption: Simultaneously query and extract data from Common Crawl + + from typing import Any, Dict + from bs4 import BeautifulSoup + from cmoncrawl.aggregator.gateway_query import GatewayAggregator + from cmoncrawl.processor.pipeline.extractor import BaseExtractor + from cmoncrawl.processor.pipeline.pipeline import ProcessorPipeline + from cmoncrawl.processor.pipeline.downloader import AsyncDownloader + from cmoncrawl.processor.pipeline.router import Router + from cmoncrawl.processor.pipeline.streamer import StreamerFileJSON + from cmoncrawl.common.loggers import all_purpose_logger + from cmoncrawl.common.types import MatchType, PipeMetadata + from cmoncrawl.middleware.synchronized import query_and_extract + from cmoncrawl.processor.dao.s3 import S3Dao + from pathlib import Path + + + class YourCustomExtractor(BaseExtractor): + def extract_soup(self, soup: BeautifulSoup, metadata: PipeMetadata) -> Dict[str, Any] | None: + return {"title": "Dummy"} + + your_custom_extractor = YourCustomExtractor() + + # We register our custom extractor to the router + router = Router() + router.load_extractor("ext", your_custom_extractor) + router.register_route("ext", ".*bbc.com.*") + streamer = StreamerFileJSON(Path("extracted"), max_directory_size=1000, max_file_size=100) + + async with S3Dao(aws_profile="dev") as dao: + downloader = AsyncDownloader(dao) + pipeline = ProcessorPipeline(downloader=downloader, router=router, outstreamer=streamer) + + index_agg = GatewayAggregator( + urls=["bbc.com"], + match_type=MatchType.DOMAIN, + limit=1000, + ) + + processed_urls = await query_and_extract(index_agg, pipeline) + 
+Query records and then extract +------------------------------ + +The other way is to query the index for all records and download/extract them afterwards. This approach works +great with Athena as the query takes around 1-2 minutes. With this approach we can then use both multiprocessing to process +and asyncio queues to download the data faster. This is how we can do it: + + +.. code-block:: python + :caption: Query and extract data from Common Crawl + + from cmoncrawl.aggregator.athena_query import AthenaAggregator + from cmoncrawl.common.types import MatchType + from typing import Any, Dict + from bs4 import BeautifulSoup + from cmoncrawl.aggregator.gateway_query import GatewayAggregator + from cmoncrawl.processor.pipeline.extractor import BaseExtractor + from cmoncrawl.processor.pipeline.pipeline import ProcessorPipeline + from cmoncrawl.processor.pipeline.downloader import AsyncDownloader + from cmoncrawl.processor.pipeline.router import Router + from cmoncrawl.processor.pipeline.streamer import StreamerFileJSON + from cmoncrawl.common.loggers import all_purpose_logger + from cmoncrawl.common.types import MatchType, PipeMetadata + from cmoncrawl.middleware.synchronized import extract + from cmoncrawl.processor.dao.s3 import S3Dao + from pathlib import Path + + # Query + records = [] + async with AthenaAggregator(urls=["bbc.com"], + match_type=MatchType.DOMAIN, + limit=1000, + bucket_name="test-dev-cmoncrawl", + aws_profile="dev" + ) as agg: + async for record in agg: + records.append(record) + + #Then extract + + + + class YourCustomExtractor(BaseExtractor): + def extract_soup(self, soup: BeautifulSoup, metadata: PipeMetadata) -> Dict[str, Any] | None: + return {"title": "Dummy"} + + your_custom_extractor = YourCustomExtractor() + + # We register our custom extractor to the router + router = Router() + router.load_extractor("ext", your_custom_extractor) + router.register_route("ext", ".*bbc.com.*") + streamer = StreamerFileJSON(Path("extracted"), 
max_directory_size=1000, max_file_size=100) + + async with S3Dao(aws_profile="dev") as dao: + downloader = AsyncDownloader(dao) + pipeline = ProcessorPipeline(downloader=downloader, router=router, outstreamer=streamer) + + index_agg = GatewayAggregator( + urls=["bbc.com"], + match_type=MatchType.DOMAIN, + limit=1000, + ) + + processed_urls = await extract(pipeline=pipeline, records=[(rec, {}) for rec in records]) + +To leverage multiprocessing, simply divide the records into n chunks and for each chunk initialize a new process. + +Distributed Simultaneous high-throughput querying and extracting +---------------------------------------------------------------- + +Lastly you can leverage :py:class:`cmoncrawl.middleware.stompware.StompAggregator` to query and send data to a queue using the STOMP protocol, +and simultaneously retrieve the data from the queue and extract it using :py:class:`cmoncrawl.middleware.stompware.StompProcessor`. + + +Be cooperative +-------------- +If you plan to use multiprocessing or a distributed approach, please try to be nice to others and limit the number of requests +at Downloader/Aggregator accordingly. \ No newline at end of file diff --git a/_sources/usage.rst.txt b/_sources/usage.rst.txt new file mode 100644 index 00000000..93a24a43 --- /dev/null +++ b/_sources/usage.rst.txt @@ -0,0 +1,51 @@ +Usage +===== + +The library is designed to make interaction with CommonCrawl's indexes simple, +while also providing a framework for extracting data from the downloaded +HTMLs. + +You can use the library in two ways: + +1. :ref:`cli` - This should suffice for 80% of the use cases. Restricted, but easy to use. +2. :ref:`custom_pipeline` - If you need more control over the process, you can use the library programmatically. + +Workflow +-------- +In order to download from CommonCrawl you first need to find the pointers to the data you want to download. +The search for the pointers is done over specific files called indexes.
The indexes don't contain the data itself, +but rather metadata and pointers to the data. We call these pointers domain records (see :ref:`domain_record`). +Once you have the domain records you can download the data from CommonCrawl's S3 bucket. Since you might want +to extract only specific data from the downloaded HTMLs, you can also specify a list of extractors to be run on the +downloaded HTMLs. + +The library thus supports a two-step workflow: + +1. First download domain records from the indexes. +2. Download and extract the domain records. + +AWS +--- +The CommonCrawl data is stored in an AWS S3 bucket in us-east-1. CommonCrawl allows you to access the data using the following methods: + +1. Gateway - you can download the data through CloudFlare HTTP Gateway. You will not need AWS credentials, but it is also the slowest. +2. S3 - you can download the data directly from S3. You will need AWS credentials, but it is also the fastest. + +Additionally, CommonCrawl provides two ways to query the data: + +1. CommonCrawl Index - Free, but more limited and incredibly slow. +2. AWS Athena - Paid, but much faster, you can use SQL to query the data. + +The library supports all of these methods. We recommend using the S3/AWS Athena combination. Refer to the following image to see the differences: + +.. image:: ../source/images/when_to_use.drawio.png + :alt: When to use this library + +Be nice to others +----------------- +If you use the library programmatically or through CLI, +you will find that you can specify the number of threads to use. +Please be aware that by default we limit the number of requests per thread +to 20/s. This is to prevent overloading CommonCrawl's servers. If you +plan to use more threads, be considerate to others and don't set the number +of threads too high.
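To make the idea of a per-thread request limit concrete, here is a minimal, standalone sketch of a limiter that spaces calls evenly. It is only an illustration under stated assumptions: ``RateLimiter`` is a hypothetical name, it is not part of ``cmoncrawl``, and it does not claim to reflect how the library enforces its default limit.

```python
import time
from typing import Callable, Optional


class RateLimiter:
    """Allow at most `rate` calls per second by spacing calls evenly (hypothetical helper)."""

    def __init__(
        self,
        rate: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if rate <= 0:
            raise ValueError("rate must be positive")
        self.min_interval = 1.0 / rate
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Optional[float] = None

    def wait(self) -> float:
        """Block until the next call is allowed; return how long we slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        # Reserve the next slot one interval after this call
        self._next_allowed = now + self.min_interval
        return delay
```

Calling ``limiter.wait()`` before each request with ``RateLimiter(20)`` keeps a single thread at roughly 20 requests per second; with several threads, each holding its own limiter, the total load grows with the thread count, which is why the number of threads should stay modest.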
\ No newline at end of file diff --git a/_static/_sphinx_javascript_frameworks_compat.js b/_static/_sphinx_javascript_frameworks_compat.js new file mode 100644 index 00000000..81415803 --- /dev/null +++ b/_static/_sphinx_javascript_frameworks_compat.js @@ -0,0 +1,123 @@ +/* Compatability shim for jQuery and underscores.js. + * + * Copyright Sphinx contributors + * Released under the two clause BSD licence + */ + +/** + * small helper function to urldecode strings + * + * See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/decodeURIComponent#Decoding_query_parameters_from_a_URL + */ +jQuery.urldecode = function(x) { + if (!x) { + return x + } + return decodeURIComponent(x.replace(/\+/g, ' ')); +}; + +/** + * small helper function to urlencode strings + */ +jQuery.urlencode = encodeURIComponent; + +/** + * This function returns the parsed url parameters of the + * current request. Multiple values per key are supported, + * it will always return arrays of strings for the value parts. + */ +jQuery.getQueryParameters = function(s) { + if (typeof s === 'undefined') + s = document.location.search; + var parts = s.substr(s.indexOf('?') + 1).split('&'); + var result = {}; + for (var i = 0; i < parts.length; i++) { + var tmp = parts[i].split('=', 2); + var key = jQuery.urldecode(tmp[0]); + var value = jQuery.urldecode(tmp[1]); + if (key in result) + result[key].push(value); + else + result[key] = [value]; + } + return result; +}; + +/** + * highlight a given string on a jquery object by wrapping it in + * span elements with the given class name. 
+ */ +jQuery.fn.highlightText = function(text, className) { + function highlight(node, addItems) { + if (node.nodeType === 3) { + var val = node.nodeValue; + var pos = val.toLowerCase().indexOf(text); + if (pos >= 0 && + !jQuery(node.parentNode).hasClass(className) && + !jQuery(node.parentNode).hasClass("nohighlight")) { + var span; + var isInSVG = jQuery(node).closest("body, svg, foreignObject").is("svg"); + if (isInSVG) { + span = document.createElementNS("http://www.w3.org/2000/svg", "tspan"); + } else { + span = document.createElement("span"); + span.className = className; + } + span.appendChild(document.createTextNode(val.substr(pos, text.length))); + node.parentNode.insertBefore(span, node.parentNode.insertBefore( + document.createTextNode(val.substr(pos + text.length)), + node.nextSibling)); + node.nodeValue = val.substr(0, pos); + if (isInSVG) { + var rect = document.createElementNS("http://www.w3.org/2000/svg", "rect"); + var bbox = node.parentElement.getBBox(); + rect.x.baseVal.value = bbox.x; + rect.y.baseVal.value = bbox.y; + rect.width.baseVal.value = bbox.width; + rect.height.baseVal.value = bbox.height; + rect.setAttribute('class', className); + addItems.push({ + "parent": node.parentNode, + "target": rect}); + } + } + } + else if (!jQuery(node).is("button, select, textarea")) { + jQuery.each(node.childNodes, function() { + highlight(this, addItems); + }); + } + } + var addItems = []; + var result = this.each(function() { + highlight(this, addItems); + }); + for (var i = 0; i < addItems.length; ++i) { + jQuery(addItems[i].parent).before(addItems[i].target); + } + return result; +}; + +/* + * backward compatibility for jQuery.browser + * This will be supported until firefox bug is fixed. 
+ */ +if (!jQuery.browser) { + jQuery.uaMatch = function(ua) { + ua = ua.toLowerCase(); + + var match = /(chrome)[ \/]([\w.]+)/.exec(ua) || + /(webkit)[ \/]([\w.]+)/.exec(ua) || + /(opera)(?:.*version|)[ \/]([\w.]+)/.exec(ua) || + /(msie) ([\w.]+)/.exec(ua) || + ua.indexOf("compatible") < 0 && /(mozilla)(?:.*? rv:([\w.]+)|)/.exec(ua) || + []; + + return { + browser: match[ 1 ] || "", + version: match[ 2 ] || "0" + }; + }; + jQuery.browser = {}; + jQuery.browser[jQuery.uaMatch(navigator.userAgent).browser] = true; +} diff --git a/_static/basic.css b/_static/basic.css new file mode 100644 index 00000000..30fee9d0 --- /dev/null +++ b/_static/basic.css @@ -0,0 +1,925 @@ +/* + * basic.css + * ~~~~~~~~~ + * + * Sphinx stylesheet -- basic theme. + * + * :copyright: Copyright 2007-2023 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. + * + */ + +/* -- main layout ----------------------------------------------------------- */ + +div.clearer { + clear: both; +} + +div.section::after { + display: block; + content: ''; + clear: left; +} + +/* -- relbar ---------------------------------------------------------------- */ + +div.related { + width: 100%; + font-size: 90%; +} + +div.related h3 { + display: none; +} + +div.related ul { + margin: 0; + padding: 0 0 0 10px; + list-style: none; +} + +div.related li { + display: inline; +} + +div.related li.right { + float: right; + margin-right: 5px; +} + +/* -- sidebar --------------------------------------------------------------- */ + +div.sphinxsidebarwrapper { + padding: 10px 5px 0 10px; +} + +div.sphinxsidebar { + float: left; + width: 230px; + margin-left: -100%; + font-size: 90%; + word-wrap: break-word; + overflow-wrap : break-word; +} + +div.sphinxsidebar ul { + list-style: none; +} + +div.sphinxsidebar ul ul, +div.sphinxsidebar ul.want-points { + margin-left: 20px; + list-style: square; +} + +div.sphinxsidebar ul ul { + margin-top: 0; + margin-bottom: 0; +} + +div.sphinxsidebar form { + margin-top: 
10px; +} + +div.sphinxsidebar input { + border: 1px solid #98dbcc; + font-family: sans-serif; + font-size: 1em; +} + +div.sphinxsidebar #searchbox form.search { + overflow: hidden; +} + +div.sphinxsidebar #searchbox input[type="text"] { + float: left; + width: 80%; + padding: 0.25em; + box-sizing: border-box; +} + +div.sphinxsidebar #searchbox input[type="submit"] { + float: left; + width: 20%; + border-left: none; + padding: 0.25em; + box-sizing: border-box; +} + + +img { + border: 0; + max-width: 100%; +} + +/* -- search page ----------------------------------------------------------- */ + +ul.search { + margin: 10px 0 0 20px; + padding: 0; +} + +ul.search li { + padding: 5px 0 5px 20px; + background-image: url(file.png); + background-repeat: no-repeat; + background-position: 0 7px; +} + +ul.search li a { + font-weight: bold; +} + +ul.search li p.context { + color: #888; + margin: 2px 0 0 30px; + text-align: left; +} + +ul.keywordmatches li.goodmatch a { + font-weight: bold; +} + +/* -- index page ------------------------------------------------------------ */ + +table.contentstable { + width: 90%; + margin-left: auto; + margin-right: auto; +} + +table.contentstable p.biglink { + line-height: 150%; +} + +a.biglink { + font-size: 1.3em; +} + +span.linkdescr { + font-style: italic; + padding-top: 5px; + font-size: 90%; +} + +/* -- general index --------------------------------------------------------- */ + +table.indextable { + width: 100%; +} + +table.indextable td { + text-align: left; + vertical-align: top; +} + +table.indextable ul { + margin-top: 0; + margin-bottom: 0; + list-style-type: none; +} + +table.indextable > tbody > tr > td > ul { + padding-left: 0em; +} + +table.indextable tr.pcap { + height: 10px; +} + +table.indextable tr.cap { + margin-top: 10px; + background-color: #f2f2f2; +} + +img.toggler { + margin-right: 3px; + margin-top: 3px; + cursor: pointer; +} + +div.modindex-jumpbox { + border-top: 1px solid #ddd; + border-bottom: 1px solid #ddd; + 
margin: 1em 0 1em 0; + padding: 0.4em; +} + +div.genindex-jumpbox { + border-top: 1px solid #ddd; + border-bottom: 1px solid #ddd; + margin: 1em 0 1em 0; + padding: 0.4em; +} + +/* -- domain module index --------------------------------------------------- */ + +table.modindextable td { + padding: 2px; + border-collapse: collapse; +} + +/* -- general body styles --------------------------------------------------- */ + +div.body { + min-width: 360px; + max-width: 800px; +} + +div.body p, div.body dd, div.body li, div.body blockquote { + -moz-hyphens: auto; + -ms-hyphens: auto; + -webkit-hyphens: auto; + hyphens: auto; +} + +a.headerlink { + visibility: hidden; +} + +a:visited { + color: #551A8B; +} + +h1:hover > a.headerlink, +h2:hover > a.headerlink, +h3:hover > a.headerlink, +h4:hover > a.headerlink, +h5:hover > a.headerlink, +h6:hover > a.headerlink, +dt:hover > a.headerlink, +caption:hover > a.headerlink, +p.caption:hover > a.headerlink, +div.code-block-caption:hover > a.headerlink { + visibility: visible; +} + +div.body p.caption { + text-align: inherit; +} + +div.body td { + text-align: left; +} + +.first { + margin-top: 0 !important; +} + +p.rubric { + margin-top: 30px; + font-weight: bold; +} + +img.align-left, figure.align-left, .figure.align-left, object.align-left { + clear: left; + float: left; + margin-right: 1em; +} + +img.align-right, figure.align-right, .figure.align-right, object.align-right { + clear: right; + float: right; + margin-left: 1em; +} + +img.align-center, figure.align-center, .figure.align-center, object.align-center { + display: block; + margin-left: auto; + margin-right: auto; +} + +img.align-default, figure.align-default, .figure.align-default { + display: block; + margin-left: auto; + margin-right: auto; +} + +.align-left { + text-align: left; +} + +.align-center { + text-align: center; +} + +.align-default { + text-align: center; +} + +.align-right { + text-align: right; +} + +/* -- sidebars 
-------------------------------------------------------------- */ + +div.sidebar, +aside.sidebar { + margin: 0 0 0.5em 1em; + border: 1px solid #ddb; + padding: 7px; + background-color: #ffe; + width: 40%; + float: right; + clear: right; + overflow-x: auto; +} + +p.sidebar-title { + font-weight: bold; +} + +nav.contents, +aside.topic, +div.admonition, div.topic, blockquote { + clear: left; +} + +/* -- topics ---------------------------------------------------------------- */ + +nav.contents, +aside.topic, +div.topic { + border: 1px solid #ccc; + padding: 7px; + margin: 10px 0 10px 0; +} + +p.topic-title { + font-size: 1.1em; + font-weight: bold; + margin-top: 10px; +} + +/* -- admonitions ----------------------------------------------------------- */ + +div.admonition { + margin-top: 10px; + margin-bottom: 10px; + padding: 7px; +} + +div.admonition dt { + font-weight: bold; +} + +p.admonition-title { + margin: 0px 10px 5px 0px; + font-weight: bold; +} + +div.body p.centered { + text-align: center; + margin-top: 25px; +} + +/* -- content of sidebars/topics/admonitions -------------------------------- */ + +div.sidebar > :last-child, +aside.sidebar > :last-child, +nav.contents > :last-child, +aside.topic > :last-child, +div.topic > :last-child, +div.admonition > :last-child { + margin-bottom: 0; +} + +div.sidebar::after, +aside.sidebar::after, +nav.contents::after, +aside.topic::after, +div.topic::after, +div.admonition::after, +blockquote::after { + display: block; + content: ''; + clear: both; +} + +/* -- tables ---------------------------------------------------------------- */ + +table.docutils { + margin-top: 10px; + margin-bottom: 10px; + border: 0; + border-collapse: collapse; +} + +table.align-center { + margin-left: auto; + margin-right: auto; +} + +table.align-default { + margin-left: auto; + margin-right: auto; +} + +table caption span.caption-number { + font-style: italic; +} + +table caption span.caption-text { +} + +table.docutils td, table.docutils th 
{ + padding: 1px 8px 1px 5px; + border-top: 0; + border-left: 0; + border-right: 0; + border-bottom: 1px solid #aaa; +} + +th { + text-align: left; + padding-right: 5px; +} + +table.citation { + border-left: solid 1px gray; + margin-left: 1px; +} + +table.citation td { + border-bottom: none; +} + +th > :first-child, +td > :first-child { + margin-top: 0px; +} + +th > :last-child, +td > :last-child { + margin-bottom: 0px; +} + +/* -- figures --------------------------------------------------------------- */ + +div.figure, figure { + margin: 0.5em; + padding: 0.5em; +} + +div.figure p.caption, figcaption { + padding: 0.3em; +} + +div.figure p.caption span.caption-number, +figcaption span.caption-number { + font-style: italic; +} + +div.figure p.caption span.caption-text, +figcaption span.caption-text { +} + +/* -- field list styles ----------------------------------------------------- */ + +table.field-list td, table.field-list th { + border: 0 !important; +} + +.field-list ul { + margin: 0; + padding-left: 1em; +} + +.field-list p { + margin: 0; +} + +.field-name { + -moz-hyphens: manual; + -ms-hyphens: manual; + -webkit-hyphens: manual; + hyphens: manual; +} + +/* -- hlist styles ---------------------------------------------------------- */ + +table.hlist { + margin: 1em 0; +} + +table.hlist td { + vertical-align: top; +} + +/* -- object description styles --------------------------------------------- */ + +.sig { + font-family: 'Consolas', 'Menlo', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', monospace; +} + +.sig-name, code.descname { + background-color: transparent; + font-weight: bold; +} + +.sig-name { + font-size: 1.1em; +} + +code.descname { + font-size: 1.2em; +} + +.sig-prename, code.descclassname { + background-color: transparent; +} + +.optional { + font-size: 1.3em; +} + +.sig-paren { + font-size: larger; +} + +.sig-param.n { + font-style: italic; +} + +/* C++ specific styling */ + +.sig-inline.c-texpr, +.sig-inline.cpp-texpr { + font-family: unset; 
+} + +.sig.c .k, .sig.c .kt, +.sig.cpp .k, .sig.cpp .kt { + color: #0033B3; +} + +.sig.c .m, +.sig.cpp .m { + color: #1750EB; +} + +.sig.c .s, .sig.c .sc, +.sig.cpp .s, .sig.cpp .sc { + color: #067D17; +} + + +/* -- other body styles ----------------------------------------------------- */ + +ol.arabic { + list-style: decimal; +} + +ol.loweralpha { + list-style: lower-alpha; +} + +ol.upperalpha { + list-style: upper-alpha; +} + +ol.lowerroman { + list-style: lower-roman; +} + +ol.upperroman { + list-style: upper-roman; +} + +:not(li) > ol > li:first-child > :first-child, +:not(li) > ul > li:first-child > :first-child { + margin-top: 0px; +} + +:not(li) > ol > li:last-child > :last-child, +:not(li) > ul > li:last-child > :last-child { + margin-bottom: 0px; +} + +ol.simple ol p, +ol.simple ul p, +ul.simple ol p, +ul.simple ul p { + margin-top: 0; +} + +ol.simple > li:not(:first-child) > p, +ul.simple > li:not(:first-child) > p { + margin-top: 0; +} + +ol.simple p, +ul.simple p { + margin-bottom: 0; +} + +aside.footnote > span, +div.citation > span { + float: left; +} +aside.footnote > span:last-of-type, +div.citation > span:last-of-type { + padding-right: 0.5em; +} +aside.footnote > p { + margin-left: 2em; +} +div.citation > p { + margin-left: 4em; +} +aside.footnote > p:last-of-type, +div.citation > p:last-of-type { + margin-bottom: 0em; +} +aside.footnote > p:last-of-type:after, +div.citation > p:last-of-type:after { + content: ""; + clear: both; +} + +dl.field-list { + display: grid; + grid-template-columns: fit-content(30%) auto; +} + +dl.field-list > dt { + font-weight: bold; + word-break: break-word; + padding-left: 0.5em; + padding-right: 5px; +} + +dl.field-list > dd { + padding-left: 0.5em; + margin-top: 0em; + margin-left: 0em; + margin-bottom: 0em; +} + +dl { + margin-bottom: 15px; +} + +dd > :first-child { + margin-top: 0px; +} + +dd ul, dd table { + margin-bottom: 10px; +} + +dd { + margin-top: 3px; + margin-bottom: 10px; + margin-left: 30px; +} + +.sig 
dd { + margin-top: 0px; + margin-bottom: 0px; +} + +.sig dl { + margin-top: 0px; + margin-bottom: 0px; +} + +dl > dd:last-child, +dl > dd:last-child > :last-child { + margin-bottom: 0; +} + +dt:target, span.highlighted { + background-color: #fbe54e; +} + +rect.highlighted { + fill: #fbe54e; +} + +dl.glossary dt { + font-weight: bold; + font-size: 1.1em; +} + +.versionmodified { + font-style: italic; +} + +.system-message { + background-color: #fda; + padding: 5px; + border: 3px solid red; +} + +.footnote:target { + background-color: #ffa; +} + +.line-block { + display: block; + margin-top: 1em; + margin-bottom: 1em; +} + +.line-block .line-block { + margin-top: 0; + margin-bottom: 0; + margin-left: 1.5em; +} + +.guilabel, .menuselection { + font-family: sans-serif; +} + +.accelerator { + text-decoration: underline; +} + +.classifier { + font-style: oblique; +} + +.classifier:before { + font-style: normal; + margin: 0 0.5em; + content: ":"; + display: inline-block; +} + +abbr, acronym { + border-bottom: dotted 1px; + cursor: help; +} + +.translated { + background-color: rgba(207, 255, 207, 0.2) +} + +.untranslated { + background-color: rgba(255, 207, 207, 0.2) +} + +/* -- code displays --------------------------------------------------------- */ + +pre { + overflow: auto; + overflow-y: hidden; /* fixes display issues on Chrome browsers */ +} + +pre, div[class*="highlight-"] { + clear: both; +} + +span.pre { + -moz-hyphens: none; + -ms-hyphens: none; + -webkit-hyphens: none; + hyphens: none; + white-space: nowrap; +} + +div[class*="highlight-"] { + margin: 1em 0; +} + +td.linenos pre { + border: 0; + background-color: transparent; + color: #aaa; +} + +table.highlighttable { + display: block; +} + +table.highlighttable tbody { + display: block; +} + +table.highlighttable tr { + display: flex; +} + +table.highlighttable td { + margin: 0; + padding: 0; +} + +table.highlighttable td.linenos { + padding-right: 0.5em; +} + +table.highlighttable td.code { + flex: 1; + 
overflow: hidden; +} + +.highlight .hll { + display: block; +} + +div.highlight pre, +table.highlighttable pre { + margin: 0; +} + +div.code-block-caption + div { + margin-top: 0; +} + +div.code-block-caption { + margin-top: 1em; + padding: 2px 5px; + font-size: small; +} + +div.code-block-caption code { + background-color: transparent; +} + +table.highlighttable td.linenos, +span.linenos, +div.highlight span.gp { /* gp: Generic.Prompt */ + user-select: none; + -webkit-user-select: text; /* Safari fallback only */ + -webkit-user-select: none; /* Chrome/Safari */ + -moz-user-select: none; /* Firefox */ + -ms-user-select: none; /* IE10+ */ +} + +div.code-block-caption span.caption-number { + padding: 0.1em 0.3em; + font-style: italic; +} + +div.code-block-caption span.caption-text { +} + +div.literal-block-wrapper { + margin: 1em 0; +} + +code.xref, a code { + background-color: transparent; + font-weight: bold; +} + +h1 code, h2 code, h3 code, h4 code, h5 code, h6 code { + background-color: transparent; +} + +.viewcode-link { + float: right; +} + +.viewcode-back { + float: right; + font-family: sans-serif; +} + +div.viewcode-block:target { + margin: -1px -10px; + padding: 0 10px; +} + +/* -- math display ---------------------------------------------------------- */ + +img.math { + vertical-align: middle; +} + +div.body div.math p { + text-align: center; +} + +span.eqno { + float: right; +} + +span.eqno a.headerlink { + position: absolute; + z-index: 1; +} + +div.math:hover a.headerlink { + visibility: visible; +} + +/* -- printout stylesheet --------------------------------------------------- */ + +@media print { + div.document, + div.documentwrapper, + div.bodywrapper { + margin: 0 !important; + width: 100%; + } + + div.sphinxsidebar, + div.related, + div.footer, + #top-link { + display: none; + } +} \ No newline at end of file diff --git a/_static/check-solid.svg b/_static/check-solid.svg new file mode 100644 index 00000000..92fad4b5 --- /dev/null +++ 
b/_static/check-solid.svg @@ -0,0 +1,4 @@ + + + + diff --git a/_static/clipboard.min.js b/_static/clipboard.min.js new file mode 100644 index 00000000..54b3c463 --- /dev/null +++ b/_static/clipboard.min.js @@ -0,0 +1,7 @@ +/*! + * clipboard.js v2.0.8 + * https://clipboardjs.com/ + * + * Licensed MIT © Zeno Rocha + */ +!function(t,e){"object"==typeof exports&&"object"==typeof module?module.exports=e():"function"==typeof define&&define.amd?define([],e):"object"==typeof exports?exports.ClipboardJS=e():t.ClipboardJS=e()}(this,function(){return n={686:function(t,e,n){"use strict";n.d(e,{default:function(){return o}});var e=n(279),i=n.n(e),e=n(370),u=n.n(e),e=n(817),c=n.n(e);function a(t){try{return document.execCommand(t)}catch(t){return}}var f=function(t){t=c()(t);return a("cut"),t};var l=function(t){var e,n,o,r=1 + + + + diff --git a/_static/copybutton.css b/_static/copybutton.css new file mode 100644 index 00000000..f1916ec7 --- /dev/null +++ b/_static/copybutton.css @@ -0,0 +1,94 @@ +/* Copy buttons */ +button.copybtn { + position: absolute; + display: flex; + top: .3em; + right: .3em; + width: 1.7em; + height: 1.7em; + opacity: 0; + transition: opacity 0.3s, border .3s, background-color .3s; + user-select: none; + padding: 0; + border: none; + outline: none; + border-radius: 0.4em; + /* The colors that GitHub uses */ + border: #1b1f2426 1px solid; + background-color: #f6f8fa; + color: #57606a; +} + +button.copybtn.success { + border-color: #22863a; + color: #22863a; +} + +button.copybtn svg { + stroke: currentColor; + width: 1.5em; + height: 1.5em; + padding: 0.1em; +} + +div.highlight { + position: relative; +} + +/* Show the copybutton */ +.highlight:hover button.copybtn, button.copybtn.success { + opacity: 1; +} + +.highlight button.copybtn:hover { + background-color: rgb(235, 235, 235); +} + +.highlight button.copybtn:active { + background-color: rgb(187, 187, 187); +} + +/** + * A minimal CSS-only tooltip copied from: + * 
https://codepen.io/mildrenben/pen/rVBrpK + * + * To use, write HTML like the following: + * + *

<p class="o-tooltip--left" data-tooltip="Hey">Short</p>

+ */ + .o-tooltip--left { + position: relative; + } + + .o-tooltip--left:after { + opacity: 0; + visibility: hidden; + position: absolute; + content: attr(data-tooltip); + padding: .2em; + font-size: .8em; + left: -.2em; + background: grey; + color: white; + white-space: nowrap; + z-index: 2; + border-radius: 2px; + transform: translateX(-102%) translateY(0); + transition: opacity 0.2s cubic-bezier(0.64, 0.09, 0.08, 1), transform 0.2s cubic-bezier(0.64, 0.09, 0.08, 1); +} + +.o-tooltip--left:hover:after { + display: block; + opacity: 1; + visibility: visible; + transform: translateX(-100%) translateY(0); + transition: opacity 0.2s cubic-bezier(0.64, 0.09, 0.08, 1), transform 0.2s cubic-bezier(0.64, 0.09, 0.08, 1); + transition-delay: .5s; +} + +/* By default the copy button shouldn't show up when printing a page */ +@media print { + button.copybtn { + display: none; + } +} diff --git a/_static/copybutton.js b/_static/copybutton.js new file mode 100644 index 00000000..2ea7ff3e --- /dev/null +++ b/_static/copybutton.js @@ -0,0 +1,248 @@ +// Localization support +const messages = { + 'en': { + 'copy': 'Copy', + 'copy_to_clipboard': 'Copy to clipboard', + 'copy_success': 'Copied!', + 'copy_failure': 'Failed to copy', + }, + 'es' : { + 'copy': 'Copiar', + 'copy_to_clipboard': 'Copiar al portapapeles', + 'copy_success': '¡Copiado!', + 'copy_failure': 'Error al copiar', + }, + 'de' : { + 'copy': 'Kopieren', + 'copy_to_clipboard': 'In die Zwischenablage kopieren', + 'copy_success': 'Kopiert!', + 'copy_failure': 'Fehler beim Kopieren', + }, + 'fr' : { + 'copy': 'Copier', + 'copy_to_clipboard': 'Copier dans le presse-papier', + 'copy_success': 'Copié !', + 'copy_failure': 'Échec de la copie', + }, + 'ru': { + 'copy': 'Скопировать', + 'copy_to_clipboard': 'Скопировать в буфер', + 'copy_success': 'Скопировано!', + 'copy_failure': 'Не удалось скопировать', + }, + 'zh-CN': { + 'copy': '复制', + 'copy_to_clipboard': '复制到剪贴板', + 'copy_success': '复制成功!', + 'copy_failure': '复制失败', + 
}, + 'it' : { + 'copy': 'Copiare', + 'copy_to_clipboard': 'Copiato negli appunti', + 'copy_success': 'Copiato!', + 'copy_failure': 'Errore durante la copia', + } +} + +let locale = 'en' +if( document.documentElement.lang !== undefined + && messages[document.documentElement.lang] !== undefined ) { + locale = document.documentElement.lang +} + +let doc_url_root = DOCUMENTATION_OPTIONS.URL_ROOT; +if (doc_url_root == '#') { + doc_url_root = ''; +} + +/** + * SVG files for our copy buttons + */ +let iconCheck = ` + ${messages[locale]['copy_success']} + + +` + +// If the user specified their own SVG use that, otherwise use the default +let iconCopy = ``; +if (!iconCopy) { + iconCopy = ` + ${messages[locale]['copy_to_clipboard']} + + + +` +} + +/** + * Set up copy/paste for code blocks + */ + +const runWhenDOMLoaded = cb => { + if (document.readyState != 'loading') { + cb() + } else if (document.addEventListener) { + document.addEventListener('DOMContentLoaded', cb) + } else { + document.attachEvent('onreadystatechange', function() { + if (document.readyState == 'complete') cb() + }) + } +} + +const codeCellId = index => `codecell${index}` + +// Clears selected text since ClipboardJS will select the text when copying +const clearSelection = () => { + if (window.getSelection) { + window.getSelection().removeAllRanges() + } else if (document.selection) { + document.selection.empty() + } +} + +// Changes tooltip text for a moment, then changes it back +// We want the timeout of our `success` class to be a bit shorter than the +// tooltip and icon change, so that we can hide the icon before changing back. 
+var timeoutIcon = 2000; +var timeoutSuccessClass = 1500; + +const temporarilyChangeTooltip = (el, oldText, newText) => { + el.setAttribute('data-tooltip', newText) + el.classList.add('success') + // Remove success a little bit sooner than we change the tooltip + // So that we can use CSS to hide the copybutton first + setTimeout(() => el.classList.remove('success'), timeoutSuccessClass) + setTimeout(() => el.setAttribute('data-tooltip', oldText), timeoutIcon) +} + +// Changes the copy button icon for two seconds, then changes it back +const temporarilyChangeIcon = (el) => { + el.innerHTML = iconCheck; + setTimeout(() => {el.innerHTML = iconCopy}, timeoutIcon) +} + +const addCopyButtonToCodeCells = () => { + // If ClipboardJS hasn't loaded, wait a bit and try again. This + // happens because we load ClipboardJS asynchronously. + if (window.ClipboardJS === undefined) { + setTimeout(addCopyButtonToCodeCells, 250) + return + } + + // Add copybuttons to all of our code cells + const COPYBUTTON_SELECTOR = 'div.highlight pre'; + const codeCells = document.querySelectorAll(COPYBUTTON_SELECTOR) + codeCells.forEach((codeCell, index) => { + const id = codeCellId(index) + codeCell.setAttribute('id', id) + + const clipboardButton = id => + `` + codeCell.insertAdjacentHTML('afterend', clipboardButton(id)) + }) + +function escapeRegExp(string) { + return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // $& means the whole matched string +} + +/** + * Removes excluded text from a Node. + * + * @param {Node} target Node to filter. + * @param {string} exclude CSS selector of nodes to exclude. + * @returns {DOMString} Text from `target` with text removed. + */ +function filterText(target, exclude) { + const clone = target.cloneNode(true); // clone as to not modify the live DOM + if (exclude) { + // remove excluded nodes + clone.querySelectorAll(exclude).forEach(node => node.remove()); + } + return clone.innerText; +} + +// Callback when a copy button is clicked. 
Will be passed the node that was clicked +// should then grab the text and replace pieces of text that shouldn't be used in output +function formatCopyText(textContent, copybuttonPromptText, isRegexp = false, onlyCopyPromptLines = true, removePrompts = true, copyEmptyLines = true, lineContinuationChar = "", hereDocDelim = "") { + var regexp; + var match; + + // Do we check for line continuation characters and "HERE-documents"? + var useLineCont = !!lineContinuationChar + var useHereDoc = !!hereDocDelim + + // create regexp to capture prompt and remaining line + if (isRegexp) { + regexp = new RegExp('^(' + copybuttonPromptText + ')(.*)') + } else { + regexp = new RegExp('^(' + escapeRegExp(copybuttonPromptText) + ')(.*)') + } + + const outputLines = []; + var promptFound = false; + var gotLineCont = false; + var gotHereDoc = false; + const lineGotPrompt = []; + for (const line of textContent.split('\n')) { + match = line.match(regexp) + if (match || gotLineCont || gotHereDoc) { + promptFound = regexp.test(line) + lineGotPrompt.push(promptFound) + if (removePrompts && promptFound) { + outputLines.push(match[2]) + } else { + outputLines.push(line) + } + gotLineCont = line.endsWith(lineContinuationChar) & useLineCont + if (line.includes(hereDocDelim) & useHereDoc) + gotHereDoc = !gotHereDoc + } else if (!onlyCopyPromptLines) { + outputLines.push(line) + } else if (copyEmptyLines && line.trim() === '') { + outputLines.push(line) + } + } + + // If no lines with the prompt were found then just use original lines + if (lineGotPrompt.some(v => v === true)) { + textContent = outputLines.join('\n'); + } + + // Remove a trailing newline to avoid auto-running when pasting + if (textContent.endsWith("\n")) { + textContent = textContent.slice(0, -1) + } + return textContent +} + + +var copyTargetText = (trigger) => { + var target = document.querySelector(trigger.attributes['data-clipboard-target'].value); + + // get filtered text + let exclude = '.linenos'; + + let text = 
filterText(target, exclude); + return formatCopyText(text, '', false, true, true, true, '', '') +} + + // Initialize with a callback so we can modify the text before copy + const clipboard = new ClipboardJS('.copybtn', {text: copyTargetText}) + + // Update UI with error/success messages + clipboard.on('success', event => { + clearSelection() + temporarilyChangeTooltip(event.trigger, messages[locale]['copy'], messages[locale]['copy_success']) + temporarilyChangeIcon(event.trigger) + }) + + clipboard.on('error', event => { + temporarilyChangeTooltip(event.trigger, messages[locale]['copy'], messages[locale]['copy_failure']) + }) +} + +runWhenDOMLoaded(addCopyButtonToCodeCells) \ No newline at end of file diff --git a/_static/copybutton_funcs.js b/_static/copybutton_funcs.js new file mode 100644 index 00000000..dbe1aaad --- /dev/null +++ b/_static/copybutton_funcs.js @@ -0,0 +1,73 @@ +function escapeRegExp(string) { + return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // $& means the whole matched string +} + +/** + * Removes excluded text from a Node. + * + * @param {Node} target Node to filter. + * @param {string} exclude CSS selector of nodes to exclude. + * @returns {DOMString} Text from `target` with text removed. + */ +export function filterText(target, exclude) { + const clone = target.cloneNode(true); // clone as to not modify the live DOM + if (exclude) { + // remove excluded nodes + clone.querySelectorAll(exclude).forEach(node => node.remove()); + } + return clone.innerText; +} + +// Callback when a copy button is clicked. Will be passed the node that was clicked +// should then grab the text and replace pieces of text that shouldn't be used in output +export function formatCopyText(textContent, copybuttonPromptText, isRegexp = false, onlyCopyPromptLines = true, removePrompts = true, copyEmptyLines = true, lineContinuationChar = "", hereDocDelim = "") { + var regexp; + var match; + + // Do we check for line continuation characters and "HERE-documents"? 
+ var useLineCont = !!lineContinuationChar + var useHereDoc = !!hereDocDelim + + // create regexp to capture prompt and remaining line + if (isRegexp) { + regexp = new RegExp('^(' + copybuttonPromptText + ')(.*)') + } else { + regexp = new RegExp('^(' + escapeRegExp(copybuttonPromptText) + ')(.*)') + } + + const outputLines = []; + var promptFound = false; + var gotLineCont = false; + var gotHereDoc = false; + const lineGotPrompt = []; + for (const line of textContent.split('\n')) { + match = line.match(regexp) + if (match || gotLineCont || gotHereDoc) { + promptFound = regexp.test(line) + lineGotPrompt.push(promptFound) + if (removePrompts && promptFound) { + outputLines.push(match[2]) + } else { + outputLines.push(line) + } + gotLineCont = line.endsWith(lineContinuationChar) & useLineCont + if (line.includes(hereDocDelim) & useHereDoc) + gotHereDoc = !gotHereDoc + } else if (!onlyCopyPromptLines) { + outputLines.push(line) + } else if (copyEmptyLines && line.trim() === '') { + outputLines.push(line) + } + } + + // If no lines with the prompt were found then just use original lines + if (lineGotPrompt.some(v => v === true)) { + textContent = outputLines.join('\n'); + } + + // Remove a trailing newline to avoid auto-running when pasting + if (textContent.endsWith("\n")) { + textContent = textContent.slice(0, -1) + } + return textContent +} diff --git a/_static/css/badge_only.css b/_static/css/badge_only.css new file mode 100644 index 00000000..c718cee4 --- /dev/null +++ b/_static/css/badge_only.css @@ -0,0 +1 @@ +.clearfix{*zoom:1}.clearfix:after,.clearfix:before{display:table;content:""}.clearfix:after{clear:both}@font-face{font-family:FontAwesome;font-style:normal;font-weight:400;src:url(fonts/fontawesome-webfont.eot?674f50d287a8c48dc19ba404d20fe713?#iefix) format("embedded-opentype"),url(fonts/fontawesome-webfont.woff2?af7ae505a9eed503f8b8e6982036873e) format("woff2"),url(fonts/fontawesome-webfont.woff?fee66e712a8a08eef5805a46892932ad) 
format("woff"),url(fonts/fontawesome-webfont.ttf?b06871f281fee6b241d60582ae9369b9) format("truetype"),url(fonts/fontawesome-webfont.svg?912ec66d7572ff821749319396470bde#FontAwesome) format("svg")}.fa:before{font-family:FontAwesome;font-style:normal;font-weight:400;line-height:1}.fa:before,a .fa{text-decoration:inherit}.fa:before,a .fa,li .fa{display:inline-block}li .fa-large:before{width:1.875em}ul.fas{list-style-type:none;margin-left:2em;text-indent:-.8em}ul.fas li .fa{width:.8em}ul.fas li .fa-large:before{vertical-align:baseline}.fa-book:before,.icon-book:before{content:"\f02d"}.fa-caret-down:before,.icon-caret-down:before{content:"\f0d7"}.fa-caret-up:before,.icon-caret-up:before{content:"\f0d8"}.fa-caret-left:before,.icon-caret-left:before{content:"\f0d9"}.fa-caret-right:before,.icon-caret-right:before{content:"\f0da"}.rst-versions{position:fixed;bottom:0;left:0;width:300px;color:#fcfcfc;background:#1f1d1d;font-family:Lato,proxima-nova,Helvetica Neue,Arial,sans-serif;z-index:400}.rst-versions a{color:#2980b9;text-decoration:none}.rst-versions .rst-badge-small{display:none}.rst-versions .rst-current-version{padding:12px;background-color:#272525;display:block;text-align:right;font-size:90%;cursor:pointer;color:#27ae60}.rst-versions .rst-current-version:after{clear:both;content:"";display:block}.rst-versions .rst-current-version .fa{color:#fcfcfc}.rst-versions .rst-current-version .fa-book,.rst-versions .rst-current-version .icon-book{float:left}.rst-versions .rst-current-version.rst-out-of-date{background-color:#e74c3c;color:#fff}.rst-versions .rst-current-version.rst-active-old-version{background-color:#f1c40f;color:#000}.rst-versions.shift-up{height:auto;max-height:100%;overflow-y:scroll}.rst-versions.shift-up .rst-other-versions{display:block}.rst-versions .rst-other-versions{font-size:90%;padding:12px;color:grey;display:none}.rst-versions .rst-other-versions hr{display:block;height:1px;border:0;margin:20px 0;padding:0;border-top:1px solid #413d3d}.rst-versions 
.rst-other-versions dd{display:inline-block;margin:0}.rst-versions .rst-other-versions dd a{display:inline-block;padding:6px;color:#fcfcfc}.rst-versions.rst-badge{width:auto;bottom:20px;right:20px;left:auto;border:none;max-width:300px;max-height:90%}.rst-versions.rst-badge .fa-book,.rst-versions.rst-badge .icon-book{float:none;line-height:30px}.rst-versions.rst-badge.shift-up .rst-current-version{text-align:right}.rst-versions.rst-badge.shift-up .rst-current-version .fa-book,.rst-versions.rst-badge.shift-up .rst-current-version .icon-book{float:left}.rst-versions.rst-badge>.rst-current-version{width:auto;height:30px;line-height:30px;padding:0 6px;display:block;text-align:center}@media screen and (max-width:768px){.rst-versions{width:85%;display:none}.rst-versions.shift{display:block}} \ No newline at end of file diff --git a/_static/css/fonts/Roboto-Slab-Bold.woff b/_static/css/fonts/Roboto-Slab-Bold.woff new file mode 100644 index 00000000..6cb60000 Binary files /dev/null and b/_static/css/fonts/Roboto-Slab-Bold.woff differ diff --git a/_static/css/fonts/Roboto-Slab-Bold.woff2 b/_static/css/fonts/Roboto-Slab-Bold.woff2 new file mode 100644 index 00000000..7059e231 Binary files /dev/null and b/_static/css/fonts/Roboto-Slab-Bold.woff2 differ diff --git a/_static/css/fonts/Roboto-Slab-Regular.woff b/_static/css/fonts/Roboto-Slab-Regular.woff new file mode 100644 index 00000000..f815f63f Binary files /dev/null and b/_static/css/fonts/Roboto-Slab-Regular.woff differ diff --git a/_static/css/fonts/Roboto-Slab-Regular.woff2 b/_static/css/fonts/Roboto-Slab-Regular.woff2 new file mode 100644 index 00000000..f2c76e5b Binary files /dev/null and b/_static/css/fonts/Roboto-Slab-Regular.woff2 differ diff --git a/_static/css/fonts/fontawesome-webfont.eot b/_static/css/fonts/fontawesome-webfont.eot new file mode 100644 index 00000000..e9f60ca9 Binary files /dev/null and b/_static/css/fonts/fontawesome-webfont.eot differ diff --git a/_static/css/fonts/fontawesome-webfont.svg 
b/_static/css/fonts/fontawesome-webfont.svg new file mode 100644 index 00000000..855c845e --- /dev/null +++ b/_static/css/fonts/fontawesome-webfont.svg @@ -0,0 +1,2671 @@ + + + + +Created by FontForge 20120731 at Mon Oct 24 17:37:40 2016 + By ,,, +Copyright Dave Gandy 2016. All rights reserved. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/_static/css/fonts/fontawesome-webfont.ttf b/_static/css/fonts/fontawesome-webfont.ttf new file mode 100644 index 00000000..35acda2f Binary files /dev/null and b/_static/css/fonts/fontawesome-webfont.ttf differ diff --git 
a/_static/css/fonts/fontawesome-webfont.woff b/_static/css/fonts/fontawesome-webfont.woff new file mode 100644 index 00000000..400014a4 Binary files /dev/null and b/_static/css/fonts/fontawesome-webfont.woff differ diff --git a/_static/css/fonts/fontawesome-webfont.woff2 b/_static/css/fonts/fontawesome-webfont.woff2 new file mode 100644 index 00000000..4d13fc60 Binary files /dev/null and b/_static/css/fonts/fontawesome-webfont.woff2 differ diff --git a/_static/css/fonts/lato-bold-italic.woff b/_static/css/fonts/lato-bold-italic.woff new file mode 100644 index 00000000..88ad05b9 Binary files /dev/null and b/_static/css/fonts/lato-bold-italic.woff differ diff --git a/_static/css/fonts/lato-bold-italic.woff2 b/_static/css/fonts/lato-bold-italic.woff2 new file mode 100644 index 00000000..c4e3d804 Binary files /dev/null and b/_static/css/fonts/lato-bold-italic.woff2 differ diff --git a/_static/css/fonts/lato-bold.woff b/_static/css/fonts/lato-bold.woff new file mode 100644 index 00000000..c6dff51f Binary files /dev/null and b/_static/css/fonts/lato-bold.woff differ diff --git a/_static/css/fonts/lato-bold.woff2 b/_static/css/fonts/lato-bold.woff2 new file mode 100644 index 00000000..bb195043 Binary files /dev/null and b/_static/css/fonts/lato-bold.woff2 differ diff --git a/_static/css/fonts/lato-normal-italic.woff b/_static/css/fonts/lato-normal-italic.woff new file mode 100644 index 00000000..76114bc0 Binary files /dev/null and b/_static/css/fonts/lato-normal-italic.woff differ diff --git a/_static/css/fonts/lato-normal-italic.woff2 b/_static/css/fonts/lato-normal-italic.woff2 new file mode 100644 index 00000000..3404f37e Binary files /dev/null and b/_static/css/fonts/lato-normal-italic.woff2 differ diff --git a/_static/css/fonts/lato-normal.woff b/_static/css/fonts/lato-normal.woff new file mode 100644 index 00000000..ae1307ff Binary files /dev/null and b/_static/css/fonts/lato-normal.woff differ diff --git a/_static/css/fonts/lato-normal.woff2 
b/_static/css/fonts/lato-normal.woff2 new file mode 100644 index 00000000..3bf98433 Binary files /dev/null and b/_static/css/fonts/lato-normal.woff2 differ diff --git a/_static/css/theme.css b/_static/css/theme.css new file mode 100644 index 00000000..19a446a0 --- /dev/null +++ b/_static/css/theme.css @@ -0,0 +1,4 @@ +html{box-sizing:border-box}*,:after,:before{box-sizing:inherit}article,aside,details,figcaption,figure,footer,header,hgroup,nav,section{display:block}audio,canvas,video{display:inline-block;*display:inline;*zoom:1}[hidden],audio:not([controls]){display:none}*{-webkit-box-sizing:border-box;-moz-box-sizing:border-box;box-sizing:border-box}html{font-size:100%;-webkit-text-size-adjust:100%;-ms-text-size-adjust:100%}body{margin:0}a:active,a:hover{outline:0}abbr[title]{border-bottom:1px dotted}b,strong{font-weight:700}blockquote{margin:0}dfn{font-style:italic}ins{background:#ff9;text-decoration:none}ins,mark{color:#000}mark{background:#ff0;font-style:italic;font-weight:700}.rst-content code,.rst-content tt,code,kbd,pre,samp{font-family:monospace,serif;_font-family:courier 
new,monospace;font-size:1em}pre{white-space:pre}q{quotes:none}q:after,q:before{content:"";content:none}small{font-size:85%}sub,sup{font-size:75%;line-height:0;position:relative;vertical-align:baseline}sup{top:-.5em}sub{bottom:-.25em}dl,ol,ul{margin:0;padding:0;list-style:none;list-style-image:none}li{list-style:none}dd{margin:0}img{border:0;-ms-interpolation-mode:bicubic;vertical-align:middle;max-width:100%}svg:not(:root){overflow:hidden}figure,form{margin:0}label{cursor:pointer}button,input,select,textarea{font-size:100%;margin:0;vertical-align:baseline;*vertical-align:middle}button,input{line-height:normal}button,input[type=button],input[type=reset],input[type=submit]{cursor:pointer;-webkit-appearance:button;*overflow:visible}button[disabled],input[disabled]{cursor:default}input[type=search]{-webkit-appearance:textfield;-moz-box-sizing:content-box;-webkit-box-sizing:content-box;box-sizing:content-box}textarea{resize:vertical}table{border-collapse:collapse;border-spacing:0}td{vertical-align:top}.chromeframe{margin:.2em 0;background:#ccc;color:#000;padding:.2em 0}.ir{display:block;border:0;text-indent:-999em;overflow:hidden;background-color:transparent;background-repeat:no-repeat;text-align:left;direction:ltr;*line-height:0}.ir br{display:none}.hidden{display:none!important;visibility:hidden}.visuallyhidden{border:0;clip:rect(0 0 0 0);height:1px;margin:-1px;overflow:hidden;padding:0;position:absolute;width:1px}.visuallyhidden.focusable:active,.visuallyhidden.focusable:focus{clip:auto;height:auto;margin:0;overflow:visible;position:static;width:auto}.invisible{visibility:hidden}.relative{position:relative}big,small{font-size:100%}@media print{body,html,section{background:none!important}*{box-shadow:none!important;text-shadow:none!important;filter:none!important;-ms-filter:none!important}a,a:visited{text-decoration:underline}.ir 
a:after,a[href^="#"]:after,a[href^="javascript:"]:after{content:""}blockquote,pre{page-break-inside:avoid}thead{display:table-header-group}img,tr{page-break-inside:avoid}img{max-width:100%!important}@page{margin:.5cm}.rst-content .toctree-wrapper>p.caption,h2,h3,p{orphans:3;widows:3}.rst-content .toctree-wrapper>p.caption,h2,h3{page-break-after:avoid}}.btn,.fa:before,.icon:before,.rst-content .admonition,.rst-content .admonition-title:before,.rst-content .admonition-todo,.rst-content .attention,.rst-content .caution,.rst-content .code-block-caption .headerlink:before,.rst-content .danger,.rst-content .eqno .headerlink:before,.rst-content .error,.rst-content .hint,.rst-content .important,.rst-content .note,.rst-content .seealso,.rst-content .tip,.rst-content .warning,.rst-content code.download span:first-child:before,.rst-content dl dt .headerlink:before,.rst-content h1 .headerlink:before,.rst-content h2 .headerlink:before,.rst-content h3 .headerlink:before,.rst-content h4 .headerlink:before,.rst-content h5 .headerlink:before,.rst-content h6 .headerlink:before,.rst-content p.caption .headerlink:before,.rst-content p .headerlink:before,.rst-content table>caption .headerlink:before,.rst-content tt.download span:first-child:before,.wy-alert,.wy-dropdown .caret:before,.wy-inline-validate.wy-inline-validate-danger .wy-input-context:before,.wy-inline-validate.wy-inline-validate-info .wy-input-context:before,.wy-inline-validate.wy-inline-validate-success .wy-input-context:before,.wy-inline-validate.wy-inline-validate-warning .wy-input-context:before,.wy-menu-vertical li.current>a button.toctree-expand:before,.wy-menu-vertical li.on a button.toctree-expand:before,.wy-menu-vertical li 
button.toctree-expand:before,input[type=color],input[type=date],input[type=datetime-local],input[type=datetime],input[type=email],input[type=month],input[type=number],input[type=password],input[type=search],input[type=tel],input[type=text],input[type=time],input[type=url],input[type=week],select,textarea{-webkit-font-smoothing:antialiased}.clearfix{*zoom:1}.clearfix:after,.clearfix:before{display:table;content:""}.clearfix:after{clear:both}/*! + * Font Awesome 4.7.0 by @davegandy - http://fontawesome.io - @fontawesome + * License - http://fontawesome.io/license (Font: SIL OFL 1.1, CSS: MIT License) + */@font-face{font-family:FontAwesome;src:url(fonts/fontawesome-webfont.eot?674f50d287a8c48dc19ba404d20fe713);src:url(fonts/fontawesome-webfont.eot?674f50d287a8c48dc19ba404d20fe713?#iefix&v=4.7.0) format("embedded-opentype"),url(fonts/fontawesome-webfont.woff2?af7ae505a9eed503f8b8e6982036873e) format("woff2"),url(fonts/fontawesome-webfont.woff?fee66e712a8a08eef5805a46892932ad) format("woff"),url(fonts/fontawesome-webfont.ttf?b06871f281fee6b241d60582ae9369b9) format("truetype"),url(fonts/fontawesome-webfont.svg?912ec66d7572ff821749319396470bde#fontawesomeregular) format("svg");font-weight:400;font-style:normal}.fa,.icon,.rst-content .admonition-title,.rst-content .code-block-caption .headerlink,.rst-content .eqno .headerlink,.rst-content code.download span:first-child,.rst-content dl dt .headerlink,.rst-content h1 .headerlink,.rst-content h2 .headerlink,.rst-content h3 .headerlink,.rst-content h4 .headerlink,.rst-content h5 .headerlink,.rst-content h6 .headerlink,.rst-content p.caption .headerlink,.rst-content p .headerlink,.rst-content table>caption .headerlink,.rst-content tt.download span:first-child,.wy-menu-vertical li.current>a button.toctree-expand,.wy-menu-vertical li.on a button.toctree-expand,.wy-menu-vertical li button.toctree-expand{display:inline-block;font:normal normal normal 14px/1 
FontAwesome;font-size:inherit;text-rendering:auto;-webkit-font-smoothing:antialiased;-moz-osx-font-smoothing:grayscale}.fa-lg{font-size:1.33333em;line-height:.75em;vertical-align:-15%}.fa-2x{font-size:2em}.fa-3x{font-size:3em}.fa-4x{font-size:4em}.fa-5x{font-size:5em}.fa-fw{width:1.28571em;text-align:center}.fa-ul{padding-left:0;margin-left:2.14286em;list-style-type:none}.fa-ul>li{position:relative}.fa-li{position:absolute;left:-2.14286em;width:2.14286em;top:.14286em;text-align:center}.fa-li.fa-lg{left:-1.85714em}.fa-border{padding:.2em .25em .15em;border:.08em solid #eee;border-radius:.1em}.fa-pull-left{float:left}.fa-pull-right{float:right}.fa-pull-left.icon,.fa.fa-pull-left,.rst-content .code-block-caption .fa-pull-left.headerlink,.rst-content .eqno .fa-pull-left.headerlink,.rst-content .fa-pull-left.admonition-title,.rst-content code.download span.fa-pull-left:first-child,.rst-content dl dt .fa-pull-left.headerlink,.rst-content h1 .fa-pull-left.headerlink,.rst-content h2 .fa-pull-left.headerlink,.rst-content h3 .fa-pull-left.headerlink,.rst-content h4 .fa-pull-left.headerlink,.rst-content h5 .fa-pull-left.headerlink,.rst-content h6 .fa-pull-left.headerlink,.rst-content p .fa-pull-left.headerlink,.rst-content table>caption .fa-pull-left.headerlink,.rst-content tt.download span.fa-pull-left:first-child,.wy-menu-vertical li.current>a button.fa-pull-left.toctree-expand,.wy-menu-vertical li.on a button.fa-pull-left.toctree-expand,.wy-menu-vertical li button.fa-pull-left.toctree-expand{margin-right:.3em}.fa-pull-right.icon,.fa.fa-pull-right,.rst-content .code-block-caption .fa-pull-right.headerlink,.rst-content .eqno .fa-pull-right.headerlink,.rst-content .fa-pull-right.admonition-title,.rst-content code.download span.fa-pull-right:first-child,.rst-content dl dt .fa-pull-right.headerlink,.rst-content h1 .fa-pull-right.headerlink,.rst-content h2 .fa-pull-right.headerlink,.rst-content h3 .fa-pull-right.headerlink,.rst-content h4 .fa-pull-right.headerlink,.rst-content 
h5 .fa-pull-right.headerlink,.rst-content h6 .fa-pull-right.headerlink,.rst-content p .fa-pull-right.headerlink,.rst-content table>caption .fa-pull-right.headerlink,.rst-content tt.download span.fa-pull-right:first-child,.wy-menu-vertical li.current>a button.fa-pull-right.toctree-expand,.wy-menu-vertical li.on a button.fa-pull-right.toctree-expand,.wy-menu-vertical li button.fa-pull-right.toctree-expand{margin-left:.3em}.pull-right{float:right}.pull-left{float:left}.fa.pull-left,.pull-left.icon,.rst-content .code-block-caption .pull-left.headerlink,.rst-content .eqno .pull-left.headerlink,.rst-content .pull-left.admonition-title,.rst-content code.download span.pull-left:first-child,.rst-content dl dt .pull-left.headerlink,.rst-content h1 .pull-left.headerlink,.rst-content h2 .pull-left.headerlink,.rst-content h3 .pull-left.headerlink,.rst-content h4 .pull-left.headerlink,.rst-content h5 .pull-left.headerlink,.rst-content h6 .pull-left.headerlink,.rst-content p .pull-left.headerlink,.rst-content table>caption .pull-left.headerlink,.rst-content tt.download span.pull-left:first-child,.wy-menu-vertical li.current>a button.pull-left.toctree-expand,.wy-menu-vertical li.on a button.pull-left.toctree-expand,.wy-menu-vertical li button.pull-left.toctree-expand{margin-right:.3em}.fa.pull-right,.pull-right.icon,.rst-content .code-block-caption .pull-right.headerlink,.rst-content .eqno .pull-right.headerlink,.rst-content .pull-right.admonition-title,.rst-content code.download span.pull-right:first-child,.rst-content dl dt .pull-right.headerlink,.rst-content h1 .pull-right.headerlink,.rst-content h2 .pull-right.headerlink,.rst-content h3 .pull-right.headerlink,.rst-content h4 .pull-right.headerlink,.rst-content h5 .pull-right.headerlink,.rst-content h6 .pull-right.headerlink,.rst-content p .pull-right.headerlink,.rst-content table>caption .pull-right.headerlink,.rst-content tt.download span.pull-right:first-child,.wy-menu-vertical li.current>a 
button.pull-right.toctree-expand,.wy-menu-vertical li.on a button.pull-right.toctree-expand,.wy-menu-vertical li button.pull-right.toctree-expand{margin-left:.3em}.fa-spin{-webkit-animation:fa-spin 2s linear infinite;animation:fa-spin 2s linear infinite}.fa-pulse{-webkit-animation:fa-spin 1s steps(8) infinite;animation:fa-spin 1s steps(8) infinite}@-webkit-keyframes fa-spin{0%{-webkit-transform:rotate(0deg);transform:rotate(0deg)}to{-webkit-transform:rotate(359deg);transform:rotate(359deg)}}@keyframes fa-spin{0%{-webkit-transform:rotate(0deg);transform:rotate(0deg)}to{-webkit-transform:rotate(359deg);transform:rotate(359deg)}}.fa-rotate-90{-ms-filter:"progid:DXImageTransform.Microsoft.BasicImage(rotation=1)";-webkit-transform:rotate(90deg);-ms-transform:rotate(90deg);transform:rotate(90deg)}.fa-rotate-180{-ms-filter:"progid:DXImageTransform.Microsoft.BasicImage(rotation=2)";-webkit-transform:rotate(180deg);-ms-transform:rotate(180deg);transform:rotate(180deg)}.fa-rotate-270{-ms-filter:"progid:DXImageTransform.Microsoft.BasicImage(rotation=3)";-webkit-transform:rotate(270deg);-ms-transform:rotate(270deg);transform:rotate(270deg)}.fa-flip-horizontal{-ms-filter:"progid:DXImageTransform.Microsoft.BasicImage(rotation=0, mirror=1)";-webkit-transform:scaleX(-1);-ms-transform:scaleX(-1);transform:scaleX(-1)}.fa-flip-vertical{-ms-filter:"progid:DXImageTransform.Microsoft.BasicImage(rotation=2, mirror=1)";-webkit-transform:scaleY(-1);-ms-transform:scaleY(-1);transform:scaleY(-1)}:root .fa-flip-horizontal,:root .fa-flip-vertical,:root .fa-rotate-90,:root .fa-rotate-180,:root 
.fa-rotate-270{filter:none}.fa-stack{position:relative;display:inline-block;width:2em;height:2em;line-height:2em;vertical-align:middle}.fa-stack-1x,.fa-stack-2x{position:absolute;left:0;width:100%;text-align:center}.fa-stack-1x{line-height:inherit}.fa-stack-2x{font-size:2em}.fa-inverse{color:#fff}.fa-glass:before{content:""}.fa-music:before{content:""}.fa-search:before,.icon-search:before{content:""}.fa-envelope-o:before{content:""}.fa-heart:before{content:""}.fa-star:before{content:""}.fa-star-o:before{content:""}.fa-user:before{content:""}.fa-film:before{content:""}.fa-th-large:before{content:""}.fa-th:before{content:""}.fa-th-list:before{content:""}.fa-check:before{content:""}.fa-close:before,.fa-remove:before,.fa-times:before{content:""}.fa-search-plus:before{content:""}.fa-search-minus:before{content:""}.fa-power-off:before{content:""}.fa-signal:before{content:""}.fa-cog:before,.fa-gear:before{content:""}.fa-trash-o:before{content:""}.fa-home:before,.icon-home:before{content:""}.fa-file-o:before{content:""}.fa-clock-o:before{content:""}.fa-road:before{content:""}.fa-download:before,.rst-content code.download span:first-child:before,.rst-content tt.download 
span:first-child:before{content:""}.fa-arrow-circle-o-down:before{content:""}.fa-arrow-circle-o-up:before{content:""}.fa-inbox:before{content:""}.fa-play-circle-o:before{content:""}.fa-repeat:before,.fa-rotate-right:before{content:""}.fa-refresh:before{content:""}.fa-list-alt:before{content:""}.fa-lock:before{content:""}.fa-flag:before{content:""}.fa-headphones:before{content:""}.fa-volume-off:before{content:""}.fa-volume-down:before{content:""}.fa-volume-up:before{content:""}.fa-qrcode:before{content:""}.fa-barcode:before{content:""}.fa-tag:before{content:""}.fa-tags:before{content:""}.fa-book:before,.icon-book:before{content:""}.fa-bookmark:before{content:""}.fa-print:before{content:""}.fa-camera:before{content:""}.fa-font:before{content:""}.fa-bold:before{content:""}.fa-italic:before{content:""}.fa-text-height:before{content:""}.fa-text-width:before{content:""}.fa-align-left:before{content:""}.fa-align-center:before{content:""}.fa-align-right:before{content:""}.fa-align-justify:before{content:""}.fa-list:before{content:""}.fa-dedent:before,.fa-outdent:before{content:""}.fa-indent:before{content:""}.fa-video-camera:before{content:""}.fa-image:before,.fa-photo:before,.fa-picture-o:before{content:""}.fa-pencil:before{content:""}.fa-map-marker:before{content:""}.fa-adjust:before{content:""}.fa-tint:before{content:""}.fa-edit:before,.fa-pencil-square-o:before{content:""}.fa-share-square-o:before{content:""}.fa-check-square-o:before{content:""}.fa-arrows:before{content:""}.fa-step-backward:before{content:""}.fa-fast-backward:before{content:""}.fa-backward:before{content:""}.fa-play:before{content:""}.fa-pause:before{content:""}.fa-stop:before{content:""}.fa-forward:before{content:""}.fa-fast-forward:before{content:""}.fa-step-forward:before{content:""}.fa-eject:before{content:""}.fa-chevron-left:before{content:""}.fa-chevron-right:before{content:""}.fa-plus-circle:before{content:""}.fa-minus-circle:before{content
:""}.fa-times-circle:before,.wy-inline-validate.wy-inline-validate-danger .wy-input-context:before{content:""}.fa-check-circle:before,.wy-inline-validate.wy-inline-validate-success .wy-input-context:before{content:""}.fa-question-circle:before{content:""}.fa-info-circle:before{content:""}.fa-crosshairs:before{content:""}.fa-times-circle-o:before{content:""}.fa-check-circle-o:before{content:""}.fa-ban:before{content:""}.fa-arrow-left:before{content:""}.fa-arrow-right:before{content:""}.fa-arrow-up:before{content:""}.fa-arrow-down:before{content:""}.fa-mail-forward:before,.fa-share:before{content:""}.fa-expand:before{content:""}.fa-compress:before{content:""}.fa-plus:before{content:""}.fa-minus:before{content:""}.fa-asterisk:before{content:""}.fa-exclamation-circle:before,.rst-content .admonition-title:before,.wy-inline-validate.wy-inline-validate-info .wy-input-context:before,.wy-inline-validate.wy-inline-validate-warning .wy-input-context:before{content:""}.fa-gift:before{content:""}.fa-leaf:before{content:""}.fa-fire:before,.icon-fire:before{content:""}.fa-eye:before{content:""}.fa-eye-slash:before{content:""}.fa-exclamation-triangle:before,.fa-warning:before{content:""}.fa-plane:before{content:""}.fa-calendar:before{content:""}.fa-random:before{content:""}.fa-comment:before{content:""}.fa-magnet:before{content:""}.fa-chevron-up:before{content:""}.fa-chevron-down:before{content:""}.fa-retweet:before{content:""}.fa-shopping-cart:before{content:""}.fa-folder:before{content:""}.fa-folder-open:before{content:""}.fa-arrows-v:before{content:""}.fa-arrows-h:before{content:""}.fa-bar-chart-o:before,.fa-bar-chart:before{content:""}.fa-twitter-square:before{content:""}.fa-facebook-square:before{content:""}.fa-camera-retro:before{content:""}.fa-key:before{content:""}.fa-cogs:before,.fa-gears:before{content:""}.fa-comments:before{content:""}.fa-thumbs-o-up:before{content:""}.fa-thumbs-o-down:before{content:""}.fa-star-half:before
{content:""}.fa-heart-o:before{content:""}.fa-sign-out:before{content:""}.fa-linkedin-square:before{content:""}.fa-thumb-tack:before{content:""}.fa-external-link:before{content:""}.fa-sign-in:before{content:""}.fa-trophy:before{content:""}.fa-github-square:before{content:""}.fa-upload:before{content:""}.fa-lemon-o:before{content:""}.fa-phone:before{content:""}.fa-square-o:before{content:""}.fa-bookmark-o:before{content:""}.fa-phone-square:before{content:""}.fa-twitter:before{content:""}.fa-facebook-f:before,.fa-facebook:before{content:""}.fa-github:before,.icon-github:before{content:""}.fa-unlock:before{content:""}.fa-credit-card:before{content:""}.fa-feed:before,.fa-rss:before{content:""}.fa-hdd-o:before{content:""}.fa-bullhorn:before{content:""}.fa-bell:before{content:""}.fa-certificate:before{content:""}.fa-hand-o-right:before{content:""}.fa-hand-o-left:before{content:""}.fa-hand-o-up:before{content:""}.fa-hand-o-down:before{content:""}.fa-arrow-circle-left:before,.icon-circle-arrow-left:before{content:""}.fa-arrow-circle-right:before,.icon-circle-arrow-right:before{content:""}.fa-arrow-circle-up:before{content:""}.fa-arrow-circle-down:before{content:""}.fa-globe:before{content:""}.fa-wrench:before{content:""}.fa-tasks:before{content:""}.fa-filter:before{content:""}.fa-briefcase:before{content:""}.fa-arrows-alt:before{content:""}.fa-group:before,.fa-users:before{content:""}.fa-chain:before,.fa-link:before,.icon-link:before{content:""}.fa-cloud:before{content:""}.fa-flask:before{content:""}.fa-cut:before,.fa-scissors:before{content:""}.fa-copy:before,.fa-files-o:before{content:""}.fa-paperclip:before{content:""}.fa-floppy-o:before,.fa-save:before{content:""}.fa-square:before{content:""}.fa-bars:before,.fa-navicon:before,.fa-reorder:before{content:""}.fa-list-ul:before{content:""}.fa-list-ol:before{content:""}.fa-strikethrough:before{content:""}.fa-underline:before{content:""}.fa-table:before{content:""}.fa-magi
c:before{content:""}.fa-truck:before{content:""}.fa-pinterest:before{content:""}.fa-pinterest-square:before{content:""}.fa-google-plus-square:before{content:""}.fa-google-plus:before{content:""}.fa-money:before{content:""}.fa-caret-down:before,.icon-caret-down:before,.wy-dropdown .caret:before{content:""}.fa-caret-up:before{content:""}.fa-caret-left:before{content:""}.fa-caret-right:before{content:""}.fa-columns:before{content:""}.fa-sort:before,.fa-unsorted:before{content:""}.fa-sort-desc:before,.fa-sort-down:before{content:""}.fa-sort-asc:before,.fa-sort-up:before{content:""}.fa-envelope:before{content:""}.fa-linkedin:before{content:""}.fa-rotate-left:before,.fa-undo:before{content:""}.fa-gavel:before,.fa-legal:before{content:""}.fa-dashboard:before,.fa-tachometer:before{content:""}.fa-comment-o:before{content:""}.fa-comments-o:before{content:""}.fa-bolt:before,.fa-flash:before{content:""}.fa-sitemap:before{content:""}.fa-umbrella:before{content:""}.fa-clipboard:before,.fa-paste:before{content:""}.fa-lightbulb-o:before{content:""}.fa-exchange:before{content:""}.fa-cloud-download:before{content:""}.fa-cloud-upload:before{content:""}.fa-user-md:before{content:""}.fa-stethoscope:before{content:""}.fa-suitcase:before{content:""}.fa-bell-o:before{content:""}.fa-coffee:before{content:""}.fa-cutlery:before{content:""}.fa-file-text-o:before{content:""}.fa-building-o:before{content:""}.fa-hospital-o:before{content:""}.fa-ambulance:before{content:""}.fa-medkit:before{content:""}.fa-fighter-jet:before{content:""}.fa-beer:before{content:""}.fa-h-square:before{content:""}.fa-plus-square:before{content:""}.fa-angle-double-left:before{content:""}.fa-angle-double-right:before{content:""}.fa-angle-double-up:before{content:""}.fa-angle-double-down:before{content:""}.fa-angle-left:before{content:""}.fa-angle-right:before{content:""}.fa-angle-up:before{content:""}.fa-angle-down:before{content:""}.fa-desktop:before{content:""}.fa-l
aptop:before{content:""}.fa-tablet:before{content:""}.fa-mobile-phone:before,.fa-mobile:before{content:""}.fa-circle-o:before{content:""}.fa-quote-left:before{content:""}.fa-quote-right:before{content:""}.fa-spinner:before{content:""}.fa-circle:before{content:""}.fa-mail-reply:before,.fa-reply:before{content:""}.fa-github-alt:before{content:""}.fa-folder-o:before{content:""}.fa-folder-open-o:before{content:""}.fa-smile-o:before{content:""}.fa-frown-o:before{content:""}.fa-meh-o:before{content:""}.fa-gamepad:before{content:""}.fa-keyboard-o:before{content:""}.fa-flag-o:before{content:""}.fa-flag-checkered:before{content:""}.fa-terminal:before{content:""}.fa-code:before{content:""}.fa-mail-reply-all:before,.fa-reply-all:before{content:""}.fa-star-half-empty:before,.fa-star-half-full:before,.fa-star-half-o:before{content:""}.fa-location-arrow:before{content:""}.fa-crop:before{content:""}.fa-code-fork:before{content:""}.fa-chain-broken:before,.fa-unlink:before{content:""}.fa-question:before{content:""}.fa-info:before{content:""}.fa-exclamation:before{content:""}.fa-superscript:before{content:""}.fa-subscript:before{content:""}.fa-eraser:before{content:""}.fa-puzzle-piece:before{content:""}.fa-microphone:before{content:""}.fa-microphone-slash:before{content:""}.fa-shield:before{content:""}.fa-calendar-o:before{content:""}.fa-fire-extinguisher:before{content:""}.fa-rocket:before{content:""}.fa-maxcdn:before{content:""}.fa-chevron-circle-left:before{content:""}.fa-chevron-circle-right:before{content:""}.fa-chevron-circle-up:before{content:""}.fa-chevron-circle-down:before{content:""}.fa-html5:before{content:""}.fa-css3:before{content:""}.fa-anchor:before{content:""}.fa-unlock-alt:before{content:""}.fa-bullseye:before{content:""}.fa-ellipsis-h:before{content:""}.fa-ellipsis-v:before{content:""}.fa-rss-square:before{content:""}.fa-play-circle:before{content:""}.fa-ticket:before{content:""}.fa-minus-square:before{content:
""}.fa-minus-square-o:before,.wy-menu-vertical li.current>a button.toctree-expand:before,.wy-menu-vertical li.on a button.toctree-expand:before{content:""}.fa-level-up:before{content:""}.fa-level-down:before{content:""}.fa-check-square:before{content:""}.fa-pencil-square:before{content:""}.fa-external-link-square:before{content:""}.fa-share-square:before{content:""}.fa-compass:before{content:""}.fa-caret-square-o-down:before,.fa-toggle-down:before{content:""}.fa-caret-square-o-up:before,.fa-toggle-up:before{content:""}.fa-caret-square-o-right:before,.fa-toggle-right:before{content:""}.fa-eur:before,.fa-euro:before{content:""}.fa-gbp:before{content:""}.fa-dollar:before,.fa-usd:before{content:""}.fa-inr:before,.fa-rupee:before{content:""}.fa-cny:before,.fa-jpy:before,.fa-rmb:before,.fa-yen:before{content:""}.fa-rouble:before,.fa-rub:before,.fa-ruble:before{content:""}.fa-krw:before,.fa-won:before{content:""}.fa-bitcoin:before,.fa-btc:before{content:""}.fa-file:before{content:""}.fa-file-text:before{content:""}.fa-sort-alpha-asc:before{content:""}.fa-sort-alpha-desc:before{content:""}.fa-sort-amount-asc:before{content:""}.fa-sort-amount-desc:before{content:""}.fa-sort-numeric-asc:before{content:""}.fa-sort-numeric-desc:before{content:""}.fa-thumbs-up:before{content:""}.fa-thumbs-down:before{content:""}.fa-youtube-square:before{content:""}.fa-youtube:before{content:""}.fa-xing:before{content:""}.fa-xing-square:before{content:""}.fa-youtube-play:before{content:""}.fa-dropbox:before{content:""}.fa-stack-overflow:before{content:""}.fa-instagram:before{content:""}.fa-flickr:before{content:""}.fa-adn:before{content:""}.fa-bitbucket:before,.icon-bitbucket:before{content:""}.fa-bitbucket-square:before{content:""}.fa-tumblr:before{content:""}.fa-tumblr-square:before{content:""}.fa-long-arrow-down:before{content:""}.fa-long-arrow-up:before{content:""}.fa-long-arrow-left:before{content:""}.fa-long-arrow-right:before{content:""}.fa-a
pple:before{content:""}.fa-windows:before{content:""}.fa-android:before{content:""}.fa-linux:before{content:""}.fa-dribbble:before{content:""}.fa-skype:before{content:""}.fa-foursquare:before{content:""}.fa-trello:before{content:""}.fa-female:before{content:""}.fa-male:before{content:""}.fa-gittip:before,.fa-gratipay:before{content:""}.fa-sun-o:before{content:""}.fa-moon-o:before{content:""}.fa-archive:before{content:""}.fa-bug:before{content:""}.fa-vk:before{content:""}.fa-weibo:before{content:""}.fa-renren:before{content:""}.fa-pagelines:before{content:""}.fa-stack-exchange:before{content:""}.fa-arrow-circle-o-right:before{content:""}.fa-arrow-circle-o-left:before{content:""}.fa-caret-square-o-left:before,.fa-toggle-left:before{content:""}.fa-dot-circle-o:before{content:""}.fa-wheelchair:before{content:""}.fa-vimeo-square:before{content:""}.fa-try:before,.fa-turkish-lira:before{content:""}.fa-plus-square-o:before,.wy-menu-vertical li button.toctree-expand:before{content:""}.fa-space-shuttle:before{content:""}.fa-slack:before{content:""}.fa-envelope-square:before{content:""}.fa-wordpress:before{content:""}.fa-openid:before{content:""}.fa-bank:before,.fa-institution:before,.fa-university:before{content:""}.fa-graduation-cap:before,.fa-mortar-board:before{content:""}.fa-yahoo:before{content:""}.fa-google:before{content:""}.fa-reddit:before{content:""}.fa-reddit-square:before{content:""}.fa-stumbleupon-circle:before{content:""}.fa-stumbleupon:before{content:""}.fa-delicious:before{content:""}.fa-digg:before{content:""}.fa-pied-piper-pp:before{content:""}.fa-pied-piper-alt:before{content:""}.fa-drupal:before{content:""}.fa-joomla:before{content:""}.fa-language:before{content:""}.fa-fax:before{content:""}.fa-building:before{content:""}.fa-child:before{content:""}.fa-paw:before{content:""}.fa-spoon:before{content:""}.fa-cube:before{content:""}.fa-cubes:before{content:""}.fa-behance:before{content:""}.fa-behance-squa
re:before{content:""}.fa-steam:before{content:""}.fa-steam-square:before{content:""}.fa-recycle:before{content:""}.fa-automobile:before,.fa-car:before{content:""}.fa-cab:before,.fa-taxi:before{content:""}.fa-tree:before{content:""}.fa-spotify:before{content:""}.fa-deviantart:before{content:""}.fa-soundcloud:before{content:""}.fa-database:before{content:""}.fa-file-pdf-o:before{content:""}.fa-file-word-o:before{content:""}.fa-file-excel-o:before{content:""}.fa-file-powerpoint-o:before{content:""}.fa-file-image-o:before,.fa-file-photo-o:before,.fa-file-picture-o:before{content:""}.fa-file-archive-o:before,.fa-file-zip-o:before{content:""}.fa-file-audio-o:before,.fa-file-sound-o:before{content:""}.fa-file-movie-o:before,.fa-file-video-o:before{content:""}.fa-file-code-o:before{content:""}.fa-vine:before{content:""}.fa-codepen:before{content:""}.fa-jsfiddle:before{content:""}.fa-life-bouy:before,.fa-life-buoy:before,.fa-life-ring:before,.fa-life-saver:before,.fa-support:before{content:""}.fa-circle-o-notch:before{content:""}.fa-ra:before,.fa-rebel:before,.fa-resistance:before{content:""}.fa-empire:before,.fa-ge:before{content:""}.fa-git-square:before{content:""}.fa-git:before{content:""}.fa-hacker-news:before,.fa-y-combinator-square:before,.fa-yc-square:before{content:""}.fa-tencent-weibo:before{content:""}.fa-qq:before{content:""}.fa-wechat:before,.fa-weixin:before{content:""}.fa-paper-plane:before,.fa-send:before{content:""}.fa-paper-plane-o:before,.fa-send-o:before{content:""}.fa-history:before{content:""}.fa-circle-thin:before{content:""}.fa-header:before{content:""}.fa-paragraph:before{content:""}.fa-sliders:before{content:""}.fa-share-alt:before{content:""}.fa-share-alt-square:before{content:""}.fa-bomb:before{content:""}.fa-futbol-o:before,.fa-soccer-ball-o:before{content:""}.fa-tty:before{content:""}.fa-binoculars:before{content:""}.fa-plug:before{content:""}.fa-slideshare:before{content:""}.fa-twitch:before{conten
t:""}.fa-yelp:before{content:""}.fa-newspaper-o:before{content:""}.fa-wifi:before{content:""}.fa-calculator:before{content:""}.fa-paypal:before{content:""}.fa-google-wallet:before{content:""}.fa-cc-visa:before{content:""}.fa-cc-mastercard:before{content:""}.fa-cc-discover:before{content:""}.fa-cc-amex:before{content:""}.fa-cc-paypal:before{content:""}.fa-cc-stripe:before{content:""}.fa-bell-slash:before{content:""}.fa-bell-slash-o:before{content:""}.fa-trash:before{content:""}.fa-copyright:before{content:""}.fa-at:before{content:""}.fa-eyedropper:before{content:""}.fa-paint-brush:before{content:""}.fa-birthday-cake:before{content:""}.fa-area-chart:before{content:""}.fa-pie-chart:before{content:""}.fa-line-chart:before{content:""}.fa-lastfm:before{content:""}.fa-lastfm-square:before{content:""}.fa-toggle-off:before{content:""}.fa-toggle-on:before{content:""}.fa-bicycle:before{content:""}.fa-bus:before{content:""}.fa-ioxhost:before{content:""}.fa-angellist:before{content:""}.fa-cc:before{content:""}.fa-ils:before,.fa-shekel:before,.fa-sheqel:before{content:""}.fa-meanpath:before{content:""}.fa-buysellads:before{content:""}.fa-connectdevelop:before{content:""}.fa-dashcube:before{content:""}.fa-forumbee:before{content:""}.fa-leanpub:before{content:""}.fa-sellsy:before{content:""}.fa-shirtsinbulk:before{content:""}.fa-simplybuilt:before{content:""}.fa-skyatlas:before{content:""}.fa-cart-plus:before{content:""}.fa-cart-arrow-down:before{content:""}.fa-diamond:before{content:""}.fa-ship:before{content:""}.fa-user-secret:before{content:""}.fa-motorcycle:before{content:""}.fa-street-view:before{content:""}.fa-heartbeat:before{content:""}.fa-venus:before{content:""}.fa-mars:before{content:""}.fa-mercury:before{content:""}.fa-intersex:before,.fa-transgender:before{content:""}.fa-transgender-alt:before{content:""}.fa-venus-double:before{content:""}.fa-mars-double:before{content:""}.fa-venus-mars:before{content:""}.fa-m
ars-stroke:before{content:""}.fa-mars-stroke-v:before{content:""}.fa-mars-stroke-h:before{content:""}.fa-neuter:before{content:""}.fa-genderless:before{content:""}.fa-facebook-official:before{content:""}.fa-pinterest-p:before{content:""}.fa-whatsapp:before{content:""}.fa-server:before{content:""}.fa-user-plus:before{content:""}.fa-user-times:before{content:""}.fa-bed:before,.fa-hotel:before{content:""}.fa-viacoin:before{content:""}.fa-train:before{content:""}.fa-subway:before{content:""}.fa-medium:before{content:""}.fa-y-combinator:before,.fa-yc:before{content:""}.fa-optin-monster:before{content:""}.fa-opencart:before{content:""}.fa-expeditedssl:before{content:""}.fa-battery-4:before,.fa-battery-full:before,.fa-battery:before{content:""}.fa-battery-3:before,.fa-battery-three-quarters:before{content:""}.fa-battery-2:before,.fa-battery-half:before{content:""}.fa-battery-1:before,.fa-battery-quarter:before{content:""}.fa-battery-0:before,.fa-battery-empty:before{content:""}.fa-mouse-pointer:before{content:""}.fa-i-cursor:before{content:""}.fa-object-group:before{content:""}.fa-object-ungroup:before{content:""}.fa-sticky-note:before{content:""}.fa-sticky-note-o:before{content:""}.fa-cc-jcb:before{content:""}.fa-cc-diners-club:before{content:""}.fa-clone:before{content:""}.fa-balance-scale:before{content:""}.fa-hourglass-o:before{content:""}.fa-hourglass-1:before,.fa-hourglass-start:before{content:""}.fa-hourglass-2:before,.fa-hourglass-half:before{content:""}.fa-hourglass-3:before,.fa-hourglass-end:before{content:""}.fa-hourglass:before{content:""}.fa-hand-grab-o:before,.fa-hand-rock-o:before{content:""}.fa-hand-paper-o:before,.fa-hand-stop-o:before{content:""}.fa-hand-scissors-o:before{content:""}.fa-hand-lizard-o:before{content:""}.fa-hand-spock-o:before{content:""}.fa-hand-pointer-o:before{content:""}.fa-hand-peace-o:before{content:""}.fa-trademark:before{content:""}.fa-registered:before{content:""}.fa-creative-commons
:before{content:""}.fa-gg:before{content:""}.fa-gg-circle:before{content:""}.fa-tripadvisor:before{content:""}.fa-odnoklassniki:before{content:""}.fa-odnoklassniki-square:before{content:""}.fa-get-pocket:before{content:""}.fa-wikipedia-w:before{content:""}.fa-safari:before{content:""}.fa-chrome:before{content:""}.fa-firefox:before{content:""}.fa-opera:before{content:""}.fa-internet-explorer:before{content:""}.fa-television:before,.fa-tv:before{content:""}.fa-contao:before{content:""}.fa-500px:before{content:""}.fa-amazon:before{content:""}.fa-calendar-plus-o:before{content:""}.fa-calendar-minus-o:before{content:""}.fa-calendar-times-o:before{content:""}.fa-calendar-check-o:before{content:""}.fa-industry:before{content:""}.fa-map-pin:before{content:""}.fa-map-signs:before{content:""}.fa-map-o:before{content:""}.fa-map:before{content:""}.fa-commenting:before{content:""}.fa-commenting-o:before{content:""}.fa-houzz:before{content:""}.fa-vimeo:before{content:""}.fa-black-tie:before{content:""}.fa-fonticons:before{content:""}.fa-reddit-alien:before{content:""}.fa-edge:before{content:""}.fa-credit-card-alt:before{content:""}.fa-codiepie:before{content:""}.fa-modx:before{content:""}.fa-fort-awesome:before{content:""}.fa-usb:before{content:""}.fa-product-hunt:before{content:""}.fa-mixcloud:before{content:""}.fa-scribd:before{content:""}.fa-pause-circle:before{content:""}.fa-pause-circle-o:before{content:""}.fa-stop-circle:before{content:""}.fa-stop-circle-o:before{content:""}.fa-shopping-bag:before{content:""}.fa-shopping-basket:before{content:""}.fa-hashtag:before{content:""}.fa-bluetooth:before{content:""}.fa-bluetooth-b:before{content:""}.fa-percent:before{content:""}.fa-gitlab:before,.icon-gitlab:before{content:""}.fa-wpbeginner:before{content:""}.fa-wpforms:before{content:""}.fa-envira:before{content:""}.fa-universal-access:before{content:""}.fa-wheelchair-alt:before{content:""}.fa-question-circle-o:before{conten
t:""}.fa-blind:before{content:""}.fa-audio-description:before{content:""}.fa-volume-control-phone:before{content:""}.fa-braille:before{content:""}.fa-assistive-listening-systems:before{content:""}.fa-american-sign-language-interpreting:before,.fa-asl-interpreting:before{content:""}.fa-deaf:before,.fa-deafness:before,.fa-hard-of-hearing:before{content:""}.fa-glide:before{content:""}.fa-glide-g:before{content:""}.fa-sign-language:before,.fa-signing:before{content:""}.fa-low-vision:before{content:""}.fa-viadeo:before{content:""}.fa-viadeo-square:before{content:""}.fa-snapchat:before{content:""}.fa-snapchat-ghost:before{content:""}.fa-snapchat-square:before{content:""}.fa-pied-piper:before{content:""}.fa-first-order:before{content:""}.fa-yoast:before{content:""}.fa-themeisle:before{content:""}.fa-google-plus-circle:before,.fa-google-plus-official:before{content:""}.fa-fa:before,.fa-font-awesome:before{content:""}.fa-handshake-o:before{content:""}.fa-envelope-open:before{content:""}.fa-envelope-open-o:before{content:""}.fa-linode:before{content:""}.fa-address-book:before{content:""}.fa-address-book-o:before{content:""}.fa-address-card:before,.fa-vcard:before{content:""}.fa-address-card-o:before,.fa-vcard-o:before{content:""}.fa-user-circle:before{content:""}.fa-user-circle-o:before{content:""}.fa-user-o:before{content:""}.fa-id-badge:before{content:""}.fa-drivers-license:before,.fa-id-card:before{content:""}.fa-drivers-license-o:before,.fa-id-card-o:before{content:""}.fa-quora:before{content:""}.fa-free-code-camp:before{content:""}.fa-telegram:before{content:""}.fa-thermometer-4:before,.fa-thermometer-full:before,.fa-thermometer:before{content:""}.fa-thermometer-3:before,.fa-thermometer-three-quarters:before{content:""}.fa-thermometer-2:before,.fa-thermometer-half:before{content:""}.fa-thermometer-1:before,.fa-thermometer-quarter:before{content:""}.fa-thermometer-0:before,.fa-thermometer-empty:before{content:""}.fa-shower:befo
re{content:""}.fa-bath:before,.fa-bathtub:before,.fa-s15:before{content:""}.fa-podcast:before{content:""}.fa-window-maximize:before{content:""}.fa-window-minimize:before{content:""}.fa-window-restore:before{content:""}.fa-times-rectangle:before,.fa-window-close:before{content:""}.fa-times-rectangle-o:before,.fa-window-close-o:before{content:""}.fa-bandcamp:before{content:""}.fa-grav:before{content:""}.fa-etsy:before{content:""}.fa-imdb:before{content:""}.fa-ravelry:before{content:""}.fa-eercast:before{content:""}.fa-microchip:before{content:""}.fa-snowflake-o:before{content:""}.fa-superpowers:before{content:""}.fa-wpexplorer:before{content:""}.fa-meetup:before{content:""}.sr-only{position:absolute;width:1px;height:1px;padding:0;margin:-1px;overflow:hidden;clip:rect(0,0,0,0);border:0}.sr-only-focusable:active,.sr-only-focusable:focus{position:static;width:auto;height:auto;margin:0;overflow:visible;clip:auto}.fa,.icon,.rst-content .admonition-title,.rst-content .code-block-caption .headerlink,.rst-content .eqno .headerlink,.rst-content code.download span:first-child,.rst-content dl dt .headerlink,.rst-content h1 .headerlink,.rst-content h2 .headerlink,.rst-content h3 .headerlink,.rst-content h4 .headerlink,.rst-content h5 .headerlink,.rst-content h6 .headerlink,.rst-content p.caption .headerlink,.rst-content p .headerlink,.rst-content table>caption .headerlink,.rst-content tt.download span:first-child,.wy-dropdown .caret,.wy-inline-validate.wy-inline-validate-danger .wy-input-context,.wy-inline-validate.wy-inline-validate-info .wy-input-context,.wy-inline-validate.wy-inline-validate-success .wy-input-context,.wy-inline-validate.wy-inline-validate-warning .wy-input-context,.wy-menu-vertical li.current>a button.toctree-expand,.wy-menu-vertical li.on a button.toctree-expand,.wy-menu-vertical li button.toctree-expand{font-family:inherit}.fa:before,.icon:before,.rst-content .admonition-title:before,.rst-content .code-block-caption 
.headerlink:before,.rst-content .eqno .headerlink:before,.rst-content code.download span:first-child:before,.rst-content dl dt .headerlink:before,.rst-content h1 .headerlink:before,.rst-content h2 .headerlink:before,.rst-content h3 .headerlink:before,.rst-content h4 .headerlink:before,.rst-content h5 .headerlink:before,.rst-content h6 .headerlink:before,.rst-content p.caption .headerlink:before,.rst-content p .headerlink:before,.rst-content table>caption .headerlink:before,.rst-content tt.download span:first-child:before,.wy-dropdown .caret:before,.wy-inline-validate.wy-inline-validate-danger .wy-input-context:before,.wy-inline-validate.wy-inline-validate-info .wy-input-context:before,.wy-inline-validate.wy-inline-validate-success .wy-input-context:before,.wy-inline-validate.wy-inline-validate-warning .wy-input-context:before,.wy-menu-vertical li.current>a button.toctree-expand:before,.wy-menu-vertical li.on a button.toctree-expand:before,.wy-menu-vertical li button.toctree-expand:before{font-family:FontAwesome;display:inline-block;font-style:normal;font-weight:400;line-height:1;text-decoration:inherit}.rst-content .code-block-caption a .headerlink,.rst-content .eqno a .headerlink,.rst-content a .admonition-title,.rst-content code.download a span:first-child,.rst-content dl dt a .headerlink,.rst-content h1 a .headerlink,.rst-content h2 a .headerlink,.rst-content h3 a .headerlink,.rst-content h4 a .headerlink,.rst-content h5 a .headerlink,.rst-content h6 a .headerlink,.rst-content p.caption a .headerlink,.rst-content p a .headerlink,.rst-content table>caption a .headerlink,.rst-content tt.download a span:first-child,.wy-menu-vertical li.current>a button.toctree-expand,.wy-menu-vertical li.on a button.toctree-expand,.wy-menu-vertical li a button.toctree-expand,a .fa,a .icon,a .rst-content .admonition-title,a .rst-content .code-block-caption .headerlink,a .rst-content .eqno .headerlink,a .rst-content code.download span:first-child,a .rst-content dl dt .headerlink,a 
.rst-content h1 .headerlink,a .rst-content h2 .headerlink,a .rst-content h3 .headerlink,a .rst-content h4 .headerlink,a .rst-content h5 .headerlink,a .rst-content h6 .headerlink,a .rst-content p.caption .headerlink,a .rst-content p .headerlink,a .rst-content table>caption .headerlink,a .rst-content tt.download span:first-child,a .wy-menu-vertical li button.toctree-expand{display:inline-block;text-decoration:inherit}.btn .fa,.btn .icon,.btn .rst-content .admonition-title,.btn .rst-content .code-block-caption .headerlink,.btn .rst-content .eqno .headerlink,.btn .rst-content code.download span:first-child,.btn .rst-content dl dt .headerlink,.btn .rst-content h1 .headerlink,.btn .rst-content h2 .headerlink,.btn .rst-content h3 .headerlink,.btn .rst-content h4 .headerlink,.btn .rst-content h5 .headerlink,.btn .rst-content h6 .headerlink,.btn .rst-content p .headerlink,.btn .rst-content table>caption .headerlink,.btn .rst-content tt.download span:first-child,.btn .wy-menu-vertical li.current>a button.toctree-expand,.btn .wy-menu-vertical li.on a button.toctree-expand,.btn .wy-menu-vertical li button.toctree-expand,.nav .fa,.nav .icon,.nav .rst-content .admonition-title,.nav .rst-content .code-block-caption .headerlink,.nav .rst-content .eqno .headerlink,.nav .rst-content code.download span:first-child,.nav .rst-content dl dt .headerlink,.nav .rst-content h1 .headerlink,.nav .rst-content h2 .headerlink,.nav .rst-content h3 .headerlink,.nav .rst-content h4 .headerlink,.nav .rst-content h5 .headerlink,.nav .rst-content h6 .headerlink,.nav .rst-content p .headerlink,.nav .rst-content table>caption .headerlink,.nav .rst-content tt.download span:first-child,.nav .wy-menu-vertical li.current>a button.toctree-expand,.nav .wy-menu-vertical li.on a button.toctree-expand,.nav .wy-menu-vertical li button.toctree-expand,.rst-content .btn .admonition-title,.rst-content .code-block-caption .btn .headerlink,.rst-content .code-block-caption .nav .headerlink,.rst-content .eqno .btn 
.headerlink,.rst-content .eqno .nav .headerlink,.rst-content .nav .admonition-title,.rst-content code.download .btn span:first-child,.rst-content code.download .nav span:first-child,.rst-content dl dt .btn .headerlink,.rst-content dl dt .nav .headerlink,.rst-content h1 .btn .headerlink,.rst-content h1 .nav .headerlink,.rst-content h2 .btn .headerlink,.rst-content h2 .nav .headerlink,.rst-content h3 .btn .headerlink,.rst-content h3 .nav .headerlink,.rst-content h4 .btn .headerlink,.rst-content h4 .nav .headerlink,.rst-content h5 .btn .headerlink,.rst-content h5 .nav .headerlink,.rst-content h6 .btn .headerlink,.rst-content h6 .nav .headerlink,.rst-content p .btn .headerlink,.rst-content p .nav .headerlink,.rst-content table>caption .btn .headerlink,.rst-content table>caption .nav .headerlink,.rst-content tt.download .btn span:first-child,.rst-content tt.download .nav span:first-child,.wy-menu-vertical li .btn button.toctree-expand,.wy-menu-vertical li.current>a .btn button.toctree-expand,.wy-menu-vertical li.current>a .nav button.toctree-expand,.wy-menu-vertical li .nav button.toctree-expand,.wy-menu-vertical li.on a .btn button.toctree-expand,.wy-menu-vertical li.on a .nav button.toctree-expand{display:inline}.btn .fa-large.icon,.btn .fa.fa-large,.btn .rst-content .code-block-caption .fa-large.headerlink,.btn .rst-content .eqno .fa-large.headerlink,.btn .rst-content .fa-large.admonition-title,.btn .rst-content code.download span.fa-large:first-child,.btn .rst-content dl dt .fa-large.headerlink,.btn .rst-content h1 .fa-large.headerlink,.btn .rst-content h2 .fa-large.headerlink,.btn .rst-content h3 .fa-large.headerlink,.btn .rst-content h4 .fa-large.headerlink,.btn .rst-content h5 .fa-large.headerlink,.btn .rst-content h6 .fa-large.headerlink,.btn .rst-content p .fa-large.headerlink,.btn .rst-content table>caption .fa-large.headerlink,.btn .rst-content tt.download span.fa-large:first-child,.btn .wy-menu-vertical li button.fa-large.toctree-expand,.nav 
.fa-large.icon,.nav .fa.fa-large,.nav .rst-content .code-block-caption .fa-large.headerlink,.nav .rst-content .eqno .fa-large.headerlink,.nav .rst-content .fa-large.admonition-title,.nav .rst-content code.download span.fa-large:first-child,.nav .rst-content dl dt .fa-large.headerlink,.nav .rst-content h1 .fa-large.headerlink,.nav .rst-content h2 .fa-large.headerlink,.nav .rst-content h3 .fa-large.headerlink,.nav .rst-content h4 .fa-large.headerlink,.nav .rst-content h5 .fa-large.headerlink,.nav .rst-content h6 .fa-large.headerlink,.nav .rst-content p .fa-large.headerlink,.nav .rst-content table>caption .fa-large.headerlink,.nav .rst-content tt.download span.fa-large:first-child,.nav .wy-menu-vertical li button.fa-large.toctree-expand,.rst-content .btn .fa-large.admonition-title,.rst-content .code-block-caption .btn .fa-large.headerlink,.rst-content .code-block-caption .nav .fa-large.headerlink,.rst-content .eqno .btn .fa-large.headerlink,.rst-content .eqno .nav .fa-large.headerlink,.rst-content .nav .fa-large.admonition-title,.rst-content code.download .btn span.fa-large:first-child,.rst-content code.download .nav span.fa-large:first-child,.rst-content dl dt .btn .fa-large.headerlink,.rst-content dl dt .nav .fa-large.headerlink,.rst-content h1 .btn .fa-large.headerlink,.rst-content h1 .nav .fa-large.headerlink,.rst-content h2 .btn .fa-large.headerlink,.rst-content h2 .nav .fa-large.headerlink,.rst-content h3 .btn .fa-large.headerlink,.rst-content h3 .nav .fa-large.headerlink,.rst-content h4 .btn .fa-large.headerlink,.rst-content h4 .nav .fa-large.headerlink,.rst-content h5 .btn .fa-large.headerlink,.rst-content h5 .nav .fa-large.headerlink,.rst-content h6 .btn .fa-large.headerlink,.rst-content h6 .nav .fa-large.headerlink,.rst-content p .btn .fa-large.headerlink,.rst-content p .nav .fa-large.headerlink,.rst-content table>caption .btn .fa-large.headerlink,.rst-content table>caption .nav .fa-large.headerlink,.rst-content tt.download .btn 
span.fa-large:first-child,.rst-content tt.download .nav span.fa-large:first-child,.wy-menu-vertical li .btn button.fa-large.toctree-expand,.wy-menu-vertical li .nav button.fa-large.toctree-expand{line-height:.9em}.btn .fa-spin.icon,.btn .fa.fa-spin,.btn .rst-content .code-block-caption .fa-spin.headerlink,.btn .rst-content .eqno .fa-spin.headerlink,.btn .rst-content .fa-spin.admonition-title,.btn .rst-content code.download span.fa-spin:first-child,.btn .rst-content dl dt .fa-spin.headerlink,.btn .rst-content h1 .fa-spin.headerlink,.btn .rst-content h2 .fa-spin.headerlink,.btn .rst-content h3 .fa-spin.headerlink,.btn .rst-content h4 .fa-spin.headerlink,.btn .rst-content h5 .fa-spin.headerlink,.btn .rst-content h6 .fa-spin.headerlink,.btn .rst-content p .fa-spin.headerlink,.btn .rst-content table>caption .fa-spin.headerlink,.btn .rst-content tt.download span.fa-spin:first-child,.btn .wy-menu-vertical li button.fa-spin.toctree-expand,.nav .fa-spin.icon,.nav .fa.fa-spin,.nav .rst-content .code-block-caption .fa-spin.headerlink,.nav .rst-content .eqno .fa-spin.headerlink,.nav .rst-content .fa-spin.admonition-title,.nav .rst-content code.download span.fa-spin:first-child,.nav .rst-content dl dt .fa-spin.headerlink,.nav .rst-content h1 .fa-spin.headerlink,.nav .rst-content h2 .fa-spin.headerlink,.nav .rst-content h3 .fa-spin.headerlink,.nav .rst-content h4 .fa-spin.headerlink,.nav .rst-content h5 .fa-spin.headerlink,.nav .rst-content h6 .fa-spin.headerlink,.nav .rst-content p .fa-spin.headerlink,.nav .rst-content table>caption .fa-spin.headerlink,.nav .rst-content tt.download span.fa-spin:first-child,.nav .wy-menu-vertical li button.fa-spin.toctree-expand,.rst-content .btn .fa-spin.admonition-title,.rst-content .code-block-caption .btn .fa-spin.headerlink,.rst-content .code-block-caption .nav .fa-spin.headerlink,.rst-content .eqno .btn .fa-spin.headerlink,.rst-content .eqno .nav .fa-spin.headerlink,.rst-content .nav .fa-spin.admonition-title,.rst-content code.download 
.btn span.fa-spin:first-child,.rst-content code.download .nav span.fa-spin:first-child,.rst-content dl dt .btn .fa-spin.headerlink,.rst-content dl dt .nav .fa-spin.headerlink,.rst-content h1 .btn .fa-spin.headerlink,.rst-content h1 .nav .fa-spin.headerlink,.rst-content h2 .btn .fa-spin.headerlink,.rst-content h2 .nav .fa-spin.headerlink,.rst-content h3 .btn .fa-spin.headerlink,.rst-content h3 .nav .fa-spin.headerlink,.rst-content h4 .btn .fa-spin.headerlink,.rst-content h4 .nav .fa-spin.headerlink,.rst-content h5 .btn .fa-spin.headerlink,.rst-content h5 .nav .fa-spin.headerlink,.rst-content h6 .btn .fa-spin.headerlink,.rst-content h6 .nav .fa-spin.headerlink,.rst-content p .btn .fa-spin.headerlink,.rst-content p .nav .fa-spin.headerlink,.rst-content table>caption .btn .fa-spin.headerlink,.rst-content table>caption .nav .fa-spin.headerlink,.rst-content tt.download .btn span.fa-spin:first-child,.rst-content tt.download .nav span.fa-spin:first-child,.wy-menu-vertical li .btn button.fa-spin.toctree-expand,.wy-menu-vertical li .nav button.fa-spin.toctree-expand{display:inline-block}.btn.fa:before,.btn.icon:before,.rst-content .btn.admonition-title:before,.rst-content .code-block-caption .btn.headerlink:before,.rst-content .eqno .btn.headerlink:before,.rst-content code.download span.btn:first-child:before,.rst-content dl dt .btn.headerlink:before,.rst-content h1 .btn.headerlink:before,.rst-content h2 .btn.headerlink:before,.rst-content h3 .btn.headerlink:before,.rst-content h4 .btn.headerlink:before,.rst-content h5 .btn.headerlink:before,.rst-content h6 .btn.headerlink:before,.rst-content p .btn.headerlink:before,.rst-content table>caption .btn.headerlink:before,.rst-content tt.download span.btn:first-child:before,.wy-menu-vertical li button.btn.toctree-expand:before{opacity:.5;-webkit-transition:opacity .05s ease-in;-moz-transition:opacity .05s ease-in;transition:opacity .05s ease-in}.btn.fa:hover:before,.btn.icon:hover:before,.rst-content 
.btn.admonition-title:hover:before,.rst-content .code-block-caption .btn.headerlink:hover:before,.rst-content .eqno .btn.headerlink:hover:before,.rst-content code.download span.btn:first-child:hover:before,.rst-content dl dt .btn.headerlink:hover:before,.rst-content h1 .btn.headerlink:hover:before,.rst-content h2 .btn.headerlink:hover:before,.rst-content h3 .btn.headerlink:hover:before,.rst-content h4 .btn.headerlink:hover:before,.rst-content h5 .btn.headerlink:hover:before,.rst-content h6 .btn.headerlink:hover:before,.rst-content p .btn.headerlink:hover:before,.rst-content table>caption .btn.headerlink:hover:before,.rst-content tt.download span.btn:first-child:hover:before,.wy-menu-vertical li button.btn.toctree-expand:hover:before{opacity:1}.btn-mini .fa:before,.btn-mini .icon:before,.btn-mini .rst-content .admonition-title:before,.btn-mini .rst-content .code-block-caption .headerlink:before,.btn-mini .rst-content .eqno .headerlink:before,.btn-mini .rst-content code.download span:first-child:before,.btn-mini .rst-content dl dt .headerlink:before,.btn-mini .rst-content h1 .headerlink:before,.btn-mini .rst-content h2 .headerlink:before,.btn-mini .rst-content h3 .headerlink:before,.btn-mini .rst-content h4 .headerlink:before,.btn-mini .rst-content h5 .headerlink:before,.btn-mini .rst-content h6 .headerlink:before,.btn-mini .rst-content p .headerlink:before,.btn-mini .rst-content table>caption .headerlink:before,.btn-mini .rst-content tt.download span:first-child:before,.btn-mini .wy-menu-vertical li button.toctree-expand:before,.rst-content .btn-mini .admonition-title:before,.rst-content .code-block-caption .btn-mini .headerlink:before,.rst-content .eqno .btn-mini .headerlink:before,.rst-content code.download .btn-mini span:first-child:before,.rst-content dl dt .btn-mini .headerlink:before,.rst-content h1 .btn-mini .headerlink:before,.rst-content h2 .btn-mini .headerlink:before,.rst-content h3 .btn-mini .headerlink:before,.rst-content h4 .btn-mini 
.headerlink:before,.rst-content h5 .btn-mini .headerlink:before,.rst-content h6 .btn-mini .headerlink:before,.rst-content p .btn-mini .headerlink:before,.rst-content table>caption .btn-mini .headerlink:before,.rst-content tt.download .btn-mini span:first-child:before,.wy-menu-vertical li .btn-mini button.toctree-expand:before{font-size:14px;vertical-align:-15%}.rst-content .admonition,.rst-content .admonition-todo,.rst-content .attention,.rst-content .caution,.rst-content .danger,.rst-content .error,.rst-content .hint,.rst-content .important,.rst-content .note,.rst-content .seealso,.rst-content .tip,.rst-content .warning,.wy-alert{padding:12px;line-height:24px;margin-bottom:24px;background:#e7f2fa}.rst-content .admonition-title,.wy-alert-title{font-weight:700;display:block;color:#fff;background:#6ab0de;padding:6px 12px;margin:-12px -12px 12px}.rst-content .danger,.rst-content .error,.rst-content .wy-alert-danger.admonition,.rst-content .wy-alert-danger.admonition-todo,.rst-content .wy-alert-danger.attention,.rst-content .wy-alert-danger.caution,.rst-content .wy-alert-danger.hint,.rst-content .wy-alert-danger.important,.rst-content .wy-alert-danger.note,.rst-content .wy-alert-danger.seealso,.rst-content .wy-alert-danger.tip,.rst-content .wy-alert-danger.warning,.wy-alert.wy-alert-danger{background:#fdf3f2}.rst-content .danger .admonition-title,.rst-content .danger .wy-alert-title,.rst-content .error .admonition-title,.rst-content .error .wy-alert-title,.rst-content .wy-alert-danger.admonition-todo .admonition-title,.rst-content .wy-alert-danger.admonition-todo .wy-alert-title,.rst-content .wy-alert-danger.admonition .admonition-title,.rst-content .wy-alert-danger.admonition .wy-alert-title,.rst-content .wy-alert-danger.attention .admonition-title,.rst-content .wy-alert-danger.attention .wy-alert-title,.rst-content .wy-alert-danger.caution .admonition-title,.rst-content .wy-alert-danger.caution .wy-alert-title,.rst-content .wy-alert-danger.hint 
.admonition-title,.rst-content .wy-alert-danger.hint .wy-alert-title,.rst-content .wy-alert-danger.important .admonition-title,.rst-content .wy-alert-danger.important .wy-alert-title,.rst-content .wy-alert-danger.note .admonition-title,.rst-content .wy-alert-danger.note .wy-alert-title,.rst-content .wy-alert-danger.seealso .admonition-title,.rst-content .wy-alert-danger.seealso .wy-alert-title,.rst-content .wy-alert-danger.tip .admonition-title,.rst-content .wy-alert-danger.tip .wy-alert-title,.rst-content .wy-alert-danger.warning .admonition-title,.rst-content .wy-alert-danger.warning .wy-alert-title,.rst-content .wy-alert.wy-alert-danger .admonition-title,.wy-alert.wy-alert-danger .rst-content .admonition-title,.wy-alert.wy-alert-danger .wy-alert-title{background:#f29f97}.rst-content .admonition-todo,.rst-content .attention,.rst-content .caution,.rst-content .warning,.rst-content .wy-alert-warning.admonition,.rst-content .wy-alert-warning.danger,.rst-content .wy-alert-warning.error,.rst-content .wy-alert-warning.hint,.rst-content .wy-alert-warning.important,.rst-content .wy-alert-warning.note,.rst-content .wy-alert-warning.seealso,.rst-content .wy-alert-warning.tip,.wy-alert.wy-alert-warning{background:#ffedcc}.rst-content .admonition-todo .admonition-title,.rst-content .admonition-todo .wy-alert-title,.rst-content .attention .admonition-title,.rst-content .attention .wy-alert-title,.rst-content .caution .admonition-title,.rst-content .caution .wy-alert-title,.rst-content .warning .admonition-title,.rst-content .warning .wy-alert-title,.rst-content .wy-alert-warning.admonition .admonition-title,.rst-content .wy-alert-warning.admonition .wy-alert-title,.rst-content .wy-alert-warning.danger .admonition-title,.rst-content .wy-alert-warning.danger .wy-alert-title,.rst-content .wy-alert-warning.error .admonition-title,.rst-content .wy-alert-warning.error .wy-alert-title,.rst-content .wy-alert-warning.hint .admonition-title,.rst-content .wy-alert-warning.hint 
.wy-alert-title,.rst-content .wy-alert-warning.important .admonition-title,.rst-content .wy-alert-warning.important .wy-alert-title,.rst-content .wy-alert-warning.note .admonition-title,.rst-content .wy-alert-warning.note .wy-alert-title,.rst-content .wy-alert-warning.seealso .admonition-title,.rst-content .wy-alert-warning.seealso .wy-alert-title,.rst-content .wy-alert-warning.tip .admonition-title,.rst-content .wy-alert-warning.tip .wy-alert-title,.rst-content .wy-alert.wy-alert-warning .admonition-title,.wy-alert.wy-alert-warning .rst-content .admonition-title,.wy-alert.wy-alert-warning .wy-alert-title{background:#f0b37e}.rst-content .note,.rst-content .seealso,.rst-content .wy-alert-info.admonition,.rst-content .wy-alert-info.admonition-todo,.rst-content .wy-alert-info.attention,.rst-content .wy-alert-info.caution,.rst-content .wy-alert-info.danger,.rst-content .wy-alert-info.error,.rst-content .wy-alert-info.hint,.rst-content .wy-alert-info.important,.rst-content .wy-alert-info.tip,.rst-content .wy-alert-info.warning,.wy-alert.wy-alert-info{background:#e7f2fa}.rst-content .note .admonition-title,.rst-content .note .wy-alert-title,.rst-content .seealso .admonition-title,.rst-content .seealso .wy-alert-title,.rst-content .wy-alert-info.admonition-todo .admonition-title,.rst-content .wy-alert-info.admonition-todo .wy-alert-title,.rst-content .wy-alert-info.admonition .admonition-title,.rst-content .wy-alert-info.admonition .wy-alert-title,.rst-content .wy-alert-info.attention .admonition-title,.rst-content .wy-alert-info.attention .wy-alert-title,.rst-content .wy-alert-info.caution .admonition-title,.rst-content .wy-alert-info.caution .wy-alert-title,.rst-content .wy-alert-info.danger .admonition-title,.rst-content .wy-alert-info.danger .wy-alert-title,.rst-content .wy-alert-info.error .admonition-title,.rst-content .wy-alert-info.error .wy-alert-title,.rst-content .wy-alert-info.hint .admonition-title,.rst-content .wy-alert-info.hint .wy-alert-title,.rst-content 
.wy-alert-info.important .admonition-title,.rst-content .wy-alert-info.important .wy-alert-title,.rst-content .wy-alert-info.tip .admonition-title,.rst-content .wy-alert-info.tip .wy-alert-title,.rst-content .wy-alert-info.warning .admonition-title,.rst-content .wy-alert-info.warning .wy-alert-title,.rst-content .wy-alert.wy-alert-info .admonition-title,.wy-alert.wy-alert-info .rst-content .admonition-title,.wy-alert.wy-alert-info .wy-alert-title{background:#6ab0de}.rst-content .hint,.rst-content .important,.rst-content .tip,.rst-content .wy-alert-success.admonition,.rst-content .wy-alert-success.admonition-todo,.rst-content .wy-alert-success.attention,.rst-content .wy-alert-success.caution,.rst-content .wy-alert-success.danger,.rst-content .wy-alert-success.error,.rst-content .wy-alert-success.note,.rst-content .wy-alert-success.seealso,.rst-content .wy-alert-success.warning,.wy-alert.wy-alert-success{background:#dbfaf4}.rst-content .hint .admonition-title,.rst-content .hint .wy-alert-title,.rst-content .important .admonition-title,.rst-content .important .wy-alert-title,.rst-content .tip .admonition-title,.rst-content .tip .wy-alert-title,.rst-content .wy-alert-success.admonition-todo .admonition-title,.rst-content .wy-alert-success.admonition-todo .wy-alert-title,.rst-content .wy-alert-success.admonition .admonition-title,.rst-content .wy-alert-success.admonition .wy-alert-title,.rst-content .wy-alert-success.attention .admonition-title,.rst-content .wy-alert-success.attention .wy-alert-title,.rst-content .wy-alert-success.caution .admonition-title,.rst-content .wy-alert-success.caution .wy-alert-title,.rst-content .wy-alert-success.danger .admonition-title,.rst-content .wy-alert-success.danger .wy-alert-title,.rst-content .wy-alert-success.error .admonition-title,.rst-content .wy-alert-success.error .wy-alert-title,.rst-content .wy-alert-success.note .admonition-title,.rst-content .wy-alert-success.note .wy-alert-title,.rst-content .wy-alert-success.seealso 
.admonition-title,.rst-content .wy-alert-success.seealso .wy-alert-title,.rst-content .wy-alert-success.warning .admonition-title,.rst-content .wy-alert-success.warning .wy-alert-title,.rst-content .wy-alert.wy-alert-success .admonition-title,.wy-alert.wy-alert-success .rst-content .admonition-title,.wy-alert.wy-alert-success .wy-alert-title{background:#1abc9c}.rst-content .wy-alert-neutral.admonition,.rst-content .wy-alert-neutral.admonition-todo,.rst-content .wy-alert-neutral.attention,.rst-content .wy-alert-neutral.caution,.rst-content .wy-alert-neutral.danger,.rst-content .wy-alert-neutral.error,.rst-content .wy-alert-neutral.hint,.rst-content .wy-alert-neutral.important,.rst-content .wy-alert-neutral.note,.rst-content .wy-alert-neutral.seealso,.rst-content .wy-alert-neutral.tip,.rst-content .wy-alert-neutral.warning,.wy-alert.wy-alert-neutral{background:#f3f6f6}.rst-content .wy-alert-neutral.admonition-todo .admonition-title,.rst-content .wy-alert-neutral.admonition-todo .wy-alert-title,.rst-content .wy-alert-neutral.admonition .admonition-title,.rst-content .wy-alert-neutral.admonition .wy-alert-title,.rst-content .wy-alert-neutral.attention .admonition-title,.rst-content .wy-alert-neutral.attention .wy-alert-title,.rst-content .wy-alert-neutral.caution .admonition-title,.rst-content .wy-alert-neutral.caution .wy-alert-title,.rst-content .wy-alert-neutral.danger .admonition-title,.rst-content .wy-alert-neutral.danger .wy-alert-title,.rst-content .wy-alert-neutral.error .admonition-title,.rst-content .wy-alert-neutral.error .wy-alert-title,.rst-content .wy-alert-neutral.hint .admonition-title,.rst-content .wy-alert-neutral.hint .wy-alert-title,.rst-content .wy-alert-neutral.important .admonition-title,.rst-content .wy-alert-neutral.important .wy-alert-title,.rst-content .wy-alert-neutral.note .admonition-title,.rst-content .wy-alert-neutral.note .wy-alert-title,.rst-content .wy-alert-neutral.seealso .admonition-title,.rst-content .wy-alert-neutral.seealso 
.wy-alert-title,.rst-content .wy-alert-neutral.tip .admonition-title,.rst-content .wy-alert-neutral.tip .wy-alert-title,.rst-content .wy-alert-neutral.warning .admonition-title,.rst-content .wy-alert-neutral.warning .wy-alert-title,.rst-content .wy-alert.wy-alert-neutral .admonition-title,.wy-alert.wy-alert-neutral .rst-content .admonition-title,.wy-alert.wy-alert-neutral .wy-alert-title{color:#404040;background:#e1e4e5}.rst-content .wy-alert-neutral.admonition-todo a,.rst-content .wy-alert-neutral.admonition a,.rst-content .wy-alert-neutral.attention a,.rst-content .wy-alert-neutral.caution a,.rst-content .wy-alert-neutral.danger a,.rst-content .wy-alert-neutral.error a,.rst-content .wy-alert-neutral.hint a,.rst-content .wy-alert-neutral.important a,.rst-content .wy-alert-neutral.note a,.rst-content .wy-alert-neutral.seealso a,.rst-content .wy-alert-neutral.tip a,.rst-content .wy-alert-neutral.warning a,.wy-alert.wy-alert-neutral a{color:#2980b9}.rst-content .admonition-todo p:last-child,.rst-content .admonition p:last-child,.rst-content .attention p:last-child,.rst-content .caution p:last-child,.rst-content .danger p:last-child,.rst-content .error p:last-child,.rst-content .hint p:last-child,.rst-content .important p:last-child,.rst-content .note p:last-child,.rst-content .seealso p:last-child,.rst-content .tip p:last-child,.rst-content .warning p:last-child,.wy-alert p:last-child{margin-bottom:0}.wy-tray-container{position:fixed;bottom:0;left:0;z-index:600}.wy-tray-container li{display:block;width:300px;background:transparent;color:#fff;text-align:center;box-shadow:0 5px 5px 0 rgba(0,0,0,.1);padding:0 24px;min-width:20%;opacity:0;height:0;line-height:56px;overflow:hidden;-webkit-transition:all .3s ease-in;-moz-transition:all .3s ease-in;transition:all .3s ease-in}.wy-tray-container li.wy-tray-item-success{background:#27ae60}.wy-tray-container li.wy-tray-item-info{background:#2980b9}.wy-tray-container li.wy-tray-item-warning{background:#e67e22}.wy-tray-container 
li.wy-tray-item-danger{background:#e74c3c}.wy-tray-container li.on{opacity:1;height:56px}@media screen and (max-width:768px){.wy-tray-container{bottom:auto;top:0;width:100%}.wy-tray-container li{width:100%}}button{font-size:100%;margin:0;vertical-align:baseline;*vertical-align:middle;cursor:pointer;line-height:normal;-webkit-appearance:button;*overflow:visible}button::-moz-focus-inner,input::-moz-focus-inner{border:0;padding:0}button[disabled]{cursor:default}.btn{display:inline-block;border-radius:2px;line-height:normal;white-space:nowrap;text-align:center;cursor:pointer;font-size:100%;padding:6px 12px 8px;color:#fff;border:1px solid rgba(0,0,0,.1);background-color:#27ae60;text-decoration:none;font-weight:400;font-family:Lato,proxima-nova,Helvetica Neue,Arial,sans-serif;box-shadow:inset 0 1px 2px -1px hsla(0,0%,100%,.5),inset 0 -2px 0 0 rgba(0,0,0,.1);outline-none:false;vertical-align:middle;*display:inline;zoom:1;-webkit-user-drag:none;-webkit-user-select:none;-moz-user-select:none;-ms-user-select:none;user-select:none;-webkit-transition:all .1s linear;-moz-transition:all .1s linear;transition:all .1s linear}.btn-hover{background:#2e8ece;color:#fff}.btn:hover{background:#2cc36b;color:#fff}.btn:focus{background:#2cc36b;outline:0}.btn:active{box-shadow:inset 0 -1px 0 0 rgba(0,0,0,.05),inset 0 2px 0 0 rgba(0,0,0,.1);padding:8px 12px 6px}.btn:visited{color:#fff}.btn-disabled,.btn-disabled:active,.btn-disabled:focus,.btn-disabled:hover,.btn:disabled{background-image:none;filter:progid:DXImageTransform.Microsoft.gradient(enabled = 
false);filter:alpha(opacity=40);opacity:.4;cursor:not-allowed;box-shadow:none}.btn::-moz-focus-inner{padding:0;border:0}.btn-small{font-size:80%}.btn-info{background-color:#2980b9!important}.btn-info:hover{background-color:#2e8ece!important}.btn-neutral{background-color:#f3f6f6!important;color:#404040!important}.btn-neutral:hover{background-color:#e5ebeb!important;color:#404040}.btn-neutral:visited{color:#404040!important}.btn-success{background-color:#27ae60!important}.btn-success:hover{background-color:#295!important}.btn-danger{background-color:#e74c3c!important}.btn-danger:hover{background-color:#ea6153!important}.btn-warning{background-color:#e67e22!important}.btn-warning:hover{background-color:#e98b39!important}.btn-invert{background-color:#222}.btn-invert:hover{background-color:#2f2f2f!important}.btn-link{background-color:transparent!important;color:#2980b9;box-shadow:none;border-color:transparent!important}.btn-link:active,.btn-link:hover{background-color:transparent!important;color:#409ad5!important;box-shadow:none}.btn-link:visited{color:#9b59b6}.wy-btn-group .btn,.wy-control .btn{vertical-align:middle}.wy-btn-group{margin-bottom:24px;*zoom:1}.wy-btn-group:after,.wy-btn-group:before{display:table;content:""}.wy-btn-group:after{clear:both}.wy-dropdown{position:relative;display:inline-block}.wy-dropdown-active .wy-dropdown-menu{display:block}.wy-dropdown-menu{position:absolute;left:0;display:none;float:left;top:100%;min-width:100%;background:#fcfcfc;z-index:100;border:1px solid #cfd7dd;box-shadow:0 2px 2px 0 rgba(0,0,0,.1);padding:12px}.wy-dropdown-menu>dd>a{display:block;clear:both;color:#404040;white-space:nowrap;font-size:90%;padding:0 12px;cursor:pointer}.wy-dropdown-menu>dd>a:hover{background:#2980b9;color:#fff}.wy-dropdown-menu>dd.divider{border-top:1px solid #cfd7dd;margin:6px 0}.wy-dropdown-menu>dd.search{padding-bottom:12px}.wy-dropdown-menu>dd.search 
input[type=search]{width:100%}.wy-dropdown-menu>dd.call-to-action{background:#e3e3e3;text-transform:uppercase;font-weight:500;font-size:80%}.wy-dropdown-menu>dd.call-to-action:hover{background:#e3e3e3}.wy-dropdown-menu>dd.call-to-action .btn{color:#fff}.wy-dropdown.wy-dropdown-up .wy-dropdown-menu{bottom:100%;top:auto;left:auto;right:0}.wy-dropdown.wy-dropdown-bubble .wy-dropdown-menu{background:#fcfcfc;margin-top:2px}.wy-dropdown.wy-dropdown-bubble .wy-dropdown-menu a{padding:6px 12px}.wy-dropdown.wy-dropdown-bubble .wy-dropdown-menu a:hover{background:#2980b9;color:#fff}.wy-dropdown.wy-dropdown-left .wy-dropdown-menu{right:0;left:auto;text-align:right}.wy-dropdown-arrow:before{content:" ";border-bottom:5px solid #f5f5f5;border-left:5px solid transparent;border-right:5px solid transparent;position:absolute;display:block;top:-4px;left:50%;margin-left:-3px}.wy-dropdown-arrow.wy-dropdown-arrow-left:before{left:11px}.wy-form-stacked select{display:block}.wy-form-aligned .wy-help-inline,.wy-form-aligned input,.wy-form-aligned label,.wy-form-aligned select,.wy-form-aligned textarea{display:inline-block;*display:inline;*zoom:1;vertical-align:middle}.wy-form-aligned .wy-control-group>label{display:inline-block;vertical-align:middle;width:10em;margin:6px 12px 0 0;float:left}.wy-form-aligned .wy-control{float:left}.wy-form-aligned .wy-control label{display:block}.wy-form-aligned .wy-control select{margin-top:6px}fieldset{margin:0}fieldset,legend{border:0;padding:0}legend{width:100%;white-space:normal;margin-bottom:24px;font-size:150%;*margin-left:-7px}label,legend{display:block}label{margin:0 0 
.3125em;color:#333;font-size:90%}input,select,textarea{font-size:100%;margin:0;vertical-align:baseline;*vertical-align:middle}.wy-control-group{margin-bottom:24px;max-width:1200px;margin-left:auto;margin-right:auto;*zoom:1}.wy-control-group:after,.wy-control-group:before{display:table;content:""}.wy-control-group:after{clear:both}.wy-control-group.wy-control-group-required>label:after{content:" *";color:#e74c3c}.wy-control-group .wy-form-full,.wy-control-group .wy-form-halves,.wy-control-group .wy-form-thirds{padding-bottom:12px}.wy-control-group .wy-form-full input[type=color],.wy-control-group .wy-form-full input[type=date],.wy-control-group .wy-form-full input[type=datetime-local],.wy-control-group .wy-form-full input[type=datetime],.wy-control-group .wy-form-full input[type=email],.wy-control-group .wy-form-full input[type=month],.wy-control-group .wy-form-full input[type=number],.wy-control-group .wy-form-full input[type=password],.wy-control-group .wy-form-full input[type=search],.wy-control-group .wy-form-full input[type=tel],.wy-control-group .wy-form-full input[type=text],.wy-control-group .wy-form-full input[type=time],.wy-control-group .wy-form-full input[type=url],.wy-control-group .wy-form-full input[type=week],.wy-control-group .wy-form-full select,.wy-control-group .wy-form-halves input[type=color],.wy-control-group .wy-form-halves input[type=date],.wy-control-group .wy-form-halves input[type=datetime-local],.wy-control-group .wy-form-halves input[type=datetime],.wy-control-group .wy-form-halves input[type=email],.wy-control-group .wy-form-halves input[type=month],.wy-control-group .wy-form-halves input[type=number],.wy-control-group .wy-form-halves input[type=password],.wy-control-group .wy-form-halves input[type=search],.wy-control-group .wy-form-halves input[type=tel],.wy-control-group .wy-form-halves input[type=text],.wy-control-group .wy-form-halves input[type=time],.wy-control-group .wy-form-halves input[type=url],.wy-control-group 
.wy-form-halves input[type=week],.wy-control-group .wy-form-halves select,.wy-control-group .wy-form-thirds input[type=color],.wy-control-group .wy-form-thirds input[type=date],.wy-control-group .wy-form-thirds input[type=datetime-local],.wy-control-group .wy-form-thirds input[type=datetime],.wy-control-group .wy-form-thirds input[type=email],.wy-control-group .wy-form-thirds input[type=month],.wy-control-group .wy-form-thirds input[type=number],.wy-control-group .wy-form-thirds input[type=password],.wy-control-group .wy-form-thirds input[type=search],.wy-control-group .wy-form-thirds input[type=tel],.wy-control-group .wy-form-thirds input[type=text],.wy-control-group .wy-form-thirds input[type=time],.wy-control-group .wy-form-thirds input[type=url],.wy-control-group .wy-form-thirds input[type=week],.wy-control-group .wy-form-thirds select{width:100%}.wy-control-group .wy-form-full{float:left;display:block;width:100%;margin-right:0}.wy-control-group .wy-form-full:last-child{margin-right:0}.wy-control-group .wy-form-halves{float:left;display:block;margin-right:2.35765%;width:48.82117%}.wy-control-group .wy-form-halves:last-child,.wy-control-group .wy-form-halves:nth-of-type(2n){margin-right:0}.wy-control-group .wy-form-halves:nth-of-type(odd){clear:left}.wy-control-group .wy-form-thirds{float:left;display:block;margin-right:2.35765%;width:31.76157%}.wy-control-group .wy-form-thirds:last-child,.wy-control-group .wy-form-thirds:nth-of-type(3n){margin-right:0}.wy-control-group .wy-form-thirds:nth-of-type(3n+1){clear:left}.wy-control-group.wy-control-group-no-input .wy-control,.wy-control-no-input{margin:6px 0 0;font-size:90%}.wy-control-no-input{display:inline-block}.wy-control-group.fluid-input input[type=color],.wy-control-group.fluid-input input[type=date],.wy-control-group.fluid-input input[type=datetime-local],.wy-control-group.fluid-input input[type=datetime],.wy-control-group.fluid-input input[type=email],.wy-control-group.fluid-input 
input[type=month],.wy-control-group.fluid-input input[type=number],.wy-control-group.fluid-input input[type=password],.wy-control-group.fluid-input input[type=search],.wy-control-group.fluid-input input[type=tel],.wy-control-group.fluid-input input[type=text],.wy-control-group.fluid-input input[type=time],.wy-control-group.fluid-input input[type=url],.wy-control-group.fluid-input input[type=week]{width:100%}.wy-form-message-inline{padding-left:.3em;color:#666;font-size:90%}.wy-form-message{display:block;color:#999;font-size:70%;margin-top:.3125em;font-style:italic}.wy-form-message p{font-size:inherit;font-style:italic;margin-bottom:6px}.wy-form-message p:last-child{margin-bottom:0}input{line-height:normal}input[type=button],input[type=reset],input[type=submit]{-webkit-appearance:button;cursor:pointer;font-family:Lato,proxima-nova,Helvetica Neue,Arial,sans-serif;*overflow:visible}input[type=color],input[type=date],input[type=datetime-local],input[type=datetime],input[type=email],input[type=month],input[type=number],input[type=password],input[type=search],input[type=tel],input[type=text],input[type=time],input[type=url],input[type=week]{-webkit-appearance:none;padding:6px;display:inline-block;border:1px solid #ccc;font-size:80%;font-family:Lato,proxima-nova,Helvetica Neue,Arial,sans-serif;box-shadow:inset 0 1px 3px #ddd;border-radius:0;-webkit-transition:border .3s linear;-moz-transition:border .3s linear;transition:border .3s linear}input[type=datetime-local]{padding:.34375em 
.625em}input[disabled]{cursor:default}input[type=checkbox],input[type=radio]{padding:0;margin-right:.3125em;*height:13px;*width:13px}input[type=checkbox],input[type=radio],input[type=search]{-webkit-box-sizing:border-box;-moz-box-sizing:border-box;box-sizing:border-box}input[type=search]::-webkit-search-cancel-button,input[type=search]::-webkit-search-decoration{-webkit-appearance:none}input[type=color]:focus,input[type=date]:focus,input[type=datetime-local]:focus,input[type=datetime]:focus,input[type=email]:focus,input[type=month]:focus,input[type=number]:focus,input[type=password]:focus,input[type=search]:focus,input[type=tel]:focus,input[type=text]:focus,input[type=time]:focus,input[type=url]:focus,input[type=week]:focus{outline:0;outline:thin dotted\9;border-color:#333}input.no-focus:focus{border-color:#ccc!important}input[type=checkbox]:focus,input[type=file]:focus,input[type=radio]:focus{outline:thin dotted #333;outline:1px auto #129fea}input[type=color][disabled],input[type=date][disabled],input[type=datetime-local][disabled],input[type=datetime][disabled],input[type=email][disabled],input[type=month][disabled],input[type=number][disabled],input[type=password][disabled],input[type=search][disabled],input[type=tel][disabled],input[type=text][disabled],input[type=time][disabled],input[type=url][disabled],input[type=week][disabled]{cursor:not-allowed;background-color:#fafafa}input:focus:invalid,select:focus:invalid,textarea:focus:invalid{color:#e74c3c;border:1px solid #e74c3c}input:focus:invalid:focus,select:focus:invalid:focus,textarea:focus:invalid:focus{border-color:#e74c3c}input[type=checkbox]:focus:invalid:focus,input[type=file]:focus:invalid:focus,input[type=radio]:focus:invalid:focus{outline-color:#e74c3c}input.wy-input-large{padding:12px;font-size:100%}textarea{overflow:auto;vertical-align:top;width:100%;font-family:Lato,proxima-nova,Helvetica Neue,Arial,sans-serif}select,textarea{padding:.5em .625em;display:inline-block;border:1px solid 
#ccc;font-size:80%;box-shadow:inset 0 1px 3px #ddd;-webkit-transition:border .3s linear;-moz-transition:border .3s linear;transition:border .3s linear}select{border:1px solid #ccc;background-color:#fff}select[multiple]{height:auto}select:focus,textarea:focus{outline:0}input[readonly],select[disabled],select[readonly],textarea[disabled],textarea[readonly]{cursor:not-allowed;background-color:#fafafa}input[type=checkbox][disabled],input[type=radio][disabled]{cursor:not-allowed}.wy-checkbox,.wy-radio{margin:6px 0;color:#404040;display:block}.wy-checkbox input,.wy-radio input{vertical-align:baseline}.wy-form-message-inline{display:inline-block;*display:inline;*zoom:1;vertical-align:middle}.wy-input-prefix,.wy-input-suffix{white-space:nowrap;padding:6px}.wy-input-prefix .wy-input-context,.wy-input-suffix .wy-input-context{line-height:27px;padding:0 8px;display:inline-block;font-size:80%;background-color:#f3f6f6;border:1px solid #ccc;color:#999}.wy-input-suffix .wy-input-context{border-left:0}.wy-input-prefix .wy-input-context{border-right:0}.wy-switch{position:relative;display:block;height:24px;margin-top:12px;cursor:pointer}.wy-switch:before{left:0;top:0;width:36px;height:12px;background:#ccc}.wy-switch:after,.wy-switch:before{position:absolute;content:"";display:block;border-radius:4px;-webkit-transition:all .2s ease-in-out;-moz-transition:all .2s ease-in-out;transition:all .2s ease-in-out}.wy-switch:after{width:18px;height:18px;background:#999;left:-3px;top:-3px}.wy-switch span{position:absolute;left:48px;display:block;font-size:12px;color:#ccc;line-height:1}.wy-switch.active:before{background:#1e8449}.wy-switch.active:after{left:24px;background:#27ae60}.wy-switch.disabled{cursor:not-allowed;opacity:.8}.wy-control-group.wy-control-group-error .wy-form-message,.wy-control-group.wy-control-group-error>label{color:#e74c3c}.wy-control-group.wy-control-group-error input[type=color],.wy-control-group.wy-control-group-error 
input[type=date],.wy-control-group.wy-control-group-error input[type=datetime-local],.wy-control-group.wy-control-group-error input[type=datetime],.wy-control-group.wy-control-group-error input[type=email],.wy-control-group.wy-control-group-error input[type=month],.wy-control-group.wy-control-group-error input[type=number],.wy-control-group.wy-control-group-error input[type=password],.wy-control-group.wy-control-group-error input[type=search],.wy-control-group.wy-control-group-error input[type=tel],.wy-control-group.wy-control-group-error input[type=text],.wy-control-group.wy-control-group-error input[type=time],.wy-control-group.wy-control-group-error input[type=url],.wy-control-group.wy-control-group-error input[type=week],.wy-control-group.wy-control-group-error textarea{border:1px solid #e74c3c}.wy-inline-validate{white-space:nowrap}.wy-inline-validate .wy-input-context{padding:.5em .625em;display:inline-block;font-size:80%}.wy-inline-validate.wy-inline-validate-success .wy-input-context{color:#27ae60}.wy-inline-validate.wy-inline-validate-danger .wy-input-context{color:#e74c3c}.wy-inline-validate.wy-inline-validate-warning .wy-input-context{color:#e67e22}.wy-inline-validate.wy-inline-validate-info .wy-input-context{color:#2980b9}.rotate-90{-webkit-transform:rotate(90deg);-moz-transform:rotate(90deg);-ms-transform:rotate(90deg);-o-transform:rotate(90deg);transform:rotate(90deg)}.rotate-180{-webkit-transform:rotate(180deg);-moz-transform:rotate(180deg);-ms-transform:rotate(180deg);-o-transform:rotate(180deg);transform:rotate(180deg)}.rotate-270{-webkit-transform:rotate(270deg);-moz-transform:rotate(270deg);-ms-transform:rotate(270deg);-o-transform:rotate(270deg);transform:rotate(270deg)}.mirror{-webkit-transform:scaleX(-1);-moz-transform:scaleX(-1);-ms-transform:scaleX(-1);-o-transform:scaleX(-1);transform:scaleX(-1)}.mirror.rotate-90{-webkit-transform:scaleX(-1) rotate(90deg);-moz-transform:scaleX(-1) rotate(90deg);-ms-transform:scaleX(-1) 
rotate(90deg);-o-transform:scaleX(-1) rotate(90deg);transform:scaleX(-1) rotate(90deg)}.mirror.rotate-180{-webkit-transform:scaleX(-1) rotate(180deg);-moz-transform:scaleX(-1) rotate(180deg);-ms-transform:scaleX(-1) rotate(180deg);-o-transform:scaleX(-1) rotate(180deg);transform:scaleX(-1) rotate(180deg)}.mirror.rotate-270{-webkit-transform:scaleX(-1) rotate(270deg);-moz-transform:scaleX(-1) rotate(270deg);-ms-transform:scaleX(-1) rotate(270deg);-o-transform:scaleX(-1) rotate(270deg);transform:scaleX(-1) rotate(270deg)}@media only screen and (max-width:480px){.wy-form button[type=submit]{margin:.7em 0 0}.wy-form input[type=color],.wy-form input[type=date],.wy-form input[type=datetime-local],.wy-form input[type=datetime],.wy-form input[type=email],.wy-form input[type=month],.wy-form input[type=number],.wy-form input[type=password],.wy-form input[type=search],.wy-form input[type=tel],.wy-form input[type=text],.wy-form input[type=time],.wy-form input[type=url],.wy-form input[type=week],.wy-form label{margin-bottom:.3em;display:block}.wy-form input[type=color],.wy-form input[type=date],.wy-form input[type=datetime-local],.wy-form input[type=datetime],.wy-form input[type=email],.wy-form input[type=month],.wy-form input[type=number],.wy-form input[type=password],.wy-form input[type=search],.wy-form input[type=tel],.wy-form input[type=time],.wy-form input[type=url],.wy-form input[type=week]{margin-bottom:0}.wy-form-aligned .wy-control-group label{margin-bottom:.3em;text-align:left;display:block;width:100%}.wy-form-aligned .wy-control{margin:1.5em 0 0}.wy-form-message,.wy-form-message-inline,.wy-form .wy-help-inline{display:block;font-size:80%;padding:6px 0}}@media screen and (max-width:768px){.tablet-hide{display:none}}@media screen and (max-width:480px){.mobile-hide{display:none}}.float-left{float:left}.float-right{float:right}.full-width{width:100%}.rst-content table.docutils,.rst-content 
table.field-list,.wy-table{border-collapse:collapse;border-spacing:0;empty-cells:show;margin-bottom:24px}.rst-content table.docutils caption,.rst-content table.field-list caption,.wy-table caption{color:#000;font:italic 85%/1 arial,sans-serif;padding:1em 0;text-align:center}.rst-content table.docutils td,.rst-content table.docutils th,.rst-content table.field-list td,.rst-content table.field-list th,.wy-table td,.wy-table th{font-size:90%;margin:0;overflow:visible;padding:8px 16px}.rst-content table.docutils td:first-child,.rst-content table.docutils th:first-child,.rst-content table.field-list td:first-child,.rst-content table.field-list th:first-child,.wy-table td:first-child,.wy-table th:first-child{border-left-width:0}.rst-content table.docutils thead,.rst-content table.field-list thead,.wy-table thead{color:#000;text-align:left;vertical-align:bottom;white-space:nowrap}.rst-content table.docutils thead th,.rst-content table.field-list thead th,.wy-table thead th{font-weight:700;border-bottom:2px solid #e1e4e5}.rst-content table.docutils td,.rst-content table.field-list td,.wy-table td{background-color:transparent;vertical-align:middle}.rst-content table.docutils td p,.rst-content table.field-list td p,.wy-table td p{line-height:18px}.rst-content table.docutils td p:last-child,.rst-content table.field-list td p:last-child,.wy-table td p:last-child{margin-bottom:0}.rst-content table.docutils .wy-table-cell-min,.rst-content table.field-list .wy-table-cell-min,.wy-table .wy-table-cell-min{width:1%;padding-right:0}.rst-content table.docutils .wy-table-cell-min input[type=checkbox],.rst-content table.field-list .wy-table-cell-min input[type=checkbox],.wy-table .wy-table-cell-min input[type=checkbox]{margin:0}.wy-table-secondary{color:grey;font-size:90%}.wy-table-tertiary{color:grey;font-size:80%}.rst-content table.docutils:not(.field-list) tr:nth-child(2n-1) td,.wy-table-backed,.wy-table-odd td,.wy-table-striped tr:nth-child(2n-1) 
td{background-color:#f3f6f6}.rst-content table.docutils,.wy-table-bordered-all{border:1px solid #e1e4e5}.rst-content table.docutils td,.wy-table-bordered-all td{border-bottom:1px solid #e1e4e5;border-left:1px solid #e1e4e5}.rst-content table.docutils tbody>tr:last-child td,.wy-table-bordered-all tbody>tr:last-child td{border-bottom-width:0}.wy-table-bordered{border:1px solid #e1e4e5}.wy-table-bordered-rows td{border-bottom:1px solid #e1e4e5}.wy-table-bordered-rows tbody>tr:last-child td{border-bottom-width:0}.wy-table-horizontal td,.wy-table-horizontal th{border-width:0 0 1px;border-bottom:1px solid #e1e4e5}.wy-table-horizontal tbody>tr:last-child td{border-bottom-width:0}.wy-table-responsive{margin-bottom:24px;max-width:100%;overflow:auto}.wy-table-responsive table{margin-bottom:0!important}.wy-table-responsive table td,.wy-table-responsive table th{white-space:nowrap}a{color:#2980b9;text-decoration:none;cursor:pointer}a:hover{color:#3091d1}a:visited{color:#9b59b6}html{height:100%}body,html{overflow-x:hidden}body{font-family:Lato,proxima-nova,Helvetica Neue,Arial,sans-serif;font-weight:400;color:#404040;min-height:100%;background:#edf0f2}.wy-text-left{text-align:left}.wy-text-center{text-align:center}.wy-text-right{text-align:right}.wy-text-large{font-size:120%}.wy-text-normal{font-size:100%}.wy-text-small,small{font-size:80%}.wy-text-strike{text-decoration:line-through}.wy-text-warning{color:#e67e22!important}a.wy-text-warning:hover{color:#eb9950!important}.wy-text-info{color:#2980b9!important}a.wy-text-info:hover{color:#409ad5!important}.wy-text-success{color:#27ae60!important}a.wy-text-success:hover{color:#36d278!important}.wy-text-danger{color:#e74c3c!important}a.wy-text-danger:hover{color:#ed7669!important}.wy-text-neutral{color:#404040!important}a.wy-text-neutral:hover{color:#595959!important}.rst-content .toctree-wrapper>p.caption,h1,h2,h3,h4,h5,h6,legend{margin-top:0;font-weight:700;font-family:Roboto 
Slab,ff-tisa-web-pro,Georgia,Arial,sans-serif}p{line-height:24px;font-size:16px;margin:0 0 24px}h1{font-size:175%}.rst-content .toctree-wrapper>p.caption,h2{font-size:150%}h3{font-size:125%}h4{font-size:115%}h5{font-size:110%}h6{font-size:100%}hr{display:block;height:1px;border:0;border-top:1px solid #e1e4e5;margin:24px 0;padding:0}.rst-content code,.rst-content tt,code{white-space:nowrap;max-width:100%;background:#fff;border:1px solid #e1e4e5;font-size:75%;padding:0 5px;font-family:SFMono-Regular,Menlo,Monaco,Consolas,Liberation Mono,Courier New,Courier,monospace;color:#e74c3c;overflow-x:auto}.rst-content tt.code-large,code.code-large{font-size:90%}.rst-content .section ul,.rst-content .toctree-wrapper ul,.rst-content section ul,.wy-plain-list-disc,article ul{list-style:disc;line-height:24px;margin-bottom:24px}.rst-content .section ul li,.rst-content .toctree-wrapper ul li,.rst-content section ul li,.wy-plain-list-disc li,article ul li{list-style:disc;margin-left:24px}.rst-content .section ul li p:last-child,.rst-content .section ul li ul,.rst-content .toctree-wrapper ul li p:last-child,.rst-content .toctree-wrapper ul li ul,.rst-content section ul li p:last-child,.rst-content section ul li ul,.wy-plain-list-disc li p:last-child,.wy-plain-list-disc li ul,article ul li p:last-child,article ul li ul{margin-bottom:0}.rst-content .section ul li li,.rst-content .toctree-wrapper ul li li,.rst-content section ul li li,.wy-plain-list-disc li li,article ul li li{list-style:circle}.rst-content .section ul li li li,.rst-content .toctree-wrapper ul li li li,.rst-content section ul li li li,.wy-plain-list-disc li li li,article ul li li li{list-style:square}.rst-content .section ul li ol li,.rst-content .toctree-wrapper ul li ol li,.rst-content section ul li ol li,.wy-plain-list-disc li ol li,article ul li ol li{list-style:decimal}.rst-content .section ol,.rst-content .section ol.arabic,.rst-content .toctree-wrapper ol,.rst-content .toctree-wrapper ol.arabic,.rst-content 
section ol,.rst-content section ol.arabic,.wy-plain-list-decimal,article ol{list-style:decimal;line-height:24px;margin-bottom:24px}.rst-content .section ol.arabic li,.rst-content .section ol li,.rst-content .toctree-wrapper ol.arabic li,.rst-content .toctree-wrapper ol li,.rst-content section ol.arabic li,.rst-content section ol li,.wy-plain-list-decimal li,article ol li{list-style:decimal;margin-left:24px}.rst-content .section ol.arabic li ul,.rst-content .section ol li p:last-child,.rst-content .section ol li ul,.rst-content .toctree-wrapper ol.arabic li ul,.rst-content .toctree-wrapper ol li p:last-child,.rst-content .toctree-wrapper ol li ul,.rst-content section ol.arabic li ul,.rst-content section ol li p:last-child,.rst-content section ol li ul,.wy-plain-list-decimal li p:last-child,.wy-plain-list-decimal li ul,article ol li p:last-child,article ol li ul{margin-bottom:0}.rst-content .section ol.arabic li ul li,.rst-content .section ol li ul li,.rst-content .toctree-wrapper ol.arabic li ul li,.rst-content .toctree-wrapper ol li ul li,.rst-content section ol.arabic li ul li,.rst-content section ol li ul li,.wy-plain-list-decimal li ul li,article ol li ul li{list-style:disc}.wy-breadcrumbs{*zoom:1}.wy-breadcrumbs:after,.wy-breadcrumbs:before{display:table;content:""}.wy-breadcrumbs:after{clear:both}.wy-breadcrumbs>li{display:inline-block;padding-top:5px}.wy-breadcrumbs>li.wy-breadcrumbs-aside{float:right}.rst-content .wy-breadcrumbs>li code,.rst-content .wy-breadcrumbs>li tt,.wy-breadcrumbs>li .rst-content tt,.wy-breadcrumbs>li code{all:inherit;color:inherit}.breadcrumb-item:before{content:"/";color:#bbb;font-size:13px;padding:0 6px 0 3px}.wy-breadcrumbs-extra{margin-bottom:0;color:#b3b3b3;font-size:80%;display:inline-block}@media screen and (max-width:480px){.wy-breadcrumbs-extra,.wy-breadcrumbs li.wy-breadcrumbs-aside{display:none}}@media print{.wy-breadcrumbs 
li.wy-breadcrumbs-aside{display:none}}html{font-size:16px}.wy-affix{position:fixed;top:1.618em}.wy-menu a:hover{text-decoration:none}.wy-menu-horiz{*zoom:1}.wy-menu-horiz:after,.wy-menu-horiz:before{display:table;content:""}.wy-menu-horiz:after{clear:both}.wy-menu-horiz li,.wy-menu-horiz ul{display:inline-block}.wy-menu-horiz li:hover{background:hsla(0,0%,100%,.1)}.wy-menu-horiz li.divide-left{border-left:1px solid #404040}.wy-menu-horiz li.divide-right{border-right:1px solid #404040}.wy-menu-horiz a{height:32px;display:inline-block;line-height:32px;padding:0 16px}.wy-menu-vertical{width:300px}.wy-menu-vertical header,.wy-menu-vertical p.caption{color:#55a5d9;height:32px;line-height:32px;padding:0 1.618em;margin:12px 0 0;display:block;font-weight:700;text-transform:uppercase;font-size:85%;white-space:nowrap}.wy-menu-vertical ul{margin-bottom:0}.wy-menu-vertical li.divide-top{border-top:1px solid #404040}.wy-menu-vertical li.divide-bottom{border-bottom:1px solid #404040}.wy-menu-vertical li.current{background:#e3e3e3}.wy-menu-vertical li.current a{color:grey;border-right:1px solid #c9c9c9;padding:.4045em 2.427em}.wy-menu-vertical li.current a:hover{background:#d6d6d6}.rst-content .wy-menu-vertical li tt,.wy-menu-vertical li .rst-content tt,.wy-menu-vertical li code{border:none;background:inherit;color:inherit;padding-left:0;padding-right:0}.wy-menu-vertical li button.toctree-expand{display:block;float:left;margin-left:-1.2em;line-height:18px;color:#4d4d4d;border:none;background:none;padding:0}.wy-menu-vertical li.current>a,.wy-menu-vertical li.on a{color:#404040;font-weight:700;position:relative;background:#fcfcfc;border:none;padding:.4045em 1.618em}.wy-menu-vertical li.current>a:hover,.wy-menu-vertical li.on a:hover{background:#fcfcfc}.wy-menu-vertical li.current>a:hover button.toctree-expand,.wy-menu-vertical li.on a:hover button.toctree-expand{color:grey}.wy-menu-vertical li.current>a button.toctree-expand,.wy-menu-vertical li.on a 
button.toctree-expand{display:block;line-height:18px;color:#333}.wy-menu-vertical li.toctree-l1.current>a{border-bottom:1px solid #c9c9c9;border-top:1px solid #c9c9c9}.wy-menu-vertical .toctree-l1.current .toctree-l2>ul,.wy-menu-vertical .toctree-l2.current .toctree-l3>ul,.wy-menu-vertical .toctree-l3.current .toctree-l4>ul,.wy-menu-vertical .toctree-l4.current .toctree-l5>ul,.wy-menu-vertical .toctree-l5.current .toctree-l6>ul,.wy-menu-vertical .toctree-l6.current .toctree-l7>ul,.wy-menu-vertical .toctree-l7.current .toctree-l8>ul,.wy-menu-vertical .toctree-l8.current .toctree-l9>ul,.wy-menu-vertical .toctree-l9.current .toctree-l10>ul,.wy-menu-vertical .toctree-l10.current .toctree-l11>ul{display:none}.wy-menu-vertical .toctree-l1.current .current.toctree-l2>ul,.wy-menu-vertical .toctree-l2.current .current.toctree-l3>ul,.wy-menu-vertical .toctree-l3.current .current.toctree-l4>ul,.wy-menu-vertical .toctree-l4.current .current.toctree-l5>ul,.wy-menu-vertical .toctree-l5.current .current.toctree-l6>ul,.wy-menu-vertical .toctree-l6.current .current.toctree-l7>ul,.wy-menu-vertical .toctree-l7.current .current.toctree-l8>ul,.wy-menu-vertical .toctree-l8.current .current.toctree-l9>ul,.wy-menu-vertical .toctree-l9.current .current.toctree-l10>ul,.wy-menu-vertical .toctree-l10.current .current.toctree-l11>ul{display:block}.wy-menu-vertical li.toctree-l3,.wy-menu-vertical li.toctree-l4{font-size:.9em}.wy-menu-vertical li.toctree-l2 a,.wy-menu-vertical li.toctree-l3 a,.wy-menu-vertical li.toctree-l4 a,.wy-menu-vertical li.toctree-l5 a,.wy-menu-vertical li.toctree-l6 a,.wy-menu-vertical li.toctree-l7 a,.wy-menu-vertical li.toctree-l8 a,.wy-menu-vertical li.toctree-l9 a,.wy-menu-vertical li.toctree-l10 a{color:#404040}.wy-menu-vertical li.toctree-l2 a:hover button.toctree-expand,.wy-menu-vertical li.toctree-l3 a:hover button.toctree-expand,.wy-menu-vertical li.toctree-l4 a:hover button.toctree-expand,.wy-menu-vertical li.toctree-l5 a:hover 
button.toctree-expand,.wy-menu-vertical li.toctree-l6 a:hover button.toctree-expand,.wy-menu-vertical li.toctree-l7 a:hover button.toctree-expand,.wy-menu-vertical li.toctree-l8 a:hover button.toctree-expand,.wy-menu-vertical li.toctree-l9 a:hover button.toctree-expand,.wy-menu-vertical li.toctree-l10 a:hover button.toctree-expand{color:grey}.wy-menu-vertical li.toctree-l2.current li.toctree-l3>a,.wy-menu-vertical li.toctree-l3.current li.toctree-l4>a,.wy-menu-vertical li.toctree-l4.current li.toctree-l5>a,.wy-menu-vertical li.toctree-l5.current li.toctree-l6>a,.wy-menu-vertical li.toctree-l6.current li.toctree-l7>a,.wy-menu-vertical li.toctree-l7.current li.toctree-l8>a,.wy-menu-vertical li.toctree-l8.current li.toctree-l9>a,.wy-menu-vertical li.toctree-l9.current li.toctree-l10>a,.wy-menu-vertical li.toctree-l10.current li.toctree-l11>a{display:block}.wy-menu-vertical li.toctree-l2.current>a{padding:.4045em 2.427em}.wy-menu-vertical li.toctree-l2.current li.toctree-l3>a{padding:.4045em 1.618em .4045em 4.045em}.wy-menu-vertical li.toctree-l3.current>a{padding:.4045em 4.045em}.wy-menu-vertical li.toctree-l3.current li.toctree-l4>a{padding:.4045em 1.618em .4045em 5.663em}.wy-menu-vertical li.toctree-l4.current>a{padding:.4045em 5.663em}.wy-menu-vertical li.toctree-l4.current li.toctree-l5>a{padding:.4045em 1.618em .4045em 7.281em}.wy-menu-vertical li.toctree-l5.current>a{padding:.4045em 7.281em}.wy-menu-vertical li.toctree-l5.current li.toctree-l6>a{padding:.4045em 1.618em .4045em 8.899em}.wy-menu-vertical li.toctree-l6.current>a{padding:.4045em 8.899em}.wy-menu-vertical li.toctree-l6.current li.toctree-l7>a{padding:.4045em 1.618em .4045em 10.517em}.wy-menu-vertical li.toctree-l7.current>a{padding:.4045em 10.517em}.wy-menu-vertical li.toctree-l7.current li.toctree-l8>a{padding:.4045em 1.618em .4045em 12.135em}.wy-menu-vertical li.toctree-l8.current>a{padding:.4045em 12.135em}.wy-menu-vertical li.toctree-l8.current li.toctree-l9>a{padding:.4045em 1.618em .4045em 
13.753em}.wy-menu-vertical li.toctree-l9.current>a{padding:.4045em 13.753em}.wy-menu-vertical li.toctree-l9.current li.toctree-l10>a{padding:.4045em 1.618em .4045em 15.371em}.wy-menu-vertical li.toctree-l10.current>a{padding:.4045em 15.371em}.wy-menu-vertical li.toctree-l10.current li.toctree-l11>a{padding:.4045em 1.618em .4045em 16.989em}.wy-menu-vertical li.toctree-l2.current>a,.wy-menu-vertical li.toctree-l2.current li.toctree-l3>a{background:#c9c9c9}.wy-menu-vertical li.toctree-l2 button.toctree-expand{color:#a3a3a3}.wy-menu-vertical li.toctree-l3.current>a,.wy-menu-vertical li.toctree-l3.current li.toctree-l4>a{background:#bdbdbd}.wy-menu-vertical li.toctree-l3 button.toctree-expand{color:#969696}.wy-menu-vertical li.current ul{display:block}.wy-menu-vertical li ul{margin-bottom:0;display:none}.wy-menu-vertical li ul li a{margin-bottom:0;color:#d9d9d9;font-weight:400}.wy-menu-vertical a{line-height:18px;padding:.4045em 1.618em;display:block;position:relative;font-size:90%;color:#d9d9d9}.wy-menu-vertical a:hover{background-color:#4e4a4a;cursor:pointer}.wy-menu-vertical a:hover button.toctree-expand{color:#d9d9d9}.wy-menu-vertical a:active{background-color:#2980b9;cursor:pointer;color:#fff}.wy-menu-vertical a:active button.toctree-expand{color:#fff}.wy-side-nav-search{display:block;width:300px;padding:.809em;margin-bottom:.809em;z-index:200;background-color:#2980b9;text-align:center;color:#fcfcfc}.wy-side-nav-search input[type=text]{width:100%;border-radius:50px;padding:6px 12px;border-color:#2472a4}.wy-side-nav-search img{display:block;margin:auto auto .809em;height:45px;width:45px;background-color:#2980b9;padding:5px;border-radius:100%}.wy-side-nav-search .wy-dropdown>a,.wy-side-nav-search>a{color:#fcfcfc;font-size:100%;font-weight:700;display:inline-block;padding:4px 6px;margin-bottom:.809em;max-width:100%}.wy-side-nav-search .wy-dropdown>a:hover,.wy-side-nav-search>a:hover{background:hsla(0,0%,100%,.1)}.wy-side-nav-search .wy-dropdown>a 
img.logo,.wy-side-nav-search>a img.logo{display:block;margin:0 auto;height:auto;width:auto;border-radius:0;max-width:100%;background:transparent}.wy-side-nav-search .wy-dropdown>a.icon img.logo,.wy-side-nav-search>a.icon img.logo{margin-top:.85em}.wy-side-nav-search>div.version{margin-top:-.4045em;margin-bottom:.809em;font-weight:400;color:hsla(0,0%,100%,.3)}.wy-nav .wy-menu-vertical header{color:#2980b9}.wy-nav .wy-menu-vertical a{color:#b3b3b3}.wy-nav .wy-menu-vertical a:hover{background-color:#2980b9;color:#fff}[data-menu-wrap]{-webkit-transition:all .2s ease-in;-moz-transition:all .2s ease-in;transition:all .2s ease-in;position:absolute;opacity:1;width:100%;opacity:0}[data-menu-wrap].move-center{left:0;right:auto;opacity:1}[data-menu-wrap].move-left{right:auto;left:-100%;opacity:0}[data-menu-wrap].move-right{right:-100%;left:auto;opacity:0}.wy-body-for-nav{background:#fcfcfc}.wy-grid-for-nav{position:absolute;width:100%;height:100%}.wy-nav-side{position:fixed;top:0;bottom:0;left:0;padding-bottom:2em;width:300px;overflow-x:hidden;overflow-y:hidden;min-height:100%;color:#9b9b9b;background:#343131;z-index:200}.wy-side-scroll{width:320px;position:relative;overflow-x:hidden;overflow-y:scroll;height:100%}.wy-nav-top{display:none;background:#2980b9;color:#fff;padding:.4045em .809em;position:relative;line-height:50px;text-align:center;font-size:100%;*zoom:1}.wy-nav-top:after,.wy-nav-top:before{display:table;content:""}.wy-nav-top:after{clear:both}.wy-nav-top a{color:#fff;font-weight:700}.wy-nav-top img{margin-right:12px;height:45px;width:45px;background-color:#2980b9;padding:5px;border-radius:100%}.wy-nav-top i{font-size:30px;float:left;cursor:pointer;padding-top:inherit}.wy-nav-content-wrap{margin-left:300px;background:#fcfcfc;min-height:100%}.wy-nav-content{padding:1.618em 
3.236em;height:100%;max-width:800px;margin:auto}.wy-body-mask{position:fixed;width:100%;height:100%;background:rgba(0,0,0,.2);display:none;z-index:499}.wy-body-mask.on{display:block}footer{color:grey}footer p{margin-bottom:12px}.rst-content footer span.commit tt,footer span.commit .rst-content tt,footer span.commit code{padding:0;font-family:SFMono-Regular,Menlo,Monaco,Consolas,Liberation Mono,Courier New,Courier,monospace;font-size:1em;background:none;border:none;color:grey}.rst-footer-buttons{*zoom:1}.rst-footer-buttons:after,.rst-footer-buttons:before{width:100%;display:table;content:""}.rst-footer-buttons:after{clear:both}.rst-breadcrumbs-buttons{margin-top:12px;*zoom:1}.rst-breadcrumbs-buttons:after,.rst-breadcrumbs-buttons:before{display:table;content:""}.rst-breadcrumbs-buttons:after{clear:both}#search-results .search li{margin-bottom:24px;border-bottom:1px solid #e1e4e5;padding-bottom:24px}#search-results .search li:first-child{border-top:1px solid #e1e4e5;padding-top:24px}#search-results .search li a{font-size:120%;margin-bottom:12px;display:inline-block}#search-results .context{color:grey;font-size:90%}.genindextable li>ul{margin-left:24px}@media screen and (max-width:768px){.wy-body-for-nav{background:#fcfcfc}.wy-nav-top{display:block}.wy-nav-side{left:-300px}.wy-nav-side.shift{width:85%;left:0}.wy-menu.wy-menu-vertical,.wy-side-nav-search,.wy-side-scroll{width:auto}.wy-nav-content-wrap{margin-left:0}.wy-nav-content-wrap .wy-nav-content{padding:1.618em}.wy-nav-content-wrap.shift{position:fixed;min-width:100%;left:85%;top:0;height:100%;overflow:hidden}}@media screen and (min-width:1100px){.wy-nav-content-wrap{background:rgba(0,0,0,.05)}.wy-nav-content{margin:0;background:#fcfcfc}}@media print{.rst-versions,.wy-nav-side,footer{display:none}.wy-nav-content-wrap{margin-left:0}}.rst-versions{position:fixed;bottom:0;left:0;width:300px;color:#fcfcfc;background:#1f1d1d;font-family:Lato,proxima-nova,Helvetica Neue,Arial,sans-serif;z-index:400}.rst-versions 
a{color:#2980b9;text-decoration:none}.rst-versions .rst-badge-small{display:none}.rst-versions .rst-current-version{padding:12px;background-color:#272525;display:block;text-align:right;font-size:90%;cursor:pointer;color:#27ae60;*zoom:1}.rst-versions .rst-current-version:after,.rst-versions .rst-current-version:before{display:table;content:""}.rst-versions .rst-current-version:after{clear:both}.rst-content .code-block-caption .rst-versions .rst-current-version .headerlink,.rst-content .eqno .rst-versions .rst-current-version .headerlink,.rst-content .rst-versions .rst-current-version .admonition-title,.rst-content code.download .rst-versions .rst-current-version span:first-child,.rst-content dl dt .rst-versions .rst-current-version .headerlink,.rst-content h1 .rst-versions .rst-current-version .headerlink,.rst-content h2 .rst-versions .rst-current-version .headerlink,.rst-content h3 .rst-versions .rst-current-version .headerlink,.rst-content h4 .rst-versions .rst-current-version .headerlink,.rst-content h5 .rst-versions .rst-current-version .headerlink,.rst-content h6 .rst-versions .rst-current-version .headerlink,.rst-content p .rst-versions .rst-current-version .headerlink,.rst-content table>caption .rst-versions .rst-current-version .headerlink,.rst-content tt.download .rst-versions .rst-current-version span:first-child,.rst-versions .rst-current-version .fa,.rst-versions .rst-current-version .icon,.rst-versions .rst-current-version .rst-content .admonition-title,.rst-versions .rst-current-version .rst-content .code-block-caption .headerlink,.rst-versions .rst-current-version .rst-content .eqno .headerlink,.rst-versions .rst-current-version .rst-content code.download span:first-child,.rst-versions .rst-current-version .rst-content dl dt .headerlink,.rst-versions .rst-current-version .rst-content h1 .headerlink,.rst-versions .rst-current-version .rst-content h2 .headerlink,.rst-versions .rst-current-version .rst-content h3 .headerlink,.rst-versions 
.rst-current-version .rst-content h4 .headerlink,.rst-versions .rst-current-version .rst-content h5 .headerlink,.rst-versions .rst-current-version .rst-content h6 .headerlink,.rst-versions .rst-current-version .rst-content p .headerlink,.rst-versions .rst-current-version .rst-content table>caption .headerlink,.rst-versions .rst-current-version .rst-content tt.download span:first-child,.rst-versions .rst-current-version .wy-menu-vertical li button.toctree-expand,.wy-menu-vertical li .rst-versions .rst-current-version button.toctree-expand{color:#fcfcfc}.rst-versions .rst-current-version .fa-book,.rst-versions .rst-current-version .icon-book{float:left}.rst-versions .rst-current-version.rst-out-of-date{background-color:#e74c3c;color:#fff}.rst-versions .rst-current-version.rst-active-old-version{background-color:#f1c40f;color:#000}.rst-versions.shift-up{height:auto;max-height:100%;overflow-y:scroll}.rst-versions.shift-up .rst-other-versions{display:block}.rst-versions .rst-other-versions{font-size:90%;padding:12px;color:grey;display:none}.rst-versions .rst-other-versions hr{display:block;height:1px;border:0;margin:20px 0;padding:0;border-top:1px solid #413d3d}.rst-versions .rst-other-versions dd{display:inline-block;margin:0}.rst-versions .rst-other-versions dd a{display:inline-block;padding:6px;color:#fcfcfc}.rst-versions.rst-badge{width:auto;bottom:20px;right:20px;left:auto;border:none;max-width:300px;max-height:90%}.rst-versions.rst-badge .fa-book,.rst-versions.rst-badge .icon-book{float:none;line-height:30px}.rst-versions.rst-badge.shift-up .rst-current-version{text-align:right}.rst-versions.rst-badge.shift-up .rst-current-version .fa-book,.rst-versions.rst-badge.shift-up .rst-current-version .icon-book{float:left}.rst-versions.rst-badge>.rst-current-version{width:auto;height:30px;line-height:30px;padding:0 6px;display:block;text-align:center}@media screen and (max-width:768px){.rst-versions{width:85%;display:none}.rst-versions.shift{display:block}}.rst-content 
.toctree-wrapper>p.caption,.rst-content h1,.rst-content h2,.rst-content h3,.rst-content h4,.rst-content h5,.rst-content h6{margin-bottom:24px}.rst-content img{max-width:100%;height:auto}.rst-content div.figure,.rst-content figure{margin-bottom:24px}.rst-content div.figure .caption-text,.rst-content figure .caption-text{font-style:italic}.rst-content div.figure p:last-child.caption,.rst-content figure p:last-child.caption{margin-bottom:0}.rst-content div.figure.align-center,.rst-content figure.align-center{text-align:center}.rst-content .section>a>img,.rst-content .section>img,.rst-content section>a>img,.rst-content section>img{margin-bottom:24px}.rst-content abbr[title]{text-decoration:none}.rst-content.style-external-links a.reference.external:after{font-family:FontAwesome;content:"\f08e";color:#b3b3b3;vertical-align:super;font-size:60%;margin:0 .2em}.rst-content blockquote{margin-left:24px;line-height:24px;margin-bottom:24px}.rst-content pre.literal-block{white-space:pre;margin:0;padding:12px;font-family:SFMono-Regular,Menlo,Monaco,Consolas,Liberation Mono,Courier New,Courier,monospace;display:block;overflow:auto}.rst-content div[class^=highlight],.rst-content pre.literal-block{border:1px solid #e1e4e5;overflow-x:auto;margin:1px 0 24px}.rst-content div[class^=highlight] div[class^=highlight],.rst-content pre.literal-block div[class^=highlight]{padding:0;border:none;margin:0}.rst-content div[class^=highlight] td.code{width:100%}.rst-content .linenodiv pre{border-right:1px solid #e6e9ea;margin:0;padding:12px;font-family:SFMono-Regular,Menlo,Monaco,Consolas,Liberation Mono,Courier New,Courier,monospace;user-select:none;pointer-events:none}.rst-content div[class^=highlight] pre{white-space:pre;margin:0;padding:12px;display:block;overflow:auto}.rst-content div[class^=highlight] pre .hll{display:block;margin:0 -12px;padding:0 12px}.rst-content .linenodiv pre,.rst-content div[class^=highlight] pre,.rst-content 
pre.literal-block{font-family:SFMono-Regular,Menlo,Monaco,Consolas,Liberation Mono,Courier New,Courier,monospace;font-size:12px;line-height:1.4}.rst-content div.highlight .gp,.rst-content div.highlight span.linenos{user-select:none;pointer-events:none}.rst-content div.highlight span.linenos{display:inline-block;padding-left:0;padding-right:12px;margin-right:12px;border-right:1px solid #e6e9ea}.rst-content .code-block-caption{font-style:italic;font-size:85%;line-height:1;padding:1em 0;text-align:center}@media print{.rst-content .codeblock,.rst-content div[class^=highlight],.rst-content div[class^=highlight] pre{white-space:pre-wrap}}.rst-content .admonition,.rst-content .admonition-todo,.rst-content .attention,.rst-content .caution,.rst-content .danger,.rst-content .error,.rst-content .hint,.rst-content .important,.rst-content .note,.rst-content .seealso,.rst-content .tip,.rst-content .warning{clear:both}.rst-content .admonition-todo .last,.rst-content .admonition-todo>:last-child,.rst-content .admonition .last,.rst-content .admonition>:last-child,.rst-content .attention .last,.rst-content .attention>:last-child,.rst-content .caution .last,.rst-content .caution>:last-child,.rst-content .danger .last,.rst-content .danger>:last-child,.rst-content .error .last,.rst-content .error>:last-child,.rst-content .hint .last,.rst-content .hint>:last-child,.rst-content .important .last,.rst-content .important>:last-child,.rst-content .note .last,.rst-content .note>:last-child,.rst-content .seealso .last,.rst-content .seealso>:last-child,.rst-content .tip .last,.rst-content .tip>:last-child,.rst-content .warning .last,.rst-content .warning>:last-child{margin-bottom:0}.rst-content .admonition-title:before{margin-right:4px}.rst-content .admonition table{border-color:rgba(0,0,0,.1)}.rst-content .admonition table td,.rst-content .admonition table th{background:transparent!important;border-color:rgba(0,0,0,.1)!important}.rst-content .section ol.loweralpha,.rst-content .section 
ol.loweralpha>li,.rst-content .toctree-wrapper ol.loweralpha,.rst-content .toctree-wrapper ol.loweralpha>li,.rst-content section ol.loweralpha,.rst-content section ol.loweralpha>li{list-style:lower-alpha}.rst-content .section ol.upperalpha,.rst-content .section ol.upperalpha>li,.rst-content .toctree-wrapper ol.upperalpha,.rst-content .toctree-wrapper ol.upperalpha>li,.rst-content section ol.upperalpha,.rst-content section ol.upperalpha>li{list-style:upper-alpha}.rst-content .section ol li>*,.rst-content .section ul li>*,.rst-content .toctree-wrapper ol li>*,.rst-content .toctree-wrapper ul li>*,.rst-content section ol li>*,.rst-content section ul li>*{margin-top:12px;margin-bottom:12px}.rst-content .section ol li>:first-child,.rst-content .section ul li>:first-child,.rst-content .toctree-wrapper ol li>:first-child,.rst-content .toctree-wrapper ul li>:first-child,.rst-content section ol li>:first-child,.rst-content section ul li>:first-child{margin-top:0}.rst-content .section ol li>p,.rst-content .section ol li>p:last-child,.rst-content .section ul li>p,.rst-content .section ul li>p:last-child,.rst-content .toctree-wrapper ol li>p,.rst-content .toctree-wrapper ol li>p:last-child,.rst-content .toctree-wrapper ul li>p,.rst-content .toctree-wrapper ul li>p:last-child,.rst-content section ol li>p,.rst-content section ol li>p:last-child,.rst-content section ul li>p,.rst-content section ul li>p:last-child{margin-bottom:12px}.rst-content .section ol li>p:only-child,.rst-content .section ol li>p:only-child:last-child,.rst-content .section ul li>p:only-child,.rst-content .section ul li>p:only-child:last-child,.rst-content .toctree-wrapper ol li>p:only-child,.rst-content .toctree-wrapper ol li>p:only-child:last-child,.rst-content .toctree-wrapper ul li>p:only-child,.rst-content .toctree-wrapper ul li>p:only-child:last-child,.rst-content section ol li>p:only-child,.rst-content section ol li>p:only-child:last-child,.rst-content section ul li>p:only-child,.rst-content section ul 
li>p:only-child:last-child{margin-bottom:0}.rst-content .section ol li>ol,.rst-content .section ol li>ul,.rst-content .section ul li>ol,.rst-content .section ul li>ul,.rst-content .toctree-wrapper ol li>ol,.rst-content .toctree-wrapper ol li>ul,.rst-content .toctree-wrapper ul li>ol,.rst-content .toctree-wrapper ul li>ul,.rst-content section ol li>ol,.rst-content section ol li>ul,.rst-content section ul li>ol,.rst-content section ul li>ul{margin-bottom:12px}.rst-content .section ol.simple li>*,.rst-content .section ol.simple li ol,.rst-content .section ol.simple li ul,.rst-content .section ul.simple li>*,.rst-content .section ul.simple li ol,.rst-content .section ul.simple li ul,.rst-content .toctree-wrapper ol.simple li>*,.rst-content .toctree-wrapper ol.simple li ol,.rst-content .toctree-wrapper ol.simple li ul,.rst-content .toctree-wrapper ul.simple li>*,.rst-content .toctree-wrapper ul.simple li ol,.rst-content .toctree-wrapper ul.simple li ul,.rst-content section ol.simple li>*,.rst-content section ol.simple li ol,.rst-content section ol.simple li ul,.rst-content section ul.simple li>*,.rst-content section ul.simple li ol,.rst-content section ul.simple li ul{margin-top:0;margin-bottom:0}.rst-content .line-block{margin-left:0;margin-bottom:24px;line-height:24px}.rst-content .line-block .line-block{margin-left:24px;margin-bottom:0}.rst-content .topic-title{font-weight:700;margin-bottom:12px}.rst-content .toc-backref{color:#404040}.rst-content .align-right{float:right;margin:0 0 24px 24px}.rst-content .align-left{float:left;margin:0 24px 24px 0}.rst-content .align-center{margin:auto}.rst-content .align-center:not(table){display:block}.rst-content .code-block-caption .headerlink,.rst-content .eqno .headerlink,.rst-content .toctree-wrapper>p.caption .headerlink,.rst-content dl dt .headerlink,.rst-content h1 .headerlink,.rst-content h2 .headerlink,.rst-content h3 .headerlink,.rst-content h4 .headerlink,.rst-content h5 .headerlink,.rst-content h6 
.headerlink,.rst-content p.caption .headerlink,.rst-content p .headerlink,.rst-content table>caption .headerlink{opacity:0;font-size:14px;font-family:FontAwesome;margin-left:.5em}.rst-content .code-block-caption .headerlink:focus,.rst-content .code-block-caption:hover .headerlink,.rst-content .eqno .headerlink:focus,.rst-content .eqno:hover .headerlink,.rst-content .toctree-wrapper>p.caption .headerlink:focus,.rst-content .toctree-wrapper>p.caption:hover .headerlink,.rst-content dl dt .headerlink:focus,.rst-content dl dt:hover .headerlink,.rst-content h1 .headerlink:focus,.rst-content h1:hover .headerlink,.rst-content h2 .headerlink:focus,.rst-content h2:hover .headerlink,.rst-content h3 .headerlink:focus,.rst-content h3:hover .headerlink,.rst-content h4 .headerlink:focus,.rst-content h4:hover .headerlink,.rst-content h5 .headerlink:focus,.rst-content h5:hover .headerlink,.rst-content h6 .headerlink:focus,.rst-content h6:hover .headerlink,.rst-content p.caption .headerlink:focus,.rst-content p.caption:hover .headerlink,.rst-content p .headerlink:focus,.rst-content p:hover .headerlink,.rst-content table>caption .headerlink:focus,.rst-content table>caption:hover .headerlink{opacity:1}.rst-content p a{overflow-wrap:anywhere}.rst-content .wy-table td p,.rst-content .wy-table td ul,.rst-content .wy-table th p,.rst-content .wy-table th ul,.rst-content table.docutils td p,.rst-content table.docutils td ul,.rst-content table.docutils th p,.rst-content table.docutils th ul,.rst-content table.field-list td p,.rst-content table.field-list td ul,.rst-content table.field-list th p,.rst-content table.field-list th ul{font-size:inherit}.rst-content .btn:focus{outline:2px solid}.rst-content table>caption .headerlink:after{font-size:12px}.rst-content .centered{text-align:center}.rst-content .sidebar{float:right;width:40%;display:block;margin:0 0 24px 24px;padding:24px;background:#f3f6f6;border:1px solid #e1e4e5}.rst-content .sidebar dl,.rst-content .sidebar p,.rst-content .sidebar 
ul{font-size:90%}.rst-content .sidebar .last,.rst-content .sidebar>:last-child{margin-bottom:0}.rst-content .sidebar .sidebar-title{display:block;font-family:Roboto Slab,ff-tisa-web-pro,Georgia,Arial,sans-serif;font-weight:700;background:#e1e4e5;padding:6px 12px;margin:-24px -24px 24px;font-size:100%}.rst-content .highlighted{background:#f1c40f;box-shadow:0 0 0 2px #f1c40f;display:inline;font-weight:700}.rst-content .citation-reference,.rst-content .footnote-reference{vertical-align:baseline;position:relative;top:-.4em;line-height:0;font-size:90%}.rst-content .citation-reference>span.fn-bracket,.rst-content .footnote-reference>span.fn-bracket{display:none}.rst-content .hlist{width:100%}.rst-content dl dt span.classifier:before{content:" : "}.rst-content dl dt span.classifier-delimiter{display:none!important}html.writer-html4 .rst-content table.docutils.citation,html.writer-html4 .rst-content table.docutils.footnote{background:none;border:none}html.writer-html4 .rst-content table.docutils.citation td,html.writer-html4 .rst-content table.docutils.citation tr,html.writer-html4 .rst-content table.docutils.footnote td,html.writer-html4 .rst-content table.docutils.footnote tr{border:none;background-color:transparent!important;white-space:normal}html.writer-html4 .rst-content table.docutils.citation td.label,html.writer-html4 .rst-content table.docutils.footnote td.label{padding-left:0;padding-right:0;vertical-align:top}html.writer-html5 .rst-content dl.citation,html.writer-html5 .rst-content dl.field-list,html.writer-html5 .rst-content dl.footnote{display:grid;grid-template-columns:auto minmax(80%,95%)}html.writer-html5 .rst-content dl.citation>dt,html.writer-html5 .rst-content dl.field-list>dt,html.writer-html5 .rst-content dl.footnote>dt{display:inline-grid;grid-template-columns:max-content auto}html.writer-html5 .rst-content aside.citation,html.writer-html5 .rst-content aside.footnote,html.writer-html5 .rst-content div.citation{display:grid;grid-template-columns:auto 
auto minmax(.65rem,auto) minmax(40%,95%)}html.writer-html5 .rst-content aside.citation>span.label,html.writer-html5 .rst-content aside.footnote>span.label,html.writer-html5 .rst-content div.citation>span.label{grid-column-start:1;grid-column-end:2}html.writer-html5 .rst-content aside.citation>span.backrefs,html.writer-html5 .rst-content aside.footnote>span.backrefs,html.writer-html5 .rst-content div.citation>span.backrefs{grid-column-start:2;grid-column-end:3;grid-row-start:1;grid-row-end:3}html.writer-html5 .rst-content aside.citation>p,html.writer-html5 .rst-content aside.footnote>p,html.writer-html5 .rst-content div.citation>p{grid-column-start:4;grid-column-end:5}html.writer-html5 .rst-content dl.citation,html.writer-html5 .rst-content dl.field-list,html.writer-html5 .rst-content dl.footnote{margin-bottom:24px}html.writer-html5 .rst-content dl.citation>dt,html.writer-html5 .rst-content dl.field-list>dt,html.writer-html5 .rst-content dl.footnote>dt{padding-left:1rem}html.writer-html5 .rst-content dl.citation>dd,html.writer-html5 .rst-content dl.citation>dt,html.writer-html5 .rst-content dl.field-list>dd,html.writer-html5 .rst-content dl.field-list>dt,html.writer-html5 .rst-content dl.footnote>dd,html.writer-html5 .rst-content dl.footnote>dt{margin-bottom:0}html.writer-html5 .rst-content dl.citation,html.writer-html5 .rst-content dl.footnote{font-size:.9rem}html.writer-html5 .rst-content dl.citation>dt,html.writer-html5 .rst-content dl.footnote>dt{margin:0 .5rem .5rem 0;line-height:1.2rem;word-break:break-all;font-weight:400}html.writer-html5 .rst-content dl.citation>dt>span.brackets:before,html.writer-html5 .rst-content dl.footnote>dt>span.brackets:before{content:"["}html.writer-html5 .rst-content dl.citation>dt>span.brackets:after,html.writer-html5 .rst-content dl.footnote>dt>span.brackets:after{content:"]"}html.writer-html5 .rst-content dl.citation>dt>span.fn-backref,html.writer-html5 .rst-content 
dl.footnote>dt>span.fn-backref{text-align:left;font-style:italic;margin-left:.65rem;word-break:break-word;word-spacing:-.1rem;max-width:5rem}html.writer-html5 .rst-content dl.citation>dt>span.fn-backref>a,html.writer-html5 .rst-content dl.footnote>dt>span.fn-backref>a{word-break:keep-all}html.writer-html5 .rst-content dl.citation>dt>span.fn-backref>a:not(:first-child):before,html.writer-html5 .rst-content dl.footnote>dt>span.fn-backref>a:not(:first-child):before{content:" "}html.writer-html5 .rst-content dl.citation>dd,html.writer-html5 .rst-content dl.footnote>dd{margin:0 0 .5rem;line-height:1.2rem}html.writer-html5 .rst-content dl.citation>dd p,html.writer-html5 .rst-content dl.footnote>dd p{font-size:.9rem}html.writer-html5 .rst-content aside.citation,html.writer-html5 .rst-content aside.footnote,html.writer-html5 .rst-content div.citation{padding-left:1rem;padding-right:1rem;font-size:.9rem;line-height:1.2rem}html.writer-html5 .rst-content aside.citation p,html.writer-html5 .rst-content aside.footnote p,html.writer-html5 .rst-content div.citation p{font-size:.9rem;line-height:1.2rem;margin-bottom:12px}html.writer-html5 .rst-content aside.citation span.backrefs,html.writer-html5 .rst-content aside.footnote span.backrefs,html.writer-html5 .rst-content div.citation span.backrefs{text-align:left;font-style:italic;margin-left:.65rem;word-break:break-word;word-spacing:-.1rem;max-width:5rem}html.writer-html5 .rst-content aside.citation span.backrefs>a,html.writer-html5 .rst-content aside.footnote span.backrefs>a,html.writer-html5 .rst-content div.citation span.backrefs>a{word-break:keep-all}html.writer-html5 .rst-content aside.citation span.backrefs>a:not(:first-child):before,html.writer-html5 .rst-content aside.footnote span.backrefs>a:not(:first-child):before,html.writer-html5 .rst-content div.citation span.backrefs>a:not(:first-child):before{content:" "}html.writer-html5 .rst-content aside.citation span.label,html.writer-html5 .rst-content aside.footnote 
span.label,html.writer-html5 .rst-content div.citation span.label{line-height:1.2rem}html.writer-html5 .rst-content aside.citation-list,html.writer-html5 .rst-content aside.footnote-list,html.writer-html5 .rst-content div.citation-list{margin-bottom:24px}html.writer-html5 .rst-content dl.option-list kbd{font-size:.9rem}.rst-content table.docutils.footnote,html.writer-html4 .rst-content table.docutils.citation,html.writer-html5 .rst-content aside.footnote,html.writer-html5 .rst-content aside.footnote-list aside.footnote,html.writer-html5 .rst-content div.citation-list>div.citation,html.writer-html5 .rst-content dl.citation,html.writer-html5 .rst-content dl.footnote{color:grey}.rst-content table.docutils.footnote code,.rst-content table.docutils.footnote tt,html.writer-html4 .rst-content table.docutils.citation code,html.writer-html4 .rst-content table.docutils.citation tt,html.writer-html5 .rst-content aside.footnote-list aside.footnote code,html.writer-html5 .rst-content aside.footnote-list aside.footnote tt,html.writer-html5 .rst-content aside.footnote code,html.writer-html5 .rst-content aside.footnote tt,html.writer-html5 .rst-content div.citation-list>div.citation code,html.writer-html5 .rst-content div.citation-list>div.citation tt,html.writer-html5 .rst-content dl.citation code,html.writer-html5 .rst-content dl.citation tt,html.writer-html5 .rst-content dl.footnote code,html.writer-html5 .rst-content dl.footnote tt{color:#555}.rst-content .wy-table-responsive.citation,.rst-content .wy-table-responsive.footnote{margin-bottom:0}.rst-content .wy-table-responsive.citation+:not(.citation),.rst-content .wy-table-responsive.footnote+:not(.footnote){margin-top:24px}.rst-content .wy-table-responsive.citation:last-child,.rst-content .wy-table-responsive.footnote:last-child{margin-bottom:24px}.rst-content table.docutils th{border-color:#e1e4e5}html.writer-html5 .rst-content table.docutils th{border:1px solid #e1e4e5}html.writer-html5 .rst-content table.docutils 
td>p,html.writer-html5 .rst-content table.docutils th>p{line-height:1rem;margin-bottom:0;font-size:.9rem}.rst-content table.docutils td .last,.rst-content table.docutils td .last>:last-child{margin-bottom:0}.rst-content table.field-list,.rst-content table.field-list td{border:none}.rst-content table.field-list td p{line-height:inherit}.rst-content table.field-list td>strong{display:inline-block}.rst-content table.field-list .field-name{padding-right:10px;text-align:left;white-space:nowrap}.rst-content table.field-list .field-body{text-align:left}.rst-content code,.rst-content tt{color:#000;font-family:SFMono-Regular,Menlo,Monaco,Consolas,Liberation Mono,Courier New,Courier,monospace;padding:2px 5px}.rst-content code big,.rst-content code em,.rst-content tt big,.rst-content tt em{font-size:100%!important;line-height:normal}.rst-content code.literal,.rst-content tt.literal{color:#e74c3c;white-space:normal}.rst-content code.xref,.rst-content tt.xref,a .rst-content code,a .rst-content tt{font-weight:700;color:#404040;overflow-wrap:normal}.rst-content kbd,.rst-content pre,.rst-content samp{font-family:SFMono-Regular,Menlo,Monaco,Consolas,Liberation Mono,Courier New,Courier,monospace}.rst-content a code,.rst-content a tt{color:#2980b9}.rst-content dl{margin-bottom:24px}.rst-content dl dt{font-weight:700;margin-bottom:12px}.rst-content dl ol,.rst-content dl p,.rst-content dl table,.rst-content dl ul{margin-bottom:12px}.rst-content dl dd{margin:0 0 12px 24px;line-height:24px}.rst-content dl dd>ol:last-child,.rst-content dl dd>p:last-child,.rst-content dl dd>table:last-child,.rst-content dl dd>ul:last-child{margin-bottom:0}html.writer-html4 .rst-content dl:not(.docutils),html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple){margin-bottom:24px}html.writer-html4 .rst-content dl:not(.docutils)>dt,html.writer-html5 .rst-content 
dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple)>dt{display:table;margin:6px 0;font-size:90%;line-height:normal;background:#e7f2fa;color:#2980b9;border-top:3px solid #6ab0de;padding:6px;position:relative}html.writer-html4 .rst-content dl:not(.docutils)>dt:before,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple)>dt:before{color:#6ab0de}html.writer-html4 .rst-content dl:not(.docutils)>dt .headerlink,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple)>dt .headerlink{color:#404040;font-size:100%!important}html.writer-html4 .rst-content dl:not(.docutils) dl:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple)>dt,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) dl:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple)>dt{margin-bottom:6px;border:none;border-left:3px solid #ccc;background:#f0f0f0;color:#555}html.writer-html4 .rst-content dl:not(.docutils) dl:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple)>dt .headerlink,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) dl:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple)>dt .headerlink{color:#404040;font-size:100%!important}html.writer-html4 .rst-content dl:not(.docutils)>dt:first-child,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple)>dt:first-child{margin-top:0}html.writer-html4 .rst-content dl:not(.docutils) code.descclassname,html.writer-html4 .rst-content dl:not(.docutils) 
code.descname,html.writer-html4 .rst-content dl:not(.docutils) tt.descclassname,html.writer-html4 .rst-content dl:not(.docutils) tt.descname,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) code.descclassname,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) code.descname,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) tt.descclassname,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) tt.descname{background-color:transparent;border:none;padding:0;font-size:100%!important}html.writer-html4 .rst-content dl:not(.docutils) code.descname,html.writer-html4 .rst-content dl:not(.docutils) tt.descname,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) code.descname,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) tt.descname{font-weight:700}html.writer-html4 .rst-content dl:not(.docutils) .optional,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) .optional{display:inline-block;padding:0 4px;color:#000;font-weight:700}html.writer-html4 .rst-content dl:not(.docutils) .property,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) .property{display:inline-block;padding-right:8px;max-width:100%}html.writer-html4 .rst-content dl:not(.docutils) .k,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) .k{font-style:italic}html.writer-html4 
.rst-content dl:not(.docutils) .descclassname,html.writer-html4 .rst-content dl:not(.docutils) .descname,html.writer-html4 .rst-content dl:not(.docutils) .sig-name,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) .descclassname,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) .descname,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) .sig-name{font-family:SFMono-Regular,Menlo,Monaco,Consolas,Liberation Mono,Courier New,Courier,monospace;color:#000}.rst-content .viewcode-back,.rst-content .viewcode-link{display:inline-block;color:#27ae60;font-size:80%;padding-left:24px}.rst-content .viewcode-back{display:block;float:right}.rst-content p.rubric{margin-bottom:12px;font-weight:700}.rst-content code.download,.rst-content tt.download{background:inherit;padding:inherit;font-weight:400;font-family:inherit;font-size:inherit;color:inherit;border:inherit;white-space:inherit}.rst-content code.download span:first-child,.rst-content tt.download span:first-child{-webkit-font-smoothing:subpixel-antialiased}.rst-content code.download span:first-child:before,.rst-content tt.download span:first-child:before{margin-right:4px}.rst-content .guilabel,.rst-content .menuselection{font-size:80%;font-weight:700;border-radius:4px;padding:2.4px 6px;margin:auto 2px}.rst-content .guilabel,.rst-content .menuselection{border:1px solid #7fbbe3;background:#e7f2fa}.rst-content :not(dl.option-list)>:not(dt):not(kbd):not(.kbd)>.kbd,.rst-content :not(dl.option-list)>:not(dt):not(kbd):not(.kbd)>kbd{color:inherit;font-size:80%;background-color:#fff;border:1px solid #a6a6a6;border-radius:4px;box-shadow:0 2px grey;padding:2.4px 6px;margin:auto 0}.rst-content .versionmodified{font-style:italic}@media screen and (max-width:480px){.rst-content 
.sidebar{width:100%}}span[id*=MathJax-Span]{color:#404040}.math{text-align:center}@font-face{font-family:Lato;src:url(fonts/lato-normal.woff2?bd03a2cc277bbbc338d464e679fe9942) format("woff2"),url(fonts/lato-normal.woff?27bd77b9162d388cb8d4c4217c7c5e2a) format("woff");font-weight:400;font-style:normal;font-display:block}@font-face{font-family:Lato;src:url(fonts/lato-bold.woff2?cccb897485813c7c256901dbca54ecf2) format("woff2"),url(fonts/lato-bold.woff?d878b6c29b10beca227e9eef4246111b) format("woff");font-weight:700;font-style:normal;font-display:block}@font-face{font-family:Lato;src:url(fonts/lato-bold-italic.woff2?0b6bb6725576b072c5d0b02ecdd1900d) format("woff2"),url(fonts/lato-bold-italic.woff?9c7e4e9eb485b4a121c760e61bc3707c) format("woff");font-weight:700;font-style:italic;font-display:block}@font-face{font-family:Lato;src:url(fonts/lato-normal-italic.woff2?4eb103b4d12be57cb1d040ed5e162e9d) format("woff2"),url(fonts/lato-normal-italic.woff?f28f2d6482446544ef1ea1ccc6dd5892) format("woff");font-weight:400;font-style:italic;font-display:block}@font-face{font-family:Roboto Slab;font-style:normal;font-weight:400;src:url(fonts/Roboto-Slab-Regular.woff2?7abf5b8d04d26a2cafea937019bca958) format("woff2"),url(fonts/Roboto-Slab-Regular.woff?c1be9284088d487c5e3ff0a10a92e58c) format("woff");font-display:block}@font-face{font-family:Roboto Slab;font-style:normal;font-weight:700;src:url(fonts/Roboto-Slab-Bold.woff2?9984f4a9bda09be08e83f2506954adbe) format("woff2"),url(fonts/Roboto-Slab-Bold.woff?bed5564a116b05148e3b3bea6fb1162a) format("woff");font-display:block} \ No newline at end of file diff --git a/_static/doctools.js b/_static/doctools.js new file mode 100644 index 00000000..d06a71d7 --- /dev/null +++ b/_static/doctools.js @@ -0,0 +1,156 @@ +/* + * doctools.js + * ~~~~~~~~~~~ + * + * Base JavaScript utilities for all Sphinx HTML documentation. + * + * :copyright: Copyright 2007-2023 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. 
+ * + */ +"use strict"; + +const BLACKLISTED_KEY_CONTROL_ELEMENTS = new Set([ + "TEXTAREA", + "INPUT", + "SELECT", + "BUTTON", +]); + +const _ready = (callback) => { + if (document.readyState !== "loading") { + callback(); + } else { + document.addEventListener("DOMContentLoaded", callback); + } +}; + +/** + * Small JavaScript module for the documentation. + */ +const Documentation = { + init: () => { + Documentation.initDomainIndexTable(); + Documentation.initOnKeyListeners(); + }, + + /** + * i18n support + */ + TRANSLATIONS: {}, + PLURAL_EXPR: (n) => (n === 1 ? 0 : 1), + LOCALE: "unknown", + + // gettext and ngettext don't access this so that the functions + // can safely bound to a different name (_ = Documentation.gettext) + gettext: (string) => { + const translated = Documentation.TRANSLATIONS[string]; + switch (typeof translated) { + case "undefined": + return string; // no translation + case "string": + return translated; // translation exists + default: + return translated[0]; // (singular, plural) translation tuple exists + } + }, + + ngettext: (singular, plural, n) => { + const translated = Documentation.TRANSLATIONS[singular]; + if (typeof translated !== "undefined") + return translated[Documentation.PLURAL_EXPR(n)]; + return n === 1 ? 
singular : plural; + }, + + addTranslations: (catalog) => { + Object.assign(Documentation.TRANSLATIONS, catalog.messages); + Documentation.PLURAL_EXPR = new Function( + "n", + `return (${catalog.plural_expr})` + ); + Documentation.LOCALE = catalog.locale; + }, + + /** + * helper function to focus on search bar + */ + focusSearchBar: () => { + document.querySelectorAll("input[name=q]")[0]?.focus(); + }, + + /** + * Initialise the domain index toggle buttons + */ + initDomainIndexTable: () => { + const toggler = (el) => { + const idNumber = el.id.substr(7); + const toggledRows = document.querySelectorAll(`tr.cg-${idNumber}`); + if (el.src.substr(-9) === "minus.png") { + el.src = `${el.src.substr(0, el.src.length - 9)}plus.png`; + toggledRows.forEach((el) => (el.style.display = "none")); + } else { + el.src = `${el.src.substr(0, el.src.length - 8)}minus.png`; + toggledRows.forEach((el) => (el.style.display = "")); + } + }; + + const togglerElements = document.querySelectorAll("img.toggler"); + togglerElements.forEach((el) => + el.addEventListener("click", (event) => toggler(event.currentTarget)) + ); + togglerElements.forEach((el) => (el.style.display = "")); + if (DOCUMENTATION_OPTIONS.COLLAPSE_INDEX) togglerElements.forEach(toggler); + }, + + initOnKeyListeners: () => { + // only install a listener if it is really needed + if ( + !DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS && + !DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS + ) + return; + + document.addEventListener("keydown", (event) => { + // bail for input elements + if (BLACKLISTED_KEY_CONTROL_ELEMENTS.has(document.activeElement.tagName)) return; + // bail with special keys + if (event.altKey || event.ctrlKey || event.metaKey) return; + + if (!event.shiftKey) { + switch (event.key) { + case "ArrowLeft": + if (!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) break; + + const prevLink = document.querySelector('link[rel="prev"]'); + if (prevLink && prevLink.href) { + window.location.href = prevLink.href; + 
event.preventDefault(); + } + break; + case "ArrowRight": + if (!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) break; + + const nextLink = document.querySelector('link[rel="next"]'); + if (nextLink && nextLink.href) { + window.location.href = nextLink.href; + event.preventDefault(); + } + break; + } + } + + // some keyboard layouts may need Shift to get / + switch (event.key) { + case "/": + if (!DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS) break; + Documentation.focusSearchBar(); + event.preventDefault(); + } + }); + }, +}; + +// quick alias for translations +const _ = Documentation.gettext; + +_ready(Documentation.init); diff --git a/_static/documentation_options.js b/_static/documentation_options.js new file mode 100644 index 00000000..89435bb4 --- /dev/null +++ b/_static/documentation_options.js @@ -0,0 +1,13 @@ +const DOCUMENTATION_OPTIONS = { + VERSION: '1.0.0', + LANGUAGE: 'en', + COLLAPSE_INDEX: false, + BUILDER: 'html', + FILE_SUFFIX: '.html', + LINK_SUFFIX: '.html', + HAS_SOURCE: true, + SOURCELINK_SUFFIX: '.txt', + NAVIGATION_WITH_KEYS: false, + SHOW_SEARCH_SUMMARY: true, + ENABLE_SEARCH_SHORTCUTS: true, +}; \ No newline at end of file diff --git a/_static/file.png b/_static/file.png new file mode 100644 index 00000000..a858a410 Binary files /dev/null and b/_static/file.png differ diff --git a/_static/jquery.js b/_static/jquery.js new file mode 100644 index 00000000..c4c6022f --- /dev/null +++ b/_static/jquery.js @@ -0,0 +1,2 @@ +/*! 
jQuery v3.6.0 | (c) OpenJS Foundation and other contributors | jquery.org/license */ +!function(e,t){"use strict";"object"==typeof module&&"object"==typeof module.exports?module.exports=e.document?t(e,!0):function(e){if(!e.document)throw new Error("jQuery requires a window with a document");return t(e)}:t(e)}("undefined"!=typeof window?window:this,function(C,e){"use strict";var t=[],r=Object.getPrototypeOf,s=t.slice,g=t.flat?function(e){return t.flat.call(e)}:function(e){return t.concat.apply([],e)},u=t.push,i=t.indexOf,n={},o=n.toString,v=n.hasOwnProperty,a=v.toString,l=a.call(Object),y={},m=function(e){return"function"==typeof e&&"number"!=typeof e.nodeType&&"function"!=typeof e.item},x=function(e){return null!=e&&e===e.window},E=C.document,c={type:!0,src:!0,nonce:!0,noModule:!0};function b(e,t,n){var r,i,o=(n=n||E).createElement("script");if(o.text=e,t)for(r in c)(i=t[r]||t.getAttribute&&t.getAttribute(r))&&o.setAttribute(r,i);n.head.appendChild(o).parentNode.removeChild(o)}function w(e){return null==e?e+"":"object"==typeof e||"function"==typeof e?n[o.call(e)]||"object":typeof e}var f="3.6.0",S=function(e,t){return new S.fn.init(e,t)};function p(e){var t=!!e&&"length"in e&&e.length,n=w(e);return!m(e)&&!x(e)&&("array"===n||0===t||"number"==typeof t&&0+~]|"+M+")"+M+"*"),U=new RegExp(M+"|>"),X=new RegExp(F),V=new RegExp("^"+I+"$"),G={ID:new RegExp("^#("+I+")"),CLASS:new RegExp("^\\.("+I+")"),TAG:new RegExp("^("+I+"|[*])"),ATTR:new RegExp("^"+W),PSEUDO:new RegExp("^"+F),CHILD:new RegExp("^:(only|first|last|nth|nth-last)-(child|of-type)(?:\\("+M+"*(even|odd|(([+-]|)(\\d*)n|)"+M+"*(?:([+-]|)"+M+"*(\\d+)|))"+M+"*\\)|)","i"),bool:new RegExp("^(?:"+R+")$","i"),needsContext:new RegExp("^"+M+"*[>+~]|:(even|odd|eq|gt|lt|nth|first|last)(?:\\("+M+"*((?:-\\d)?\\d*)"+M+"*\\)|)(?=[^-]|$)","i")},Y=/HTML$/i,Q=/^(?:input|select|textarea|button)$/i,J=/^h\d$/i,K=/^[^{]+\{\s*\[native \w/,Z=/^(?:#([\w-]+)|(\w+)|\.([\w-]+))$/,ee=/[+~]/,te=new 
RegExp("\\\\[\\da-fA-F]{1,6}"+M+"?|\\\\([^\\r\\n\\f])","g"),ne=function(e,t){var n="0x"+e.slice(1)-65536;return t||(n<0?String.fromCharCode(n+65536):String.fromCharCode(n>>10|55296,1023&n|56320))},re=/([\0-\x1f\x7f]|^-?\d)|^-$|[^\0-\x1f\x7f-\uFFFF\w-]/g,ie=function(e,t){return t?"\0"===e?"\ufffd":e.slice(0,-1)+"\\"+e.charCodeAt(e.length-1).toString(16)+" ":"\\"+e},oe=function(){T()},ae=be(function(e){return!0===e.disabled&&"fieldset"===e.nodeName.toLowerCase()},{dir:"parentNode",next:"legend"});try{H.apply(t=O.call(p.childNodes),p.childNodes),t[p.childNodes.length].nodeType}catch(e){H={apply:t.length?function(e,t){L.apply(e,O.call(t))}:function(e,t){var n=e.length,r=0;while(e[n++]=t[r++]);e.length=n-1}}}function se(t,e,n,r){var i,o,a,s,u,l,c,f=e&&e.ownerDocument,p=e?e.nodeType:9;if(n=n||[],"string"!=typeof t||!t||1!==p&&9!==p&&11!==p)return n;if(!r&&(T(e),e=e||C,E)){if(11!==p&&(u=Z.exec(t)))if(i=u[1]){if(9===p){if(!(a=e.getElementById(i)))return n;if(a.id===i)return n.push(a),n}else if(f&&(a=f.getElementById(i))&&y(e,a)&&a.id===i)return n.push(a),n}else{if(u[2])return H.apply(n,e.getElementsByTagName(t)),n;if((i=u[3])&&d.getElementsByClassName&&e.getElementsByClassName)return H.apply(n,e.getElementsByClassName(i)),n}if(d.qsa&&!N[t+" "]&&(!v||!v.test(t))&&(1!==p||"object"!==e.nodeName.toLowerCase())){if(c=t,f=e,1===p&&(U.test(t)||z.test(t))){(f=ee.test(t)&&ye(e.parentNode)||e)===e&&d.scope||((s=e.getAttribute("id"))?s=s.replace(re,ie):e.setAttribute("id",s=S)),o=(l=h(t)).length;while(o--)l[o]=(s?"#"+s:":scope")+" "+xe(l[o]);c=l.join(",")}try{return H.apply(n,f.querySelectorAll(c)),n}catch(e){N(t,!0)}finally{s===S&&e.removeAttribute("id")}}}return g(t.replace($,"$1"),e,n,r)}function ue(){var r=[];return function e(t,n){return r.push(t+" ")>b.cacheLength&&delete e[r.shift()],e[t+" "]=n}}function le(e){return e[S]=!0,e}function ce(e){var t=C.createElement("fieldset");try{return!!e(t)}catch(e){return!1}finally{t.parentNode&&t.parentNode.removeChild(t),t=null}}function 
fe(e,t){var n=e.split("|"),r=n.length;while(r--)b.attrHandle[n[r]]=t}function pe(e,t){var n=t&&e,r=n&&1===e.nodeType&&1===t.nodeType&&e.sourceIndex-t.sourceIndex;if(r)return r;if(n)while(n=n.nextSibling)if(n===t)return-1;return e?1:-1}function de(t){return function(e){return"input"===e.nodeName.toLowerCase()&&e.type===t}}function he(n){return function(e){var t=e.nodeName.toLowerCase();return("input"===t||"button"===t)&&e.type===n}}function ge(t){return function(e){return"form"in e?e.parentNode&&!1===e.disabled?"label"in e?"label"in e.parentNode?e.parentNode.disabled===t:e.disabled===t:e.isDisabled===t||e.isDisabled!==!t&&ae(e)===t:e.disabled===t:"label"in e&&e.disabled===t}}function ve(a){return le(function(o){return o=+o,le(function(e,t){var n,r=a([],e.length,o),i=r.length;while(i--)e[n=r[i]]&&(e[n]=!(t[n]=e[n]))})})}function ye(e){return e&&"undefined"!=typeof e.getElementsByTagName&&e}for(e in d=se.support={},i=se.isXML=function(e){var t=e&&e.namespaceURI,n=e&&(e.ownerDocument||e).documentElement;return!Y.test(t||n&&n.nodeName||"HTML")},T=se.setDocument=function(e){var t,n,r=e?e.ownerDocument||e:p;return r!=C&&9===r.nodeType&&r.documentElement&&(a=(C=r).documentElement,E=!i(C),p!=C&&(n=C.defaultView)&&n.top!==n&&(n.addEventListener?n.addEventListener("unload",oe,!1):n.attachEvent&&n.attachEvent("onunload",oe)),d.scope=ce(function(e){return a.appendChild(e).appendChild(C.createElement("div")),"undefined"!=typeof e.querySelectorAll&&!e.querySelectorAll(":scope fieldset div").length}),d.attributes=ce(function(e){return e.className="i",!e.getAttribute("className")}),d.getElementsByTagName=ce(function(e){return e.appendChild(C.createComment("")),!e.getElementsByTagName("*").length}),d.getElementsByClassName=K.test(C.getElementsByClassName),d.getById=ce(function(e){return a.appendChild(e).id=S,!C.getElementsByName||!C.getElementsByName(S).length}),d.getById?(b.filter.ID=function(e){var t=e.replace(te,ne);return function(e){return 
e.getAttribute("id")===t}},b.find.ID=function(e,t){if("undefined"!=typeof t.getElementById&&E){var n=t.getElementById(e);return n?[n]:[]}}):(b.filter.ID=function(e){var n=e.replace(te,ne);return function(e){var t="undefined"!=typeof e.getAttributeNode&&e.getAttributeNode("id");return t&&t.value===n}},b.find.ID=function(e,t){if("undefined"!=typeof t.getElementById&&E){var n,r,i,o=t.getElementById(e);if(o){if((n=o.getAttributeNode("id"))&&n.value===e)return[o];i=t.getElementsByName(e),r=0;while(o=i[r++])if((n=o.getAttributeNode("id"))&&n.value===e)return[o]}return[]}}),b.find.TAG=d.getElementsByTagName?function(e,t){return"undefined"!=typeof t.getElementsByTagName?t.getElementsByTagName(e):d.qsa?t.querySelectorAll(e):void 0}:function(e,t){var n,r=[],i=0,o=t.getElementsByTagName(e);if("*"===e){while(n=o[i++])1===n.nodeType&&r.push(n);return r}return o},b.find.CLASS=d.getElementsByClassName&&function(e,t){if("undefined"!=typeof t.getElementsByClassName&&E)return t.getElementsByClassName(e)},s=[],v=[],(d.qsa=K.test(C.querySelectorAll))&&(ce(function(e){var t;a.appendChild(e).innerHTML="",e.querySelectorAll("[msallowcapture^='']").length&&v.push("[*^$]="+M+"*(?:''|\"\")"),e.querySelectorAll("[selected]").length||v.push("\\["+M+"*(?:value|"+R+")"),e.querySelectorAll("[id~="+S+"-]").length||v.push("~="),(t=C.createElement("input")).setAttribute("name",""),e.appendChild(t),e.querySelectorAll("[name='']").length||v.push("\\["+M+"*name"+M+"*="+M+"*(?:''|\"\")"),e.querySelectorAll(":checked").length||v.push(":checked"),e.querySelectorAll("a#"+S+"+*").length||v.push(".#.+[+~]"),e.querySelectorAll("\\\f"),v.push("[\\r\\n\\f]")}),ce(function(e){e.innerHTML="";var 
t=C.createElement("input");t.setAttribute("type","hidden"),e.appendChild(t).setAttribute("name","D"),e.querySelectorAll("[name=d]").length&&v.push("name"+M+"*[*^$|!~]?="),2!==e.querySelectorAll(":enabled").length&&v.push(":enabled",":disabled"),a.appendChild(e).disabled=!0,2!==e.querySelectorAll(":disabled").length&&v.push(":enabled",":disabled"),e.querySelectorAll("*,:x"),v.push(",.*:")})),(d.matchesSelector=K.test(c=a.matches||a.webkitMatchesSelector||a.mozMatchesSelector||a.oMatchesSelector||a.msMatchesSelector))&&ce(function(e){d.disconnectedMatch=c.call(e,"*"),c.call(e,"[s!='']:x"),s.push("!=",F)}),v=v.length&&new RegExp(v.join("|")),s=s.length&&new RegExp(s.join("|")),t=K.test(a.compareDocumentPosition),y=t||K.test(a.contains)?function(e,t){var n=9===e.nodeType?e.documentElement:e,r=t&&t.parentNode;return e===r||!(!r||1!==r.nodeType||!(n.contains?n.contains(r):e.compareDocumentPosition&&16&e.compareDocumentPosition(r)))}:function(e,t){if(t)while(t=t.parentNode)if(t===e)return!0;return!1},j=t?function(e,t){if(e===t)return l=!0,0;var n=!e.compareDocumentPosition-!t.compareDocumentPosition;return n||(1&(n=(e.ownerDocument||e)==(t.ownerDocument||t)?e.compareDocumentPosition(t):1)||!d.sortDetached&&t.compareDocumentPosition(e)===n?e==C||e.ownerDocument==p&&y(p,e)?-1:t==C||t.ownerDocument==p&&y(p,t)?1:u?P(u,e)-P(u,t):0:4&n?-1:1)}:function(e,t){if(e===t)return l=!0,0;var n,r=0,i=e.parentNode,o=t.parentNode,a=[e],s=[t];if(!i||!o)return e==C?-1:t==C?1:i?-1:o?1:u?P(u,e)-P(u,t):0;if(i===o)return pe(e,t);n=e;while(n=n.parentNode)a.unshift(n);n=t;while(n=n.parentNode)s.unshift(n);while(a[r]===s[r])r++;return r?pe(a[r],s[r]):a[r]==p?-1:s[r]==p?1:0}),C},se.matches=function(e,t){return se(e,null,null,t)},se.matchesSelector=function(e,t){if(T(e),d.matchesSelector&&E&&!N[t+" "]&&(!s||!s.test(t))&&(!v||!v.test(t)))try{var n=c.call(e,t);if(n||d.disconnectedMatch||e.document&&11!==e.document.nodeType)return n}catch(e){N(t,!0)}return 0":{dir:"parentNode",first:!0}," 
":{dir:"parentNode"},"+":{dir:"previousSibling",first:!0},"~":{dir:"previousSibling"}},preFilter:{ATTR:function(e){return e[1]=e[1].replace(te,ne),e[3]=(e[3]||e[4]||e[5]||"").replace(te,ne),"~="===e[2]&&(e[3]=" "+e[3]+" "),e.slice(0,4)},CHILD:function(e){return e[1]=e[1].toLowerCase(),"nth"===e[1].slice(0,3)?(e[3]||se.error(e[0]),e[4]=+(e[4]?e[5]+(e[6]||1):2*("even"===e[3]||"odd"===e[3])),e[5]=+(e[7]+e[8]||"odd"===e[3])):e[3]&&se.error(e[0]),e},PSEUDO:function(e){var t,n=!e[6]&&e[2];return G.CHILD.test(e[0])?null:(e[3]?e[2]=e[4]||e[5]||"":n&&X.test(n)&&(t=h(n,!0))&&(t=n.indexOf(")",n.length-t)-n.length)&&(e[0]=e[0].slice(0,t),e[2]=n.slice(0,t)),e.slice(0,3))}},filter:{TAG:function(e){var t=e.replace(te,ne).toLowerCase();return"*"===e?function(){return!0}:function(e){return e.nodeName&&e.nodeName.toLowerCase()===t}},CLASS:function(e){var t=m[e+" "];return t||(t=new RegExp("(^|"+M+")"+e+"("+M+"|$)"))&&m(e,function(e){return t.test("string"==typeof e.className&&e.className||"undefined"!=typeof e.getAttribute&&e.getAttribute("class")||"")})},ATTR:function(n,r,i){return function(e){var t=se.attr(e,n);return null==t?"!="===r:!r||(t+="","="===r?t===i:"!="===r?t!==i:"^="===r?i&&0===t.indexOf(i):"*="===r?i&&-1:\x20\t\r\n\f]*)[\x20\t\r\n\f]*\/?>(?:<\/\1>|)$/i;function j(e,n,r){return m(n)?S.grep(e,function(e,t){return!!n.call(e,t,e)!==r}):n.nodeType?S.grep(e,function(e){return e===n!==r}):"string"!=typeof n?S.grep(e,function(e){return-1)[^>]*|#([\w-]+))$/;(S.fn.init=function(e,t,n){var r,i;if(!e)return this;if(n=n||D,"string"==typeof e){if(!(r="<"===e[0]&&">"===e[e.length-1]&&3<=e.length?[null,e,null]:q.exec(e))||!r[1]&&t)return!t||t.jquery?(t||n).find(e):this.constructor(t).find(e);if(r[1]){if(t=t instanceof S?t[0]:t,S.merge(this,S.parseHTML(r[1],t&&t.nodeType?t.ownerDocument||t:E,!0)),N.test(r[1])&&S.isPlainObject(t))for(r in t)m(this[r])?this[r](t[r]):this.attr(r,t[r]);return this}return(i=E.getElementById(r[2]))&&(this[0]=i,this.length=1),this}return 
e.nodeType?(this[0]=e,this.length=1,this):m(e)?void 0!==n.ready?n.ready(e):e(S):S.makeArray(e,this)}).prototype=S.fn,D=S(E);var L=/^(?:parents|prev(?:Until|All))/,H={children:!0,contents:!0,next:!0,prev:!0};function O(e,t){while((e=e[t])&&1!==e.nodeType);return e}S.fn.extend({has:function(e){var t=S(e,this),n=t.length;return this.filter(function(){for(var e=0;e\x20\t\r\n\f]*)/i,he=/^$|^module$|\/(?:java|ecma)script/i;ce=E.createDocumentFragment().appendChild(E.createElement("div")),(fe=E.createElement("input")).setAttribute("type","radio"),fe.setAttribute("checked","checked"),fe.setAttribute("name","t"),ce.appendChild(fe),y.checkClone=ce.cloneNode(!0).cloneNode(!0).lastChild.checked,ce.innerHTML="",y.noCloneChecked=!!ce.cloneNode(!0).lastChild.defaultValue,ce.innerHTML="",y.option=!!ce.lastChild;var ge={thead:[1,"","
"],col:[2,"","
"],tr:[2,"","
"],td:[3,"","
"],_default:[0,"",""]};function ve(e,t){var n;return n="undefined"!=typeof e.getElementsByTagName?e.getElementsByTagName(t||"*"):"undefined"!=typeof e.querySelectorAll?e.querySelectorAll(t||"*"):[],void 0===t||t&&A(e,t)?S.merge([e],n):n}function ye(e,t){for(var n=0,r=e.length;n",""]);var me=/<|&#?\w+;/;function xe(e,t,n,r,i){for(var o,a,s,u,l,c,f=t.createDocumentFragment(),p=[],d=0,h=e.length;d\s*$/g;function je(e,t){return A(e,"table")&&A(11!==t.nodeType?t:t.firstChild,"tr")&&S(e).children("tbody")[0]||e}function De(e){return e.type=(null!==e.getAttribute("type"))+"/"+e.type,e}function qe(e){return"true/"===(e.type||"").slice(0,5)?e.type=e.type.slice(5):e.removeAttribute("type"),e}function Le(e,t){var n,r,i,o,a,s;if(1===t.nodeType){if(Y.hasData(e)&&(s=Y.get(e).events))for(i in Y.remove(t,"handle events"),s)for(n=0,r=s[i].length;n").attr(n.scriptAttrs||{}).prop({charset:n.scriptCharset,src:n.url}).on("load error",i=function(e){r.remove(),i=null,e&&t("error"===e.type?404:200,e.type)}),E.head.appendChild(r[0])},abort:function(){i&&i()}}});var _t,zt=[],Ut=/(=)\?(?=&|$)|\?\?/;S.ajaxSetup({jsonp:"callback",jsonpCallback:function(){var e=zt.pop()||S.expando+"_"+wt.guid++;return this[e]=!0,e}}),S.ajaxPrefilter("json jsonp",function(e,t,n){var r,i,o,a=!1!==e.jsonp&&(Ut.test(e.url)?"url":"string"==typeof e.data&&0===(e.contentType||"").indexOf("application/x-www-form-urlencoded")&&Ut.test(e.data)&&"data");if(a||"jsonp"===e.dataTypes[0])return r=e.jsonpCallback=m(e.jsonpCallback)?e.jsonpCallback():e.jsonpCallback,a?e[a]=e[a].replace(Ut,"$1"+r):!1!==e.jsonp&&(e.url+=(Tt.test(e.url)?"&":"?")+e.jsonp+"="+r),e.converters["script json"]=function(){return o||S.error(r+" was not called"),o[0]},e.dataTypes[0]="json",i=C[r],C[r]=function(){o=arguments},n.always(function(){void 0===i?S(C).removeProp(r):C[r]=i,e[r]&&(e.jsonpCallback=t.jsonpCallback,zt.push(r)),o&&m(i)&&i(o[0]),o=i=void 0}),"script"}),y.createHTMLDocument=((_t=E.implementation.createHTMLDocument("").body).innerHTML="
",2===_t.childNodes.length),S.parseHTML=function(e,t,n){return"string"!=typeof e?[]:("boolean"==typeof t&&(n=t,t=!1),t||(y.createHTMLDocument?((r=(t=E.implementation.createHTMLDocument("")).createElement("base")).href=E.location.href,t.head.appendChild(r)):t=E),o=!n&&[],(i=N.exec(e))?[t.createElement(i[1])]:(i=xe([e],t,o),o&&o.length&&S(o).remove(),S.merge([],i.childNodes)));var r,i,o},S.fn.load=function(e,t,n){var r,i,o,a=this,s=e.indexOf(" ");return-1").append(S.parseHTML(e)).find(r):e)}).always(n&&function(e,t){a.each(function(){n.apply(this,o||[e.responseText,t,e])})}),this},S.expr.pseudos.animated=function(t){return S.grep(S.timers,function(e){return t===e.elem}).length},S.offset={setOffset:function(e,t,n){var r,i,o,a,s,u,l=S.css(e,"position"),c=S(e),f={};"static"===l&&(e.style.position="relative"),s=c.offset(),o=S.css(e,"top"),u=S.css(e,"left"),("absolute"===l||"fixed"===l)&&-1<(o+u).indexOf("auto")?(a=(r=c.position()).top,i=r.left):(a=parseFloat(o)||0,i=parseFloat(u)||0),m(t)&&(t=t.call(e,n,S.extend({},s))),null!=t.top&&(f.top=t.top-s.top+a),null!=t.left&&(f.left=t.left-s.left+i),"using"in t?t.using.call(e,f):c.css(f)}},S.fn.extend({offset:function(t){if(arguments.length)return void 0===t?this:this.each(function(e){S.offset.setOffset(this,t,e)});var e,n,r=this[0];return r?r.getClientRects().length?(e=r.getBoundingClientRect(),n=r.ownerDocument.defaultView,{top:e.top+n.pageYOffset,left:e.left+n.pageXOffset}):{top:0,left:0}:void 0},position:function(){if(this[0]){var 
e,t,n,r=this[0],i={top:0,left:0};if("fixed"===S.css(r,"position"))t=r.getBoundingClientRect();else{t=this.offset(),n=r.ownerDocument,e=r.offsetParent||n.documentElement;while(e&&(e===n.body||e===n.documentElement)&&"static"===S.css(e,"position"))e=e.parentNode;e&&e!==r&&1===e.nodeType&&((i=S(e).offset()).top+=S.css(e,"borderTopWidth",!0),i.left+=S.css(e,"borderLeftWidth",!0))}return{top:t.top-i.top-S.css(r,"marginTop",!0),left:t.left-i.left-S.css(r,"marginLeft",!0)}}},offsetParent:function(){return this.map(function(){var e=this.offsetParent;while(e&&"static"===S.css(e,"position"))e=e.offsetParent;return e||re})}}),S.each({scrollLeft:"pageXOffset",scrollTop:"pageYOffset"},function(t,i){var o="pageYOffset"===i;S.fn[t]=function(e){return $(this,function(e,t,n){var r;if(x(e)?r=e:9===e.nodeType&&(r=e.defaultView),void 0===n)return r?r[i]:e[t];r?r.scrollTo(o?r.pageXOffset:n,o?n:r.pageYOffset):e[t]=n},t,e,arguments.length)}}),S.each(["top","left"],function(e,n){S.cssHooks[n]=Fe(y.pixelPosition,function(e,t){if(t)return t=We(e,n),Pe.test(t)?S(e).position()[n]+"px":t})}),S.each({Height:"height",Width:"width"},function(a,s){S.each({padding:"inner"+a,content:s,"":"outer"+a},function(r,o){S.fn[o]=function(e,t){var n=arguments.length&&(r||"boolean"!=typeof e),i=r||(!0===e||!0===t?"margin":"border");return $(this,function(e,t,n){var r;return x(e)?0===o.indexOf("outer")?e["inner"+a]:e.document.documentElement["client"+a]:9===e.nodeType?(r=e.documentElement,Math.max(e.body["scroll"+a],r["scroll"+a],e.body["offset"+a],r["offset"+a],r["client"+a])):void 0===n?S.css(e,t,i):S.style(e,t,n,i)},s,n?e:void 0,n)}})}),S.each(["ajaxStart","ajaxStop","ajaxComplete","ajaxError","ajaxSuccess","ajaxSend"],function(e,t){S.fn[t]=function(e){return this.on(t,e)}}),S.fn.extend({bind:function(e,t,n){return this.on(e,null,t,n)},unbind:function(e,t){return this.off(e,null,t)},delegate:function(e,t,n,r){return this.on(t,e,n,r)},undelegate:function(e,t,n){return 
1===arguments.length?this.off(e,"**"):this.off(t,e||"**",n)},hover:function(e,t){return this.mouseenter(e).mouseleave(t||e)}}),S.each("blur focus focusin focusout resize scroll click dblclick mousedown mouseup mousemove mouseover mouseout mouseenter mouseleave change select submit keydown keypress keyup contextmenu".split(" "),function(e,n){S.fn[n]=function(e,t){return 0",d.insertBefore(c.lastChild,d.firstChild)}function d(){var a=y.elements;return"string"==typeof a?a.split(" "):a}function e(a,b){var c=y.elements;"string"!=typeof c&&(c=c.join(" ")),"string"!=typeof a&&(a=a.join(" ")),y.elements=c+" "+a,j(b)}function f(a){var b=x[a[v]];return b||(b={},w++,a[v]=w,x[w]=b),b}function g(a,c,d){if(c||(c=b),q)return c.createElement(a);d||(d=f(c));var e;return e=d.cache[a]?d.cache[a].cloneNode():u.test(a)?(d.cache[a]=d.createElem(a)).cloneNode():d.createElem(a),!e.canHaveChildren||t.test(a)||e.tagUrn?e:d.frag.appendChild(e)}function h(a,c){if(a||(a=b),q)return a.createDocumentFragment();c=c||f(a);for(var e=c.frag.cloneNode(),g=0,h=d(),i=h.length;i>g;g++)e.createElement(h[g]);return e}function i(a,b){b.cache||(b.cache={},b.createElem=a.createElement,b.createFrag=a.createDocumentFragment,b.frag=b.createFrag()),a.createElement=function(c){return y.shivMethods?g(c,a,b):b.createElem(c)},a.createDocumentFragment=Function("h,f","return function(){var n=f.cloneNode(),c=n.createElement;h.shivMethods&&("+d().join().replace(/[\w\-:]+/g,function(a){return b.createElem(a),b.frag.createElement(a),'c("'+a+'")'})+");return n}")(y,b.frag)}function j(a){a||(a=b);var d=f(a);return!y.shivCSS||p||d.hasCSS||(d.hasCSS=!!c(a,"article,aside,dialog,figcaption,figure,footer,header,hgroup,main,nav,section{display:block}mark{background:#FF0;color:#000}template{display:none}")),q||i(a,d),a}function k(a){for(var b,c=a.getElementsByTagName("*"),e=c.length,f=RegExp("^(?:"+d().join("|")+")$","i"),g=[];e--;)b=c[e],f.test(b.nodeName)&&g.push(b.applyElement(l(b)));return g}function l(a){for(var 
b,c=a.attributes,d=c.length,e=a.ownerDocument.createElement(A+":"+a.nodeName);d--;)b=c[d],b.specified&&e.setAttribute(b.nodeName,b.nodeValue);return e.style.cssText=a.style.cssText,e}function m(a){for(var b,c=a.split("{"),e=c.length,f=RegExp("(^|[\\s,>+~])("+d().join("|")+")(?=[[\\s,>+~#.:]|$)","gi"),g="$1"+A+"\\:$2";e--;)b=c[e]=c[e].split("}"),b[b.length-1]=b[b.length-1].replace(f,g),c[e]=b.join("}");return c.join("{")}function n(a){for(var b=a.length;b--;)a[b].removeNode()}function o(a){function b(){clearTimeout(g._removeSheetTimer),d&&d.removeNode(!0),d=null}var d,e,g=f(a),h=a.namespaces,i=a.parentWindow;return!B||a.printShived?a:("undefined"==typeof h[A]&&h.add(A),i.attachEvent("onbeforeprint",function(){b();for(var f,g,h,i=a.styleSheets,j=[],l=i.length,n=Array(l);l--;)n[l]=i[l];for(;h=n.pop();)if(!h.disabled&&z.test(h.media)){try{f=h.imports,g=f.length}catch(o){g=0}for(l=0;g>l;l++)n.push(f[l]);try{j.push(h.cssText)}catch(o){}}j=m(j.reverse().join("")),e=k(a),d=c(a,j)}),i.attachEvent("onafterprint",function(){n(e),clearTimeout(g._removeSheetTimer),g._removeSheetTimer=setTimeout(b,500)}),a.printShived=!0,a)}var p,q,r="3.7.3",s=a.html5||{},t=/^<|^(?:button|map|select|textarea|object|iframe|option|optgroup)$/i,u=/^(?:a|b|code|div|fieldset|h1|h2|h3|h4|h5|h6|i|label|li|ol|p|q|span|strong|style|table|tbody|td|th|tr|ul)$/i,v="_html5shiv",w=0,x={};!function(){try{var a=b.createElement("a");a.innerHTML="",p="hidden"in a,q=1==a.childNodes.length||function(){b.createElement("a");var a=b.createDocumentFragment();return"undefined"==typeof a.cloneNode||"undefined"==typeof a.createDocumentFragment||"undefined"==typeof a.createElement}()}catch(c){p=!0,q=!0}}();var y={elements:s.elements||"abbr article aside audio bdi canvas data datalist details dialog figcaption figure footer header hgroup main mark meter nav output picture progress section summary template time 
video",version:r,shivCSS:s.shivCSS!==!1,supportsUnknownElements:q,shivMethods:s.shivMethods!==!1,type:"default",shivDocument:j,createElement:g,createDocumentFragment:h,addElements:e};a.html5=y,j(b);var z=/^$|\b(?:all|print)\b/,A="html5shiv",B=!q&&function(){var c=b.documentElement;return!("undefined"==typeof b.namespaces||"undefined"==typeof b.parentWindow||"undefined"==typeof c.applyElement||"undefined"==typeof c.removeNode||"undefined"==typeof a.attachEvent)}();y.type+=" print",y.shivPrint=o,o(b),"object"==typeof module&&module.exports&&(module.exports=y)}("undefined"!=typeof window?window:this,document); \ No newline at end of file diff --git a/_static/js/html5shiv.min.js b/_static/js/html5shiv.min.js new file mode 100644 index 00000000..cd1c674f --- /dev/null +++ b/_static/js/html5shiv.min.js @@ -0,0 +1,4 @@ +/** +* @preserve HTML5 Shiv 3.7.3 | @afarkas @jdalton @jon_neal @rem | MIT/GPL2 Licensed +*/ +!function(a,b){function c(a,b){var c=a.createElement("p"),d=a.getElementsByTagName("head")[0]||a.documentElement;return c.innerHTML="x",d.insertBefore(c.lastChild,d.firstChild)}function d(){var a=t.elements;return"string"==typeof a?a.split(" "):a}function e(a,b){var c=t.elements;"string"!=typeof c&&(c=c.join(" ")),"string"!=typeof a&&(a=a.join(" ")),t.elements=c+" "+a,j(b)}function f(a){var b=s[a[q]];return b||(b={},r++,a[q]=r,s[r]=b),b}function g(a,c,d){if(c||(c=b),l)return c.createElement(a);d||(d=f(c));var e;return e=d.cache[a]?d.cache[a].cloneNode():p.test(a)?(d.cache[a]=d.createElem(a)).cloneNode():d.createElem(a),!e.canHaveChildren||o.test(a)||e.tagUrn?e:d.frag.appendChild(e)}function h(a,c){if(a||(a=b),l)return a.createDocumentFragment();c=c||f(a);for(var e=c.frag.cloneNode(),g=0,h=d(),i=h.length;i>g;g++)e.createElement(h[g]);return e}function i(a,b){b.cache||(b.cache={},b.createElem=a.createElement,b.createFrag=a.createDocumentFragment,b.frag=b.createFrag()),a.createElement=function(c){return 
t.shivMethods?g(c,a,b):b.createElem(c)},a.createDocumentFragment=Function("h,f","return function(){var n=f.cloneNode(),c=n.createElement;h.shivMethods&&("+d().join().replace(/[\w\-:]+/g,function(a){return b.createElem(a),b.frag.createElement(a),'c("'+a+'")'})+");return n}")(t,b.frag)}function j(a){a||(a=b);var d=f(a);return!t.shivCSS||k||d.hasCSS||(d.hasCSS=!!c(a,"article,aside,dialog,figcaption,figure,footer,header,hgroup,main,nav,section{display:block}mark{background:#FF0;color:#000}template{display:none}")),l||i(a,d),a}var k,l,m="3.7.3-pre",n=a.html5||{},o=/^<|^(?:button|map|select|textarea|object|iframe|option|optgroup)$/i,p=/^(?:a|b|code|div|fieldset|h1|h2|h3|h4|h5|h6|i|label|li|ol|p|q|span|strong|style|table|tbody|td|th|tr|ul)$/i,q="_html5shiv",r=0,s={};!function(){try{var a=b.createElement("a");a.innerHTML="",k="hidden"in a,l=1==a.childNodes.length||function(){b.createElement("a");var a=b.createDocumentFragment();return"undefined"==typeof a.cloneNode||"undefined"==typeof a.createDocumentFragment||"undefined"==typeof a.createElement}()}catch(c){k=!0,l=!0}}();var t={elements:n.elements||"abbr article aside audio bdi canvas data datalist details dialog figcaption figure footer header hgroup main mark meter nav output picture progress section summary template time video",version:m,shivCSS:n.shivCSS!==!1,supportsUnknownElements:l,shivMethods:n.shivMethods!==!1,type:"default",shivDocument:j,createElement:g,createDocumentFragment:h,addElements:e};a.html5=t,j(b),"object"==typeof module&&module.exports&&(module.exports=t)}("undefined"!=typeof window?window:this,document); \ No newline at end of file diff --git a/_static/js/theme.js b/_static/js/theme.js new file mode 100644 index 00000000..1fddb6ee --- /dev/null +++ b/_static/js/theme.js @@ -0,0 +1 @@ +!function(n){var e={};function t(i){if(e[i])return e[i].exports;var o=e[i]={i:i,l:!1,exports:{}};return 
n[i].call(o.exports,o,o.exports,t),o.l=!0,o.exports}t.m=n,t.c=e,t.d=function(n,e,i){t.o(n,e)||Object.defineProperty(n,e,{enumerable:!0,get:i})},t.r=function(n){"undefined"!=typeof Symbol&&Symbol.toStringTag&&Object.defineProperty(n,Symbol.toStringTag,{value:"Module"}),Object.defineProperty(n,"__esModule",{value:!0})},t.t=function(n,e){if(1&e&&(n=t(n)),8&e)return n;if(4&e&&"object"==typeof n&&n&&n.__esModule)return n;var i=Object.create(null);if(t.r(i),Object.defineProperty(i,"default",{enumerable:!0,value:n}),2&e&&"string"!=typeof n)for(var o in n)t.d(i,o,function(e){return n[e]}.bind(null,o));return i},t.n=function(n){var e=n&&n.__esModule?function(){return n.default}:function(){return n};return t.d(e,"a",e),e},t.o=function(n,e){return Object.prototype.hasOwnProperty.call(n,e)},t.p="",t(t.s=0)}([function(n,e,t){t(1),n.exports=t(3)},function(n,e,t){(function(){var e="undefined"!=typeof window?window.jQuery:t(2);n.exports.ThemeNav={navBar:null,win:null,winScroll:!1,winResize:!1,linkScroll:!1,winPosition:0,winHeight:null,docHeight:null,isRunning:!1,enable:function(n){var t=this;void 0===n&&(n=!0),t.isRunning||(t.isRunning=!0,e((function(e){t.init(e),t.reset(),t.win.on("hashchange",t.reset),n&&t.win.on("scroll",(function(){t.linkScroll||t.winScroll||(t.winScroll=!0,requestAnimationFrame((function(){t.onScroll()})))})),t.win.on("resize",(function(){t.winResize||(t.winResize=!0,requestAnimationFrame((function(){t.onResize()})))})),t.onResize()})))},enableSticky:function(){this.enable(!0)},init:function(n){n(document);var e=this;this.navBar=n("div.wy-side-scroll:first"),this.win=n(window),n(document).on("click","[data-toggle='wy-nav-top']",(function(){n("[data-toggle='wy-nav-shift']").toggleClass("shift"),n("[data-toggle='rst-versions']").toggleClass("shift")})).on("click",".wy-menu-vertical .current ul li a",(function(){var 
t=n(this);n("[data-toggle='wy-nav-shift']").removeClass("shift"),n("[data-toggle='rst-versions']").toggleClass("shift"),e.toggleCurrent(t),e.hashChange()})).on("click","[data-toggle='rst-current-version']",(function(){n("[data-toggle='rst-versions']").toggleClass("shift-up")})),n("table.docutils:not(.field-list,.footnote,.citation)").wrap("
"),n("table.docutils.footnote").wrap("
"),n("table.docutils.citation").wrap("
"),n(".wy-menu-vertical ul").not(".simple").siblings("a").each((function(){var t=n(this);expand=n(''),expand.on("click",(function(n){return e.toggleCurrent(t),n.stopPropagation(),!1})),t.prepend(expand)}))},reset:function(){var n=encodeURI(window.location.hash)||"#";try{var e=$(".wy-menu-vertical"),t=e.find('[href="'+n+'"]');if(0===t.length){var i=$('.document [id="'+n.substring(1)+'"]').closest("div.section");0===(t=e.find('[href="#'+i.attr("id")+'"]')).length&&(t=e.find('[href="#"]'))}if(t.length>0){$(".wy-menu-vertical .current").removeClass("current").attr("aria-expanded","false"),t.addClass("current").attr("aria-expanded","true"),t.closest("li.toctree-l1").parent().addClass("current").attr("aria-expanded","true");for(let n=1;n<=10;n++)t.closest("li.toctree-l"+n).addClass("current").attr("aria-expanded","true");t[0].scrollIntoView()}}catch(n){console.log("Error expanding nav for anchor",n)}},onScroll:function(){this.winScroll=!1;var n=this.win.scrollTop(),e=n+this.winHeight,t=this.navBar.scrollTop()+(n-this.winPosition);n<0||e>this.docHeight||(this.navBar.scrollTop(t),this.winPosition=n)},onResize:function(){this.winResize=!1,this.winHeight=this.win.height(),this.docHeight=$(document).height()},hashChange:function(){this.linkScroll=!0,this.win.one("hashchange",(function(){this.linkScroll=!1}))},toggleCurrent:function(n){var e=n.closest("li");e.siblings("li.current").removeClass("current").attr("aria-expanded","false"),e.siblings().find("li.current").removeClass("current").attr("aria-expanded","false");var t=e.find("> ul li");t.length&&(t.removeClass("current").attr("aria-expanded","false"),e.toggleClass("current").attr("aria-expanded",(function(n,e){return"true"==e?"false":"true"})))}},"undefined"!=typeof window&&(window.SphinxRtdTheme={Navigation:n.exports.ThemeNav,StickyNav:n.exports.ThemeNav}),function(){for(var n=0,e=["ms","moz","webkit","o"],t=0;t0 + var meq1 = "^(" + C + ")?" + V + C + "(" + V + ")?$"; // [C]VC[V] is m=1 + var mgr1 = "^(" + C + ")?" 
+ V + C + V + C; // [C]VCVC... is m>1 + var s_v = "^(" + C + ")?" + v; // vowel in stem + + this.stemWord = function (w) { + var stem; + var suffix; + var firstch; + var origword = w; + + if (w.length < 3) + return w; + + var re; + var re2; + var re3; + var re4; + + firstch = w.substr(0,1); + if (firstch == "y") + w = firstch.toUpperCase() + w.substr(1); + + // Step 1a + re = /^(.+?)(ss|i)es$/; + re2 = /^(.+?)([^s])s$/; + + if (re.test(w)) + w = w.replace(re,"$1$2"); + else if (re2.test(w)) + w = w.replace(re2,"$1$2"); + + // Step 1b + re = /^(.+?)eed$/; + re2 = /^(.+?)(ed|ing)$/; + if (re.test(w)) { + var fp = re.exec(w); + re = new RegExp(mgr0); + if (re.test(fp[1])) { + re = /.$/; + w = w.replace(re,""); + } + } + else if (re2.test(w)) { + var fp = re2.exec(w); + stem = fp[1]; + re2 = new RegExp(s_v); + if (re2.test(stem)) { + w = stem; + re2 = /(at|bl|iz)$/; + re3 = new RegExp("([^aeiouylsz])\\1$"); + re4 = new RegExp("^" + C + v + "[^aeiouwxy]$"); + if (re2.test(w)) + w = w + "e"; + else if (re3.test(w)) { + re = /.$/; + w = w.replace(re,""); + } + else if (re4.test(w)) + w = w + "e"; + } + } + + // Step 1c + re = /^(.+?)y$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + re = new RegExp(s_v); + if (re.test(stem)) + w = stem + "i"; + } + + // Step 2 + re = /^(.+?)(ational|tional|enci|anci|izer|bli|alli|entli|eli|ousli|ization|ation|ator|alism|iveness|fulness|ousness|aliti|iviti|biliti|logi)$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + suffix = fp[2]; + re = new RegExp(mgr0); + if (re.test(stem)) + w = stem + step2list[suffix]; + } + + // Step 3 + re = /^(.+?)(icate|ative|alize|iciti|ical|ful|ness)$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + suffix = fp[2]; + re = new RegExp(mgr0); + if (re.test(stem)) + w = stem + step3list[suffix]; + } + + // Step 4 + re = /^(.+?)(al|ance|ence|er|ic|able|ible|ant|ement|ment|ent|ou|ism|ate|iti|ous|ive|ize)$/; + re2 = /^(.+?)(s|t)(ion)$/; + if (re.test(w)) { + var fp = 
re.exec(w); + stem = fp[1]; + re = new RegExp(mgr1); + if (re.test(stem)) + w = stem; + } + else if (re2.test(w)) { + var fp = re2.exec(w); + stem = fp[1] + fp[2]; + re2 = new RegExp(mgr1); + if (re2.test(stem)) + w = stem; + } + + // Step 5 + re = /^(.+?)e$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + re = new RegExp(mgr1); + re2 = new RegExp(meq1); + re3 = new RegExp("^" + C + v + "[^aeiouwxy]$"); + if (re.test(stem) || (re2.test(stem) && !(re3.test(stem)))) + w = stem; + } + re = /ll$/; + re2 = new RegExp(mgr1); + if (re.test(w) && re2.test(w)) { + re = /.$/; + w = w.replace(re,""); + } + + // and turn initial Y back to y + if (firstch == "y") + w = firstch.toLowerCase() + w.substr(1); + return w; + } +} + diff --git a/_static/minus.png b/_static/minus.png new file mode 100644 index 00000000..d96755fd Binary files /dev/null and b/_static/minus.png differ diff --git a/_static/nbsphinx-broken-thumbnail.svg b/_static/nbsphinx-broken-thumbnail.svg new file mode 100644 index 00000000..4919ca88 --- /dev/null +++ b/_static/nbsphinx-broken-thumbnail.svg @@ -0,0 +1,9 @@ + + + + diff --git a/_static/nbsphinx-code-cells.css b/_static/nbsphinx-code-cells.css new file mode 100644 index 00000000..a3fb27c3 --- /dev/null +++ b/_static/nbsphinx-code-cells.css @@ -0,0 +1,259 @@ +/* remove conflicting styling from Sphinx themes */ +div.nbinput.container div.prompt *, +div.nboutput.container div.prompt *, +div.nbinput.container div.input_area pre, +div.nboutput.container div.output_area pre, +div.nbinput.container div.input_area .highlight, +div.nboutput.container div.output_area .highlight { + border: none; + padding: 0; + margin: 0; + box-shadow: none; +} + +div.nbinput.container > div[class*=highlight], +div.nboutput.container > div[class*=highlight] { + margin: 0; +} + +div.nbinput.container div.prompt *, +div.nboutput.container div.prompt * { + background: none; +} + +div.nboutput.container div.output_area .highlight, +div.nboutput.container div.output_area 
pre { + background: unset; +} + +div.nboutput.container div.output_area div.highlight { + color: unset; /* override Pygments text color */ +} + +/* avoid gaps between output lines */ +div.nboutput.container div[class*=highlight] pre { + line-height: normal; +} + +/* input/output containers */ +div.nbinput.container, +div.nboutput.container { + display: -webkit-flex; + display: flex; + align-items: flex-start; + margin: 0; + width: 100%; +} +@media (max-width: 540px) { + div.nbinput.container, + div.nboutput.container { + flex-direction: column; + } +} + +/* input container */ +div.nbinput.container { + padding-top: 5px; +} + +/* last container */ +div.nblast.container { + padding-bottom: 5px; +} + +/* input prompt */ +div.nbinput.container div.prompt pre, +/* for sphinx_immaterial theme: */ +div.nbinput.container div.prompt pre > code { + color: #307FC1; +} + +/* output prompt */ +div.nboutput.container div.prompt pre, +/* for sphinx_immaterial theme: */ +div.nboutput.container div.prompt pre > code { + color: #BF5B3D; +} + +/* all prompts */ +div.nbinput.container div.prompt, +div.nboutput.container div.prompt { + width: 4.5ex; + padding-top: 5px; + position: relative; + user-select: none; +} + +div.nbinput.container div.prompt > div, +div.nboutput.container div.prompt > div { + position: absolute; + right: 0; + margin-right: 0.3ex; +} + +@media (max-width: 540px) { + div.nbinput.container div.prompt, + div.nboutput.container div.prompt { + width: unset; + text-align: left; + padding: 0.4em; + } + div.nboutput.container div.prompt.empty { + padding: 0; + } + + div.nbinput.container div.prompt > div, + div.nboutput.container div.prompt > div { + position: unset; + } +} + +/* disable scrollbars and line breaks on prompts */ +div.nbinput.container div.prompt pre, +div.nboutput.container div.prompt pre { + overflow: hidden; + white-space: pre; +} + +/* input/output area */ +div.nbinput.container div.input_area, +div.nboutput.container div.output_area { + -webkit-flex: 
1; + flex: 1; + overflow: auto; +} +@media (max-width: 540px) { + div.nbinput.container div.input_area, + div.nboutput.container div.output_area { + width: 100%; + } +} + +/* input area */ +div.nbinput.container div.input_area { + border: 1px solid #e0e0e0; + border-radius: 2px; + /*background: #f5f5f5;*/ +} + +/* override MathJax center alignment in output cells */ +div.nboutput.container div[class*=MathJax] { + text-align: left !important; +} + +/* override sphinx.ext.imgmath center alignment in output cells */ +div.nboutput.container div.math p { + text-align: left; +} + +/* standard error */ +div.nboutput.container div.output_area.stderr { + background: #fdd; +} + +/* ANSI colors */ +.ansi-black-fg { color: #3E424D; } +.ansi-black-bg { background-color: #3E424D; } +.ansi-black-intense-fg { color: #282C36; } +.ansi-black-intense-bg { background-color: #282C36; } +.ansi-red-fg { color: #E75C58; } +.ansi-red-bg { background-color: #E75C58; } +.ansi-red-intense-fg { color: #B22B31; } +.ansi-red-intense-bg { background-color: #B22B31; } +.ansi-green-fg { color: #00A250; } +.ansi-green-bg { background-color: #00A250; } +.ansi-green-intense-fg { color: #007427; } +.ansi-green-intense-bg { background-color: #007427; } +.ansi-yellow-fg { color: #DDB62B; } +.ansi-yellow-bg { background-color: #DDB62B; } +.ansi-yellow-intense-fg { color: #B27D12; } +.ansi-yellow-intense-bg { background-color: #B27D12; } +.ansi-blue-fg { color: #208FFB; } +.ansi-blue-bg { background-color: #208FFB; } +.ansi-blue-intense-fg { color: #0065CA; } +.ansi-blue-intense-bg { background-color: #0065CA; } +.ansi-magenta-fg { color: #D160C4; } +.ansi-magenta-bg { background-color: #D160C4; } +.ansi-magenta-intense-fg { color: #A03196; } +.ansi-magenta-intense-bg { background-color: #A03196; } +.ansi-cyan-fg { color: #60C6C8; } +.ansi-cyan-bg { background-color: #60C6C8; } +.ansi-cyan-intense-fg { color: #258F8F; } +.ansi-cyan-intense-bg { background-color: #258F8F; } +.ansi-white-fg { color: #C5C1B4; 
} +.ansi-white-bg { background-color: #C5C1B4; } +.ansi-white-intense-fg { color: #A1A6B2; } +.ansi-white-intense-bg { background-color: #A1A6B2; } + +.ansi-default-inverse-fg { color: #FFFFFF; } +.ansi-default-inverse-bg { background-color: #000000; } + +.ansi-bold { font-weight: bold; } +.ansi-underline { text-decoration: underline; } + + +div.nbinput.container div.input_area div[class*=highlight] > pre, +div.nboutput.container div.output_area div[class*=highlight] > pre, +div.nboutput.container div.output_area div[class*=highlight].math, +div.nboutput.container div.output_area.rendered_html, +div.nboutput.container div.output_area > div.output_javascript, +div.nboutput.container div.output_area:not(.rendered_html) > img{ + padding: 5px; + margin: 0; +} + +/* fix copybtn overflow problem in chromium (needed for 'sphinx_copybutton') */ +div.nbinput.container div.input_area > div[class^='highlight'], +div.nboutput.container div.output_area > div[class^='highlight']{ + overflow-y: hidden; +} + +/* hide copy button on prompts for 'sphinx_copybutton' extension ... */ +.prompt .copybtn, +/* ... 
and 'sphinx_immaterial' theme */ +.prompt .md-clipboard.md-icon { + display: none; +} + +/* Some additional styling taken form the Jupyter notebook CSS */ +.jp-RenderedHTMLCommon table, +div.rendered_html table { + border: none; + border-collapse: collapse; + border-spacing: 0; + color: black; + font-size: 12px; + table-layout: fixed; +} +.jp-RenderedHTMLCommon thead, +div.rendered_html thead { + border-bottom: 1px solid black; + vertical-align: bottom; +} +.jp-RenderedHTMLCommon tr, +.jp-RenderedHTMLCommon th, +.jp-RenderedHTMLCommon td, +div.rendered_html tr, +div.rendered_html th, +div.rendered_html td { + text-align: right; + vertical-align: middle; + padding: 0.5em 0.5em; + line-height: normal; + white-space: normal; + max-width: none; + border: none; +} +.jp-RenderedHTMLCommon th, +div.rendered_html th { + font-weight: bold; +} +.jp-RenderedHTMLCommon tbody tr:nth-child(odd), +div.rendered_html tbody tr:nth-child(odd) { + background: #f5f5f5; +} +.jp-RenderedHTMLCommon tbody tr:hover, +div.rendered_html tbody tr:hover { + background: rgba(66, 165, 245, 0.2); +} + diff --git a/_static/nbsphinx-gallery.css b/_static/nbsphinx-gallery.css new file mode 100644 index 00000000..365c27a9 --- /dev/null +++ b/_static/nbsphinx-gallery.css @@ -0,0 +1,31 @@ +.nbsphinx-gallery { + display: grid; + grid-template-columns: repeat(auto-fill, minmax(160px, 1fr)); + gap: 5px; + margin-top: 1em; + margin-bottom: 1em; +} + +.nbsphinx-gallery > a { + padding: 5px; + border: 1px dotted currentColor; + border-radius: 2px; + text-align: center; +} + +.nbsphinx-gallery > a:hover { + border-style: solid; +} + +.nbsphinx-gallery img { + max-width: 100%; + max-height: 100%; +} + +.nbsphinx-gallery > a > div:first-child { + display: flex; + align-items: start; + justify-content: center; + height: 120px; + margin-bottom: 5px; +} diff --git a/_static/nbsphinx-no-thumbnail.svg b/_static/nbsphinx-no-thumbnail.svg new file mode 100644 index 00000000..9dca7588 --- /dev/null +++ 
b/_static/nbsphinx-no-thumbnail.svg @@ -0,0 +1,9 @@ + + + + diff --git a/_static/plus.png b/_static/plus.png new file mode 100644 index 00000000..7107cec9 Binary files /dev/null and b/_static/plus.png differ diff --git a/_static/pygments.css b/_static/pygments.css new file mode 100644 index 00000000..84ab3030 --- /dev/null +++ b/_static/pygments.css @@ -0,0 +1,75 @@ +pre { line-height: 125%; } +td.linenos .normal { color: inherit; background-color: transparent; padding-left: 5px; padding-right: 5px; } +span.linenos { color: inherit; background-color: transparent; padding-left: 5px; padding-right: 5px; } +td.linenos .special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; } +span.linenos.special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; } +.highlight .hll { background-color: #ffffcc } +.highlight { background: #f8f8f8; } +.highlight .c { color: #3D7B7B; font-style: italic } /* Comment */ +.highlight .err { border: 1px solid #FF0000 } /* Error */ +.highlight .k { color: #008000; font-weight: bold } /* Keyword */ +.highlight .o { color: #666666 } /* Operator */ +.highlight .ch { color: #3D7B7B; font-style: italic } /* Comment.Hashbang */ +.highlight .cm { color: #3D7B7B; font-style: italic } /* Comment.Multiline */ +.highlight .cp { color: #9C6500 } /* Comment.Preproc */ +.highlight .cpf { color: #3D7B7B; font-style: italic } /* Comment.PreprocFile */ +.highlight .c1 { color: #3D7B7B; font-style: italic } /* Comment.Single */ +.highlight .cs { color: #3D7B7B; font-style: italic } /* Comment.Special */ +.highlight .gd { color: #A00000 } /* Generic.Deleted */ +.highlight .ge { font-style: italic } /* Generic.Emph */ +.highlight .ges { font-weight: bold; font-style: italic } /* Generic.EmphStrong */ +.highlight .gr { color: #E40000 } /* Generic.Error */ +.highlight .gh { color: #000080; font-weight: bold } /* Generic.Heading */ +.highlight .gi { color: #008400 } /* Generic.Inserted */ 
+.highlight .go { color: #717171 } /* Generic.Output */ +.highlight .gp { color: #000080; font-weight: bold } /* Generic.Prompt */ +.highlight .gs { font-weight: bold } /* Generic.Strong */ +.highlight .gu { color: #800080; font-weight: bold } /* Generic.Subheading */ +.highlight .gt { color: #0044DD } /* Generic.Traceback */ +.highlight .kc { color: #008000; font-weight: bold } /* Keyword.Constant */ +.highlight .kd { color: #008000; font-weight: bold } /* Keyword.Declaration */ +.highlight .kn { color: #008000; font-weight: bold } /* Keyword.Namespace */ +.highlight .kp { color: #008000 } /* Keyword.Pseudo */ +.highlight .kr { color: #008000; font-weight: bold } /* Keyword.Reserved */ +.highlight .kt { color: #B00040 } /* Keyword.Type */ +.highlight .m { color: #666666 } /* Literal.Number */ +.highlight .s { color: #BA2121 } /* Literal.String */ +.highlight .na { color: #687822 } /* Name.Attribute */ +.highlight .nb { color: #008000 } /* Name.Builtin */ +.highlight .nc { color: #0000FF; font-weight: bold } /* Name.Class */ +.highlight .no { color: #880000 } /* Name.Constant */ +.highlight .nd { color: #AA22FF } /* Name.Decorator */ +.highlight .ni { color: #717171; font-weight: bold } /* Name.Entity */ +.highlight .ne { color: #CB3F38; font-weight: bold } /* Name.Exception */ +.highlight .nf { color: #0000FF } /* Name.Function */ +.highlight .nl { color: #767600 } /* Name.Label */ +.highlight .nn { color: #0000FF; font-weight: bold } /* Name.Namespace */ +.highlight .nt { color: #008000; font-weight: bold } /* Name.Tag */ +.highlight .nv { color: #19177C } /* Name.Variable */ +.highlight .ow { color: #AA22FF; font-weight: bold } /* Operator.Word */ +.highlight .w { color: #bbbbbb } /* Text.Whitespace */ +.highlight .mb { color: #666666 } /* Literal.Number.Bin */ +.highlight .mf { color: #666666 } /* Literal.Number.Float */ +.highlight .mh { color: #666666 } /* Literal.Number.Hex */ +.highlight .mi { color: #666666 } /* Literal.Number.Integer */ +.highlight .mo { 
color: #666666 } /* Literal.Number.Oct */ +.highlight .sa { color: #BA2121 } /* Literal.String.Affix */ +.highlight .sb { color: #BA2121 } /* Literal.String.Backtick */ +.highlight .sc { color: #BA2121 } /* Literal.String.Char */ +.highlight .dl { color: #BA2121 } /* Literal.String.Delimiter */ +.highlight .sd { color: #BA2121; font-style: italic } /* Literal.String.Doc */ +.highlight .s2 { color: #BA2121 } /* Literal.String.Double */ +.highlight .se { color: #AA5D1F; font-weight: bold } /* Literal.String.Escape */ +.highlight .sh { color: #BA2121 } /* Literal.String.Heredoc */ +.highlight .si { color: #A45A77; font-weight: bold } /* Literal.String.Interpol */ +.highlight .sx { color: #008000 } /* Literal.String.Other */ +.highlight .sr { color: #A45A77 } /* Literal.String.Regex */ +.highlight .s1 { color: #BA2121 } /* Literal.String.Single */ +.highlight .ss { color: #19177C } /* Literal.String.Symbol */ +.highlight .bp { color: #008000 } /* Name.Builtin.Pseudo */ +.highlight .fm { color: #0000FF } /* Name.Function.Magic */ +.highlight .vc { color: #19177C } /* Name.Variable.Class */ +.highlight .vg { color: #19177C } /* Name.Variable.Global */ +.highlight .vi { color: #19177C } /* Name.Variable.Instance */ +.highlight .vm { color: #19177C } /* Name.Variable.Magic */ +.highlight .il { color: #666666 } /* Literal.Number.Integer.Long */ \ No newline at end of file diff --git a/_static/searchtools.js b/_static/searchtools.js new file mode 100644 index 00000000..7918c3fa --- /dev/null +++ b/_static/searchtools.js @@ -0,0 +1,574 @@ +/* + * searchtools.js + * ~~~~~~~~~~~~~~~~ + * + * Sphinx JavaScript utilities for the full-text search. + * + * :copyright: Copyright 2007-2023 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. + * + */ +"use strict"; + +/** + * Simple result scoring code. 
+ */ +if (typeof Scorer === "undefined") { + var Scorer = { + // Implement the following function to further tweak the score for each result + // The function takes a result array [docname, title, anchor, descr, score, filename] + // and returns the new score. + /* + score: result => { + const [docname, title, anchor, descr, score, filename] = result + return score + }, + */ + + // query matches the full name of an object + objNameMatch: 11, + // or matches in the last dotted part of the object name + objPartialMatch: 6, + // Additive scores depending on the priority of the object + objPrio: { + 0: 15, // used to be importantResults + 1: 5, // used to be objectResults + 2: -5, // used to be unimportantResults + }, + // Used when the priority is not in the mapping. + objPrioDefault: 0, + + // query found in title + title: 15, + partialTitle: 7, + // query found in terms + term: 5, + partialTerm: 2, + }; +} + +const _removeChildren = (element) => { + while (element && element.lastChild) element.removeChild(element.lastChild); +}; + +/** + * See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#escaping + */ +const _escapeRegExp = (string) => + string.replace(/[.*+\-?^${}()|[\]\\]/g, "\\$&"); // $& means the whole matched string + +const _displayItem = (item, searchTerms, highlightTerms) => { + const docBuilder = DOCUMENTATION_OPTIONS.BUILDER; + const docFileSuffix = DOCUMENTATION_OPTIONS.FILE_SUFFIX; + const docLinkSuffix = DOCUMENTATION_OPTIONS.LINK_SUFFIX; + const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY; + const contentRoot = document.documentElement.dataset.content_root; + + const [docName, title, anchor, descr, score, _filename] = item; + + let listItem = document.createElement("li"); + let requestUrl; + let linkUrl; + if (docBuilder === "dirhtml") { + // dirhtml builder + let dirname = docName + "/"; + if (dirname.match(/\/index\/$/)) + dirname = dirname.substring(0, dirname.length - 6); + else if (dirname 
=== "index/") dirname = ""; + requestUrl = contentRoot + dirname; + linkUrl = requestUrl; + } else { + // normal html builders + requestUrl = contentRoot + docName + docFileSuffix; + linkUrl = docName + docLinkSuffix; + } + let linkEl = listItem.appendChild(document.createElement("a")); + linkEl.href = linkUrl + anchor; + linkEl.dataset.score = score; + linkEl.innerHTML = title; + if (descr) { + listItem.appendChild(document.createElement("span")).innerHTML = + " (" + descr + ")"; + // highlight search terms in the description + if (SPHINX_HIGHLIGHT_ENABLED) // set in sphinx_highlight.js + highlightTerms.forEach((term) => _highlightText(listItem, term, "highlighted")); + } + else if (showSearchSummary) + fetch(requestUrl) + .then((responseData) => responseData.text()) + .then((data) => { + if (data) + listItem.appendChild( + Search.makeSearchSummary(data, searchTerms) + ); + // highlight search terms in the summary + if (SPHINX_HIGHLIGHT_ENABLED) // set in sphinx_highlight.js + highlightTerms.forEach((term) => _highlightText(listItem, term, "highlighted")); + }); + Search.output.appendChild(listItem); +}; +const _finishSearch = (resultCount) => { + Search.stopPulse(); + Search.title.innerText = _("Search Results"); + if (!resultCount) + Search.status.innerText = Documentation.gettext( + "Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories." 
+ ); + else + Search.status.innerText = _( + `Search finished, found ${resultCount} page(s) matching the search query.` + ); +}; +const _displayNextItem = ( + results, + resultCount, + searchTerms, + highlightTerms, +) => { + // results left, load the summary and display it + // this is intended to be dynamic (don't sub resultsCount) + if (results.length) { + _displayItem(results.pop(), searchTerms, highlightTerms); + setTimeout( + () => _displayNextItem(results, resultCount, searchTerms, highlightTerms), + 5 + ); + } + // search finished, update title and status message + else _finishSearch(resultCount); +}; + +/** + * Default splitQuery function. Can be overridden in ``sphinx.search`` with a + * custom function per language. + * + * The regular expression works by splitting the string on consecutive characters + * that are not Unicode letters, numbers, underscores, or emoji characters. + * This is the same as ``\W+`` in Python, preserving the surrogate pair area. + */ +if (typeof splitQuery === "undefined") { + var splitQuery = (query) => query + .split(/[^\p{Letter}\p{Number}_\p{Emoji_Presentation}]+/gu) + .filter(term => term) // remove remaining empty strings +} + +/** + * Search Module + */ +const Search = { + _index: null, + _queued_query: null, + _pulse_status: -1, + + htmlToText: (htmlString) => { + const htmlElement = new DOMParser().parseFromString(htmlString, 'text/html'); + htmlElement.querySelectorAll(".headerlink").forEach((el) => { el.remove() }); + const docContent = htmlElement.querySelector('[role="main"]'); + if (docContent !== undefined) return docContent.textContent; + console.warn( + "Content block not found. Sphinx search tries to obtain it via '[role=main]'. Could you check your theme or template." 
+ ); + return ""; + }, + + init: () => { + const query = new URLSearchParams(window.location.search).get("q"); + document + .querySelectorAll('input[name="q"]') + .forEach((el) => (el.value = query)); + if (query) Search.performSearch(query); + }, + + loadIndex: (url) => + (document.body.appendChild(document.createElement("script")).src = url), + + setIndex: (index) => { + Search._index = index; + if (Search._queued_query !== null) { + const query = Search._queued_query; + Search._queued_query = null; + Search.query(query); + } + }, + + hasIndex: () => Search._index !== null, + + deferQuery: (query) => (Search._queued_query = query), + + stopPulse: () => (Search._pulse_status = -1), + + startPulse: () => { + if (Search._pulse_status >= 0) return; + + const pulse = () => { + Search._pulse_status = (Search._pulse_status + 1) % 4; + Search.dots.innerText = ".".repeat(Search._pulse_status); + if (Search._pulse_status >= 0) window.setTimeout(pulse, 500); + }; + pulse(); + }, + + /** + * perform a search for something (or wait until index is loaded) + */ + performSearch: (query) => { + // create the required interface elements + const searchText = document.createElement("h2"); + searchText.textContent = _("Searching"); + const searchSummary = document.createElement("p"); + searchSummary.classList.add("search-summary"); + searchSummary.innerText = ""; + const searchList = document.createElement("ul"); + searchList.classList.add("search"); + + const out = document.getElementById("search-results"); + Search.title = out.appendChild(searchText); + Search.dots = Search.title.appendChild(document.createElement("span")); + Search.status = out.appendChild(searchSummary); + Search.output = out.appendChild(searchList); + + const searchProgress = document.getElementById("search-progress"); + // Some themes don't use the search progress node + if (searchProgress) { + searchProgress.innerText = _("Preparing search..."); + } + Search.startPulse(); + + // index already loaded, the 
browser was quick! + if (Search.hasIndex()) Search.query(query); + else Search.deferQuery(query); + }, + + /** + * execute search (requires search index to be loaded) + */ + query: (query) => { + const filenames = Search._index.filenames; + const docNames = Search._index.docnames; + const titles = Search._index.titles; + const allTitles = Search._index.alltitles; + const indexEntries = Search._index.indexentries; + + // stem the search terms and add them to the correct list + const stemmer = new Stemmer(); + const searchTerms = new Set(); + const excludedTerms = new Set(); + const highlightTerms = new Set(); + const objectTerms = new Set(splitQuery(query.toLowerCase().trim())); + splitQuery(query.trim()).forEach((queryTerm) => { + const queryTermLower = queryTerm.toLowerCase(); + + // maybe skip this "word" + // stopwords array is from language_data.js + if ( + stopwords.indexOf(queryTermLower) !== -1 || + queryTerm.match(/^\d+$/) + ) + return; + + // stem the word + let word = stemmer.stemWord(queryTermLower); + // select the correct list + if (word[0] === "-") excludedTerms.add(word.substr(1)); + else { + searchTerms.add(word); + highlightTerms.add(queryTermLower); + } + }); + + if (SPHINX_HIGHLIGHT_ENABLED) { // set in sphinx_highlight.js + localStorage.setItem("sphinx_highlight_terms", [...highlightTerms].join(" ")) + } + + // console.debug("SEARCH: searching for:"); + // console.info("required: ", [...searchTerms]); + // console.info("excluded: ", [...excludedTerms]); + + // array of [docname, title, anchor, descr, score, filename] + let results = []; + _removeChildren(document.getElementById("search-progress")); + + const queryLower = query.toLowerCase(); + for (const [title, foundTitles] of Object.entries(allTitles)) { + if (title.toLowerCase().includes(queryLower) && (queryLower.length >= title.length/2)) { + for (const [file, id] of foundTitles) { + let score = Math.round(100 * queryLower.length / title.length) + results.push([ + docNames[file], + 
titles[file] !== title ? `${titles[file]} > ${title}` : title, + id !== null ? "#" + id : "", + null, + score, + filenames[file], + ]); + } + } + } + + // search for explicit entries in index directives + for (const [entry, foundEntries] of Object.entries(indexEntries)) { + if (entry.includes(queryLower) && (queryLower.length >= entry.length/2)) { + for (const [file, id] of foundEntries) { + let score = Math.round(100 * queryLower.length / entry.length) + results.push([ + docNames[file], + titles[file], + id ? "#" + id : "", + null, + score, + filenames[file], + ]); + } + } + } + + // lookup as object + objectTerms.forEach((term) => + results.push(...Search.performObjectSearch(term, objectTerms)) + ); + + // lookup as search terms in fulltext + results.push(...Search.performTermsSearch(searchTerms, excludedTerms)); + + // let the scorer override scores with a custom scoring function + if (Scorer.score) results.forEach((item) => (item[4] = Scorer.score(item))); + + // now sort the results by score (in opposite order of appearance, since the + // display function below uses pop() to retrieve items) and then + // alphabetically + results.sort((a, b) => { + const leftScore = a[4]; + const rightScore = b[4]; + if (leftScore === rightScore) { + // same score: sort alphabetically + const leftTitle = a[1].toLowerCase(); + const rightTitle = b[1].toLowerCase(); + if (leftTitle === rightTitle) return 0; + return leftTitle > rightTitle ? -1 : 1; // inverted is intentional + } + return leftScore > rightScore ? 
1 : -1; + }); + + // remove duplicate search results + // note the reversing of results, so that in the case of duplicates, the highest-scoring entry is kept + let seen = new Set(); + results = results.reverse().reduce((acc, result) => { + let resultStr = result.slice(0, 4).concat([result[5]]).map(v => String(v)).join(','); + if (!seen.has(resultStr)) { + acc.push(result); + seen.add(resultStr); + } + return acc; + }, []); + + results = results.reverse(); + + // for debugging + //Search.lastresults = results.slice(); // a copy + // console.info("search results:", Search.lastresults); + + // print the results + _displayNextItem(results, results.length, searchTerms, highlightTerms); + }, + + /** + * search for object names + */ + performObjectSearch: (object, objectTerms) => { + const filenames = Search._index.filenames; + const docNames = Search._index.docnames; + const objects = Search._index.objects; + const objNames = Search._index.objnames; + const titles = Search._index.titles; + + const results = []; + + const objectSearchCallback = (prefix, match) => { + const name = match[4] + const fullname = (prefix ? prefix + "." : "") + name; + const fullnameLower = fullname.toLowerCase(); + if (fullnameLower.indexOf(object) < 0) return; + + let score = 0; + const parts = fullnameLower.split("."); + + // check for different match types: exact matches of full name or + // "last name" (i.e. 
last dotted part) + if (fullnameLower === object || parts.slice(-1)[0] === object) + score += Scorer.objNameMatch; + else if (parts.slice(-1)[0].indexOf(object) > -1) + score += Scorer.objPartialMatch; // matches in last name + + const objName = objNames[match[1]][2]; + const title = titles[match[0]]; + + // If more than one term searched for, we require other words to be + // found in the name/title/description + const otherTerms = new Set(objectTerms); + otherTerms.delete(object); + if (otherTerms.size > 0) { + const haystack = `${prefix} ${name} ${objName} ${title}`.toLowerCase(); + if ( + [...otherTerms].some((otherTerm) => haystack.indexOf(otherTerm) < 0) + ) + return; + } + + let anchor = match[3]; + if (anchor === "") anchor = fullname; + else if (anchor === "-") anchor = objNames[match[1]][1] + "-" + fullname; + + const descr = objName + _(", in ") + title; + + // add custom score for some objects according to scorer + if (Scorer.objPrio.hasOwnProperty(match[2])) + score += Scorer.objPrio[match[2]]; + else score += Scorer.objPrioDefault; + + results.push([ + docNames[match[0]], + fullname, + "#" + anchor, + descr, + score, + filenames[match[0]], + ]); + }; + Object.keys(objects).forEach((prefix) => + objects[prefix].forEach((array) => + objectSearchCallback(prefix, array) + ) + ); + return results; + }, + + /** + * search for full-text terms in the index + */ + performTermsSearch: (searchTerms, excludedTerms) => { + // prepare search + const terms = Search._index.terms; + const titleTerms = Search._index.titleterms; + const filenames = Search._index.filenames; + const docNames = Search._index.docnames; + const titles = Search._index.titles; + + const scoreMap = new Map(); + const fileMap = new Map(); + + // perform the search on the required terms + searchTerms.forEach((word) => { + const files = []; + const arr = [ + { files: terms[word], score: Scorer.term }, + { files: titleTerms[word], score: Scorer.title }, + ]; + // add support for partial matches + 
if (word.length > 2) { + const escapedWord = _escapeRegExp(word); + Object.keys(terms).forEach((term) => { + if (term.match(escapedWord) && !terms[word]) + arr.push({ files: terms[term], score: Scorer.partialTerm }); + }); + Object.keys(titleTerms).forEach((term) => { + if (term.match(escapedWord) && !titleTerms[word]) + arr.push({ files: titleTerms[word], score: Scorer.partialTitle }); + }); + } + + // no match but word was a required one + if (arr.every((record) => record.files === undefined)) return; + + // found search word in contents + arr.forEach((record) => { + if (record.files === undefined) return; + + let recordFiles = record.files; + if (recordFiles.length === undefined) recordFiles = [recordFiles]; + files.push(...recordFiles); + + // set score for the word in each file + recordFiles.forEach((file) => { + if (!scoreMap.has(file)) scoreMap.set(file, {}); + scoreMap.get(file)[word] = record.score; + }); + }); + + // create the mapping + files.forEach((file) => { + if (fileMap.has(file) && fileMap.get(file).indexOf(word) === -1) + fileMap.get(file).push(word); + else fileMap.set(file, [word]); + }); + }); + + // now check if the files don't contain excluded terms + const results = []; + for (const [file, wordList] of fileMap) { + // check if all requirements are matched + + // as search terms with length < 3 are discarded + const filteredTermCount = [...searchTerms].filter( + (term) => term.length > 2 + ).length; + if ( + wordList.length !== searchTerms.size && + wordList.length !== filteredTermCount + ) + continue; + + // ensure that none of the excluded terms is in the search result + if ( + [...excludedTerms].some( + (term) => + terms[term] === file || + titleTerms[term] === file || + (terms[term] || []).includes(file) || + (titleTerms[term] || []).includes(file) + ) + ) + break; + + // select one (max) score for the file. 
+ const score = Math.max(...wordList.map((w) => scoreMap.get(file)[w])); + // add result to the result list + results.push([ + docNames[file], + titles[file], + "", + null, + score, + filenames[file], + ]); + } + return results; + }, + + /** + * helper function to return a node containing the + * search summary for a given text. keywords is a list + * of stemmed words. + */ + makeSearchSummary: (htmlText, keywords) => { + const text = Search.htmlToText(htmlText); + if (text === "") return null; + + const textLower = text.toLowerCase(); + const actualStartPosition = [...keywords] + .map((k) => textLower.indexOf(k.toLowerCase())) + .filter((i) => i > -1) + .slice(-1)[0]; + const startWithContext = Math.max(actualStartPosition - 120, 0); + + const top = startWithContext === 0 ? "" : "..."; + const tail = startWithContext + 240 < text.length ? "..." : ""; + + let summary = document.createElement("p"); + summary.classList.add("context"); + summary.textContent = top + text.substr(startWithContext, 240).trim() + tail; + + return summary; + }, +}; + +_ready(Search.init); diff --git a/_static/sphinx_highlight.js b/_static/sphinx_highlight.js new file mode 100644 index 00000000..8a96c69a --- /dev/null +++ b/_static/sphinx_highlight.js @@ -0,0 +1,154 @@ +/* Highlighting utilities for Sphinx HTML documentation. */ +"use strict"; + +const SPHINX_HIGHLIGHT_ENABLED = true + +/** + * highlight a given string on a node by wrapping it in + * span elements with the given class name. 
+ */ +const _highlight = (node, addItems, text, className) => { + if (node.nodeType === Node.TEXT_NODE) { + const val = node.nodeValue; + const parent = node.parentNode; + const pos = val.toLowerCase().indexOf(text); + if ( + pos >= 0 && + !parent.classList.contains(className) && + !parent.classList.contains("nohighlight") + ) { + let span; + + const closestNode = parent.closest("body, svg, foreignObject"); + const isInSVG = closestNode && closestNode.matches("svg"); + if (isInSVG) { + span = document.createElementNS("http://www.w3.org/2000/svg", "tspan"); + } else { + span = document.createElement("span"); + span.classList.add(className); + } + + span.appendChild(document.createTextNode(val.substr(pos, text.length))); + const rest = document.createTextNode(val.substr(pos + text.length)); + parent.insertBefore( + span, + parent.insertBefore( + rest, + node.nextSibling + ) + ); + node.nodeValue = val.substr(0, pos); + /* There may be more occurrences of search term in this node. So call this + * function recursively on the remaining fragment. + */ + _highlight(rest, addItems, text, className); + + if (isInSVG) { + const rect = document.createElementNS( + "http://www.w3.org/2000/svg", + "rect" + ); + const bbox = parent.getBBox(); + rect.x.baseVal.value = bbox.x; + rect.y.baseVal.value = bbox.y; + rect.width.baseVal.value = bbox.width; + rect.height.baseVal.value = bbox.height; + rect.setAttribute("class", className); + addItems.push({ parent: parent, target: rect }); + } + } + } else if (node.matches && !node.matches("button, select, textarea")) { + node.childNodes.forEach((el) => _highlight(el, addItems, text, className)); + } +}; +const _highlightText = (thisNode, text, className) => { + let addItems = []; + _highlight(thisNode, addItems, text, className); + addItems.forEach((obj) => + obj.parent.insertAdjacentElement("beforebegin", obj.target) + ); +}; + +/** + * Small JavaScript module for the documentation. 
+ */ +const SphinxHighlight = { + + /** + * highlight the search words provided in localstorage in the text + */ + highlightSearchWords: () => { + if (!SPHINX_HIGHLIGHT_ENABLED) return; // bail if no highlight + + // get and clear terms from localstorage + const url = new URL(window.location); + const highlight = + localStorage.getItem("sphinx_highlight_terms") + || url.searchParams.get("highlight") + || ""; + localStorage.removeItem("sphinx_highlight_terms") + url.searchParams.delete("highlight"); + window.history.replaceState({}, "", url); + + // get individual terms from highlight string + const terms = highlight.toLowerCase().split(/\s+/).filter(x => x); + if (terms.length === 0) return; // nothing to do + + // There should never be more than one element matching "div.body" + const divBody = document.querySelectorAll("div.body"); + const body = divBody.length ? divBody[0] : document.querySelector("body"); + window.setTimeout(() => { + terms.forEach((term) => _highlightText(body, term, "highlighted")); + }, 10); + + const searchBox = document.getElementById("searchbox"); + if (searchBox === null) return; + searchBox.appendChild( + document + .createRange() + .createContextualFragment( + '" + ) + ); + }, + + /** + * helper function to hide the search marks again + */ + hideSearchWords: () => { + document + .querySelectorAll("#searchbox .highlight-link") + .forEach((el) => el.remove()); + document + .querySelectorAll("span.highlighted") + .forEach((el) => el.classList.remove("highlighted")); + localStorage.removeItem("sphinx_highlight_terms") + }, + + initEscapeListener: () => { + // only install a listener if it is really needed + if (!DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS) return; + + document.addEventListener("keydown", (event) => { + // bail for input elements + if (BLACKLISTED_KEY_CONTROL_ELEMENTS.has(document.activeElement.tagName)) return; + // bail with special keys + if (event.shiftKey || event.altKey || event.ctrlKey || event.metaKey) return; + 
if (DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS && (event.key === "Escape")) { + SphinxHighlight.hideSearchWords(); + event.preventDefault(); + } + }); + }, +}; + +_ready(() => { + /* Do not call highlightSearchWords() when we are on the search page. + * It will highlight words from the *previous* search query. + */ + if (typeof Search === "undefined") SphinxHighlight.highlightSearchWords(); + SphinxHighlight.initEscapeListener(); +}); diff --git a/api.html b/api.html new file mode 100644 index 00000000..80e95bd7 --- /dev/null +++ b/api.html @@ -0,0 +1,128 @@ + + + + + + + API — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

API

+ + + + + + +

cmoncrawl

+
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/cli/cli.html b/cli/cli.html new file mode 100644 index 00000000..e2b08869 --- /dev/null +++ b/cli/cli.html @@ -0,0 +1,159 @@ + + + + + + + Command Line Interface — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

Command Line Interface

+

The command line interface is a simple wrapper around the library.

+

It provides two main functionalities:

+
    +
  • download - Downloads samples of either Domain Records or HTML files from the Common Crawl indexes.

  • +
  • extract - Downloads the HTML referenced by a Domain Record and extracts its content. It can also take HTML files directly and extract the data.

  • +
+

Both functionalities are invoked using cmon followed by the functionality and the required arguments. +The cmon command also takes a few optional arguments:

+
+
--verbosity
+

Verbosity level. Choices are [0, 1, 2], with 0 being the least verbose and 2 being the most verbose. Default is 1.

+
+
--aws_profile
+

AWS profile to use for AWS calls (Athena, S3). If not provided, the default AWS profile will be used.

+
+
+
+

Examples

+
# Download first 1000 domain records for example.com
+cmon download --match_type=domain --limit=1000 dr_output record example.com
+
+# Download first 100 htmls for example.com
+cmon download --match_type=domain --limit=100 html_output html example.com
+
+# Take the domain records downloaded using the first command and extract them using your extractors
+cmon extract config.json extracted_output dr_output/*.jsonl record
+
+# Take the htmls downloaded using the second command and extract them using your extractors
+cmon extract config.json extracted_output html_output/*.html html
+
+
+
+
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/cli/download.html b/cli/download.html new file mode 100644 index 00000000..477c6abc --- /dev/null +++ b/cli/download.html @@ -0,0 +1,237 @@ + + + + + + + Command Line Download — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

Command Line Download

+

The download mode of the cmon command line tool serves to query and download from CommonCrawl indexes. +The following arguments are needed in this order:

+
+

Positional arguments

+
    +
  1. output - Path to output directory.

  2. +
  3. {record,html} - Download mode:

    +
      +
    • record: Download record files from Common Crawl.

    • +
    • html: Download HTML files from Common Crawl.

    • +
    +
  4. +
  5. urls - URLs to download, e.g. www.bcc.cz.

  6. +
+

In html mode, the output directory will contain .html files, one +for each found URL. In record mode, the output directory will contain +.jsonl files, each containing multiple domain records in JSON format.

+
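Because record mode writes `.jsonl` files, its output can be read with the standard library alone. The following is a minimal sketch, assuming one JSON object per line. The helper name `iter_domain_records` is hypothetical and not part of cmoncrawl, and no particular record schema is assumed.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator


def iter_domain_records(path: Path) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per non-empty line of a .jsonl file."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)
```

Calling `list(iter_domain_records(Path("dr_output/0.jsonl")))` would load one file into memory; the file name here is only an illustration.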
+
+

Options

+
+
--limit LIMIT
+

Max number of URLs to download.

+
+
--since SINCE
+

Start date in ISO format (e.g., 2020-01-01).

+
+
--to TO
+

End date in ISO format (e.g., 2020-01-01).

+
+
--cc_server CC_SERVER
+

Common Crawl indexes to query. Must provide the whole URL (e.g., https://index.commoncrawl.org/CC-MAIN-2023-14-index).

+
+
--max_retry MAX_RETRY
+

Max number of retries for a request. Increase this number when requests are failing.

+
+
--sleep_base SLEEP_BASE
+

Base sleep time for exponential backoff in case of request failure.

+
+
--max_requests_per_second MAX_REQUESTS_PER_SECOND
+

Max number of requests per second.

+
+
--match_type MATCH_TYPE
+

One of exact, prefix, host, domain +Match type for the URL. Refer to cdx-api for more information. +See cmoncrawl.common.types.MatchType for more information.

+
+
--max_directory_size MAX_DIRECTORY_SIZE
+

Max number of files per directory.

+
+
--filter_non_200
+

Filter out records with a non-200 status code.

+
+
--aggregator AGGREGATOR
+

Aggregator to use for the query.

+
    +
  • athena: Athena aggregator. Fastest, but requires AWS credentials with correct permissions. See Athena for more information.

  • +
  • gateway: Gateway aggregator (default). Very slow, but no need for AWS config.

  • +
+
+
--s3_bucket S3_BUCKET
+

S3 bucket to use for Athena aggregator. Only needed if using Athena aggregator.

+
    +
  • If set, the bucket will not be deleted after the query is done, so you can reuse it for future queries.

  • +
  • If not set, a temporary bucket will be created and deleted after the query is done.

  • +
+
+
+
+

Note

+

If you specify an S3 bucket, remember to delete it manually after you’re done to avoid incurring unnecessary costs.

+
+
+
+
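The `--max_retry` and `--sleep_base` options describe retries with exponential backoff. The sketch below shows how such a schedule is commonly computed, assuming the wait before retry number n is `sleep_base ** n`. The exact formula used by `cmon` is not stated here, and `backoff_delays` is a hypothetical helper, not a cmoncrawl function.

```python
from typing import List


def backoff_delays(sleep_base: float, max_retry: int) -> List[float]:
    """Return the assumed wait time in seconds before each retry attempt."""
    # Assumption: the delay grows geometrically with the attempt number.
    return [sleep_base ** attempt for attempt in range(max_retry)]


if __name__ == "__main__":
    # Example values only; they are not claimed to be the CLI defaults.
    for attempt, delay in enumerate(backoff_delays(1.3, 5)):
        print(f"retry {attempt}: wait {delay:.2f}s")
```

Under this assumption, raising `--sleep_base` lengthens every wait, while raising `--max_retry` adds further, longer waits at the end of the schedule.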

Record mode options

+
+
--max_crawls_per_file MAX_CRAWLS_PER_FILE
+

Max number of domain records per file output

+
+
+
+
+

HTML mode options

+
+
--encoding ENCODING
+

Force usage of specified encoding if possible.

+
+
--download_method DOWNLOAD_METHOD
+

Method for downloading WARC files from Common Crawl. It only applies to HTML download.

+
    +
  • api: Download from Common Crawl API Gateway. This is the default option.

  • +
  • s3: Download from Common Crawl S3 bucket. This is the fastest option, but requires AWS credentials with correct permissions.

  • +
+
+
+
+
+

Examples

+
# Download first 1000 domain records for example.com
+cmon download dr_output record --match_type=domain --limit=1000 example.com
+
+# Download first 100 htmls for example.com
+cmon download html_output html --match_type=domain --limit=100 example.com
+
+
+
+
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/cli/extract.html b/cli/extract.html new file mode 100644 index 00000000..22dbf990 --- /dev/null +++ b/cli/extract.html @@ -0,0 +1,220 @@ + + + + + + + Command line Extract — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

Command line Extract

+

The extract mode of the cmon command line tool serves to extract data from your downloaded files. +The following arguments are needed in this order:

+
+

Positional arguments

+
    +
  1. config_path - Path to the config file containing extraction rules.

  2. +
  3. output_path - Path to the output directory.

  4. +
  5. {record,html} - Extraction mode:

    +
      +
    • record: Extract data from jsonl (domain record) files.

    • +
    • html: Extract data from HTML files.

    • +
    +
  6. +
  7. files - Files to extract data from (Either HTML files or .jsonl files).

  8. +
+

To create a config file, see Extractor config file.

+

Both modes yield the same output format, which is a .jsonl file containing the extracted data, +one per line. For each file, a new directory is created in the output directory, named after the +file.

+

The files created by the download mode can be directly used with the appropriate mode +in the extraction.

+
    +
  • If you have an HTML file, you can use the HTML mode to extract it.

  • +
  • If you have domain records, you can use the record mode to extract them.

  • +
  • If you have domain records, which you acquired without using cmoncrawl,

  • +
+

please refer to Domain Record JSONL format, which describes how to create .jsonl files from your domain records, +which you can then use with the record mode.

+
+
+

Optional arguments

+
+
--max_crawls_per_file MAX_CRAWLS_PER_FILE
+

Max number of extractions per file output.

+
+
--max_directory_size MAX_DIRECTORY_SIZE
+

Max number of extraction files per directory.

+
+
--n_proc N_PROC
+

Number of processes to use for extraction. The parallelization is on file level, +thus for a single file, it’s useless to use more than one process.

+
+
+
+
+

Record arguments

+
+
--max_retry MAX_RETRY
+

Max number of WARC download attempts.

+
+
--download_method DOWNLOAD_METHOD
+

Method for downloading WARC files from Common Crawl. It only applies to HTML download.

+
    +
  • api: Download from Common Crawl API Gateway. This is the default option.

  • +
  • s3: Download from Common Crawl S3 bucket. This is the fastest option, but requires AWS credentials with correct permissions.

  • +
+
+
--sleep_base SLEEP_BASE
+

Base value for exponential backoff between failed requests.

+
+
--max_requests_per_second MAX_REQUESTS_PER_SECOND
+

Max number of requests per second.

+
+
+
+
+

Html arguments

+
+
--date DATE
+

Date of extraction of HTML files in ISO format (e.g., 2021-01-01). The default is today.

+
+
--url URL
+

URL from which the HTML files were downloaded. By default, it will try to infer from the file content.

+
+
+
+
+

Examples

+
# Take the domain records downloaded with cmon download and extract them using your extractors
+cmon extract config.json extracted_output dr_output/*.jsonl record --max_retry 100 --download_method=api --sleep_base 1.3
+
+# Take the htmls downloaded with cmon download and extract them using your extractors
+cmon extract config.json extracted_output html_output/*.html html --date 2021-01-01 --url https://www.example.com
+
+
+

When you build your extractors, you will appreciate that you can specify +the URL of the HTML file and the date of the extraction, because +this information is used during extractor routing.

+
+
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/cli/index.html b/cli/index.html new file mode 100644 index 00000000..3138a8ba --- /dev/null +++ b/cli/index.html @@ -0,0 +1,148 @@ + + + + + + + Command Line Interface — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+ + +
+
+
+
+ + + + \ No newline at end of file diff --git a/extraction/config_file.html b/extraction/config_file.html new file mode 100644 index 00000000..9860b4eb --- /dev/null +++ b/extraction/config_file.html @@ -0,0 +1,252 @@ + + + + + + + Extractor config file — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

Extractor config file

+

In many cases you will want to use more than a single extractor. +Imagine you crawl two news websites that have completely different structures +and you want to extract the articles. You can achieve this by using the Extractor config file.

+

The extractor config file defines which extractor should be used for a given HTML file. +You can use the crawl date and the URL to specify which extractor should be used.

+
+

Structure

+

The structure is as follows:

+
{
+
+    "extractors_path": "Path to the extractors folder",
+    "routes": [
+        {
+            "regexes": [".*"],
+            "extractors": [{
+                "name": "my_extractor",
+                "since": "iso date string",
+                "to": "iso date string"
+            },
+            {
+                "name": "my_extractor2"
+            }
+            ]
+        },
+        {
+            "regexes": ["another_regex"],
+            "....": "...."
+        }
+    ]
+}
+
+
+

The extractors_path is the path to the folder where the extractors are located.

+
+

Note

+

The extractors_path is relative to the current working directory.

+
+

The routes define a list of possible extractors and the conditions under which each is used. Each route is a dictionary with the following keys:

+
    +
  • regexes: a list of regexes. At least one regex must match the URL for this route to be used.

  • +
  • extractors: a list of extractors that will be used to extract the data from the url. The first extractor for which since < record_date < to is used.

  • +
+

Each extractor has the following keys:

+
    +
  • name: the name of the extractor. This is the name of the Python file without the .py extension. You can also set the NAME variable in the extractor file to override this.

  • +
  • since [optional] : The starting crawl date for which the extractor is valid (e.g. 2009-01-01)

  • +
  • to [optional] : The ending crawl date for which the extractor is valid. Format is the same as for since.

  • +
+
+

Note

+

If since and to are not specified, the extractor will match for all crawls for that route.

+
+
+
+

Example

+

Given the following folder structure:

+
extractors/
+├── a_extractor.py
+├── a_extractor2.py
+└── b_extractor.py
+
+
+

and the following config:

+
{
+
+    "extractors_path": "./extractors",
+    "routes": [
+        {
+            "regexes": [".*cmon.cz.*"],
+            "extractors": [{
+                "name": "a_extractor",
+                "to": "2010-01-01"
+            },
+            {
+                "name": "a_extractor2",
+                "since": "2010-01-01"
+            }
+            ]
+        },
+        {
+            "regexes": [".*cmon2.cz.*"],
+            "extractors": [{
+                "name": "b_extractor"
+            }
+            ]
+        }
+    ]
+}
+
+
+

The following will happen:

+
    +
  • A domain record with url http://www.cmon.cz, crawled in 2012, will be extracted using the a_extractor2.py extractor.

  • +
  • A domain record with url http://www.cmon.cz, crawled in 2009, will be extracted using the a_extractor.py extractor.

  • +
  • A domain record with url http://www.cmon2.cz, crawled in 2012, will be extracted using the b_extractor.py extractor.

  • +
+
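The routing outcomes listed above can be reproduced with a short sketch. It is not the library's router. It assumes that regexes are applied with `re.match`, that a missing `since` or `to` leaves the window open on that side, and that a route whose extractors all fail the date check falls through to the next route. `select_extractor` is a hypothetical name.

```python
import re
from datetime import datetime
from typing import Any, Dict, Optional

CONFIG: Dict[str, Any] = {
    "extractors_path": "./extractors",
    "routes": [
        {
            "regexes": [".*cmon.cz.*"],
            "extractors": [
                {"name": "a_extractor", "to": "2010-01-01"},
                {"name": "a_extractor2", "since": "2010-01-01"},
            ],
        },
        {"regexes": [".*cmon2.cz.*"], "extractors": [{"name": "b_extractor"}]},
    ],
}


def select_extractor(
    config: Dict[str, Any], url: str, crawl_date: datetime
) -> Optional[str]:
    """Return the name of the first extractor whose route and date window match."""
    for route in config["routes"]:
        # At least one regex of the route must match the URL.
        if not any(re.match(pattern, url) for pattern in route["regexes"]):
            continue
        for extractor in route["extractors"]:
            # Missing bounds are treated as an open window (assumption).
            since = (
                datetime.fromisoformat(extractor["since"])
                if "since" in extractor
                else datetime.min
            )
            to = (
                datetime.fromisoformat(extractor["to"])
                if "to" in extractor
                else datetime.max
            )
            if since < crawl_date < to:
                return extractor["name"]
    return None


if __name__ == "__main__":
    print(select_extractor(CONFIG, "http://www.cmon.cz", datetime(2012, 6, 1)))
    print(select_extractor(CONFIG, "http://www.cmon.cz", datetime(2009, 6, 1)))
    print(select_extractor(CONFIG, "http://www.cmon2.cz", datetime(2012, 6, 1)))
```

Note that `http://www.cmon2.cz` does not match the first route, because `cmon.cz` requires exactly one character between `cmon` and `cz`.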
+
+

__init__.py

+

You might want to put the common code of the extractors into +a common Python file. The problem is that during execution, +the extractors directory is not on the Python path. So that you can add the extractors +directory to it, we also load the __init__.py file (but don’t load extractors in it).

+

Thus you can create an __init__.py file in the extractors directory with the following content:

+
import sys
+from pathlib import Path
+sys.path.append(str(Path(__file__).parent))  # sys.path entries should be strings
+
+
+

which will add the extractors directory to the python path.

+
+
+

Arbitrary Code Execution

+
+

Warning

+

Since the router loads and executes all files in the extractors +directory, every .py file in this directory is executed. Thus +you should not put any untrusted files in this directory.

+
+
+
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/extraction/creating_extractor.html b/extraction/creating_extractor.html new file mode 100644 index 00000000..a1b3adfd --- /dev/null +++ b/extraction/creating_extractor.html @@ -0,0 +1,222 @@ + + + + + + + Extractor types — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

Extractor types

+

All the extractors you write must implement the cmoncrawl.processor.pipeline.extractor.IExtractor class. +If you choose to implement it directly, you will have to implement the extract method. +In the method you are given the HTML page as a string and the crawl metadata. You then return the data you want to extract from the HTML as a dictionary, or None if you want +to discard the HTML.

+

While the interface is simple it doesn’t handle encoding problems or filtering. +If you want to parse the HTML using bs4 and then extract the data you can use either:

+ +
+
+

Extractor Definition

+

In order to register your extractor, you must define each extractor in +a separate file and assign an instance of it in that file to a variable +named extractor.

+
+

Example 1.

+
+
extractor.py
+
# You can use the NAME variable to define the name;
+# otherwise the name is inherited from the file name.
+NAME='title_extractor'
+
+from typing import Any, Dict
+
+from cmoncrawl.processor.pipeline.extractor import IExtractor
+from cmoncrawl.common.types import PipeMetadata
+
+class MyExtractor(IExtractor):
+    def extract(self, response: str, metadata: PipeMetadata) -> Dict[str, Any] | None:
+        return {"title": "My title"}
+
+extractor = MyExtractor()
+
+
+
+
+
+
+

BaseExtractor

+

The BaseExtractor assumes you will want to work with HTML parsed by +BeautifulSoup. +Thus the only method you need to implement is the extract_soup method.

+
+

Extraction

+
    +
  • extract_soup method

  • +
+

It takes a BeautifulSoup object and crawl metadata (see cmoncrawl.common.types.PipeMetadata) and must return +a dictionary of extracted data or None if the page should not be extracted, for example if you haven’t found all the data you need.

+

Additionally, you might want to filter out the pages you don’t want to +extract. For this, you have two options:

+
+
+

Filtering

+
    +
  • filter_raw method

  • +
+

This method takes the raw HTML and crawl metadata and must return True if the page should be extracted or False otherwise. If you can +decide based on raw HTML, this is the most efficient way to filter pages, as no soup parsing will be done.

+
    +
  • filter_soup method

  • +
+

This method takes the BeautifulSoup object and crawl metadata and must return True if the page should be extracted or False otherwise.

+
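The benefit of filtering on raw HTML can be illustrated with a stand-in class. This is a sketch only. It does not use the real BaseExtractor, it omits the metadata argument for brevity, it replaces BeautifulSoup parsing with a small regex placeholder, and the call order (filter_raw, then parsing, then filter_soup, then extract_soup) is an assumption drawn from the description above.

```python
import re
from typing import Any, Dict, Optional


class FilterFlowSketch:
    """Stand-in mimicking the assumed call order; not the real BaseExtractor."""

    def __init__(self) -> None:
        self.parse_calls = 0  # counts how often the expensive parse step ran

    def filter_raw(self, response: str) -> bool:
        # Cheap substring test on the raw HTML; no parsing is needed.
        return "<title" in response.lower()

    def parse(self, response: str) -> Dict[str, str]:
        # Placeholder for BeautifulSoup parsing: pull out the <title> text.
        self.parse_calls += 1
        found = re.search(r"<title>(.*?)</title>", response, re.I | re.S)
        return {"title": found.group(1).strip() if found else ""}

    def filter_soup(self, soup: Dict[str, str]) -> bool:
        # Decision that needs the parsed document.
        return soup["title"] != ""

    def extract_soup(self, soup: Dict[str, str]) -> Dict[str, Any]:
        return {"title": soup["title"]}

    def extract(self, response: str) -> Optional[Dict[str, Any]]:
        if not self.filter_raw(response):
            return None  # rejected before any parsing happened
        soup = self.parse(response)
        if not self.filter_soup(soup):
            return None
        return self.extract_soup(soup)
```

A page without a title tag is rejected by `filter_raw` and never reaches `parse`, which is why the raw filter is the cheaper of the two options.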

Finally, your file must create an instance of the extractor and name it extractor.

+
+
+

Example 2.

+

Here is an example of an extractor that will extract the title of the page.

+
+
ext.py
+
from bs4 import BeautifulSoup
+
+from cmoncrawl.processor.pipeline.extractor import BaseExtractor
+from cmoncrawl.common.types import PipeMetadata
+
+class TitleExtractor(BaseExtractor):
+    def extract_soup(self, soup: BeautifulSoup, metadata: PipeMetadata) -> dict:
+        return {'title': soup.title.text}
+
+    def filter_soup(self, soup: BeautifulSoup, metadata: PipeMetadata) -> bool:
+        return soup.title is not None
+
+extractor = TitleExtractor()
+NAME='title'
+
+
+
+

Now in the Extractor config file you would refer to this extractor as title, the value of NAME. +If you didn’t set the NAME variable, you would refer to it as ext.

+
+
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/extraction/index.html b/extraction/index.html new file mode 100644 index 00000000..201b4bea --- /dev/null +++ b/extraction/index.html @@ -0,0 +1,157 @@ + + + + + + + Extraction — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

Extraction

+

In order to save space, you might want to extract the information from the +HTMLs directly, without saving the HTMLs themselves. The library does allow +you to do that. In this section, we will show you how you can define your +own extractors.

+ +
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/extraction/utils.html b/extraction/utils.html new file mode 100644 index 00000000..7fb35aed --- /dev/null +++ b/extraction/utils.html @@ -0,0 +1,145 @@ + + + + + + + Extraction utils — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

Extraction utils

+

The utilities for extraction are defined in cmoncrawl.processor.extraction. +The module provides helper functions for both filtering and extraction.

+
+

Filtering

+
    +
  • must_exist_filter: filters out the URLs that don’t contain the CSS selector

  • +
  • must_not_exist_filter: filters out the URLs that contain the CSS selector

  • +
+
+
+

Extraction

+

check_required: Creates a function that checks if all the required fields are present in the extracted data.

+

chain_transform: Creates a function that chains multiple transformation functions. If any of them returns None, the chain is broken and None is returned. Especially useful with soup select etc.

+

extract_transform: Creates a function that extracts the data from the soup tag using the css selector and transforms it using your transformation functions.
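The chaining behaviour described above can be illustrated with a minimal, standalone sketch. The function name `chain_sketch` and the sample transforms are hypothetical and are not taken from the library; only the "stop at the first None" rule comes from the description.

```python
from typing import Any, Callable, Optional, Sequence

# A transform takes a value and returns a new value, or None to signal failure.
Transform = Callable[[Any], Optional[Any]]


def chain_sketch(transforms: Sequence[Transform]) -> Transform:
    """Return a function that applies each transform in order.

    Illustrative only: this mirrors the documented behaviour (break the chain
    and return None as soon as one step returns None), not the library code.
    """

    def chained(value: Any) -> Optional[Any]:
        for transform in transforms:
            value = transform(value)
            if value is None:
                # One step failed, so the whole chain yields None.
                return None
        return value

    return chained


# Hypothetical transforms: strip whitespace, reject empty strings, upper-case.
clean_title = chain_sketch([str.strip, lambda s: s or None, str.upper])
```

Short-circuiting this way means later steps never have to handle a missing value, which is convenient when an earlier step is a selector lookup that may find nothing.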

+
+
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/generated/cmoncrawl.aggregator.athena_query.html b/generated/cmoncrawl.aggregator.athena_query.html new file mode 100644 index 00000000..8610aa2f --- /dev/null +++ b/generated/cmoncrawl.aggregator.athena_query.html @@ -0,0 +1,181 @@ + + + + + + + cmoncrawl.aggregator.athena_query — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

cmoncrawl.aggregator.athena_query

+

Classes

+
+
+class cmoncrawl.aggregator.athena_query.AthenaAggregator(urls: List[str], match_type: MatchType = MatchType.EXACT, cc_servers: List[str] | None = None, since: datetime = datetime.datetime(1, 1, 1, 0, 0), to: datetime = datetime.datetime(9999, 12, 31, 23, 59, 59, 999999), limit: int | None = None, prefetch_size: int = 2, sleep_base: float = 1.3, max_retry: int = 5, extra_sql_where_clause: str | None = None, batch_size: int = 1, aws_profile: str | None = None, bucket_name: str | None = None, catalog_name: str = 'AwsDataCatalog', database_name: str = 'commoncrawl', table_name: str = 'ccindex')
+

This class is responsible for aggregating the index files from commoncrawl using AWS Athena. It is an async context manager which can then be used as an async iterator which yields DomainRecord objects, found in the index files of commoncrawl.

+

It uses AWS Athena to query the commoncrawl index files stored in S3.

+
+
Parameters:
+
    +
  • urls (List[str]) – A list of urls to search for.

  • +
  • cc_indexes_server (str, optional) – The commoncrawl index server to use. Defaults to “http://index.commoncrawl.org/collinfo.json”.

  • +
  • match_type (MatchType, optional) – Match type for cdx-api. Defaults to MatchType.EXACT.

  • +
  • cc_servers (List[str], optional) – A list of commoncrawl servers to use. If None, then indexes will be retrieved from the cc_indexes_server. Defaults to None.

  • +
  • since (datetime, optional) – The start date for the search. Defaults to datetime.min.

  • +
  • to (datetime, optional) – The end date for the search. Defaults to datetime.max.

  • +
  • limit (int, optional) – The maximum number of results to return. Defaults to None.

  • +
  • prefetch_size (int, optional) – The number of indexes to fetch concurrently. Defaults to 2.

  • +
  • max_retry (int, optional) – The maximum number of retries for a single request. Defaults to 5.

  • +
  • extra_sql_where_clause (str, optional) – Additional SQL WHERE clause to append to the Athena query. Defaults to None.

  • +
  • batch_size (int) – How many crawls to query at once. Defaults to 1. If <= 0, all crawls will be queried at once.

  • +
  • aws_profile (str, optional) – The AWS profile to use for Athena and S3. Defaults to None (the default AWS profile is then used).

  • +
  • bucket_name (str, optional) – The S3 bucket to use for Athena query results. If None, a new bucket will be created. Defaults to None.

  • +
  • catalog_name (str, optional) – The Athena catalog to use. Defaults to “AwsDataCatalog”.

  • +
  • database_name (str, optional) – The Athena database to use. Defaults to “commoncrawl”.

  • +
  • table_name (str, optional) – The Athena table to use. Defaults to “ccindex”.

  • +
+
+
+
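The batch_size rule above (query that many crawls per batch, or all crawls at once when the value is <= 0) can be sketched as follows. This is an illustration under assumptions: the helper name `batch_crawls` and the crawl identifiers are made up for the example and do not come from the library.

```python
from typing import List, Sequence


def batch_crawls(crawls: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split crawl identifiers into batches, one batch per query.

    Assumption: a non-positive batch_size means "everything in one batch",
    as stated in the parameter description.
    """
    if batch_size <= 0:
        return [list(crawls)]
    return [
        list(crawls[start:start + batch_size])
        for start in range(0, len(crawls), batch_size)
    ]


# Placeholder crawl identifiers, used only for illustration.
example_crawls = ["crawl-a", "crawl-b", "crawl-c"]
one_by_one = batch_crawls(example_crawls, 1)
all_at_once = batch_crawls(example_crawls, 0)
```

Smaller batches produce more queries with less data scanned per query, while a single batch issues one larger query.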

Examples

+
>>> async with AthenaAggregator(["example.com"]) as aggregator:
...     async for domain_record in aggregator:
...         print(domain_record)
+
+
+
+
+class AthenaAggregatorIterator(aws_client: Session, urls: List[str], cc_servers: List[str], match_type: MatchType, since: datetime | None, to: datetime | None, limit: int | None, prefetch_size: int, sleep_base: float, max_retry: int, batch_size: int, extra_sql_where_clause: str | None, bucket_name: str, database_name: str, table_name: str)
+
+ +
+ +
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/generated/cmoncrawl.aggregator.base.html b/generated/cmoncrawl.aggregator.base.html new file mode 100644 index 00000000..1fd9323c --- /dev/null +++ b/generated/cmoncrawl.aggregator.base.html @@ -0,0 +1,146 @@ + + + + + + + cmoncrawl.aggregator.base — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

cmoncrawl.aggregator.base

+

Classes

+ + + + + + +

IAggregator()

Base interface for aggregators

+
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/generated/cmoncrawl.aggregator.gateway_query.html b/generated/cmoncrawl.aggregator.gateway_query.html new file mode 100644 index 00000000..4d8aacaf --- /dev/null +++ b/generated/cmoncrawl.aggregator.gateway_query.html @@ -0,0 +1,176 @@ + + + + + + + cmoncrawl.aggregator.gateway_query — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

cmoncrawl.aggregator.gateway_query

+

Classes

+
+
+class cmoncrawl.aggregator.gateway_query.GatewayAggregator(urls: List[str], match_type: MatchType = MatchType.EXACT, cc_servers: List[str] | None = None, since: datetime = datetime.datetime(1, 1, 1, 0, 0), to: datetime = datetime.datetime(9999, 12, 31, 23, 59, 59, 999999), limit: int | None = None, max_retry: int = 5, prefetch_size: int = 3, sleep_base: float = 1.3, max_requests_per_second: int = 20)
+

This class is responsible for aggregating the index files from commoncrawl. It is an async context manager which can then be used as an async iterator which yields DomainRecord objects, found in the index files of commoncrawl.

+

It uses the commoncrawl index server to find the index files.

+
+
Parameters:
+
    +
  • urls (List[str]) – A list of urls to search for.

  • +
  • cc_indexes_server (str, optional) – The commoncrawl index server to use. Defaults to “http://index.commoncrawl.org/collinfo.json”.

  • +
  • match_type (MatchType, optional) – Match type for cdx-api. Defaults to MatchType.EXACT.

  • +
  • cc_servers (List[str], optional) – A list of commoncrawl servers to use. If None, then indexes will be retrieved from the cc_indexes_server. Defaults to None.

  • +
  • since (datetime, optional) – The start date for the search. Defaults to datetime.min.

  • +
  • to (datetime, optional) – The end date for the search. Defaults to datetime.max.

  • +
  • limit (int, optional) – The maximum number of results to return. Defaults to None.

  • +
  • max_retry (int, optional) – The maximum number of retries for a single request. Defaults to 5.

  • +
  • prefetch_size (int, optional) – The number of indexes to fetch concurrently. Defaults to 3.

  • +
  • sleep_base (float, optional) – The base for the exponential backoff time calculation between retries. Defaults to 1.3.

  • +
  • max_requests_per_second (int, optional) – The maximum number of requests per second. Defaults to 20.

  • +
+
+
+
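To give a feel for how sleep_base and max_retry interact, here is a minimal sketch of an exponential backoff schedule. The formula `sleep_base ** attempt` is an assumption made for illustration; the documentation only states that sleep_base is the base of an exponential backoff, so the library's exact calculation may differ.

```python
from typing import List


def backoff_delays(sleep_base: float = 1.3, max_retry: int = 5) -> List[float]:
    """Seconds to wait before each retry, assuming delay = sleep_base ** attempt.

    The defaults mirror the values in the class signature; the formula itself
    is hypothetical.
    """
    return [sleep_base ** attempt for attempt in range(1, max_retry + 1)]


delays = backoff_delays()
total_wait = sum(delays)
```

Under this assumption the waits grow from about 1.3 seconds to about 3.7 seconds across five retries, so a larger base spreads retries out much more aggressively.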

Examples

+
>>> async with GatewayAggregator(["example.com"]) as aggregator:
...     async for domain_record in aggregator:
...         print(domain_record)
+
+
+
+
+class GatewayAggregatorIterator(client: ClientSession, urls: List[str], CC_files: List[str], match_type: MatchType | None, since: datetime, to: datetime, limit: int | None, max_retry: int, prefetch_size: int, sleep_base: float)
+
+ +
+ +
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/generated/cmoncrawl.aggregator.html b/generated/cmoncrawl.aggregator.html new file mode 100644 index 00000000..72c6af44 --- /dev/null +++ b/generated/cmoncrawl.aggregator.html @@ -0,0 +1,154 @@ + + + + + + + cmoncrawl.aggregator — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+ + +
+
+
+
+ + + + \ No newline at end of file diff --git a/generated/cmoncrawl.aggregator.utils.athena_query_maker.html b/generated/cmoncrawl.aggregator.utils.athena_query_maker.html new file mode 100644 index 00000000..b815a473 --- /dev/null +++ b/generated/cmoncrawl.aggregator.utils.athena_query_maker.html @@ -0,0 +1,168 @@ + + + + + + + cmoncrawl.aggregator.utils.athena_query_maker — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

cmoncrawl.aggregator.utils.athena_query_maker

+

Functions

+ + + + + + + + + + + + + + + + + + + + + + + + + + + +

crawl_query(crawl_urls, since, to)

crawl_url_to_name(crawl_url)

date_to_sql_format(date)

get_name(since, until, urls[, match_type])

prepare_athena_sql_query(urls, since, to, ...)

prepare_athena_where_conditions(urls, since, ...)

url_query_based_on_match_type(match_type, url)

url_query_date_range(since, to)

+
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/generated/cmoncrawl.aggregator.utils.helpers.html b/generated/cmoncrawl.aggregator.utils.helpers.html new file mode 100644 index 00000000..0420820f --- /dev/null +++ b/generated/cmoncrawl.aggregator.utils.helpers.html @@ -0,0 +1,164 @@ + + + + + + + cmoncrawl.aggregator.utils.helpers — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

cmoncrawl.aggregator.utils.helpers

+

Functions

+ + + + + + + + + + + + + + + +

get_all_CC_indexes(client, cdx_server)

Get all CC index servers from a given CDX server

log_after_retry(retry_state)

retrieve(client, cdx_server, params, ...[, ...])

unify_url_id(url)

+

Exceptions

+ + + + + + +

DownloadError(reason, status)

+
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/generated/cmoncrawl.aggregator.utils.html b/generated/cmoncrawl.aggregator.utils.html new file mode 100644 index 00000000..58544f2e --- /dev/null +++ b/generated/cmoncrawl.aggregator.utils.html @@ -0,0 +1,152 @@ + + + + + + + cmoncrawl.aggregator.utils — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+ +
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/generated/cmoncrawl.aggregator.utils.ndjson.html b/generated/cmoncrawl.aggregator.utils.ndjson.html new file mode 100644 index 00000000..7e834c9f --- /dev/null +++ b/generated/cmoncrawl.aggregator.utils.ndjson.html @@ -0,0 +1,147 @@ + + + + + + + cmoncrawl.aggregator.utils.ndjson — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

cmoncrawl.aggregator.utils.ndjson

+

Classes

+ + + + + + +

Decoder(*[, object_hook, parse_float, ...])

+
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/generated/cmoncrawl.common.html b/generated/cmoncrawl.common.html new file mode 100644 index 00000000..ace76630 --- /dev/null +++ b/generated/cmoncrawl.common.html @@ -0,0 +1,150 @@ + + + + + + + cmoncrawl.common — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

cmoncrawl.common

+

Modules

+ + + + + + + + + + + + +

cmoncrawl.common.loggers

cmoncrawl.common.throttling

cmoncrawl.common.types

+
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/generated/cmoncrawl.common.loggers.html b/generated/cmoncrawl.common.loggers.html new file mode 100644 index 00000000..555ec5f7 --- /dev/null +++ b/generated/cmoncrawl.common.loggers.html @@ -0,0 +1,145 @@ + + + + + + + cmoncrawl.common.loggers — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

cmoncrawl.common.loggers

+

Functions

+ + + + + + +

setup_loggers(verbosity)

+
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/generated/cmoncrawl.common.throttling.html b/generated/cmoncrawl.common.throttling.html new file mode 100644 index 00000000..c2467015 --- /dev/null +++ b/generated/cmoncrawl.common.throttling.html @@ -0,0 +1,145 @@ + + + + + + + cmoncrawl.common.throttling — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

cmoncrawl.common.throttling

+

Classes

+ + + + + + +

Throttler(milliseconds)

Throttler class for restricting the number of function calls per second.

+
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/generated/cmoncrawl.common.types.html b/generated/cmoncrawl.common.types.html new file mode 100644 index 00000000..d23162ef --- /dev/null +++ b/generated/cmoncrawl.common.types.html @@ -0,0 +1,279 @@ + + + + + + + cmoncrawl.common.types — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

cmoncrawl.common.types

+

Functions

+ + + + + + +

parse_timestamp(v)

+

Classes

+
+
+class cmoncrawl.common.types.DomainCrawl(url: str = '', cdx_server: str = '', page: int = 0)
+

Domain crawl.

+
+ +
+
+class cmoncrawl.common.types.DomainRecord(*, filename: str, url: str | None, offset: int, length: int, digest: str | None = None, encoding: str | None = None, timestamp: datetime | None = None)
+

Domain record.

+
+
+model_config: ClassVar[ConfigDict] = {}
+

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

+
+ +
+
+model_fields: ClassVar[dict[str, FieldInfo]] = {'digest': FieldInfo(annotation=Union[str, NoneType], required=False), 'encoding': FieldInfo(annotation=Union[str, NoneType], required=False), 'filename': FieldInfo(annotation=str, required=True), 'length': FieldInfo(annotation=int, required=True), 'offset': FieldInfo(annotation=int, required=True), 'timestamp': FieldInfo(annotation=Union[datetime, NoneType], required=False), 'url': FieldInfo(annotation=Union[str, NoneType], required=True)}
+

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

+

This replaces Model.__fields__ from Pydantic V1.

+
+ +
+ +
+
+class cmoncrawl.common.types.ExtractConfig(*, extractors_path: Path, routes: List[RoutesConfig])
+

Configuration for run.

+
+
+model_config: ClassVar[ConfigDict] = {}
+

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

+
+ +
+
+model_fields: ClassVar[dict[str, FieldInfo]] = {'extractors_path': FieldInfo(annotation=Path, required=True), 'routes': FieldInfo(annotation=List[RoutesConfig], required=True)}
+

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

+

This replaces Model.__fields__ from Pydantic V1.

+
+ +
+ +
+
+class cmoncrawl.common.types.ExtractorConfig(*, name: str, since: datetime | None = None, to: datetime | None = None)
+

Configuration for extractor.

+
+
+model_config: ClassVar[ConfigDict] = {}
+

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

+
+ +
+
+model_fields: ClassVar[dict[str, FieldInfo]] = {'name': FieldInfo(annotation=str, required=True), 'since': FieldInfo(annotation=Union[datetime, NoneType], required=False), 'to': FieldInfo(annotation=Union[datetime, NoneType], required=False)}
+

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

+

This replaces Model.__fields__ from Pydantic V1.

+
+ +
+ +
+
+class cmoncrawl.common.types.MatchType(value)
+

Match type for cdx server.
See https://github.com/internetarchive/wayback/blob/master/wayback-cdx-server/README.md#url-match-scope

+

Example:
Query: example.com/abc

+

Matches:
EXACT: (www.)?example.com/abc
PREFIX: (www.)?example.com/abc(/.*)?
HOST: (www.)?example.com(/.*)?
DOMAIN: (.*.)?example.com(/.*)?
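The four match scopes listed above can be tried out with a small regular-expression sketch. The helper names `match_pattern` and `matches` are hypothetical, and the patterns are a literal reading of the table with dots escaped; this is not how the CDX server or the library evaluates matches internally.

```python
import re


def match_pattern(query: str, match_type: str) -> "re.Pattern[str]":
    """Build a regex for the given match scope (illustrative only)."""
    host = query.split("/", 1)[0]   # "example.com" for "example.com/abc"
    host_re = re.escape(host)
    full_re = re.escape(query)
    patterns = {
        "exact": rf"(www\.)?{full_re}",
        "prefix": rf"(www\.)?{full_re}(/.*)?",
        "host": rf"(www\.)?{host_re}(/.*)?",
        "domain": rf"(.*\.)?{host_re}(/.*)?",
    }
    return re.compile(patterns[match_type])


def matches(url: str, query: str, match_type: str) -> bool:
    """True when the whole URL (without scheme) fits the scope."""
    return match_pattern(query, match_type).fullmatch(url) is not None


query = "example.com/abc"
exact_hit = matches("www.example.com/abc", query, "exact")
subdomain_hit = matches("blog.example.com/other", query, "domain")
```

Each scope widens the previous one: exact covers a single URL, prefix adds everything below that path, host covers the whole host, and domain also includes subdomains.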

+
+ +
+
+class cmoncrawl.common.types.PipeMetadata(domain_record: ~cmoncrawl.common.types.DomainRecord, article_data: ~typing.Dict[~typing.Any, ~typing.Any] = <factory>, warc_header: ~typing.Dict[str, ~typing.Any] = <factory>, http_header: ~typing.Dict[str, ~typing.Any] = <factory>, rec_type: str | None = None, encoding: str = 'latin-1', name: str | None = None)
+

Metadata for a pipe.

+

Attributes:
domain_record: DomainRecord

+
+

An instance of the DomainRecord class representing the associated domain record, e.g. a pointer to the WARC file.

+
+
+
article_data: Dict[Any, Any] = field(default_factory=dict)

A dictionary storing article data with keys and values of any type. This is the data extracted using Extractors.

+
+
warc_header: Dict[str, Any] = field(default_factory=dict)

A dictionary storing the WARC header metadata.

+
+
http_header: Dict[str, Any] = field(default_factory=dict)

A dictionary storing the HTTP header information.

+
+
rec_type: str | None = None

A string or None representing the type of record.

+
+
encoding: str = “latin-1”

A string representing the character encoding used for the record. The default value is “latin-1”.

+
+
name: str | None = None

A string or None representing the name associated with the record.

+
+
+
+ +
+
+class cmoncrawl.common.types.RetrieveResponse(content: Any)
+

Response from retrieve.

+
+ +
+
+class cmoncrawl.common.types.RoutesConfig(*, regexes: List[str] = [], extractors: List[ExtractorConfig] = [])
+

Configuration for extractors.

+
+
+model_config: ClassVar[ConfigDict] = {}
+

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

+
+ +
+
+model_fields: ClassVar[dict[str, FieldInfo]] = {'extractors': FieldInfo(annotation=List[ExtractorConfig], required=False, default=[]), 'regexes': FieldInfo(annotation=List[str], required=False, default=[])}
+

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

+

This replaces Model.__fields__ from Pydantic V1.

+
+ +
+ +
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/generated/cmoncrawl.config.html b/generated/cmoncrawl.config.html new file mode 100644 index 00000000..90b9902e --- /dev/null +++ b/generated/cmoncrawl.config.html @@ -0,0 +1,147 @@ + + + + + + + cmoncrawl.config — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

cmoncrawl.config

+

Functions

+ + + + + + +

get_str_env(name, default)

+

Classes

+ + + + + + +

Config([AWS_PROFILE])

+
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/generated/cmoncrawl.html b/generated/cmoncrawl.html new file mode 100644 index 00000000..b6f041a8 --- /dev/null +++ b/generated/cmoncrawl.html @@ -0,0 +1,153 @@ + + + + + + + cmoncrawl — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+ +
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/generated/cmoncrawl.integrations.commands.html b/generated/cmoncrawl.integrations.commands.html new file mode 100644 index 00000000..5731f727 --- /dev/null +++ b/generated/cmoncrawl.integrations.commands.html @@ -0,0 +1,158 @@ + + + + + + + cmoncrawl.integrations.commands — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

cmoncrawl.integrations.commands

+

Functions

+ + + + + + + + + + + + + + + + + + +

add_args(parser)

add_subparsers(parser)

get_args()

main()

process_args(args)

+
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/generated/cmoncrawl.integrations.download.html b/generated/cmoncrawl.integrations.download.html new file mode 100644 index 00000000..6092b9bc --- /dev/null +++ b/generated/cmoncrawl.integrations.download.html @@ -0,0 +1,178 @@ + + + + + + + cmoncrawl.integrations.download — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

cmoncrawl.integrations.download

+

Functions

+ + + + + + + + + + + + + + + + + + + + + + + + + + + +

add_args(subparser)

add_mode_args(subparser)

get_aggregator(aggregator, cc_servers, urls, ...)

get_download_downloader(output_format, ...)

run_download(args)

url_download(urls, match_type, output, ...)

url_download_prepare_router(output_format, ...)

url_download_prepare_streamer(output_format, ...)

+

Classes

+ + + + + + + + + +

Aggregator(value)

An enumeration.

DownloadOutputFormat(value)

An enumeration.

+
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/generated/cmoncrawl.integrations.extract.html b/generated/cmoncrawl.integrations.extract.html new file mode 100644 index 00000000..77c240d4 --- /dev/null +++ b/generated/cmoncrawl.integrations.extract.html @@ -0,0 +1,178 @@ + + + + + + + cmoncrawl.integrations.extract — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

cmoncrawl.integrations.extract

+

Functions

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

add_args(subparser)

add_mode_args(subparser)

create_router(config)

extract_from_files(files, output_path, ...)

get_domain_records_html(url, date)

get_domain_records_json(file_path)

get_extract_downloader(mode, files_path, ...)

load_config(config_path)

run_extract(args)

+

Classes

+ + + + + + +

ExtractMode(value)

An enumeration.

+
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/generated/cmoncrawl.integrations.html b/generated/cmoncrawl.integrations.html new file mode 100644 index 00000000..68d5feac --- /dev/null +++ b/generated/cmoncrawl.integrations.html @@ -0,0 +1,154 @@ + + + + + + + cmoncrawl.integrations — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+ +
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/generated/cmoncrawl.integrations.utils.html b/generated/cmoncrawl.integrations.utils.html new file mode 100644 index 00000000..a8a27f9d --- /dev/null +++ b/generated/cmoncrawl.integrations.utils.html @@ -0,0 +1,154 @@ + + + + + + + cmoncrawl.integrations.utils — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

cmoncrawl.integrations.utils

+

Functions

+ + + + + + +

get_dao(download_method)

+

Classes

+ + + + + + +

DAOname(value)

An enumeration.

+
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/generated/cmoncrawl.middleware.html b/generated/cmoncrawl.middleware.html new file mode 100644 index 00000000..cee4f097 --- /dev/null +++ b/generated/cmoncrawl.middleware.html @@ -0,0 +1,146 @@ + + + + + + + cmoncrawl.middleware — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

cmoncrawl.middleware

+

Modules

+ + + + + + + + + +

cmoncrawl.middleware.stompware

cmoncrawl.middleware.synchronized

+
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/generated/cmoncrawl.middleware.stompware.html b/generated/cmoncrawl.middleware.stompware.html new file mode 100644 index 00000000..1a6c4f2b --- /dev/null +++ b/generated/cmoncrawl.middleware.stompware.html @@ -0,0 +1,204 @@ + + + + + + + cmoncrawl.middleware.stompware — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

cmoncrawl.middleware.stompware

+

Classes

+
+
+class cmoncrawl.middleware.stompware.StompAggregator(queue_host: str, queue_port: int, url: str, index_agg: GatewayAggregator, heartbeat: int = 10000)
+

Aggregator that queries the Common Crawl index and sends the results to a queue using the STOMP protocol. It creates a queue with name queue.{url} and sends the results to it. It also creates a topic with name topic.poisson_pill.{url} and sends a message with type poisson_pill to it when it finishes.

+
+
Parameters:
+
    +
  • queue_host (str) – The host of the queue

  • +
  • queue_port (int) – The port of the queue

  • +
  • url (str) – The url of the queue

  • +
  • index_agg (GatewayAggregator) – The index aggregator

  • +
  • heartbeat (int, optional) – The heartbeat of the connection. Defaults to 10000.

  • +
+
+
+
+
+async aggregate(filter_duplicates: bool = True)
+

Aggregates the results of the index aggregator and sends them to the queue. If filter_duplicates is True, it will use the DUPL_ID_HEADER header, which Artemis uses to filter duplicates.

+
+ +
+ +
+
+class cmoncrawl.middleware.stompware.StompProcessor(queue_host: str, queue_port: int, pills_to_die: int | None, queue_size: int, timeout: int, addresses: List[str], pipeline: ProcessorPipeline, heartbeat: int = 10000)
+

Processor that listens to queues and processes the messages using a pipeline. Once it has received enough poisson_pill messages, it will stop listening if it doesn’t receive any messages for timeout minutes.

+
+
Parameters:
+
    +
  • queue_host (str) – The host of the queue

  • +
  • queue_port (int) – The port of the queue

  • +
  • pills_to_die (int, optional) – The number of poisson_pill messages to receive before dying. Defaults to None.

  • +
  • queue_size (int) – The size of the queue

  • +
  • timeout (int) – The timeout in minutes

  • +
  • addresses (List[str]) – The addresses of the queues

  • +
  • pipeline (ProcessorPipeline) – The pipeline to use for processing

  • +
  • heartbeat (int, optional) – The heartbeat of the connection. Defaults to 10000.

  • +
+
+
+
+
+class Listener(messages: Queue[Message], listener_stats: ListnerStats)
+
+
+on_message(frame: Frame)
+

Called by the STOMP connection when a MESSAGE frame is received.

+
+
Parameters:
+

frame (Frame) – the stomp frame

+
+
+
+ +
+ +
+ +
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/generated/cmoncrawl.middleware.synchronized.html b/generated/cmoncrawl.middleware.synchronized.html new file mode 100644 index 00000000..b8337115 --- /dev/null +++ b/generated/cmoncrawl.middleware.synchronized.html @@ -0,0 +1,147 @@ + + + + + + + cmoncrawl.middleware.synchronized — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

cmoncrawl.middleware.synchronized

+

Functions

+ + + + + + + + + +

extract(records, pipeline[, concurrent_length])

Extracts the records using the pipeline, with at most concurrent_length records being processed at the same time.

query_and_extract(index_agg, pipeline[, ...])

Query the index and extracts the results using the pipeline

+
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/generated/cmoncrawl.processor.dao.api.html b/generated/cmoncrawl.processor.dao.api.html new file mode 100644 index 00000000..c6d3ffa7 --- /dev/null +++ b/generated/cmoncrawl.processor.dao.api.html @@ -0,0 +1,178 @@ + + + + + + + cmoncrawl.processor.dao.api — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

cmoncrawl.processor.dao.api

+

Classes

+
+
+class cmoncrawl.processor.dao.api.CCAPIGatewayDAO(base_url: str = 'https://data.commoncrawl.org/')
+

This class represents a DAO (Data Access Object) for interacting with the Common Crawl API Gateway. It provides methods for opening and closing a connection, fetching data for a given domain record, and handling errors related to downloading data.

+
+
Parameters:
+

base_url (str) – The base URL of the Common Crawl API Gateway. Defaults to “https://data.commoncrawl.org/”.

+
+
+
+
+aopen()
+

Asynchronously opens a connection to the API Gateway.

+
+ +
+
+aclose()
+

Asynchronously closes the connection to the API Gateway.

+
+ +
+
+fetch()
+

Asynchronously fetches data for a given domain record.

+
+ +
+
Example usage:
>>> dao = CCAPIGatewayDAO()
>>> async with dao:
...     data = await dao.fetch(domain_record)
+
+
+
+
+
+ +
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/generated/cmoncrawl.processor.dao.base.html b/generated/cmoncrawl.processor.dao.base.html new file mode 100644 index 00000000..f5e95c3f --- /dev/null +++ b/generated/cmoncrawl.processor.dao.base.html @@ -0,0 +1,160 @@ + + + + + + + cmoncrawl.processor.dao.base — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

cmoncrawl.processor.dao.base

+

Classes

+
+
+class cmoncrawl.processor.dao.base.ICC_Dao
+

ICC_Dao is a base class representing a Data Access Object for interacting with a data source. It provides methods for fetching data and managing the connection.

+
+
+fetch(domain_record)
+

Fetches data for a given domain record.

+
+ +
+ +

Exceptions

+ + + + + + +

DownloadError(reason, status)

+
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/generated/cmoncrawl.processor.dao.html b/generated/cmoncrawl.processor.dao.html new file mode 100644 index 00000000..773bdb15 --- /dev/null +++ b/generated/cmoncrawl.processor.dao.html @@ -0,0 +1,151 @@ + + + + + + + cmoncrawl.processor.dao — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

cmoncrawl.processor.dao

+

Modules

+ + + + + + + + + + + + +

cmoncrawl.processor.dao.api

cmoncrawl.processor.dao.base

cmoncrawl.processor.dao.s3

+
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/generated/cmoncrawl.processor.dao.s3.html b/generated/cmoncrawl.processor.dao.s3.html new file mode 100644 index 00000000..0f7cf17b --- /dev/null +++ b/generated/cmoncrawl.processor.dao.s3.html @@ -0,0 +1,229 @@ + + + + + + + cmoncrawl.processor.dao.s3 — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

cmoncrawl.processor.dao.s3

+

Classes

+
+
+class cmoncrawl.processor.dao.s3.S3Dao(aws_profile: str | None = None, bucket_name: str = 'commoncrawl')
+

S3Dao is a class that provides methods to interact with AWS S3 for downloading warc files from the commoncrawl bucket.

+
+
Parameters:
+
    +
  • aws_profile (str, optional) – The AWS profile to use for the download. Defaults to None.

  • +
  • bucket_name (str, optional) – The name of the S3 bucket. Defaults to “commoncrawl”.

  • +
+
+
+
+
+bucket_name
+

The name of the S3 bucket.

+
+
Type:
+

str

+
+
+
+ +
+
+aws_profile
+

The AWS profile to use for the download.

+
+
Type:
+

str

+
+
+
+ +
+
+client
+

The S3 client.

+
+
Type:
+

aioboto3.client

+
+
+
+ +
+
+__aenter__()
+

Asynchronous context manager method to initialize the S3 client.

+
+ +
+
+__aexit__(exc_type, exc, tb)
+

Asynchronous context manager method to clean up the S3 client.

+
+ +
+
+fetch(domain_record)
+

Downloads a warc file from the commoncrawl bucket using S3 and returns its bytes.

+
+ +
+
Raises:
+

ValueError – If the S3Dao client is not initialized.

+
+
+
+
+async fetch(domain_record: DomainRecord) bytes
+

Downloads a warc file from commoncrawl bucket using s3 and returns its bytes.

+
+
Parameters:
+
    +
  • domain_record (DomainRecord) – The domain record to use for the download.

  • +
  • aws_profile (str) – The AWS profile to use for the download.

  • +
+
+
Returns:
+

The bytes of the downloaded warc file.

+
+
Return type:
+

bytes

+
+
+
+ +
+ +
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/generated/cmoncrawl.processor.extraction.filters.html b/generated/cmoncrawl.processor.extraction.filters.html new file mode 100644 index 00000000..e60a2b0e --- /dev/null +++ b/generated/cmoncrawl.processor.extraction.filters.html @@ -0,0 +1,149 @@ + + + + + + + cmoncrawl.processor.extraction.filters — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

cmoncrawl.processor.extraction.filters

+

Functions

+ + + + + + + + + +

must_exist_filter(soup, filter_list)

This function takes in a BeautifulSoup object and a list of CSS selectors.

must_not_exist_filter(soup, filter_list)

This function takes in a BeautifulSoup object and a list of CSS selectors.

+
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/generated/cmoncrawl.processor.extraction.html b/generated/cmoncrawl.processor.extraction.html new file mode 100644 index 00000000..1bf2b0cc --- /dev/null +++ b/generated/cmoncrawl.processor.extraction.html @@ -0,0 +1,148 @@ + + + + + + + cmoncrawl.processor.extraction — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

cmoncrawl.processor.extraction

+

Modules

+ + + + + + + + + +

cmoncrawl.processor.extraction.filters

cmoncrawl.processor.extraction.utils

+
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/generated/cmoncrawl.processor.extraction.utils.html b/generated/cmoncrawl.processor.extraction.utils.html new file mode 100644 index 00000000..9a9010f3 --- /dev/null +++ b/generated/cmoncrawl.processor.extraction.utils.html @@ -0,0 +1,176 @@ + + + + + + + cmoncrawl.processor.extraction.utils — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

cmoncrawl.processor.extraction.utils

+

Functions

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

all_same_transform(dict, fc)

Applies fc to all values in dict and returns a dict with same keys but with transformed values.

chain_transforms(trans)

Chains transforms together.

check_required(required_fields, extractor_name)

Checks if required fields are present in the extracted dict.

combine_dicts(dicts)

Combines a list of dictionaries into one.

extract_transform(tag, extract_dict, ...)

Extracts data from tag using extract_dict defining what to extract and how to name it, and extract_transform_dict defining how to transform the extracted data.

get_attribute_transform(attr_name)

Returns a function that takes a bs4 tag and returns the value of the attribute attr_name or None if the attribute doesn't exist.

get_tag_transform(tag_desc)

Returns a function that takes a bs4 tag and returns the first tag that matches the tag_desc.

get_tags_transform(tag_desc)

Returns a function that takes a bs4 tag and returns a list of tags that match the tag_desc.

get_text_list_transform([sep])

Returns a function that takes a list of bs4 tags and returns a string with all the text from the tags joined with sep.

get_text_transform(tag[, recursive])

Returns text from tag.

transform(dict, transforms)

Transforms dict using transforms dict.
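
The helper descriptions above can be illustrated with a minimal sketch in plain Python. It does not reproduce the library's code: the names `chain`, `combine` and `apply_to_values` are hypothetical stand-ins for chain_transforms, combine_dicts and all_same_transform, and the real signatures and edge-case behavior (for example, how None values are treated) may differ.

```python
from functools import reduce
from typing import Any, Callable, Dict, List

Transform = Callable[[Any], Any]


def chain(transforms: List[Transform]) -> Transform:
    """Return a function that applies the transforms from left to right."""
    def chained(value: Any) -> Any:
        return reduce(lambda acc, fn: fn(acc), transforms, value)
    return chained


def combine(dicts: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge dictionaries into one; later keys overwrite earlier ones."""
    merged: Dict[str, Any] = {}
    for d in dicts:
        merged.update(d)
    return merged


def apply_to_values(data: Dict[str, Any], fn: Transform) -> Dict[str, Any]:
    """Apply fn to every value while keeping the keys unchanged."""
    return {key: fn(value) for key, value in data.items()}


# Usage: normalise every extracted string with the same chained transform.
clean = chain([str.strip, str.lower])
record = combine([{"title": "  Hello "}, {"author": " BBC  "}])
result = apply_to_values(record, clean)
```

Chaining keeps each transform small and reusable, which matches the way the extract dictionaries accept either a single callable or a list of callables.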

+
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/generated/cmoncrawl.processor.html b/generated/cmoncrawl.processor.html new file mode 100644 index 00000000..6844e7aa --- /dev/null +++ b/generated/cmoncrawl.processor.html @@ -0,0 +1,150 @@ + + + + + + + cmoncrawl.processor — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

cmoncrawl.processor

+

Modules

+ + + + + + + + + + + + +

cmoncrawl.processor.dao

cmoncrawl.processor.extraction

cmoncrawl.processor.pipeline

+
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/generated/cmoncrawl.processor.pipeline.downloader.html b/generated/cmoncrawl.processor.pipeline.downloader.html new file mode 100644 index 00000000..4c9757a2 --- /dev/null +++ b/generated/cmoncrawl.processor.pipeline.downloader.html @@ -0,0 +1,230 @@ + + + + + + + cmoncrawl.processor.pipeline.downloader — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

cmoncrawl.processor.pipeline.downloader

+

Functions

+ + + + + + +

log_after_retry(retry_state)

+

Classes

+
+
+class cmoncrawl.processor.pipeline.downloader.AsyncDownloader(dao: ICC_Dao, digest_verification: bool = True, max_retry: int = 5, sleep_base: float = 1.3, max_requests_per_second: int = 20, encoding: str = 'latin-1')
+

Downloader which asynchronously downloads the data for the domain_record

+
+
Parameters:
+
    +
  • dao (ICC_Dao) – Data access object to use for downloading

  • +
  • digest_verification (bool, optional) – Whether to verify the digest of the downloaded data. Defaults to True.

  • +
  • max_retry (int, optional) – Maximum number of retries. Defaults to 5.

  • +
  • sleep_base (float, optional) – Base sleep time for exponential backoff in retries. Defaults to 1.3.

  • +
  • max_requests_per_second (int, optional) – Maximum number of requests per second. Defaults to 20.

  • +
  • encoding – Default encoding to be used

  • +
+
+
+
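
How `max_retry` and `sleep_base` might interact can be sketched as follows. This illustrates exponential backoff in general and assumes the delay before retry n is `sleep_base ** n`; the exact schedule used by AsyncDownloader is not stated here, and `backoff_delays` is a hypothetical helper, not part of the library.

```python
from typing import List


def backoff_delays(max_retry: int = 5, sleep_base: float = 1.3) -> List[float]:
    """Delays in seconds before each retry, assuming delay = sleep_base ** attempt."""
    return [round(sleep_base ** attempt, 3) for attempt in range(1, max_retry + 1)]


# With the documented defaults the waits grow slowly, so a handful of
# retries costs only a few seconds in total.
delays = backoff_delays()
total_wait = sum(delays)
```

A base close to 1 keeps the total wait short, while a larger base backs off more aggressively when the server is throttling requests.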
+ +
+
+class cmoncrawl.processor.pipeline.downloader.DownloaderLocalFiles(files: List[Path], url: str | None = None, date: datetime | None = None)
+

Local file downloader and metadata extractor for testing. It doesn’t download anything; it passes local files further down the pipeline and extracts metadata from each file.

+
+
Parameters:
+
    +
  • files (List[Path]) – List of local files to pass

  • +
  • url (str, optional) – Url to use for metadata. Defaults to None.

  • +
  • date (datetime, optional) – Date to add to metadata. Defaults to None.

  • +
+
+
+
+ +
+
+class cmoncrawl.processor.pipeline.downloader.DummyDownloader
+

A dummy downloader class that does not perform any actual downloading. It simply adds an empty string as the content +and passes the domain record further into the pipeline.

+
+
+async download(domain_record: DomainRecord | None)
+

Downloads the content for the given domain record.

+
+
Parameters:
+

domain_record (DomainRecord | None) – The domain record to download.

+
+
Returns:
+

+
A list containing a single tuple with an empty string as the first element

and the pipe metadata as the second element.

+
+
+

+
+
Return type:
+

List[Tuple[str, PipeMetadata]]

+
+
+
+ +
+ +
+
+class cmoncrawl.processor.pipeline.downloader.IDownloader
+

Base class for all downloaders

+
+ +
+
+class cmoncrawl.processor.pipeline.downloader.WarcIterator(file: Path, encoding: str = 'latin-1', show_progress: bool = False)
+

WarcIterator is a local downloader which iterates over the specified WARC file

+
+
Parameters:
+
    +
  • file (Path) – Path to the warc file

  • +
  • encoding (str, optional) – Encoding to be used. Defaults to “latin-1”.

  • +
+
+
+
+ +
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/generated/cmoncrawl.processor.pipeline.extractor.html b/generated/cmoncrawl.processor.pipeline.extractor.html new file mode 100644 index 00000000..2f092609 --- /dev/null +++ b/generated/cmoncrawl.processor.pipeline.extractor.html @@ -0,0 +1,262 @@ + + + + + + + cmoncrawl.processor.pipeline.extractor — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

cmoncrawl.processor.pipeline.extractor

+

Classes

+
+
+class cmoncrawl.processor.pipeline.extractor.BaseExtractor(encoding: str | None = None, raise_on_encoding: bool = False, parser: str = 'html.parser')
+

Base class for all soup extractors

+
+
Parameters:
+
    +
  • encoding (str, optional) – Default encoding to be used. Defaults to None.

  • +
  • raise_on_encoding (bool, optional) – If True, the extractor will raise ValueException if it fails to decode the response. Defaults to False.

  • +
+
+
+
+
+extract(response: str, metadata: PipeMetadata) Dict[str, Any] | None
+

Extracts the data from the response. If the extractor fails to extract the data, it should return None.

+
+
Parameters:
+
    +
  • response (str) – response from the downloader

  • +
  • metadata (PipeMetadata) – Metadata of the response

  • +
+
+
+
+ +
+ +
+
+class cmoncrawl.processor.pipeline.extractor.DomainRecordExtractor(filter_non_ok: bool = True)
+

Dummy Extractor which simply extracts the domain record

+
+
Parameters:
+

filter_non_ok (bool, optional) – If True, only 200 status codes will be extracted. Defaults to True.

+
+
+
+ +
+
+class cmoncrawl.processor.pipeline.extractor.HTMLExtractor(filter_non_ok: bool = True, encoding: str | None = None)
+

Dummy Extractor which simply extracts the html

+
+
Parameters:
+
    +
  • filter_non_ok (bool, optional) – If True, only 200 status codes will be extracted. Defaults to True.

  • +
  • encoding (str, optional) – Default encoding to be used. Defaults to None. If set, the extractor will raise ValueException if it fails to decode the response.

  • +
+
+
+
+ +
+
+class cmoncrawl.processor.pipeline.extractor.IExtractor
+

Base class for all extractors

+
+
+abstract extract(response: str, metadata: PipeMetadata) Dict[str, Any] | None
+

Extracts the data from the response. If the extractor fails to extract the data, it should return None.

+
+
Parameters:
+
    +
  • response (str) – response from the downloader

  • +
  • metadata (PipeMetadata) – Metadata of the response

  • +
+
+
+
+ +
+ +
+
+class cmoncrawl.processor.pipeline.extractor.PageExtractor(header_css_dict: Dict[str, str] = {}, header_extract_dict: Dict[str, Callable[[Any], Any] | List[Callable[[Any], Any]]] = {}, content_css_selector: str = 'body', content_css_dict: Dict[str, str] = {}, content_extract_dict: Dict[str, Callable[[Any], Any] | List[Callable[[Any], Any]]] = {}, css_selectors_must_exist: List[str] = [], css_selectors_must_not_exist: List[str] = [], allowed_domain_prefixes: List[str] | None = None, is_valid_extraction: Callable[[Dict[Any, Any], PipeMetadata], bool] | None = None, encoding: str | None = None)
+

The PageExtractor is designed to extract specific elements from a web page, while adding the ability to choose when to extract the data.

+
+
Parameters:
+
    +
  • header_css_dict (Dict[str, str]) – A dictionary specifying the CSS selectors for the header elements.

  • +
  • header_extract_dict (Dict[str, List[Callable[[Any], Any]] | Callable[[Any], Any]]) – A dictionary +specifying the extraction functions for the header elements. +The keys must match the keys in the header_css_dict. +The functions are applied in the order they are specified in the list.

  • +
  • content_css_selector (str) – The CSS selector specifying where the content elements are located.

  • +
  • content_css_dict (Dict[str, str]) – A dictionary specifying the CSS selectors for the content elements. +Selectors must be relative to the content_css_selector.

  • +
  • content_extract_dict (Dict[str, List[Callable[[Any], Any]] | Callable[[Any], Any]]) – A dictionary +specifying the extraction functions for the content elements. +The keys must match the keys in the content_css_dict. +The functions are applied in the order they are specified in the list.

  • +
  • css_selectors_must_exist (List[str]) – A list of CSS selectors that must exist for the extraction to proceed.

  • +
  • css_selectors_must_not_exist (List[str]) – A list of CSS selectors that must not exist for the extraction to proceed.

  • +
  • allowed_domain_prefixes (List[str] | None) – A list of allowed domain prefixes. If None, all domain prefixes are allowed.

  • +
  • is_valid_extraction (Callable[[Dict[Any, Any], PipeMetadata], bool]) – A function that takes in the extracted data and the metadata and returns True if the extraction is valid, False otherwise.

  • +
  • encoding (str | None) – The encoding to be used. If None, the default encoding is used.

  • +
+
+
Returns:
+

A dictionary containing the extracted data, or None if the extraction failed.

+
+
Return type:
+

Dict[Any, Any] | None

+
+
+
+
+extract(response: str, metadata: PipeMetadata) Dict[Any, Any] | None
+

Extracts the data from the response. If the extractor fails to extract the data, it should return None.

+
+
Parameters:
+
    +
  • response (str) – response from the downloader

  • +
  • metadata (PipeMetadata) – Metadata of the response

  • +
+
+
+
+ +
+ +
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/generated/cmoncrawl.processor.pipeline.html b/generated/cmoncrawl.processor.pipeline.html new file mode 100644 index 00000000..3fbd8a65 --- /dev/null +++ b/generated/cmoncrawl.processor.pipeline.html @@ -0,0 +1,157 @@ + + + + + + + cmoncrawl.processor.pipeline — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+ + +
+
+
+
+ + + + \ No newline at end of file diff --git a/generated/cmoncrawl.processor.pipeline.pipeline.html b/generated/cmoncrawl.processor.pipeline.pipeline.html new file mode 100644 index 00000000..5edd9e74 --- /dev/null +++ b/generated/cmoncrawl.processor.pipeline.pipeline.html @@ -0,0 +1,146 @@ + + + + + + + cmoncrawl.processor.pipeline.pipeline — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

cmoncrawl.processor.pipeline.pipeline

+

Classes

+ + + + + + +

ProcessorPipeline(router, downloader, ...)

+
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/generated/cmoncrawl.processor.pipeline.router.html b/generated/cmoncrawl.processor.pipeline.router.html new file mode 100644 index 00000000..6cc18778 --- /dev/null +++ b/generated/cmoncrawl.processor.pipeline.router.html @@ -0,0 +1,198 @@ + + + + + + + cmoncrawl.processor.pipeline.router — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

cmoncrawl.processor.pipeline.router

+

Classes

+
+
+class cmoncrawl.processor.pipeline.router.IRouter
+

Base class for all routers

+
+
+abstract route(url: str | None, time: datetime | None, metadata: PipeMetadata) IExtractor
+

Routes the url to the correct extractor

+
+ +
+ +
+
+class cmoncrawl.processor.pipeline.router.Route(name: str, regexes: List[re.Pattern[str]], since: datetime.datetime, to: datetime.datetime)
+
+ +
+
+class cmoncrawl.processor.pipeline.router.Router
+
+
+load_module_as_extractor(module_path: Path)
+

Loads a module and returns its extractor

+
+ +
+
+register_route(name: str, regex: str | List[str], since: datetime | None = None, to: datetime | None = None)
+

Registers a route for a given extractor name and regex

+
+
Parameters:
+
    +
  • name (str) – The name of the extractor

  • +
  • regex (Union[str, List[str]]) – The regex to match against

  • +
  • since (datetime | None, optional) – The earliest time to route to this extractor. Defaults to None.

  • +
  • to (datetime | None, optional) – The latest time to route to this extractor. Defaults to None.

  • +
+
+
+
+ +
+
+route(url: str | None, time: datetime | None, metadata: PipeMetadata) IExtractor
+

Routes the url to the correct extractor based on the url and time

+
+
Parameters:
+
    +
  • url (str | None) – The url to route

  • +
  • time (datetime | None) – The time to route

  • +
  • metadata (PipeMetadata) – The metadata for the current pipeline

  • +
+
+
+
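
The routing described above (regex match on the URL plus an optional time window, first match wins) can be sketched in plain Python. This is an illustration under stated assumptions, not the library's Router: `SketchRoute` and `SketchRouter` are hypothetical names, the sketch returns the extractor name rather than an IExtractor, and the handling of a missing time or of no match is a guess.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Union


@dataclass
class SketchRoute:
    name: str
    regexes: List[re.Pattern]
    since: datetime
    to: datetime


class SketchRouter:
    def __init__(self) -> None:
        self.routes: List[SketchRoute] = []

    def register_route(
        self,
        name: str,
        regex: Union[str, List[str]],
        since: Optional[datetime] = None,
        to: Optional[datetime] = None,
    ) -> None:
        patterns = [regex] if isinstance(regex, str) else regex
        self.routes.append(
            SketchRoute(
                name=name,
                regexes=[re.compile(p) for p in patterns],
                # Open-ended bounds default to the widest possible window.
                since=since or datetime.min,
                to=to or datetime.max,
            )
        )

    def route(self, url: Optional[str], time: Optional[datetime]) -> Optional[str]:
        if url is None:
            return None
        for route in self.routes:
            # Assumption: a missing crawl time matches any window.
            in_window = time is None or route.since <= time < route.to
            if in_window and any(r.match(url) for r in route.regexes):
                return route.name  # first matching route wins
        return None


# Usage: two extractors for the same site, split by crawl date.
router = SketchRouter()
router.register_route("old_layout", r".*bbc\.com.*", to=datetime(2015, 1, 1))
router.register_route("new_layout", r".*bbc\.com.*", since=datetime(2015, 1, 1))
chosen = router.route("https://www.bbc.com/news", datetime(2020, 6, 1))
```

Splitting routes by date is useful when a site changed its HTML layout at a known point and each layout needs its own extractor.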
+ +
+ +
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/generated/cmoncrawl.processor.pipeline.streamer.html b/generated/cmoncrawl.processor.pipeline.streamer.html new file mode 100644 index 00000000..210fe5ec --- /dev/null +++ b/generated/cmoncrawl.processor.pipeline.streamer.html @@ -0,0 +1,166 @@ + + + + + + + cmoncrawl.processor.pipeline.streamer — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

cmoncrawl.processor.pipeline.streamer

+

Classes

+
+
+class cmoncrawl.processor.pipeline.streamer.BaseStreamerFile(root: Path, max_directory_size: int, max_file_size: int, extension: str, directory_prefix: str = 'directory_', max_retries: int = 3)
+

Abstract Class which defines the basic functionality of a file streamer

+
+ +
+
+class cmoncrawl.processor.pipeline.streamer.IStreamer
+

Base class for all outstreamers. It streams the data out and returns an identifier for the data if successful; otherwise it returns None.

+
+ +
+
+class cmoncrawl.processor.pipeline.streamer.MemoryStreamer
+

Memory Streamer which keeps the output in memory

+
+ +
+
+class cmoncrawl.processor.pipeline.streamer.StreamerFileHTML(root: Path, max_directory_size: int)
+
+ +
+
+class cmoncrawl.processor.pipeline.streamer.StreamerFileJSON(root: Path, max_directory_size: int, max_file_size: int, pretty: bool = False)
+
+ +
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/genindex.html b/genindex.html new file mode 100644 index 00000000..4aa8b505 --- /dev/null +++ b/genindex.html @@ -0,0 +1,722 @@ + + + + + + Index — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+
    +
  • + +
  • +
  • +
+
+
+
+
+ + +

Index

+ +
+ _ + | A + | B + | C + | D + | E + | F + | G + | H + | I + | L + | M + | O + | P + | R + | S + | W + +
+

_

+ + + +
+ +

A

+ + + +
+ +

B

+ + + +
+ +

C

+ + + +
    +
  • + cmoncrawl.integrations.utils + +
  • +
  • + cmoncrawl.middleware + +
  • +
  • + cmoncrawl.middleware.stompware + +
  • +
  • + cmoncrawl.middleware.synchronized + +
  • +
  • + cmoncrawl.processor + +
  • +
  • + cmoncrawl.processor.dao + +
  • +
  • + cmoncrawl.processor.dao.api + +
  • +
  • + cmoncrawl.processor.dao.base + +
  • +
  • + cmoncrawl.processor.dao.s3 + +
  • +
  • + cmoncrawl.processor.extraction + +
  • +
  • + cmoncrawl.processor.extraction.filters + +
  • +
  • + cmoncrawl.processor.extraction.utils + +
  • +
  • + cmoncrawl.processor.pipeline + +
  • +
  • + cmoncrawl.processor.pipeline.downloader + +
  • +
  • + cmoncrawl.processor.pipeline.extractor + +
  • +
  • + cmoncrawl.processor.pipeline.pipeline + +
  • +
  • + cmoncrawl.processor.pipeline.router + +
  • +
  • + cmoncrawl.processor.pipeline.streamer + +
  • +
+ +

D

+ + + +
+ +

E

+ + + +
+ +

F

+ + +
+ +

G

+ + + +
+ +

H

+ + +
+ +

I

+ + + +
+ +

L

+ + +
+ +

M

+ + +
+ +

O

+ + +
+ +

P

+ + + +
+ +

R

+ + + +
+ +

S

+ + + +
+ +

W

+ + +
+ + + +
+
+
+ +
+ +
+

© Copyright 2022, Hynek Kydlíček.

+
+ + Built with Sphinx using a + theme + provided by Read the Docs. + + +
+
+
+
+
+ + + + \ No newline at end of file diff --git a/index.html b/index.html new file mode 100644 index 00000000..1c2abe98 --- /dev/null +++ b/index.html @@ -0,0 +1,226 @@ + + + + + + + Welcome to CommonCrawl Extractor’s documentation! — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+ + +
+
+
+
+ + + + \ No newline at end of file diff --git a/misc/athena.html b/misc/athena.html new file mode 100644 index 00000000..7308a4b7 --- /dev/null +++ b/misc/athena.html @@ -0,0 +1,204 @@ + + + + + + + Athena — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

Athena

+

AWS Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. +Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run. +Athena is easy to use. Simply point to your data in Amazon S3, define the schema, and start querying using standard SQL. +Most results are delivered within seconds. With Athena, there’s no need for complex ETL jobs to prepare your data for analysis. +This makes it easy for anyone with SQL skills to quickly analyze large-scale datasets.

+
+

Prerequisites

+

In order to use the athena module, you must have an AWS account with the following permissions:

+
{
+    "Version": "2012-10-17",
+    "Statement": [
+        {
+            "Sid": "CommoncrawlDB",
+            "Effect": "Allow",
+            "Action": [
+                "athena:CreateDataCatalog",
+                "glue:BatchCreatePartition",
+                "athena:StartQueryExecution",
+                "glue:CreateTable",
+                "glue:CreateDatabase",
+                "glue:GetTable",
+                "glue:GetTables",
+                "glue:GetDatabase",
+                "glue:GetDatabases",
+                "glue:UpdateTable",
+                "glue:UpdatePartition",
+                "glue:GetPartition",
+                "glue:GetPartitions",
+                "athena:GetQueryExecution",
+                "athena:ListTableMetadata",
+                "s3:GetBucketLocation",
+                "s3:DescribeJob"
+            ],
+            "Resource": "*"
+        },
+        {
+            "Sid": "ResultsBucket",
+            "Effect": "Allow",
+            "Action": "s3:ListBucket",
+            "Resource": "arn:aws:s3:::cmoncrawl-testbucket"
+        },
+        {
+            "Sid": "ResultsBucket-objects",
+            "Effect": "Allow",
+            "Action": [
+                "s3:PutObject",
+                "s3:GetObject",
+                "s3:DeleteObject"
+            ],
+            "Resource": "arn:aws:s3:::cmoncrawl-testbucket/*"
+        },
+        {
+            "Sid": "CommoncrawlBucket",
+            "Effect": "Allow",
+            "Action": [
+                "s3:GetObject",
+                "s3:ListBucket"
+            ],
+            "Resource": [
+                "arn:aws:s3:::commoncrawl/*",
+                "arn:aws:s3:::commoncrawl"
+            ]
+        }
+    ]
+}
+
+
+
+
+

Caching

+

If you provide a bucket name when initializing the cmoncrawl.aggregator.athena_query.AthenaAggregator, the results of the query will be cached in the bucket. Whenever you make the same query, the results will be reused. This means that the bucket is not automatically cleaned up and it’s your responsibility to do so.

+

If you don’t provide a bucket name, the results will not be cached; a randomly generated bucket will be used and deleted after the query is finished.

+
+
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/misc/domain_record.html b/misc/domain_record.html new file mode 100644 index 00000000..900d9655 --- /dev/null +++ b/misc/domain_record.html @@ -0,0 +1,167 @@ + + + + + + + Domain Record — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

Domain Record

+

By domain record we refer to a structure that contains the information about how to download a crawl of a URL. It contains the following fields:

+
    +
  • url: the url to crawl

  • +
  • filename: the warc filename

  • +
  • offset: the offset in the warc file

  • +
  • length: the length of the html crawl

  • +
  • digest [optional]: the digest of the html crawl

  • +
  • encoding [optional]: the encoding of the html crawl

  • +
  • timestamp [optional]: the timestamp of the crawl

  • +
+
+
+

Domain Record JSONL format

+

In order to use your own domain records with the extract mode of the CLI, you must format them into the following JSON format:

+
{
+    "domain_record":
+    {
+        "url": "http://example.com",
+        "filename": "crawl.warc.gz",
+        "offset": 123,
+        "length": 456,
+        "digest": "sha1:1234567890abcdef",
+        "encoding": "utf-8",
+        "timestamp": "2018-01-01T00:00:00Z"
+    },
+    "additional_info":
+    {
+        "key1": "value1",
+        "key2": "value2"
+    }
+}
+
+
+

Each such JSON object must be on a separate line in the file. You don’t have to provide all the fields; only url, filename, offset and length are required. The corresponding Athena SQL keys are: u.url, cc.warc_filename, cc.warc_record_offset, cc.warc_record_length, cc.content_digest, cc.fetch_time

+

The additional_info field is optional and can contain any additional information. It will be added to the extracted fields as is. It’s useful when, for example, you want to record which set the URL belongs to.
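
The format above can be produced with the standard library alone. The following is a minimal sketch: `to_jsonl_line` and `REQUIRED_FIELDS` are hypothetical names introduced for illustration, and the validation only covers the four required fields listed above.

```python
import json
from typing import Any, Dict, Optional

REQUIRED_FIELDS = ("url", "filename", "offset", "length")


def to_jsonl_line(
    domain_record: Dict[str, Any],
    additional_info: Optional[Dict[str, Any]] = None,
) -> str:
    """Serialise one domain record as a single JSONL line."""
    missing = [field for field in REQUIRED_FIELDS if field not in domain_record]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    entry: Dict[str, Any] = {"domain_record": domain_record}
    if additional_info:
        entry["additional_info"] = additional_info
    # json.dumps never emits raw newlines, so each entry stays on one line.
    return json.dumps(entry)


lines = [
    to_jsonl_line(
        {"url": "http://example.com", "filename": "crawl.warc.gz", "offset": 123, "length": 456},
        {"set": "train"},
    ),
    to_jsonl_line(
        {"url": "http://example.com/about", "filename": "crawl.warc.gz", "offset": 579, "length": 321},
    ),
]
jsonl_text = "\n".join(lines) + "\n"

# Reading the file back is one json.loads call per line.
parsed = [json.loads(line) for line in jsonl_text.splitlines()]
```

Writing `jsonl_text` to a file with a `.jsonl` extension should give an input suitable for the record mode of `cmon extract`, assuming the file names and offsets point at real WARC records.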

+
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/misc/index.html b/misc/index.html new file mode 100644 index 00000000..2fa096db --- /dev/null +++ b/misc/index.html @@ -0,0 +1,135 @@ + + + + + + + Miscellaneous — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

Miscellaneous

+ +
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/modules.html b/modules.html new file mode 100644 index 00000000..a393404d --- /dev/null +++ b/modules.html @@ -0,0 +1,115 @@ + + + + + + + docs — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

docs

+
+
+
+ + +
+
+
+ +
+ +
+

© Copyright 2022, Hynek Kydlíček.

+
+ + Built with Sphinx using a + theme + provided by Read the Docs. + + +
+
+
+
+
+ + + + \ No newline at end of file diff --git a/objects.inv b/objects.inv new file mode 100644 index 00000000..96ef1d28 Binary files /dev/null and b/objects.inv differ diff --git a/prog_guide/index.html b/prog_guide/index.html new file mode 100644 index 00000000..5b8339d5 --- /dev/null +++ b/prog_guide/index.html @@ -0,0 +1,148 @@ + + + + + + + Programming Guide — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

Programming Guide

+

This section of the documentation is for people who want to use the cmoncrawl library to create their own extraction pipeline in Python. It allows you to take full advantage of the cmoncrawl library, unlike the command line utility, which is limited to a few options.

+ +
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/prog_guide/overview.html b/prog_guide/overview.html new file mode 100644 index 00000000..cc0c0aa3 --- /dev/null +++ b/prog_guide/overview.html @@ -0,0 +1,208 @@ + + + + + + + How to extract from Common Crawl (theory) — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

How to extract from Common Crawl (theory)

+

The process of getting one parsed web page from CommonCrawl can be described as a pipeline.

+
    +
  1. Query CommonCrawl to find a link to a file that contains the web page we want.
  2. Download the file.
  3. Choose a parser for the web page.
  4. Filter out the web page if it does not match the conditions.
  5. Extract fields from the page.
  6. Save the fields to a file.
+

The first step is handled by the Aggregator, while the rest are handled by the Processor.

+
+

1. Querying CommonCrawl

+
+
What: WARC file
How: WARC is a file format used for storing large numbers of web resources. In our case these files contain many downloaded web pages and their metadata. It’s possible to get only part of a file by specifying the offset in the file and the length of the part we want.

What: Common Crawl index
How: A CommonCrawl index is a collection which maps crawled URLs to the WARC files that contain the crawls of those URLs.

+
+
+

Every month CommonCrawl releases a new index which contains links to all web pages that were crawled that month.

+
+

Warning

+

It is important to understand that even if the index was released in a certain month, it can contain the links to web pages that might be older.

+
+

Thus, in order to download a page, we query the index to get a link to the respective WARC file, together with the offset and length of the page. Since there are multiple indexes, we should query all of them to make sure we don’t miss the page. With the link to the WARC file, the offset and the length, we can continue to the next step.
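
Fetching only the part of a WARC file given by an offset and a length is typically done over HTTP with a Range request. The sketch below only builds the header; `range_header` is a hypothetical helper, and whether the library's DAOs issue exactly this request is an assumption.

```python
from typing import Dict


def range_header(offset: int, length: int) -> Dict[str, str]:
    """Build an HTTP Range header selecting `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive on both ends, hence the -1.
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


# Values taken from an index entry (here: the example domain record).
headers = range_header(offset=123, length=456)
```

The inclusive end position is the usual source of off-by-one errors: requesting `offset + length` instead would return one byte too many.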

+

All this is handled by cmoncrawl.aggregator.gateway_query.GatewayAggregator. But for basic use you will not need to use it directly.

+
+
+

2. Downloading a file

+

The Processor node then takes the URL and related information from the queue and downloads the appropriate WARC file. This step is handled by cmoncrawl.processor.pipeline.downloader.AsyncDownloader. It simply downloads and extracts the page from the WARC file. For downloading we use two data access objects (DAOs, cmoncrawl.processor.dao.base.ICC_Dao):

+ +
+
+

3. Choose extractor

+

Once the page is downloaded, we first need to choose an extractor for it. Extractors are dynamically loaded based on the definitions in the Extractor config file. All loaded extractors are then matched against the URL and crawl date, and the first match is used. This functionality is handled by cmoncrawl.processor.pipeline.router.Router.

+

For development of extractors refer to Extractor types.

+
+
+

4. Filtering out the web page

+

Once the extractor is chosen, its filtering function is used to either drop or pass a page. To filter, you can use either cmoncrawl.processor.pipeline.extractor.BaseExtractor.filter_raw(), which filters based on the raw HTML page (fast), or wait for the conversion to soup and then filter using cmoncrawl.processor.pipeline.extractor.BaseExtractor.filter_soup() (slow).
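
The reason for having both a raw and a parsed filter can be shown with a small standard-library sketch. It does not use BaseExtractor or BeautifulSoup: `raw_filter`, `parsed_filter` and `keep_page` are hypothetical functions, and `html.parser` stands in for the soup conversion.

```python
from html.parser import HTMLParser
from typing import List


class _TitleCollector(HTMLParser):
    """Collects the text of <title> elements; stands in for a real parser."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.titles: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.titles.append(data.strip())


def raw_filter(html: str) -> bool:
    """Cheap check on the raw string; no parsing needed."""
    return "<title" in html.lower()


def parsed_filter(html: str) -> bool:
    """More expensive check that requires the document to be parsed."""
    parser = _TitleCollector()
    parser.feed(html)
    return any(parser.titles)


def keep_page(html: str) -> bool:
    # Run the cheap filter first so most rejected pages are never parsed.
    return raw_filter(html) and parsed_filter(html)


kept = keep_page("<html><head><title>News</title></head></html>")
dropped = keep_page("<html><body>no title here</body></html>")
```

Because parsing dominates the cost per page, rejecting as many pages as possible at the raw stage pays off when processing large crawls.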

+
+
+

5. Extract fields from the page

+

The extracting function defined by the extractor is used to extract the fields from the page. Just extract the values and return them in a dict.

+
+
+

6. File saving

+

With the fields extracted, we need to save them to a file. By default the fields are saved in a JSON file. The way the file is saved is defined by streamers. All of the currently implemented streamers are derived from cmoncrawl.processor.pipeline.streamer.BaseStreamerFile, which defines how the files are saved, while the content parsing is left to the derived classes.

+

Currently we support 2 streamers:

+ +

If you want to debug you might want to use cmoncrawl.processor.pipeline.streamer.MemoryStreamer which outputs the data to memory instead of file.

+

If you would like a different format, you can create your own streamer by inheriting from cmoncrawl.processor.pipeline.streamer.IStreamer and then creating the pipeline with your new outstreamer.

+
+
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/prog_guide/practice.html b/prog_guide/practice.html new file mode 100644 index 00000000..4232dc27 --- /dev/null +++ b/prog_guide/practice.html @@ -0,0 +1,269 @@ + + + + + + + How to extract from Common Crawl (practice) — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

How to extract from Common Crawl (practice)

+

Since we now know which steps are needed to extract data from Common Crawl and how they map to cmoncrawl primitives, let’s see how to do it in practice.

+
+

Pipeline

+

We already know how to get the domain records and we also know how to download, extract and save the data. The pipeline allows us to combine all but the first step into a single object that can be used to extract data from Common Crawl.

+

To create a pipeline, simply initialize cmoncrawl.processor.pipeline.pipeline.ProcessorPipeline with a Downloader, Router and Streamer. You can then call its cmoncrawl.processor.pipeline.pipeline.ProcessorPipeline.process_domain_record() method with a domain record and it will run the whole pipeline for that single domain record.

+
+

Note

+

The exceptions are not handled by the pipeline and are passed to the caller, to handle them as you wish.

+
+
+
+

Simultaneous querying and extracting

+

Now all we need to resolve is how to effectively connect querying the index with downloading/extracting the data (the pipeline). One way is to query the index and pass each domain record to the pipeline as soon as we get it; this is exactly how cmoncrawl.middleware.synchronized.query_and_extract() works. This works well when we use the Gateway DAO, as querying the index takes about the same time as downloading/extracting. This is how we can do it:

+
+
Simultaneously query and extract data from Common Crawl
+
from typing import Any, Dict
+from bs4 import BeautifulSoup
+from cmoncrawl.aggregator.gateway_query import GatewayAggregator
+from cmoncrawl.processor.pipeline.extractor import BaseExtractor
+from cmoncrawl.processor.pipeline.pipeline import ProcessorPipeline
+from cmoncrawl.processor.pipeline.downloader import AsyncDownloader
+from cmoncrawl.processor.pipeline.router import Router
+from cmoncrawl.processor.pipeline.streamer import StreamerFileJSON
+from cmoncrawl.common.loggers import all_purpose_logger
+from cmoncrawl.common.types import MatchType, PipeMetadata
+from cmoncrawl.middleware.synchronized import query_and_extract
+from cmoncrawl.processor.dao.s3 import S3Dao
+from pathlib import Path
+
+
+class YourCustomExtractor(BaseExtractor):
+    def extract_soup(self, soup: BeautifulSoup, metadata: PipeMetadata) -> Dict[str, Any] | None:
+        return {"title": "Dummy"}
+
+your_custom_extractor = YourCustomExtractor()
+
+# We register our custom extractor to the router
+router = Router()
+router.load_extractor("ext", your_custom_extractor)
+router.register_route("ext", ".*bbc.com.*")
+streamer = StreamerFileJSON(Path("extracted"), max_directory_size=1000, max_file_size=100)
+
+async with S3Dao(aws_profile="dev") as dao:
+    downloader = AsyncDownloader(dao)
+    pipeline = ProcessorPipeline(downloader=downloader, router=router, outstreamer=streamer)
+
+    index_agg = GatewayAggregator(
+        urls=["bbc.com"],
+        match_type=MatchType.DOMAIN,
+        limit=1000,
+    )
+
+    processed_urls = await query_and_extract(index_agg, pipeline)
+
+
+
+
+
+

Query records and then extract

+

The other way is to query the index for all records first and download/extract them afterwards. This approach works +great with Athena, as the query takes around 1-2 minutes. With this approach we can then take advantage of both multiprocessing to process +and asyncio queues to download the data faster. This is how we can do it:

+
+
Query and extract data from Common Crawl
+
from cmoncrawl.aggregator.athena_query import AthenaAggregator
+from typing import Any, Dict
+from bs4 import BeautifulSoup
+from cmoncrawl.aggregator.gateway_query import GatewayAggregator
+from cmoncrawl.processor.pipeline.extractor import BaseExtractor
+from cmoncrawl.processor.pipeline.pipeline import ProcessorPipeline
+from cmoncrawl.processor.pipeline.downloader import AsyncDownloader
+from cmoncrawl.processor.pipeline.router import Router
+from cmoncrawl.processor.pipeline.streamer import StreamerFileJSON
+from cmoncrawl.common.loggers import all_purpose_logger
+from cmoncrawl.common.types import MatchType, PipeMetadata
+from cmoncrawl.middleware.synchronized import extract
+from cmoncrawl.processor.dao.s3 import S3Dao
+from pathlib import Path
+
+# Query
+records = []
+async with AthenaAggregator(urls=["bbc.com"],
+    match_type=MatchType.DOMAIN,
+    limit=1000,
+    bucket_name="test-dev-cmoncrawl",
+    aws_profile="dev"
+) as agg:
+    async for record in agg:
+        records.append(record)
+
+# Then extract
+
+
+
+class YourCustomExtractor(BaseExtractor):
+    def extract_soup(self, soup: BeautifulSoup, metadata: PipeMetadata) -> Dict[str, Any] | None:
+        return {"title": "Dummy"}
+
+your_custom_extractor = YourCustomExtractor()
+
+# We register our custom extractor to the router
+router = Router()
+router.load_extractor("ext", your_custom_extractor)
+router.register_route("ext", ".*bbc.com.*")
+streamer = StreamerFileJSON(Path("extracted"), max_directory_size=1000, max_file_size=100)
+
+async with S3Dao(aws_profile="dev") as dao:
+    downloader = AsyncDownloader(dao)
+    pipeline = ProcessorPipeline(downloader=downloader, router=router, outstreamer=streamer)
+
+    # The records were already queried with AthenaAggregator above, so no second aggregator is needed here.
+
+    processed_urls = await extract(pipeline=pipeline, records=[(rec, {}) for rec in records])
+
+
+
+

To leverage multiprocessing, simply divide the records into n chunks and for each chunk initialize a new process.
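The chunking step described above can be sketched as follows. This is a minimal illustration using only the standard library, not part of cmoncrawl: `chunk_records`, `process_chunk` and `extract_in_processes` are hypothetical names, and `process_chunk` is a placeholder for a worker that would build its own pipeline and run the extraction over its chunk.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import Any, List, Sequence


def chunk_records(records: Sequence[Any], n: int) -> List[List[Any]]:
    """Split records into at most n contiguous chunks of nearly equal size."""
    if n < 1:
        raise ValueError("n must be at least 1")
    size, remainder = divmod(len(records), n)
    chunks: List[List[Any]] = []
    start = 0
    for i in range(n):
        # The first `remainder` chunks get one extra record.
        end = start + size + (1 if i < remainder else 0)
        if end > start:  # skip empty chunks when n > len(records)
            chunks.append(list(records[start:end]))
        start = end
    return chunks


def process_chunk(chunk: List[Any]) -> int:
    # Placeholder worker (assumption): in real use, each process would
    # create its own DAO, downloader and pipeline and extract its chunk.
    # Here it only reports how many records it was given.
    return len(chunk)


def extract_in_processes(records: Sequence[Any], n_proc: int) -> int:
    """Process each chunk in a separate process and return the total count."""
    chunks = chunk_records(records, n_proc)
    with ProcessPoolExecutor(max_workers=n_proc) as pool:
        return sum(pool.map(process_chunk, chunks))
```

When trying this out, call `extract_in_processes(records, n_proc=4)` from under an `if __name__ == "__main__":` guard, since process pools re-import the main module on some platforms.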

+
+
+

Distributed simultaneous high-throughput querying and extracting

+

Lastly, you can leverage cmoncrawl.middleware.stompware.StompAggregator to query the index and send the data to a queue using the STOMP protocol, +and simultaneously retrieve the data from the queue and extract it using cmoncrawl.middleware.stompware.StompProcessor.
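The producer/consumer pattern behind this approach can be illustrated with a small in-process sketch. It does not use STOMP or the cmoncrawl classes: it assumes a plain `asyncio.Queue` in place of the message broker, and `producer`, `consumer`, `query_and_process` and `POISON_PILL` are hypothetical names chosen for the illustration. The consumer stops when it receives the sentinel, which is one common way to signal that the querying side has finished.

```python
import asyncio
from typing import Any, List

# Sentinel (assumption): tells the consumer that no more records will arrive.
POISON_PILL = object()


async def producer(queue: asyncio.Queue, records: List[Any]) -> None:
    # Stands in for the querying side: push every record to the queue.
    for record in records:
        await queue.put(record)
    await queue.put(POISON_PILL)


async def consumer(queue: asyncio.Queue, results: List[str]) -> None:
    # Stands in for the extracting side: pull records until the sentinel.
    while True:
        record = await queue.get()
        if record is POISON_PILL:
            break
        results.append(f"extracted:{record}")


async def query_and_process(records: List[Any]) -> List[str]:
    # A bounded queue keeps the producer from running far ahead of the consumer.
    queue: asyncio.Queue = asyncio.Queue(maxsize=10)
    results: List[str] = []
    await asyncio.gather(producer(queue, records), consumer(queue, results))
    return results
```

In the distributed setup the two sides would run on different machines and the queue would live in a broker, but the hand-off logic is the same.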

+
+
+

Be cooperative

+

If you plan to use the multiprocessing or distributed approach, please be considerate of other users and limit the number of requests +made by the Downloader/Aggregator accordingly.
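To make the idea of limiting requests concrete, here is a generic throttle sketch. It is an assumption-laden illustration rather than the library's own mechanism: `RequestThrottle` is a hypothetical helper that spaces out calls within a single asyncio event loop so that at most `max_per_second` of them start per second.

```python
import asyncio
import time


class RequestThrottle:
    """Space out calls so that at most max_per_second start each second."""

    def __init__(self, max_per_second: float) -> None:
        if max_per_second <= 0:
            raise ValueError("max_per_second must be positive")
        self._interval = 1.0 / max_per_second
        self._next_slot = 0.0  # monotonic time of the next free slot

    async def wait(self) -> None:
        now = time.monotonic()
        # Reserve the next free slot before sleeping; there is no await
        # between reading and updating, so concurrent tasks on the same
        # event loop cannot claim the same slot.
        slot = max(now, self._next_slot)
        self._next_slot = slot + self._interval
        if slot > now:
            await asyncio.sleep(slot - now)
```

Each task would call `await throttle.wait()` right before issuing a request. When running several processes, the per-process limit should be divided by the number of processes so the combined rate stays reasonable.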

+
+
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/py-modindex.html b/py-modindex.html new file mode 100644 index 00000000..bd8d4267 --- /dev/null +++ b/py-modindex.html @@ -0,0 +1,304 @@ + + + + + + Python Module Index — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+
+
+
+
+
+ + +

Python Module Index

+ +
+ c +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
 
+ c
+ cmoncrawl +
    + cmoncrawl.aggregator +
    + cmoncrawl.aggregator.athena_query +
    + cmoncrawl.aggregator.base +
    + cmoncrawl.aggregator.gateway_query +
    + cmoncrawl.aggregator.utils +
    + cmoncrawl.aggregator.utils.athena_query_maker +
    + cmoncrawl.aggregator.utils.helpers +
    + cmoncrawl.aggregator.utils.ndjson +
    + cmoncrawl.common +
    + cmoncrawl.common.loggers +
    + cmoncrawl.common.throttling +
    + cmoncrawl.common.types +
    + cmoncrawl.config +
    + cmoncrawl.integrations +
    + cmoncrawl.integrations.commands +
    + cmoncrawl.integrations.download +
    + cmoncrawl.integrations.extract +
    + cmoncrawl.integrations.utils +
    + cmoncrawl.middleware +
    + cmoncrawl.middleware.stompware +
    + cmoncrawl.middleware.synchronized +
    + cmoncrawl.processor +
    + cmoncrawl.processor.dao +
    + cmoncrawl.processor.dao.api +
    + cmoncrawl.processor.dao.base +
    + cmoncrawl.processor.dao.s3 +
    + cmoncrawl.processor.extraction +
    + cmoncrawl.processor.extraction.filters +
    + cmoncrawl.processor.extraction.utils +
    + cmoncrawl.processor.pipeline +
    + cmoncrawl.processor.pipeline.downloader +
    + cmoncrawl.processor.pipeline.extractor +
    + cmoncrawl.processor.pipeline.pipeline +
    + cmoncrawl.processor.pipeline.router +
    + cmoncrawl.processor.pipeline.streamer +
+ + +
+
+
+ +
+ +
+

© Copyright 2022, Hynek Kydlíček.

+
+ + Built with Sphinx using a + theme + provided by Read the Docs. + + +
+
+
+
+
+ + + + \ No newline at end of file diff --git a/search.html b/search.html new file mode 100644 index 00000000..7dae22dd --- /dev/null +++ b/search.html @@ -0,0 +1,129 @@ + + + + + + Search — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+
+
+
+
+
+ + + + +
+ +
+ +
+
+
+ +
+ +
+

© Copyright 2022, Hynek Kydlíček.

+
+ + Built with Sphinx using a + theme + provided by Read the Docs. + + +
+
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/searchindex.js b/searchindex.js new file mode 100644 index 00000000..24958428 --- /dev/null +++ b/searchindex.js @@ -0,0 +1 @@ +Search.setIndex({"docnames": ["api", "cli/cli", "cli/download", "cli/extract", "cli/index", "extraction/config_file", "extraction/creating_extractor", "extraction/index", "extraction/utils", "generated/cmoncrawl", "generated/cmoncrawl.aggregator", "generated/cmoncrawl.aggregator.athena_query", "generated/cmoncrawl.aggregator.base", "generated/cmoncrawl.aggregator.gateway_query", "generated/cmoncrawl.aggregator.utils", "generated/cmoncrawl.aggregator.utils.athena_query_maker", "generated/cmoncrawl.aggregator.utils.helpers", "generated/cmoncrawl.aggregator.utils.ndjson", "generated/cmoncrawl.common", "generated/cmoncrawl.common.loggers", "generated/cmoncrawl.common.throttling", "generated/cmoncrawl.common.types", "generated/cmoncrawl.config", "generated/cmoncrawl.integrations", "generated/cmoncrawl.integrations.commands", "generated/cmoncrawl.integrations.download", "generated/cmoncrawl.integrations.extract", "generated/cmoncrawl.integrations.utils", "generated/cmoncrawl.middleware", "generated/cmoncrawl.middleware.stompware", "generated/cmoncrawl.middleware.synchronized", "generated/cmoncrawl.processor", "generated/cmoncrawl.processor.dao", "generated/cmoncrawl.processor.dao.api", "generated/cmoncrawl.processor.dao.base", "generated/cmoncrawl.processor.dao.s3", "generated/cmoncrawl.processor.extraction", "generated/cmoncrawl.processor.extraction.filters", "generated/cmoncrawl.processor.extraction.utils", "generated/cmoncrawl.processor.pipeline", "generated/cmoncrawl.processor.pipeline.downloader", "generated/cmoncrawl.processor.pipeline.extractor", "generated/cmoncrawl.processor.pipeline.pipeline", "generated/cmoncrawl.processor.pipeline.router", "generated/cmoncrawl.processor.pipeline.streamer", "index", "misc/athena", "misc/domain_record", "misc/index", "modules", 
"prog_guide/index", "prog_guide/overview", "prog_guide/practice", "usage"], "filenames": ["api.rst", "cli/cli.rst", "cli/download.rst", "cli/extract.rst", "cli/index.rst", "extraction/config_file.rst", "extraction/creating_extractor.rst", "extraction/index.rst", "extraction/utils.rst", "generated/cmoncrawl.rst", "generated/cmoncrawl.aggregator.rst", "generated/cmoncrawl.aggregator.athena_query.rst", "generated/cmoncrawl.aggregator.base.rst", "generated/cmoncrawl.aggregator.gateway_query.rst", "generated/cmoncrawl.aggregator.utils.rst", "generated/cmoncrawl.aggregator.utils.athena_query_maker.rst", "generated/cmoncrawl.aggregator.utils.helpers.rst", "generated/cmoncrawl.aggregator.utils.ndjson.rst", "generated/cmoncrawl.common.rst", "generated/cmoncrawl.common.loggers.rst", "generated/cmoncrawl.common.throttling.rst", "generated/cmoncrawl.common.types.rst", "generated/cmoncrawl.config.rst", "generated/cmoncrawl.integrations.rst", "generated/cmoncrawl.integrations.commands.rst", "generated/cmoncrawl.integrations.download.rst", "generated/cmoncrawl.integrations.extract.rst", "generated/cmoncrawl.integrations.utils.rst", "generated/cmoncrawl.middleware.rst", "generated/cmoncrawl.middleware.stompware.rst", "generated/cmoncrawl.middleware.synchronized.rst", "generated/cmoncrawl.processor.rst", "generated/cmoncrawl.processor.dao.rst", "generated/cmoncrawl.processor.dao.api.rst", "generated/cmoncrawl.processor.dao.base.rst", "generated/cmoncrawl.processor.dao.s3.rst", "generated/cmoncrawl.processor.extraction.rst", "generated/cmoncrawl.processor.extraction.filters.rst", "generated/cmoncrawl.processor.extraction.utils.rst", "generated/cmoncrawl.processor.pipeline.rst", "generated/cmoncrawl.processor.pipeline.downloader.rst", "generated/cmoncrawl.processor.pipeline.extractor.rst", "generated/cmoncrawl.processor.pipeline.pipeline.rst", "generated/cmoncrawl.processor.pipeline.router.rst", "generated/cmoncrawl.processor.pipeline.streamer.rst", "index.rst", "misc/athena.rst", 
"misc/domain_record.rst", "misc/index.rst", "modules.rst", "prog_guide/index.rst", "prog_guide/overview.rst", "prog_guide/practice.rst", "usage.rst"], "titles": ["API", "Command Line Interface", "Command Line Download", "Command line Extract", "Command Line Interface", "Extractor config file", "Extractor types", "Extraction", "Extraction utils", "cmoncrawl", "cmoncrawl.aggregator", "cmoncrawl.aggregator.athena_query", "cmoncrawl.aggregator.base", "cmoncrawl.aggregator.gateway_query", "cmoncrawl.aggregator.utils", "cmoncrawl.aggregator.utils.athena_query_maker", "cmoncrawl.aggregator.utils.helpers", "cmoncrawl.aggregator.utils.ndjson", "cmoncrawl.common", "cmoncrawl.common.loggers", "cmoncrawl.common.throttling", "cmoncrawl.common.types", "cmoncrawl.config", "cmoncrawl.integrations", "cmoncrawl.integrations.commands", "cmoncrawl.integrations.download", "cmoncrawl.integrations.extract", "cmoncrawl.integrations.utils", "cmoncrawl.middleware", "cmoncrawl.middleware.stompware", "cmoncrawl.middleware.synchronized", "cmoncrawl.processor", "cmoncrawl.processor.dao", "cmoncrawl.processor.dao.api", "cmoncrawl.processor.dao.base", "cmoncrawl.processor.dao.s3", "cmoncrawl.processor.extraction", "cmoncrawl.processor.extraction.filters", "cmoncrawl.processor.extraction.utils", "cmoncrawl.processor.pipeline", "cmoncrawl.processor.pipeline.downloader", "cmoncrawl.processor.pipeline.extractor", "cmoncrawl.processor.pipeline.pipeline", "cmoncrawl.processor.pipeline.router", "cmoncrawl.processor.pipeline.streamer", "Welcome to CommonCrawl Extractor\u2019s documentation!", "Athena", "Domain Record", "Miscellaneous", "docs", "Programming Guide", "How to extract from Common Crawl (theory)", "How to extract from Common Crawl (practice)", "Usage"], "terms": {"The": [1, 2, 3, 5, 6, 7, 8, 11, 13, 21, 29, 33, 35, 40, 41, 43, 47, 51, 52, 53], "i": [1, 2, 3, 5, 6, 8, 11, 13, 21, 29, 34, 35, 40, 41, 44, 46, 47, 50, 51, 52, 53], "simpl": [1, 6, 53], "wrapper": 1, "around": [1, 52], "librari": 
[1, 7, 50, 53], "It": [1, 6, 8, 11, 13, 29, 33, 34, 40, 47, 50, 51], "provid": [1, 2, 6, 8, 33, 34, 35, 46, 47, 53], "two": [1, 5, 6, 51, 53], "main": [1, 2], "function": [1, 6, 8, 15, 16, 19, 21, 22, 24, 25, 26, 27, 30, 37, 38, 40, 41, 44, 51], "download": [1, 3, 4, 33, 35, 41, 45, 47, 50, 52, 53], "sampl": 1, "either": [1, 3, 6, 51], "domain": [1, 2, 3, 5, 21, 33, 34, 35, 40, 41, 45, 48, 52, 53], "record": [1, 4, 5, 21, 33, 34, 35, 40, 41, 45, 48, 50, 53], "html": [1, 4, 5, 6, 7, 41, 45, 47, 51, 53], "from": [1, 2, 3, 5, 6, 7, 8, 11, 13, 21, 35, 40, 41, 45, 50, 53], "common": [1, 2, 3, 5, 6, 29, 33, 45, 50, 53], "crawl": [1, 2, 3, 5, 6, 11, 21, 29, 33, 45, 47, 50, 53], "index": [1, 2, 11, 13, 29, 45, 51, 52, 53], "extract": [1, 4, 5, 21, 40, 41, 45, 47, 50, 53], "an": [1, 2, 3, 6, 11, 13, 21, 40, 46, 47, 51], "content": [1, 3, 5, 21, 40, 41, 51], "can": [1, 3, 5, 6, 7, 11, 13, 47, 51, 52, 53], "also": [1, 5, 29, 52, 53], "directli": [1, 3, 6, 7, 51, 53], "take": [1, 3, 6, 41, 50, 52], "data": [1, 3, 5, 6, 8, 21, 33, 34, 40, 41, 44, 46, 51, 52, 53], "both": [1, 3, 8, 52], "ar": [1, 2, 3, 5, 6, 8, 21, 41, 46, 47, 51, 52, 53], "invok": 1, "us": [1, 2, 3, 5, 6, 8, 11, 13, 21, 29, 35, 40, 41, 46, 47, 50, 51, 52, 53], "cmon": [1, 2, 3, 5], "follow": [1, 2, 3, 5, 46, 47, 53], "requir": [1, 2, 3, 8, 21, 47], "argument": [1, 4, 45], "few": [1, 50], "option": [1, 4, 5, 6, 11, 13, 29, 35, 40, 41, 43, 45, 47, 50], "verbos": 1, "level": [1, 3], "choic": 1, "0": [1, 11, 13, 21], "1": [1, 3, 7, 11, 13, 21, 40, 45, 50, 52, 53], "2": [1, 7, 11, 45, 50, 52], "being": 1, "least": [1, 5], "most": [1, 6, 46], "default": [1, 2, 3, 11, 13, 21, 29, 33, 35, 40, 41, 43, 51, 53], "aws_profil": [1, 11, 35, 52], "aw": [1, 2, 3, 11, 35, 45, 46, 51], "profil": [1, 11, 35], "call": [1, 29, 52, 53], "athena": [1, 2, 11, 45, 47, 48, 52, 53], "s3": [1, 2, 3, 11, 46, 51, 52, 53], "If": [1, 2, 3, 5, 6, 11, 13, 29, 35, 41, 46, 51, 52, 53], "first": [1, 2, 3, 5, 40, 51, 52, 53], "1000": [1, 2, 52], 
"com": [1, 2, 3, 11, 13, 21, 47, 52], "match_typ": [1, 2, 11, 13, 52], "limit": [1, 2, 11, 13, 50, 52, 53], "dr_output": [1, 2, 3], "100": [1, 2, 3, 52], "html_output": [1, 2, 3], "them": [1, 3, 29, 47, 51, 52], "your": [1, 3, 6, 7, 8, 46, 47, 51], "extractor": [1, 3, 7, 21, 40, 43, 50, 52, 53], "config": [1, 2, 3, 6, 7, 21, 45, 51], "json": [1, 2, 3, 11, 13, 47, 51], "extracted_output": [1, 3], "jsonl": [1, 2, 3, 45, 48], "second": [1, 2, 3, 13, 40, 46], "tool": [2, 3], "serv": [2, 3], "queri": [2, 11, 21, 29, 45, 46, 50, 53], "commoncrawl": [2, 11, 13, 33, 35, 46, 50, 53], "need": [2, 3, 6, 46, 51, 52, 53], "thi": [2, 3, 5, 6, 7, 11, 13, 21, 33, 43, 46, 50, 51, 52, 53], "order": [2, 3, 6, 7, 41, 46, 47, 51, 52, 53], "output": [2, 3, 44, 51], "path": [2, 3, 5, 21, 40, 43, 44, 52], "directori": [2, 3, 5], "file": [2, 3, 6, 7, 8, 11, 13, 21, 35, 40, 44, 45, 47, 50, 53], "url": [2, 3, 5, 11, 13, 21, 29, 33, 40, 43, 47, 51, 52], "e": [2, 3, 5], "g": [2, 3, 5], "www": [2, 3, 5, 21], "bcc": 2, "cz": [2, 5], "In": [2, 5, 6, 7, 46, 47, 51, 53], "contain": [2, 3, 8, 40, 41, 47, 51, 53], "one": [2, 3, 5, 51], "each": [2, 3, 5, 6, 47, 52], "found": [2, 6, 11, 13], "multipl": [2, 8, 51], "format": [2, 3, 5, 45, 48, 51], "max": [2, 3, 11, 13], "number": [2, 3, 11, 13, 29, 40, 52, 53], "sinc": [2, 5, 11, 13, 21, 43, 51, 52, 53], "start": [2, 5, 11, 13, 46], "date": [2, 3, 5, 11, 13, 40, 51], "iso": [2, 3, 5], "2020": 2, "01": [2, 3, 5, 47], "TO": 2, "end": [2, 5, 11, 13], "cc_server": [2, 11, 13], "must": [2, 5, 6, 41, 46, 47], "whole": [2, 52], "http": [2, 3, 5, 11, 13, 21, 33, 47, 53], "org": [2, 11, 13, 33], "cc": [2, 47], "2023": 2, "14": 2, "max_retri": [2, 3, 11, 13, 40, 44], "retri": [2, 11, 13, 40], "request": [2, 3, 11, 13, 40, 52, 53], "increas": 2, "when": [2, 3, 29, 41, 46, 47, 52], "fail": [2, 3, 41], "sleep_bas": [2, 3, 11, 13, 40], "base": [2, 3, 6, 13, 33, 40, 41, 43, 44, 51], "sleep": [2, 40], "time": [2, 13, 40, 43, 52], "exponenti": [2, 3, 13, 40], "backoff": 
[2, 3, 13, 40], "case": [2, 5, 51, 53], "failur": 2, "max_requests_per_second": [2, 3, 13, 40], "per": [2, 3, 13, 40, 51, 53], "One": [2, 52], "exact": [2, 11, 13, 21], "prefix": [2, 21, 41], "host": [2, 21, 29], "match": [2, 5, 11, 13, 21, 41, 43, 51], "type": [2, 7, 11, 13, 29, 35, 40, 41, 45, 51, 52], "refer": [2, 3, 6, 47, 51, 53], "cdx": [2, 11, 13, 21], "api": [2, 3, 11, 13, 45, 51], "more": [2, 3, 5, 53], "inform": [2, 3, 7, 21, 47, 51], "see": [2, 3, 6, 21, 52, 53], "cmoncrawl": [2, 3, 6, 8, 45, 46, 50, 51, 52], "matchtyp": [2, 11, 13, 21, 52], "max_directory_s": [2, 3, 44, 52], "filter_non_200": 2, "filter": [2, 7, 29, 45, 50], "out": [2, 8, 44, 45, 50], "non": 2, "200": [2, 41], "statu": [2, 41], "code": [2, 7, 41, 45], "aggreg": [2, 29, 45, 46, 51, 52], "fastest": [2, 3, 53], "credenti": [2, 3, 46, 53], "correct": [2, 3, 43], "permiss": [2, 3], "gatewai": [2, 3, 33, 51, 52, 53], "veri": 2, "slow": [2, 51, 53], "s3_bucket": 2, "bucket": [2, 3, 11, 35, 46, 53], "onli": [2, 3, 6, 41, 46, 47, 51, 53], "set": [2, 5, 6, 41, 47, 53], "delet": [2, 46], "after": [2, 3, 46], "done": [2, 6, 53], "allow": [2, 7, 41, 46, 50, 52, 53], "reus": [2, 46], "futur": 2, "temporari": 2, "creat": [2, 3, 5, 6, 8, 11, 29, 50, 51, 52], "you": [2, 3, 5, 6, 7, 46, 47, 51, 52, 53], "specifi": [2, 3, 5, 40, 41, 51, 53], "rememb": 2, "manual": 2, "re": [2, 43], "avoid": 2, "incur": 2, "unnecessari": 2, "cost": 2, "max_crawls_per_fil": [2, 3], "encod": [2, 6, 21, 40, 41, 47], "forc": 2, "usag": [2, 33, 45], "possibl": [2, 5, 51], "download_method": [2, 3], "method": [2, 3, 6, 33, 34, 35, 52, 53], "warc": [2, 3, 21, 35, 40, 47, 51], "appli": [2, 3, 41], "mode": [3, 4, 45, 47], "config_path": 3, "rule": 3, "output_path": 3, "To": [3, 5, 52], "yield": [3, 11, 13], "same": [3, 5, 46, 52], "which": [3, 5, 6, 11, 13, 29, 40, 41, 44, 47, 50, 51], "For": [3, 6, 51], "new": [3, 5, 11, 51, 52], "name": [3, 5, 6, 21, 29, 35, 43, 46], "appropri": [3, 51], "have": [3, 5, 6, 46, 47, 53], "acquir": 
3, "without": [3, 5, 7], "pleas": [3, 52, 53], "describ": [3, 51], "how": [3, 7, 11, 45, 47, 50, 53], "n_proc": 3, "process": [3, 29, 51, 52, 53], "parallel": 3, "thu": [3, 5, 6, 51, 53], "singl": [3, 5, 11, 13, 40, 52], "": [3, 46, 47, 51, 52, 53], "useless": 3, "than": [3, 5, 51, 52], "attempt": 3, "valu": [3, 21, 51], "between": [3, 13], "2021": 3, "todai": 3, "were": [3, 51], "By": [3, 47, 51], "try": [3, 52], "infer": 3, "3": [3, 11, 13, 40, 44, 45, 50], "go": 3, "build": 3, "appreci": 3, "what": [3, 5, 6, 51, 52], "becaus": 3, "those": [3, 21], "dure": [3, 5], "rout": [3, 5, 21, 43], "exampl": [4, 7, 11, 13, 21, 33, 45, 47], "posit": [4, 45], "mani": [5, 11], "want": [5, 6, 7, 47, 50, 51, 53], "imagin": 5, "websit": 5, "complet": 5, "differ": [5, 51, 53], "articl": [5, 21], "achiev": 5, "defin": [5, 6, 7, 8, 21, 44, 46, 51], "should": [5, 6, 21, 41, 51, 52, 53], "given": [5, 33, 34, 40, 43], "leverag": [5, 52], "datetim": [5, 11, 13, 21, 40, 43], "extractors_path": [5, 21], "folder": 5, "regex": [5, 21, 43], "my_extractor": 5, "string": [5, 6, 21, 40], "my_extractor2": 5, "another_regex": 5, "where": [5, 11, 41], "locat": [5, 41], "rel": [5, 41], "current": [5, 43, 51], "work": [5, 52], "list": [5, 11, 13, 21, 29, 40, 41, 43, 53], "condit": [5, 51], "we": [5, 7, 47, 51, 52, 53], "dictionari": [5, 6, 21, 41], "kei": [5, 21, 41, 47], "At": 5, "record_d": 5, "ha": 5, "python": [5, 50], "extens": [5, 44], "variabl": [5, 6], "overrid": 5, "valid": [5, 41], "2009": 5, "all": [5, 6, 8, 11, 40, 41, 43, 44, 47, 51, 52, 53], "a_extractor": 5, "a_extractor2": 5, "b_extractor": 5, "2010": 5, "cmon2": 5, "happen": 5, "A": [5, 11, 13, 21, 40, 41, 51], "cralw": 5, "2012": [5, 46], "might": [5, 6, 7, 51, 53], "put": 5, "problem": [5, 6], "add": [5, 40, 47], "load": [5, 43, 51], "But": [5, 51], "don": [5, 6, 8, 46, 47, 51, 53], "t": [5, 6, 8, 29, 40, 46, 47, 51, 52, 53], "import": [5, 6, 51, 52], "sy": 5, "pathlib": [5, 52], "append": [5, 11, 52], "__file__": 5, "parent": 5, 
"router": [5, 51, 52], "everi": [5, 51], "ani": [5, 6, 8, 21, 29, 40, 41, 47, 52], "untrust": 5, "write": 6, "implement": [6, 51], "processor": [6, 8, 29, 45, 51, 52], "pipelin": [6, 29, 45, 50, 51], "iextractor": [6, 41, 43], "class": [6, 11, 12, 13, 17, 20, 21, 22, 25, 26, 27, 29, 33, 34, 35, 40, 41, 42, 43, 44, 51, 52], "choos": [6, 41, 45, 50], "page": [6, 21, 41, 45, 50], "medatata": 6, "none": [6, 8, 11, 13, 21, 29, 35, 40, 41, 43, 44, 52], "discard": 6, "while": [6, 41, 51, 53], "interfac": [6, 45, 53], "doesn": [6, 29, 40], "handl": [6, 33, 51, 52], "pars": [6, 51], "bs4": [6, 52], "resolv": [6, 52], "issu": 6, "pageextractor": [6, 41], "just": [6, 51], "css": [6, 8, 41], "selector": [6, 8, 41], "transform": [6, 8], "regist": [6, 43, 52], "separ": [6, 47], "initi": [6, 35, 52], "py": [6, 7, 45], "otherwis": [6, 41, 44], "inherit": [6, 51], "title_extractor": 6, "pipemetadata": [6, 21, 40, 41, 43, 52], "myextractor": 6, "def": [6, 52], "self": [6, 52], "respons": [6, 11, 13, 21, 41, 46], "str": [6, 11, 13, 21, 29, 33, 35, 40, 41, 43, 44, 52], "metadata": [6, 21, 40, 41, 43, 51, 52, 53], "dict": [6, 21, 41, 51, 52], "return": [6, 8, 11, 13, 35, 40, 41, 43, 44, 51, 52], "titl": [6, 52], "my": 6, "assum": [6, 51], "beautifulsoup": [6, 52], "extract_soup": [6, 52], "object": [6, 11, 13, 33, 34, 40, 46, 51, 52], "extact": 6, "haven": 6, "additionali": [6, 53], "filter_raw": [6, 51], "raw": [6, 51], "true": [6, 21, 29, 40, 41], "fals": [6, 21, 40, 41, 44], "decid": 6, "effici": 6, "wai": [6, 51, 52, 53], "now": [6, 52], "soup": [6, 8, 41, 51, 52], "filter_soup": [6, 51], "final": 6, "said": 6, "here": 6, "ext": [6, 52], "titleextractor": 6, "text": 6, "bool": [6, 29, 40, 41, 44], "would": [6, 51], "save": [7, 45, 50, 52], "space": 7, "themselv": 7, "doe": [7, 40], "do": [7, 46, 52], "section": [7, 50], "show": 7, "own": [7, 47, 50, 51], "definit": [7, 45, 51], "baseextractor": [7, 41, 45, 51, 52], "structur": [7, 45], "__init__": [7, 45], "arbitrari": [7, 45], 
"execut": [7, 45], "util": [7, 45, 50], "utili": 8, "helper": 8, "must_exist_filt": 8, "ulr": 8, "must_not_exist_filt": 8, "check_requir": 8, "check": 8, "present": 8, "chain_transform": 8, "chain": 8, "broken": 8, "especi": 8, "useful": [8, 47], "select": 8, "etc": 8, "extract_transform": 8, "tag": 8, "modul": [9, 10, 14, 18, 23, 28, 31, 32, 36, 39, 43, 45, 46], "athenaaggreg": [11, 46, 52], "9999": [11, 13], "12": [11, 13], "31": [11, 13], "23": [11, 13], "59": [11, 13], "999999": [11, 13], "int": [11, 13, 21, 29, 40, 44], "prefetch_s": [11, 13], "float": [11, 13, 40], "5": [11, 13, 40, 45, 50], "extra_sql_where_claus": 11, "batch_siz": 11, "bucket_nam": [11, 35, 52], "catalog_nam": 11, "awsdatacatalog": 11, "database_nam": 11, "table_nam": 11, "ccindex": 11, "async": [11, 13, 29, 33, 35, 40, 52], "context": [11, 13, 35], "manag": [11, 13, 34, 35, 46], "iter": [11, 13, 40], "domainrecord": [11, 13, 21, 35, 40], "paramet": [11, 13, 29, 33, 35, 40, 41, 43], "search": [11, 13, 45, 53], "cc_indexes_serv": [11, 13], "server": [11, 13, 21, 53], "collinfo": [11, 13], "retriev": [11, 13, 21, 52], "min": [11, 13], "maximum": [11, 13, 40], "result": [11, 13, 29, 46], "fetch": [11, 13, 33, 34, 35], "concurr": [11, 13], "addit": [11, 47], "sql": [11, 46, 47, 53], "claus": 11, "onc": [11, 51, 53], "catalog": 11, "databas": 11, "tabl": 11, "domain_record": [11, 13, 21, 33, 34, 35, 40, 47], "print": [11, 13], "athenaaggregatoriter": 11, "aws_client": 11, "session": 11, "gatewayaggreg": [13, 29, 51, 52], "20": [13, 40, 53], "find": [13, 51, 53], "calcul": 13, "gatewayaggregatoriter": 13, "client": [13, 35], "clientsess": 13, "cc_file": 13, "except": [16, 34, 52], "domaincrawl": 21, "cdx_server": 21, "filenam": [21, 47], "offset": [21, 47, 51], "length": [21, 47, 51], "digest": [21, 40, 47], "timestamp": [21, 47], "model_config": 21, "classvar": 21, "configdict": 21, "configur": 21, "model": 21, "conform": 21, "pydant": 21, "model_field": 21, "fieldinfo": 21, "annot": 21, 
"union": [21, 43], "nonetyp": 21, "about": [21, 47, 52], "field": [21, 45, 47, 50], "map": [21, 51, 52], "replac": 21, "__fields__": 21, "v1": 21, "extractconfig": 21, "routesconfig": 21, "run": [21, 46, 52, 53], "extractorconfig": 21, "github": 21, "internetarch": 21, "wayback": 21, "blob": 21, "master": 21, "readm": 21, "md": 21, "scope": 21, "abc": 21, "article_data": 21, "factori": 21, "warc_head": 21, "http_header": 21, "rec_typ": 21, "latin": [21, 40], "pipe": [21, 40], "attribut": 21, "instanc": 21, "repres": [21, 33, 34], "associ": 21, "eg": 21, "pointer": [21, 53], "default_factori": 21, "store": [21, 51, 53], "header": [21, 29, 41], "charact": 21, "retrieverespons": 21, "stompaggreg": [29, 52], "queue_host": 29, "queue_port": 29, "index_agg": [29, 52], "heartbeat": 29, "10000": 29, "listen": 29, "send": [29, 52], "queue": [29, 51, 52], "stomp": [29, 52], "protocol": [29, 52], "topic": 29, "poisson_pil": 29, "messag": 29, "finish": [29, 46], "port": 29, "indexaggreg": 29, "connect": [29, 33, 34, 52], "filter_dupl": 29, "dupl_id_head": 29, "artemi": 29, "duplic": 29, "stompprocessor": [29, 52], "pills_to_di": 29, "queue_siz": 29, "timeout": 29, "address": 29, "processorpipelin": [29, 52], "receiv": 29, "enough": 29, "stop": 29, "minut": [29, 52], "befor": 29, "dy": 29, "size": 29, "listener_stat": 29, "listnerstat": 29, "on_messag": 29, "frame": 29, "ccapigatewaydao": 33, "base_url": 33, "access": [33, 34, 40, 51, 53], "interact": [33, 34, 35, 46, 53], "open": 33, "close": 33, "error": 33, "relat": [33, 51], "aopen": 33, "asynchron": [33, 35, 40], "aclos": 33, "await": [33, 52], "icc_dao": [34, 40, 51], "sourc": 34, "s3dao": [35, 52], "aioboto3": 35, "__aenter__": 35, "__aexit__": 35, "exc_typ": 35, "exc": 35, "tb": 35, "clean": [35, 46], "up": [35, 46], "its": [35, 43], "byte": 35, "rais": [35, 41], "valueerror": 35, "asyncdownload": [40, 51, 52], "dao": [40, 51, 52], "digest_verif": 40, "whether": 40, "verifi": 40, "downloaderlocalfil": 40, "local": 40, 
"test": [40, 52], "anyth": 40, "pass": [40, 51, 52], "further": 40, "dummydownload": 40, "dummi": [40, 41, 52], "perform": 40, "actual": 40, "simpli": [40, 41, 46, 51, 52], "empti": 40, "tupl": 40, "element": [40, 41], "idownload": 40, "warciter": 40, "show_progress": 40, "over": [40, 53], "raise_on_encod": 41, "parser": [41, 51], "valueexcept": 41, "decod": 41, "domainrecordextractor": 41, "filter_non_ok": 41, "htmlextractor": 41, "abstract": [41, 43, 44], "header_css_dict": 41, "header_extract_dict": 41, "callabl": 41, "content_css_selector": 41, "bodi": 41, "content_css_dict": 41, "content_extract_dict": 41, "css_selectors_must_exist": 41, "css_selectors_must_not_exist": 41, "allowed_domain_prefix": 41, "is_valid_extract": 41, "design": [41, 53], "specif": [41, 53], "web": [41, 45, 50], "ad": [41, 47], "abil": 41, "thei": [41, 52], "exist": 41, "proce": 41, "irout": 43, "pattern": 43, "load_module_as_extractor": 43, "module_path": 43, "register_rout": [43, 52], "against": [43, 51], "earliest": 43, "latest": 43, "basestreamerfil": [44, 51], "root": 44, "max_file_s": [44, 52], "directory_prefix": 44, "directory_": 44, "basic": [44, 51], "istream": [44, 51], "outstream": [44, 51, 52], "stream": 44, "identifi": 44, "success": 44, "memorystream": [44, 51], "memori": [44, 51], "keep": 44, "streamerfilehtml": [44, 51], "streamerfilejson": [44, 51, 52], "pretti": 44, "workflow": 45, "Be": [45, 50], "nice": [45, 52], "other": [45, 52], "command": [45, 50, 53], "line": [45, 47, 50, 51, 53], "program": 45, "guid": 45, "theori": [45, 50], "4": [45, 50], "6": [45, 50], "practic": [45, 50, 53], "simulaten": [45, 50], "distribut": [45, 50], "high": [45, 50, 53], "throughput": [45, 50], "cooper": [45, 50], "miscellan": 45, "prerequisit": [45, 48], "cach": [45, 48], "integr": [45, 52], "middlewar": [45, 52], "servic": 46, "make": [46, 51, 53], "easi": [46, 53], "analyz": 46, "amazon": 46, "standard": 46, "serverless": 46, "so": 46, "infrastructur": 46, "pai": 46, "point": 46, 
"schema": 46, "deliv": 46, "within": 46, "With": [46, 51, 52], "complex": 46, "etl": 46, "job": 46, "prepar": 46, "analysi": 46, "anyon": 46, "skill": 46, "quickli": 46, "larg": 46, "scale": 46, "dataset": 46, "account": 46, "version": 46, "10": 46, "17": 46, "statement": 46, "sid": 46, "commoncrawldb": 46, "effect": [46, 52], "action": 46, "createdatacatalog": 46, "glue": 46, "batchcreatepartit": 46, "startqueryexecut": 46, "createt": 46, "createdatabas": 46, "gettabl": 46, "getdatabas": 46, "updatet": 46, "updatepartit": 46, "getpartit": 46, "getqueryexecut": 46, "listtablemetadata": 46, "getbucketloc": 46, "describejob": 46, "resourc": [46, 51], "resultsbucket": 46, "listbucket": 46, "arn": 46, "testbucket": 46, "putobject": 46, "getobject": 46, "deleteobject": 46, "commoncrawlbucket": 46, "itnial": 46, "athena_queri": [46, 52], "whenev": [46, 52], "mean": 46, "automat": 46, "randomli": 46, "gener": 46, "strucutur": 47, "cotain": 47, "cli": [47, 53], "follwo": 47, "gz": 47, "123": 47, "456": 47, "sha1": 47, "1234567890abcdef": 47, "utf": 47, "8": 47, "2018": 47, "01t00": 47, "00": 47, "00z": 47, "additional_info": 47, "key1": 47, "value1": 47, "key2": 47, "value2": 47, "u": [47, 53], "warc_filenam": 47, "warc_record_offset": 47, "warc_record_length": 47, "content_digest": 47, "fetch_tim": 47, "belong": 47, "document": 50, "peopl": 50, "who": 50, "full": 50, "advatang": 50, "unlik": 50, "get": [51, 52], "commmoncrawl": 51, "link": 51, "step": [51, 52, 53], "rest": 51, "multitud": 51, "our": [51, 52], "bunch": 51, "part": 51, "collect": 51, "month": 51, "releas": 51, "understand": 51, "even": 51, "wa": 51, "certain": 51, "older": 51, "respect": 51, "sure": 51, "miss": 51, "continu": 51, "anoth": 51, "gateway_queri": [51, 52], "node": 51, "through": [51, 53], "dynam": 51, "develop": 51, "chosen": 51, "drop": 51, "fast": 51, "Or": 51, "wait": 51, "convers": 51, "streamer": [51, 52], "deriv": 51, "left": 51, "support": [51, 53], "debug": 51, "instead": 51, "like": 
51, "saver": 51, "chang": 51, "creation": 51, "know": 52, "primit": 52, "let": 52, "alreadi": 52, "combin": [52, 53], "process_domain_record": 52, "caller": 52, "wish": 52, "exactli": 52, "synchron": 52, "query_and_extract": 52, "great": 52, "simultan": 52, "logger": 52, "all_purpose_logg": 52, "yourcustomextractor": 52, "your_custom_extractor": 52, "custom": 52, "load_extractor": 52, "bbc": 52, "dev": 52, "processed_url": 52, "otherwai": 52, "afterward": 52, "approach": 52, "abus": 52, "multiprocess": 52, "asyncio": 52, "faster": [52, 53], "agg": 52, "Then": 52, "rec": 52, "divid": 52, "n": 52, "chunk": 52, "lastli": 52, "stompwar": 52, "plan": [52, 53], "accordingli": 52, "framework": 53, "suffic": 53, "80": 53, "restrict": 53, "control": 53, "programmat": 53, "itself": 53, "rather": 53, "east": 53, "throught": 53, "cloudflar": 53, "slowest": 53, "free": 53, "incrdibli": 53, "paid": 53, "much": 53, "recommend": 53, "imag": 53, "thread": 53, "awar": 53, "prevent": 53, "overload": 53, "consider": 53, "too": 53}, "objects": {"": [[9, 0, 0, "-", "cmoncrawl"]], "cmoncrawl": [[10, 0, 0, "-", "aggregator"], [18, 0, 0, "-", "common"], [22, 0, 0, "-", "config"], [23, 0, 0, "-", "integrations"], [28, 0, 0, "-", "middleware"], [31, 0, 0, "-", "processor"]], "cmoncrawl.aggregator": [[11, 0, 0, "-", "athena_query"], [12, 0, 0, "-", "base"], [13, 0, 0, "-", "gateway_query"], [14, 0, 0, "-", "utils"]], "cmoncrawl.aggregator.athena_query": [[11, 1, 1, "", "AthenaAggregator"]], "cmoncrawl.aggregator.athena_query.AthenaAggregator": [[11, 1, 1, "", "AthenaAggregatorIterator"]], "cmoncrawl.aggregator.gateway_query": [[13, 1, 1, "", "GatewayAggregator"]], "cmoncrawl.aggregator.gateway_query.GatewayAggregator": [[13, 1, 1, "", "GatewayAggregatorIterator"]], "cmoncrawl.aggregator.utils": [[15, 0, 0, "-", "athena_query_maker"], [16, 0, 0, "-", "helpers"], [17, 0, 0, "-", "ndjson"]], "cmoncrawl.common": [[19, 0, 0, "-", "loggers"], [20, 0, 0, "-", "throttling"], [21, 0, 0, "-", 
"types"]], "cmoncrawl.common.types": [[21, 1, 1, "", "DomainCrawl"], [21, 1, 1, "", "DomainRecord"], [21, 1, 1, "", "ExtractConfig"], [21, 1, 1, "", "ExtractorConfig"], [21, 1, 1, "", "MatchType"], [21, 1, 1, "", "PipeMetadata"], [21, 1, 1, "", "RetrieveResponse"], [21, 1, 1, "", "RoutesConfig"]], "cmoncrawl.common.types.DomainRecord": [[21, 2, 1, "", "model_config"], [21, 2, 1, "", "model_fields"]], "cmoncrawl.common.types.ExtractConfig": [[21, 2, 1, "", "model_config"], [21, 2, 1, "", "model_fields"]], "cmoncrawl.common.types.ExtractorConfig": [[21, 2, 1, "", "model_config"], [21, 2, 1, "", "model_fields"]], "cmoncrawl.common.types.RoutesConfig": [[21, 2, 1, "", "model_config"], [21, 2, 1, "", "model_fields"]], "cmoncrawl.integrations": [[24, 0, 0, "-", "commands"], [25, 0, 0, "-", "download"], [26, 0, 0, "-", "extract"], [27, 0, 0, "-", "utils"]], "cmoncrawl.middleware": [[29, 0, 0, "-", "stompware"], [30, 0, 0, "-", "synchronized"]], "cmoncrawl.middleware.stompware": [[29, 1, 1, "", "StompAggregator"], [29, 1, 1, "", "StompProcessor"]], "cmoncrawl.middleware.stompware.StompAggregator": [[29, 3, 1, "", "aggregate"]], "cmoncrawl.middleware.stompware.StompProcessor": [[29, 1, 1, "", "Listener"]], "cmoncrawl.middleware.stompware.StompProcessor.Listener": [[29, 3, 1, "", "on_message"]], "cmoncrawl.processor": [[32, 0, 0, "-", "dao"], [36, 0, 0, "-", "extraction"], [39, 0, 0, "-", "pipeline"]], "cmoncrawl.processor.dao": [[33, 0, 0, "-", "api"], [34, 0, 0, "-", "base"], [35, 0, 0, "-", "s3"]], "cmoncrawl.processor.dao.api": [[33, 1, 1, "", "CCAPIGatewayDAO"]], "cmoncrawl.processor.dao.api.CCAPIGatewayDAO": [[33, 3, 1, "", "aclose"], [33, 3, 1, "", "aopen"], [33, 3, 1, "", "fetch"]], "cmoncrawl.processor.dao.base": [[34, 1, 1, "", "ICC_Dao"]], "cmoncrawl.processor.dao.base.ICC_Dao": [[34, 3, 1, "", "fetch"]], "cmoncrawl.processor.dao.s3": [[35, 1, 1, "", "S3Dao"]], "cmoncrawl.processor.dao.s3.S3Dao": [[35, 3, 1, "", "__aenter__"], [35, 3, 1, "", "__aexit__"], [35, 2, 
1, "", "aws_profile"], [35, 2, 1, "", "bucket_name"], [35, 2, 1, "", "client"], [35, 3, 1, "id0", "fetch"]], "cmoncrawl.processor.extraction": [[37, 0, 0, "-", "filters"], [38, 0, 0, "-", "utils"]], "cmoncrawl.processor.pipeline": [[40, 0, 0, "-", "downloader"], [41, 0, 0, "-", "extractor"], [42, 0, 0, "-", "pipeline"], [43, 0, 0, "-", "router"], [44, 0, 0, "-", "streamer"]], "cmoncrawl.processor.pipeline.downloader": [[40, 1, 1, "", "AsyncDownloader"], [40, 1, 1, "", "DownloaderLocalFiles"], [40, 1, 1, "", "DummyDownloader"], [40, 1, 1, "", "IDownloader"], [40, 1, 1, "", "WarcIterator"]], "cmoncrawl.processor.pipeline.downloader.DummyDownloader": [[40, 3, 1, "", "download"]], "cmoncrawl.processor.pipeline.extractor": [[41, 1, 1, "", "BaseExtractor"], [41, 1, 1, "", "DomainRecordExtractor"], [41, 1, 1, "", "HTMLExtractor"], [41, 1, 1, "", "IExtractor"], [41, 1, 1, "", "PageExtractor"]], "cmoncrawl.processor.pipeline.extractor.BaseExtractor": [[41, 3, 1, "", "extract"]], "cmoncrawl.processor.pipeline.extractor.IExtractor": [[41, 3, 1, "", "extract"]], "cmoncrawl.processor.pipeline.extractor.PageExtractor": [[41, 3, 1, "", "extract"]], "cmoncrawl.processor.pipeline.router": [[43, 1, 1, "", "IRouter"], [43, 1, 1, "", "Route"], [43, 1, 1, "", "Router"]], "cmoncrawl.processor.pipeline.router.IRouter": [[43, 3, 1, "", "route"]], "cmoncrawl.processor.pipeline.router.Router": [[43, 3, 1, "", "load_module_as_extractor"], [43, 3, 1, "", "register_route"], [43, 3, 1, "", "route"]], "cmoncrawl.processor.pipeline.streamer": [[44, 1, 1, "", "BaseStreamerFile"], [44, 1, 1, "", "IStreamer"], [44, 1, 1, "", "MemoryStreamer"], [44, 1, 1, "", "StreamerFileHTML"], [44, 1, 1, "", "StreamerFileJSON"]]}, "objtypes": {"0": "py:module", "1": "py:class", "2": "py:attribute", "3": "py:method"}, "objnames": {"0": ["py", "module", "Python module"], "1": ["py", "class", "Python class"], "2": ["py", "attribute", "Python attribute"], "3": ["py", "method", "Python method"]}, "titleterms": {"api": 
[0, 33], "command": [1, 2, 3, 4, 24], "line": [1, 2, 3, 4], "interfac": [1, 4], "exampl": [1, 2, 3, 5, 6], "download": [2, 25, 40, 51], "posit": [2, 3], "argument": [2, 3], "option": [2, 3], "record": [2, 3, 47, 52], "mode": 2, "html": [2, 3], "extract": [3, 6, 7, 8, 26, 36, 37, 38, 51, 52], "content": [4, 7, 45, 48, 50], "extractor": [5, 6, 41, 45, 51], "config": [5, 22], "file": [5, 51], "structur": 5, "__init__": 5, "py": 5, "arbitrari": 5, "code": 5, "execut": 5, "type": [6, 21], "definit": 6, "1": [6, 51], "baseextractor": 6, "filter": [6, 8, 37, 51], "2": [6, 51], "util": [8, 14, 15, 16, 17, 27, 38], "cmoncrawl": [9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "aggreg": [10, 11, 12, 13, 14, 15, 16, 17], "athena_queri": 11, "base": [12, 34], "gateway_queri": 13, "athena_query_mak": 15, "helper": 16, "ndjson": 17, "common": [18, 19, 20, 21, 51, 52], "logger": 19, "throttl": 20, "integr": [23, 24, 25, 26, 27], "middlewar": [28, 29, 30], "stompwar": 29, "synchron": 30, "processor": [31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "dao": [32, 33, 34, 35], "s3": 35, "pipelin": [39, 40, 41, 42, 43, 44, 52], "router": 43, "streamer": 44, "welcom": 45, "commoncrawl": [45, 51], "": 45, "document": 45, "indic": 45, "tabl": 45, "athena": 46, "prerequisit": 46, "cach": 46, "domain": 47, "jsonl": 47, "format": 47, "miscellan": 48, "doc": 49, "program": 50, "guid": 50, "how": [51, 52], "from": [51, 52], "crawl": [51, 52], "theori": 51, "queri": [51, 52], "3": 51, "choos": 51, "4": 51, "out": 51, "web": 51, "page": 51, "5": 51, "field": 51, "6": 51, "save": 51, "practic": 52, "simulaten": 52, "distribut": 52, "high": 52, "throughput": 52, "Be": [52, 53], "cooper": 52, "usag": 53, "workflow": 53, "aw": 53, "nice": 53, "other": 53}, "envversion": {"sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, 
"sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "nbsphinx": 4, "sphinx": 60}, "alltitles": {"API": [[0, "api"]], "Command Line Interface": [[1, "command-line-interface"], [4, "command-line-interface"]], "Examples": [[1, "examples"], [2, "examples"], [3, "examples"]], "Command Line Download": [[2, "command-line-download"]], "Positional arguments": [[2, "positional-arguments"], [3, "positional-arguments"]], "Options": [[2, "options"]], "Record mode options": [[2, "record-mode-options"]], "HTML mode options": [[2, "html-mode-options"]], "Command line Extract": [[3, "command-line-extract"]], "Optional arguments": [[3, "optional-arguments"]], "Record arguments": [[3, "record-arguments"]], "Html arguments": [[3, "html-arguments"]], "Contents:": [[4, null], [7, null], [45, null], [48, null], [50, null]], "Extractor config file": [[5, "extractor-config-file"]], "Structure": [[5, "structure"]], "Example": [[5, "example"]], "__init__.py": [[5, "init-py"]], "Arbitrary Code Execution": [[5, "arbitrary-code-execution"]], "Extractor types": [[6, "extractor-types"]], "Extractor Definition": [[6, "extractor-definition"]], "Example 1.": [[6, "example-1"]], "BaseExtractor": [[6, "baseextractor"]], "Extraction": [[6, "extraction"], [7, "extraction"], [8, "extraction"]], "Filtering": [[6, "filtering"], [8, "filtering"]], "Example 2.": [[6, "example-2"]], "Extraction utils": [[8, "extraction-utils"]], "cmoncrawl": [[9, "module-cmoncrawl"]], "cmoncrawl.aggregator": [[10, "module-cmoncrawl.aggregator"]], "cmoncrawl.aggregator.athena_query": [[11, "module-cmoncrawl.aggregator.athena_query"]], "cmoncrawl.aggregator.base": [[12, "module-cmoncrawl.aggregator.base"]], "cmoncrawl.aggregator.gateway_query": [[13, "module-cmoncrawl.aggregator.gateway_query"]], "cmoncrawl.aggregator.utils": [[14, "module-cmoncrawl.aggregator.utils"]], "cmoncrawl.aggregator.utils.athena_query_maker": 
[[15, "module-cmoncrawl.aggregator.utils.athena_query_maker"]], "cmoncrawl.aggregator.utils.helpers": [[16, "module-cmoncrawl.aggregator.utils.helpers"]], "cmoncrawl.aggregator.utils.ndjson": [[17, "module-cmoncrawl.aggregator.utils.ndjson"]], "cmoncrawl.common": [[18, "module-cmoncrawl.common"]], "cmoncrawl.common.loggers": [[19, "module-cmoncrawl.common.loggers"]], "cmoncrawl.common.throttling": [[20, "module-cmoncrawl.common.throttling"]], "cmoncrawl.common.types": [[21, "module-cmoncrawl.common.types"]], "cmoncrawl.config": [[22, "module-cmoncrawl.config"]], "cmoncrawl.integrations": [[23, "module-cmoncrawl.integrations"]], "cmoncrawl.integrations.commands": [[24, "module-cmoncrawl.integrations.commands"]], "cmoncrawl.integrations.download": [[25, "module-cmoncrawl.integrations.download"]], "cmoncrawl.integrations.extract": [[26, "module-cmoncrawl.integrations.extract"]], "cmoncrawl.integrations.utils": [[27, "module-cmoncrawl.integrations.utils"]], "cmoncrawl.middleware": [[28, "module-cmoncrawl.middleware"]], "cmoncrawl.middleware.stompware": [[29, "module-cmoncrawl.middleware.stompware"]], "cmoncrawl.middleware.synchronized": [[30, "module-cmoncrawl.middleware.synchronized"]], "cmoncrawl.processor": [[31, "module-cmoncrawl.processor"]], "cmoncrawl.processor.dao": [[32, "module-cmoncrawl.processor.dao"]], "cmoncrawl.processor.dao.api": [[33, "module-cmoncrawl.processor.dao.api"]], "cmoncrawl.processor.dao.base": [[34, "module-cmoncrawl.processor.dao.base"]], "cmoncrawl.processor.dao.s3": [[35, "module-cmoncrawl.processor.dao.s3"]], "cmoncrawl.processor.extraction": [[36, "module-cmoncrawl.processor.extraction"]], "cmoncrawl.processor.extraction.filters": [[37, "module-cmoncrawl.processor.extraction.filters"]], "cmoncrawl.processor.extraction.utils": [[38, "module-cmoncrawl.processor.extraction.utils"]], "cmoncrawl.processor.pipeline": [[39, "module-cmoncrawl.processor.pipeline"]], "cmoncrawl.processor.pipeline.downloader": [[40, 
"module-cmoncrawl.processor.pipeline.downloader"]], "cmoncrawl.processor.pipeline.extractor": [[41, "module-cmoncrawl.processor.pipeline.extractor"]], "cmoncrawl.processor.pipeline.pipeline": [[42, "module-cmoncrawl.processor.pipeline.pipeline"]], "cmoncrawl.processor.pipeline.router": [[43, "module-cmoncrawl.processor.pipeline.router"]], "cmoncrawl.processor.pipeline.streamer": [[44, "module-cmoncrawl.processor.pipeline.streamer"]], "Welcome to CommonCrawl Extractor\u2019s documentation!": [[45, "welcome-to-commoncrawl-extractor-s-documentation"]], "Indices and tables": [[45, "indices-and-tables"]], "Athena": [[46, "athena"]], "Prerequisites": [[46, "prerequisites"]], "Caching": [[46, "caching"]], "Domain Record": [[47, "domain-record"]], "Domain Record JSONL format": [[47, "domain-record-jsonl-format"]], "Miscellaneous": [[48, "miscellaneous"]], "docs": [[49, "docs"]], "Programming Guide": [[50, "programming-guide"]], "How to extract from Common Crawl (theory)": [[51, "how-to-extract-from-common-crawl-theory"]], "1. Querying CommonCrawl": [[51, "querying-commoncrawl"]], "2. Downloading a file": [[51, "downloading-a-file"]], "3. Choose extractor": [[51, "choose-extractor"]], "4. Filtering out the web page": [[51, "filtering-out-the-web-page"]], "5. Extract fields from the page": [[51, "extract-fields-from-the-page"]], "6. 
File saving": [[51, "file-saving"]], "How to extract from Common Crawl (practice)": [[52, "how-to-extract-from-common-crawl-practice"]], "Pipeline": [[52, "pipeline"]], "Simulatenous querying and extracting": [[52, "simulatenous-querying-and-extracting"]], "Query records and then extract": [[52, "query-records-and-then-extract"]], "Distributed Simulatenous high-throughput querying and extracting": [[52, "distributed-simulatenous-high-throughput-querying-and-extracting"]], "Be cooperative": [[52, "be-cooperative"]], "Usage": [[53, "usage"]], "Workflow": [[53, "workflow"]], "AWS": [[53, "aws"]], "Be nice to others": [[53, "be-nice-to-others"]]}, "indexentries": {"cmoncrawl": [[9, "module-cmoncrawl"]], "module": [[9, "module-cmoncrawl"], [10, "module-cmoncrawl.aggregator"], [11, "module-cmoncrawl.aggregator.athena_query"], [12, "module-cmoncrawl.aggregator.base"], [13, "module-cmoncrawl.aggregator.gateway_query"], [14, "module-cmoncrawl.aggregator.utils"], [15, "module-cmoncrawl.aggregator.utils.athena_query_maker"], [16, "module-cmoncrawl.aggregator.utils.helpers"], [17, "module-cmoncrawl.aggregator.utils.ndjson"], [18, "module-cmoncrawl.common"], [19, "module-cmoncrawl.common.loggers"], [20, "module-cmoncrawl.common.throttling"], [21, "module-cmoncrawl.common.types"], [22, "module-cmoncrawl.config"], [23, "module-cmoncrawl.integrations"], [24, "module-cmoncrawl.integrations.commands"], [25, "module-cmoncrawl.integrations.download"], [26, "module-cmoncrawl.integrations.extract"], [27, "module-cmoncrawl.integrations.utils"], [28, "module-cmoncrawl.middleware"], [29, "module-cmoncrawl.middleware.stompware"], [30, "module-cmoncrawl.middleware.synchronized"], [31, "module-cmoncrawl.processor"], [32, "module-cmoncrawl.processor.dao"], [33, "module-cmoncrawl.processor.dao.api"], [34, "module-cmoncrawl.processor.dao.base"], [35, "module-cmoncrawl.processor.dao.s3"], [36, "module-cmoncrawl.processor.extraction"], [37, "module-cmoncrawl.processor.extraction.filters"], [38, 
"module-cmoncrawl.processor.extraction.utils"], [39, "module-cmoncrawl.processor.pipeline"], [40, "module-cmoncrawl.processor.pipeline.downloader"], [41, "module-cmoncrawl.processor.pipeline.extractor"], [42, "module-cmoncrawl.processor.pipeline.pipeline"], [43, "module-cmoncrawl.processor.pipeline.router"], [44, "module-cmoncrawl.processor.pipeline.streamer"]], "cmoncrawl.aggregator": [[10, "module-cmoncrawl.aggregator"]], "athenaaggregator (class in cmoncrawl.aggregator.athena_query)": [[11, "cmoncrawl.aggregator.athena_query.AthenaAggregator"]], "athenaaggregator.athenaaggregatoriterator (class in cmoncrawl.aggregator.athena_query)": [[11, "cmoncrawl.aggregator.athena_query.AthenaAggregator.AthenaAggregatorIterator"]], "cmoncrawl.aggregator.athena_query": [[11, "module-cmoncrawl.aggregator.athena_query"]], "cmoncrawl.aggregator.base": [[12, "module-cmoncrawl.aggregator.base"]], "gatewayaggregator (class in cmoncrawl.aggregator.gateway_query)": [[13, "cmoncrawl.aggregator.gateway_query.GatewayAggregator"]], "gatewayaggregator.gatewayaggregatoriterator (class in cmoncrawl.aggregator.gateway_query)": [[13, "cmoncrawl.aggregator.gateway_query.GatewayAggregator.GatewayAggregatorIterator"]], "cmoncrawl.aggregator.gateway_query": [[13, "module-cmoncrawl.aggregator.gateway_query"]], "cmoncrawl.aggregator.utils": [[14, "module-cmoncrawl.aggregator.utils"]], "cmoncrawl.aggregator.utils.athena_query_maker": [[15, "module-cmoncrawl.aggregator.utils.athena_query_maker"]], "cmoncrawl.aggregator.utils.helpers": [[16, "module-cmoncrawl.aggregator.utils.helpers"]], "cmoncrawl.aggregator.utils.ndjson": [[17, "module-cmoncrawl.aggregator.utils.ndjson"]], "cmoncrawl.common": [[18, "module-cmoncrawl.common"]], "cmoncrawl.common.loggers": [[19, "module-cmoncrawl.common.loggers"]], "cmoncrawl.common.throttling": [[20, "module-cmoncrawl.common.throttling"]], "domaincrawl (class in cmoncrawl.common.types)": [[21, "cmoncrawl.common.types.DomainCrawl"]], "domainrecord (class in 
cmoncrawl.common.types)": [[21, "cmoncrawl.common.types.DomainRecord"]], "extractconfig (class in cmoncrawl.common.types)": [[21, "cmoncrawl.common.types.ExtractConfig"]], "extractorconfig (class in cmoncrawl.common.types)": [[21, "cmoncrawl.common.types.ExtractorConfig"]], "matchtype (class in cmoncrawl.common.types)": [[21, "cmoncrawl.common.types.MatchType"]], "pipemetadata (class in cmoncrawl.common.types)": [[21, "cmoncrawl.common.types.PipeMetadata"]], "retrieveresponse (class in cmoncrawl.common.types)": [[21, "cmoncrawl.common.types.RetrieveResponse"]], "routesconfig (class in cmoncrawl.common.types)": [[21, "cmoncrawl.common.types.RoutesConfig"]], "cmoncrawl.common.types": [[21, "module-cmoncrawl.common.types"]], "model_config (cmoncrawl.common.types.domainrecord attribute)": [[21, "cmoncrawl.common.types.DomainRecord.model_config"]], "model_config (cmoncrawl.common.types.extractconfig attribute)": [[21, "cmoncrawl.common.types.ExtractConfig.model_config"]], "model_config (cmoncrawl.common.types.extractorconfig attribute)": [[21, "cmoncrawl.common.types.ExtractorConfig.model_config"]], "model_config (cmoncrawl.common.types.routesconfig attribute)": [[21, "cmoncrawl.common.types.RoutesConfig.model_config"]], "model_fields (cmoncrawl.common.types.domainrecord attribute)": [[21, "cmoncrawl.common.types.DomainRecord.model_fields"]], "model_fields (cmoncrawl.common.types.extractconfig attribute)": [[21, "cmoncrawl.common.types.ExtractConfig.model_fields"]], "model_fields (cmoncrawl.common.types.extractorconfig attribute)": [[21, "cmoncrawl.common.types.ExtractorConfig.model_fields"]], "model_fields (cmoncrawl.common.types.routesconfig attribute)": [[21, "cmoncrawl.common.types.RoutesConfig.model_fields"]], "cmoncrawl.config": [[22, "module-cmoncrawl.config"]], "cmoncrawl.integrations": [[23, "module-cmoncrawl.integrations"]], "cmoncrawl.integrations.commands": [[24, "module-cmoncrawl.integrations.commands"]], "cmoncrawl.integrations.download": [[25, 
"module-cmoncrawl.integrations.download"]], "cmoncrawl.integrations.extract": [[26, "module-cmoncrawl.integrations.extract"]], "cmoncrawl.integrations.utils": [[27, "module-cmoncrawl.integrations.utils"]], "cmoncrawl.middleware": [[28, "module-cmoncrawl.middleware"]], "stompaggregator (class in cmoncrawl.middleware.stompware)": [[29, "cmoncrawl.middleware.stompware.StompAggregator"]], "stompprocessor (class in cmoncrawl.middleware.stompware)": [[29, "cmoncrawl.middleware.stompware.StompProcessor"]], "stompprocessor.listener (class in cmoncrawl.middleware.stompware)": [[29, "cmoncrawl.middleware.stompware.StompProcessor.Listener"]], "aggregate() (cmoncrawl.middleware.stompware.stompaggregator method)": [[29, "cmoncrawl.middleware.stompware.StompAggregator.aggregate"]], "cmoncrawl.middleware.stompware": [[29, "module-cmoncrawl.middleware.stompware"]], "on_message() (cmoncrawl.middleware.stompware.stompprocessor.listener method)": [[29, "cmoncrawl.middleware.stompware.StompProcessor.Listener.on_message"]], "cmoncrawl.middleware.synchronized": [[30, "module-cmoncrawl.middleware.synchronized"]], "cmoncrawl.processor": [[31, "module-cmoncrawl.processor"]], "cmoncrawl.processor.dao": [[32, "module-cmoncrawl.processor.dao"]], "ccapigatewaydao (class in cmoncrawl.processor.dao.api)": [[33, "cmoncrawl.processor.dao.api.CCAPIGatewayDAO"]], "aclose() (cmoncrawl.processor.dao.api.ccapigatewaydao method)": [[33, "cmoncrawl.processor.dao.api.CCAPIGatewayDAO.aclose"]], "aopen() (cmoncrawl.processor.dao.api.ccapigatewaydao method)": [[33, "cmoncrawl.processor.dao.api.CCAPIGatewayDAO.aopen"]], "cmoncrawl.processor.dao.api": [[33, "module-cmoncrawl.processor.dao.api"]], "fetch() (cmoncrawl.processor.dao.api.ccapigatewaydao method)": [[33, "cmoncrawl.processor.dao.api.CCAPIGatewayDAO.fetch"]], "icc_dao (class in cmoncrawl.processor.dao.base)": [[34, "cmoncrawl.processor.dao.base.ICC_Dao"]], "cmoncrawl.processor.dao.base": [[34, "module-cmoncrawl.processor.dao.base"]], "fetch() 
(cmoncrawl.processor.dao.base.icc_dao method)": [[34, "cmoncrawl.processor.dao.base.ICC_Dao.fetch"]], "s3dao (class in cmoncrawl.processor.dao.s3)": [[35, "cmoncrawl.processor.dao.s3.S3Dao"]], "__aenter__() (cmoncrawl.processor.dao.s3.s3dao method)": [[35, "cmoncrawl.processor.dao.s3.S3Dao.__aenter__"]], "__aexit__() (cmoncrawl.processor.dao.s3.s3dao method)": [[35, "cmoncrawl.processor.dao.s3.S3Dao.__aexit__"]], "aws_profile (cmoncrawl.processor.dao.s3.s3dao attribute)": [[35, "cmoncrawl.processor.dao.s3.S3Dao.aws_profile"]], "bucket_name (cmoncrawl.processor.dao.s3.s3dao attribute)": [[35, "cmoncrawl.processor.dao.s3.S3Dao.bucket_name"]], "client (cmoncrawl.processor.dao.s3.s3dao attribute)": [[35, "cmoncrawl.processor.dao.s3.S3Dao.client"]], "cmoncrawl.processor.dao.s3": [[35, "module-cmoncrawl.processor.dao.s3"]], "fetch() (cmoncrawl.processor.dao.s3.s3dao method)": [[35, "cmoncrawl.processor.dao.s3.S3Dao.fetch"], [35, "id0"]], "cmoncrawl.processor.extraction": [[36, "module-cmoncrawl.processor.extraction"]], "cmoncrawl.processor.extraction.filters": [[37, "module-cmoncrawl.processor.extraction.filters"]], "cmoncrawl.processor.extraction.utils": [[38, "module-cmoncrawl.processor.extraction.utils"]], "cmoncrawl.processor.pipeline": [[39, "module-cmoncrawl.processor.pipeline"]], "asyncdownloader (class in cmoncrawl.processor.pipeline.downloader)": [[40, "cmoncrawl.processor.pipeline.downloader.AsyncDownloader"]], "downloaderlocalfiles (class in cmoncrawl.processor.pipeline.downloader)": [[40, "cmoncrawl.processor.pipeline.downloader.DownloaderLocalFiles"]], "dummydownloader (class in cmoncrawl.processor.pipeline.downloader)": [[40, "cmoncrawl.processor.pipeline.downloader.DummyDownloader"]], "idownloader (class in cmoncrawl.processor.pipeline.downloader)": [[40, "cmoncrawl.processor.pipeline.downloader.IDownloader"]], "warciterator (class in cmoncrawl.processor.pipeline.downloader)": [[40, "cmoncrawl.processor.pipeline.downloader.WarcIterator"]], 
"cmoncrawl.processor.pipeline.downloader": [[40, "module-cmoncrawl.processor.pipeline.downloader"]], "download() (cmoncrawl.processor.pipeline.downloader.dummydownloader method)": [[40, "cmoncrawl.processor.pipeline.downloader.DummyDownloader.download"]], "baseextractor (class in cmoncrawl.processor.pipeline.extractor)": [[41, "cmoncrawl.processor.pipeline.extractor.BaseExtractor"]], "domainrecordextractor (class in cmoncrawl.processor.pipeline.extractor)": [[41, "cmoncrawl.processor.pipeline.extractor.DomainRecordExtractor"]], "htmlextractor (class in cmoncrawl.processor.pipeline.extractor)": [[41, "cmoncrawl.processor.pipeline.extractor.HTMLExtractor"]], "iextractor (class in cmoncrawl.processor.pipeline.extractor)": [[41, "cmoncrawl.processor.pipeline.extractor.IExtractor"]], "pageextractor (class in cmoncrawl.processor.pipeline.extractor)": [[41, "cmoncrawl.processor.pipeline.extractor.PageExtractor"]], "cmoncrawl.processor.pipeline.extractor": [[41, "module-cmoncrawl.processor.pipeline.extractor"]], "extract() (cmoncrawl.processor.pipeline.extractor.baseextractor method)": [[41, "cmoncrawl.processor.pipeline.extractor.BaseExtractor.extract"]], "extract() (cmoncrawl.processor.pipeline.extractor.iextractor method)": [[41, "cmoncrawl.processor.pipeline.extractor.IExtractor.extract"]], "extract() (cmoncrawl.processor.pipeline.extractor.pageextractor method)": [[41, "cmoncrawl.processor.pipeline.extractor.PageExtractor.extract"]], "cmoncrawl.processor.pipeline.pipeline": [[42, "module-cmoncrawl.processor.pipeline.pipeline"]], "irouter (class in cmoncrawl.processor.pipeline.router)": [[43, "cmoncrawl.processor.pipeline.router.IRouter"]], "route (class in cmoncrawl.processor.pipeline.router)": [[43, "cmoncrawl.processor.pipeline.router.Route"]], "router (class in cmoncrawl.processor.pipeline.router)": [[43, "cmoncrawl.processor.pipeline.router.Router"]], "cmoncrawl.processor.pipeline.router": [[43, "module-cmoncrawl.processor.pipeline.router"]], 
"load_module_as_extractor() (cmoncrawl.processor.pipeline.router.router method)": [[43, "cmoncrawl.processor.pipeline.router.Router.load_module_as_extractor"]], "register_route() (cmoncrawl.processor.pipeline.router.router method)": [[43, "cmoncrawl.processor.pipeline.router.Router.register_route"]], "route() (cmoncrawl.processor.pipeline.router.irouter method)": [[43, "cmoncrawl.processor.pipeline.router.IRouter.route"]], "route() (cmoncrawl.processor.pipeline.router.router method)": [[43, "cmoncrawl.processor.pipeline.router.Router.route"]], "basestreamerfile (class in cmoncrawl.processor.pipeline.streamer)": [[44, "cmoncrawl.processor.pipeline.streamer.BaseStreamerFile"]], "istreamer (class in cmoncrawl.processor.pipeline.streamer)": [[44, "cmoncrawl.processor.pipeline.streamer.IStreamer"]], "memorystreamer (class in cmoncrawl.processor.pipeline.streamer)": [[44, "cmoncrawl.processor.pipeline.streamer.MemoryStreamer"]], "streamerfilehtml (class in cmoncrawl.processor.pipeline.streamer)": [[44, "cmoncrawl.processor.pipeline.streamer.StreamerFileHTML"]], "streamerfilejson (class in cmoncrawl.processor.pipeline.streamer)": [[44, "cmoncrawl.processor.pipeline.streamer.StreamerFileJSON"]], "cmoncrawl.processor.pipeline.streamer": [[44, "module-cmoncrawl.processor.pipeline.streamer"]]}}) \ No newline at end of file diff --git a/usage.html b/usage.html new file mode 100644 index 00000000..9655b4e8 --- /dev/null +++ b/usage.html @@ -0,0 +1,169 @@ + + + + + + + Usage — CmonCrawl 1.0.0 documentation + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

Usage

+

The library is designed to make interaction with CommonCrawl’s indexes simple, while also providing a framework for extracting data from the downloaded HTML pages.

+

You can use the library in two ways:

+
    +
  1. Command Line Interface - This should suffice for 80% of the use cases. Restricted, but easy to use.
  2. How to extract from Common Crawl (practice) - If you need more control over the process, you can use the library programmatically.
+
+

Workflow

+

To download from CommonCrawl, you first need to find pointers to the data you want. The search for pointers is done over special files called indexes. The indexes don’t contain the data itself, but rather metadata and pointers to the data. We call these pointers domain records (see Domain Record). Once you have the domain records, you can download the data from CommonCrawl’s S3 bucket. Since you might want to extract only specific data from the downloaded HTML pages, you can also specify a list of extractors to run on them.
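To make the idea of a pointer concrete, here is a minimal Python sketch. It is not the library's implementation. It assumes a pointer carries a WARC file name, a byte offset, and a byte length, and that the file is reachable over HTTP under `https://data.commoncrawl.org/`. The `Pointer` class, the `range_request` helper, and the example path are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass

# Assumed public HTTP endpoint for Common Crawl data; the S3 route would
# read the same key from the bucket instead.
BASE_URL = "https://data.commoncrawl.org/"


@dataclass(frozen=True)
class Pointer:
    """Hypothetical stand-in for a domain record: where one capture lives."""

    filename: str  # path of the WARC file inside the archive
    offset: int    # byte offset of the record within that file
    length: int    # size of the record in bytes


def range_request(pointer: Pointer) -> tuple[str, dict[str, str]]:
    """Return the URL and headers that fetch only the pointed-to bytes."""
    last_byte = pointer.offset + pointer.length - 1  # HTTP ranges are inclusive
    headers = {"Range": f"bytes={pointer.offset}-{last_byte}"}
    return BASE_URL + pointer.filename, headers


# Made-up pointer; real values come from an index query.
url, headers = range_request(Pointer("crawl-data/example.warc.gz", 1000, 500))
print(url)
print(headers["Range"])
```

The point of the sketch is that the index lookup and the download are separate operations: the index only tells you which byte range to ask for.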

+

The library thus supports a two-step workflow:

+
    +
  1. First, download domain records from the indexes.
  2. Then download the data the domain records point to and extract it.
  4. +
+
+
+

AWS

+

The CommonCrawl data is stored in an AWS S3 bucket in the us-east-1 region. CommonCrawl lets you access the data using the following methods:

+
    +
  1. Gateway - you can download the data through the CloudFront HTTP gateway. You will not need AWS credentials, but it is also the slowest.
  2. S3 - you can download the data directly from S3. You will need AWS credentials, but it is also the fastest.
+

Additionally, CommonCrawl provides two ways to query the data:

+
    +
  1. CommonCrawl Index - Free, but more limited and incredibly slow.
  2. AWS Athena - Paid, but much faster; you can use SQL to query the data.
+

The library supports all of these methods. We recommend the combination of S3 and AWS Athena. The following image summarizes the differences:

+When to use this library +
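As an illustration of the SQL route, the Python sketch below builds a query string for one registered domain in one crawl. It is not the query this library generates. The table name `ccindex` and the column names are assumptions based on Common Crawl's published columnar index schema, and `build_pointer_query` is a hypothetical helper.

```python
import re


def build_pointer_query(domain: str, crawl: str, limit: int = 100) -> str:
    """Build an Athena SQL query that returns pointers for one domain."""
    # Validate inputs because they are interpolated into the SQL text.
    if not re.fullmatch(r"[A-Za-z0-9.-]+", domain):
        raise ValueError(f"unexpected characters in domain: {domain!r}")
    if not re.fullmatch(r"CC-MAIN-\d{4}-\d{2}", crawl):
        raise ValueError(f"unexpected crawl id: {crawl!r}")
    return (
        "SELECT url, warc_filename, warc_record_offset, warc_record_length\n"
        "FROM ccindex\n"
        f"WHERE crawl = '{crawl}'\n"
        "  AND subset = 'warc'\n"
        f"  AND url_host_registered_domain = '{domain}'\n"
        f"LIMIT {int(limit)}"
    )


print(build_pointer_query("example.com", "CC-MAIN-2023-14", limit=10))
```

Filtering on the crawl and subset columns limits how much data Athena scans, which matters because Athena is billed by data scanned.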
+
+

Be nice to others

+

If you use the library programmatically or through the CLI, you will find that you can specify the number of threads to use. Please be aware that by default we limit the number of requests per thread to 20/s. This is to prevent overloading CommonCrawl’s servers. If you plan to use more threads, be considerate to others and don’t set the number of threads too high.
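To see why the thread count matters, here is a small Python sketch of a per-thread limiter together with the resulting aggregate rate. It is an illustration under stated assumptions, not the library's throttling code: `Throttle` and `total_request_rate` are hypothetical names, and only the 20 requests per second default comes from the text above.

```python
import time
from typing import Callable


class Throttle:
    """Allow at most `rate` calls per second from a single thread."""

    def __init__(
        self,
        rate: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = 1.0 / rate  # minimum spacing between calls
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = clock()

    def wait(self) -> None:
        """Block until the next request may be sent."""
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval


def total_request_rate(threads: int, per_thread_rate: float = 20.0) -> float:
    """Upper bound on requests per second across all threads."""
    return threads * per_thread_rate


print(total_request_rate(8))
```

Because the limit applies per thread, the load on the servers grows linearly with the thread count: eight threads at the default limit can issue up to 160 requests per second.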

+
+
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file