diff --git a/aggregator/index.html b/aggregator/index.html
index 1840455..7f58b30 100644
--- a/aggregator/index.html
+++ b/aggregator/index.html
@@ -97,18 +97,20 @@
Aggregator
-The aggregator obtains the mapped data of a ledger (from output/<project_name>/mapped_data.json), aggregates it
-over the given timeframe(s) and outputs one or more csv files with the distribution of block to entities,
-structured as follows:
-Entity,Resources
-<name of entity>,<(int) number of blocks>
+The aggregator obtains the mapped data of a ledger (from output/<project_name>/mapped_data.json) and aggregates it
+over units of time that are determined based on the given timeframe and aggregate_by parameters.
+It then outputs a csv file with the distribution of blocks to entities for each time unit under consideration.
+This file is saved in the directory output/<project_name>/blocks_per_entity/ and is named based on the timeframe
+and aggregate_by parameters.
+For example, if the specified timeframe is from June 2023 to September 2023 and the aggregation is by month, then
+the output file would be named monthly_from_2023-06-01_to_2023-09-30.csv and would be structured as follows:
+Entity \ Time period,Jun-2023,Jul-2023,Aug-2023,Sep-2023
+<name of entity 1>,<number of blocks produced by entity 1 in June 2023>,<number of blocks produced by entity 1 in July 2023>,<number of blocks produced by entity 1 in August 2023>,<number of blocks produced by entity 1 in September 2023>
+<name of entity 2>,<number of blocks produced by entity 2 in June 2023>,<number of blocks produced by entity 2 in July 2023>,<number of blocks produced by entity 2 in August 2023>,<number of blocks produced by entity 2 in September 2023>
-Specifically, if the timeframe argument is provided during execution, then the mapping outputs a single csv
-file that corresponds to that timeframe. Otherwise, it outputs a csv file for each month contained in the default
-time range (as specified in the config file).
-It also outputs a csv file for each year contained in the relevant time frames.
-Each csv file is named after the timeframe over which the mapping was executed (e.g., 2021-04.csv) and is
-stored in a dedicated folder in the project's output directory (output/<project_name>/blocks_per_entity/).
+Therefore, the file will have as many rows as the number of entities that have produced blocks in the given
+timeframe (+ 1 for the header) and as many columns as the number of time units in the given timeframe (+ 1 for the
+entity names).
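
As an illustration of the format introduced above, the following minimal Python sketch reads such a blocks_per_entity file back into a dictionary and computes per-time-unit totals. It assumes the analysed ledger is bitcoin and reuses the example file name from the text; the script is illustrative only and is not part of the tool.

import csv
from pathlib import Path

# Hypothetical example: path and file name follow the convention described above,
# assuming the ledger under analysis is bitcoin.
csv_path = Path("output/bitcoin/blocks_per_entity/monthly_from_2023-06-01_to_2023-09-30.csv")

with open(csv_path, newline="") as f:
    reader = csv.reader(f)
    header = next(reader)  # "Entity \ Time period", "Jun-2023", "Jul-2023", ...
    time_units = header[1:]
    # Map each entity to its list of block counts, one entry per time unit.
    blocks_per_entity = {row[0]: [int(value) for value in row[1:]] for row in reader}

# Total blocks produced in each time unit, e.g. to convert counts into shares.
totals = [sum(counts[i] for counts in blocks_per_entity.values()) for i in range(len(time_units))]
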
diff --git a/index.html b/index.html
index f17f587..b8ddf14 100644
--- a/index.html
+++ b/index.html
@@ -117,13 +117,15 @@ Overview
with all the information that is needed for the mapping.
The mapping takes the output of the parser, combines it with some other sources of information, and produces a new
file that includes attribution data for each block and which mapping method was used to obtain it.
-The aggregator takes as input the output of the mapping, as well as one or more time frames to aggregate over. It then
-outputs a file for each time frame that reveals the distribution of resources to different entities during that time
-frame. In this context, "resources" correspond to the number of produced blocks.
+The aggregator takes as input the output of the mapping, as well as a time frame to aggregate over and a unit to
+divide the time frame by (e.g. week or month).
+It then outputs a file that reveals the distribution of resources to different entities during each time unit under
+consideration.
+In this context, "resources" correspond to the number of produced blocks.
These distributions are then the input for the metrics module, which tracks various
decentralization-related metrics and produces files with the results.
-More details about the different modules can be found in the corresponding Parser, Mapping
-and Metrics pages.
+More details about the different modules can be found in the corresponding Parser, Mapping,
+Aggregator and Metrics pages.
Currently, the supported ledgers are:
Bitcoin
@@ -135,7 +137,7 @@ Overview
Tezos
Zcash
-We intend to add more ledgers to this list in the future.
+We intend to add more ledgers to this list in the future.
Contributing
This is an open source project licensed under the terms and conditions of the
MIT license and
@@ -200,5 +202,5 @@
Contributing
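
The per-entity distributions described in the overview above feed directly into the metrics module. As a rough illustration of that interface, the sketch below computes the Nakamoto coefficient (the minimum number of entities that together produce more than half of the blocks) from a dictionary mapping entities to block counts; the function and value names are illustrative and not necessarily those used in the repository.

def compute_nakamoto_coefficient(blocks_per_entity):
    """Smallest number of entities whose combined blocks exceed 50% of the total.

    blocks_per_entity maps entity names to produced-block counts, i.e. one
    column of the aggregator's output. Sketch only, not the repository's code.
    """
    total_blocks = sum(blocks_per_entity.values())
    covered, coefficient = 0, 0
    for count in sorted(blocks_per_entity.values(), reverse=True):
        covered += count
        coefficient += 1
        if covered > total_blocks / 2:
            break
    return coefficient


# Example: one dominant pool pushes the coefficient down to 1.
print(compute_nakamoto_coefficient({"Pool A": 60, "Pool B": 30, "Pool C": 10}))  # prints 1
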
diff --git a/search/search_index.json b/search/search_index.json
index 13349e1..787388c 100644
--- a/search/search_index.json
+++ b/search/search_index.json
@@ -1 +1 @@
-{"config":{"indexing":"full","lang":["en"],"min_search_length":3,"prebuild_index":false,"separator":"[\\s\\-]+"},"docs":[{"location":"","text":"Consensus Blockchain Decentralization - Documentation This is the documentation for the Consensus Decentralization Analysis tool developed by the University of Edinburgh's Blockchain Technology Lab. The tool is responsible for analyzing the block production of various blockchains and measuring their subsequent levels of decentralization. The relevant source code is available on GitHub . Overview The tool consists of the following modules: Parser Mapping Aggregator Metrics The parser is responsible for pre-processing the raw data that comes from a full node. It produces a file with all the information that is needed for the mapping. The mapping takes the output of the parser, combines it with some other sources of information, and produces a new file that includes attribution data for each block and which mapping method was used to obtain it. The aggregator takes as input the output of the mapping, as well as one or more time frames to aggregate over. It then outputs a file for each time frame that reveals the distribution of resources to different entities during that time frame. In this context, \"resources\" correspond to the number of produced blocks. These distributions are then the input for the metrics module, which tracks various decentralization-related metrics and produces files with the results. More details about the different modules can be found in the corresponding Parser , Mapping and Metrics pages. Currently, the supported ledgers are: Bitcoin Bitcoin Cash Cardano Dogecoin Ethereum Litecoin Tezos Zcash We intend to add more ledgers to this list in the future. Contributing This is an open source project licensed under the terms and conditions of the MIT license and CC BY-SA 4.0 . Everyone is welcome to contribute to it by proposing or implementing their ideas. Example contributions include, but are not limited to, reporting potential bugs, supplying useful information for the mappings of supported ledgers, adding support for a new ledger, or making the code more efficient. All contributions to the project will also be covered by the above-mentioned license. When making changes in the code, contributors are required to fork the project's repository first and then issue a pull request with their changes. Each PR will be reviewed before being merged to the main branch. Bugs can be reported in the Issues page. Other comments and ideas can be brought up in the project's Discussions . For more information on how to make specific contributions, see How to Contribute .","title":"Home"},{"location":"#consensus-blockchain-decentralization-documentation","text":"This is the documentation for the Consensus Decentralization Analysis tool developed by the University of Edinburgh's Blockchain Technology Lab. The tool is responsible for analyzing the block production of various blockchains and measuring their subsequent levels of decentralization. The relevant source code is available on GitHub .","title":"Consensus Blockchain Decentralization - Documentation"},{"location":"#overview","text":"The tool consists of the following modules: Parser Mapping Aggregator Metrics The parser is responsible for pre-processing the raw data that comes from a full node. It produces a file with all the information that is needed for the mapping. 
The mapping takes the output of the parser, combines it with some other sources of information, and produces a new file that includes attribution data for each block and which mapping method was used to obtain it. The aggregator takes as input the output of the mapping, as well as one or more time frames to aggregate over. It then outputs a file for each time frame that reveals the distribution of resources to different entities during that time frame. In this context, \"resources\" correspond to the number of produced blocks. These distributions are then the input for the metrics module, which tracks various decentralization-related metrics and produces files with the results. More details about the different modules can be found in the corresponding Parser , Mapping and Metrics pages. Currently, the supported ledgers are: Bitcoin Bitcoin Cash Cardano Dogecoin Ethereum Litecoin Tezos Zcash We intend to add more ledgers to this list in the future.","title":"Overview"},{"location":"#contributing","text":"This is an open source project licensed under the terms and conditions of the MIT license and CC BY-SA 4.0 . Everyone is welcome to contribute to it by proposing or implementing their ideas. Example contributions include, but are not limited to, reporting potential bugs, supplying useful information for the mappings of supported ledgers, adding support for a new ledger, or making the code more efficient. All contributions to the project will also be covered by the above-mentioned license. When making changes in the code, contributors are required to fork the project's repository first and then issue a pull request with their changes. Each PR will be reviewed before being merged to the main branch. Bugs can be reported in the Issues page. Other comments and ideas can be brought up in the project's Discussions . For more information on how to make specific contributions, see How to Contribute .","title":"Contributing"},{"location":"aggregator/","text":"Aggregator The aggregator obtains the mapped data of a ledger (from output//mapped_data.json ), aggregates it over the given timeframe(s) and outputs one or more csv files with the distribution of block to entities, structured as follows: Entity,Resources ,<(int) number of blocks> Specifically, if the timeframe argument is provided during execution, then the mapping outputs a single csv file that corresponds to that timeframe. Otherwise, it outputs a csv file for each month contained in the default time range (as specified in the config file ). It also outputs a csv file for each year contained in the relevant time frames. Each csv file is named after the timeframe over which the mapping was executed (e.g., 2021-04.csv ) and is stored in a dedicated folder in the project's output directory ( output//blocks_per_entity/ ).","title":"Aggregator"},{"location":"aggregator/#aggregator","text":"The aggregator obtains the mapped data of a ledger (from output//mapped_data.json ), aggregates it over the given timeframe(s) and outputs one or more csv files with the distribution of block to entities, structured as follows: Entity,Resources ,<(int) number of blocks> Specifically, if the timeframe argument is provided during execution, then the mapping outputs a single csv file that corresponds to that timeframe. Otherwise, it outputs a csv file for each month contained in the default time range (as specified in the config file ). It also outputs a csv file for each year contained in the relevant time frames. 
Each csv file is named after the timeframe over which the mapping was executed (e.g., 2021-04.csv ) and is stored in a dedicated folder in the project's output directory ( output//blocks_per_entity/ ).","title":"Aggregator"},{"location":"contribute/","text":"How to contribute You can contribute to the tool by adding support for a ledger, updating the mapping process for an existing ledger, or adding a new metric. In all cases, the information should be submitted via a GitHub PR. Add support for ledgers You can add support for a ledger that is not already supported as follows. Mapping information In the directory mapping_information/ , there exist three folders ( addresses , clusters , identifiers ). In each folder, add a file named .json , if there exist such information for the new ledger (for more details on what type of information each folder corresponds to see the mapping documentation ). Parser and mapping If no existing parser can be reused, create a file named _parser.py in the directory consensus_decentralization/parsers/ . In this file create a new class, which inherits from the DefaultParser class of default_parser.py . Then, override its parse method in order to implement the new parser (or override another method if there are only small changes needed, e.g. parse_identifiers if the only thing that is different from the default parser is the way identifiers are decoded). If no existing mapping can be reused, create a file named _mapping.py in the directory consensus_decentralization/mappings/ . In this file create a new class, which inherits from the DefaultMapping class of default_mapping.py . Then, override its perform_mapping method and/or any other methods that are required (e.g. map_from_known_identifiers ). Then, you should enable support for the new ledger in the parser and mapping module scripts. Specifically: in the script consensus_decentralization/parse.py , import the parser class and assign it to the project's name in the ledger_parser dictionary; in the script consensus_decentralization/map.py , import the mapping class and assign it to the project's name in the dictionary ledger_mapping . Notes : You should add an entry in each dictionary, regardless of whether you use a new or existing parser or mapping \u2013 if no new parser or mapping class was created for the project, simply assign the suitable class (e.g. DefaultParser or DefaultMapping ) to the project's name in the corresponding dictionary. If you create a new parser/mapping, you should also add unit tests here Documentation Finally, you should include the new ledger in the documentation pages; specifically: add the ledger in the list of supported ledgers in the repository's main README file add the ledger in the list of supported ledgers in the index documentation page document the new ledger's parser in the corresponding documentation page document how the new ledger's data is retrieved in the corresponding documentation page ; if Google BigQuery is used, add the new query to queries.yaml Data You can optionally commit small sample data for the new ledger in the raw_block_data folder. Alternatively, make sure to add your raw data file in the raw_block_data folder before running the tool on the new ledger. Update existing mapping information All mapping data are in the folder mapping_information . To update or add information about a supported ledger's mapping, you should open a Pull Request. 
This can be done either via console or as follows, via the browser: Open the file that you want to change (e.g., for Bitcoin, follow this link ) on your browser. Click Edit this file . Make your changes in the file. On the bottom, initiate a Pull Request. Write a short and descriptive commit title message (e.g., \"Update 2019 links for company A\"). Select Create a new branch for this commit and start a pull request. In the page that opens, change the PR title (if necessary) and click on Create pull request . When updating the mapping information, the following guidelines should be observed: The link to a pool's website should be active and public. All sources cited should be publicly available and respectable. Unofficial tweets or unavailable or private sources will be rejected.You can use specific keywords, in the cases when the information is available on-chain. Specifically: homepage : this keyword is used in Cardano, to denote that two pools define the same homepage in their metadata (which are published on-chain) Specifically, for legal_links.json : The value of the pool's name (that is the first value in each array entry under a company), should be the same as the value that corresponds to a key name in the ledger-specific pool information, as defined in the corresponding addresses , clusters or identifiers file. If this string is not exactly the same (including capitalization), the link will not be identified during the mapping process. There should exist no time gaps in a pool's ownership structure. Add metrics To add a new metric, you should do the following steps. First, create a relevant script in the folder consensus_decentralization/metrics . The script should include a function named compute_{metric_name} that, given a dictionary of entities (as keys) to number of blocks (as values), outputs a single value (the outcome of the metric). Second, import this new function to consensus_decentralization/analyze.py . Third, add the name of the metric (which should be the same as the one used in the filename above) and any parameter values it might require to the file config.yaml , under metrics . Fourth, you should add unit tests for the new metric here . Finally, you should update the corresponding documentation page","title":"How to contribute"},{"location":"contribute/#how-to-contribute","text":"You can contribute to the tool by adding support for a ledger, updating the mapping process for an existing ledger, or adding a new metric. In all cases, the information should be submitted via a GitHub PR.","title":"How to contribute"},{"location":"contribute/#add-support-for-ledgers","text":"You can add support for a ledger that is not already supported as follows.","title":"Add support for ledgers"},{"location":"contribute/#mapping-information","text":"In the directory mapping_information/ , there exist three folders ( addresses , clusters , identifiers ). In each folder, add a file named .json , if there exist such information for the new ledger (for more details on what type of information each folder corresponds to see the mapping documentation ).","title":"Mapping information"},{"location":"contribute/#parser-and-mapping","text":"If no existing parser can be reused, create a file named _parser.py in the directory consensus_decentralization/parsers/ . In this file create a new class, which inherits from the DefaultParser class of default_parser.py . 
Then, override its parse method in order to implement the new parser (or override another method if there are only small changes needed, e.g. parse_identifiers if the only thing that is different from the default parser is the way identifiers are decoded). If no existing mapping can be reused, create a file named _mapping.py in the directory consensus_decentralization/mappings/ . In this file create a new class, which inherits from the DefaultMapping class of default_mapping.py . Then, override its perform_mapping method and/or any other methods that are required (e.g. map_from_known_identifiers ). Then, you should enable support for the new ledger in the parser and mapping module scripts. Specifically: in the script consensus_decentralization/parse.py , import the parser class and assign it to the project's name in the ledger_parser dictionary; in the script consensus_decentralization/map.py , import the mapping class and assign it to the project's name in the dictionary ledger_mapping . Notes : You should add an entry in each dictionary, regardless of whether you use a new or existing parser or mapping \u2013 if no new parser or mapping class was created for the project, simply assign the suitable class (e.g. DefaultParser or DefaultMapping ) to the project's name in the corresponding dictionary. If you create a new parser/mapping, you should also add unit tests here","title":"Parser and mapping"},{"location":"contribute/#documentation","text":"Finally, you should include the new ledger in the documentation pages; specifically: add the ledger in the list of supported ledgers in the repository's main README file add the ledger in the list of supported ledgers in the index documentation page document the new ledger's parser in the corresponding documentation page document how the new ledger's data is retrieved in the corresponding documentation page ; if Google BigQuery is used, add the new query to queries.yaml","title":"Documentation"},{"location":"contribute/#data","text":"You can optionally commit small sample data for the new ledger in the raw_block_data folder. Alternatively, make sure to add your raw data file in the raw_block_data folder before running the tool on the new ledger.","title":"Data"},{"location":"contribute/#update-existing-mapping-information","text":"All mapping data are in the folder mapping_information . To update or add information about a supported ledger's mapping, you should open a Pull Request. This can be done either via console or as follows, via the browser: Open the file that you want to change (e.g., for Bitcoin, follow this link ) on your browser. Click Edit this file . Make your changes in the file. On the bottom, initiate a Pull Request. Write a short and descriptive commit title message (e.g., \"Update 2019 links for company A\"). Select Create a new branch for this commit and start a pull request. In the page that opens, change the PR title (if necessary) and click on Create pull request . When updating the mapping information, the following guidelines should be observed: The link to a pool's website should be active and public. All sources cited should be publicly available and respectable. Unofficial tweets or unavailable or private sources will be rejected.You can use specific keywords, in the cases when the information is available on-chain. 
Specifically: homepage : this keyword is used in Cardano, to denote that two pools define the same homepage in their metadata (which are published on-chain) Specifically, for legal_links.json : The value of the pool's name (that is the first value in each array entry under a company), should be the same as the value that corresponds to a key name in the ledger-specific pool information, as defined in the corresponding addresses , clusters or identifiers file. If this string is not exactly the same (including capitalization), the link will not be identified during the mapping process. There should exist no time gaps in a pool's ownership structure.","title":"Update existing mapping information"},{"location":"contribute/#add-metrics","text":"To add a new metric, you should do the following steps. First, create a relevant script in the folder consensus_decentralization/metrics . The script should include a function named compute_{metric_name} that, given a dictionary of entities (as keys) to number of blocks (as values), outputs a single value (the outcome of the metric). Second, import this new function to consensus_decentralization/analyze.py . Third, add the name of the metric (which should be the same as the one used in the filename above) and any parameter values it might require to the file config.yaml , under metrics . Fourth, you should add unit tests for the new metric here . Finally, you should update the corresponding documentation page","title":"Add metrics"},{"location":"data/","text":"Data collection Currently, the data for the analysis of the different ledgers is collected through Google BigQuery . Note that when saving results from BigQuery you should select the option \"JSONL (newline delimited)\". Sample data & queries Sample data for all blockchains can be found here . Alternatively, one can retrieve the data directly from BigQuery using the queries below. 
Bitcoin SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_bitcoin.transactions`.outputs FROM `bigquery-public-data.crypto_bitcoin.transactions` JOIN `bigquery-public-data.crypto_bitcoin.blocks` ON `bigquery-public-data.crypto_bitcoin.transactions`.block_number = `bigquery-public-data.crypto_bitcoin.blocks`.number WHERE is_coinbase is TRUE AND timestamp > '2018-01-01' ORDER BY timestamp Bitcoin Cash SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_bitcoin_cash.transactions`.outputs FROM `bigquery-public-data.crypto_bitcoin_cash.transactions` JOIN `bigquery-public-data.crypto_bitcoin_cash.blocks` ON `bigquery-public-data.crypto_bitcoin_cash.transactions`.block_number = `bigquery-public-data.crypto_bitcoin_cash.blocks`.number WHERE is_coinbase is TRUE AND timestamp > '2018-01-01' ORDER BY timestamp Cardano SELECT `iog-data-analytics.cardano_mainnet.block`.slot_no as number, `iog-data-analytics.cardano_mainnet.pool_offline_data`.ticker_name as identifiers, `iog-data-analytics.cardano_mainnet.block`.block_time as timestamp,`iog-data-analytics.cardano_mainnet.block`.pool_hash as reward_addresses FROM `iog-data-analytics.cardano_mainnet.block` LEFT JOIN `iog-data-analytics.cardano_mainnet.pool_offline_data` ON `iog-data-analytics.cardano_mainnet.block`.pool_hash = `iog-data-analytics.cardano_mainnet.pool_offline_data`.pool_hash WHERE `iog-data-analytics.cardano_mainnet.block`.block_time > '2018-01-01' ORDER BY `iog-data-analytics.cardano_mainnet.block`.block_time Dogecoin SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_dogecoin.transactions`.outputs FROM `bigquery-public-data.crypto_dogecoin.transactions` JOIN `bigquery-public-data.crypto_dogecoin.blocks` ON `bigquery-public-data.crypto_dogecoin.transactions`.block_number = `bigquery-public-data.crypto_dogecoin.blocks`.number WHERE is_coinbase is TRUE AND timestamp > '2018-01-01' ORDER BY timestamp Ethereum SELECT number, timestamp, miner as reward_addresses, extra_data as identifiers FROM `bigquery-public-data.crypto_ethereum.blocks` WHERE timestamp > '2018-01-01' ORDER BY timestamp Litecoin SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_litecoin.transactions`.outputs FROM `bigquery-public-data.crypto_litecoin.transactions` JOIN `bigquery-public-data.crypto_litecoin.blocks` ON `bigquery-public-data.crypto_litecoin.transactions`.block_number = `bigquery-public-data.crypto_litecoin.blocks`.number WHERE is_coinbase is TRUE AND timestamp > '2018-01-01' ORDER BY timestamp Tezos SELECT level as number, timestamp, baker as reward_addresses FROM `public-data-finance.crypto_tezos.blocks` WHERE timestamp > '2018-01-01' ORDER BY timestamp Zcash SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_zcash.transactions`.outputs FROM `bigquery-public-data.crypto_zcash.transactions` JOIN `bigquery-public-data.crypto_zcash.blocks` ON `bigquery-public-data.crypto_zcash.transactions`.block_number = `bigquery-public-data.crypto_zcash.blocks`.number WHERE is_coinbase is TRUE AND timestamp > '2018-01-01' ORDER BY timestamp Automating the data collection process Instead of executing each of these queries separately on the BigQuery console and saving the results manually, it is also possible to automate the process 
using a script and collect all relevant data in one go. Executing this script will run queries from this file . IMPORTANT: the script uses service account credentials for authentication, therefore before running it, you need to generate the relevant credentials from Google, as described here and save your key in the root directory of the project under the name 'google-service-account-key.json'. There is a sample file that you can consult, which shows what your credentials are supposed to look like (but note that this is for informational purposes only, this file is not used in the code). Once you have set up the credentials, you can just run the following command from the root directory to retrieve data for all supported blockchains: python -m consensus_decentralization.collect_data There are also two command line arguments that can be used to customize the data collection process: ledgers accepts any number of the supported ledgers (case-insensitive). For example, adding --ledgers bitcoin results in collecting data only for Bitcoin, while --ledgers Bitcoin Ethereum Cardano would collect data for Bitcoin, Ethereum and Cardano. If the ledgers argument is omitted, then the default value is used, which is taken from the configuration file and typically corresponds to all supported blockchains. --force-query forces the collection of all raw data files, even if the corresponding files already exist. By default, this flag is set to False and the script only fetches block data for some blockchain if the corresponding file does not already exist.","title":"Data Collection"},{"location":"data/#data-collection","text":"Currently, the data for the analysis of the different ledgers is collected through Google BigQuery . Note that when saving results from BigQuery you should select the option \"JSONL (newline delimited)\".","title":"Data collection"},{"location":"data/#sample-data-queries","text":"Sample data for all blockchains can be found here . 
Alternatively, one can retrieve the data directly from BigQuery using the queries below.","title":"Sample data & queries"},{"location":"data/#bitcoin","text":"SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_bitcoin.transactions`.outputs FROM `bigquery-public-data.crypto_bitcoin.transactions` JOIN `bigquery-public-data.crypto_bitcoin.blocks` ON `bigquery-public-data.crypto_bitcoin.transactions`.block_number = `bigquery-public-data.crypto_bitcoin.blocks`.number WHERE is_coinbase is TRUE AND timestamp > '2018-01-01' ORDER BY timestamp","title":"Bitcoin"},{"location":"data/#bitcoin-cash","text":"SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_bitcoin_cash.transactions`.outputs FROM `bigquery-public-data.crypto_bitcoin_cash.transactions` JOIN `bigquery-public-data.crypto_bitcoin_cash.blocks` ON `bigquery-public-data.crypto_bitcoin_cash.transactions`.block_number = `bigquery-public-data.crypto_bitcoin_cash.blocks`.number WHERE is_coinbase is TRUE AND timestamp > '2018-01-01' ORDER BY timestamp","title":"Bitcoin Cash"},{"location":"data/#cardano","text":"SELECT `iog-data-analytics.cardano_mainnet.block`.slot_no as number, `iog-data-analytics.cardano_mainnet.pool_offline_data`.ticker_name as identifiers, `iog-data-analytics.cardano_mainnet.block`.block_time as timestamp,`iog-data-analytics.cardano_mainnet.block`.pool_hash as reward_addresses FROM `iog-data-analytics.cardano_mainnet.block` LEFT JOIN `iog-data-analytics.cardano_mainnet.pool_offline_data` ON `iog-data-analytics.cardano_mainnet.block`.pool_hash = `iog-data-analytics.cardano_mainnet.pool_offline_data`.pool_hash WHERE `iog-data-analytics.cardano_mainnet.block`.block_time > '2018-01-01' ORDER BY `iog-data-analytics.cardano_mainnet.block`.block_time","title":"Cardano"},{"location":"data/#dogecoin","text":"SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_dogecoin.transactions`.outputs FROM `bigquery-public-data.crypto_dogecoin.transactions` JOIN `bigquery-public-data.crypto_dogecoin.blocks` ON `bigquery-public-data.crypto_dogecoin.transactions`.block_number = `bigquery-public-data.crypto_dogecoin.blocks`.number WHERE is_coinbase is TRUE AND timestamp > '2018-01-01' ORDER BY timestamp","title":"Dogecoin"},{"location":"data/#ethereum","text":"SELECT number, timestamp, miner as reward_addresses, extra_data as identifiers FROM `bigquery-public-data.crypto_ethereum.blocks` WHERE timestamp > '2018-01-01' ORDER BY timestamp","title":"Ethereum"},{"location":"data/#litecoin","text":"SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_litecoin.transactions`.outputs FROM `bigquery-public-data.crypto_litecoin.transactions` JOIN `bigquery-public-data.crypto_litecoin.blocks` ON `bigquery-public-data.crypto_litecoin.transactions`.block_number = `bigquery-public-data.crypto_litecoin.blocks`.number WHERE is_coinbase is TRUE AND timestamp > '2018-01-01' ORDER BY timestamp","title":"Litecoin"},{"location":"data/#tezos","text":"SELECT level as number, timestamp, baker as reward_addresses FROM `public-data-finance.crypto_tezos.blocks` WHERE timestamp > '2018-01-01' ORDER BY timestamp","title":"Tezos"},{"location":"data/#zcash","text":"SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, 
`bigquery-public-data.crypto_zcash.transactions`.outputs FROM `bigquery-public-data.crypto_zcash.transactions` JOIN `bigquery-public-data.crypto_zcash.blocks` ON `bigquery-public-data.crypto_zcash.transactions`.block_number = `bigquery-public-data.crypto_zcash.blocks`.number WHERE is_coinbase is TRUE AND timestamp > '2018-01-01' ORDER BY timestamp","title":"Zcash"},{"location":"data/#automating-the-data-collection-process","text":"Instead of executing each of these queries separately on the BigQuery console and saving the results manually, it is also possible to automate the process using a script and collect all relevant data in one go. Executing this script will run queries from this file . IMPORTANT: the script uses service account credentials for authentication, therefore before running it, you need to generate the relevant credentials from Google, as described here and save your key in the root directory of the project under the name 'google-service-account-key.json'. There is a sample file that you can consult, which shows what your credentials are supposed to look like (but note that this is for informational purposes only, this file is not used in the code). Once you have set up the credentials, you can just run the following command from the root directory to retrieve data for all supported blockchains: python -m consensus_decentralization.collect_data There are also two command line arguments that can be used to customize the data collection process: ledgers accepts any number of the supported ledgers (case-insensitive). For example, adding --ledgers bitcoin results in collecting data only for Bitcoin, while --ledgers Bitcoin Ethereum Cardano would collect data for Bitcoin, Ethereum and Cardano. If the ledgers argument is omitted, then the default value is used, which is taken from the configuration file and typically corresponds to all supported blockchains. --force-query forces the collection of all raw data files, even if the corresponding files already exist. By default, this flag is set to False and the script only fetches block data for some blockchain if the corresponding file does not already exist.","title":"Automating the data collection process"},{"location":"mappings/","text":"Mappings A mapping is responsible for linking blocks to the entities that created them. While the parsed data contains information about the addresses that received rewards for producing some block or identifiers that are related to them, it does not contain information about the entities that control these addresses, which is where the mapping comes in. The mapping takes as input the parsed data and outputs a file ( output//mapped_data.json ), which is structured as follows: [ { \"number\": \"\", \"timestamp\": \"\", \"reward_addresses\": \",\" \"creator\": , \"mapping_method\": } ] Mapping Information To assist the mapping process, the directory mapping_information/ contains mapping information about the supported projects. There exist three subdirectories and two additional files. In each subdirectory there exists a file for the corresponding ledger data, if such data exists. Identifiers The files under identifiers define information about block creators. Each key corresponds to a tag or ticker, by which the pool is identifiable in its produced blocks. The value for each key is a dictionary of pool-related information, specifically its name, a URL to its homepage, etc. 
Each file's structure is as follows: { \"P1\": { \"name\": \"Pool P1\", \"homepage\": \"example.com/p1\" }, \"--P2--\": { \"name\": \"Pool P2\", \"homepage\": \"example.com/p2\" } } Clusters The files under clusters define information about pool clusters. This information is organized per cluster. For each cluster, an array of pool-related information is defined. Each item in the array defines the pool's name, the time window during which the pool belonged to the cluster (from the beginning of from until the beginning of to excluding ), and the publicly available source of information, via which the link between the pool and the cluster is established. Each file's structure is as follows: { \"cluster A\": [ {\"name\": \"P1\", \"from\": \"\", \"to\": \"2023\", \"source\": \"example.com/link1\"} ], \"cluster B\": [ {\"name\": \"--P2--\", \"from\": \"\", \"to\": \"\", \"source\": \"example.com/link2\"} ] } Addresses The files under addresses define ownership information about addresses. As with clusters, for each address the pool ownership information defines the pool's name and a public source of information about the ownership. Each file's structure is as follows: { \"address1\": {\"name\": \"Pool P2\", \"source\": \"example.com\"}, } Legal links The file legal_links.json defines legal links between pools and companies, based on off-chain information. For example, it defines ownership information of a pool by a company. The structure of the file is as follows: { \"\": [ {\"name\": \"\", \"from\": \"\", \"to\": \"\", \"source\": \"\"} ] } The values for each entry are the same as clusters in the above pool information. Special addresses The file special_addresses.json defines per-project information about addresses that are not related to some entity but are used for protocol-specific reasons (e.g. treasury address). The format of the file is the following: { \"Project A\": [ {\"address\": \"A special address 1\", \"source\": \"some.public.source\"}, {\"address\": \"A special address 2\", \"source\": \"some.public.source\"} ], \"Project B\": [ {\"address\": \"B special address\", \"source\": \"some.public.source\"} ] } Mapping process implementation In our implementation, the mapping of a block uses the auxiliary information as follows. First, it iterates over all known tags and compares each one with the block's identifiers. If the tag is a substring of the parameter, then a match is found. If the first step fails, we compare the block's reward addresses with known pool addresses (including special addresses that exist for some blockchains) and again look for a match. In both cases, if there is a match, then: We map the block to the matched pool. We associate all of the block's reward addresses (that is, the addresses that receive fees from the block) with the matched pool. We record the mapping method that was used to obtain the mapping ( known_identifiers for the first case or known_addresses for the second. In essence, the identifiers are the principal element for mapping a block to an entity and the known addresses are the fallback mechanism. If there is a match, we also parse the auxiliary information, such as pool ownership or clusters, in order to assign the block to the top level entity, e.g., the pool's parent company or cluster. If a match is found this way, we update the mapping method to known_pool_links . If all mechanisms fail, then no match is found. 
In this case, we assign the reward addresses as the block's entity.","title":"Mappings"},{"location":"mappings/#mappings","text":"A mapping is responsible for linking blocks to the entities that created them. While the parsed data contains information about the addresses that received rewards for producing some block or identifiers that are related to them, it does not contain information about the entities that control these addresses, which is where the mapping comes in. The mapping takes as input the parsed data and outputs a file ( output//mapped_data.json ), which is structured as follows: [ { \"number\": \"\", \"timestamp\": \"\", \"reward_addresses\": \",\" \"creator\": , \"mapping_method\": } ]","title":"Mappings"},{"location":"mappings/#mapping-information","text":"To assist the mapping process, the directory mapping_information/ contains mapping information about the supported projects. There exist three subdirectories and two additional files. In each subdirectory there exists a file for the corresponding ledger data, if such data exists.","title":"Mapping Information"},{"location":"mappings/#identifiers","text":"The files under identifiers define information about block creators. Each key corresponds to a tag or ticker, by which the pool is identifiable in its produced blocks. The value for each key is a dictionary of pool-related information, specifically its name, a URL to its homepage, etc. Each file's structure is as follows: { \"P1\": { \"name\": \"Pool P1\", \"homepage\": \"example.com/p1\" }, \"--P2--\": { \"name\": \"Pool P2\", \"homepage\": \"example.com/p2\" } }","title":"Identifiers"},{"location":"mappings/#clusters","text":"The files under clusters define information about pool clusters. This information is organized per cluster. For each cluster, an array of pool-related information is defined. Each item in the array defines the pool's name, the time window during which the pool belonged to the cluster (from the beginning of from until the beginning of to excluding ), and the publicly available source of information, via which the link between the pool and the cluster is established. Each file's structure is as follows: { \"cluster A\": [ {\"name\": \"P1\", \"from\": \"\", \"to\": \"2023\", \"source\": \"example.com/link1\"} ], \"cluster B\": [ {\"name\": \"--P2--\", \"from\": \"\", \"to\": \"\", \"source\": \"example.com/link2\"} ] }","title":"Clusters"},{"location":"mappings/#addresses","text":"The files under addresses define ownership information about addresses. As with clusters, for each address the pool ownership information defines the pool's name and a public source of information about the ownership. Each file's structure is as follows: { \"address1\": {\"name\": \"Pool P2\", \"source\": \"example.com\"}, }","title":"Addresses"},{"location":"mappings/#legal-links","text":"The file legal_links.json defines legal links between pools and companies, based on off-chain information. For example, it defines ownership information of a pool by a company. The structure of the file is as follows: { \"\": [ {\"name\": \"\", \"from\": \"\", \"to\": \"\", \"source\": \"\"} ] } The values for each entry are the same as clusters in the above pool information.","title":"Legal links"},{"location":"mappings/#special-addresses","text":"The file special_addresses.json defines per-project information about addresses that are not related to some entity but are used for protocol-specific reasons (e.g. treasury address). 
The format of the file is the following: { \"Project A\": [ {\"address\": \"A special address 1\", \"source\": \"some.public.source\"}, {\"address\": \"A special address 2\", \"source\": \"some.public.source\"} ], \"Project B\": [ {\"address\": \"B special address\", \"source\": \"some.public.source\"} ] }","title":"Special addresses"},{"location":"mappings/#mapping-process-implementation","text":"In our implementation, the mapping of a block uses the auxiliary information as follows. First, it iterates over all known tags and compares each one with the block's identifiers. If the tag is a substring of the parameter, then a match is found. If the first step fails, we compare the block's reward addresses with known pool addresses (including special addresses that exist for some blockchains) and again look for a match. In both cases, if there is a match, then: We map the block to the matched pool. We associate all of the block's reward addresses (that is, the addresses that receive fees from the block) with the matched pool. We record the mapping method that was used to obtain the mapping ( known_identifiers for the first case or known_addresses for the second. In essence, the identifiers are the principal element for mapping a block to an entity and the known addresses are the fallback mechanism. If there is a match, we also parse the auxiliary information, such as pool ownership or clusters, in order to assign the block to the top level entity, e.g., the pool's parent company or cluster. If a match is found this way, we update the mapping method to known_pool_links . If all mechanisms fail, then no match is found. In this case, we assign the reward addresses as the block's entity.","title":"Mapping process implementation"},{"location":"metrics/","text":"Metrics A metric gets the aggregated data (see Aggregator ) and outputs a relevant value. The metrics that have been implemented so far are the following: Nakamoto coefficient : The Nakamoto coefficient represents the minimum number of entities that collectively produce more than 50% of the total blocks within a given timeframe. The output of the metric is an integer. Gini coefficient : The Gini coefficient represents the degree of inequality in block production. The output of the metric is a decimal number in [0,1]. Values close to 0 indicate equality (all entities in the system produce the same number of blocks) and values close to 1 indicate inequality (one entity produces most or all blocks). Entropy : Entropy represents the expected amount of information in the distribution of blocks across entities. The output of the metric is a real number. Typically, a higher value of entropy indicates higher decentralization (lower predictability). Entropy is parameterized by a base rate \u03b1, which defines different types of entropy: \u03b1 = -1: min entropy \u03b1 = 0: Hartley entropy \u03b1 = 1: Shannon entropy (this is used by default) \u03b1 = 2: collision entropy HHI : The Herfindahl-Hirschman Index (HHI) is a measure of market concentration. It is defined as the sum of the squares of the market shares (as whole numbers, e.g. 40 for 40%) of the entities in the system. The output of the metric is a real number in (0, 10000]. Values close to 0 indicate low concentration (many entities produce a similar number of blocks) and values close to 1 indicate high concentration (one entity produces most or all blocks). The U.S. 
Department of Justice has set the following thresholds for interpreting HHI values (in traditional markets): (0, 1500): Competitive market [1500, 2500]: Moderately concentrated market (2500, 10000]: Highly concentrated market Each metric is implemented in a separate Python script in the folder metrics . Each script defines a function named compute_ , which takes as input a dictionary of the form {'': } (and possibly other relevant arguments) and outputs the corresponding metric values.","title":"Metrics"},{"location":"metrics/#metrics","text":"A metric gets the aggregated data (see Aggregator ) and outputs a relevant value. The metrics that have been implemented so far are the following: Nakamoto coefficient : The Nakamoto coefficient represents the minimum number of entities that collectively produce more than 50% of the total blocks within a given timeframe. The output of the metric is an integer. Gini coefficient : The Gini coefficient represents the degree of inequality in block production. The output of the metric is a decimal number in [0,1]. Values close to 0 indicate equality (all entities in the system produce the same number of blocks) and values close to 1 indicate inequality (one entity produces most or all blocks). Entropy : Entropy represents the expected amount of information in the distribution of blocks across entities. The output of the metric is a real number. Typically, a higher value of entropy indicates higher decentralization (lower predictability). Entropy is parameterized by a base rate \u03b1, which defines different types of entropy: \u03b1 = -1: min entropy \u03b1 = 0: Hartley entropy \u03b1 = 1: Shannon entropy (this is used by default) \u03b1 = 2: collision entropy HHI : The Herfindahl-Hirschman Index (HHI) is a measure of market concentration. It is defined as the sum of the squares of the market shares (as whole numbers, e.g. 40 for 40%) of the entities in the system. The output of the metric is a real number in (0, 10000]. Values close to 0 indicate low concentration (many entities produce a similar number of blocks) and values close to 1 indicate high concentration (one entity produces most or all blocks). The U.S. Department of Justice has set the following thresholds for interpreting HHI values (in traditional markets): (0, 1500): Competitive market [1500, 2500]: Moderately concentrated market (2500, 10000]: Highly concentrated market Each metric is implemented in a separate Python script in the folder metrics . Each script defines a function named compute_ , which takes as input a dictionary of the form {'': } (and possibly other relevant arguments) and outputs the corresponding metric values.","title":"Metrics"},{"location":"parsers/","text":"Parsers The parser obtains raw data from a full node (see Data Collection page on how to obtain the required data). It parses the data into a list of entries (dictionaries), each entry corresponding to a block. The input file should be placed in the raw_block_data/ directory and named as _raw_data.json . The parsed data is structured as follows: [ { \"number\": \"\", \"timestamp\": \"\", \"reward_addresses\": \",\" \"identifiers\": \"\" } ] number and timestamp are consistent among different blockchains. reward_addresses and identifiers vary, depending on each ledger. 
Specifically, reward_addresses corresponds to: Bitcoin , Bitcoin Cash , Dogecoin , Litecoin , Zcash : a string of comma-separated addresses which appear in the block's coinbase transaction with non-negative value (i.e., which are given part of the block's fees) Ethereum : the block's miner field Cardano : the hash of the pool that created the data, if defined, otherwise the empty string Tezos : the block's baker field The field identifiers corresponds to: Bitcoin , Bitcoin Cash , Dogecoin , Litecoin , Zcash : the field coinbase_param of the block's coinbase transaction Ethereum : the block's extra_data field Cardano : the ticker name of the pool that created the block, if defined, otherwise an empty string Tezos : there is no such field If using BigQuery, the queries for Bitcoin, Bitcoin Cash, Dogecoin, Litecoin, Zcash (see Data Collection ) return data that are parsed with the default_parser module in parsers . The query for Ethereum returns data that is parsed using the ethereum_parser module in parsers . All other queries return data already in the necessary parsed form, so they are parsed using a \"dummy\" parser that only sorts the blocks.","title":"Parsers"},{"location":"parsers/#parsers","text":"The parser obtains raw data from a full node (see Data Collection page on how to obtain the required data). It parses the data into a list of entries (dictionaries), each entry corresponding to a block. The input file should be placed in the raw_block_data/ directory and named as _raw_data.json . The parsed data is structured as follows: [ { \"number\": \"\", \"timestamp\": \"\", \"reward_addresses\": \",\" \"identifiers\": \"\" } ] number and timestamp are consistent among different blockchains. reward_addresses and identifiers vary, depending on each ledger. Specifically, reward_addresses corresponds to: Bitcoin , Bitcoin Cash , Dogecoin , Litecoin , Zcash : a string of comma-separated addresses which appear in the block's coinbase transaction with non-negative value (i.e., which are given part of the block's fees) Ethereum : the block's miner field Cardano : the hash of the pool that created the data, if defined, otherwise the empty string Tezos : the block's baker field The field identifiers corresponds to: Bitcoin , Bitcoin Cash , Dogecoin , Litecoin , Zcash : the field coinbase_param of the block's coinbase transaction Ethereum : the block's extra_data field Cardano : the ticker name of the pool that created the block, if defined, otherwise an empty string Tezos : there is no such field If using BigQuery, the queries for Bitcoin, Bitcoin Cash, Dogecoin, Litecoin, Zcash (see Data Collection ) return data that are parsed with the default_parser module in parsers . The query for Ethereum returns data that is parsed using the ethereum_parser module in parsers . All other queries return data already in the necessary parsed form, so they are parsed using a \"dummy\" parser that only sorts the blocks.","title":"Parsers"},{"location":"setup/","text":"Setup Installation To install the consensus decentralization analysis tool, simply clone this GitHub repository: git clone https://github.com/Blockchain-Technology-Lab/consensus-decentralization.git The tool is written in Python 3, therefore a Python 3 interpreter is required in order to run it locally. The requirements file lists the dependencies of the project. Make sure you have all of them installed before running the scripts. 
To install all of them in one go, run the following command from the root directory of the project: python -m pip install -r requirements.txt Execution The consensus decentralization analysis tool is a CLI tool. The run.py script in the root directory of the project invokes the required parsers, mappings and metrics, but it is also possible to execute each module individually. The following process describes the most typical workflow. Place all raw data (which could be collected from BigQuery for example; see Data Collection for more details) in the raw_block_data/ directory, each file named as _raw_data.json (e.g., bitcoin_raw_data.json ). By default, there is a (very small) sample input file for some supported projects; to use it, remove the prefix sample_ . Run python run.py --ledgers --timeframe to analyze the n specified ledgers for the given timeframe. Both arguments are optional, so it's possible to omit one or both of them; in this case, the default values will be used. Specifically: ledgers accepts any number of the supported ledgers (case-insensitive). For example, --ledgers bitcoin would run the analysis for Bitcoin, while --ledgers Bitcoin Ethereum Cardano would run the analysis for Bitcoin, Ethereum and Cardano. If the ledgers argument is omitted, then all supported ledgers are analyzed. The timeframe argument should be of the form YYYY-MM-DD (month and day can be omitted). For example, --timeframe 2022 would run the analysis for the year 2022, while --timeframe 2022-02 would do it for the month of February 2022 and --timeframe 2022-02-03 would do it for a single day (Feburary 3rd 2022). If the timeframe argument is omitted, then a monthly analysis is performed for each month between January 2010 and the current month or the subset of this time period for which relevant data exists. Additionally, there are three flags that can be used to customize an execution: --force-map forces the parsing, mapping and aggregation to be performed on all data, even if the relevant output files already exist. This can be useful for when mapping info is updated for some blockchain. By default, this flag is set to False and the tool only performs the mapping and aggregation when the relevant output files do not exist. --plot enables the generation of graphs at the end of the execution. Specifically, the output of each implemented metric is plotted for the specified ledgers and timeframe, as well as the block production dynamics for each specified ledger. By default, this flag is set to False and no plots are generated. --animated enables the generation of (additional) animated graphs at the end of the execution. By default, this flag is set to False and no animated plots are generated. Note that this flag is ignored if --plot is set to False. All output files can then be found under the output/ directory, which is automatically created the first time the tool is run.","title":"How to use"},{"location":"setup/#setup","text":"","title":"Setup"},{"location":"setup/#installation","text":"To install the consensus decentralization analysis tool, simply clone this GitHub repository: git clone https://github.com/Blockchain-Technology-Lab/consensus-decentralization.git The tool is written in Python 3, therefore a Python 3 interpreter is required in order to run it locally. The requirements file lists the dependencies of the project. Make sure you have all of them installed before running the scripts. 
To install all of them in one go, run the following command from the root directory of the project: python -m pip install -r requirements.txt","title":"Installation"},{"location":"setup/#execution","text":"The consensus decentralization analysis tool is a CLI tool. The run.py script in the root directory of the project invokes the required parsers, mappings and metrics, but it is also possible to execute each module individually. The following process describes the most typical workflow. Place all raw data (which could be collected from BigQuery for example; see Data Collection for more details) in the raw_block_data/ directory, each file named as _raw_data.json (e.g., bitcoin_raw_data.json ). By default, there is a (very small) sample input file for some supported projects; to use it, remove the prefix sample_ . Run python run.py --ledgers --timeframe to analyze the n specified ledgers for the given timeframe. Both arguments are optional, so it's possible to omit one or both of them; in this case, the default values will be used. Specifically: ledgers accepts any number of the supported ledgers (case-insensitive). For example, --ledgers bitcoin would run the analysis for Bitcoin, while --ledgers Bitcoin Ethereum Cardano would run the analysis for Bitcoin, Ethereum and Cardano. If the ledgers argument is omitted, then all supported ledgers are analyzed. The timeframe argument should be of the form YYYY-MM-DD (month and day can be omitted). For example, --timeframe 2022 would run the analysis for the year 2022, while --timeframe 2022-02 would do it for the month of February 2022 and --timeframe 2022-02-03 would do it for a single day (Feburary 3rd 2022). If the timeframe argument is omitted, then a monthly analysis is performed for each month between January 2010 and the current month or the subset of this time period for which relevant data exists. Additionally, there are three flags that can be used to customize an execution: --force-map forces the parsing, mapping and aggregation to be performed on all data, even if the relevant output files already exist. This can be useful for when mapping info is updated for some blockchain. By default, this flag is set to False and the tool only performs the mapping and aggregation when the relevant output files do not exist. --plot enables the generation of graphs at the end of the execution. Specifically, the output of each implemented metric is plotted for the specified ledgers and timeframe, as well as the block production dynamics for each specified ledger. By default, this flag is set to False and no plots are generated. --animated enables the generation of (additional) animated graphs at the end of the execution. By default, this flag is set to False and no animated plots are generated. Note that this flag is ignored if --plot is set to False. All output files can then be found under the output/ directory, which is automatically created the first time the tool is run.","title":"Execution"}]}
\ No newline at end of file
+{"config":{"indexing":"full","lang":["en"],"min_search_length":3,"prebuild_index":false,"separator":"[\\s\\-]+"},"docs":[{"location":"","text":"Consensus Blockchain Decentralization - Documentation This is the documentation for the Consensus Decentralization Analysis tool developed by the University of Edinburgh's Blockchain Technology Lab. The tool is responsible for analyzing the block production of various blockchains and measuring their subsequent levels of decentralization. The relevant source code is available on GitHub . Overview The tool consists of the following modules: Parser Mapping Aggregator Metrics The parser is responsible for pre-processing the raw data that comes from a full node. It produces a file with all the information that is needed for the mapping. The mapping takes the output of the parser, combines it with some other sources of information, and produces a new file that includes attribution data for each block and which mapping method was used to obtain it. The aggregator takes as input the output of the mapping, as well as a time frame to aggregate over and a unit to divide the time frame by (e.g. week or month). It then outputs a file that reveals the distribution of resources to different entities during each time unit under consideration. In this context, \"resources\" correspond to the number of produced blocks. These distributions are then the input for the metrics module, which tracks various decentralization-related metrics and produces files with the results. More details about the different modules can be found in the corresponding Parser , Mapping , Aggregator and Metrics pages. Currently, the supported ledgers are: Bitcoin Bitcoin Cash Cardano Dogecoin Ethereum Litecoin Tezos Zcash We intend to add more ledgers to this list in the future. Contributing This is an open source project licensed under the terms and conditions of the MIT license and CC BY-SA 4.0 . Everyone is welcome to contribute to it by proposing or implementing their ideas. Example contributions include, but are not limited to, reporting potential bugs, supplying useful information for the mappings of supported ledgers, adding support for a new ledger, or making the code more efficient. All contributions to the project will also be covered by the above-mentioned license. When making changes in the code, contributors are required to fork the project's repository first and then issue a pull request with their changes. Each PR will be reviewed before being merged to the main branch. Bugs can be reported in the Issues page. Other comments and ideas can be brought up in the project's Discussions . For more information on how to make specific contributions, see How to Contribute .","title":"Home"},{"location":"#consensus-blockchain-decentralization-documentation","text":"This is the documentation for the Consensus Decentralization Analysis tool developed by the University of Edinburgh's Blockchain Technology Lab. The tool is responsible for analyzing the block production of various blockchains and measuring their subsequent levels of decentralization. The relevant source code is available on GitHub .","title":"Consensus Blockchain Decentralization - Documentation"},{"location":"#overview","text":"The tool consists of the following modules: Parser Mapping Aggregator Metrics The parser is responsible for pre-processing the raw data that comes from a full node. It produces a file with all the information that is needed for the mapping. 
The mapping takes the output of the parser, combines it with some other sources of information, and produces a new file that includes attribution data for each block and which mapping method was used to obtain it. The aggregator takes as input the output of the mapping, as well as a time frame to aggregate over and a unit to divide the time frame by (e.g. week or month). It then outputs a file that reveals the distribution of resources to different entities during each time unit under consideration. In this context, \"resources\" correspond to the number of produced blocks. These distributions are then the input for the metrics module, which tracks various decentralization-related metrics and produces files with the results. More details about the different modules can be found in the corresponding Parser , Mapping , Aggregator and Metrics pages. Currently, the supported ledgers are: Bitcoin Bitcoin Cash Cardano Dogecoin Ethereum Litecoin Tezos Zcash We intend to add more ledgers to this list in the future.","title":"Overview"},{"location":"#contributing","text":"This is an open source project licensed under the terms and conditions of the MIT license and CC BY-SA 4.0 . Everyone is welcome to contribute to it by proposing or implementing their ideas. Example contributions include, but are not limited to, reporting potential bugs, supplying useful information for the mappings of supported ledgers, adding support for a new ledger, or making the code more efficient. All contributions to the project will also be covered by the above-mentioned license. When making changes in the code, contributors are required to fork the project's repository first and then issue a pull request with their changes. Each PR will be reviewed before being merged to the main branch. Bugs can be reported in the Issues page. Other comments and ideas can be brought up in the project's Discussions . For more information on how to make specific contributions, see How to Contribute .","title":"Contributing"},{"location":"aggregator/","text":"Aggregator The aggregator obtains the mapped data of a ledger (from output//mapped_data.json ) and aggregates it over units of time that are determined based on the given timeframe and aggregate_by parameters. It then outputs a csv file with the distribution of blocks to entities for each time unit under consideration. This file is saved in the directory output//blocks_per_entity/ and is named based on the timeframe and aggregate_by parameters. For example, if the specified timeframe is from June 2023 to September 2023 and the aggregation is by month, then the output file would be named monthly_from_2023-06-01_to_2023-09-30.csv and would be structured as follows: Entity \\ Time period,Jun-2023,Jul-2023,Aug-2023,Sep-2023 ,,,, ,,,, Therefore, the file will have as many rows as the number of entities that have produced blocks in the given timeframe (+ 1 for the header) and as many columns as the number of time units in the given timeframe (+ 1 for the entity names).","title":"Aggregator"},{"location":"aggregator/#aggregator","text":"The aggregator obtains the mapped data of a ledger (from output//mapped_data.json ) and aggregates it over units of time that are determined based on the given timeframe and aggregate_by parameters. It then outputs a csv file with the distribution of blocks to entities for each time unit under consideration. This file is saved in the directory output//blocks_per_entity/ and is named based on the timeframe and aggregate_by parameters. 
For example, if the specified timeframe is from June 2023 to September 2023 and the aggregation is by month, then the output file would be named monthly_from_2023-06-01_to_2023-09-30.csv and would be structured as follows: Entity \\ Time period,Jun-2023,Jul-2023,Aug-2023,Sep-2023 ,,,, ,,,, Therefore, the file will have as many rows as the number of entities that have produced blocks in the given timeframe (+ 1 for the header) and as many columns as the number of time units in the given timeframe (+ 1 for the entity names).","title":"Aggregator"},{"location":"contribute/","text":"How to contribute You can contribute to the tool by adding support for a ledger, updating the mapping process for an existing ledger, or adding a new metric. In all cases, the information should be submitted via a GitHub PR. Add support for ledgers You can add support for a ledger that is not already supported as follows. Mapping information In the directory mapping_information/ , there exist three folders ( addresses , clusters , identifiers ). In each folder, add a file named .json , if there exist such information for the new ledger (for more details on what type of information each folder corresponds to see the mapping documentation ). Parser and mapping If no existing parser can be reused, create a file named _parser.py in the directory consensus_decentralization/parsers/ . In this file create a new class, which inherits from the DefaultParser class of default_parser.py . Then, override its parse method in order to implement the new parser (or override another method if there are only small changes needed, e.g. parse_identifiers if the only thing that is different from the default parser is the way identifiers are decoded). If no existing mapping can be reused, create a file named _mapping.py in the directory consensus_decentralization/mappings/ . In this file create a new class, which inherits from the DefaultMapping class of default_mapping.py . Then, override its perform_mapping method and/or any other methods that are required (e.g. map_from_known_identifiers ). Then, you should enable support for the new ledger in the parser and mapping module scripts. Specifically: in the script consensus_decentralization/parse.py , import the parser class and assign it to the project's name in the ledger_parser dictionary; in the script consensus_decentralization/map.py , import the mapping class and assign it to the project's name in the dictionary ledger_mapping . Notes : You should add an entry in each dictionary, regardless of whether you use a new or existing parser or mapping \u2013 if no new parser or mapping class was created for the project, simply assign the suitable class (e.g. DefaultParser or DefaultMapping ) to the project's name in the corresponding dictionary. If you create a new parser/mapping, you should also add unit tests here Documentation Finally, you should include the new ledger in the documentation pages; specifically: add the ledger in the list of supported ledgers in the repository's main README file add the ledger in the list of supported ledgers in the index documentation page document the new ledger's parser in the corresponding documentation page document how the new ledger's data is retrieved in the corresponding documentation page ; if Google BigQuery is used, add the new query to queries.yaml Data You can optionally commit small sample data for the new ledger in the raw_block_data folder. 
Alternatively, make sure to add your raw data file in the raw_block_data folder before running the tool on the new ledger. Update existing mapping information All mapping data are in the folder mapping_information . To update or add information about a supported ledger's mapping, you should open a Pull Request. This can be done either via console or as follows, via the browser: Open the file that you want to change (e.g., for Bitcoin, follow this link ) on your browser. Click Edit this file . Make your changes in the file. On the bottom, initiate a Pull Request. Write a short and descriptive commit title message (e.g., \"Update 2019 links for company A\"). Select Create a new branch for this commit and start a pull request. In the page that opens, change the PR title (if necessary) and click on Create pull request . When updating the mapping information, the following guidelines should be observed: The link to a pool's website should be active and public. All sources cited should be publicly available and respectable. Unofficial tweets or unavailable or private sources will be rejected.You can use specific keywords, in the cases when the information is available on-chain. Specifically: homepage : this keyword is used in Cardano, to denote that two pools define the same homepage in their metadata (which are published on-chain) Specifically, for legal_links.json : The value of the pool's name (that is the first value in each array entry under a company), should be the same as the value that corresponds to a key name in the ledger-specific pool information, as defined in the corresponding addresses , clusters or identifiers file. If this string is not exactly the same (including capitalization), the link will not be identified during the mapping process. There should exist no time gaps in a pool's ownership structure. Add metrics To add a new metric, you should do the following steps. First, create a relevant script in the folder consensus_decentralization/metrics . The script should include a function named compute_{metric_name} that, given a dictionary of entities (as keys) to number of blocks (as values), outputs a single value (the outcome of the metric). Second, import this new function to consensus_decentralization/analyze.py . Third, add the name of the metric (which should be the same as the one used in the filename above) and any parameter values it might require to the file config.yaml , under metrics . Fourth, you should add unit tests for the new metric here . Finally, you should update the corresponding documentation page","title":"How to contribute"},{"location":"contribute/#how-to-contribute","text":"You can contribute to the tool by adding support for a ledger, updating the mapping process for an existing ledger, or adding a new metric. In all cases, the information should be submitted via a GitHub PR.","title":"How to contribute"},{"location":"contribute/#add-support-for-ledgers","text":"You can add support for a ledger that is not already supported as follows.","title":"Add support for ledgers"},{"location":"contribute/#mapping-information","text":"In the directory mapping_information/ , there exist three folders ( addresses , clusters , identifiers ). 
In each folder, add a file named .json , if there exist such information for the new ledger (for more details on what type of information each folder corresponds to see the mapping documentation ).","title":"Mapping information"},{"location":"contribute/#parser-and-mapping","text":"If no existing parser can be reused, create a file named _parser.py in the directory consensus_decentralization/parsers/ . In this file create a new class, which inherits from the DefaultParser class of default_parser.py . Then, override its parse method in order to implement the new parser (or override another method if there are only small changes needed, e.g. parse_identifiers if the only thing that is different from the default parser is the way identifiers are decoded). If no existing mapping can be reused, create a file named _mapping.py in the directory consensus_decentralization/mappings/ . In this file create a new class, which inherits from the DefaultMapping class of default_mapping.py . Then, override its perform_mapping method and/or any other methods that are required (e.g. map_from_known_identifiers ). Then, you should enable support for the new ledger in the parser and mapping module scripts. Specifically: in the script consensus_decentralization/parse.py , import the parser class and assign it to the project's name in the ledger_parser dictionary; in the script consensus_decentralization/map.py , import the mapping class and assign it to the project's name in the dictionary ledger_mapping . Notes : You should add an entry in each dictionary, regardless of whether you use a new or existing parser or mapping \u2013 if no new parser or mapping class was created for the project, simply assign the suitable class (e.g. DefaultParser or DefaultMapping ) to the project's name in the corresponding dictionary. If you create a new parser/mapping, you should also add unit tests here","title":"Parser and mapping"},{"location":"contribute/#documentation","text":"Finally, you should include the new ledger in the documentation pages; specifically: add the ledger in the list of supported ledgers in the repository's main README file add the ledger in the list of supported ledgers in the index documentation page document the new ledger's parser in the corresponding documentation page document how the new ledger's data is retrieved in the corresponding documentation page ; if Google BigQuery is used, add the new query to queries.yaml","title":"Documentation"},{"location":"contribute/#data","text":"You can optionally commit small sample data for the new ledger in the raw_block_data folder. Alternatively, make sure to add your raw data file in the raw_block_data folder before running the tool on the new ledger.","title":"Data"},{"location":"contribute/#update-existing-mapping-information","text":"All mapping data are in the folder mapping_information . To update or add information about a supported ledger's mapping, you should open a Pull Request. This can be done either via console or as follows, via the browser: Open the file that you want to change (e.g., for Bitcoin, follow this link ) on your browser. Click Edit this file . Make your changes in the file. On the bottom, initiate a Pull Request. Write a short and descriptive commit title message (e.g., \"Update 2019 links for company A\"). Select Create a new branch for this commit and start a pull request. In the page that opens, change the PR title (if necessary) and click on Create pull request . 
When updating the mapping information, the following guidelines should be observed: The link to a pool's website should be active and public. All sources cited should be publicly available and respectable. Unofficial tweets or unavailable or private sources will be rejected.You can use specific keywords, in the cases when the information is available on-chain. Specifically: homepage : this keyword is used in Cardano, to denote that two pools define the same homepage in their metadata (which are published on-chain) Specifically, for legal_links.json : The value of the pool's name (that is the first value in each array entry under a company), should be the same as the value that corresponds to a key name in the ledger-specific pool information, as defined in the corresponding addresses , clusters or identifiers file. If this string is not exactly the same (including capitalization), the link will not be identified during the mapping process. There should exist no time gaps in a pool's ownership structure.","title":"Update existing mapping information"},{"location":"contribute/#add-metrics","text":"To add a new metric, you should do the following steps. First, create a relevant script in the folder consensus_decentralization/metrics . The script should include a function named compute_{metric_name} that, given a dictionary of entities (as keys) to number of blocks (as values), outputs a single value (the outcome of the metric). Second, import this new function to consensus_decentralization/analyze.py . Third, add the name of the metric (which should be the same as the one used in the filename above) and any parameter values it might require to the file config.yaml , under metrics . Fourth, you should add unit tests for the new metric here . Finally, you should update the corresponding documentation page","title":"Add metrics"},{"location":"data/","text":"Data collection Currently, the data for the analysis of the different ledgers is collected through Google BigQuery . Note that when saving results from BigQuery you should select the option \"JSONL (newline delimited)\". Sample data & queries Sample data for all blockchains can be found here . Alternatively, one can retrieve the data directly from BigQuery using the queries below. 
Bitcoin SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_bitcoin.transactions`.outputs FROM `bigquery-public-data.crypto_bitcoin.transactions` JOIN `bigquery-public-data.crypto_bitcoin.blocks` ON `bigquery-public-data.crypto_bitcoin.transactions`.block_number = `bigquery-public-data.crypto_bitcoin.blocks`.number WHERE is_coinbase is TRUE AND timestamp > '2018-01-01' ORDER BY timestamp Bitcoin Cash SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_bitcoin_cash.transactions`.outputs FROM `bigquery-public-data.crypto_bitcoin_cash.transactions` JOIN `bigquery-public-data.crypto_bitcoin_cash.blocks` ON `bigquery-public-data.crypto_bitcoin_cash.transactions`.block_number = `bigquery-public-data.crypto_bitcoin_cash.blocks`.number WHERE is_coinbase is TRUE AND timestamp > '2018-01-01' ORDER BY timestamp Cardano SELECT `iog-data-analytics.cardano_mainnet.block`.slot_no as number, `iog-data-analytics.cardano_mainnet.pool_offline_data`.ticker_name as identifiers, `iog-data-analytics.cardano_mainnet.block`.block_time as timestamp,`iog-data-analytics.cardano_mainnet.block`.pool_hash as reward_addresses FROM `iog-data-analytics.cardano_mainnet.block` LEFT JOIN `iog-data-analytics.cardano_mainnet.pool_offline_data` ON `iog-data-analytics.cardano_mainnet.block`.pool_hash = `iog-data-analytics.cardano_mainnet.pool_offline_data`.pool_hash WHERE `iog-data-analytics.cardano_mainnet.block`.block_time > '2018-01-01' ORDER BY `iog-data-analytics.cardano_mainnet.block`.block_time Dogecoin SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_dogecoin.transactions`.outputs FROM `bigquery-public-data.crypto_dogecoin.transactions` JOIN `bigquery-public-data.crypto_dogecoin.blocks` ON `bigquery-public-data.crypto_dogecoin.transactions`.block_number = `bigquery-public-data.crypto_dogecoin.blocks`.number WHERE is_coinbase is TRUE AND timestamp > '2018-01-01' ORDER BY timestamp Ethereum SELECT number, timestamp, miner as reward_addresses, extra_data as identifiers FROM `bigquery-public-data.crypto_ethereum.blocks` WHERE timestamp > '2018-01-01' ORDER BY timestamp Litecoin SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_litecoin.transactions`.outputs FROM `bigquery-public-data.crypto_litecoin.transactions` JOIN `bigquery-public-data.crypto_litecoin.blocks` ON `bigquery-public-data.crypto_litecoin.transactions`.block_number = `bigquery-public-data.crypto_litecoin.blocks`.number WHERE is_coinbase is TRUE AND timestamp > '2018-01-01' ORDER BY timestamp Tezos SELECT level as number, timestamp, baker as reward_addresses FROM `public-data-finance.crypto_tezos.blocks` WHERE timestamp > '2018-01-01' ORDER BY timestamp Zcash SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_zcash.transactions`.outputs FROM `bigquery-public-data.crypto_zcash.transactions` JOIN `bigquery-public-data.crypto_zcash.blocks` ON `bigquery-public-data.crypto_zcash.transactions`.block_number = `bigquery-public-data.crypto_zcash.blocks`.number WHERE is_coinbase is TRUE AND timestamp > '2018-01-01' ORDER BY timestamp Automating the data collection process Instead of executing each of these queries separately on the BigQuery console and saving the results manually, it is also possible to automate the process 
using a script and collect all relevant data in one go. Executing this script will run queries from this file . IMPORTANT: the script uses service account credentials for authentication, therefore before running it, you need to generate the relevant credentials from Google, as described here and save your key in the root directory of the project under the name 'google-service-account-key.json'. There is a sample file that you can consult, which shows what your credentials are supposed to look like (but note that this is for informational purposes only, this file is not used in the code). Once you have set up the credentials, you can just run the following command from the root directory to retrieve data for all supported blockchains: python -m consensus_decentralization.collect_data There are also two command line arguments that can be used to customize the data collection process: ledgers accepts any number of the supported ledgers (case-insensitive). For example, adding --ledgers bitcoin results in collecting data only for Bitcoin, while --ledgers Bitcoin Ethereum Cardano would collect data for Bitcoin, Ethereum and Cardano. If the ledgers argument is omitted, then the default value is used, which is taken from the configuration file and typically corresponds to all supported blockchains. --force-query forces the collection of all raw data files, even if the corresponding files already exist. By default, this flag is set to False and the script only fetches block data for some blockchain if the corresponding file does not already exist.","title":"Data Collection"},{"location":"data/#data-collection","text":"Currently, the data for the analysis of the different ledgers is collected through Google BigQuery . Note that when saving results from BigQuery you should select the option \"JSONL (newline delimited)\".","title":"Data collection"},{"location":"data/#sample-data-queries","text":"Sample data for all blockchains can be found here . 
Alternatively, one can retrieve the data directly from BigQuery using the queries below.","title":"Sample data & queries"},{"location":"data/#bitcoin","text":"SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_bitcoin.transactions`.outputs FROM `bigquery-public-data.crypto_bitcoin.transactions` JOIN `bigquery-public-data.crypto_bitcoin.blocks` ON `bigquery-public-data.crypto_bitcoin.transactions`.block_number = `bigquery-public-data.crypto_bitcoin.blocks`.number WHERE is_coinbase is TRUE AND timestamp > '2018-01-01' ORDER BY timestamp","title":"Bitcoin"},{"location":"data/#bitcoin-cash","text":"SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_bitcoin_cash.transactions`.outputs FROM `bigquery-public-data.crypto_bitcoin_cash.transactions` JOIN `bigquery-public-data.crypto_bitcoin_cash.blocks` ON `bigquery-public-data.crypto_bitcoin_cash.transactions`.block_number = `bigquery-public-data.crypto_bitcoin_cash.blocks`.number WHERE is_coinbase is TRUE AND timestamp > '2018-01-01' ORDER BY timestamp","title":"Bitcoin Cash"},{"location":"data/#cardano","text":"SELECT `iog-data-analytics.cardano_mainnet.block`.slot_no as number, `iog-data-analytics.cardano_mainnet.pool_offline_data`.ticker_name as identifiers, `iog-data-analytics.cardano_mainnet.block`.block_time as timestamp,`iog-data-analytics.cardano_mainnet.block`.pool_hash as reward_addresses FROM `iog-data-analytics.cardano_mainnet.block` LEFT JOIN `iog-data-analytics.cardano_mainnet.pool_offline_data` ON `iog-data-analytics.cardano_mainnet.block`.pool_hash = `iog-data-analytics.cardano_mainnet.pool_offline_data`.pool_hash WHERE `iog-data-analytics.cardano_mainnet.block`.block_time > '2018-01-01' ORDER BY `iog-data-analytics.cardano_mainnet.block`.block_time","title":"Cardano"},{"location":"data/#dogecoin","text":"SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_dogecoin.transactions`.outputs FROM `bigquery-public-data.crypto_dogecoin.transactions` JOIN `bigquery-public-data.crypto_dogecoin.blocks` ON `bigquery-public-data.crypto_dogecoin.transactions`.block_number = `bigquery-public-data.crypto_dogecoin.blocks`.number WHERE is_coinbase is TRUE AND timestamp > '2018-01-01' ORDER BY timestamp","title":"Dogecoin"},{"location":"data/#ethereum","text":"SELECT number, timestamp, miner as reward_addresses, extra_data as identifiers FROM `bigquery-public-data.crypto_ethereum.blocks` WHERE timestamp > '2018-01-01' ORDER BY timestamp","title":"Ethereum"},{"location":"data/#litecoin","text":"SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_litecoin.transactions`.outputs FROM `bigquery-public-data.crypto_litecoin.transactions` JOIN `bigquery-public-data.crypto_litecoin.blocks` ON `bigquery-public-data.crypto_litecoin.transactions`.block_number = `bigquery-public-data.crypto_litecoin.blocks`.number WHERE is_coinbase is TRUE AND timestamp > '2018-01-01' ORDER BY timestamp","title":"Litecoin"},{"location":"data/#tezos","text":"SELECT level as number, timestamp, baker as reward_addresses FROM `public-data-finance.crypto_tezos.blocks` WHERE timestamp > '2018-01-01' ORDER BY timestamp","title":"Tezos"},{"location":"data/#zcash","text":"SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, 
`bigquery-public-data.crypto_zcash.transactions`.outputs FROM `bigquery-public-data.crypto_zcash.transactions` JOIN `bigquery-public-data.crypto_zcash.blocks` ON `bigquery-public-data.crypto_zcash.transactions`.block_number = `bigquery-public-data.crypto_zcash.blocks`.number WHERE is_coinbase is TRUE AND timestamp > '2018-01-01' ORDER BY timestamp","title":"Zcash"},{"location":"data/#automating-the-data-collection-process","text":"Instead of executing each of these queries separately on the BigQuery console and saving the results manually, it is also possible to automate the process using a script and collect all relevant data in one go. Executing this script will run queries from this file . IMPORTANT: the script uses service account credentials for authentication, therefore before running it, you need to generate the relevant credentials from Google, as described here and save your key in the root directory of the project under the name 'google-service-account-key.json'. There is a sample file that you can consult, which shows what your credentials are supposed to look like (but note that this is for informational purposes only, this file is not used in the code). Once you have set up the credentials, you can just run the following command from the root directory to retrieve data for all supported blockchains: python -m consensus_decentralization.collect_data There are also two command line arguments that can be used to customize the data collection process: ledgers accepts any number of the supported ledgers (case-insensitive). For example, adding --ledgers bitcoin results in collecting data only for Bitcoin, while --ledgers Bitcoin Ethereum Cardano would collect data for Bitcoin, Ethereum and Cardano. If the ledgers argument is omitted, then the default value is used, which is taken from the configuration file and typically corresponds to all supported blockchains. --force-query forces the collection of all raw data files, even if the corresponding files already exist. By default, this flag is set to False and the script only fetches block data for some blockchain if the corresponding file does not already exist.","title":"Automating the data collection process"},{"location":"mappings/","text":"Mappings A mapping is responsible for linking blocks to the entities that created them. While the parsed data contains information about the addresses that received rewards for producing some block or identifiers that are related to them, it does not contain information about the entities that control these addresses, which is where the mapping comes in. The mapping takes as input the parsed data and outputs a file ( output//mapped_data.json ), which is structured as follows: [ { \"number\": \"\", \"timestamp\": \"\", \"reward_addresses\": \",\" \"creator\": , \"mapping_method\": } ] Mapping Information To assist the mapping process, the directory mapping_information/ contains mapping information about the supported projects. There exist three subdirectories and two additional files. In each subdirectory there exists a file for the corresponding ledger data, if such data exists. Identifiers The files under identifiers define information about block creators. Each key corresponds to a tag or ticker, by which the pool is identifiable in its produced blocks. The value for each key is a dictionary of pool-related information, specifically its name, a URL to its homepage, etc. 
Each file's structure is as follows: { \"P1\": { \"name\": \"Pool P1\", \"homepage\": \"example.com/p1\" }, \"--P2--\": { \"name\": \"Pool P2\", \"homepage\": \"example.com/p2\" } } Clusters The files under clusters define information about pool clusters. This information is organized per cluster. For each cluster, an array of pool-related information is defined. Each item in the array defines the pool's name, the time window during which the pool belonged to the cluster (from the beginning of from until the beginning of to excluding ), and the publicly available source of information, via which the link between the pool and the cluster is established. Each file's structure is as follows: { \"cluster A\": [ {\"name\": \"P1\", \"from\": \"\", \"to\": \"2023\", \"source\": \"example.com/link1\"} ], \"cluster B\": [ {\"name\": \"--P2--\", \"from\": \"\", \"to\": \"\", \"source\": \"example.com/link2\"} ] } Addresses The files under addresses define ownership information about addresses. As with clusters, for each address the pool ownership information defines the pool's name and a public source of information about the ownership. Each file's structure is as follows: { \"address1\": {\"name\": \"Pool P2\", \"source\": \"example.com\"}, } Legal links The file legal_links.json defines legal links between pools and companies, based on off-chain information. For example, it defines ownership information of a pool by a company. The structure of the file is as follows: { \"\": [ {\"name\": \"\", \"from\": \"\", \"to\": \"\", \"source\": \"\"} ] } The values for each entry are the same as clusters in the above pool information. Special addresses The file special_addresses.json defines per-project information about addresses that are not related to some entity but are used for protocol-specific reasons (e.g. treasury address). The format of the file is the following: { \"Project A\": [ {\"address\": \"A special address 1\", \"source\": \"some.public.source\"}, {\"address\": \"A special address 2\", \"source\": \"some.public.source\"} ], \"Project B\": [ {\"address\": \"B special address\", \"source\": \"some.public.source\"} ] } Mapping process implementation In our implementation, the mapping of a block uses the auxiliary information as follows. First, it iterates over all known tags and compares each one with the block's identifiers. If the tag is a substring of the parameter, then a match is found. If the first step fails, we compare the block's reward addresses with known pool addresses (including special addresses that exist for some blockchains) and again look for a match. In both cases, if there is a match, then: We map the block to the matched pool. We associate all of the block's reward addresses (that is, the addresses that receive fees from the block) with the matched pool. We record the mapping method that was used to obtain the mapping ( known_identifiers for the first case or known_addresses for the second. In essence, the identifiers are the principal element for mapping a block to an entity and the known addresses are the fallback mechanism. If there is a match, we also parse the auxiliary information, such as pool ownership or clusters, in order to assign the block to the top level entity, e.g., the pool's parent company or cluster. If a match is found this way, we update the mapping method to known_pool_links . If all mechanisms fail, then no match is found. 
In this case, we assign the reward addresses as the block's entity.","title":"Mappings"},{"location":"mappings/#mappings","text":"A mapping is responsible for linking blocks to the entities that created them. While the parsed data contains information about the addresses that received rewards for producing some block or identifiers that are related to them, it does not contain information about the entities that control these addresses, which is where the mapping comes in. The mapping takes as input the parsed data and outputs a file ( output//mapped_data.json ), which is structured as follows: [ { \"number\": \"\", \"timestamp\": \"\", \"reward_addresses\": \",\" \"creator\": , \"mapping_method\": } ]","title":"Mappings"},{"location":"mappings/#mapping-information","text":"To assist the mapping process, the directory mapping_information/ contains mapping information about the supported projects. There exist three subdirectories and two additional files. In each subdirectory there exists a file for the corresponding ledger data, if such data exists.","title":"Mapping Information"},{"location":"mappings/#identifiers","text":"The files under identifiers define information about block creators. Each key corresponds to a tag or ticker, by which the pool is identifiable in its produced blocks. The value for each key is a dictionary of pool-related information, specifically its name, a URL to its homepage, etc. Each file's structure is as follows: { \"P1\": { \"name\": \"Pool P1\", \"homepage\": \"example.com/p1\" }, \"--P2--\": { \"name\": \"Pool P2\", \"homepage\": \"example.com/p2\" } }","title":"Identifiers"},{"location":"mappings/#clusters","text":"The files under clusters define information about pool clusters. This information is organized per cluster. For each cluster, an array of pool-related information is defined. Each item in the array defines the pool's name, the time window during which the pool belonged to the cluster (from the beginning of from until the beginning of to excluding ), and the publicly available source of information, via which the link between the pool and the cluster is established. Each file's structure is as follows: { \"cluster A\": [ {\"name\": \"P1\", \"from\": \"\", \"to\": \"2023\", \"source\": \"example.com/link1\"} ], \"cluster B\": [ {\"name\": \"--P2--\", \"from\": \"\", \"to\": \"\", \"source\": \"example.com/link2\"} ] }","title":"Clusters"},{"location":"mappings/#addresses","text":"The files under addresses define ownership information about addresses. As with clusters, for each address the pool ownership information defines the pool's name and a public source of information about the ownership. Each file's structure is as follows: { \"address1\": {\"name\": \"Pool P2\", \"source\": \"example.com\"}, }","title":"Addresses"},{"location":"mappings/#legal-links","text":"The file legal_links.json defines legal links between pools and companies, based on off-chain information. For example, it defines ownership information of a pool by a company. The structure of the file is as follows: { \"\": [ {\"name\": \"\", \"from\": \"\", \"to\": \"\", \"source\": \"\"} ] } The values for each entry are the same as clusters in the above pool information.","title":"Legal links"},{"location":"mappings/#special-addresses","text":"The file special_addresses.json defines per-project information about addresses that are not related to some entity but are used for protocol-specific reasons (e.g. treasury address). 
The format of the file is the following: { \"Project A\": [ {\"address\": \"A special address 1\", \"source\": \"some.public.source\"}, {\"address\": \"A special address 2\", \"source\": \"some.public.source\"} ], \"Project B\": [ {\"address\": \"B special address\", \"source\": \"some.public.source\"} ] }","title":"Special addresses"},{"location":"mappings/#mapping-process-implementation","text":"In our implementation, the mapping of a block uses the auxiliary information as follows. First, it iterates over all known tags and compares each one with the block's identifiers. If the tag is a substring of the parameter, then a match is found. If the first step fails, we compare the block's reward addresses with known pool addresses (including special addresses that exist for some blockchains) and again look for a match. In both cases, if there is a match, then: We map the block to the matched pool. We associate all of the block's reward addresses (that is, the addresses that receive fees from the block) with the matched pool. We record the mapping method that was used to obtain the mapping ( known_identifiers for the first case or known_addresses for the second. In essence, the identifiers are the principal element for mapping a block to an entity and the known addresses are the fallback mechanism. If there is a match, we also parse the auxiliary information, such as pool ownership or clusters, in order to assign the block to the top level entity, e.g., the pool's parent company or cluster. If a match is found this way, we update the mapping method to known_pool_links . If all mechanisms fail, then no match is found. In this case, we assign the reward addresses as the block's entity.","title":"Mapping process implementation"},{"location":"metrics/","text":"Metrics A metric gets the aggregated data (see Aggregator ) and outputs a relevant value. The metrics that have been implemented so far are the following: Nakamoto coefficient : The Nakamoto coefficient represents the minimum number of entities that collectively produce more than 50% of the total blocks within a given timeframe. The output of the metric is an integer. Gini coefficient : The Gini coefficient represents the degree of inequality in block production. The output of the metric is a decimal number in [0,1]. Values close to 0 indicate equality (all entities in the system produce the same number of blocks) and values close to 1 indicate inequality (one entity produces most or all blocks). Entropy : Entropy represents the expected amount of information in the distribution of blocks across entities. The output of the metric is a real number. Typically, a higher value of entropy indicates higher decentralization (lower predictability). Entropy is parameterized by a base rate \u03b1, which defines different types of entropy: \u03b1 = -1: min entropy \u03b1 = 0: Hartley entropy \u03b1 = 1: Shannon entropy (this is used by default) \u03b1 = 2: collision entropy HHI : The Herfindahl-Hirschman Index (HHI) is a measure of market concentration. It is defined as the sum of the squares of the market shares (as whole numbers, e.g. 40 for 40%) of the entities in the system. The output of the metric is a real number in (0, 10000]. Values close to 0 indicate low concentration (many entities produce a similar number of blocks) and values close to 1 indicate high concentration (one entity produces most or all blocks). The U.S. 
Department of Justice has set the following thresholds for interpreting HHI values (in traditional markets): (0, 1500): Competitive market [1500, 2500]: Moderately concentrated market (2500, 10000]: Highly concentrated market Each metric is implemented in a separate Python script in the folder metrics . Each script defines a function named compute_ , which takes as input a dictionary of the form {'': } (and possibly other relevant arguments) and outputs the corresponding metric values.","title":"Metrics"},{"location":"metrics/#metrics","text":"A metric gets the aggregated data (see Aggregator ) and outputs a relevant value. The metrics that have been implemented so far are the following: Nakamoto coefficient : The Nakamoto coefficient represents the minimum number of entities that collectively produce more than 50% of the total blocks within a given timeframe. The output of the metric is an integer. Gini coefficient : The Gini coefficient represents the degree of inequality in block production. The output of the metric is a decimal number in [0,1]. Values close to 0 indicate equality (all entities in the system produce the same number of blocks) and values close to 1 indicate inequality (one entity produces most or all blocks). Entropy : Entropy represents the expected amount of information in the distribution of blocks across entities. The output of the metric is a real number. Typically, a higher value of entropy indicates higher decentralization (lower predictability). Entropy is parameterized by a base rate \u03b1, which defines different types of entropy: \u03b1 = -1: min entropy \u03b1 = 0: Hartley entropy \u03b1 = 1: Shannon entropy (this is used by default) \u03b1 = 2: collision entropy HHI : The Herfindahl-Hirschman Index (HHI) is a measure of market concentration. It is defined as the sum of the squares of the market shares (as whole numbers, e.g. 40 for 40%) of the entities in the system. The output of the metric is a real number in (0, 10000]. Values close to 0 indicate low concentration (many entities produce a similar number of blocks) and values close to 1 indicate high concentration (one entity produces most or all blocks). The U.S. Department of Justice has set the following thresholds for interpreting HHI values (in traditional markets): (0, 1500): Competitive market [1500, 2500]: Moderately concentrated market (2500, 10000]: Highly concentrated market Each metric is implemented in a separate Python script in the folder metrics . Each script defines a function named compute_ , which takes as input a dictionary of the form {'': } (and possibly other relevant arguments) and outputs the corresponding metric values.","title":"Metrics"},{"location":"parsers/","text":"Parsers The parser obtains raw data from a full node (see Data Collection page on how to obtain the required data). It parses the data into a list of entries (dictionaries), each entry corresponding to a block. The input file should be placed in the raw_block_data/ directory and named as _raw_data.json . The parsed data is structured as follows: [ { \"number\": \"\", \"timestamp\": \"\", \"reward_addresses\": \",\" \"identifiers\": \"\" } ] number and timestamp are consistent among different blockchains. reward_addresses and identifiers vary, depending on each ledger. 
Specifically, reward_addresses corresponds to: Bitcoin , Bitcoin Cash , Dogecoin , Litecoin , Zcash : a string of comma-separated addresses which appear in the block's coinbase transaction with non-negative value (i.e., which are given part of the block's fees) Ethereum : the block's miner field Cardano : the hash of the pool that created the data, if defined, otherwise the empty string Tezos : the block's baker field The field identifiers corresponds to: Bitcoin , Bitcoin Cash , Dogecoin , Litecoin , Zcash : the field coinbase_param of the block's coinbase transaction Ethereum : the block's extra_data field Cardano : the ticker name of the pool that created the block, if defined, otherwise an empty string Tezos : there is no such field If using BigQuery, the queries for Bitcoin, Bitcoin Cash, Dogecoin, Litecoin, Zcash (see Data Collection ) return data that are parsed with the default_parser module in parsers . The query for Ethereum returns data that is parsed using the ethereum_parser module in parsers . All other queries return data already in the necessary parsed form, so they are parsed using a \"dummy\" parser that only sorts the blocks.","title":"Parsers"},{"location":"parsers/#parsers","text":"The parser obtains raw data from a full node (see Data Collection page on how to obtain the required data). It parses the data into a list of entries (dictionaries), each entry corresponding to a block. The input file should be placed in the raw_block_data/ directory and named as _raw_data.json . The parsed data is structured as follows: [ { \"number\": \"\", \"timestamp\": \"\", \"reward_addresses\": \",\" \"identifiers\": \"\" } ] number and timestamp are consistent among different blockchains. reward_addresses and identifiers vary, depending on each ledger. Specifically, reward_addresses corresponds to: Bitcoin , Bitcoin Cash , Dogecoin , Litecoin , Zcash : a string of comma-separated addresses which appear in the block's coinbase transaction with non-negative value (i.e., which are given part of the block's fees) Ethereum : the block's miner field Cardano : the hash of the pool that created the data, if defined, otherwise the empty string Tezos : the block's baker field The field identifiers corresponds to: Bitcoin , Bitcoin Cash , Dogecoin , Litecoin , Zcash : the field coinbase_param of the block's coinbase transaction Ethereum : the block's extra_data field Cardano : the ticker name of the pool that created the block, if defined, otherwise an empty string Tezos : there is no such field If using BigQuery, the queries for Bitcoin, Bitcoin Cash, Dogecoin, Litecoin, Zcash (see Data Collection ) return data that are parsed with the default_parser module in parsers . The query for Ethereum returns data that is parsed using the ethereum_parser module in parsers . All other queries return data already in the necessary parsed form, so they are parsed using a \"dummy\" parser that only sorts the blocks.","title":"Parsers"},{"location":"setup/","text":"Setup Installation To install the consensus decentralization analysis tool, simply clone this GitHub repository: git clone https://github.com/Blockchain-Technology-Lab/consensus-decentralization.git The tool is written in Python 3, therefore a Python 3 interpreter is required in order to run it locally. The requirements file lists the dependencies of the project. Make sure you have all of them installed before running the scripts. 
To install all of them in one go, run the following command from the root directory of the project: python -m pip install -r requirements.txt Execution The consensus decentralization analysis tool is a CLI tool. The run.py script in the root directory of the project invokes the required parsers, mappings and metrics, but it is also possible to execute each module individually. The following process describes the most typical workflow. Place all raw data (which could be collected from BigQuery for example; see Data Collection for more details) in the raw_block_data/ directory, each file named as _raw_data.json (e.g., bitcoin_raw_data.json ). By default, there is a (very small) sample input file for some supported projects; to use it, remove the prefix sample_ . Run python run.py --ledgers --timeframe --aggregate-by to analyze the n specified ledgers for the given timeframe, aggregated using the given granularity. All arguments are optional, so it's possible to omit any of them; in this case, the default values will be used. Specifically: ledgers accepts any number of the supported ledgers (case-insensitive). For example, --ledgers bitcoin would run the analysis for Bitcoin, while --ledgers Bitcoin Ethereum Cardano would run the analysis for Bitcoin, Ethereum and Cardano. If the ledgers argument is omitted, then the analysis is performed for the ledgers specified in the config.yaml file, which are typically all supported ledgers. The timeframe argument accepts one or two values of the form YYYY-MM-DD (month and day can be omitted), which indicate the beginning and end of the time period that will be analyzed. For example, --timeframe 2022 would run the analysis for the year 2022 (so from January 1st 2022 to December 31st 2022), while we could also get the same result using --timeframe 2022-01 2022-12 or --timeframe 2022-01-01 2022-12-31 . Similarly, --timeframe 2022-02 or --timeframe 2022-02-01 2022-02-28 would do it for the month of February 2022 (February 1st 2022 to February 28th 2022), while --timeframe 2022-02-03 would do it for a single day (Feburary 3rd 2022). Last, --timeframe 2018 2022 would run the analysis for the entire time period between January 1st 2018 and December 31st 2022. If the timeframe argument is omitted, then the start date and end dates of the time frame are sourced from the config.yaml file. aggregate_by corresponds to the unit of time to aggregate the data by, i.e. the granularity of the analysis. It can be one of: day , week , month , year , all and by default it is month . Note that in the case of weekly aggregation, we consider a week to be 7 consecutive days, starting from the first day of the time period under consideration (so not necessarily Monday to Sunday). If \"all\" is chosen, then no aggregation will be performed, meaning that the given timeframe will be treated as a single unit of time in the context of our analysis. In all other cases, the given timeframe will be divided into units of the given granularity and the result will be a time series. Additionally, there are three flags that can be used to customize an execution: --force-map forces the parsing, mapping and aggregation to be performed on all data, even if the relevant output files already exist. This can be useful for when mapping info is updated for some blockchain. By default, this flag is set to False and the tool only performs the mapping and aggregation when the relevant output files do not exist. --plot enables the generation of graphs at the end of the execution. 
diff --git a/setup/index.html b/setup/index.html
index 2f2030b..cb33664 100644
--- a/setup/index.html
+++ b/setup/index.html
@@ -121,19 +121,31 @@ Execution
in the raw_block_data/ directory, each file named as <project_name>_raw_data.json (e.g., bitcoin_raw_data.json).
By default, there is a (very small) sample input file for some supported projects; to use it, remove the prefix sample_.
-Run python run.py --ledgers <ledger_1> <ledger_n> --timeframe <timeframe> to
-analyze the n specified ledgers for the given timeframe.
-Both arguments are optional, so it's possible to omit one or both of them; in this case, the default values
+Run python run.py --ledgers <ledger_1> <ledger_n> --timeframe <timeframe> --aggregate-by <unit to aggregate by> to
+analyze the n specified ledgers for the given timeframe, aggregated using the given granularity.
+All arguments are optional, so it's possible to omit any of them; in this case, the default values
will be used. Specifically:
-ledgers accepts any number of the supported ledgers (case-insensitive). For example, --ledgers bitcoin
-would run the analysis for Bitcoin, while --ledgers Bitcoin Ethereum Cardano would run the analysis for Bitcoin,
-Ethereum and Cardano. If the ledgers argument is omitted, then all supported ledgers are analyzed.
-The timeframe argument should be of the form YYYY-MM-DD (month and day can be omitted). For example,
---timeframe 2022 would run the analysis for the year 2022, while --timeframe 2022-02 would do it for the month of
-February 2022 and --timeframe 2022-02-03 would do it for a single day (Feburary 3rd 2022). If the timeframe
-argument is omitted, then a monthly analysis is performed for each month between January 2010 and the current month
-or the subset of this time period for which relevant data exists.
+ledgers accepts any number of the supported ledgers (case-insensitive). For example, --ledgers bitcoin
+ would run the analysis for Bitcoin, while --ledgers Bitcoin Ethereum Cardano would run the analysis for Bitcoin,
+ Ethereum and Cardano. If the ledgers argument is omitted, then the analysis is performed for the ledgers
+ specified in the config.yaml file, which are typically all supported ledgers.
+The timeframe argument accepts one or two values of the form YYYY-MM-DD (month and day can be
+ omitted), which indicate the beginning and end of the time period that will be analyzed. For example,
+ --timeframe 2022 would run the analysis for the year 2022 (so from January 1st 2022 to
+ December 31st 2022), while we could also get the same result using --timeframe 2022-01 2022-12 or
+ --timeframe 2022-01-01 2022-12-31. Similarly, --timeframe 2022-02 or --timeframe 2022-02-01 2022-02-28 would
+ do it for the month of February 2022 (February 1st 2022 to February 28th 2022), while --timeframe 2022-02-03
+ would do it for a single day (February 3rd 2022). Lastly, --timeframe 2018 2022 would run the analysis for the
+ entire time period between January 1st 2018 and December 31st 2022. If the timeframe argument is omitted, then
+ the start and end dates of the time frame are sourced from the config.yaml file.
+aggregate_by corresponds to the unit of time to aggregate the data by, i.e. the granularity of the analysis.
+ It can be one of: day, week, month, year, all, and by default it is month. Note that in the case of
+ weekly aggregation, we consider a week to be 7 consecutive days, starting from the first day of the time period
+ under consideration (so not necessarily Monday to Sunday). If "all" is chosen, then no aggregation will be
+ performed, meaning that the given timeframe will be treated as a single unit of time in the context of our
+ analysis. In all other cases, the given timeframe will be divided into units of the given granularity and the
+ result will be a time series.
Additionally, there are three flags that can be used to customize an execution:
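To make the timeframe and aggregation semantics documented in the hunk above more concrete, the following is a minimal, self-contained Python sketch. It is not code from the tool itself: the helper names expand_timeframe and time_units are invented for illustration, and the tool's actual implementation may differ. The sketch shows how a timeframe value such as 2022-02 can be expanded into full start and end dates and then divided into the time units that an invocation like python run.py --ledgers bitcoin --timeframe 2022-02 --aggregate-by week would aggregate over, treating a week as 7 consecutive days counted from the first day of the timeframe.

# Illustrative only -- not the tool's actual code; helper names are hypothetical.
from datetime import date, timedelta
import calendar


def expand_timeframe(start_value, end_value=None):
    """Expand YYYY / YYYY-MM / YYYY-MM-DD values into concrete start and end dates,
    e.g. "2022" -> (2022-01-01, 2022-12-31)."""
    def to_start(value):
        parts = [int(p) for p in value.split("-")]
        year, month, day = (parts + [1, 1])[:3]
        return date(year, month, day)

    def to_end(value):
        parts = [int(p) for p in value.split("-")]
        if len(parts) == 1:
            return date(parts[0], 12, 31)
        if len(parts) == 2:
            return date(parts[0], parts[1], calendar.monthrange(parts[0], parts[1])[1])
        return date(*parts)

    end_value = end_value or start_value
    return to_start(start_value), to_end(end_value)


def time_units(start, end, aggregate_by):
    """Split the inclusive range [start, end] into units of the requested granularity.
    A "week" is 7 consecutive days counted from `start`, not necessarily Monday-Sunday."""
    if aggregate_by == "all":
        return [(start, end)]
    units, current = [], start
    while current <= end:
        if aggregate_by == "day":
            nxt = current + timedelta(days=1)
        elif aggregate_by == "week":
            nxt = current + timedelta(days=7)
        elif aggregate_by == "month":
            days_in_month = calendar.monthrange(current.year, current.month)[1]
            nxt = date(current.year, current.month, days_in_month) + timedelta(days=1)
        elif aggregate_by == "year":
            nxt = date(current.year + 1, 1, 1)
        else:
            raise ValueError(f"Unknown granularity: {aggregate_by}")
        units.append((current, min(nxt - timedelta(days=1), end)))
        current = nxt
    return units


if __name__ == "__main__":
    start, end = expand_timeframe("2022-02")  # February 1st to February 28th 2022
    for unit_start, unit_end in time_units(start, end, "week"):
        print(unit_start, "to", unit_end)

Running the sketch for February 2022 with weekly granularity yields four 7-day units (Feb 1-7, 8-14, 15-21 and 22-28), matching the weekly aggregation behaviour described above.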