diff --git a/contribute/index.html b/contribute/index.html
index aeedb59..9e80a50 100644
--- a/contribute/index.html
+++ b/contribute/index.html
@@ -198,9 +198,13 @@
Add metrics
entities (as keys) to number of blocks (as values), outputs a single value (the
outcome of the metric).
Second, import this new function to src/analyze.py
.
-Finally, add the name of the metric (which should be the same as the one used in
+
Third, add the name of the metric (which should be the same as the one used in
the filename above) and any parameter values it might require to the file
config.yaml
, under metrics
.
+Fourth, you should add unit tests for the new metric
+here .
+Finally, you should update the corresponding documentation
+page .
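For reference, the metric script added in the first step of "Add metrics" might look like the following minimal sketch. The metric name `gini` and the file path in the comment are illustrative assumptions, not part of the actual codebase; the only stated requirement is that the function follows the `compute_{metric_name}` naming convention and maps an entities-to-blocks dictionary to a single value.

```python
# src/metrics/gini.py -- hypothetical example metric; the function name
# must follow the compute_{metric_name} convention described above.

def compute_gini(blocks_per_entity):
    """Compute the Gini coefficient of block production.

    :param blocks_per_entity: dict mapping entity names (keys) to the
        number of blocks they produced (values)
    :returns: a single float value (the outcome of the metric)
    """
    values = sorted(blocks_per_entity.values())
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum(i * x_i) / (n * total) - (n + 1) / n, with x sorted ascending
    cumulative = sum((i + 1) * v for i, v in enumerate(values))
    return (2 * cumulative) / (n * total) - (n + 1) / n
```

The same name (`gini` here) would then be imported into `src/analyze.py` and listed, with any parameters it needs, under `metrics` in `config.yaml`, as the steps above describe.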
diff --git a/index.html b/index.html
index 10f71fe..dfd0bc7 100644
--- a/index.html
+++ b/index.html
@@ -194,5 +194,5 @@ Contributing
diff --git a/search/search_index.json b/search/search_index.json
index 79ade87..44b77fb 100644
--- a/search/search_index.json
+++ b/search/search_index.json
@@ -1 +1 @@
-{"config":{"indexing":"full","lang":["en"],"min_search_length":3,"prebuild_index":false,"separator":"[\\s\\-]+"},"docs":[{"location":"","text":"Blockchain Pooling Analysis - Documentation This is the documentation for the Blockchain Pooling Analysis tool developed by the University of Edinburgh's Blockchain Technology Lab. The tool is responsible for analyzing pooling behavior of various blockchains and measuring their subsequent decentralization levels. The relevant source code is available on GitHub . Overview The tool consists of the following modules: Parser Mapping Metrics The parser is responsible for pre-processing the raw data that comes from a full node. It produces a file with all the information that is needed for the mapping. The mapping takes the output of the parser and combines it with some other sources of information. It then outputs a file that reveals the distribution of resources to different entities. In this context, \"resources\" correspond to the number of produced blocks. This distribution is the input for the metrics module, which tracks various decentralization-related metrics and produces files with the results. More details about the different modules can be found in the corresponding Parser , Mapping and Metrics pages. Currently, the supported ledgers are: Bitcoin Bitcoin Cash Cardano Dogecoin Ethereum Litecoin Tezos Zcash We intend to add more ledgers to this list in the future. Contributing This is an open source project licensed under the terms and conditions of the MIT license and CC BY-SA 4.0 . Everyone is welcome to contribute to it by proposing or implementing their ideas. Example contributions include, but are not limited to, reporting potential bugs, supplying useful information for the mappings of supported ledgers, adding support for a new ledger, or making the code more efficient. All contributions to the project will also be covered by the above-mentioned license. 
When making changes in the code, contributors are required to fork the project's repository first and then issue a pull request with their changes. Each PR will be reviewed before being merged to the main branch. Bugs can be reported in the Issues page. Other comments and ideas can be brought up in the project's Discussions . For more information on how to make specific contributions, see the Contribute page.","title":"Home"},{"location":"#blockchain-pooling-analysis-documentation","text":"This is the documentation for the Blockchain Pooling Analysis tool developed by the University of Edinburgh's Blockchain Technology Lab. The tool is responsible for analyzing pooling behavior of various blockchains and measuring their subsequent decentralization levels. The relevant source code is available on GitHub .","title":"Blockchain Pooling Analysis - Documentation"},{"location":"#overview","text":"The tool consists of the following modules: Parser Mapping Metrics The parser is responsible for pre-processing the raw data that comes from a full node. It produces a file with all the information that is needed for the mapping. The mapping takes the output of the parser and combines it with some other sources of information. It then outputs a file that reveals the distribution of resources to different entities. In this context, \"resources\" correspond to the number of produced blocks. This distribution is the input for the metrics module, which tracks various decentralization-related metrics and produces files with the results. More details about the different modules can be found in the corresponding Parser , Mapping and Metrics pages. Currently, the supported ledgers are: Bitcoin Bitcoin Cash Cardano Dogecoin Ethereum Litecoin Tezos Zcash We intend to add more ledgers to this list in the future.","title":"Overview"},{"location":"#contributing","text":"This is an open source project licensed under the terms and conditions of the MIT license and CC BY-SA 4.0 . 
Everyone is welcome to contribute to it by proposing or implementing their ideas. Example contributions include, but are not limited to, reporting potential bugs, supplying useful information for the mappings of supported ledgers, adding support for a new ledger, or making the code more efficient. All contributions to the project will also be covered by the above-mentioned license. When making changes in the code, contributors are required to fork the project's repository first and then issue a pull request with their changes. Each PR will be reviewed before being merged to the main branch. Bugs can be reported in the Issues page. Other comments and ideas can be brought up in the project's Discussions . For more information on how to make specific contributions, see the Contribute page.","title":"Contributing"},{"location":"contribute/","text":"How to contribute You can contribute to the tool by adding support for a ledger, updating the mapping process for an existing ledger, or adding a new metric. In all cases, the information should be submitted via a GitHub PR. Add support for ledgers You can add support for a ledger that is not already supported as follows. Mapping information In the directory mapping_information , there exist three folders ( addresses , clusters , identifiers ). In each folder, add a file named .json , if there exist such information for the new ledger (for more details on what type of information each folder corresponds to see the mapping documentation ). Parser and mapping In the directory src/parsers , create a file named _parser.py , if no existing parser can be reused. In this file create a new class, which inherits from the DefaultParser class of default_parser.py . Then, override its parse method in order to implement the new parser (or override another method if there are only small changes needed, e.g. parse_identifiers if the only thing that is different from the default parser is the way identifiers are decoded). 
In the directory src/mappings , create a file named _mapping.py , if no existing mapping can be reused. In this file create a new class, which inherits from the DefaultMapping class of default_mapping.py . Then, override its process method. This method takes as input a time period in the form yyyy-mm-dd (e.g., '2022' for the year 2022, '2022-11' for November 2022, '2022-11-12' for 12 November 2022), returns a dictionary of the form {'': } , and creates a csv file with the mapped data for this timeframe in the output directory. Then, you should enable support for the new ledger in the parser and mapping module scripts. Specifically: in the script src/parse.py , import the parser class and assign it to the project's name in the ledger_parser dictionary; in the script src/map.py , import the mapping class and assign it to the project's name in the dictionary ledger_mapping . Notes : You should add an entry in each dictionary, regardless of whether you use a new or existing parser or mapping \u2013 if no new parser or mapping class was created for the project, simply assign the suitable class (e.g. DefaultParser or DefaultMapping) to the project's name in the corresponding dictionary. If you create a new parser/mapping, you should also add unit tests here Documentation Finally, you should include the new ledger in the documentation pages; specifically: add the ledger in the list of supported ledgers in the repository's main README file add the ledger in the list of supported ledgers in the index documentation page document the new ledger's parser in the corresponding documentation page document how the new ledger's data is retrieved in the corresponding documentation page ; if Google BigQuery is used, add the new query to queries.yaml Update existing mapping information All mapping data are in the folder mapping_information . To update or add information about a supported ledger's mapping, you should open a Pull Request. 
This can be done either via console or as follows, via the browser: Open the file that you want to change (e.g., for Bitcoin, follow this link ) on your browser. Click Edit this file . Make your changes in the file. On the bottom, initiate a Pull Request. Write a short and descriptive commit title message (e.g., \"Update 2019 links for company A\"). Select Create a new branch for this commit and start a pull request. In the page that opens, change the PR title (if necessary) and click on Create pull request . When updating the mapping information, the following guidelines should be observed: The link to a pool's website should be active and public. All sources cited should be publicly available and respectable. Unofficial tweets or unavailable or private sources will be rejected.You can use specific keywords, in the cases when the information is available on-chain. Specifically: homepage : this keyword is used in Cardano, to denote that two pools define the same homepage in their metadata (which are published on-chain) Specifically, for legal_links.json : The value of the pool's name (that is the first value in each array entry under a company), should be the same as the value that corresponds to a key name in the ledger-specific pool information, as defined in the corresponding addresses , clusters or identifiers file. If this string is not exactly the same (including capitalization), the link will not be identified during the mapping process. There should exist no time gaps in a pool's ownership structure. Add metrics To add a new metric, you should do the following steps. First, create a relevant script in the folder src/metrics . The script should include a function named compute_{metric_name} that, given a dictionary of entities (as keys) to number of blocks (as values), outputs a single value (the outcome of the metric). Second, import this new function to src/analyze.py . 
Finally, add the name of the metric (which should be the same as the one used in the filename above) and any parameter values it might require to the file config.yaml , under metrics .","title":"How to contribute"},{"location":"contribute/#how-to-contribute","text":"You can contribute to the tool by adding support for a ledger, updating the mapping process for an existing ledger, or adding a new metric. In all cases, the information should be submitted via a GitHub PR.","title":"How to contribute"},{"location":"contribute/#add-support-for-ledgers","text":"You can add support for a ledger that is not already supported as follows.","title":"Add support for ledgers"},{"location":"contribute/#mapping-information","text":"In the directory mapping_information , there exist three folders ( addresses , clusters , identifiers ). In each folder, add a file named .json , if there exist such information for the new ledger (for more details on what type of information each folder corresponds to see the mapping documentation ).","title":"Mapping information"},{"location":"contribute/#parser-and-mapping","text":"In the directory src/parsers , create a file named _parser.py , if no existing parser can be reused. In this file create a new class, which inherits from the DefaultParser class of default_parser.py . Then, override its parse method in order to implement the new parser (or override another method if there are only small changes needed, e.g. parse_identifiers if the only thing that is different from the default parser is the way identifiers are decoded). In the directory src/mappings , create a file named _mapping.py , if no existing mapping can be reused. In this file create a new class, which inherits from the DefaultMapping class of default_mapping.py . Then, override its process method. 
This method takes as input a time period in the form yyyy-mm-dd (e.g., '2022' for the year 2022, '2022-11' for November 2022, '2022-11-12' for 12 November 2022), returns a dictionary of the form {'': } , and creates a csv file with the mapped data for this timeframe in the output directory. Then, you should enable support for the new ledger in the parser and mapping module scripts. Specifically: in the script src/parse.py , import the parser class and assign it to the project's name in the ledger_parser dictionary; in the script src/map.py , import the mapping class and assign it to the project's name in the dictionary ledger_mapping . Notes : You should add an entry in each dictionary, regardless of whether you use a new or existing parser or mapping \u2013 if no new parser or mapping class was created for the project, simply assign the suitable class (e.g. DefaultParser or DefaultMapping) to the project's name in the corresponding dictionary. If you create a new parser/mapping, you should also add unit tests here","title":"Parser and mapping"},{"location":"contribute/#documentation","text":"Finally, you should include the new ledger in the documentation pages; specifically: add the ledger in the list of supported ledgers in the repository's main README file add the ledger in the list of supported ledgers in the index documentation page document the new ledger's parser in the corresponding documentation page document how the new ledger's data is retrieved in the corresponding documentation page ; if Google BigQuery is used, add the new query to queries.yaml","title":"Documentation"},{"location":"contribute/#update-existing-mapping-information","text":"All mapping data are in the folder mapping_information . To update or add information about a supported ledger's mapping, you should open a Pull Request. This can be done either via console or as follows, via the browser: Open the file that you want to change (e.g., for Bitcoin, follow this link ) on your browser. 
Click Edit this file . Make your changes in the file. On the bottom, initiate a Pull Request. Write a short and descriptive commit title message (e.g., \"Update 2019 links for company A\"). Select Create a new branch for this commit and start a pull request. In the page that opens, change the PR title (if necessary) and click on Create pull request . When updating the mapping information, the following guidelines should be observed: The link to a pool's website should be active and public. All sources cited should be publicly available and respectable. Unofficial tweets or unavailable or private sources will be rejected.You can use specific keywords, in the cases when the information is available on-chain. Specifically: homepage : this keyword is used in Cardano, to denote that two pools define the same homepage in their metadata (which are published on-chain) Specifically, for legal_links.json : The value of the pool's name (that is the first value in each array entry under a company), should be the same as the value that corresponds to a key name in the ledger-specific pool information, as defined in the corresponding addresses , clusters or identifiers file. If this string is not exactly the same (including capitalization), the link will not be identified during the mapping process. There should exist no time gaps in a pool's ownership structure.","title":"Update existing mapping information"},{"location":"contribute/#add-metrics","text":"To add a new metric, you should do the following steps. First, create a relevant script in the folder src/metrics . The script should include a function named compute_{metric_name} that, given a dictionary of entities (as keys) to number of blocks (as values), outputs a single value (the outcome of the metric). Second, import this new function to src/analyze.py . 
Finally, add the name of the metric (which should be the same as the one used in the filename above) and any parameter values it might require to the file config.yaml , under metrics .","title":"Add metrics"},{"location":"data/","text":"Data collection Currently, the data for the analysis of the different ledgers is collected through Google BigQuery . Note that when saving results from BigQuery you should select the option \"JSONL (newline delimited)\". Sample data & queries Bitcoin Sample raw Bitcoin data are available here . They can be retrieved using Google BigQuery with the following query: SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_bitcoin.transactions`.outputs FROM `bigquery-public-data.crypto_bitcoin.transactions` JOIN `bigquery-public-data.crypto_bitcoin.blocks` ON `bigquery-public-data.crypto_bitcoin.transactions`.block_number = `bigquery-public-data.crypto_bitcoin.blocks`.number WHERE is_coinbase is TRUE AND timestamp > '2017-12-31' Bitcoin Cash Sample raw Bitcoin Cash data are available here . They can be retrieved using Google BigQuery with the following query: SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_bitcoin_cash.transactions`.outputs FROM `bigquery-public-data.crypto_bitcoin_cash.transactions` JOIN `bigquery-public-data.crypto_bitcoin_cash.blocks` ON `bigquery-public-data.crypto_bitcoin_cash.transactions`.block_number = `bigquery-public-data.crypto_bitcoin_cash.blocks`.number WHERE is_coinbase is TRUE AND timestamp > '2018-12-31' Cardano Sample raw Cardano data are available here . 
They can be retrieved using Google BigQuery with the following query: SELECT `iog-data-analytics.cardano_mainnet.block`.slot_no as number, `iog-data-analytics.cardano_mainnet.pool_offline_data`.ticker_name as identifiers, `iog-data-analytics.cardano_mainnet.block`.block_time as timestamp,`iog-data-analytics.cardano_mainnet.block`.pool_hash as reward_addresses FROM `iog-data-analytics.cardano_mainnet.block` LEFT JOIN `iog-data-analytics.cardano_mainnet.pool_offline_data` ON `iog-data-analytics.cardano_mainnet.block`.pool_hash = `iog-data-analytics.cardano_mainnet.pool_offline_data`.pool_hash WHERE `iog-data-analytics.cardano_mainnet.block`.block_time > '2020-12-31' Dogecoin Sample raw Dogecoin data are available here . They can be retrieved using Google BigQuery with the following query: SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_dogecoin.transactions`.outputs FROM `bigquery-public-data.crypto_dogecoin.transactions` JOIN `bigquery-public-data.crypto_dogecoin.blocks` ON `bigquery-public-data.crypto_dogecoin.transactions`.block_number = `bigquery-public-data.crypto_dogecoin.blocks`.number WHERE is_coinbase is TRUE AND timestamp > '2019-12-31' Ethereum Sample raw Ethereum data are available here . They can be retrieved using Google BigQuery with the following query: SELECT number, timestamp, miner as reward_addresses, extra_data as identifiers FROM `bigquery-public-data.crypto_ethereum.blocks` WHERE timestamp > '2018-12-31' Litecoin Sample raw Litecoin data are available here . 
They can be retrieved using Google BigQuery with the following query: SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_litecoin.transactions`.outputs FROM `bigquery-public-data.crypto_litecoin.transactions` JOIN `bigquery-public-data.crypto_litecoin.blocks` ON `bigquery-public-data.crypto_litecoin.transactions`.block_number = `bigquery-public-data.crypto_litecoin.blocks`.number WHERE is_coinbase is TRUE AND timestamp > '2018-12-31' Tezos Sample raw Tezos data are available here . They can be retrieved using Google BigQuery with the following query: SELECT level as number, timestamp, baker as reward_addresses FROM `public-data-finance.crypto_tezos.blocks` WHERE timestamp > '2020-12-31' Zcash Sample raw Zcash data are available here . They can be retrieved using Google BigQuery with the following query: SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_zcash.transactions`.outputs FROM `bigquery-public-data.crypto_zcash.transactions` JOIN `bigquery-public-data.crypto_zcash.blocks` ON `bigquery-public-data.crypto_zcash.transactions`.block_number = `bigquery-public-data.crypto_zcash.blocks`.number WHERE is_coinbase is TRUE AND timestamp > '2018-12-31' Automating the data collection process Instead of executing each of these queries separately on the BigQuery console and saving the results manually, it is possible to automate the process using a script and collect all relevant data in one go. Executing this script will run all queries in this file , so you can also control which queries are run by adding them to or removing them from the file. IMPORTANT: the script uses service account credentials for authentication, therefore before running it, you need to generate the relevant credentials from Google, as described here and save your key in the root directory of the project under the name 'google-service-account-key.json'. 
There is a sample file that you can consult, which shows what your credentials are supposed to look like (but note that this is for informational purposes only, this file is not used in the code). Once you have set up the credentials, you can just run the following command from the src directory to retrieve data for all supported blockchains: python collect_data.py","title":"Data Collection"},{"location":"data/#data-collection","text":"Currently, the data for the analysis of the different ledgers is collected through Google BigQuery . Note that when saving results from BigQuery you should select the option \"JSONL (newline delimited)\".","title":"Data collection"},{"location":"data/#sample-data-queries","text":"","title":"Sample data & queries"},{"location":"data/#bitcoin","text":"Sample raw Bitcoin data are available here . They can be retrieved using Google BigQuery with the following query: SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_bitcoin.transactions`.outputs FROM `bigquery-public-data.crypto_bitcoin.transactions` JOIN `bigquery-public-data.crypto_bitcoin.blocks` ON `bigquery-public-data.crypto_bitcoin.transactions`.block_number = `bigquery-public-data.crypto_bitcoin.blocks`.number WHERE is_coinbase is TRUE AND timestamp > '2017-12-31'","title":"Bitcoin"},{"location":"data/#bitcoin-cash","text":"Sample raw Bitcoin Cash data are available here . 
They can be retrieved using Google BigQuery with the following query: SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_bitcoin_cash.transactions`.outputs FROM `bigquery-public-data.crypto_bitcoin_cash.transactions` JOIN `bigquery-public-data.crypto_bitcoin_cash.blocks` ON `bigquery-public-data.crypto_bitcoin_cash.transactions`.block_number = `bigquery-public-data.crypto_bitcoin_cash.blocks`.number WHERE is_coinbase is TRUE AND timestamp > '2018-12-31'","title":"Bitcoin Cash"},{"location":"data/#cardano","text":"Sample raw Cardano data are available here . They can be retrieved using Google BigQuery with the following query: SELECT `iog-data-analytics.cardano_mainnet.block`.slot_no as number, `iog-data-analytics.cardano_mainnet.pool_offline_data`.ticker_name as identifiers, `iog-data-analytics.cardano_mainnet.block`.block_time as timestamp,`iog-data-analytics.cardano_mainnet.block`.pool_hash as reward_addresses FROM `iog-data-analytics.cardano_mainnet.block` LEFT JOIN `iog-data-analytics.cardano_mainnet.pool_offline_data` ON `iog-data-analytics.cardano_mainnet.block`.pool_hash = `iog-data-analytics.cardano_mainnet.pool_offline_data`.pool_hash WHERE `iog-data-analytics.cardano_mainnet.block`.block_time > '2020-12-31'","title":"Cardano"},{"location":"data/#dogecoin","text":"Sample raw Dogecoin data are available here . 
They can be retrieved using Google BigQuery with the following query: SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_dogecoin.transactions`.outputs FROM `bigquery-public-data.crypto_dogecoin.transactions` JOIN `bigquery-public-data.crypto_dogecoin.blocks` ON `bigquery-public-data.crypto_dogecoin.transactions`.block_number = `bigquery-public-data.crypto_dogecoin.blocks`.number WHERE is_coinbase is TRUE AND timestamp > '2019-12-31'","title":"Dogecoin"},{"location":"data/#ethereum","text":"Sample raw Ethereum data are available here . They can be retrieved using Google BigQuery with the following query: SELECT number, timestamp, miner as reward_addresses, extra_data as identifiers FROM `bigquery-public-data.crypto_ethereum.blocks` WHERE timestamp > '2018-12-31'","title":"Ethereum"},{"location":"data/#litecoin","text":"Sample raw Litecoin data are available here . They can be retrieved using Google BigQuery with the following query: SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_litecoin.transactions`.outputs FROM `bigquery-public-data.crypto_litecoin.transactions` JOIN `bigquery-public-data.crypto_litecoin.blocks` ON `bigquery-public-data.crypto_litecoin.transactions`.block_number = `bigquery-public-data.crypto_litecoin.blocks`.number WHERE is_coinbase is TRUE AND timestamp > '2018-12-31'","title":"Litecoin"},{"location":"data/#tezos","text":"Sample raw Tezos data are available here . They can be retrieved using Google BigQuery with the following query: SELECT level as number, timestamp, baker as reward_addresses FROM `public-data-finance.crypto_tezos.blocks` WHERE timestamp > '2020-12-31'","title":"Tezos"},{"location":"data/#zcash","text":"Sample raw Zcash data are available here . 
They can be retrieved using Google BigQuery with the following query: SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_zcash.transactions`.outputs FROM `bigquery-public-data.crypto_zcash.transactions` JOIN `bigquery-public-data.crypto_zcash.blocks` ON `bigquery-public-data.crypto_zcash.transactions`.block_number = `bigquery-public-data.crypto_zcash.blocks`.number WHERE is_coinbase is TRUE AND timestamp > '2018-12-31'","title":"Zcash"},{"location":"data/#automating-the-data-collection-process","text":"Instead of executing each of these queries separately on the BigQuery console and saving the results manually, it is possible to automate the process using a script and collect all relevant data in one go. Executing this script will run all queries in this file , so you can also control which queries are run by adding them to or removing them from the file. IMPORTANT: the script uses service account credentials for authentication, therefore before running it, you need to generate the relevant credentials from Google, as described here and save your key in the root directory of the project under the name 'google-service-account-key.json'. There is a sample file that you can consult, which shows what your credentials are supposed to look like (but note that this is for informational purposes only, this file is not used in the code). Once you have set up the credentials, you can just run the following command from the src directory to retrieve data for all supported blockchains: python collect_data.py","title":"Automating the data collection process"},{"location":"mappings/","text":"Mappings A mapping obtains the parsed data (from output//parsed_data.json ) and outputs a csv file that maps blocks to entities, structured as follows: Entity,Resources ,<(int) number of blocks> The name of the csv file is the timeframe, over which the mapping was executed (e.g., 2021-04.csv ). 
The file is stored in the project's output directory ( output// ). The logic of the mapping depends on the type of clustering we want to achieve. So, different mappings will output different results, even if applied on the same data. An exception to this is the \"no-cluster\" mapping (DummyMapping in the code), which maps blocks to reward addresses, so it doesn't perform any extra processing on the raw data. Mapping Information To assist the mapping process, the directory mapping_information/ contains mapping information about the supported projects. There exist three subdirectories and two additional files. In each subdirectory there exists a file for the corresponding ledger data, if such data exists. Identifiers The files under identifiers define information about block creators. Each key corresponds to a tag or ticker, by which the pool is identifiable in its produced blocks. The value for each key is a dictionary of pool-related information, specifically its name, a URL to its homepage, etc. Each file's structure is as follows: { \"P1\": { \"name\": \"Pool P1\", \"homepage\": \"example.com/p1\" }, \"--P2--\": { \"name\": \"Pool P2\", \"homepage\": \"example.com/p2\" } } Clusters The files under clusters define information about pool clusters. This information is organized per cluster. For each cluster, an array of pool-related information is defined. Each item in the array defines the pool's name, the time window during which the pool belonged to the cluster (from the beginning of from until the beginning of to excluding ), and the publicly available source of information, via which the link between the pool and the cluster is established. 
Each file's structure is as follows: { \"cluster A\": [ {\"name\": \"P1\", \"from\": \"\", \"to\": \"2023\", \"source\": \"example.com/link1\"} ], \"cluster B\": [ {\"name\": \"--P2--\", \"from\": \"\", \"to\": \"\", \"source\": \"example.com/link2\"} ] } Addresses The files under addresses define ownership information about addresses. As with clusters, for each address the pool ownership information defines the pool's name and a public source of information about the ownership. Each file's structure is as follows: { \"address1\": {\"name\": \"Pool P2\", \"source\": \"example.com\"}, } Legal links The file legal_links.json defines legal links between pools and companies, based on off-chain information. For example, it defines ownership information of a pool by a company. The structure of the file is as follows: { \"\": [ {\"name\": \"\", \"from\": \"\", \"to\": \"\", \"source\": \"\"} ] } The values for each entry are the same as clusters in the above pool information. Special addresses The file special_addresses.json defines per-project information about addresses that are not related to some entity but are used for protocol-specific reasons (e.g. treasury address). The format of the file is the following: { \"Project A\": [ {\"address\": \"A special address 1\", \"source\": \"some.public.source\"}, {\"address\": \"A special address 2\", \"source\": \"some.public.source\"} ], \"Project B\": [ {\"address\": \"B special address\", \"source\": \"some.public.source\"} ] } Mapping process implementation In our implementation, the mapping of a block uses the auxiliary information as follows. First, it iterates over all known tags and compares each one with the block's identifiers. If the tag is a substring of the parameter, then we have a match. Second, if the first step fails, we compare the block's reward addresses with known pool addresses and again look for a match. 
In both cases, if there is a match, then: (i) we map the block to the matched pool; (ii) we associate all of the block's reward addresses (that is, the addresses that receive fees from the block) with the matched pool. In essence, the identifiers are the principal element for mapping a block to an entity and the known addresses are the fallback mechanism. If there is a match, we also parse the auxiliary information, such as pool ownership or clusters, in order to assign the block to the top level entity, e.g., the pool's parent company or cluster. If both mechanisms fail, then no match is found. In this case, we assign the reward addresses as the block's entity.","title":"Mappings"},{"location":"mappings/#mappings","text":"A mapping obtains the parsed data (from output//parsed_data.json ) and outputs a csv file that maps blocks to entities, structured as follows: Entity,Resources ,<(int) number of blocks> The name of the csv file is the timeframe, over which the mapping was executed (e.g., 2021-04.csv ). The file is stored in the project's output directory ( output// ). The logic of the mapping depends on the type of clustering we want to achieve. So, different mappings will output different results, even if applied on the same data. An exception to this is the \"no-cluster\" mapping (DummyMapping in the code), which maps blocks to reward addresses, so it doesn't perform any extra processing on the raw data.","title":"Mappings"},{"location":"mappings/#mapping-information","text":"To assist the mapping process, the directory mapping_information/ contains mapping information about the supported projects. There exist three subdirectories and two additional files. In each subdirectory there exists a file for the corresponding ledger data, if such data exists.","title":"Mapping Information"},{"location":"mappings/#identifiers","text":"The files under identifiers define information about block creators. 
Each key corresponds to a tag or ticker, by which the pool is identifiable in its produced blocks. The value for each key is a dictionary of pool-related information, specifically its name, a URL to its homepage, etc. Each file's structure is as follows: { \"P1\": { \"name\": \"Pool P1\", \"homepage\": \"example.com/p1\" }, \"--P2--\": { \"name\": \"Pool P2\", \"homepage\": \"example.com/p2\" } }","title":"Identifiers"},{"location":"mappings/#clusters","text":"The files under clusters define information about pool clusters. This information is organized per cluster. For each cluster, an array of pool-related information is defined. Each item in the array defines the pool's name, the time window during which the pool belonged to the cluster (from the beginning of from until the beginning of to excluding ), and the publicly available source of information, via which the link between the pool and the cluster is established. Each file's structure is as follows: { \"cluster A\": [ {\"name\": \"P1\", \"from\": \"\", \"to\": \"2023\", \"source\": \"example.com/link1\"} ], \"cluster B\": [ {\"name\": \"--P2--\", \"from\": \"\", \"to\": \"\", \"source\": \"example.com/link2\"} ] }","title":"Clusters"},{"location":"mappings/#addresses","text":"The files under addresses define ownership information about addresses. As with clusters, for each address the pool ownership information defines the pool's name and a public source of information about the ownership. Each file's structure is as follows: { \"address1\": {\"name\": \"Pool P2\", \"source\": \"example.com\"}, }","title":"Addresses"},{"location":"mappings/#legal-links","text":"The file legal_links.json defines legal links between pools and companies, based on off-chain information. For example, it defines ownership information of a pool by a company. 
The structure of the file is as follows: { \"\": [ {\"name\": \"\", \"from\": \"\", \"to\": \"\", \"source\": \"\"} ] } The values for each entry are the same as clusters in the above pool information.","title":"Legal links"},{"location":"mappings/#special-addresses","text":"The file special_addresses.json defines per-project information about addresses that are not related to some entity but are used for protocol-specific reasons (e.g. treasury address). The format of the file is the following: { \"Project A\": [ {\"address\": \"A special address 1\", \"source\": \"some.public.source\"}, {\"address\": \"A special address 2\", \"source\": \"some.public.source\"} ], \"Project B\": [ {\"address\": \"B special address\", \"source\": \"some.public.source\"} ] }","title":"Special addresses"},{"location":"mappings/#mapping-process-implementation","text":"In our implementation, the mapping of a block uses the auxiliary information as follows. First, it iterates over all known tags and compares each one with the block's identifiers. If the tag is a substring of the parameter, then we have a match. Second, if the first step fails, we compare the block's reward addresses with known pool addresses and again look for a match. In both cases, if there is a match, then: (i) we map the block to the matched pool; (ii) we associate all of the block's reward addresses (that is, the addresses that receive fees from the block) with the matched pool. In essence, the identifiers are the principal element for mapping a block to an entity and the known addresses are the fallback mechanism. If there is a match, we also parse the auxiliary information, such as pool ownership or clusters, in order to assign the block to the top level entity, e.g., the pool's parent company or cluster. If both mechanisms fail, then no match is found. 
In this case, we assign the reward addresses as the block's entity.","title":"Mapping process implementation"},{"location":"metrics/","text":"Metrics A metric gets the mapped data (see above Mapping ) and outputs a relevant value. The implemented metrics are the following (with more to be added in the future): Nakamoto coefficient : The Nakamoto coefficient is the minimum number of entities that collectively produce more than 50% of the produced blocks within a given timeframe. The output of the metric is a tuple of the Nakamoto coefficient and the power percentage that these entities control. Gini coefficient : The Gini coefficient represents the degree of inequality in block production. The output of the metric a decimal number in [0,1]. Values close to 0 indicate equality (all entities in the system produce the same number of blocks) and values close to 1 indicate inequality (one entity produces most or all blocks). Entropy : Entropy represents the expected amount of information in the distribution of blocks across entities. Entropy is parameterized by a base rate \u03b1, which defines different types of entropy: (a) -1: min; (b) 0: Hartley; (c) 1: Shannon; (d) 2: collision. The output of the metric is a real number. Typically, a high number of different involved entities, each with approximately equal power, should yield high entropy. Each metric is implemented in a separate Python script in the folder metrics . Each script defines a function named compute_ , which takes as input a dictionary of the form {'': } (and possibly other relevant arguments) and outputs the corresponding metric values.","title":"Metrics"},{"location":"metrics/#metrics","text":"A metric gets the mapped data (see above Mapping ) and outputs a relevant value. 
The implemented metrics are the following (with more to be added in the future): Nakamoto coefficient : The Nakamoto coefficient is the minimum number of entities that collectively produce more than 50% of the produced blocks within a given timeframe. The output of the metric is a tuple of the Nakamoto coefficient and the power percentage that these entities control. Gini coefficient : The Gini coefficient represents the degree of inequality in block production. The output of the metric a decimal number in [0,1]. Values close to 0 indicate equality (all entities in the system produce the same number of blocks) and values close to 1 indicate inequality (one entity produces most or all blocks). Entropy : Entropy represents the expected amount of information in the distribution of blocks across entities. Entropy is parameterized by a base rate \u03b1, which defines different types of entropy: (a) -1: min; (b) 0: Hartley; (c) 1: Shannon; (d) 2: collision. The output of the metric is a real number. Typically, a high number of different involved entities, each with approximately equal power, should yield high entropy. Each metric is implemented in a separate Python script in the folder metrics . Each script defines a function named compute_ , which takes as input a dictionary of the form {'': } (and possibly other relevant arguments) and outputs the corresponding metric values.","title":"Metrics"},{"location":"parsers/","text":"Parsers The parser obtains raw data from a full node (see Data Collection page on how to obtain the required data). It parses the data and outputs a json file with a list of entries, each entry corresponding to a block. The input file should be placed in the input directory and named as _raw_data.json . The output file is stored under output//parsed_data.json and is structured as follows: [ { \"number\": \"\", \"timestamp\": \"\", \"reward_addresses\": \",\" \"identifiers\": \"\" } ] number and timestamp are consistent among different blockchains. 
reward_addresses and identifiers vary, depending on each ledger. Specifically, reward_addresses corresponds to: Bitcoin , Bitcoin Cash , Dogecoin , Litecoin , Zcash : a string of comma-separated addresses which appear in the block's coinbase transaction with non-negative value (i.e., which are given part of the block's fees) Ethereum : the block's miner field Cardano : the hash of the pool that created the data, if defined, otherwise the empty string Tezos : the block's baker field The field identifiers corresponds to: Bitcoin , Bitcoin Cash , Dogecoin , Litecoin , Zcash : the field coinbase_param of the block's coinbase transaction Ethereum : the block's extra_data field Cardano : the ticker name of the pool that created the block, if defined, otherwise an empty string Tezos : there is no such field If using BigQuery, the queries for Bitcoin, Bitcoin Cash, Dogecoin, Litecoin, Zcash (see Data Collection ) return data that are parsed with the default_parser script in parsers . The query for Cardano returns data that is parsed using the cardano_parser script in parsers . All other queries return data already in the necessary parsed form, so they are parsed using a \"dummy\" parser.","title":"Parsers"},{"location":"parsers/#parsers","text":"The parser obtains raw data from a full node (see Data Collection page on how to obtain the required data). It parses the data and outputs a json file with a list of entries, each entry corresponding to a block. The input file should be placed in the input directory and named as _raw_data.json . The output file is stored under output//parsed_data.json and is structured as follows: [ { \"number\": \"\", \"timestamp\": \"\", \"reward_addresses\": \",\" \"identifiers\": \"\" } ] number and timestamp are consistent among different blockchains. reward_addresses and identifiers vary, depending on each ledger. 
Specifically, reward_addresses corresponds to: Bitcoin , Bitcoin Cash , Dogecoin , Litecoin , Zcash : a string of comma-separated addresses which appear in the block's coinbase transaction with non-negative value (i.e., which are given part of the block's fees) Ethereum : the block's miner field Cardano : the hash of the pool that created the data, if defined, otherwise the empty string Tezos : the block's baker field The field identifiers corresponds to: Bitcoin , Bitcoin Cash , Dogecoin , Litecoin , Zcash : the field coinbase_param of the block's coinbase transaction Ethereum : the block's extra_data field Cardano : the ticker name of the pool that created the block, if defined, otherwise an empty string Tezos : there is no such field If using BigQuery, the queries for Bitcoin, Bitcoin Cash, Dogecoin, Litecoin, Zcash (see Data Collection ) return data that are parsed with the default_parser script in parsers . The query for Cardano returns data that is parsed using the cardano_parser script in parsers . All other queries return data already in the necessary parsed form, so they are parsed using a \"dummy\" parser.","title":"Parsers"},{"location":"setup/","text":"Setup Installation To install the pooling analysis tool, simply clone this GitHub repository: git clone https://github.com/Blockchain-Technology-Lab/pooling-analysis.git The tool is written in Python 3, therefore a Python 3 interpreter is required in order to run it locally. The requirements file lists the dependencies of the project. Make sure you have all of them installed before running the scripts. To install all of them in one go, run the following command from the root directory of the project: python -m pip install -r requirements.txt Execution The pooling analysis tool is a CLI tool. The run.py script in the root directory of the project invokes the required parsers, mappings and metrics, but it is also possible to execute each module individually. 
The following process describes the most typical workflow. Place all raw data (which could be collected from BigQuery for example; see Data Collection for more details) in the input directory, each file named as _raw_data.json (e.g., bitcoin_raw_data.json ). By default, there is a (very small) sample input file for some supported projects; to use it, remove the prefix sample_ . Run python run.py --ledgers --timeframe to analyze the n specified ledgers for the given timeframe. Both arguments are optional, so it's possible to omit one or both of them; in this case, the default values will be used. Specifically: ledgers accepts any number of the supported ledgers (case-insensitive). For example, --ledgers bitcoin would run the analysis for Bitcoin, while --ledgers Bitcoin Ethereum Cardano would run the analysis for Bitcoin, Ethereum and Cardano. If the ledgers argument is omitted, then all supported ledgers are analyzed. The timeframe argument should be of the form YYYY-MM-DD (month and day can be omitted). For example, --timeframe 2022 would run the analysis for the year 2022, while --timeframe 2022-02 would do it for the month of February 2022. If the timeframe argument is omitted, then a monthly analysis is performed for each month between January 2018 and the current month. The script will print the output of each implemented metric for the specified ledgers and timeframe. All output files can then be found under the output directory, which is automatically created the first time the tool is run.","title":"How to use"},{"location":"setup/#setup","text":"","title":"Setup"},{"location":"setup/#installation","text":"To install the pooling analysis tool, simply clone this GitHub repository: git clone https://github.com/Blockchain-Technology-Lab/pooling-analysis.git The tool is written in Python 3, therefore a Python 3 interpreter is required in order to run it locally. The requirements file lists the dependencies of the project. 
Make sure you have all of them installed before running the scripts. To install all of them in one go, run the following command from the root directory of the project: python -m pip install -r requirements.txt","title":"Installation"},{"location":"setup/#execution","text":"The pooling analysis tool is a CLI tool. The run.py script in the root directory of the project invokes the required parsers, mappings and metrics, but it is also possible to execute each module individually. The following process describes the most typical workflow. Place all raw data (which could be collected from BigQuery for example; see Data Collection for more details) in the input directory, each file named as _raw_data.json (e.g., bitcoin_raw_data.json ). By default, there is a (very small) sample input file for some supported projects; to use it, remove the prefix sample_ . Run python run.py --ledgers --timeframe to analyze the n specified ledgers for the given timeframe. Both arguments are optional, so it's possible to omit one or both of them; in this case, the default values will be used. Specifically: ledgers accepts any number of the supported ledgers (case-insensitive). For example, --ledgers bitcoin would run the analysis for Bitcoin, while --ledgers Bitcoin Ethereum Cardano would run the analysis for Bitcoin, Ethereum and Cardano. If the ledgers argument is omitted, then all supported ledgers are analyzed. The timeframe argument should be of the form YYYY-MM-DD (month and day can be omitted). For example, --timeframe 2022 would run the analysis for the year 2022, while --timeframe 2022-02 would do it for the month of February 2022. If the timeframe argument is omitted, then a monthly analysis is performed for each month between January 2018 and the current month. The script will print the output of each implemented metric for the specified ledgers and timeframe. 
All output files can then be found under the output directory, which is automatically created the first time the tool is run.","title":"Execution"}]}
\ No newline at end of file
+{"config":{"indexing":"full","lang":["en"],"min_search_length":3,"prebuild_index":false,"separator":"[\\s\\-]+"},"docs":[{"location":"","text":"Blockchain Pooling Analysis - Documentation This is the documentation for the Blockchain Pooling Analysis tool developed by the University of Edinburgh's Blockchain Technology Lab. The tool is responsible for analyzing pooling behavior of various blockchains and measuring their subsequent decentralization levels. The relevant source code is available on GitHub . Overview The tool consists of the following modules: Parser Mapping Metrics The parser is responsible for pre-processing the raw data that comes from a full node. It produces a file with all the information that is needed for the mapping. The mapping takes the output of the parser and combines it with some other sources of information. It then outputs a file that reveals the distribution of resources to different entities. In this context, \"resources\" correspond to the number of produced blocks. This distribution is the input for the metrics module, which tracks various decentralization-related metrics and produces files with the results. More details about the different modules can be found in the corresponding Parser , Mapping and Metrics pages. Currently, the supported ledgers are: Bitcoin Bitcoin Cash Cardano Dogecoin Ethereum Litecoin Tezos Zcash We intend to add more ledgers to this list in the future. Contributing This is an open source project licensed under the terms and conditions of the MIT license and CC BY-SA 4.0 . Everyone is welcome to contribute to it by proposing or implementing their ideas. Example contributions include, but are not limited to, reporting potential bugs, supplying useful information for the mappings of supported ledgers, adding support for a new ledger, or making the code more efficient. All contributions to the project will also be covered by the above-mentioned license. 
When making changes in the code, contributors are required to fork the project's repository first and then issue a pull request with their changes. Each PR will be reviewed before being merged to the main branch. Bugs can be reported in the Issues page. Other comments and ideas can be brought up in the project's Discussions . For more information on how to make specific contributions, see the Contribute page.","title":"Home"},{"location":"#blockchain-pooling-analysis-documentation","text":"This is the documentation for the Blockchain Pooling Analysis tool developed by the University of Edinburgh's Blockchain Technology Lab. The tool is responsible for analyzing pooling behavior of various blockchains and measuring their subsequent decentralization levels. The relevant source code is available on GitHub .","title":"Blockchain Pooling Analysis - Documentation"},{"location":"#overview","text":"The tool consists of the following modules: Parser Mapping Metrics The parser is responsible for pre-processing the raw data that comes from a full node. It produces a file with all the information that is needed for the mapping. The mapping takes the output of the parser and combines it with some other sources of information. It then outputs a file that reveals the distribution of resources to different entities. In this context, \"resources\" correspond to the number of produced blocks. This distribution is the input for the metrics module, which tracks various decentralization-related metrics and produces files with the results. More details about the different modules can be found in the corresponding Parser , Mapping and Metrics pages. Currently, the supported ledgers are: Bitcoin Bitcoin Cash Cardano Dogecoin Ethereum Litecoin Tezos Zcash We intend to add more ledgers to this list in the future.","title":"Overview"},{"location":"#contributing","text":"This is an open source project licensed under the terms and conditions of the MIT license and CC BY-SA 4.0 . 
Everyone is welcome to contribute to it by proposing or implementing their ideas. Example contributions include, but are not limited to, reporting potential bugs, supplying useful information for the mappings of supported ledgers, adding support for a new ledger, or making the code more efficient. All contributions to the project will also be covered by the above-mentioned license. When making changes in the code, contributors are required to fork the project's repository first and then issue a pull request with their changes. Each PR will be reviewed before being merged to the main branch. Bugs can be reported in the Issues page. Other comments and ideas can be brought up in the project's Discussions . For more information on how to make specific contributions, see the Contribute page.","title":"Contributing"},{"location":"contribute/","text":"How to contribute You can contribute to the tool by adding support for a ledger, updating the mapping process for an existing ledger, or adding a new metric. In all cases, the information should be submitted via a GitHub PR. Add support for ledgers You can add support for a ledger that is not already supported as follows. Mapping information In the directory mapping_information , there exist three folders ( addresses , clusters , identifiers ). In each folder, add a file named .json , if such information exists for the new ledger (for more details on what type of information each folder corresponds to, see the mapping documentation ). Parser and mapping In the directory src/parsers , create a file named _parser.py , if no existing parser can be reused. In this file, create a new class, which inherits from the DefaultParser class of default_parser.py . Then, override its parse method in order to implement the new parser (or override another method if only small changes are needed, e.g. parse_identifiers if the only thing that differs from the default parser is the way identifiers are decoded). 
In the directory src/mappings , create a file named _mapping.py , if no existing mapping can be reused. In this file, create a new class, which inherits from the DefaultMapping class of default_mapping.py . Then, override its process method. This method takes as input a time period in the form yyyy-mm-dd (e.g., '2022' for the year 2022, '2022-11' for November 2022, '2022-11-12' for 12 November 2022), returns a dictionary of the form {'': } , and creates a csv file with the mapped data for this timeframe in the output directory. Then, you should enable support for the new ledger in the parser and mapping module scripts. Specifically: in the script src/parse.py , import the parser class and assign it to the project's name in the ledger_parser dictionary; in the script src/map.py , import the mapping class and assign it to the project's name in the ledger_mapping dictionary. Notes : You should add an entry in each dictionary, regardless of whether you use a new or existing parser or mapping \u2013 if no new parser or mapping class was created for the project, simply assign the suitable class (e.g. DefaultParser or DefaultMapping) to the project's name in the corresponding dictionary. If you create a new parser/mapping, you should also add unit tests here . Documentation Finally, you should include the new ledger in the documentation pages; specifically: add the ledger to the list of supported ledgers in the repository's main README file add the ledger to the list of supported ledgers in the index documentation page document the new ledger's parser in the corresponding documentation page document how the new ledger's data is retrieved in the corresponding documentation page ; if Google BigQuery is used, add the new query to queries.yaml Update existing mapping information All mapping data are in the folder mapping_information . To update or add information about a supported ledger's mapping, you should open a Pull Request. 
This can be done either via the console or, via the browser, as follows: Open the file that you want to change (e.g., for Bitcoin, follow this link ) in your browser. Click Edit this file . Make your changes in the file. At the bottom, initiate a Pull Request. Write a short and descriptive commit title message (e.g., \"Update 2019 links for company A\"). Select Create a new branch for this commit and start a pull request. In the page that opens, change the PR title (if necessary) and click on Create pull request . When updating the mapping information, the following guidelines should be observed: The link to a pool's website should be active and public. All sources cited should be publicly available and reputable. Unofficial tweets or unavailable or private sources will be rejected. You can use specific keywords in cases where the information is available on-chain. Specifically: homepage : this keyword is used in Cardano, to denote that two pools define the same homepage in their metadata (which are published on-chain) Specifically, for legal_links.json : The value of the pool's name (that is, the first value in each array entry under a company) should be the same as the value that corresponds to a key name in the ledger-specific pool information, as defined in the corresponding addresses , clusters or identifiers file. If this string is not exactly the same (including capitalization), the link will not be identified during the mapping process. There should be no time gaps in a pool's ownership structure. Add metrics To add a new metric, you should take the following steps. First, create a relevant script in the folder src/metrics . The script should include a function named compute_{metric_name} that, given a dictionary of entities (as keys) to number of blocks (as values), outputs a single value (the outcome of the metric). Second, import this new function in src/analyze.py . 
Third, add the name of the metric (which should be the same as the one used in the filename above) and any parameter values it might require to the file config.yaml , under metrics . Fourth, you should add unit tests for the new metric here . Finally, you should update the corresponding documentation page .","title":"How to contribute"},{"location":"contribute/#how-to-contribute","text":"You can contribute to the tool by adding support for a ledger, updating the mapping process for an existing ledger, or adding a new metric. In all cases, the information should be submitted via a GitHub PR.","title":"How to contribute"},{"location":"contribute/#add-support-for-ledgers","text":"You can add support for a ledger that is not already supported as follows.","title":"Add support for ledgers"},{"location":"contribute/#mapping-information","text":"In the directory mapping_information , there exist three folders ( addresses , clusters , identifiers ). In each folder, add a file named .json , if such information exists for the new ledger (for more details on what type of information each folder corresponds to, see the mapping documentation ).","title":"Mapping information"},{"location":"contribute/#parser-and-mapping","text":"In the directory src/parsers , create a file named _parser.py , if no existing parser can be reused. In this file, create a new class, which inherits from the DefaultParser class of default_parser.py . Then, override its parse method in order to implement the new parser (or override another method if only small changes are needed, e.g. parse_identifiers if the only thing that differs from the default parser is the way identifiers are decoded). In the directory src/mappings , create a file named _mapping.py , if no existing mapping can be reused. In this file, create a new class, which inherits from the DefaultMapping class of default_mapping.py . Then, override its process method. 
This method takes as input a time period in the form yyyy-mm-dd (e.g., '2022' for the year 2022, '2022-11' for November 2022, '2022-11-12' for 12 November 2022), returns a dictionary of the form {'': } , and creates a csv file with the mapped data for this timeframe in the output directory. Then, you should enable support for the new ledger in the parser and mapping module scripts. Specifically: in the script src/parse.py , import the parser class and assign it to the project's name in the ledger_parser dictionary; in the script src/map.py , import the mapping class and assign it to the project's name in the ledger_mapping dictionary. Notes : You should add an entry in each dictionary, regardless of whether you use a new or existing parser or mapping \u2013 if no new parser or mapping class was created for the project, simply assign the suitable class (e.g. DefaultParser or DefaultMapping) to the project's name in the corresponding dictionary. If you create a new parser/mapping, you should also add unit tests here .","title":"Parser and mapping"},{"location":"contribute/#documentation","text":"Finally, you should include the new ledger in the documentation pages; specifically: add the ledger to the list of supported ledgers in the repository's main README file add the ledger to the list of supported ledgers in the index documentation page document the new ledger's parser in the corresponding documentation page document how the new ledger's data is retrieved in the corresponding documentation page ; if Google BigQuery is used, add the new query to queries.yaml","title":"Documentation"},{"location":"contribute/#update-existing-mapping-information","text":"All mapping data are in the folder mapping_information . To update or add information about a supported ledger's mapping, you should open a Pull Request. This can be done either via the console or, via the browser, as follows: Open the file that you want to change (e.g., for Bitcoin, follow this link ) in your browser. 
Click Edit this file . Make your changes in the file. At the bottom, initiate a Pull Request. Write a short and descriptive commit title message (e.g., \"Update 2019 links for company A\"). Select Create a new branch for this commit and start a pull request. In the page that opens, change the PR title (if necessary) and click on Create pull request . When updating the mapping information, the following guidelines should be observed: The link to a pool's website should be active and public. All sources cited should be publicly available and reputable. Unofficial tweets or unavailable or private sources will be rejected. You can use specific keywords in cases where the information is available on-chain. Specifically: homepage : this keyword is used in Cardano, to denote that two pools define the same homepage in their metadata (which are published on-chain) Specifically, for legal_links.json : The value of the pool's name (that is, the first value in each array entry under a company) should be the same as the value that corresponds to a key name in the ledger-specific pool information, as defined in the corresponding addresses , clusters or identifiers file. If this string is not exactly the same (including capitalization), the link will not be identified during the mapping process. There should be no time gaps in a pool's ownership structure.","title":"Update existing mapping information"},{"location":"contribute/#add-metrics","text":"To add a new metric, you should take the following steps. First, create a relevant script in the folder src/metrics . The script should include a function named compute_{metric_name} that, given a dictionary of entities (as keys) to number of blocks (as values), outputs a single value (the outcome of the metric). Second, import this new function in src/analyze.py . 
Third, add the name of the metric (which should be the same as the one used in the filename above) and any parameter values it might require to the file config.yaml , under metrics . Fourth, you should add unit tests for the new metric here . Finally, you should update the corresponding documentation page .","title":"Add metrics"},{"location":"data/","text":"Data collection Currently, the data for the analysis of the different ledgers is collected through Google BigQuery . Note that when saving results from BigQuery, you should select the option \"JSONL (newline delimited)\". Sample data & queries Bitcoin Sample raw Bitcoin data are available here . They can be retrieved using Google BigQuery with the following query: SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_bitcoin.transactions`.outputs FROM `bigquery-public-data.crypto_bitcoin.transactions` JOIN `bigquery-public-data.crypto_bitcoin.blocks` ON `bigquery-public-data.crypto_bitcoin.transactions`.block_number = `bigquery-public-data.crypto_bitcoin.blocks`.number WHERE is_coinbase is TRUE AND timestamp > '2017-12-31' Bitcoin Cash Sample raw Bitcoin Cash data are available here . They can be retrieved using Google BigQuery with the following query: SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_bitcoin_cash.transactions`.outputs FROM `bigquery-public-data.crypto_bitcoin_cash.transactions` JOIN `bigquery-public-data.crypto_bitcoin_cash.blocks` ON `bigquery-public-data.crypto_bitcoin_cash.transactions`.block_number = `bigquery-public-data.crypto_bitcoin_cash.blocks`.number WHERE is_coinbase is TRUE AND timestamp > '2018-12-31' Cardano Sample raw Cardano data are available here . 
They can be retrieved using Google BigQuery with the following query: SELECT `iog-data-analytics.cardano_mainnet.block`.slot_no as number, `iog-data-analytics.cardano_mainnet.pool_offline_data`.ticker_name as identifiers, `iog-data-analytics.cardano_mainnet.block`.block_time as timestamp,`iog-data-analytics.cardano_mainnet.block`.pool_hash as reward_addresses FROM `iog-data-analytics.cardano_mainnet.block` LEFT JOIN `iog-data-analytics.cardano_mainnet.pool_offline_data` ON `iog-data-analytics.cardano_mainnet.block`.pool_hash = `iog-data-analytics.cardano_mainnet.pool_offline_data`.pool_hash WHERE `iog-data-analytics.cardano_mainnet.block`.block_time > '2020-12-31' Dogecoin Sample raw Dogecoin data are available here . They can be retrieved using Google BigQuery with the following query: SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_dogecoin.transactions`.outputs FROM `bigquery-public-data.crypto_dogecoin.transactions` JOIN `bigquery-public-data.crypto_dogecoin.blocks` ON `bigquery-public-data.crypto_dogecoin.transactions`.block_number = `bigquery-public-data.crypto_dogecoin.blocks`.number WHERE is_coinbase is TRUE AND timestamp > '2019-12-31' Ethereum Sample raw Ethereum data are available here . They can be retrieved using Google BigQuery with the following query: SELECT number, timestamp, miner as reward_addresses, extra_data as identifiers FROM `bigquery-public-data.crypto_ethereum.blocks` WHERE timestamp > '2018-12-31' Litecoin Sample raw Litecoin data are available here . 
They can be retrieved using Google BigQuery with the following query: SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_litecoin.transactions`.outputs FROM `bigquery-public-data.crypto_litecoin.transactions` JOIN `bigquery-public-data.crypto_litecoin.blocks` ON `bigquery-public-data.crypto_litecoin.transactions`.block_number = `bigquery-public-data.crypto_litecoin.blocks`.number WHERE is_coinbase is TRUE AND timestamp > '2018-12-31' Tezos Sample raw Tezos data are available here . They can be retrieved using Google BigQuery with the following query: SELECT level as number, timestamp, baker as reward_addresses FROM `public-data-finance.crypto_tezos.blocks` WHERE timestamp > '2020-12-31' Zcash Sample raw Zcash data are available here . They can be retrieved using Google BigQuery with the following query: SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_zcash.transactions`.outputs FROM `bigquery-public-data.crypto_zcash.transactions` JOIN `bigquery-public-data.crypto_zcash.blocks` ON `bigquery-public-data.crypto_zcash.transactions`.block_number = `bigquery-public-data.crypto_zcash.blocks`.number WHERE is_coinbase is TRUE AND timestamp > '2018-12-31' Automating the data collection process Instead of executing each of these queries separately on the BigQuery console and saving the results manually, it is possible to automate the process using a script and collect all relevant data in one go. Executing this script will run all queries in this file , so you can also control which queries are run by adding them to or removing them from the file. IMPORTANT: the script uses service account credentials for authentication, therefore before running it, you need to generate the relevant credentials from Google, as described here and save your key in the root directory of the project under the name 'google-service-account-key.json'. 
There is a sample file that you can consult, which shows what your credentials are supposed to look like (but note that this is for informational purposes only, this file is not used in the code). Once you have set up the credentials, you can just run the following command from the src directory to retrieve data for all supported blockchains: python collect_data.py","title":"Data Collection"},{"location":"data/#data-collection","text":"Currently, the data for the analysis of the different ledgers is collected through Google BigQuery . Note that when saving results from BigQuery you should select the option \"JSONL (newline delimited)\".","title":"Data collection"},{"location":"data/#sample-data-queries","text":"","title":"Sample data & queries"},{"location":"data/#bitcoin","text":"Sample raw Bitcoin data are available here . They can be retrieved using Google BigQuery with the following query: SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_bitcoin.transactions`.outputs FROM `bigquery-public-data.crypto_bitcoin.transactions` JOIN `bigquery-public-data.crypto_bitcoin.blocks` ON `bigquery-public-data.crypto_bitcoin.transactions`.block_number = `bigquery-public-data.crypto_bitcoin.blocks`.number WHERE is_coinbase is TRUE AND timestamp > '2017-12-31'","title":"Bitcoin"},{"location":"data/#bitcoin-cash","text":"Sample raw Bitcoin Cash data are available here . 
They can be retrieved using Google BigQuery with the following query: SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_bitcoin_cash.transactions`.outputs FROM `bigquery-public-data.crypto_bitcoin_cash.transactions` JOIN `bigquery-public-data.crypto_bitcoin_cash.blocks` ON `bigquery-public-data.crypto_bitcoin_cash.transactions`.block_number = `bigquery-public-data.crypto_bitcoin_cash.blocks`.number WHERE is_coinbase is TRUE AND timestamp > '2018-12-31'","title":"Bitcoin Cash"},{"location":"data/#cardano","text":"Sample raw Cardano data are available here . They can be retrieved using Google BigQuery with the following query: SELECT `iog-data-analytics.cardano_mainnet.block`.slot_no as number, `iog-data-analytics.cardano_mainnet.pool_offline_data`.ticker_name as identifiers, `iog-data-analytics.cardano_mainnet.block`.block_time as timestamp,`iog-data-analytics.cardano_mainnet.block`.pool_hash as reward_addresses FROM `iog-data-analytics.cardano_mainnet.block` LEFT JOIN `iog-data-analytics.cardano_mainnet.pool_offline_data` ON `iog-data-analytics.cardano_mainnet.block`.pool_hash = `iog-data-analytics.cardano_mainnet.pool_offline_data`.pool_hash WHERE `iog-data-analytics.cardano_mainnet.block`.block_time > '2020-12-31'","title":"Cardano"},{"location":"data/#dogecoin","text":"Sample raw Dogecoin data are available here . 
They can be retrieved using Google BigQuery with the following query: SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_dogecoin.transactions`.outputs FROM `bigquery-public-data.crypto_dogecoin.transactions` JOIN `bigquery-public-data.crypto_dogecoin.blocks` ON `bigquery-public-data.crypto_dogecoin.transactions`.block_number = `bigquery-public-data.crypto_dogecoin.blocks`.number WHERE is_coinbase is TRUE AND timestamp > '2019-12-31'","title":"Dogecoin"},{"location":"data/#ethereum","text":"Sample raw Ethereum data are available here . They can be retrieved using Google BigQuery with the following query: SELECT number, timestamp, miner as reward_addresses, extra_data as identifiers FROM `bigquery-public-data.crypto_ethereum.blocks` WHERE timestamp > '2018-12-31'","title":"Ethereum"},{"location":"data/#litecoin","text":"Sample raw Litecoin data are available here . They can be retrieved using Google BigQuery with the following query: SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_litecoin.transactions`.outputs FROM `bigquery-public-data.crypto_litecoin.transactions` JOIN `bigquery-public-data.crypto_litecoin.blocks` ON `bigquery-public-data.crypto_litecoin.transactions`.block_number = `bigquery-public-data.crypto_litecoin.blocks`.number WHERE is_coinbase is TRUE AND timestamp > '2018-12-31'","title":"Litecoin"},{"location":"data/#tezos","text":"Sample raw Tezos data are available here . They can be retrieved using Google BigQuery with the following query: SELECT level as number, timestamp, baker as reward_addresses FROM `public-data-finance.crypto_tezos.blocks` WHERE timestamp > '2020-12-31'","title":"Tezos"},{"location":"data/#zcash","text":"Sample raw Zcash data are available here . 
They can be retrieved using Google BigQuery with the following query: SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_zcash.transactions`.outputs FROM `bigquery-public-data.crypto_zcash.transactions` JOIN `bigquery-public-data.crypto_zcash.blocks` ON `bigquery-public-data.crypto_zcash.transactions`.block_number = `bigquery-public-data.crypto_zcash.blocks`.number WHERE is_coinbase is TRUE AND timestamp > '2018-12-31'","title":"Zcash"},{"location":"data/#automating-the-data-collection-process","text":"Instead of executing each of these queries separately on the BigQuery console and saving the results manually, it is possible to automate the process using a script and collect all relevant data in one go. Executing this script will run all queries in this file , so you can also control which queries are run by adding them to or removing them from the file. IMPORTANT: the script uses service account credentials for authentication, therefore before running it, you need to generate the relevant credentials from Google, as described here and save your key in the root directory of the project under the name 'google-service-account-key.json'. There is a sample file that you can consult, which shows what your credentials are supposed to look like (but note that this is for informational purposes only, this file is not used in the code). Once you have set up the credentials, you can just run the following command from the src directory to retrieve data for all supported blockchains: python collect_data.py","title":"Automating the data collection process"},{"location":"mappings/","text":"Mappings A mapping obtains the parsed data (from output//parsed_data.json ) and outputs a csv file that maps blocks to entities, structured as follows: Entity,Resources ,<(int) number of blocks> The name of the csv file is the timeframe, over which the mapping was executed (e.g., 2021-04.csv ). 
The file is stored in the project's output directory ( output// ). The logic of the mapping depends on the type of clustering we want to achieve. So, different mappings will output different results, even if applied on the same data. An exception to this is the \"no-cluster\" mapping (DummyMapping in the code), which maps blocks to reward addresses, so it doesn't perform any extra processing on the raw data. Mapping Information To assist the mapping process, the directory mapping_information/ contains mapping information about the supported projects. There exist three subdirectories and two additional files. In each subdirectory there exists a file for the corresponding ledger data, if such data exists. Identifiers The files under identifiers define information about block creators. Each key corresponds to a tag or ticker, by which the pool is identifiable in its produced blocks. The value for each key is a dictionary of pool-related information, specifically its name, a URL to its homepage, etc. Each file's structure is as follows: { \"P1\": { \"name\": \"Pool P1\", \"homepage\": \"example.com/p1\" }, \"--P2--\": { \"name\": \"Pool P2\", \"homepage\": \"example.com/p2\" } } Clusters The files under clusters define information about pool clusters. This information is organized per cluster. For each cluster, an array of pool-related information is defined. Each item in the array defines the pool's name, the time window during which the pool belonged to the cluster (from the beginning of from until the beginning of to excluding ), and the publicly available source of information, via which the link between the pool and the cluster is established. 
Each file's structure is as follows: { \"cluster A\": [ {\"name\": \"P1\", \"from\": \"\", \"to\": \"2023\", \"source\": \"example.com/link1\"} ], \"cluster B\": [ {\"name\": \"--P2--\", \"from\": \"\", \"to\": \"\", \"source\": \"example.com/link2\"} ] } Addresses The files under addresses define ownership information about addresses. As with clusters, for each address the pool ownership information defines the pool's name and a public source of information about the ownership. Each file's structure is as follows: { \"address1\": {\"name\": \"Pool P2\", \"source\": \"example.com\"}, } Legal links The file legal_links.json defines legal links between pools and companies, based on off-chain information. For example, it defines ownership information of a pool by a company. The structure of the file is as follows: { \"\": [ {\"name\": \"\", \"from\": \"\", \"to\": \"\", \"source\": \"\"} ] } The values for each entry are the same as clusters in the above pool information. Special addresses The file special_addresses.json defines per-project information about addresses that are not related to some entity but are used for protocol-specific reasons (e.g. treasury address). The format of the file is the following: { \"Project A\": [ {\"address\": \"A special address 1\", \"source\": \"some.public.source\"}, {\"address\": \"A special address 2\", \"source\": \"some.public.source\"} ], \"Project B\": [ {\"address\": \"B special address\", \"source\": \"some.public.source\"} ] } Mapping process implementation In our implementation, the mapping of a block uses the auxiliary information as follows. First, it iterates over all known tags and compares each one with the block's identifiers. If the tag is a substring of the parameter, then we have a match. Second, if the first step fails, we compare the block's reward addresses with known pool addresses and again look for a match. 
In both cases, if there is a match, then: (i) we map the block to the matched pool; (ii) we associate all of the block's reward addresses (that is, the addresses that receive fees from the block) with the matched pool. In essence, the identifiers are the principal element for mapping a block to an entity and the known addresses are the fallback mechanism. If there is a match, we also parse the auxiliary information, such as pool ownership or clusters, in order to assign the block to the top level entity, e.g., the pool's parent company or cluster. If both mechanisms fail, then no match is found. In this case, we assign the reward addresses as the block's entity.","title":"Mappings"},{"location":"mappings/#mappings","text":"A mapping obtains the parsed data (from output//parsed_data.json ) and outputs a csv file that maps blocks to entities, structured as follows: Entity,Resources ,<(int) number of blocks> The name of the csv file is the timeframe, over which the mapping was executed (e.g., 2021-04.csv ). The file is stored in the project's output directory ( output// ). The logic of the mapping depends on the type of clustering we want to achieve. So, different mappings will output different results, even if applied on the same data. An exception to this is the \"no-cluster\" mapping (DummyMapping in the code), which maps blocks to reward addresses, so it doesn't perform any extra processing on the raw data.","title":"Mappings"},{"location":"mappings/#mapping-information","text":"To assist the mapping process, the directory mapping_information/ contains mapping information about the supported projects. There exist three subdirectories and two additional files. In each subdirectory there exists a file for the corresponding ledger data, if such data exists.","title":"Mapping Information"},{"location":"mappings/#identifiers","text":"The files under identifiers define information about block creators. 
Each key corresponds to a tag or ticker, by which the pool is identifiable in its produced blocks. The value for each key is a dictionary of pool-related information, specifically its name, a URL to its homepage, etc. Each file's structure is as follows: { \"P1\": { \"name\": \"Pool P1\", \"homepage\": \"example.com/p1\" }, \"--P2--\": { \"name\": \"Pool P2\", \"homepage\": \"example.com/p2\" } }","title":"Identifiers"},{"location":"mappings/#clusters","text":"The files under clusters define information about pool clusters. This information is organized per cluster. For each cluster, an array of pool-related information is defined. Each item in the array defines the pool's name, the time window during which the pool belonged to the cluster (from the beginning of from until the beginning of to excluding ), and the publicly available source of information, via which the link between the pool and the cluster is established. Each file's structure is as follows: { \"cluster A\": [ {\"name\": \"P1\", \"from\": \"\", \"to\": \"2023\", \"source\": \"example.com/link1\"} ], \"cluster B\": [ {\"name\": \"--P2--\", \"from\": \"\", \"to\": \"\", \"source\": \"example.com/link2\"} ] }","title":"Clusters"},{"location":"mappings/#addresses","text":"The files under addresses define ownership information about addresses. As with clusters, for each address the pool ownership information defines the pool's name and a public source of information about the ownership. Each file's structure is as follows: { \"address1\": {\"name\": \"Pool P2\", \"source\": \"example.com\"}, }","title":"Addresses"},{"location":"mappings/#legal-links","text":"The file legal_links.json defines legal links between pools and companies, based on off-chain information. For example, it defines ownership information of a pool by a company. 
The structure of the file is as follows: { \"\": [ {\"name\": \"\", \"from\": \"\", \"to\": \"\", \"source\": \"\"} ] } The values for each entry are the same as clusters in the above pool information.","title":"Legal links"},{"location":"mappings/#special-addresses","text":"The file special_addresses.json defines per-project information about addresses that are not related to some entity but are used for protocol-specific reasons (e.g. treasury address). The format of the file is the following: { \"Project A\": [ {\"address\": \"A special address 1\", \"source\": \"some.public.source\"}, {\"address\": \"A special address 2\", \"source\": \"some.public.source\"} ], \"Project B\": [ {\"address\": \"B special address\", \"source\": \"some.public.source\"} ] }","title":"Special addresses"},{"location":"mappings/#mapping-process-implementation","text":"In our implementation, the mapping of a block uses the auxiliary information as follows. First, it iterates over all known tags and compares each one with the block's identifiers. If the tag is a substring of the parameter, then we have a match. Second, if the first step fails, we compare the block's reward addresses with known pool addresses and again look for a match. In both cases, if there is a match, then: (i) we map the block to the matched pool; (ii) we associate all of the block's reward addresses (that is, the addresses that receive fees from the block) with the matched pool. In essence, the identifiers are the principal element for mapping a block to an entity and the known addresses are the fallback mechanism. If there is a match, we also parse the auxiliary information, such as pool ownership or clusters, in order to assign the block to the top level entity, e.g., the pool's parent company or cluster. If both mechanisms fail, then no match is found. 
In this case, we assign the reward addresses as the block's entity.","title":"Mapping process implementation"},{"location":"metrics/","text":"Metrics A metric gets the mapped data (see above Mapping ) and outputs a relevant value. The implemented metrics are the following (with more to be added in the future): Nakamoto coefficient : The Nakamoto coefficient is the minimum number of entities that collectively produce more than 50% of the produced blocks within a given timeframe. The output of the metric is a tuple of the Nakamoto coefficient and the power percentage that these entities control. Gini coefficient : The Gini coefficient represents the degree of inequality in block production. The output of the metric is a decimal number in [0,1]. Values close to 0 indicate equality (all entities in the system produce the same number of blocks) and values close to 1 indicate inequality (one entity produces most or all blocks). Entropy : Entropy represents the expected amount of information in the distribution of blocks across entities. Entropy is parameterized by a base rate \u03b1, which defines different types of entropy: (a) -1: min; (b) 0: Hartley; (c) 1: Shannon; (d) 2: collision. The output of the metric is a real number. Typically, a high number of different involved entities, each with approximately equal power, should yield high entropy. Each metric is implemented in a separate Python script in the folder metrics . Each script defines a function named compute_ , which takes as input a dictionary of the form {'': } (and possibly other relevant arguments) and outputs the corresponding metric values.","title":"Metrics"},{"location":"metrics/#metrics","text":"A metric gets the mapped data (see above Mapping ) and outputs a relevant value. 
The implemented metrics are the following (with more to be added in the future): Nakamoto coefficient : The Nakamoto coefficient is the minimum number of entities that collectively produce more than 50% of the produced blocks within a given timeframe. The output of the metric is a tuple of the Nakamoto coefficient and the power percentage that these entities control. Gini coefficient : The Gini coefficient represents the degree of inequality in block production. The output of the metric is a decimal number in [0,1]. Values close to 0 indicate equality (all entities in the system produce the same number of blocks) and values close to 1 indicate inequality (one entity produces most or all blocks). Entropy : Entropy represents the expected amount of information in the distribution of blocks across entities. Entropy is parameterized by a base rate \u03b1, which defines different types of entropy: (a) -1: min; (b) 0: Hartley; (c) 1: Shannon; (d) 2: collision. The output of the metric is a real number. Typically, a high number of different involved entities, each with approximately equal power, should yield high entropy. Each metric is implemented in a separate Python script in the folder metrics . Each script defines a function named compute_ , which takes as input a dictionary of the form {'': } (and possibly other relevant arguments) and outputs the corresponding metric values.","title":"Metrics"},{"location":"parsers/","text":"Parsers The parser obtains raw data from a full node (see Data Collection page on how to obtain the required data). It parses the data and outputs a json file with a list of entries, each entry corresponding to a block. The input file should be placed in the input directory and named as _raw_data.json . The output file is stored under output//parsed_data.json and is structured as follows: [ { \"number\": \"\", \"timestamp\": \"\", \"reward_addresses\": \",\" \"identifiers\": \"\" } ] number and timestamp are consistent among different blockchains. 
reward_addresses and identifiers vary, depending on each ledger. Specifically, reward_addresses corresponds to: Bitcoin , Bitcoin Cash , Dogecoin , Litecoin , Zcash : a string of comma-separated addresses which appear in the block's coinbase transaction with non-negative value (i.e., which are given part of the block's fees) Ethereum : the block's miner field Cardano : the hash of the pool that created the data, if defined, otherwise the empty string Tezos : the block's baker field The field identifiers corresponds to: Bitcoin , Bitcoin Cash , Dogecoin , Litecoin , Zcash : the field coinbase_param of the block's coinbase transaction Ethereum : the block's extra_data field Cardano : the ticker name of the pool that created the block, if defined, otherwise an empty string Tezos : there is no such field If using BigQuery, the queries for Bitcoin, Bitcoin Cash, Dogecoin, Litecoin, Zcash (see Data Collection ) return data that are parsed with the default_parser script in parsers . The query for Cardano returns data that is parsed using the cardano_parser script in parsers . All other queries return data already in the necessary parsed form, so they are parsed using a \"dummy\" parser.","title":"Parsers"},{"location":"parsers/#parsers","text":"The parser obtains raw data from a full node (see Data Collection page on how to obtain the required data). It parses the data and outputs a json file with a list of entries, each entry corresponding to a block. The input file should be placed in the input directory and named as _raw_data.json . The output file is stored under output//parsed_data.json and is structured as follows: [ { \"number\": \"\", \"timestamp\": \"\", \"reward_addresses\": \",\" \"identifiers\": \"\" } ] number and timestamp are consistent among different blockchains. reward_addresses and identifiers vary, depending on each ledger. 
Specifically, reward_addresses corresponds to: Bitcoin , Bitcoin Cash , Dogecoin , Litecoin , Zcash : a string of comma-separated addresses which appear in the block's coinbase transaction with non-negative value (i.e., which are given part of the block's fees) Ethereum : the block's miner field Cardano : the hash of the pool that created the data, if defined, otherwise the empty string Tezos : the block's baker field The field identifiers corresponds to: Bitcoin , Bitcoin Cash , Dogecoin , Litecoin , Zcash : the field coinbase_param of the block's coinbase transaction Ethereum : the block's extra_data field Cardano : the ticker name of the pool that created the block, if defined, otherwise an empty string Tezos : there is no such field If using BigQuery, the queries for Bitcoin, Bitcoin Cash, Dogecoin, Litecoin, Zcash (see Data Collection ) return data that are parsed with the default_parser script in parsers . The query for Cardano returns data that is parsed using the cardano_parser script in parsers . All other queries return data already in the necessary parsed form, so they are parsed using a \"dummy\" parser.","title":"Parsers"},{"location":"setup/","text":"Setup Installation To install the pooling analysis tool, simply clone this GitHub repository: git clone https://github.com/Blockchain-Technology-Lab/pooling-analysis.git The tool is written in Python 3, therefore a Python 3 interpreter is required in order to run it locally. The requirements file lists the dependencies of the project. Make sure you have all of them installed before running the scripts. To install all of them in one go, run the following command from the root directory of the project: python -m pip install -r requirements.txt Execution The pooling analysis tool is a CLI tool. The run.py script in the root directory of the project invokes the required parsers, mappings and metrics, but it is also possible to execute each module individually. 
The following process describes the most typical workflow. Place all raw data (which could be collected from BigQuery for example; see Data Collection for more details) in the input directory, each file named as _raw_data.json (e.g., bitcoin_raw_data.json ). By default, there is a (very small) sample input file for some supported projects; to use it, remove the prefix sample_ . Run python run.py --ledgers --timeframe to analyze the n specified ledgers for the given timeframe. Both arguments are optional, so it's possible to omit one or both of them; in this case, the default values will be used. Specifically: ledgers accepts any number of the supported ledgers (case-insensitive). For example, --ledgers bitcoin would run the analysis for Bitcoin, while --ledgers Bitcoin Ethereum Cardano would run the analysis for Bitcoin, Ethereum and Cardano. If the ledgers argument is omitted, then all supported ledgers are analyzed. The timeframe argument should be of the form YYYY-MM-DD (month and day can be omitted). For example, --timeframe 2022 would run the analysis for the year 2022, while --timeframe 2022-02 would do it for the month of February 2022. If the timeframe argument is omitted, then a monthly analysis is performed for each month between January 2018 and the current month. The script will print the output of each implemented metric for the specified ledgers and timeframe. All output files can then be found under the output directory, which is automatically created the first time the tool is run.","title":"How to use"},{"location":"setup/#setup","text":"","title":"Setup"},{"location":"setup/#installation","text":"To install the pooling analysis tool, simply clone this GitHub repository: git clone https://github.com/Blockchain-Technology-Lab/pooling-analysis.git The tool is written in Python 3, therefore a Python 3 interpreter is required in order to run it locally. The requirements file lists the dependencies of the project. 
Make sure you have all of them installed before running the scripts. To install all of them in one go, run the following command from the root directory of the project: python -m pip install -r requirements.txt","title":"Installation"},{"location":"setup/#execution","text":"The pooling analysis tool is a CLI tool. The run.py script in the root directory of the project invokes the required parsers, mappings and metrics, but it is also possible to execute each module individually. The following process describes the most typical workflow. Place all raw data (which could be collected from BigQuery for example; see Data Collection for more details) in the input directory, each file named as _raw_data.json (e.g., bitcoin_raw_data.json ). By default, there is a (very small) sample input file for some supported projects; to use it, remove the prefix sample_ . Run python run.py --ledgers --timeframe to analyze the n specified ledgers for the given timeframe. Both arguments are optional, so it's possible to omit one or both of them; in this case, the default values will be used. Specifically: ledgers accepts any number of the supported ledgers (case-insensitive). For example, --ledgers bitcoin would run the analysis for Bitcoin, while --ledgers Bitcoin Ethereum Cardano would run the analysis for Bitcoin, Ethereum and Cardano. If the ledgers argument is omitted, then all supported ledgers are analyzed. The timeframe argument should be of the form YYYY-MM-DD (month and day can be omitted). For example, --timeframe 2022 would run the analysis for the year 2022, while --timeframe 2022-02 would do it for the month of February 2022. If the timeframe argument is omitted, then a monthly analysis is performed for each month between January 2018 and the current month. The script will print the output of each implemented metric for the specified ledgers and timeframe. 
All output files can then be found under the output directory, which is automatically created the first time the tool is run.","title":"Execution"}]}
\ No newline at end of file
diff --git a/sitemap.xml.gz b/sitemap.xml.gz
index 6ef9d33..b382501 100644
Binary files a/sitemap.xml.gz and b/sitemap.xml.gz differ