
Added Balance Pool Group data for Cardano #145

Open
wants to merge 1 commit into main

Conversation

Cerkoryn

All Submissions:

  • Have you followed the guidelines in our Contributing documentation?
  • Have you verified that there aren't any other open Pull Requests for the same update/change?
  • Does the Pull Request pass all tests?

Description

I added a different data source using Balance's Pool Group API, available here. The public API available at the bottom of the page provides a more conservative grouping of Cardano Pool Groups that is in line with other community tools such as Cexplorer. The script I used for obtaining and parsing the data is in the cardano_preprocessing folder as get_cardano_info_balance.py. By default, the script saves the identifier and cluster files within that subfolder to prevent overwriting, but cardano_pool_data.json will get overwritten.

Additionally, I made minor changes in a few other files to address UTF-8 encoding issues. A slightly more impactful change was made in gini.py to prevent an integer overflow issue I was running into; however, this seems to have caused test_analyze.py to fail. I was not able to run the consensus tool for Dogecoin without this change, though, so feel free to ignore it (or any of the others) if necessary.

Checklist

Update Mapping Support Information Submissions:

  • For which ledger do you update the mapping information?
    • Cardano
  • What mapping information do you update?
    • identifiers
    • addresses
    • clusters
    • legal links
  • Did you update the tests (if needed)?

@LadyChristina (Member) left a comment

Thanks a lot for your contributions! Overall, I think it is a good idea to integrate pool data from Balance and other explorers into our tool, but I have identified some issues with the current code, so please take a look and let me know your thoughts on my comments.

for pool in pools:
    current_time = time.time()
    elapsed_time = current_time - last_request_time
    if elapsed_time < request_interval:  # Trying to stay within Cardanoscan's rate limits.
Member

The script seems to be about getting data from Balance but the comment mentions Cardanoscan - are they the same thing?
Same applies to the comment on L37

Author

The pool group data (known as clusters in this consensus tool) is available via a public API on Balance's website. However, the Pool IDs in Balance's data are in BECH32 format, which is different from the format used for Pool IDs in the source data. I needed a way to map one to the other and found that Cardanoscan (a Cardano block explorer) stores both together. There may be a better way to do this, but I am not familiar with any techniques for it.
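For illustration, the BECH32-to-hex mapping can in principle be done locally, without querying an explorer, since a bech32 pool ID is just a re-encoding of the raw pool hash. A minimal sketch, assuming the PyPI `bech32` package (this is not what the PR's script does):

```python
from bech32 import bech32_decode, convertbits  # pip install bech32

def pool_bech32_to_hex(pool_id):
    """Convert a bech32 pool ID (pool1...) to the hex-encoded pool hash."""
    hrp, data = bech32_decode(pool_id)  # hrp should be 'pool' for pool IDs
    if hrp is None:
        raise ValueError(f"Invalid bech32 pool ID: {pool_id}")
    decoded = convertbits(data, 5, 8, False)  # regroup 5-bit words into 8-bit bytes
    return bytes(decoded).hex()
```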

Member

Oh ok, I get it. I think it's fine to do it this way, but it would help to add a comment that explains it, so that it's clearer.

    representing a pool.
    """
    logging.info("Getting pool data..")
    filename = 'cardano_pool_data.json'
Member

I would suggest saving this as a separate file instead of overwriting the one with on-chain data, as this way we lose a lot of information, such as a pool's ticker, homepage and name, which seem to be missing here. There is also a test that fails because a block gets mapped to a pool ticker (CFLOW) instead of a pool name (CashFlow).

Member

In general, I think it's nice to get additional data from another source (Balance) but perhaps it would make more sense to employ this only for the pools and clusters that are not already identified by our current method of looking at on-chain data directly. One approach for doing this would be to first load the current data and then update as needed before writing the new file. This is especially relevant for the identifiers file, where the one you have uploaded now contains no links to the pools' websites and is therefore inconsistent with the files of all other ledgers.
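A minimal sketch of this load-then-update approach; the file path, schema and `source` field below are illustrative assumptions, not the repo's actual layout:

```python
import json

def merge_balance_identifiers(identifiers_path, balance_identifiers):
    """Fill in only the pools that the on-chain method hasn't already identified."""
    with open(identifiers_path, encoding='utf-8') as f:
        identifiers = json.load(f)
    for pool_hash, entry in balance_identifiers.items():
        if pool_hash not in identifiers:  # keep existing on-chain entries untouched
            identifiers[pool_hash] = {**entry, 'source': 'Balance API'}
    with open(identifiers_path, 'w', encoding='utf-8') as f:
        json.dump(identifiers, f, indent=4, ensure_ascii=False)
```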

Author

That makes sense. My primary reason for integrating the Balance data was to validate the EDI consensus tool against known data sourced by the community. Although the community data doesn't have the rigorously verified legal links, websites, etc., it was heartening to see that the EDI methodology produced results very similar to those of community tools. For example, Balance reports a Nakamoto Coefficient of 30 by simply counting pool stake, while this tool, which counts actual historical blocks, gives a Nakamoto Coefficient of 32 from the same data. The difference could likely be explained by the randomness of slot battles or by pools missing their propagation windows.
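For context, the Nakamoto Coefficient is the minimum number of entities that together control more than half of the resource being measured. A minimal sketch of the computation (not the tool's implementation), which works the same whether the inputs are pool stakes or historical block counts:

```python
def nakamoto_coefficient(amounts):
    """Smallest number of entities whose combined amounts exceed 50% of the total."""
    total = sum(amounts)
    running = 0
    for count, amount in enumerate(sorted(amounts, reverse=True), start=1):
        running += amount
        if running > total / 2:
            return count
    return len(amounts)
```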

I could save the Balance data to separate files, but another option I've considered would be to add a --balance flag to the existing get_cardano_info.py. This approach could use the existing identifiers/clusters as a source of truth (retaining the websites, links, etc.) and only fill in the gaps that are missing. The link or source fields for these entries could be set to the Balance API.

Otherwise, I could simply merge the Balance data directly into the existing identifiers/clusters JSON files, which is less invasive and in line with your contribution guidelines.

Do you have a preference for any of these options? I can get to work on these changes, but it may take me some time to implement them in the pull request.

@LadyChristina (Member) · Apr 7, 2024

If you could add them to the existing identifiers / clusters files that would be great! Thanks again for taking the time to contribute and apologies for the slow responses :)

    Queries the Balance Pool Groups API and Cardanoscan to get pool data and write it to a json file.
    :returns: dictionary, where each key is a pool's hash and the corresponding value is a dictionary with the
        pool's metadata (name, ticker, homepage, description)
    :returns: list of dictionaries, where each each dictionary contains metadata (pool_hash, pool_id, pool_group, pool_ticker)
Member

duplicate "each"

Member

Also, it seems that two separate lists of dictionaries are returned, so it should be noted here that the return value is actually a tuple.
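For example, the docstring could be reworded along these lines (a sketch; the function name is hypothetical, standing in for the actual one in get_cardano_info_balance.py):

```python
def get_pool_data():  # hypothetical name
    """
    Queries the Balance Pool Groups API and Cardanoscan to get pool data and writes it to a json file.

    :returns: tuple of (pool_data, pool_groups), where pool_data is a dictionary mapping each pool's
        hash to a dictionary with the pool's metadata (name, ticker, homepage, description), and
        pool_groups is a list of dictionaries, where each dictionary contains metadata
        (pool_hash, pool_id, pool_group, pool_ticker)
    """
```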

if __name__ == '__main__':
    logging.basicConfig(format='[%(asctime)s] %(message)s', datefmt='%Y/%m/%d %I:%M:%S %p', level=logging.INFO)

    #mapping_info_dir = pathlib.Path(__file__).parent.parent
Member

this line can be deleted

    logging.basicConfig(format='[%(asctime)s] %(message)s', datefmt='%Y/%m/%d %I:%M:%S %p', level=logging.INFO)

    #mapping_info_dir = pathlib.Path(__file__).parent.parent
    mapping_info_dir = pathlib.Path(__file__).parent
Member

However, saving the files in this directory will throw an error unless the subdirectories "clusters" and "identifiers" are manually created. This could be prevented by automatically creating the subdirectories if they don't exist.
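A small sketch of what that could look like, continuing from the snippet above:

```python
import pathlib

mapping_info_dir = pathlib.Path(__file__).parent
# Create the output subdirectories if they don't already exist, so the
# file writes below don't raise FileNotFoundError.
for subdir in ('clusters', 'identifiers'):
    (mapping_info_dir / subdir).mkdir(parents=True, exist_ok=True)
```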

@@ -27,4 +27,11 @@ def gini(array):
    array = np.sort(array)
    index = np.arange(1, array.shape[0] + 1)
    n = array.shape[0]
    return (np.sum((2 * index - n - 1) * array)) / (n * np.sum(array))
    # Normalize the array to prevent overflow
Member

I think these changes are fine, but two tests fail because of the floating point differences introduced here. One option is to update the tests to use the new values, as they are not exactly wrong. Another option would be to round the Gini coefficient (and perhaps all other metrics too) to some fixed number of decimals, e.g. 5, and then use the same number of digits when testing.
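A minimal sketch combining the normalization with the suggested rounding, using the formula from the diff above (the 5-decimal rounding is the example value from this comment, not the PR's actual code):

```python
import numpy as np

def gini(array):
    """Gini coefficient of a 1-D array of non-negative values."""
    array = np.sort(np.asarray(array, dtype=np.float64))
    array = array / np.sum(array)  # normalize to prevent overflow with large integer inputs
    index = np.arange(1, array.shape[0] + 1)
    n = array.shape[0]
    return round(np.sum((2 * index - n - 1) * array) / (n * np.sum(array)), 5)
```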
