The Internet Yellow Pages could not exist without all the awesome prior research and data sources. We list them all here, with their corresponding licenses where available. You will need to comply with these licenses if you use the public instance or create a dump that includes these data sources.
Please refer to the READMEs in the respective crawler directories for more information.
We retrieve route server looking glass snapshots from the following IXPs.
Name | URL |
---|---|
AMS-IX | https://lg.ams-ix.net/ |
BCIX | https://lg.bcix.de/ |
DE-CIX | https://lg.de-cix.net/ |
IX.br | https://lg.ix.br/ |
LINX | https://alice-rs.linx.net/ |
Megaport | https://lg.megaport.com/ |
Netnod | https://lg.netnod.se/ |
We use APNIC's AS population estimate.
We use the as2rel, peer-stats, and pfx2as datasets from BGPKIT.
Use of this data is authorized under their Acceptable Use Agreement.
We use AS names, AS tags, and anycast prefix tags provided by BGP.Tools.
We use two datasets from CAIDA, whose use is authorized under their Acceptable Use Agreement:
- CAIDA AS Rank, https://doi.org/10.21986/CAIDA.DATA.AS-RANK
- The CAIDA UCSD IXPs Dataset, https://www.caida.org/catalog/datasets/ixps
We use the Cisco Umbrella Popularity List.
We use URL testing lists from The Citizen Lab.
Citizen Lab and Others. 2014. URL Testing Lists Intended for Discovering Website Censorship. https://github.com/citizenlab/test-lists.
This data is licensed under CC BY-NC-SA 4.0. No changes were made to the data.
We use the `radar/dns/top/ases`, `radar/dns/top/locations`, `radar/ranking/top`, and `radar/datasets` endpoints of the Cloudflare Radar API.
This data is licensed under CC BY-NC 4.0. No changes were made to the data.
We use AS names provided by Emile Aben and others with permission (Hi Emile!).
We use three datasets from the Internet Health Report (that's us!): Country Dependency, AS Hegemony, and Route Origin Validation.
This data is licensed under CC BY-NC-SA 4.0. No changes were made to the data.
We use the AS to organization mapping from the Internet Intelligence Lab at Georgia Tech.
Z. Chen, Z. Bischof, C. Testart, A. Dainotti, "AS to Organization Mapping", Internet Intelligence Lab at Georgia Tech, https://github.com/InetIntel/Dataset-AS-to-Organization-Mapping
Use of this data is authorized under their Acceptable Use Agreement.
We use the extended allocation and assignment reports provided by the Number Resource Organization.
We use several datasets from OpenINTEL, a joint project of the University of Twente, SURF, SIDN Labs and NLnet Labs.
The tranco1m and umbrella1m datasets are licensed under CC BY-NC-SA 4.0. No changes were made to the data. In addition, there are Terms of Use for this data.
The DNS Dependency Graph tool is a joint project of the University of Twente and IIJ Research Laboratory.
Other datasets are used with permission from OpenINTEL.
We use the daily routing snapshots from Packet Clearing House.
This data is licensed under CC BY-NC-SA 3.0. No changes were made to the data.
We use the `fac`, `ix`, `ixlan`, `netfac`, and `org` endpoints of the PeeringDB API.
Use of this data is authorized under their Acceptable Use Policy.
We use AS names, Atlas measurement information, and RPKI data from the RIPE NCC and RIPE Atlas.
We use rDNS data from RIR-data.org, a joint project of SimulaMet and the University of Twente.
Alfred Arouna, Ioana Livadariu, and Mattijs Jonker. "Lowering the Barriers to Working with Public RIR-Level Data." Proceedings of the 2023 Workshop on Applied Networking Research (ANRW '23).
We use the Stanford ASdb dataset provided by the Stanford Empirical Security Research Group.
ASdb: A System for Classifying Owners of Autonomous Systems. Maya Ziv, Liz Izhikevich, Kimberly Ruth, Katherine Izhikevich, and Zakir Durumeric. ACM Internet Measurement Conference (IMC), November 2021.
We use the Tranco list provided by the DistriNet Research Unit (KU Leuven), TU Delft, and LIG.
The Tranco list combines lists from five providers:
- Cisco Umbrella
- Majestic (available under a CC BY 3.0 license)
- Farsight
- Chrome User Experience Report (CrUX) (available under a CC BY-SA 4.0 license)
- Cloudflare Radar (available under a CC BY-NC 4.0 license)
We use the RoVista dataset provided by the NetSecLab group at Virginia Tech.
RoVista: Measuring and Understanding the Route Origin Validation (ROV) in RPKI. Weitong Li, Zhexiao Lin, Md. Ishtiaq Ashiq, Emile Aben, Romain Fontugne, Amreesh Phokeer, and Taejoong Chung. ACM Internet Measurement Conference (IMC), October 2023.
We use the country population indicator `SP.POP.TOTL` from the Indicators API dataset provided by the World Bank.