Index of /datasets/topology/ark/ipv4/itdk/2020-08

      Name                                 Last modified      Size  Description
      Parent Directory                                           -
      README.txt                           2020-12-11 18:10   16K
      itdk-run-20200819-dns-names.txt.bz2  2020-09-06 14:51   22M  IPv4 Routed /24 DNS Names Dataset
      itdk-run-20200819.addrs.bz2          2020-09-12 22:19  9.0M  Macroscopic Internet Topology Data Kit (ITDK)
      kapar-midar-iff.ifaces.bz2           2020-09-16 19:48  532M
      kapar-midar-iff.links.bz2            2020-09-16 19:42  562M
      (name missing)                       2020-09-24 15:08  104M
      kapar-midar-iff.nodes.bz2            2020-09-16 19:36  436M
      kapar-midar-iff.nodes.geo.bz2        2020-09-23 16:37  213M
      midar-iff.ifaces.bz2                 2020-09-11 12:55  533M
      midar-iff.links.bz2                  2020-09-11 12:50  570M
      (name missing)                       2020-09-25 02:53  105M
      midar-iff.nodes.bz2                  2020-09-11 12:45  439M
      midar-iff.nodes.geo.bz2              2020-09-23 02:41  212M
             CAIDA's Macroscopic Internet Topology Data Kit

                            ITDK 2020-08

  NOTE: The AS assignment files (*.as.bz2) were replaced with corrected
        versions on Dec 11, 2020 (the files themselves have a modification
        date of Sep 24/25, 2020).  This doesn't change any of the AS
        assignments that existed in the older files (with modification
        dates of Sep 19, 2020); this fix only increases the number of AS
        assignments included in the files.

  NOTE: The format of the .nodes files has slightly changed.
        ITDK release 2013-04 and earlier used a different address range
        for non-real addresses (see the nodes file format below).

  NOTE: This README contains the full details of data collection that the
        ITDK webpage lacks, so you will want to read over this file even
        though some text duplicates the webpage (general description,
        file formats, and data use terms).

The ITDK contains data about connectivity and routing gathered
from a large cross-section of the global Internet.

At present, this ITDK release consists of

 (1) two related IPv4 router-level topologies,
 (2) router-to-AS assignments,
 (3) geographic location of each router, and
 (4) DNS lookups of all observed IP addresses.

We plan to expand this release with other complementary datasets as
they become available (more details are available on the ITDK web page).

The two included IPv4 router-level topologies are generated from the same
IPv4-level topology but differ in the accuracy and completeness of the
alias resolution performed to create them.  The first topology is
derived from aliases resolved with MIDAR and iffinder, which yield the
highest confidence aliases with few false positives.  The second
topology also uses MIDAR and iffinder but further includes aliases
resolved with kapar, which significantly increases the coverage of
aliases but at the cost of false positives (which inflate the size of
routers and decrease the router count).  Researchers should choose the
topology to use depending on the relative importance they place on
accuracy vs. comprehensiveness of alias resolution.  Choose the most
accurate alias resolution if uncertain about which to use.
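
To illustrate why additional aliases inflate router size and decrease
the router count, here is a minimal Python sketch.  It assumes that alias
pairs are merged transitively into routers with a union-find structure;
this is an illustration only, not a description of how MIDAR, iffinder,
or kapar work internally.  The function name group_aliases and the sample
addresses (taken from the 192.0.2.0/24 documentation range) are
hypothetical.

```python
def group_aliases(addresses, alias_pairs):
    """Group addresses into routers by merging alias pairs transitively."""
    parent = {a: a for a in addresses}

    def find(a):
        # Follow parent pointers to the set representative, compressing the path
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a, b in alias_pairs:
        parent[find(a)] = find(b)

    routers = {}
    for a in addresses:
        routers.setdefault(find(a), set()).add(a)
    return list(routers.values())


addresses = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"]
high_confidence = [("192.0.2.1", "192.0.2.2")]
with_extra = high_confidence + [("192.0.2.2", "192.0.2.3")]

print(len(group_aliases(addresses, high_confidence)))  # 3 routers
print(len(group_aliases(addresses, with_extra)))       # 2 routers
```

If the extra pair were a false positive, two distinct routers would be
collapsed into one larger router, which is the effect described above.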

    Tools used:

    * MIDAR: Monotonic ID-based Alias Resolution

    * iffinder: Mercator-style common source address alias resolution

    * kapar: analytical alias resolver and topology generator

    * RouterToAsAssignment: analytical AS ownership resolver

    * qrrs: bulk DNS lookup tool
      (in development)

    * DDec: DNS Decoding database

    * MaxMind's free GeoLite City database

    * BordermapIT for AS assignments

    Source datasets:

    * IPv4 Routed /24 Topology Dataset

    Data collection:

      The MIDAR alias resolution run was performed 2020-08-22 to
      2020-09-06 on 19 monitors (in 14 countries) using:

        * 2.97 million addresses extracted from the IPv4 Routed /24
          Topology Dataset ("Ark Routed /24 traces") for the period
          2020-08-07 to 2020-08-20.  We used 39 cycles of traces
          (cycles 8674 to 8712, all from team 1) from 132 monitors
          in 48 countries (all active Ark monitors instead of the
          subset used for MIDAR).

      (The file itdk-run-20200819.addrs.bz2 contains the target addresses
      used for the ITDK run.)

      When extracting IP addresses from traceroute paths for use as
      MIDAR and iffinder targets (see below for details of the iffinder
      run), we only include addresses that could potentially be routers;
      that is, we only include addresses that appeared as an intermediate
      hop in some traceroute path, which means we exclude the responding
      destination address from each trace.

      For the kapar alias resolution run, we used the same set of
      traces from the Routed /24 Topology Dataset as the MIDAR run
      (see description above).  These traces contributed the underlying
      IP-level topology from which we constructed both router-level
      topologies included in this ITDK.

      NOTE: Unlike the MIDAR target list, the generated router-level graphs
            also contain the responding destinations and Ark monitors.

      The iffinder alias resolution run was performed on 2020-08-27 during
      the MIDAR run using the same target addresses as MIDAR.  We
      ran iffinder on 112 monitors, a superset of those used for MIDAR, with
      each monitor independently probing the full set of iffinder targets
      in a per-monitor randomized order.

      For AS assignments, we used RIPE and RouteViews BGP tables, RIR
      delegations, and PeeringDB.

      We use a combination of publicly known Internet eXchange (IX) point
      information, DDec hostname mapping, and MaxMind's free GeoLite City
      database to provide the geographic location (at city granularity) of
      routers in the router-level graph.  See the ITDK web page for
      further details on our geolocation method.

      For details of the DNS names data collection, see the section
      below describing the available DNS files and their formats.


Each router-level topology is provided in two files, one giving the
nodes and another giving the links.  There are also files that
assign ASes and geolocation to each node.

IPv4 Router Topology A (accurate alias resolution):

    Router topology based on aliases discovered by MIDAR and iffinder.
    This topology contains fewer aliases, but has a low false positive
    rate for aliases.

IPv4 Router Topology B (comprehensive alias resolution):

    Router topology based on aliases discovered by MIDAR, iffinder, and kapar.
    The addition of the kapar algorithm makes this topology more complete
    with respect to aliases, but also gives it a higher false positive
    rate for aliases.

File Formats:


     The nodes file lists the set of interfaces that were inferred to
     be on each router.

      Format: node <node_id>:   <i1>   <i2>   ...   <in>
     Example: node N33382: 

     Each line indicates that a node node_id has interfaces i_1 to i_n.
     Interface addresses in IANA reserved multicast space are not real
     addresses.  They were artificially generated to identify
     potentially unique non-responding interfaces in traceroute paths.

     The IPv6 dataset uses IPv6 multicast addresses (FF00::/8) to indicate
     non-responding interfaces in traceroute paths.

       NOTE: ITDK release 2013-04 and earlier used a different address
             range for these non-real addresses.
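
     A minimal Python sketch of reading one line of the nodes file,
     assuming the whitespace-separated format shown above.  The function
     name parse_node_line is hypothetical, and the sample addresses come
     from the 192.0.2.0/24 documentation range rather than from the
     dataset.

```python
def parse_node_line(line):
    """Parse 'node <node_id>:  <i1> <i2> ...' into (node_id, [interfaces]).

    Returns None for any line that does not start with the 'node' keyword.
    """
    fields = line.split()
    if len(fields) < 2 or fields[0] != "node":
        return None
    node_id = fields[1].rstrip(":")  # drop the trailing colon after the ID
    return node_id, fields[2:]


sample = "node N33382:  192.0.2.1 192.0.2.9"
print(parse_node_line(sample))
```

     The compressed files can be read line by line with the standard
     library call bz2.open(path, "rt") and passed to this function.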


     The links file lists the set of routers and router interfaces
     that were inferred to be sharing each link.  Note that these are
     IP layer links, not physical cables or graph edges.  More than
     two nodes can share the same IP link if the nodes are all
     connected to the same layer 2 switch (POS, ATM, Ethernet, etc).

      Format: link <link_id>:   <N1>:i1   <N2>:i2   [<N3>:[i3] .. [<Nm>:[im]]
     Example: link L104:  N242484: N1847: N5849773

     Each line indicates that a link link_id connects nodes N_1 to
     N_m.  If it is known which router interface is connected to the
     link, then the interface address is given after the node ID
     separated by a colon (e.g., "N1:"); otherwise, only the
     node ID is given (e.g., "N1").
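
     A minimal Python sketch of reading one line of the links file,
     assuming the format above.  The function name parse_link_line is
     hypothetical and the sample addresses are illustrative (192.0.2.0/24
     documentation range).  A member without a known interface is
     returned with None in place of the address.

```python
def parse_link_line(line):
    """Parse 'link <link_id>:  <N1>:i1 <N2>:i2 ...' into (link_id, members).

    members is a list of (node_id, address_or_None) tuples.
    """
    fields = line.split()
    if len(fields) < 2 or fields[0] != "link":
        return None
    link_id = fields[1].rstrip(":")
    members = []
    for token in fields[2:]:
        # Split on the first colon only, so the address part stays intact
        node_id, _, address = token.partition(":")
        members.append((node_id, address or None))
    return link_id, members


sample = "link L104:  N242484:192.0.2.1 N1847:192.0.2.2 N5849773"
print(parse_link_line(sample))
```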

     By joining the node and link data, one can obtain the _known_ and
     _inferred_ interfaces of each router.  Known interfaces actually
     appeared in some traceroute path.  Inferred interfaces arise when
     we know that some router N_1 connects to a known interface i_2 of
     another router N_2, but we never saw an actual interface on the
     former router.  The interfaces on an IP link are typically
     assigned IP addresses from the same prefix, so we assume that
     router N_1 must have an inferred interface from the same prefix
     as i_2.
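
     The join described above can be sketched in Python as follows.
     This is a simplified illustration under stated assumptions: the
     nodes and links data are already parsed into plain Python
     structures, and a link member listed without an address is treated
     as having an inferred interface on that link.  The function name
     split_interfaces and the sample data are hypothetical.

```python
def split_interfaces(nodes, links):
    """Return (known, inferred) interface maps keyed by node ID.

    nodes: dict of node ID -> list of addresses (from the nodes file)
    links: list of (link_id, [(node_id, address_or_None), ...])
    """
    known = {node_id: set(addrs) for node_id, addrs in nodes.items()}
    inferred = {}
    for link_id, members in links:
        for node_id, address in members:
            if address is None:
                # The node is on this link, but its interface was never observed
                inferred.setdefault(node_id, []).append(link_id)
            else:
                known.setdefault(node_id, set()).add(address)
    return known, inferred


nodes = {"N1": ["192.0.2.1"], "N2": ["192.0.2.6"]}
links = [("L1", [("N1", "192.0.2.1"), ("N3", None)])]
known, inferred = split_interfaces(nodes, links)
print(known["N1"], inferred)
```

     Here N3 has an inferred interface on link L1, presumably drawn from
     the same prefix as the known interface of N1 on that link.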

     The node-AS file assigns an AS to each node found in the nodes
     file.  We used BordermapIT to infer the owner AS of each node.

      Format: node.AS   <node_id>   <AS>   <method>
     Example: node.AS N39 17645 election

     Each line indicates that the node node_id is owned/operated by
     the given AS, as inferred with the given method. There are three
     inference methods:

        1. single: a router has only a single choice of AS

        2. election: multiple ASes are present on a router, and one AS
           occurs more frequently than the rest

        3. election+degree: multiple ASes are present on a router, but
           no AS occurs the most frequently, so the choice is based on
           AS degree

     Addresses that belong to the address space of an Internet exchange
     point (as self-identified in PeeringDB) are excluded from the AS
     analysis, as we don't consider them to be part of the AS-level
     topology.
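
     A minimal Python sketch of reading the node-AS file and tallying
     the inference methods, assuming the four-field format above.  The
     function name parse_node_as_line is hypothetical.  The AS field is
     kept as a string so that no assumption is made about its exact
     syntax.

```python
from collections import Counter


def parse_node_as_line(line):
    """Parse 'node.AS <node_id> <AS> <method>' into (node_id, asn, method)."""
    fields = line.split()
    if len(fields) != 4 or fields[0] != "node.AS":
        return None
    _, node_id, asn, method = fields
    return node_id, asn, method


lines = [
    "node.AS N39 17645 election",
    "node.AS N40 64500 single",  # illustrative line, not from the dataset
]
records = [r for r in map(parse_node_as_line, lines) if r is not None]
method_counts = Counter(method for _, _, method in records)
print(method_counts)
```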


     The node-geolocation file contains the geographic location of
     each node in the nodes file.  We first map each interface on a
     router to a location.  If all interfaces map to the same
     location, then we assign that location to the router; otherwise,
     we do not assign any location to the router (that is, the router
     does not appear in the geolocation file).
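
     The all-interfaces-agree rule above can be sketched in Python as
     follows.  This is an illustration, not the production code; it
     assumes every interface passed in already has a location, and the
     function name router_location is hypothetical.

```python
def router_location(interface_locations):
    """Return the shared location if all interfaces agree, else None.

    interface_locations: iterable of hashable locations, one per interface.
    """
    unique = set(interface_locations)
    if len(unique) == 1:
        return unique.pop()
    # Conflicting locations (or no interfaces at all): leave the router unlocated
    return None


honolulu = ("US", "HI", "Honolulu")
san_diego = ("US", "CA", "San Diego")
print(router_location([honolulu, honolulu]))
print(router_location([honolulu, san_diego]))
```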

      Format: node.geo   <node_id>:   <continent>   <country>   <region> \
              <city>   <latitude>   <longitude> 
     Example: node.geo N15:  ***  US  HI  Honolulu  21.3267  -157.8167

     Each line indicates that the node node_id has the given geographic
     location.  The fields have the following meanings:

       <continent>: currently always "***".

       <country>: the two-letter ISO 3166 Country Code along with the
                  following codes specific to GeoLite City for
                  uncertain situations:

                    * A1: anonymous proxy,
                    * A2: satellite-based Internet provider,
                    * EU: Europe,
                    * AP: Asia/Pacific Region, and
                    * US: includes overseas US military bases.

       <region>: for US/Canada, the two-letter ISO-3166-2 code for the
                 state/province, along with AA, AE, and AP for Armed
                 Forces America, Europe, and Pacific, respectively;
                 for outside US/Canada, the two-letter FIPS 10-4.

       <city>: city or town in ISO-8859-1 encoding (up to 255 characters).

       <latitude> and <longitude>: signed floating point numbers.
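
     A minimal Python sketch of reading one line of the node-geolocation
     file.  It assumes whitespace-separated fields with no empty fields,
     and handles city names containing spaces by taking the first five
     tokens from the left and the last two from the right.  The function
     name parse_node_geo_line is hypothetical.

```python
def parse_node_geo_line(line):
    """Parse a 'node.geo' line into a dict of its fields."""
    fields = line.split()
    if len(fields) < 7 or fields[0] != "node.geo":
        return None
    continent, country, region = fields[2:5]
    return {
        "node_id": fields[1].rstrip(":"),
        "continent": continent,
        "country": country,
        "region": region,
        # Everything between the region and the coordinates is the city
        "city": " ".join(fields[5:-2]),
        "latitude": float(fields[-2]),
        "longitude": float(fields[-1]),
    }


sample = "node.geo N15:  ***  US  HI  Honolulu  21.3267  -157.8167"
print(parse_node_geo_line(sample))
```

     Because city names are in ISO-8859-1, a compressed file can be
     opened with bz2.open(path, "rt", encoding="latin-1").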

     The above description is derived from the authoritative MaxMind
     documentation.

     Additional references:

       * ISO 3166 country codes
       * ISO-3166-2 state/province codes (inside US/Canada)
       * FIPS 10-4 state/province codes (outside US/Canada)


     The ifaces file provides additional information about all interfaces
     included in the provided router-level graphs:

      Format:  <address> [<node_id>] [<link_id>] [T] [D]

     Each of the fields in square brackets may or may not be present.

     Example: N34980480 D
     Example: N18137917 L537067 T

     Example: N45020
     Example: N18137965 L537125 T D

     <node_id> starts with "N" and identifies the node (alias set) to which
     the address belongs.  An address may not have a node_id if no aliases
     were found.

     <link_id> starts with "L" and identifies the link to which the address is
     attached, if known.  An address will not have a link_id if it was
     obtained from a source other than traceroute or appeared only as the
     first public address in a traceroute (i.e., the source and all other hops
     preceding this address were either private addresses or nonresponsive).

     "T" indicates that the address appeared in at least one traceroute as a
     transit hop, i.e., preceded by at least one (public or private) address
     (including the source) and followed by at least one public address
     (including the destination).  An address does not qualify as a transit
     hop if it was seen only in these situations: it was obtained from a
     source other than traceroute; it was the source or destination of a
     traceroute; or it was the last responding public address to appear in a
     traceroute.

     "D" indicates that the address appeared in at least one traceroute as a
     responding destination hop.

     "T" and "D" are not mutually exclusive -- an address may have been a
     transit hop in one traceroute and the destination in another.

     An interface address will have "T" but not "L<link_id>" if it appeared
     only as the first public address in a traceroute.
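
     A minimal Python sketch of reading one line of this file, assuming
     the format above: the address comes first and the optional fields
     are recognized by their leading letter.  The function name
     parse_iface_line is hypothetical and the sample address is
     illustrative (192.0.2.0/24 documentation range).

```python
def parse_iface_line(line):
    """Parse '<address> [<node_id>] [<link_id>] [T] [D]' into a dict."""
    fields = line.split()
    if not fields:
        return None
    record = {
        "address": fields[0],
        "node_id": None,
        "link_id": None,
        "transit": False,
        "destination": False,
    }
    for token in fields[1:]:
        if token == "T":
            record["transit"] = True
        elif token == "D":
            record["destination"] = True
        elif token.startswith("N"):
            record["node_id"] = token
        elif token.startswith("L"):
            record["link_id"] = token
    return record


sample = "192.0.2.7 N18137965 L537125 T D"
print(parse_iface_line(sample))
```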

DNS Names:

There are two related DNS names datasets, and you should choose the one to
use based on your specific needs:

1. If you would like to know what the DNS names were at about the time
   that addresses were observed in the traces of the IPv4 Routed /24
   Topology Dataset used for this ITDK, then you should download the
   relevant portion of the IPv4 Routed /24 DNS Names Dataset, generated
   with qrrs.

   The traces used for this ITDK were collected Aug 7, 2020 to Aug 20,
   2020.  You should download DNS names files covering this range plus a
   few days before and after it.

2. On Sep 6, 2020, we performed additional DNS lookups with qrrs of
   the 2.97 million MIDAR addresses in order to obtain DNS names
   closer in time to the MIDAR and iffinder runs.  These more timely
   DNS lookups are better for extracting DNS-based ground truth that
   can be compared with MIDAR and iffinder results.  These DNS results
   are available in the file itdk-run-20200819-dns-names.txt.bz2.


   Each line contains three entries separated by tabs:

          <timestamp>    <IP-address>    <DNS-name>

   where <timestamp> is the timestamp of the lookup.
   Please see the README of the IPv4 Routed /24 DNS Names
   Dataset for full details about the encoding of special characters
   in the <DNS-name> field.
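
   A minimal Python sketch of reading one line of the DNS names file,
   assuming exactly three tab-separated entries as described above.  The
   timestamp is kept as a string because its exact encoding is not
   specified here.  The function name parse_dns_line and the sample
   values are hypothetical.

```python
def parse_dns_line(line):
    """Parse '<timestamp>\\t<IP-address>\\t<DNS-name>' into a 3-tuple."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 3:
        return None
    timestamp, address, name = parts
    return timestamp, address, name


# Illustrative values only, not taken from the dataset
sample = "1599400000\t192.0.2.1\trouter1.example.net\n"
print(parse_dns_line(sample))
```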

Data Use Terms and Conditions



This product includes GeoLite data created by MaxMind.