Index of /datasets/supplement/2021-conext-hoiho

      Name                            Last modified      Size  Description
      Parent Directory                                      -
      202008-midar-iff.geo-re.json    2021-09-20 21:01   2.1M
      202008-midar-iff.minrtt.txt.bz2 2021-09-20 21:01   500M
      202008-midar-iff.routers.bz2    2021-09-20 21:01    25M
      202011-speedtrap.geo-re.json    2021-09-20 21:01   440K
      202011-speedtrap.minrtt.txt.bz2 2021-09-20 21:01    28M
      202011-speedtrap.routers.bz2    2021-09-20 21:01   7.2M
      202103-midar-iff.geo-re.json    2021-09-20 21:01   2.1M
      202103-midar-iff.minrtt.txt.bz2 2021-09-20 21:01   452M
      202103-midar-iff.routers.bz2    2021-09-20 21:01    24M
      202103-speedtrap.geo-re.json    2021-09-20 21:01   438K
      202103-speedtrap.minrtt.txt.bz2 2021-09-20 21:01    22M
      202103-speedtrap.routers.bz2    2021-09-20 21:01   6.8M
      README.txt                      2023-05-21 17:39   2.1K
      geocodes.txt                    2021-09-20 21:01    26M
                                      2021-09-20 21:01   5.1K
      md5.md5                         2021-09-20 21:01   1.0K
      public_suffix_list.dat          2021-09-20 21:01   222K
      scamper-cvs-20210917.tar.gz     2021-09-20 21:34   2.0M
      web/                            2021-09-20 21:01      -

This public dataset contains the data used to train our system to
learn regular expressions that extract geohints from router hostnames,
as well as the product of using our system.

 + *-midar-iff.routers files contain IPv4 routers inferred using
   MIDAR and Mercator, annotated with node IDs and hostnames.
 + *-speedtrap.routers files contain IPv6 routers inferred using
   Speedtrap, annotated with node IDs and hostnames.
 + *.minrtt.txt files contain RTT samples towards each router,
   identified by node ID, from vantage points with known locations.
 + *.geo-re.json files contain JSON-formatted rules that researchers
   can use to interpret hostnames.  We include for
   applying those rules.
 + geocodes.txt contains the geocodes we used, excluding the CLLI
   codes, which we licensed from iconectiv.
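
The directory listing also includes md5.md5.  The following is a
minimal sketch for checking downloaded files against it, assuming
md5.md5 uses the "<hash>  <filename>" line format written by GNU
md5sum (this README does not state the format).  The function name
verify_download is hypothetical and is not part of this dataset or
of scamper.

```shell
# Hypothetical helper; not part of the dataset or of scamper.
# Assumption: the checksum file lists "<hash>  <filename>" lines, as
# written by GNU md5sum.  Run it from the directory holding the files.
verify_download() {
  # --quiet prints only the files that fail the check; the exit
  # status is non-zero if any listed file is missing or differs.
  md5sum --quiet -c "${1:-md5.md5}"
}
```

Running "verify_download md5.md5" in the download directory prints
nothing when every listed file matches.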

If you use this data supplement, you are required to cite:

 M. Luckie, B. Huffaker, A. Marder, Z. Bischof, M. Fletcher, and
 k. claffy.
 Learning to Extract Geographic Information from Internet Router
 Hostnames.
 Proc. ACM Conference on emerging Networking EXperiments and
 Technologies (CoNEXT).

You are also required to cite the ITDK, from which this data is
derived.  The instructions for citing the ITDK are included at:

The data is designed to be used with sc_hoiho, which is included
as part of scamper:

To obtain the inferred regular expressions included in this dataset
release, build sc_hoiho by passing --with-sc_hoiho and either
--with-pcre or --with-pcre2 to configure.
When building sc_hoiho, ensure pcre (or pcre2) is in the path where
your compiler looks for header files and libraries.  For example:

CFLAGS='-I/usr/local/include' LDFLAGS='-L/usr/local/lib' ./configure \
 --with-sc_hoiho --with-pcre2

and then run:

sc_hoiho -O learngeo -d best-regex -g geocodes.txt -R <training-set>.minrtt.txt public_suffix_list.dat <training-set>.routers
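
The training files in this directory are distributed as .bz2
archives, while the command above names uncompressed files.  The
following is a minimal sketch for preparing one training set, assuming
sc_hoiho expects the uncompressed files.  The function name
prepare_training_set is hypothetical and is not part of scamper.

```shell
# Hypothetical helper; not part of scamper.
# Decompresses <training-set>.minrtt.txt.bz2 and <training-set>.routers.bz2
# in the current directory, keeping the original archives.
prepare_training_set() {
  ts="$1"   # training-set prefix, e.g. 202103-midar-iff
  for f in "$ts.minrtt.txt" "$ts.routers"; do
    # Skip files that were already decompressed on an earlier run.
    if [ ! -f "$f" ]; then
      # -d decompresses, -k keeps the .bz2 archive in place.
      bzip2 -dk "$f.bz2" || return 1
    fi
  done
}
```

For example, run "prepare_training_set 202103-midar-iff" and then
substitute 202103-midar-iff for <training-set> in the sc_hoiho
command above.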

Other options to sc_hoiho are documented in the manual page for