Index of /datasets/supplement/2021-conext-hoiho
Name                             Last modified     Size
web/                             2021-09-20 21:01  -
md5.md5                          2021-09-20 21:01  1.0K
README.txt                       2023-05-21 17:39  2.1K
hoiho-apply.pl                   2021-09-20 21:01  5.1K
public_suffix_list.dat           2021-09-20 21:01  222K
202103-speedtrap.geo-re.json     2021-09-20 21:01  438K
202011-speedtrap.geo-re.json     2021-09-20 21:01  440K
scamper-cvs-20210917.tar.gz      2021-09-20 21:34  2.0M
202008-midar-iff.geo-re.json     2021-09-20 21:01  2.1M
202103-midar-iff.geo-re.json     2021-09-20 21:01  2.1M
202103-speedtrap.routers.bz2     2021-09-20 21:01  6.8M
202011-speedtrap.routers.bz2     2021-09-20 21:01  7.2M
202103-speedtrap.minrtt.txt.bz2  2021-09-20 21:01  22M
202103-midar-iff.routers.bz2     2021-09-20 21:01  24M
202008-midar-iff.routers.bz2     2021-09-20 21:01  25M
geocodes.txt                     2021-09-20 21:01  26M
202011-speedtrap.minrtt.txt.bz2  2021-09-20 21:01  28M
202103-midar-iff.minrtt.txt.bz2  2021-09-20 21:01  452M
202008-midar-iff.minrtt.txt.bz2  2021-09-20 21:01  500M

This public dataset contains the data used to train our system to
learn regular expressions that extract geohints from router hostnames,
as well as the rules that the system learned from that data.
+ *-midar-iff.routers files contain IPv4 routers inferred using
  MIDAR and Mercator, annotated with node IDs and hostnames.
+ *-speedtrap.routers files contain IPv6 routers inferred using
  Speedtrap, annotated with node IDs and hostnames.
+ *.minrtt.txt files contain RTT samples towards each router,
  identified by node ID, from vantage points with known locations.
+ *.geo-re.json files contain JSON-formatted rules that researchers
  can use to interpret hostnames. We include hoiho-apply.pl for
  applying those rules.
+ geocodes.txt contains the geocodes we used, excluding the CLLI
  codes, which we licensed from iconectiv.
If you use this data supplement, you are required to cite:
  M. Luckie, B. Huffaker, A. Marder, Z. Bischof, M. Fletcher, and
  k. claffy. Learning to Extract Geographic Information from
  Internet Router Hostnames. Proc. ACM Conference on emerging
  Networking EXperiments and Technologies (CoNEXT), 2021.
You are also required to cite the ITDK, from which this data is
derived. The instructions for citing the ITDK are included at:
http://data.caida.org/datasets/topology/ark/ipv4/
The data is designed to be used with sc_hoiho, which is included
as part of scamper:
https://www.caida.org/tools/measurement/scamper/
To reproduce the inferred regular expressions included in this
dataset release, build sc_hoiho by passing --with-sc_hoiho and either
--with-pcre or --with-pcre2 to configure. Ensure that the pcre (or
pcre2) header files and libraries are in the paths that your compiler
searches. For example:
  CFLAGS='-I/usr/local/include' LDFLAGS='-L/usr/local/lib' ./configure \
    --with-sc_hoiho --with-pcre2
and then run:
  sc_hoiho -O learngeo -d best-regex -g geocodes.txt \
    -R <training-set>.minrtt.txt public_suffix_list.dat \
    <training-set>.routers
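
As a minimal sketch of the run step above, the shell functions below
fill in the <training-set> placeholder with one of the file prefixes
from this directory. The names hoiho_unpack and hoiho_cmd are
hypothetical helpers, not part of scamper. The sketch assumes that
sc_hoiho is given decompressed input files, as the file names in the
command above suggest, and that bzip2 is installed.

```shell
#!/bin/sh
# Hypothetical helpers (not part of scamper) for one training set,
# e.g. the prefix 202103-speedtrap or 202103-midar-iff.

# Decompress the two inputs for a training set.
# Assumption: sc_hoiho reads the uncompressed .routers and .minrtt.txt
# files. -d decompresses, -k keeps the original .bz2 files.
hoiho_unpack() {
    ts="$1"
    bzip2 -dk "$ts.routers.bz2" "$ts.minrtt.txt.bz2"
}

# Print the sc_hoiho command from the README with the training-set
# prefix substituted. Nothing is executed here.
hoiho_cmd() {
    ts="$1"
    printf '%s\n' "sc_hoiho -O learngeo -d best-regex -g geocodes.txt -R $ts.minrtt.txt public_suffix_list.dat $ts.routers"
}

# Example (run from the directory that holds the downloaded files):
#   hoiho_unpack 202103-speedtrap
#   hoiho_cmd 202103-speedtrap
```

Printing the command instead of running it lets you review the file
names before starting a run on the larger midar-iff training sets.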
Other sc_hoiho options are documented in its manual page:
https://www.caida.org/tools/measurement/scamper/man/sc_hoiho.1.pdf