Index of /datasets/supplement/2019-imc-hoiho
Name Last modified Size Description
201007-midar-iff.re 2019-06-04 20:22 480K
201007-midar-iff.routers.bz2 2019-09-17 13:55 9.9M
201104-midar-iff.re 2019-06-04 20:22 492K
201104-midar-iff.routers.bz2 2019-09-17 13:55 13M
201110-midar-iff.re 2019-06-04 20:22 481K
201110-midar-iff.routers.bz2 2019-09-17 13:55 12M
201207-midar-iff.re 2019-06-04 20:22 480K
201207-midar-iff.routers.bz2 2019-09-17 13:55 14M
201304-midar-iff.re 2019-06-04 20:22 445K
201304-midar-iff.routers.bz2 2019-09-17 13:55 14M
201307-midar-iff.re 2019-06-04 20:22 396K
201307-midar-iff.routers.bz2 2019-09-17 13:56 15M
201404-midar-iff.re 2019-06-04 20:22 447K
201404-midar-iff.routers.bz2 2019-09-17 13:56 15M
201412-midar-iff.re 2019-06-04 20:22 444K
201412-midar-iff.routers.bz2 2019-09-17 13:56 16M
201508-midar-iff.re 2019-06-04 20:22 438K
201508-midar-iff.routers.bz2 2019-09-17 13:56 15M
201603-midar-iff.re 2019-06-04 20:22 459K
201603-midar-iff.routers.bz2 2019-09-17 13:56 16M
201609-midar-iff.re 2019-06-04 20:22 451K
201609-midar-iff.routers.bz2 2019-09-17 13:56 18M
201702-midar-iff.re 2019-06-04 20:22 455K
201702-midar-iff.routers.bz2 2019-09-17 13:56 17M
201708-midar-iff.re 2019-06-04 20:22 473K
201708-midar-iff.routers.bz2 2019-09-17 13:56 17M
201708-speedtrap.re 2020-07-30 23:04 36K
201708-speedtrap.routers.bz2 2020-07-30 23:04 25M
201803-midar-iff.re 2019-06-04 20:22 448K
201803-midar-iff.routers.bz2 2019-09-17 13:57 18M
201901-midar-iff.re 2019-06-04 20:22 356K
201901-speedtrap.re 2020-07-30 23:04 23K
201901-speedtrap.routers.bz2 2020-07-30 23:04 4.3M
201904-midar-iff.re 2019-06-04 20:22 394K
README.txt 2019-09-17 14:37 1.8K
public_suffix_list.dat 2019-09-17 13:57 188K
web/ 2019-09-17 17:50 -
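In the listing above, most snapshots have both a .re file and a .routers.bz2 training set, but 201901-midar-iff and 201904-midar-iff appear with a .re file only. The following is a minimal sketch for checking a local copy of the directory for such snapshots. The function name missing_training_sets is illustrative and not part of the dataset or of scamper.

```shell
#!/bin/sh
# Print each snapshot in a directory whose .re file has no matching
# .routers.bz2 training set. The function name is hypothetical.
missing_training_sets() {
    dir=${1:-.}
    for re in "$dir"/*.re; do
        [ -e "$re" ] || continue            # directory has no .re files
        base=${re%.re}                      # strip the .re suffix
        # Report the snapshot name (without the directory) if the
        # companion training set is absent.
        [ -e "$base.routers.bz2" ] || printf '%s\n' "${base##*/}"
    done
}
```

Calling missing_training_sets with the path to a local mirror prints one snapshot name per line. With no argument it checks the current directory.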
This public dataset contains the data used to train our system to
learn regular expressions that extract router names from hostnames.
It also includes the "best" regular expressions inferred for each
suffix with at least one training router. Note that not all of the
regular expressions are useful, so you should use your own judgement
to decide which ones to rely on. To help you make that judgement, we
have included web pages showing how the best regular expressions
apply to the training data.
If you use this data, you are required to cite:
M. Luckie, B. Huffaker, and k. claffy. Learning to Extract Router
Names from Hostnames. Proc. ACM Internet Measurement Conference
2019.
You are also required to cite the ITDK, from which this data is
derived. The instructions for citing the ITDK are included at:
http://data.caida.org/datasets/topology/ark/ipv4/
The data is designed to be used with sc_hoiho, which is included
as part of scamper:
https://www.caida.org/tools/measurement/scamper/
To regenerate the inferred regular expressions included in this
dataset release yourself, build sc_hoiho by passing --with-sc_hoiho
and either --with-pcre or --with-pcre2 to configure. When building
sc_hoiho, ensure pcre (or pcre2) is in the path where your compiler
looks for header files and libraries. For example:

  CFLAGS='-I/usr/local/include' LDFLAGS='-L/usr/local/lib' ./configure \
    --with-sc_hoiho --with-pcre2

and then run:

  sc_hoiho -d best-regex public_suffix_list.dat <training-set>.routers

Note that this can take some time to complete. If you are only
concerned with a regular expression for a single domain, you can pass
-D <domain> to sc_hoiho. Other options are documented in the sc_hoiho
manual page:
https://www.caida.org/tools/measurement/scamper/man/sc_hoiho.1.pdf
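
The training sets in this directory are distributed as .routers.bz2 files, while the sc_hoiho invocation above names a .routers file. The sketch below ties the two together. It assumes that sc_hoiho needs the decompressed file (which the command above suggests but this README does not state) and that bzip2 and sc_hoiho are on your PATH. The function names routers_file and run_hoiho are illustrative and not part of scamper.

```shell
#!/bin/sh
# Strip a trailing .bz2 to get the name of the decompressed training set.
routers_file() {
    printf '%s\n' "${1%.bz2}"
}

# Decompress the archive if needed (keeping the original), then run
# sc_hoiho with the options shown in this README. An optional second
# argument restricts inference to a single domain via -D.
run_hoiho() {
    archive=$1
    domain=${2:-}
    routers=$(routers_file "$archive")
    # bzip2 -d decompresses; -k keeps the .bz2 file in place.
    [ -f "$routers" ] || bzip2 -dk "$archive" || return 1
    if [ -n "$domain" ]; then
        sc_hoiho -d best-regex -D "$domain" public_suffix_list.dat "$routers"
    else
        sc_hoiho -d best-regex public_suffix_list.dat "$routers"
    fi
}
```

For example, run_hoiho 201803-midar-iff.routers.bz2 would process the whole training set, and run_hoiho 201803-midar-iff.routers.bz2 example.com would restrict the run to one domain (example.com is a placeholder). Both assume public_suffix_list.dat is in the current directory.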