Index of /datasets/supplement/2019-imc-hoiho

Name                          Last modified     Size  Description
Parent Directory                                -
                              2019-06-04 20:22  480K
201007-midar-iff.routers.bz2  2019-09-17 13:55  9.9M
                              2019-06-04 20:22  492K
201104-midar-iff.routers.bz2  2019-09-17 13:55  13M
                              2019-06-04 20:22  481K
201110-midar-iff.routers.bz2  2019-09-17 13:55  12M
                              2019-06-04 20:22  480K
201207-midar-iff.routers.bz2  2019-09-17 13:55  14M
                              2019-06-04 20:22  445K
201304-midar-iff.routers.bz2  2019-09-17 13:55  14M
                              2019-06-04 20:22  396K
201307-midar-iff.routers.bz2  2019-09-17 13:56  15M
                              2019-06-04 20:22  447K
201404-midar-iff.routers.bz2  2019-09-17 13:56  15M
                              2019-06-04 20:22  444K
201412-midar-iff.routers.bz2  2019-09-17 13:56  16M
                              2019-06-04 20:22  438K
201508-midar-iff.routers.bz2  2019-09-17 13:56  15M
                              2019-06-04 20:22  459K
201603-midar-iff.routers.bz2  2019-09-17 13:56  16M
                              2019-06-04 20:22  451K
201609-midar-iff.routers.bz2  2019-09-17 13:56  18M
                              2019-06-04 20:22  455K
201702-midar-iff.routers.bz2  2019-09-17 13:56  17M
                              2019-06-04 20:22  473K
201708-midar-iff.routers.bz2  2019-09-17 13:56  17M
                              2020-07-30 23:04  36K
201708-speedtrap.routers.bz2  2020-07-30 23:04  25M
                              2019-06-04 20:22  448K
201803-midar-iff.routers.bz2  2019-09-17 13:57  18M
                              2019-06-04 20:22  356K
                              2020-07-30 23:04  23K
201901-speedtrap.routers.bz2  2020-07-30 23:04  4.3M
                              2019-06-04 20:22  394K
README.txt                    2019-09-17 14:37  1.8K
md5.md5                       2022-07-12 14:37  2.0K
public_suffix_list.dat        2019-09-17 13:57  188K
web/                          2019-09-17 17:50  -

This public dataset contains the data used to train our system to
learn regular expressions that extract router names from hostnames.
It also includes the "best" regular expressions inferred for each
suffix with at least one training router.  Not all of the inferred
regular expressions are useful, so you should use your own judgement
when deciding which ones to rely on.  To help you do so, we have
included web pages (under web/) showing how the best regular
expressions apply to the training data.
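
As an illustration of what such a regular expression does, here is a
minimal sketch.  The suffix example.net, the hostnames, and the regular
expression are all made up for this example and are not taken from the
dataset; the function name extract_router_name is likewise
hypothetical.  The sketch applies the expression with sed, not with
sc_hoiho.

```shell
#!/bin/sh
# Hypothetical regex for the made-up suffix example.net: the router name
# is assumed to be the two labels that follow the interface label.
extract_router_name() {
    # -n with the p flag prints only hostnames that match; \1 is the name.
    sed -nE 's/^[^.]+\.([a-z]+[0-9]+\.[a-z]+)\.example\.net$/\1/p'
}

# Made-up hostnames: the first two are interfaces on the same router.
printf '%s\n' \
    ae1.cr1.lax.example.net \
    xe-0-0.cr1.lax.example.net \
    ae2.br2.sea.example.net \
    www.example.net |
    extract_router_name | sort | uniq -c
```

The first two hostnames yield the same name (cr1.lax), which is how
interfaces are grouped into one router.  www.example.net does not match
the expression and is dropped.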

If you use this data, you are required to cite:

 M. Luckie, B. Huffaker, and k. claffy.  Learning to Extract Router
 Names from Hostnames.  Proc. ACM Internet Measurement Conference

You are also required to cite the ITDK, from which this data is
derived.  The instructions for citing the ITDK are included at:

The data is designed to be used with sc_hoiho, which is included
as part of scamper:

To obtain the inferred regular expressions that are included in this
dataset release, you will need to build sc_hoiho by passing
--with-sc_hoiho and either --with-pcre or --with-pcre2 to configure.
When building sc_hoiho, ensure the pcre (or pcre2) header files and
libraries are in the paths your compiler searches.  For example:

CFLAGS='-I/usr/local/include' LDFLAGS='-L/usr/local/lib' ./configure \
 --with-sc_hoiho --with-pcre2

and then run:

sc_hoiho -d best-regex public_suffix_list.dat <training-set>.routers

Note that this can take some time to complete.  If you only need the
regular expression for a single domain, you can pass -D <domain> to
sc_hoiho.  Other options are documented in the sc_hoiho manual page.
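
A single-domain run might look like the following sketch.  It assumes
that sc_hoiho and bunzip2 are in your PATH, that the downloaded files
are in the current directory, and that the training set is decompressed
before being passed to sc_hoiho.  example.com is a placeholder domain
and run_hoiho is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: decompress one training set and infer the best regex for one domain.
set -eu

run_hoiho() {
    archive=$1                 # e.g. 201901-speedtrap.routers.bz2
    domain=$2                  # e.g. example.com (placeholder)
    routers=${archive%.bz2}    # name of the decompressed training set

    # -k keeps the compressed original alongside the decompressed copy.
    [ -f "$routers" ] || bunzip2 -k "$archive"

    # Same invocation as above, restricted to a single domain with -D.
    sc_hoiho -d best-regex -D "$domain" public_suffix_list.dat "$routers"
}

# Run only when both arguments are supplied on the command line.
if [ "$#" -eq 2 ]; then
    run_hoiho "$1" "$2"
fi
```

Saved under a name of your choosing, the script takes the compressed
training set as its first argument and the domain as its second.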