
Learning regexes to extract router names from hostnames

We present the design, implementation, evaluation, and validation of a system that automatically learns to extract router names (router identifiers) from hostnames stored by network operators in different DNS zones, which we represent as regular expressions (regexes). Our supervised-learning approach evaluates automatically generated candidate regexes against sets of hostnames for IP addresses that other alias resolution techniques previously inferred to be interfaces on the same router. Conceptually, if three conditions hold: (1) a regex extracts the same value from a set of hostnames associated with IP addresses on the same router; (2) the value is unique to that router; and (3) the regex extracts names for multiple routers in the suffix, then we conclude the regex accurately represents the naming convention for the suffix. We train our system using router aliases inferred from active probing to learn regexes for 2550 different suffixes. We then demonstrate the utility of this system by using the regexes to find 105% additional aliases for these suffixes. Regexes inferred in IPv4 perfectly predict aliases for ≈85% of suffixes with IPv6 aliases, i.e., IPv4 and IPv6 addresses representing the same underlying router, and find 9.0 times more routers in IPv6 than found by prior techniques.
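The three conditions above can be illustrated with a small Python sketch that checks a single candidate regex against routers already grouped by alias resolution. This is not the authors' implementation, which generates and scores many candidates automatically. The function name `evaluate_regex`, the `example.net` hostnames, and the router labels are hypothetical. The sketch also assumes that each candidate regex has exactly one capture group and must match the whole hostname.

```python
import re


def evaluate_regex(pattern, routers):
    """Check the three conditions for one candidate regex (illustrative only).

    routers: dict mapping a router label (from alias resolution) to the
    hostnames of its interfaces, all within one DNS suffix.
    Assumption: the pattern has exactly one capture group holding the name.
    """
    regex = re.compile(pattern)
    if regex.groups != 1:
        raise ValueError("pattern must have exactly one capture group")

    name_to_router = {}
    for router, hostnames in routers.items():
        # Collect every value the regex extracts from this router's hostnames.
        names = {m.group(1) for h in hostnames if (m := regex.fullmatch(h))}
        if not names:
            continue  # regex extracts nothing for this router
        if len(names) > 1:
            return False  # condition 1 fails: inconsistent names on one router
        name = names.pop()
        if name in name_to_router:
            return False  # condition 2 fails: name shared by two routers
        name_to_router[name] = router
    # Condition 3: the regex must extract names for multiple routers.
    return len(name_to_router) >= 2


# Hypothetical training data: interfaces already grouped into routers.
routers = {
    "R1": ["xe-0-0-0.core1.akl.example.net", "ge-1-2-0.core1.akl.example.net"],
    "R2": ["xe-0-1-0.core2.wlg.example.net", "ae1.core2.wlg.example.net"],
    "R3": ["xe-0-0-0.core3.akl.example.net", "te-2-0-0.core3.akl.example.net"],
}

GOOD = r"[^.]+\.([^.]+\.[^.]+)\.example\.net"   # extracts e.g. "core1.akl"
INTERFACE = r"([^.]+)\..*\.example\.net"         # extracts the interface label
CITY = r"[^.]+\.[^.]+\.([^.]+)\.example\.net"    # extracts only the city code

print(evaluate_regex(GOOD, routers))              # True
print(evaluate_regex(INTERFACE, routers))         # False (condition 1)
print(evaluate_regex(CITY, routers))              # False (condition 2)
print(evaluate_regex(GOOD, {"R1": routers["R1"]}))  # False (condition 3)
```

The interface regex fails because it yields different values for one router, and the city regex fails because "akl" names two routers. Even the correct regex is rejected when it names only a single router, since one router gives no evidence of a suffix-wide convention.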
Conference Contribution
Luckie, M. J., Huffaker, B., & claffy, kc. (2019). Learning regexes to extract router names from hostnames. In Proceedings of ACM 2019 Internet Measurement Conference (IMC’19) (pp. 337–350). New York, NY, USA: ACM Press. https://doi.org/10.1145/3355369.3355589
ACM Press
© 2019 Association for Computing Machinery. This is the author's accepted version.