Understanding DNS Query Composition at B-Root

Jacob Ginesin, Jelena Mirkovic
Northeastern University, USC/ISI


The Domain Name System (DNS) is the internet's phone book - primarily it exchanges names, e.g. www.foo.com, for IP addresses. Because DNS serves users all over the world, and because the accuracy of the information it serves is critical to internet security, DNS is organized in a distributed and hierarchical manner. At the top of the hierarchy (in other words, the backbone of the whole system) sit 13 DNS root servers. Through a program at the University of Southern California's Information Sciences Institute (ISI), I studied the historical behavior of B-Root, the DNS root server that ISI manages. Root servers are only queried if servers lower in the hierarchy fail to answer the query, so studying root server data can provide insight into erroneous DNS query trends.

I was invited to write a guest blog post on this research for APNIC, the regional internet registry for the Asia-Pacific region. You can read it here!

Extra Details

If you're interested in the main results of this work, please read the paper. Otherwise, here are some neat extra details.

Future Work

Because the research program I was a part of was only 8 weeks long, I wasn't able to study B-Root's DNS trace data as deeply as I would have liked. Here's some stuff I missed:

  1. Site-specific statistics - i.e., which domain was queried the most?
  2. DNS over HTTPS/TLS (RFC 8484/RFC 7858)
  3. Historical DNS query responses.
  4. Data from other root servers (see footnote 1).
  5. DNS query caching.
  6. Categorization of "One Word" queries.

Dataset Quirks

Here are some interesting/fun things I found in the B-Root DNS traces dataset I studied.

  1. In 2020, 8.38% of queries hitting B-Root had a ".consul" TLD, which is not on IANA's list of valid TLDs. This seems to be a large leak from HashiCorp's networking platform, Consul. If this is the case, the B-Root data predated the discovery of this leak by two years! See the relevant CVE.
  2. AppleTalk, a proprietary networking protocol suite for Apple products released in 1985, consistently accounted for about 1% of all queries hitting B-Root. This possibly indicates legacy Apple product usage.
  3. In 2014, 1.22% of queries sent to B-Root had the invalid TLD ".com/wawa" - was this due to Wawa, my favorite hoagie store, leaking data?
  4. In 2020, 0.18% of queries had the invalid TLD ".rac2va" - was this due to a misconfiguration of this router?
  5. In 2021, 0.66% of queries had the invalid TLD ".novalocal" - this seems to be due to a widespread misconfiguration of OpenStack, as discussed on Stack Overflow and in documentation.
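
The percentages above are shares of all queries, grouped by the rightmost label of the queried name. Here is a minimal sketch of that kind of tally, assuming the query names have already been extracted as strings. The VALID_TLDS set is a tiny hypothetical stand-in for IANA's full list, and the function names are illustrative rather than taken from the paper's code.

```python
from collections import Counter

# Tiny stand-in for IANA's TLD list (assumption: a real analysis
# would load the full, current list instead).
VALID_TLDS = {"com", "net", "org", "arpa"}


def tld_of(qname: str) -> str:
    """Return the rightmost label of a query name, lowercased."""
    name = qname.rstrip(".").lower()
    # A query for the root itself (".") has no labels; report it as "".
    return name.rsplit(".", 1)[-1] if name else ""


def invalid_tld_shares(qnames):
    """Map each invalid TLD to its percentage of all queries seen."""
    counts = Counter(tld_of(q) for q in qnames)
    total = sum(counts.values())
    return {
        tld: 100.0 * n / total
        for tld, n in counts.items()
        if tld and tld not in VALID_TLDS  # skip root queries and valid TLDs
    }


# Made-up sample names, not real B-Root data.
sample = ["www.foo.com.", "db.service.consul.", "host.novalocal", "example.org"]
shares = invalid_tld_shares(sample)
```

On the made-up sample, two of the four names end in labels outside the stand-in list, so each of those labels accounts for a quarter of the queries.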


If you want to reproduce my results (and you have access to B-Root DNS data), you'll need to parse B-Root's DNS DITL trace format, FSDB. A nice presentation on parsing FSDB can be found here.
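
To give a feel for what parsing FSDB involves, here is a rough sketch of a reader for the common tab-separated flavor. It assumes the file starts with a "#fsdb -F t ..." header naming the columns and that any other line starting with "#" is a comment. The column names in the sample (time, srcip, qname) are hypothetical and not B-Root's actual schema, and this is not the tooling used in the paper.

```python
import io


def read_fsdb(stream):
    """Yield one dict per data row from a tab-separated FSDB stream."""
    columns = None
    for line in stream:
        line = line.rstrip("\n")
        if line.startswith("#fsdb"):
            # Header: "#fsdb", optional flags such as "-F t", then column names.
            columns = []
            skip_next = False
            for tok in line.split()[1:]:
                if skip_next:
                    skip_next = False  # this token was a flag's argument
                elif tok.startswith("-"):
                    # Assumption: a bare two-character flag like "-F"
                    # takes the next token as its argument.
                    skip_next = len(tok) == 2
                else:
                    columns.append(tok)
        elif line.startswith("#") or not line:
            continue  # comment lines (e.g. the trailing command log) and blanks
        elif columns is not None:
            yield dict(zip(columns, line.split("\t")))


# Made-up sample with hypothetical column names and documentation IPs.
sample = (
    "#fsdb -F t time srcip qname\n"
    "1.0\t192.0.2.1\twww.foo.com.\n"
    "2.0\t192.0.2.2\tdb.service.consul.\n"
    "# | some_command\n"
)
rows = list(read_fsdb(io.StringIO(sample)))
```

Each row comes back as a dict keyed by column name, so rows[0]["qname"] gives the first query name. Real traces are large, so in practice you would stream from the file handle rather than build a list.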

Further Reading

If you thought this paper was interesting, here are some other great reads.

  1. The Root of the DNS Revisited by Geoff Huston
  2. Reflections on Ten Years Past the Snowden Revelations - RFC 9446 (can't believe this is an RFC)
  3. Statement on DNS Encryption


Footnote 1: Although all 13 root servers collect their historical data under the same program, DITL, the formatting of DNS traces isn't exactly the same across roots, making large-scale data processing challenging. Servers at DNS-OARC are also quite slow.

Bibtex Citation

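@inproceedings{ginesin2022understanding,
  author={Ginesin, Jacob and Mirkovic, Jelena},
  booktitle={2022 IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT)},
  title={Understanding DNS Query Composition at B-Root},
  year={2022},
  pages={265-270},
  doi={10.1109/BDCAT56447.2022.00044}
}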

Plaintext Citation

J. Ginesin and J. Mirkovic, "Understanding DNS Query Composition at B-Root," 2022 IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT), Vancouver, WA, USA, 2022, pp. 265-270, doi: 10.1109/BDCAT56447.2022.00044.