M3AAWG published a new document entitled "The Present and Future of the Public Suffix List."
If you're like most people, your reaction is probably, "What the heck is the Public Suffix List and why should I care about it?" The Public Suffix List plays a crucial role in the operation of the internet, yet it's maintained by a small team toiling in obscurity. XKCD comic 2347, "Dependency," captures it perfectly.
If that diagram doesn't make intuitive sense to you, there's a nice "explainer" for it at: https://www.explainxkcd.com/wiki/index.php/2347:_Dependency
The Role of the Public Suffix List
The Public Suffix List (PSL) is built into some of the internet's most popular applications, including virtually all web browsers. It lists the domains under which new (private) domains can be directly registered. Applications that incorporate the PSL often use it for security-sensitive purposes, including critical decisions about domain ownership boundaries.
The PSL is a hand-curated list of top-level domains and "effective top-level domains" under which new domains can be directly registered within the namespace described by Internet Coordination Policy 3 (ICP-3). At the time of writing, the PSL contained over 9,000 entries.
Domain Name System Terms
To fully understand the function of the PSL, we first need to explain a little domain name "lingo." Domain names like www.google.com are made up of labels separated by dots.
- The rightmost label (.com in this case) is the top-level domain (TLD)
- The middle label (google) is registered under that TLD.
- The leftmost label (www), when combined with the other two labels, represents a fully qualified domain name (FQDN).
In the simplest of worlds, all FQDNs would comprise exactly three labels.
If that were to be consistently true, life would be simple, and it would already be time for drinks and nibbles. Unfortunately, the reality is a little more complicated. The best way to understand the complexity of FQDNs is by considering some examples.
The first example, the domain that people always seem to trot out to help explain the problem, is the British Broadcasting Corporation (the BBC). The BBC uses the domain www.bbc.co.uk, an FQDN with four labels instead of three. Why four labels? Because .uk was already in use on Janet, the joint academic network. Thus, corporate users registered under co.uk rather than directly under .uk. Knowing that, we can now parse www.bbc.co.uk into three chunks:
- An effective TLD, or eTLD (.co.uk in this case). Even though .co.uk consists of two labels rather than one, that's still where most new businesses register their domain names in the UK.
- Then there's the registerable bit (bbc)
- And finally, there's a hostname (www) which, when combined with the registerable bit and the effective TLD, gives us our FQDN.
Okay, so that's a little weird, but most people should be willing to accept that in some cases TWO labels may effectively act in concert as if they were a single unified "real" top-level domain.
Now let's consider two more FQDNs, each with more than three labels:
In the case of www.cs.uoregon.edu, there's:
- One label that's a TLD (.edu)
- A registerable bit (uoregon is the domain registered and used by the University of Oregon)
- A subdomain (cs is the subdomain used by the Computer Science Department at UO)
- And a hostname (www), which, when combined with the other labels, results in an FQDN.
That's pesky. The existence of subdomains under a registerable domain name means we can't just work our way in from the left-hand side of the name.
In the case of www.springfield.k12.or.us, we've got:
- Three labels that make up an effective TLD (.k12.or.us)
- A registerable bit (springfield)
- And a hostname (www), which, when combined with the other labels, results in an FQDN.
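To see why label counting alone doesn't work, here is a minimal Python sketch of a naive heuristic that treats the last two labels of every name as the registerable domain. The function name naive_registrable is made up for this illustration; it is not taken from any real library.

```python
def naive_registrable(fqdn: str) -> str:
    """Guess the registerable domain by keeping only the last two labels.

    This is deliberately naive: it is right for www.google.com but wrong
    whenever the effective TLD has more than one label.
    """
    labels = fqdn.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


# The four example names used in the text above.
for name in [
    "www.google.com",
    "www.cs.uoregon.edu",
    "www.bbc.co.uk",
    "www.springfield.k12.or.us",
]:
    print(name, "->", naive_registrable(name))
```

The heuristic gets google.com and uoregon.edu right, but it reports co.uk and or.us for the last two names. Neither of those is a domain belonging to a single registrant, which is exactly the problem a lookup table has to solve.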
A Critical Security Control: The PSL Defines Domain Boundaries
So how does the world know where new domains get registered? And where does one customer's domain (and subdomain) stop and a second customer's start? Why is it k12.or.us in the case of the Springfield Public School district (instead of just or.us)? Is there some hidden rule or pattern? No, there's just a big table listing what's what, the PSL. One can determine where it’s permissible to register a domain in the DNS by consulting the list of suffixes conveniently compiled from the various registries into the PSL.
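The table lookup can be sketched in a few lines of Python. This is a simplified illustration, not the full PSL matching algorithm: the SUFFIXES set is a tiny hand-picked subset chosen to cover the examples in this post, the function name split_domain is hypothetical, and wildcard and exception entries are ignored.

```python
# Assumption: a tiny hand-picked subset for illustration only.
# The real PSL contains thousands of entries.
SUFFIXES = {"com", "edu", "uk", "co.uk", "us", "or.us", "k12.or.us"}


def split_domain(fqdn: str) -> tuple[str, str, str]:
    """Split a name into (host part, registerable domain, public suffix).

    Candidates are tried from longest to shortest, so k12.or.us is found
    before or.us or us.
    """
    labels = fqdn.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in SUFFIXES:
            if i == 0:
                # The name is itself a public suffix: nothing is registered.
                return ("", "", candidate)
            registrable = ".".join(labels[i - 1:])
            host = ".".join(labels[: i - 1])
            return (host, registrable, candidate)
    raise ValueError(f"no known suffix matches {fqdn!r}")


print(split_domain("www.bbc.co.uk"))
print(split_domain("www.cs.uoregon.edu"))
print(split_domain("www.springfield.k12.or.us"))
```

Trying the longest candidate first is what makes k12.or.us win over or.us for the Springfield name.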
The PSL identifies domain authority boundaries. These boundaries indicate security and trust relationships. If they aren't accurately delineated, you might end up with inadvertent access to someone else's domains or data, or they might end up with inadvertent access to yours. The PSL also helps to properly map certificates and web authentication cookies.
The PSL Keeps Certificate Issuance and Web Authentication Cookies Properly Scoped
Every certificate authority follows its own unique process for certificate scoping review within the guidelines established by the CA/Browser Forum. Without the PSL as one guide point, however, the chances increase that a certificate authority might accidentally consider issuing an overly inclusive wildcard cert, such as a wildcard certificate covering all *.k12.or.us.
If this were to occur, it could enable widespread impersonation attacks and would be a potential security problem.
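As a rough sketch of the kind of check the PSL makes possible, the Python below flags a wildcard name whose base is itself a public suffix. It is an illustration under stated assumptions, not a description of how any particular certificate authority works: SUFFIXES is a hand-picked subset and wildcard_is_overbroad is a hypothetical helper name.

```python
# Assumption: hand-picked subset of suffixes, for illustration only.
SUFFIXES = {"com", "us", "or.us", "k12.or.us"}


def wildcard_is_overbroad(pattern: str) -> bool:
    """Return True if a wildcard name such as *.k12.or.us sits directly
    on top of a public suffix and so spans many unrelated registrants."""
    name = pattern.lower().rstrip(".")
    if not name.startswith("*."):
        return False  # not a wildcard name at all
    base = name[2:]  # the part after "*."
    return base in SUFFIXES


for requested in ["*.k12.or.us", "*.springfield.k12.or.us"]:
    print(requested, "overbroad:", wildcard_is_overbroad(requested))
```

A wildcard under springfield.k12.or.us stays within a single registrant's domain, while one under k12.or.us would cut across every district registered there.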
Once user authentication has taken place and a web cookie has been set, who do you share that cookie with? Everyone? No! That would be crazy (if you've got that authentication cookie, you've got access to all the user's data on that site).
On the other hand, there are some sites that are obviously "related" where you'd like to allow access without requiring the user to log back in for each site. Getting the set of sites that are related (and by implication, the set of sites that AREN'T related) is critical:
- Too broad? You'll allow access you shouldn't. Accounts will be insecure, and private information may end up getting compromised.
- Too narrow? Users will continually find themselves needing to re-authenticate, which can be frustrating and lead to a poor user experience.
So how do browsers and other applications know what's related and what isn't? Yep, they check a copy of the PSL to see where new domains get registered.
You don't want web authentication cookies to be set or read by unintended parties. Establishing a k12.or.us entry in the Public Suffix List helps ensure that someone from one school district (perhaps the Salem-Keizer Public Schools, using hosts in salkeiz.k12.or.us) won't be able to set or read cookies from domains used by another school district (such as the Springfield Public Schools, using hosts in springfield.k12.or.us). The Public Suffix List helps keep those divisions of control, and much more, properly scoped.
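The school district example can be sketched in Python as a simplified cookie scoping check. This is not a browser's actual implementation: SUFFIXES is a hand-picked subset, cookie_domain_allowed is a hypothetical helper, and real browsers apply additional rules on top of the public suffix test.

```python
# Assumption: hand-picked subset of suffixes, for illustration only.
SUFFIXES = {"us", "or.us", "k12.or.us"}


def cookie_domain_allowed(host: str, cookie_domain: str) -> bool:
    """Decide whether `host` may set a cookie scoped to `cookie_domain`.

    Simplified rule: the scope must not be a public suffix, and the host
    must be the scope itself or a name underneath it.
    """
    host = host.lower().rstrip(".")
    cookie_domain = cookie_domain.lower().strip(".")
    if cookie_domain in SUFFIXES:
        return False  # would be shared with every registrant under it
    return host == cookie_domain or host.endswith("." + cookie_domain)


host = "www.salkeiz.k12.or.us"
for scope in ["salkeiz.k12.or.us", "k12.or.us", "springfield.k12.or.us"]:
    print(scope, "allowed:", cookie_domain_allowed(host, scope))
```

Only the first scope is accepted. The second is rejected because k12.or.us is a public suffix, and the third because the host is not inside the other district's domain.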
This is just one example of why the PSL plays a critical role in the security and privacy ecosystem, but it’s enough to make it clear that this thing's important.
Something This Important Must Be Something Run by the "President of the Internet," Right?
No. First, the internet is decentralized and there's no one in charge of it all. Many areas of the internet work only because everyone works together and agrees on basic principles about it. This is a nice example of that decentralized and cooperative spirit. People use and rely on the PSL because it meets an important need, and to this point, has been carefully administered.
But you need to understand the other important thing about the administration of the PSL: it's done by a small team of volunteers as a labor of love.
The PSL is hosted on a content delivery network (CDN) via Google Cloud by the Mozilla Foundation. However, the PSL is maintained by a very small team of individual volunteers, working on a best-efforts basis, with no service level agreement (SLA), no formal contract, and no planning for sustainment or succession.
If the PSL were to cease being maintained (perhaps because of volunteer burnout, health issues, legal action, or some other unforeseeable catastrophic event), the impact of its abandonment would be quite disruptive and could potentially affect the stability and security of the entire internet ecosystem.
Internet users all appreciate the PSL volunteer maintainers' efforts (and they have done a great job to date, for which the present writers are eternally grateful), but critical infrastructure (and the PSL should be considered critical infrastructure) must be sustainable and resilient.
The PSL needs sustainable funding and support from a foundation (or from multiple businesses whose products rely on the PSL) if it is to continue to exist, along with succession planning to ensure it does not become an abandoned open-source project.
The Limitations of the PSL's Distribution and Format
Anyone can download a copy of the PSL using their web browser. However, it would be unsustainable for even a tiny fraction of the internet's billions of users to routinely download a copy of the PSL directly.
As a result, many significant third-party projects that use the PSL work around this by downloading one copy of the PSL and then embedding and sharing that information as part of their code or compiled application. Unless a new and updated release of the third-party software project is generated when the PSL is updated, it's highly likely the PSL hard-coded within their code or application is out-of-date.
The PSL is just a plain text file with one entry per line. This simple format, while easy to parse and support, allows just four types of entries: comments, exact match entries, wildcard entries, and exception entries. There's no clear way of associating attributes with listings, as one could if the PSL were to use a proper database. The wildcard and exception entries are also open to misinterpretation by those who integrate, parse, or otherwise utilize the PSL.
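To make the four entry types concrete, here is a minimal Python sketch of a parser and matcher. It assumes the published PSL syntax ("//" starts a comment, "*." marks a wildcard, "!" marks an exception) and handles only wildcards in the leftmost position. The SAMPLE text is a short illustration written in that syntax, not a copy of the real list, and parse_psl and public_suffix are hypothetical names.

```python
SAMPLE = """\
// Illustrative sample in PSL syntax, not a copy of the real list
com
co.uk
*.ck
!www.ck
"""


def parse_psl(text):
    """Sort each non-comment line into exact, wildcard, or exception rules."""
    exact, wildcard, exception = set(), set(), set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("//"):
            continue  # blank line or comment
        rule = line.split()[0].lower()  # a rule ends at the first whitespace
        if rule.startswith("!"):
            exception.add(rule[1:])
        elif rule.startswith("*."):
            wildcard.add(rule[2:])
        else:
            exact.add(rule)
    return exact, wildcard, exception


def public_suffix(name, rules):
    """Return the public suffix of `name` under the parsed rules."""
    exact, wildcard, exception = rules
    labels = name.lower().rstrip(".").split(".")
    # Exception rules win: the suffix is the rule minus its leftmost label.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in exception:
            return ".".join(labels[i + 1:])
    # Otherwise the longest matching exact or wildcard rule wins.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        parent = ".".join(labels[i + 1:])
        if candidate in exact or (parent and parent in wildcard):
            return candidate
    return labels[-1]  # no rule matched: fall back to the last label


rules = parse_psl(SAMPLE)
for name in ["www.bbc.co.uk", "foo.example.ck", "www.ck", "example.org"]:
    print(name, "->", public_suffix(name, rules))
```

The wildcard and exception pair is where integrators tend to slip: under these sample rules, example.ck is a public suffix because of the wildcard, but www.ck is carved out by the exception and remains an ordinary registerable name under ck.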
Do you know if your systems or programs are using the PSL? And if they are, when did you last update your copy of it? If you use the PSL, make sure your copy is current, and that someone is responsible for keeping it that way.
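One low-tech way to make "is our copy current?" a routine check is to look at the age of the local file. The Python sketch below assumes the copy is stored as an ordinary file; the function name psl_copy_is_stale and the 30-day threshold are arbitrary choices for illustration, and the file's modification time only approximates when the copy was last refreshed.

```python
import os
import tempfile
import time

SECONDS_PER_DAY = 86400


def psl_copy_is_stale(path, max_age_days=30, now=None):
    """Return True if the file at `path` was last modified more than
    `max_age_days` ago. The threshold is an arbitrary example value."""
    now = time.time() if now is None else now
    age_seconds = now - os.path.getmtime(path)
    return age_seconds > max_age_days * SECONDS_PER_DAY


if __name__ == "__main__":
    # Demonstrate with a temporary stand-in file backdated by 40 days.
    with tempfile.NamedTemporaryFile(suffix=".dat") as fake_psl:
        forty_days_ago = time.time() - 40 * SECONDS_PER_DAY
        os.utime(fake_psl.name, (forty_days_ago, forty_days_ago))
        print("stale:", psl_copy_is_stale(fake_psl.name))
```

A check like this could run in a scheduled job or a build step, so that an aging copy gets noticed by whoever is responsible for keeping it current.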
If you are responsible for a TLD or major eTLD, review your PSL entry or entries and make sure each one is properly represented. If one isn't, review the submission guidelines and submit an appropriately formatted pull request in the PSL git repo, along with the DNS-based submitter validation records, to update it.
Thinking About a More Scalable Future and Engaging with PSL Advocates and Volunteers
The PSL has done amazingly well given its very simple implementation, but we know that at some point in the future it will likely run into scaling problems, as every system that depends on people copying flat files eventually tends to do. Are you old enough to remember the "hosts" file that preceded the Domain Name System? If not, check out https://en.wikipedia.org/wiki/Hosts_(file) to get a sense of why a simple flat file model won’t scale to internet volumes.
If you are someone who does protocol development, please consider engaging with the Internet Engineering Task Force (IETF) project known as dbound where alternatives to the current non-scalable, simple-flat-file-download-from-the-web model are being considered.
If you can help the PSL, whether in terms of working as a volunteer or ensuring the PSL is organizationally sustainable, please do! M3AAWG is advocating for an industry coalition to advise, support, rally sponsors, and define a path forward for the Public Suffix List.
Thanks for reading a little about the Public Suffix List, and do check out the new M3AAWG document about it, "The Present and Future of the Public Suffix List."