Google’s FLoC Is a Terrible Idea

Read the original article: Google’s FLoC Is a Terrible Idea


The third-party cookie is dying, and Google is trying to create its replacement. 

No one should mourn the death of the cookie as we know it. For more than two decades, the third-party cookie has been the lynchpin in a shadowy, seedy, multi-billion dollar advertising-surveillance industry on the Web; phasing out tracking cookies and other persistent third-party identifiers is long overdue. However, as the foundations shift beneath the advertising industry, its biggest players are determined to land on their feet. 

Google is leading the charge to replace third-party cookies with a new suite of technologies to target ads on the Web. And some of its proposals show that it hasn’t learned the right lessons from the ongoing backlash to the surveillance business model. This post will focus on one of those proposals, FLoC, which is perhaps the most ambitious—and potentially the most harmful. 

FLoC is meant to be a new way to make your browser do the profiling that third-party trackers used to do themselves: in this case, boiling down your recent browsing activity into a behavioral label, and then sharing it with websites and advertisers. The technology will avoid the privacy risks of third-party cookies, but it will create new ones in the process. It may also exacerbate many of the worst non-privacy problems with behavioral ads, including discrimination and predatory targeting. 

Google’s pitch to privacy advocates is that a world with FLoC (and other elements of the “privacy sandbox”) will be better than the world we have today, where data brokers and ad-tech giants track and profile with impunity. But that framing is based on a false premise that we have to choose between “old tracking” and “new tracking.” It’s not either-or. Instead of re-inventing the tracking wheel, we should imagine a better world without the myriad problems of targeted ads. 

We stand at a fork in the road. Behind us is the era of the third-party cookie, perhaps the Web’s biggest mistake. Ahead of us are two possible futures. 

In one, users get to decide what information to share with each site they choose to interact with. No one needs to worry that their past browsing will be held against them—or leveraged to manipulate them—when they next open a tab. 

In the other, each user’s behavior follows them from site to site as a label, inscrutable at a glance but rich with meaning to those in the know. Their recent history, distilled into a few bits, is “democratized” and shared with dozens of nameless actors that take part in the service of each web page. Users begin every interaction with a confession: here’s what I’ve been up to this week, please treat me accordingly.

Users and advocates must reject FLoC and other misguided attempts to reinvent behavioral targeting. We implore Google to abandon FLoC and redirect its effort towards building a truly user-friendly Web.

What is FLoC?

In 2019, Google presented the Privacy Sandbox, its vision for the future of privacy on the Web. At the center of the project is a suite of cookieless protocols designed to satisfy the myriad use cases that third-party cookies currently provide to advertisers. Google took its proposals to the W3C, the standards-making body for the Web, where they have primarily been discussed in the Web Advertising Business Group, a body made up primarily of ad-tech vendors. In the intervening months, Google and other advertisers have proposed dozens of bird-themed technical standards: PIGIN, TURTLEDOVE, SPARROW, SWAN, SPURFOWL, PELICAN, PARROT… the list goes on. Seriously. Each of the “bird” proposals is designed to perform one of the functions in the targeted advertising ecosystem that is currently done by cookies.

FLoC, which stands for “Federated Learning of Cohorts”, is designed to help advertisers perform behavioral targeting without third-party cookies. A browser with FLoC enabled would collect information about its user’s browsing habits, then use that information to assign its user to a “cohort” or group. Users with similar browsing habits—for some definition of “similar”—would be grouped into the same cohort. Each user’s browser will share a cohort ID, indicating which group they belong to, with websites and advertisers. According to the proposal, at least a few thousand users should belong to each cohort (though that’s not a guarantee).

If that sounds dense, think of it this way: your FLoC ID will be like a succinct summary of your recent activity on the Web.

Google’s proof of concept used the domains of the sites that each user visited as the basis for grouping people together. It then used an algorithm called SimHash to create the groups. SimHash can be computed locally on each user’s machine, so there’s no need for a central server to collect behavioral data. However, a central administrator could have a role in enforcing privacy guarantees. In order to prevent any cohort from being too small (i.e. too identifying), Google proposes that a central actor could count the number of users assigned each cohort. If any are too small, they can be combined with other, similar cohorts until enough users are represented in each one. 

For FLoC to be useful to advertisers, a user’s cohort will necessarily reveal information about their behavior.

According to the proposal, most of the specifics are still up in the air. The draft specification states that a user’s cohort ID will be available via Javascript, but it’s unclear whether there will be any restrictions on who can access it, or whether the ID will be shared in any other ways. FLoC could perform clustering based on URLs or page content instead of domains; it could also use a federated learning-based system (as the name FLoC implies) to generate the groups instead of SimHash. It’s also unclear exactly how many possible cohorts there will be. Google’s experiment used 8-bit cohort identifiers, meaning that there were only 256 possible cohorts. In practice that number could be much higher; the documentation suggests a 16-bit cohort ID comprising 4 hexadecimal characters. The more cohorts there are, the more specific they will be; longer cohort IDs will mean that advertisers learn more about each user’s interests and have an easier time fingerprinting them.

One thing that is specified is duration. FLoC cohorts will be re-calculated on a weekly basis, each time using data from the previous week’s browsing. This makes FLoC cohorts less useful as long-term identifiers, but it also makes them more potent measures of how users behave over time.

New privacy problems

FLoC is part of a suite intended to bring targeted ads into a privacy-preserving future. But the core design involves sharing new information with advertisers. Unsurprisingly, this also creates new privacy risks. 

Fingerprinting

The first issue is fingerprinting. Browser fingerprinting is the practice of gathering many discrete pieces of information from a user’s browser to create a unique, stable identifier for that browser. EFF’s Cover Your Tracks project demonstrates how the process works: in a nutshell, the more ways your browser looks or acts different from others’, the easier it is to fingerprint. 

Google has promised that the vast majority of FLoC cohorts will comprise thousands of users each, so a cohort ID alone shouldn’t distinguish you from a few thousand other people like you. However, that still gives fingerprinters a massive head start. If a tracker starts with your FLoC cohort, it only has to distinguish your browser from a few thousand others (rather than a few hundred million). In information theoretic terms, FLoC cohorts will contain several bits of entropy—up to 8 bits, in Google’s proof of concept trial. This information is even more potent given that it is unlikely to be correlated with other information that the browser exposes. This will make it much easier for trackers to put together a unique fingerprint for FLoC users.

Google has acknowledged this as a challenge, but has pledged to solve it as part of the broader “Privacy Budget” plan it has to deal with fingerprinting long-term. Solving fingerprinting is an admirable goal, and its proposal is a promising avenue to pursue. But according to the FAQ, that plan is “an early stage proposal and does not yet have a browser implementation.” Meanwhile, Google is set to begin testing FLoC as early as this month.

Fingerprinting is notoriously difficult to stop. Browsers like Safari and Tor have engaged in years-long wars of attrition against trackers, sacrificing large swaths of their own feature sets in order to reduce fingerprinting attack surfaces. Fingerprinting mitigation generally involves trimming away or restricting unnecessary sources of entropy—which is what FLoC is. Google should not create new fingerprinting risks until it’s figured out how to deal with existing ones.

Cross-context exposure

The second problem is less easily explained away: the technology will share new personal data with trackers who can already identify users. For FLoC to be useful to advertisers, a user’s cohort will necessarily reveal information about their behavior. 

The project’s Github page addresses this up front:

This API democratizes access to some information about an individual’s general browsing history (and thus, general interests) to any site that opts into it. … Sites that know a person’s PII (e.g., when people sign in using their email address) could record and reveal their cohort. This means that information about an individual’s interests may eventually become public.

As described above, FLoC cohorts shouldn’t work as identifiers by themselves. However, any company able to identify a user in other ways—say, by offering “log in with Google” services to sites around the Internet—will be able to tie the information it learns from FLoC to the user’s profile.

Two categories of information may be exposed in this way:

  1. Specific information about browsing history. Trackers may be able to reverse-engineer the cohort-assignment algorithm to determine that any user who belongs to a specific cohort probably or definitely visited specific sites. 
  2. General information about demographics or interests. Observers may learn that in general, members of a specific cohort are substantially likely to be a specific type of person. For example, a particular cohort may over-represent users who are young, female, and Black; another cohort, middle-aged Republican voters; a third, LGBTQ+ youth.

This means every site you visit will have a good idea about what kind of person you are on first contact, without having to do the work of tracking you across the web. Moreover, as your FLoC cohort will update over time, sites that can identify you in other ways will also be able to track how your browsing changes. Remember, a FLoC cohort is nothing more, and nothing less, than a summary of your recent browsing activity.

You should have a right to present different aspects of your identity in different contexts. If you visit a site for medical information, you might trust it with information about your health, but there’s no reason it needs to know what your politics are. Likewise, if you visit a retail website, it shouldn’t need to know whether you’ve recently read up on treatment for depression. FLoC erodes this separation of contexts, and instead presents the same behavioral summary to everyone you interact with.

Beyond privacy

FLoC is designed to prevent a very specific threat: the kind of individualized profiling that is enabled by cross-context identifiers today. The goal of FLoC and other proposals is to avoid letting trackers access specific pieces of information that they can tie to specific people. As we’ve shown, FLoC may actually help trackers in many contexts. But even if Google is able to iterate on its design and prevent these risks, the harms of targeted advertising are not limited to violations of privacy. FLoC’s core objective is at odds with other civil liberties.

The power to target is the power to discriminate. By definition, targeted ads allow advertisers to reach some kinds of people while excluding others. A targeting system may be used to decide who gets to see job postings or loan offers just as easily as it is to advertise shoes. 

Over the years, the machinery of targeted advertising has frequently been used for exploitation, discrimination, and harm. The ability to target people based on ethnicity, religion, gender, age, or ability allows discriminatory ads for jobs, housing, and credit. Targeting based on credit history—or characteristics systematicall

[…]


Read the original article: Google’s FLoC Is a Terrible Idea