Tuesday, March 30, 2021

Google Is Testing Its Controversial New Ad Targeting Tech in Millions of Browsers

Today, Google launched an “origin trial” of Federated Learning of Cohorts (aka FLoC), its experimental new technology for targeting ads. A switch has silently been flipped in millions of instances of Google Chrome: those browsers will begin sorting their users into groups based on behavior, then sharing group labels with third-party trackers and advertisers around the web. A random set of users have been selected for the trial, and they can currently only opt out by disabling third-party cookies.

Although Google announced this was coming, the company has been sparse with details about the trial until now. We’ve pored over blog posts, mailing lists, draft web standards, and Chromium’s source code to figure out exactly what’s going on.

EFF has already written that FLoC is a terrible idea.  Google’s launch of this trial—without notice to the individuals who will be part of the test, much less their consent—is a concrete breach of user trust in service of a technology that should not exist.

Below we describe how this trial will work, and some of the most important technical details we’ve learned so far.

FLoC is supposed to replace cookies. In the trial, it will supplement them.

Google designed FLoC to help advertisers target ads once third-party cookies go away. During the trial, trackers will be able to collect FLoC IDs in addition to third-party cookies. 

That means all the trackers who currently monitor your behavior across a fraction of the web using cookies will now receive your FLoC cohort ID as well. The cohort ID is a direct reflection of your behavior across the web. This could supplement the behavioral profiles that many trackers already maintain.

The trial will affect up to 5% of Chrome users worldwide.

We’ve been told that the trial is currently deployed to 0.5% of Chrome users in some regions. Users in eligible regions will be chosen completely at random, regardless of most ad and privacy settings. Only users who have turned off third-party cookies in Chrome will be opted out by default.

Furthermore, the team behind FLoC has requested that Google bump up the sample to 5% of users, so that ad tech companies can better train models using the new data. If that request is granted, tens or hundreds of millions more users will be enrolled in the trial.
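To give a sense of scale, the jump from 0.5% to 5% can be worked out for any assumed user base. The sketch below is illustrative only: the one-billion figure is a hypothetical round number, not a statistic from this article or from Google, and the function name is made up for the example.

```python
# Hypothetical round number for illustration; NOT a figure from the article.
HYPOTHETICAL_CHROME_USERS = 1_000_000_000


def enrolled(total_users: int, share_permille: int) -> int:
    """Users enrolled when `share_permille` out of every 1,000 are selected."""
    return total_users * share_permille // 1000


current = enrolled(HYPOTHETICAL_CHROME_USERS, 5)    # 0.5% is 5 per 1,000
expanded = enrolled(HYPOTHETICAL_CHROME_USERS, 50)  # 5% is 50 per 1,000

print(current)             # 5000000
print(expanded)            # 50000000
print(expanded - current)  # 45000000 additional users
```

Under that assumption, the requested increase would add tens of millions of users; a larger real user base would push the number higher.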

Users have been enrolled in the trial automatically. There is no dedicated opt-out (yet).

As described above, a random portion of Chrome users will be enrolled in the trial without notice, much less consent. Those users will not be asked to opt in. In the current version of Chrome, users can only opt out of the trial by turning off all third-party cookies.

Future versions of Chrome will add dedicated controls for Google’s “privacy sandbox,” including FLoC. But it’s not clear when these settings will go live, and in the meantime, users wishing to turn off FLoC must turn off third-party cookies as well.

Turning off third-party cookies is not a bad idea in general. After all, cookies are at the heart of the privacy problems that Google says it wants to address. But turning them off altogether is a crude countermeasure, and it breaks many conveniences (like single sign-on) that web users rely on. Many privacy-conscious users of Chrome employ more targeted tools, including extensions like Privacy Badger, to prevent cookie-based tracking. Unfortunately, Chrome extensions cannot yet control whether a user exposes a FLoC ID.

Websites aren’t being asked to opt in, either.

FLoC calculates a label based on your browsing history. For the trial, Google will default to using every website that serves ads—which is the majority of sites on the web. Sites can opt out of being included in FLoC calculations by sending an HTTP header, but some hosting providers don’t give their customers direct control of headers. Many site owners may not be aware of the trial at all.
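For site owners who do control their response headers, the opt-out can be added in one place. The article does not name the header; the value used below, Permissions-Policy: interest-cohort=(), is the one Google documented for the origin trial. The following is a minimal sketch for a Python WSGI application, and the helper names are our own.

```python
def add_floc_opt_out(headers):
    """Return a copy of `headers` that includes the FLoC opt-out directive."""
    result = []
    found = False
    for name, value in headers:
        if name.lower() == "permissions-policy":
            found = True
            # Keep any existing directives and append the opt-out if missing.
            if "interest-cohort" not in value:
                value = f"{value}, interest-cohort=()"
        result.append((name, value))
    if not found:
        result.append(("Permissions-Policy", "interest-cohort=()"))
    return result


def floc_opt_out(app):
    """Wrap a WSGI app so every response carries the opt-out header."""
    def wrapped(environ, start_response):
        def patched_start(status, headers, exc_info=None):
            return start_response(status, add_floc_opt_out(headers), exc_info)
        return app(environ, patched_start)
    return wrapped
```

Wrapping an application as floc_opt_out(app) leaves its other headers untouched. Sites behind a hosting provider that does not expose headers cannot use this approach, which is the limitation described above.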

This is an issue because it means that sites lose some control over how their visitors’ data is processed. Right now, a site administrator has to make a conscious decision to include code from an advertiser on their page. Sites can, at least in theory, choose to partner with advertisers based on their privacy policies. But now, information about a user’s visit to that site will be wrapped up in their FLoC ID, which will be made widely available (more on that in the next section). Even if a website has a strong privacy policy and relationships with responsible advertisers, a visit there may affect how trackers see you in other contexts.

Each user’s FLoC ID—the label that reflects their past week’s browsing history—will be available to any website or tracker who wants it.

Anyone can sign up for Chrome’s origin trial. Once signed up, a company can read the FLoC ID of any user chosen for the trial on any page where it can run JavaScript. This includes the vast ecosystem of nameless advertisers to whom your browser connects whenever you visit most ad-serving sites. If you’re part of the trial, dozens of companies may be able to gather your FLoC ID from each site you visit.

There will be over 33,000 possible cohorts.

One of the most important portions of the FLoC specification left undefined is exactly how many cohorts there are. Google ran a preliminary experiment with 8-bit cohort IDs, which meant there were just 256 possible groups. This limited the amount of information trackers could learn from a user’s cohort ID. 

However, an examination of the latest version of Chrome reveals that the live version of FLoC uses 50-bit cohort identifiers. The cohorts are then batched together into 33,872 total cohorts, over 100 times more than in Google’s first experiment. Google has said that it will ensure “thousands” of people are grouped into each cohort, so nobody can be identified using their cohort alone. But cohort IDs will still expose lots of new information—around 15 bits—and will give fingerprinters a massive leg up.
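The "around 15 bits" figure and the "over 100 times" comparison follow directly from the cohort counts. The short calculation below reproduces them. It assumes cohorts are equally populated, which gives the largest possible average information per ID; real cohorts will vary in size.

```python
import math


def cohort_bits(num_cohorts: int) -> float:
    """Maximum average bits revealed by a cohort ID (equally likely cohorts)."""
    return math.log2(num_cohorts)


PRELIMINARY_COHORTS = 2 ** 8  # 8-bit IDs in Google's earlier experiment
TRIAL_COHORTS = 33_872        # cohort count reported for the origin trial

print(cohort_bits(PRELIMINARY_COHORTS))      # 8.0
print(round(cohort_bits(TRIAL_COHORTS), 2))  # 15.05
print(TRIAL_COHORTS / PRELIMINARY_COHORTS)   # 132.3125
```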

The trial will likely last until July.

Any tracker, advertiser, or other third party can sign up through Google’s Origin Trial portal to begin collecting FLoCs from users. The page currently indicates that the trial may last until July 13. Google has also made it clear that the exact details of the technology—including how cohorts are calculated—will be subject to change, and we could see several iterations of the FLoC grouping algorithm between now and then.

Google plans to audit FLoC for correlations with “sensitive categories.” It’s still missing the bigger picture.

Google has pledged to make sure that cohorts aren’t too tightly correlated with “sensitive categories” like race, sexuality, or medical conditions. In order to monitor this, Google plans to collect data about which sites are visited by users in each cohort. It has released a whitepaper describing its approach. 

We’re glad to see a specific proposal, but the whitepaper sidesteps the most pressing issues. The question Google should address is “can you target people in vulnerable groups”; the whitepaper reduces this to “can you target people who visited a specific site.” This is a dangerous oversimplification. Rather than working on the hard problem, Google has chosen to focus on an easier version that it believes it can solve. Meanwhile, it’s failed to address FLoC’s worst potential harms.

During the trial, any user who has turned on “Chrome Sync” (letting Google collect their browsing history), and who has not disabled any of several default sharing settings, will now share their cohort ID attached to their browsing history with Google. 

Google will then check to see if each user visited any sites that it considers part of a “sensitive category.” For example, WebMD might be labelled in the “medical” category, or PornHub in the “adult” category. If too many users in one cohort have visited a particular kind of “sensitive” site, Google will block that cohort. Any users who are part of “sensitive” cohorts will be placed into an “empty” cohort instead. Of course, trackers will still be able to see that said users are part of the “empty” cohort, revealing that they were originally classified as some kind of “sensitive.”
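The audit described above can be sketched in a few lines. This is a simplified illustration, not Google's implementation: the fixed share threshold, the site labels, and the function names are all assumptions made for the example, and the whitepaper's actual statistical test may differ.

```python
from collections import defaultdict
from typing import Dict, Optional, Set

# Hypothetical labels, using only the article's two examples.
SENSITIVE_SITES = {"webmd.com": "medical", "pornhub.com": "adult"}


def blocked_cohorts(histories: Dict[str, Set[str]],
                    cohorts: Dict[str, int],
                    max_share: float) -> Set[int]:
    """Cohorts where more than `max_share` of members hit one sensitive category."""
    members = defaultdict(int)
    hits = defaultdict(int)  # (cohort, category) -> number of users
    for user, cohort in cohorts.items():
        members[cohort] += 1
        visited = histories.get(user, set())
        categories = {SENSITIVE_SITES[s] for s in visited if s in SENSITIVE_SITES}
        for category in categories:
            hits[(cohort, category)] += 1
    return {c for (c, _), n in hits.items() if n / members[c] > max_share}


def exposed_cohort(user: str, cohorts: Dict[str, int],
                   blocked: Set[int]) -> Optional[int]:
    """Cohort ID a tracker would see; None stands in for the 'empty' cohort."""
    cohort = cohorts[user]
    return None if cohort in blocked else cohort


histories = {"a": {"webmd.com"}, "b": {"webmd.com", "example.org"},
             "c": {"example.org"}, "d": {"example.org"}, "e": set()}
cohorts = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2}
blocked = blocked_cohorts(histories, cohorts, max_share=0.5)
print(blocked)                                # {1}
print(exposed_cohort("c", cohorts, blocked))  # None
```

In this toy data, user "c" never visited a labelled site but still lands in the "empty" cohort because of who else shares cohort 1. That is the leak described above: the "empty" label itself tells a tracker something.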

For the origin trial, Google is relying on its massive cache of personalized browsing data to perform the audit. In the future, Google plans to use other privacy-preserving technology to do the same thing without knowing individuals’ browsing history.

Regardless of how Google does it, this plan won't solve the bigger issues with FLoC, discrimination, and predatory targeting. The proposal rests on the assumption that people in “sensitive categories” will visit specific “sensitive” websites, and that people who aren’t in those groups will not visit said sites. But behavior correlates with demographics in unintuitive ways. It's highly likely that certain demographics are going to visit a different subset of the web than other demographics are, and that such behavior will not be captured by Google’s “sensitive sites” framing. For example, people with depression may exhibit similar browsing behaviors, but not necessarily via something as explicit and direct as, for example, visiting “depression.org.” Meanwhile, tracking companies are well-equipped to gather traffic from millions of users, link it to data about demographics or behavior, and decode which cohorts are linked to which sensitive traits. Google’s website-based system, as proposed, has no way of stopping that.
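To see why a site-based filter does not prevent this kind of decoding, consider the minimal calculation a data holder could run. The sketch below assumes the party already holds pairs of (cohort ID, whether the user has some trait) from another source. The data and function name are invented for illustration, and no list of "sensitive" sites is involved at any step.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def trait_lift(observations: Iterable[Tuple[int, bool]]) -> Dict[int, float]:
    """Map each cohort to (trait rate in that cohort) / (overall trait rate)."""
    totals = Counter()
    with_trait = Counter()
    for cohort, has_trait in observations:
        totals[cohort] += 1
        if has_trait:
            with_trait[cohort] += 1
    n = sum(totals.values())
    if n == 0:
        return {}
    base_rate = sum(with_trait.values()) / n
    if base_rate == 0:
        return {cohort: 0.0 for cohort in totals}
    return {c: (with_trait[c] / totals[c]) / base_rate for c in totals}


# Invented sample: 3 of 4 users in cohort 1 have the trait, 1 of 4 in cohort 2.
sample = [(1, True), (1, True), (1, True), (1, False),
          (2, True), (2, False), (2, False), (2, False)]
print(trait_lift(sample))  # {1: 1.5, 2: 0.5}
```

A lift well above 1 marks a cohort as a proxy for the trait, whatever sites its members actually visited.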

As we said before, “Google can choose to dismantle the old scaffolding for surveillance without replacing it with something new and uniquely harmful.” Google has failed to address the harms of FLoC, or even to convince us that they can be addressed. Instead, it's running a test that will share new data about millions of unsuspecting users. This is another step in the wrong direction.



from Hacker News https://ift.tt/3cAsgFW
