Proxies are one of those ghostly conceits of the internet age. Countless systems and businesses depend on them, but very few people are sure of what they are, how they work, or how to properly curate them. However, they are something every digital marketing agency needs to be aware of, and to account for.
Let’s clear this up right away: when I say “proxies” I’m not referring to the services that let you route your internet connection through a distant server to obscure your location or identity. Those are completely separate, but they share a name with the proxies I’m talking about today because, in both cases, they are stand-ins for a real thing. Proxy servers mask user connections, while web proxies are data profiles that stand in for real people. It’s these web (user data) proxies that I’ll be discussing today.
(Keep an eye out for next week’s post on the other kind of proxy!)
Let’s Start with a History Lesson
Back in January of 2012, Google had the clever idea of aggregating user data from its different services, including Google Search, Calendar, Mail, YouTube, and others, into a single comprehensive user profile. Google’s well-meant intention was to collect data about viewing habits, search history, and so on in order to create a loose profile of a person’s demographic and interests to better target ads. (If you’ve ever been on Facebook and seen an ad for something you just bought on Amazon, a cookie cached in your proxy is the culprit.)
How Do Proxies Work?
Similar to the personas that we digital marketers use, proxies draw on a variety of digital data sources to compile a profile of demographic and personal data about each user. This data collection happens continuously, for every action a user takes on Google (every search, every click) and on many other sites, generally through Google Analytics.
It’s a massive body of data. Google searches alone average about 57,000 per second, or an annual total on the order of 2 trillion. It’s hard to overstate what a monumentally large body of data we’re talking about.
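As a rough sanity check on that scale, here is the arithmetic as a short Python sketch. It assumes the per-second average holds steady across a full 365-day year, which is a simplification, and the function name is just for illustration.

```python
# Rough scale check. Assumption: the ~57,000 searches/second average
# holds constant over a 365-day year.
SEARCHES_PER_SECOND = 57_000
SECONDS_PER_YEAR = 60 * 60 * 24 * 365  # 31,536,000 seconds


def searches_per_year(per_second: int = SEARCHES_PER_SECOND) -> int:
    """Return the annual search count implied by a per-second average."""
    return per_second * SECONDS_PER_YEAR


annual = searches_per_year()
print(f"{annual:,} searches per year (~{annual / 1e12:.1f} trillion)")
```

That comes to about 1.8 trillion a year, in the same ballpark as the 2 trillion figure.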
Are Proxies Accurate?
Since that body of data is so big, processing it into something usable depends on a lot of inferences and assumptions. When algorithms create proxies for users, they do so on the macro level, charting large-scale trends and applying them to individual profiles. The odds that a proxy will be completely accurate for any particular individual are relatively low, which is a trade-off for proxies being generally right on average.
Now, it would seem like this would have a lot of advantages. Ideally, proxies would make it easier to properly target ad content, to anticipate user interests, to track user engagement by demographic as they move through a site, and to remarket appropriately. But when proxies get it wrong, there’s a self-propelling snowball effect that can be incredibly difficult to spot, and harder still to correct.
Proxies Subtly Build Inertia
Sara Wachter-Boettcher, web consultant and author of the forthcoming Technically Wrong: Sexist Apps, Biased Algorithms, and Other Threats of Toxic Tech, has written about a personal experience with her own proxy as interpreted by Google’s algorithm: it thought she was a man. Specifically, it pegged the then 28-year-old woman as a man somewhere in the 35–44 age range.
As she tells it, she soon realized that a number of women in her professional circle shared her experience, as did writers for Forbes (finance), Mashable (tech), and The Mary Sue (geek culture from a feminist perspective). Each of these women had search histories full of techy topics and, in the absence of further data, got lumped in with a macro-level trend.
Here’s where it gets sticky.
Let’s take a moment and play this out in practical terms. First assumption: most users will not disable Google’s data collection setting on their own profile (have you?), or otherwise take steps to mask themselves online (VPNs, etc.). Second assumption: that data will usually be incomplete, so inferences will have to be made about things like age and gender for most users.
Since user data is more valuable when it includes interests as well as demographics, there is incentive to collect or infer as much data as possible.
So, let’s imagine an algorithm with the simple function of establishing the gender and age of three hypothetical users based solely on search history. Out of hundreds of thousands of searches in each of their profiles, and after discarding outliers (topics with fewer than fifty discrete instances, say), trends will eventually emerge. The reason to discard outliers is that they probably won’t be representative. Imagine if User Alpha borrowed Beta’s phone briefly, or if Gamma looked up marsupials for a school project despite lacking any real interest in zoology.
Alpha seems to be interested in sci-fi, tech, literature, and philosophy.
Beta’s searches have shifted over the past twelve months from travel destinations and backpacker’s blogs to interior design tips, cribs and car seats for sale, Pinterest, and parenting blogs.
Gamma’s internet behaviour is primarily media-focussed, tending toward certain YouTube channels, especially video game streams, as well as toy reviews and unboxing videos.
Our humble algorithm, based on hundreds of millions of other user profiles, might notice some trends. It would probably identify Alpha as a university-educated male geek, Beta as a new parent, probably female, and likely in her mid to late twenties, and Gamma as a grade-schooler.
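To make the thought experiment concrete, here is a minimal Python sketch of that kind of algorithm. Everything in it is invented for illustration: the `TOPIC_TRENDS` table, the labels, the sample history, and the `infer_profile` function are assumptions, not a description of how Google or any real system works. It only shows the two steps described above: drop topics seen fewer than fifty times, then let the remaining topics vote for whatever label the broader trend suggests.

```python
from collections import Counter

# Topics seen fewer times than this are treated as outliers and ignored.
MIN_TOPIC_COUNT = 50

# Hypothetical macro-level trends: topic -> label the wider population suggests.
# These mappings are made up purely for illustration.
TOPIC_TRENDS = {
    "sci-fi": "male, university-educated",
    "tech": "male, university-educated",
    "philosophy": "male, university-educated",
    "cribs": "female, new parent",
    "parenting blogs": "female, new parent",
    "toy reviews": "grade-schooler",
    "video game streams": "grade-schooler",
}


def infer_profile(search_topics):
    """Guess a demographic label from a list of search topics."""
    counts = Counter(search_topics)

    # Step 1: discard outliers (topics with fewer than fifty instances).
    frequent = {t: n for t, n in counts.items() if n >= MIN_TOPIC_COUNT}

    # Step 2: each surviving topic votes for its trend label, weighted by count.
    votes = Counter()
    for topic, n in frequent.items():
        label = TOPIC_TRENDS.get(topic)
        if label is not None:
            votes[label] += n

    if not votes:
        return "unknown"
    return votes.most_common(1)[0][0]


# A made-up history resembling Gamma's; the three marsupial searches are dropped.
gamma_history = (
    ["toy reviews"] * 400 + ["video game streams"] * 300 + ["marsupials"] * 3
)
print(infer_profile(gamma_history))  # grade-schooler
```

Note that the sketch only ever sees the search history. Nothing in it can check whether the label it hands out matches the actual person.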
But would it be right?
There are plenty of women, including Wachter-Boettcher, who would fit Alpha’s profile. There is nothing in the data to suggest that Beta is a mother rather than a father. And Gamma is sixty years old, works in traditional marketing, and is constantly trying to keep abreast of what the kids are into so he can design his adverts appropriately.
But without that practical, corrective feedback, all the algorithm has to go on is the initial inference. If it assumes male users like Beta are female, unless there’s evidence to the contrary, then it will start to treat them as female. It will incorrectly report them as female when calculating web traffic.
If even one user is misidentified (say, a woman with an interest in tech, or a man reading about breastfeeding and parenting) then the algorithm, patting itself on the back and uncorrected, will be more likely to misidentify future users.
You can see where this is going — the proxy will become something of a self-fulfilling prophecy, and it’s going to build its own inertia exponentially quickly.
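A toy simulation can show the shape of that loop. This is a deliberately simplified sketch under stated assumptions, not a model of any real system: the `simulate_feedback` function, the 60% starting share, and the `learning_rate` parameter are all hypothetical. The algorithm starts with an accurate belief about what share of tech-interested users are male, labels every ambiguous user with the majority guess, and then updates its belief from its own labels as though they were confirmed facts.

```python
def simulate_feedback(true_male_share=0.6, rounds=5, learning_rate=0.5):
    """Track an algorithm's belief as it learns from its own uncorrected labels.

    All parameters are illustrative assumptions, not measured values.
    """
    belief = true_male_share  # the algorithm starts out accurate
    history = [belief]
    for _ in range(rounds):
        # With no other evidence, every ambiguous user gets the majority label.
        labelled_male_share = 1.0 if belief >= 0.5 else 0.0
        # No corrective feedback: the algorithm treats its own labels as truth
        # and nudges its belief toward them.
        belief = (1 - learning_rate) * belief + learning_rate * labelled_male_share
        history.append(belief)
    return history


for round_number, belief in enumerate(simulate_feedback()):
    print(f"round {round_number}: believes {belief:.1%} of these users are male")
```

In this sketch the real share never moves from 60%, yet the belief climbs closer to 100% every round, because the only evidence the algorithm ever sees is its own guesswork.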
The Snowball Effect
My father once told me about a magnificent failure in advertising. A company was selling commemorative coins (sure to increase in value as collector’s items…) with the charmingly blundering promise that they had been “authenticated by our own seal!”
(“And if you like these, then have I got the Rolex for you!”)
The collection and interpolation of data, if based on macro-level trends rather than micro-level corrections, will, as Wachter-Boettcher points out, “make the system less accurate over time, not more, without you even realizing it.”
And this can have practical, tangible consequences.
Put yourself in the mind of the editor-in-chief of a website about, say, business and finance. Suppose your user data starts to report a primarily male readership, reflecting the broader trend at the expense of the very real, very meaningful number of female visitors who have been incorrectly reported by our algorithm. You might then make the seemingly sensible decision to double down and focus on attracting more men, appealing to the group you are now coming to see as your primary demographic. In one stroke, you could be alienating a huge swath of your readership.
A good digital marketing agency, similarly, needs to be extremely vigilant when it comes to actually putting proxy data to work. There are a few checks and balances, with varying degrees of efficacy, that digital marketers can use to try to mitigate some of that inertia.
For instance, a comments section that invites users to log in through Facebook or Twitter can inform demographic data for user engagement.
But as a B Corp-certified digital marketing agency, with a focus on values-based marketing, we don’t always see the need to bring demographic data into play. For instance, if your readership seems to like articles about CEO and leadership training, then what does it matter whether they’re male or female, or twenty or forty? If your customers start buying a particular product, then expand that product line with gender-neutral branding.
That way, no one gets alienated, your business won’t miss out on opportunities, and you’ll be helping to keep the systemic inertia in check.
Colibri Digital Marketing
We’re the digital marketing agency San Francisco trusts to focus on the triple bottom line of people, planet, and profit. Based in the Bay Area, close to Silicon Valley, we’re the team with the sneak-peek into the future of digital marketing. If you’re ready to work with San Francisco’s first and only full-service B Corp-Certified digital marketing agency, drop us a line to schedule a free digital marketing strategy session!
Originally published at colibridigitalmarketing.com on November 2, 2017.