The Butler Did It! Machine Learning, Natural Language Processing, and How Digital Assistants Are Re-Carving the SEO Landscape

MACHINE LEARNING, NATURAL LANGUAGE PROCESSING, AND SEO

By now, odds are you’re already aware of Google’s commitment to developing a functional natural language search engine. Their mobile-first restructuring has stolen the spotlight in recent months, but their advances in machine learning and natural language processing, and the sheer linguistic competence of their algorithms, are still monumentally impressive. These developments demand a reconceptualization of what it means to practice good SEO.

As a primer for those new to the field: “speech recognition” refers to the ability to accurately generate text from spoken language, while “natural language processing” describes the ability of a piece of software to react to spoken or written language as fluently and conversationally as a native speaker. This is most easily developed with one form or another of “machine learning”, which uses adaptive algorithms that alter themselves incrementally in response to feedback, in order to more consistently arrive at the appropriate output for a given input.
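To make "alter themselves incrementally in response to feedback" a little more concrete, here is a minimal sketch in Python. It is a toy, not how any real search engine works: the function name, the learning rate, and the example data are all made up for illustration. The "algorithm" is a single number (a weight) that gets nudged after every guess, in proportion to how wrong the guess was.

```python
def train(examples, learning_rate=0.1, epochs=50):
    """Learn one weight so that weight * x approximates the target output."""
    weight = 0.0  # start knowing nothing
    for _ in range(epochs):
        for x, target in examples:
            prediction = weight * x
            error = target - prediction          # the feedback signal
            weight += learning_rate * error * x  # small corrective nudge
    return weight


# Hypothetical training data: every output is simply twice the input.
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
learned_weight = train(examples)
print(learned_weight)  # ends up very close to 2.0
```

Nobody told the program that the rule was "double it". It arrived there purely by being corrected, which is the same principle, at a vastly smaller scale, behind the systems discussed in this piece.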

With enough experience, these algorithms thereby mimic understanding, at least for practical purposes. Machine learning is related to “artificial intelligence”, and it’s not too big a stretch to say that it is the next major frontier in computing.

TECHNOLOGY HAS COME A LONG WAY

When you think of a voice assistant, you may be imagining a gimmicky, unstable tech trinket. We all suffered the grandiose promises of speech-to-text software that was so thoroughly unhelpful, and produced transcripts so riddled with errors, that we would have been better off without it. Today, however, we’ve all but perfected speech-to-text, and it isn’t so much of a talking point.

We don’t recognize the technical hurdle in parsing spoken language. How do you teach a computer to choose between “Tell her I see, under where we were standing, her masala and her sweets” and “Tallahassee underwear whew err Stan ding Herman salamander suites”? How should a computer know which set of words is nonsense? Even the simple act of transcription requires a certain predictive reasoning for the probabilities of syntax, context, and semantics.
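One common way to frame that choice is as a probability contest: score each candidate transcription by how likely its word sequences are, and keep the winner. The sketch below is a toy bigram (word-pair) model in Python, assuming a tiny hand-written corpus; the corpus, the function names, and the add-one smoothing are illustrative choices, not a description of any production speech recognizer.

```python
import math
from collections import Counter

# Hypothetical training text. A real system would learn from vastly more.
corpus = [
    "tell her i see the sweets",
    "i see her under the tree",
    "tell her where we were standing",
    "we were standing under the tree",
]


def build_model(sentences):
    """Count single words and adjacent word pairs."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()  # <s> marks the sentence start
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def average_log_probability(sentence, unigrams, bigrams):
    """Average per-word log-probability, with add-one smoothing."""
    vocabulary_size = len(unigrams) + 1  # +1 slot for unseen words
    words = ["<s>"] + sentence.split()
    total = 0.0
    for previous, word in zip(words, words[1:]):
        probability = (bigrams[(previous, word)] + 1) / (
            unigrams[previous] + vocabulary_size
        )
        total += math.log(probability)
    return total / (len(words) - 1)


unigrams, bigrams = build_model(corpus)
sensible = "tell her i see under where we were standing"
nonsense = "tallahassee underwear whew err stan ding"

sensible_score = average_log_probability(sensible, unigrams, bigrams)
nonsense_score = average_log_probability(nonsense, unigrams, bigrams)
best = sensible if sensible_score > nonsense_score else nonsense
print(best)
```

The sensible sentence wins because its word pairs ("tell her", "we were") have been seen before, while the nonsense pairs have not. Scores are averaged per word so that a longer sentence isn't penalized just for being longer.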

The technology is now firmly on the cusp of real, practical usability as something even greater than a glorified transcription service. It will soon be able to do more than hear your words and retype them. It will soon be able to listen to commands and actually follow them appropriately.

NEW POINTS OF ACCESS

Voice search is popping up everywhere. It’s on our cell phones, our game consoles, and, increasingly, our desktop computers. Siri depends on it, Android’s “Google Now” is built on it, and the Windows 10 Taskbar, for Bing searches, now incorporates it as well.

Moreover, our use of this technology is firmly on the rise. Voice search now makes up 10% of searches worldwide, according to voice interface specialist Timothy Tuttle (MindMeld), up from 2015, when voice searches were “statistically zero.”

Siri, one of the earliest commercially mainstream voice assistants, made her debut in October of 2011. Over the next few years, Android’s Google Now (2012), Microsoft’s Cortana (2014), and Amazon’s Alexa (2014), as well as a huge variety of smaller apps, gradually refined their speech recognition technologies and developed a broader range of capabilities. By 2016 standards, for instance, Apple’s 2011 version of Siri (iOS 5, on the iPhone 4S if you can believe it) was clunky, unreliable, and frequently more of a burden than a boon.

Today, Siri, like the other voice assistants mentioned, is much more capable, with a broader set of available actions and better speech recognition to boot.

Natural language processing is similarly present in Google’s text-based searches. Our search queries are becoming almost conversational phrases, where they used to be little more than strings of keywords.

SPEAKING GOOGLE

Think back to 2001. It would have seemed absurd to ask Google “Where is The Mummy Returns playing near me?” You would have typed something like “The Mummy Returns Movie Listings”, maybe adding “San Francisco, November 16” if your first query didn’t return helpful results.

That’s the thing. We all spent the 90s and early 2000s learning to speak to Google through an oddly specific pidgin language. We strung a series of nouns together, from broadest to narrowest, because we treated Google like a sort of filing system for the rest of the internet. We searched it as we would any database, narrowing our topics.

This pidgin had several drawbacks. While it was pretty good at finding words related structurally (listing, listings, listed, list, etc.), it had a harder time coming up with conceptual parallels (listings, showings, show-times, etc.). And it depended on us knowing exactly what we were looking for, ahead of time. To find the forgotten name of the actor, we might have had to follow “The Mummy Returns” down the garden path, until we finally found a chance reference to Brendan Fraser.

TRANSITIONAL PHRASING

It wasn’t long before the pidgin became a creole, of sorts, which expanded its functionality from the starkly literal to the deeper intention. We might now ask “The Mummy Returns + Actor” to reliably find Brendan Fraser, or “The Mummy Returns” to find its quality (a surprising 47% on Rotten Tomatoes).

We could even begin to ask simple questions, even without knowing the keywords we might have needed to solve them. Our search queries now allowed for interrogatives and adjectives, alongside the simple nouns. This typically led to sidelong questions and indirect workarounds, since we couldn’t be completely specific in our requests, yet.

If we didn’t know a particular keyword (say, “filmography”) we might be able to get around it with a query like “what movies was Brendan Fraser in”, a clear upgrade from 2001’s “Movies Brendan Fraser” approach which would still require a fair bit of manual research to get to the right answer.

It’s at this point that SEO as we know it really came into being. Taking Google as a particular example, there weren’t many ways to optimize a page for the pidgin language, save for stuffing an inordinate number of keywords wherever they would fit. (Remember, in 2001, Google was looking primarily for a word match by volume rather than something contextually relevant.)

As our creole developed, and Google began to group words into simple webs of related meanings and shared contexts (movie, film, show, show-times, actor, star, lead, character, etc.) the SEO potential for digital marketers skyrocketed.
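The difference between matching literal words and matching webs of related meaning can be sketched in a few lines of Python. This is purely illustrative: the concept groups are hand-written from the examples above, the function names are invented, and a real search engine derives these relationships statistically rather than from a hard-coded list.

```python
# Hypothetical webs of related terms, taken from the examples in this article.
CONCEPT_GROUPS = [
    {"movie", "film", "show"},
    {"listings", "showings", "show-times"},
    {"actor", "star", "lead"},
]


def expand(term):
    """Return every term that shares a concept group with the given term."""
    for group in CONCEPT_GROUPS:
        if term in group:
            return group
    return {term}


def keyword_match(query, page_text):
    """Old-style matching: every query word must appear on the page verbatim."""
    page_words = set(page_text.lower().split())
    return all(word in page_words for word in query.lower().split())


def concept_match(query, page_text):
    """Looser matching: each query word may be satisfied by any related term."""
    page_words = set(page_text.lower().split())
    return all(expand(word) & page_words for word in query.lower().split())


page = "film showings for this weekend"
query = "movie listings"

print(keyword_match(query, page))  # False: neither word appears on the page
print(concept_match(query, page))  # True: "film" and "showings" are related terms
```

A page that never uses the searcher's exact words can still be found, which is exactly why stuffing a page with one keyword stopped being the whole game.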

From this point forward, search and SEO developed in tandem. As Google became more functional, users began to experiment with new kinds of queries. Tracking those queries, and making sure your content included those phrasings, has been the foundation of SEO ever since.

Soon, other interrogatives began to work their way into our searches, as we increasingly began to conceptualize Google, and other search engines, as resources rather than simple aggregators. We might now ask “Can rabbits eat zucchini?” or “How to fix a leaky faucet?” but we still haven’t crossed the chasm from keyword matching (i.e., finding words similar to the ones we searched for in a comparable phrasal permutation) to actual natural language processing. We still presume that Google will find a string of words more readily than it will find something to meet the semantic content of the query.

That’s the next step, and it’s coming fast.

THE BUTLER DID IT

Increasingly, we expect Google to follow the verb, like a butler, and fetch us the nouns. A big part of that push is the hardwired human capacity for language. In my piece on Chatbots, I talked about how humans can’t help speaking to inanimate things and projecting human characteristics onto them, and Google is no different. As we start voicing our queries aloud, we slip into our native language, forgetting the limitations of the creole, and we let imperatives and pronouns and secondary clauses sneak in.

Technology is rising to our challenge.

At a tech demo last year, Viv, another voice assistant, was able to answer convoluted, multi-part questions like “Was it raining in Seattle three Thursdays ago?” and “Will it be warmer than 70 degrees near the Golden Gate Bridge after 5PM the day after tomorrow?”

At Google’s I/O Developer Conference in May 2016, Google Assistant was able to answer follow-up questions without the need to restate the original subject. In the demonstration, Assistant was asked “Who directed The Revenant?” and then the follow-up “Show me his awards”, without needing director Alejandro Iñárritu to be named in the second query.

This kind of functionality opens up a completely new kind of search, and digital marketers will need to react accordingly. With this new functionality, the ability to interpret rather than just match, a user isn’t required to have an endpoint in mind.

“Find me pictures of that dog the queen has” ought to, and soon will, bring up pictures of Corgis, even if none of those particular words is anywhere on the page. Prior to natural language search, a user would have either needed to have already known the name of the dog breed, or would have had to search for the Queen’s dog first, and the images of Corgis second. The voice assistant bridges the gap (and crosses the chasm) from “search tool” to “butler”, following instructions rather than just matching phrases.

JUMPING THE GUN, OR OFF TO THE RACES?

Our expectations are ahead of the technology, but good SEO will soon need to focus more on providing context and descriptions that indicate what your content actually is, rather than just what it says, since finding the thing will soon trump finding a phrasal echo. Google has already anticipated this, and has recently announced that their algorithms will use the structured data templates and vocabulary from Schema.org.

Digital marketers, take heed: keywords are waning and context is waxing. If users are no longer required to have a clear and specific end-point in mind (asking a question to which they do not know the answer), and if voice assistants (like Google Assistant) are filling in the gaps, then it will be incumbent upon us to ensure that our content can answer the broadest possible range of questions, rather than just matching the greatest number of key phrases. By being clear about what your content is and what niche it is filling, you make your site visible to queries that didn’t know they were looking for you, and that’s the next phase of SEO.

This nascent technology is still finding its footing, but, even so, this change to SEO is more than just speculative. In his keynote address at the I/O Developer Conference, Google CEO Sundar Pichai gave some interesting data about voice search. Voice, he said, accounts for a fifth of searches on the Android Google App, as well as a full quarter of searches from the Windows 10 taskbar. An astonishing majority of these searches come from mobile users, who want local results.

Phrases like “near me” have entered the popular lexicon as a shorthand for local search in general, and they’re a perfect example of why context and clarity are already paramount.

It does no one in San Francisco any good to see a New York steakhouse website, stuffed with keywords and peppered with HTML tags, when all they wanted was a place to eat. By providing location data in the appropriate format, a restaurant might snag all the hungry traffic within a 45-minute radius, just by doubling down on adding the right context to its site: where it is, what it does, how to find it, what kind of fare it serves, and whether people have been satisfied by their experience.
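What might "the appropriate format" look like in practice? One widely used option is a block of JSON-LD built on the Schema.org vocabulary and embedded in the page. The sketch below uses Python only to assemble and print that block. The restaurant, its address, its rating, and the example.com URL are entirely made up; the property names (servesCuisine, geo, aggregateRating, and so on) come from Schema.org's Restaurant type. Treat it as a starting point rather than a complete or validated markup.

```python
import json

# Hypothetical business details, for illustration only.
restaurant = {
    "@context": "https://schema.org",
    "@type": "Restaurant",
    "name": "Example Steakhouse",
    "url": "https://www.example.com",
    "telephone": "+1-415-555-0100",
    "servesCuisine": "Steakhouse",          # what kind of fare it serves
    "openingHours": "Mo-Su 17:00-23:00",    # when it is open
    "address": {                            # where it is
        "@type": "PostalAddress",
        "streetAddress": "123 Example Street",
        "addressLocality": "San Francisco",
        "addressRegion": "CA",
        "postalCode": "94103",
        "addressCountry": "US",
    },
    "geo": {                                # how to find it
        "@type": "GeoCoordinates",
        "latitude": 37.7749,
        "longitude": -122.4194,
    },
    "aggregateRating": {                    # whether people were satisfied
        "@type": "AggregateRating",
        "ratingValue": 4.5,
        "reviewCount": 120,
    },
}

json_ld = json.dumps(restaurant, indent=2)

# The resulting snippet would typically sit inside the page's HTML.
snippet = '<script type="application/ld+json">\n' + json_ld + "\n</script>"
print(snippet)
```

Each field maps onto one of the questions a local searcher is implicitly asking, which is the point: the page states what it is, instead of hoping a keyword happens to line up.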

Armed with that kind of data, a digital butler can make an informed and confident recommendation that could never have been gleaned from simple keywords.

THE FUTURE IS NOW

One of the best things about machine learning is that it develops more quickly with more feedback. Google alone handles around 500 billion searches per month, globally. If a full fifth of those are voice searches, using natural language processing and voice assistants, that is roughly 100 billion data points of feedback per month, and the algorithms will refine themselves accordingly.

Natural language processing is literally getting better by the second. Don’t get left behind!

Colibri Digital Marketing

We’re the digital marketing agency for the twenty-first century. Based in San Francisco, we’ve got our fingers on the pulse of Silicon Valley, we’ve got an insider perspective on the tech industry, and we get a sneak peek at the future of digital marketing. If you’re ready to work with the best, drop us a line or click here to schedule a free digital marketing strategy session!

This post was originally written and posted by Andrew McLoughlin. It is republished here on behalf of Colibri Digital Marketing. Thanks, Andrew, for a great read!

I work to make the web a more beautiful, accessible, and functional place. I use dreams as a form of planning. And I play because it’s fun.