Tech Blog: Why keywords suck for Social Purchase Intent

Keywords have a place in the world. When searching a social network such as Twitter for brand data, they can work quite well. If you want data about Toyota, for example, you can provide the keyword 'toyota' to one of the standard search interfaces provided by the social networks and receive a feed of messages that contain this keyword.

Keywords have their limitations, though. In particular, they really don't work well for identifying purchase intent. Which keywords would you use when searching for purchase intent? The word 'buy' seems to be an obvious start. However, if you try out the combination of 'toyota' and 'buy' on Twitter, you will very quickly notice that these keywords are used by corporations in their advertising, for example:

Example 1: "The time to buy a new #Toyota is now! Save big during the #ToyotaTime Sales Event at #HuntingtonToyota!..."

Now, clearly we are not going to sell any cars to the advertisers themselves. This type of data adds noise to the stream that doesn't help us monetize.

Next up we could try 'want'. A quick search returns:

Example 2: "@Toyota_Hybrid drivers want to win the @24hoursoflemans #LM24 #goodluck #pesage"

Which, again, is advertising. Another example is:

Example 3: "Does anyone want to come to the Toyota dealership with me at 12:20 for an oil change or can someone pick me up so I don't have to wait"

Which isn't advertising, but it also is not purchase intent; it is unrelated to purchase.

In fact, we find that only around 3% to 5% of a data stream (depending upon industry) is related to purchase. Once unearthed, however, this hidden data is like gold!

Proponents of the keywords approach would argue that you can add more keywords, along with some negative keywords (so that messages containing these negative words are ignored). These long chains of keywords with their associated logic are termed Boolean queries. The problem is, they really don't scale to the needs of enterprise software. Very soon the rules and lists of keywords become impossible to manage and, a lot of the time, simply don't work.
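To make the problem concrete, here is a minimal sketch of a keyword query with negative keywords, run over Examples 1 and 3 above and Example 4 below. The function names, the tokenizer and the choice of negative keywords are assumptions made for illustration, and the last message is a hypothetical one written for this sketch, not real Twitter data.

```python
import re

def tokens(message):
    # Lowercase and split into words; '#' and '@' prefixes are dropped.
    return set(re.findall(r"[a-z0-9']+", message.lower()))

def matches(message, required, any_of, negative=frozenset()):
    # Boolean query: ALL required words AND ANY of any_of AND NONE of negative.
    words = tokens(message)
    return required <= words and bool(any_of & words) and not (negative & words)

def run_query(messages, required, any_of, negative=frozenset()):
    return [name for name, text in messages.items()
            if matches(text, required, any_of, negative)]

MESSAGES = {
    "ad": "The time to buy a new #Toyota is now! Save big during the "
          "#ToyotaTime Sales Event at #HuntingtonToyota!...",
    "oil_change": "Does anyone want to come to the Toyota dealership with me "
                  "at 12:20 for an oil change or can someone pick me up so I "
                  "don't have to wait",
    "interest": "really want a toyota cressida",
    # Hypothetical message, invented for this sketch.
    "hypothetical_lead": "Which dealership should I buy my first Toyota from?",
}

# Plain keywords: everything matches, including the ad and the oil change.
naive = run_query(MESSAGES, {"toyota"}, {"buy", "want"})

# Negative keywords remove the noise, but also drop the hypothetical lead,
# because a genuine buyer can mention a dealership too.
filtered = run_query(MESSAGES, {"toyota"}, {"buy", "want"},
                     negative={"save", "sales", "dealership"})

print(naive)
print(filtered)
```

Each new negative keyword fixes one message and risks breaking another, which is how these rule lists grow beyond what anyone can maintain.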

What is needed is a scalable, intelligent solution. Chatterbox Labs' Path 2 Purchase machine learning and natural language processing product does this. Instead of matching simple keywords, the classifiers are trained on real social data so that they learn the types of language used across the entire message (not just keywords) by real people when expressing their intent to purchase. In fact, Chatterbox's classifiers can pinpoint both Pre Purchase and Post Purchase intent, and can further break Pre Purchase down into Interest (early stage), Consideration (mid stage) and Lead Validation (late stage).
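As a rough illustration of what "trained on real social data" means, the sketch below trains a tiny bag-of-words Naive Bayes classifier on the six example messages from this post and scores a new message using every word in it. This is a generic textbook technique chosen for illustration only; it is not a description of how Path 2 Purchase works internally, and the class name, the tokenizer and the test message are all assumptions. Six messages is far too little data for a real model.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(message):
    return re.findall(r"[a-z0-9']+", message.lower())

class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, labelled_messages):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.class_counts = Counter()            # label -> number of messages
        for text, label in labelled_messages:
            self.class_counts[label] += 1
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def log_scores(self, message):
        total = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocabulary)
            score = math.log(n_docs / total)     # class prior
            for word in tokenize(message):       # every word contributes
                score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, message):
        scores = self.log_scores(message)
        return max(scores, key=scores.get)

ADVERT, UNRELATED, PRE = ("Advertising & Corporate Messaging",
                          "Unrelated to Purchase", "Pre Purchase")

# The six examples from this post, with the labels the post assigns them.
TRAINING = [
    ("The time to buy a new #Toyota is now! Save big during the #ToyotaTime "
     "Sales Event at #HuntingtonToyota!...", ADVERT),
    ("@Toyota_Hybrid drivers want to win the @24hoursoflemans #LM24 "
     "#goodluck #pesage", ADVERT),
    ("Does anyone want to come to the Toyota dealership with me at 12:20 for "
     "an oil change or can someone pick me up so I don't have to wait",
     UNRELATED),
    ("really want a toyota cressida", PRE),
    ("I'm thinking of buying a car. Probably a Toyota. Something small but "
     "tough that can take a MN winter. Thoughts on where and what to buy?",
     PRE),
    ("I'M BUYING A CAR!!!!!!!! #soweird #ImOld #Toyota @Toyota", PRE),
]

model = TinyNaiveBayes().fit(TRAINING)
# Hypothetical unseen message, invented for this sketch.
print(model.predict("thinking of buying a toyota"))
```

The point of the sketch is the shape of the approach: labels come from examples rather than hand-written rules, so improving the model means adding labelled data, not maintaining keyword lists.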

If we go back to our earlier examples, Chatterbox's Path 2 Purchase classifiers can determine that Examples 1 and 2 are "Advertising & Corporate Messaging" and Example 3 is "Unrelated to Purchase". With these important utility steps complete, the classifiers can pinpoint real purchase intent. Take the following three examples:

Example 4: "really want a toyota cressida"

This message is classified as Pre Purchase at the Interest stage (that is, early on the path to purchase).

Example 5: "I'm thinking of buying a car. Probably a Toyota. Something small but tough that can take a MN winter. Thoughts on where and what to buy?"

This message is classified as Pre Purchase at the Consideration stage (that is, mid way on the path to purchase).

Example 6: "I'M BUYING A CAR!!!!!!!! #soweird #ImOld #Toyota @Toyota"

This message is classified as Pre Purchase at the Lead Validation stage (that is, late on the path to purchase).
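The categories and stages named in this post can be written down as a small data structure. The sketch below is one possible representation, not the product's actual output format; the type names, the validation rule and the example-number mapping are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Category(Enum):
    ADVERTISING = "Advertising & Corporate Messaging"
    UNRELATED = "Unrelated to Purchase"
    PRE_PURCHASE = "Pre Purchase"
    POST_PURCHASE = "Post Purchase"

class Stage(Enum):
    INTEREST = "Interest"                # early stage
    CONSIDERATION = "Consideration"      # mid stage
    LEAD_VALIDATION = "Lead Validation"  # late stage

@dataclass(frozen=True)
class Classification:
    category: Category
    stage: Optional[Stage] = None

    def __post_init__(self):
        # Assumption: a stage is present exactly when the category is Pre Purchase.
        has_stage = self.stage is not None
        if has_stage != (self.category is Category.PRE_PURCHASE):
            raise ValueError("stage is required for, and only for, Pre Purchase")

# Example number -> classification, as described in this post.
EXAMPLES = {
    1: Classification(Category.ADVERTISING),
    2: Classification(Category.ADVERTISING),
    3: Classification(Category.UNRELATED),
    4: Classification(Category.PRE_PURCHASE, Stage.INTEREST),
    5: Classification(Category.PRE_PURCHASE, Stage.CONSIDERATION),
    6: Classification(Category.PRE_PURCHASE, Stage.LEAD_VALIDATION),
}

# Keep only the messages that are on the path to purchase.
leads = [n for n, c in EXAMPLES.items() if c.category is Category.PRE_PURCHASE]
print(leads)
```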

Using this statistical machine learning approach, which dynamically learns language, Chatterbox can cut through the noise and pinpoint the monetizable data in real time.

For more information contact Andrew Watson, VP Strategic Alliances ( or visit