Tech Blog: Scaling Purchase Intent for the Firehose

Here at Chatterbox Labs we pride ourselves not just on the accuracy of our Path 2 Purchase technology, but also on our ability to operate at scale: we can cut through the noise and pinpoint multiple stages of purchase intent in real-time feeds of multilingual social data. We do this by embedding our technology in partner software stacks that process data at scale.

Operating in real time on the firehose means that you can capture those in-the-moment buyers now, not hours, days or weeks later.

But how do we do this? There are three main components:

  1. Do the hard work first. We use a statistical machine learning approach that decouples the learning stage (termed training) from the classification stage (that is, pinpointing the buyers). The training stage learns the types of language people use to express their intent to purchase. It produces a series of language models and is computationally expensive, so we run it offline and then use the resulting language models in a lightweight classification process inside the partner's real-time feed.

  2. Internal parallelisation. To classify a social text, the classification stage performs many tasks. A standard system would execute these tasks one after another; however, this ignores the power available in today's multi-core chips and server CPUs. Our Path 2 Purchase library runs these tasks at the same time (in parallel). You can even take control of this at the code level and supply your own Java ExecutorService, which may be bespoke to your software stack.

  3. External parallelisation. Our system treats each of the social messages flooding down the firehose independently. This is great because the software environment that uses our technology does not need to process one message after another; instead, a distributed system of workers can process multiple messages at the same time.
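
To make the internal parallelisation in point 2 concrete, here is a minimal sketch of scoring one message against several models at once on an ExecutorService supplied by the caller. It is an illustration only, not the Path 2 Purchase API: the StageModel interface, the keywordModel helper and the classify method are hypothetical names invented for this example, and the toy keyword scorer simply stands in for a pre-trained language model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelClassifierSketch {

    // Hypothetical stand-in for one pre-trained language model
    // (for example, one stage of purchase intent).
    interface StageModel {
        double score(String text);
    }

    // Toy model (an assumption for illustration): the fraction of
    // whitespace-separated tokens that equal the given keyword.
    static StageModel keywordModel(String keyword) {
        return text -> {
            String[] tokens = text.toLowerCase().split("\\s+");
            int hits = 0;
            for (String token : tokens) {
                if (token.equals(keyword)) {
                    hits++;
                }
            }
            return (double) hits / tokens.length;
        };
    }

    // Scores a single message against every model concurrently, using
    // whatever executor the caller provides. Results keep the model order.
    static List<Double> classify(String text, List<StageModel> models,
                                 ExecutorService executor) throws Exception {
        List<Callable<Double>> tasks = new ArrayList<>();
        for (StageModel model : models) {
            tasks.add(() -> model.score(text));
        }
        List<Double> scores = new ArrayList<>();
        for (Future<Double> future : executor.invokeAll(tasks)) {
            scores.add(future.get());
        }
        return scores;
    }

    public static void main(String[] args) throws Exception {
        // The caller owns the executor, so it can be shared with the rest of the stack.
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            List<StageModel> models =
                    Arrays.asList(keywordModel("want"), keywordModel("bought"));
            System.out.println(classify("I want a new phone", models, executor));
        } finally {
            executor.shutdown();
        }
    }
}
```

Passing the executor in, rather than creating one inside classify, is what lets a host application reuse its own thread pool and sizing policy.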

In our benchmarking on a $1000 commodity computer, the library performed 148k classifications per second using 9 Java threads.
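
For readers who want to measure throughput in their own environment, the following is a rough sketch of how independent messages can be spread across a fixed pool of worker threads and timed. It is not our benchmark harness: the classify method here is a hypothetical placeholder (a simple substring check), the message data is synthetic, and the timing includes thread pool start-up, so the number it prints says nothing about the figure quoted above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThroughputSketch {

    // Hypothetical placeholder for a lightweight classification call.
    static boolean classify(String message) {
        return message.toLowerCase().contains("buy");
    }

    // Splits the messages into contiguous chunks (at most one per thread),
    // classifies the chunks in parallel and returns the number of positives.
    static long countPositives(List<String> messages, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            int chunk = (messages.size() + threads - 1) / threads; // ceiling division
            for (int start = 0; start < messages.size(); start += chunk) {
                int end = Math.min(start + chunk, messages.size());
                List<String> slice = messages.subList(start, end);
                tasks.add(() -> {
                    long hits = 0;
                    for (String message : slice) {
                        if (classify(message)) {
                            hits++;
                        }
                    }
                    return hits;
                });
            }
            long total = 0;
            for (Future<Long> future : pool.invokeAll(tasks)) {
                total += future.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Synthetic data: one message in ten mentions buying.
        List<String> messages = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            messages.add(i % 10 == 0 ? "Where can I buy this?" : "Nice weather today");
        }
        long startNanos = System.nanoTime();
        long positives = countPositives(messages, 9);
        double seconds = (System.nanoTime() - startNanos) / 1e9;
        System.out.printf("%d positives, %.0f classifications/second%n",
                positives, messages.size() / seconds);
    }
}
```

Because no message depends on any other, the chunks need no coordination beyond summing the per-chunk counts at the end. The same property is what allows the work to be spread across separate machines rather than threads.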

This is monetisation on a massive scale, without the massive hardware investment one would usually expect.

For more information please contact Andrew Watson, VP Strategic Alliances, or visit our website.