Category Archives: predictive-analytics

The YouTube recommendation engine: a lesson in transparent analytics

Recommendation engines are all the rage, whether in the realm of social commerce (see IBM Coremetrics Intelligent Offer) or in location-based social applications like Foursquare.

As the attention span of the browsing population shrinks below that of your average goldfish, so the need to create razor-sharp, perfectly honed navigation systems increases. There’s a demand on publishers to use whatever information they have to provide a more contextualized browsing experience.

That’s all well and good, but have you ever looked at a recommendation and wondered what on earth the system was thinking when it picked it? You’ve spent months on the site exploring hardcore thrash metal, so why on earth are you being offered a book on floral knitting patterns?

I just went onto YouTube and noticed that it has become pretty transparent about its recommendation engine:

Notice the ‘because you watched’ label.

As we use more analytics systems to build interfaces, being explicit about how decisions are made becomes increasingly important: 

Show what determined the recommendation: this answers the question of why on earth you are seeing it. In this instance YouTube bases it on what you have watched previously, but it could just as well draw on what others with similar interests have liked (the Amazon approach).

Allow you to interact with the recommendation: YouTube allows you to remove recommendations from the list that you don’t think are appropriate. One thing it doesn’t do is spell out whether that feedback is factored into future recommendations. Some systems (such as Pandora and Netflix) use a thumbs-up/down or rating system with the implicit understanding that this information will feed into the calculation of future recommendations. As James Taylor, the Decision Management expert, pointed out to me some years ago, recommendation engines have their limits. If I booked a once-in-a-lifetime trip to Bermuda last year, there’s no point in showing me vacations to Antigua six months later. Allowing me to vote this kind of recommendation down can help systems disentangle one-shot whims from longer-term patterns of behavior.

The question of privacy: Being transparent about analytics systems and exactly how visitors are being tracked can go a long way toward allaying growing public fears about the mountain of data produced by the internet in general and social networking sites in particular. Indeed, here in California there has been considerable press around a bill to increase the privacy of social networks. Justin Brookman, director of the Project on Consumer Privacy at the Center for Democracy and Technology, has said, “I think the idea of telling people what is going on and giving them control over their information from the beginning is a good idea for social networks and others places as well”. Privacy advocates are asking publishers to be more open about how data is being used.

As user interfaces become more reliant on analytics tools to offer a more personalized experience, there are significant advantages to displaying upfront exactly why we are being shown the recommendations we see.
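The first two points above (showing what determined a recommendation and letting the viewer push back) can be sketched together in a few lines of Python. This is a minimal, hypothetical item-to-item recommender, not YouTube’s actual engine: the video IDs, co-view counts, `DOWNVOTE_PENALTY` and `MIN_SEED_WEIGHT` are all made-up assumptions. Each suggestion carries a ‘Because you watched’ reason, and a thumbs-down shrinks the weight of whatever triggered it, so a one-off interest (the Bermuda trip) stops generating suggestions after a couple of rejections.

```python
from collections import defaultdict

# Hypothetical co-viewing counts: viewers of the key video also watched these videos this often.
CO_VIEWS = {
    "thrash-metal-live": {"speed-metal-history": 40, "guitar-riffs-101": 25},
    "guitar-riffs-101": {"amp-setup-tips": 30, "speed-metal-history": 10},
    "bermuda-travel-vlog": {"antigua-resort-tour": 50},
}

DOWNVOTE_PENALTY = 0.5  # assumption: each thumbs-down halves the weight of the video behind it
MIN_SEED_WEIGHT = 0.3   # assumption: watched videos below this weight are treated as one-off whims


def recommend_with_reasons(seed_weights, top_n=3):
    """Score unseen videos by weighted co-views and record which watched videos drove each score."""
    scores, reasons = defaultdict(float), defaultdict(set)
    for seed, weight in seed_weights.items():
        if weight < MIN_SEED_WEIGHT:
            continue  # this interest has been voted down too often to keep using
        for candidate, count in CO_VIEWS.get(seed, {}).items():
            if candidate in seed_weights:
                continue  # never recommend something already watched
            scores[candidate] += weight * count
            reasons[candidate].add(seed)
    ranked = sorted(scores, key=lambda v: (-scores[v], v))[:top_n]
    # Attach a human-readable explanation to every recommendation.
    return [(v, "Because you watched " + ", ".join(sorted(reasons[v]))) for v in ranked]


def record_downvote(seed_weights, rejected):
    """Penalize every watched video that could have produced the rejected recommendation."""
    for seed in seed_weights:
        if rejected in CO_VIEWS.get(seed, {}):
            seed_weights[seed] *= DOWNVOTE_PENALTY


history = {"thrash-metal-live": 1.0, "guitar-riffs-101": 1.0, "bermuda-travel-vlog": 1.0}
print(recommend_with_reasons(history))
record_downvote(history, "antigua-resort-tour")
record_downvote(history, "antigua-resort-tour")
print(recommend_with_reasons(history))
```

The first call includes the Antigua resort tour, explained by the Bermuda vlog; after two thumbs-down the vlog’s weight falls below the threshold and only the metal recommendations remain, each still labelled with its reason.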

McMaster University ties up with IBM Business Analytics to create energy-efficient buildings

Business analytics is playing an increasingly important role in academia. Recently we heard about how business intelligence could be used to locate students in need of further assistance by mining data on lecture attendance and course performance. This week we see a press announcement detailing how Canada’s McMaster University is using business analytics to make its buildings greener.

So how exactly does the technology help reduce operating costs and cut greenhouse gas emissions?

A series of sensors, actuators and meters collects real-time data on energy consumption and temperature levels. Combined with dynamic-pricing data, this can give an accurate reading of exactly how much it costs to use a given amount of energy at a given time of day.
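As a rough illustration of that cost calculation, the sketch below multiplies each metered interval’s consumption by the price in force at that time. The readings, prices and function name are hypothetical; the press announcement does not describe how IBM’s system actually computes cost.

```python
# Hypothetical readings for one building: energy used per metered interval (kWh)
# and the dynamic price in force during that interval ($/kWh).
consumption_kwh = {"08:00": 120.0, "12:00": 180.0, "18:00": 90.0}
price_per_kwh = {"08:00": 0.10, "12:00": 0.20, "18:00": 0.15}


def interval_costs(consumption, prices):
    """Cost of each interval: energy consumed multiplied by the price at that time of day."""
    return {t: consumption[t] * prices[t] for t in consumption}


costs = interval_costs(consumption_kwh, price_per_kwh)
total = sum(costs.values())
costliest = max(costs, key=costs.get)  # the interval where energy use hurts the budget most
print(f"total ${total:.2f}, costliest interval {costliest}")
```

Here the midday interval accounts for $36 of the $61.50 total, the kind of insight that lets facilities staff shift loads away from peak-price periods.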

What can McMaster do with all this information?

  • Assess how much a given building is costing them to run
  • Track energy usage and cost across the whole campus
  • Forecast what future costs and usage will be based on past performance
  • Simulate different environments and scenarios to understand more about energy usage and cost under different climate conditions
  • Optimize energy consumption based on the forecasts and simulations

Looking across the 60 campus buildings, the system will be able to identify under-performing buildings and the causes of energy inefficiencies. With the help of IBM Business Analytics technologies, McMaster will improve its decision making process and raise the bar for sustainable, cost-saving building management practices.
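One simple way to spot under-performing buildings is to normalize consumption by floor area (energy use intensity) and flag outliers against the campus median. The sketch below assumes four made-up buildings and a 25% threshold; it illustrates the idea rather than McMaster’s or IBM’s actual method.

```python
from statistics import median

# Hypothetical annual figures per building: (energy in kWh, floor area in m^2).
buildings = {
    "Library": (1_200_000, 10_000),
    "Engineering": (2_600_000, 13_000),
    "Residence A": (900_000, 8_000),
    "Arts": (700_000, 7_000),
}

THRESHOLD = 1.25  # assumption: flag buildings using 25% more energy per m^2 than the campus median


def energy_intensity(data):
    """Energy use intensity (kWh per m^2), so buildings of different sizes can be compared."""
    return {name: kwh / area for name, (kwh, area) in data.items()}


def under_performers(data, threshold=THRESHOLD):
    """Buildings whose intensity exceeds the campus median by more than the threshold."""
    eui = energy_intensity(data)
    campus_median = median(eui.values())
    return sorted(name for name, value in eui.items() if value > threshold * campus_median)


print(under_performers(buildings))
```

With these figures the median intensity is 116.25 kWh/m², so only Engineering (200 kWh/m²) is flagged for a closer look at its causes of inefficiency.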

Business analytics reduces cardiac surgery mortality rate by 50 percent

A wonderful case study came out last week on how Sequoia Hospital is using business analytics software to help heart surgeons decide when to operate and what the best pre- and post-op care options are.

The hospital is using IBM SPSS predictive analytics software to sift through large volumes of data spanning healthcare databases, medical precedents and real-world medical cases:

"For instance, the software revealed that an anticoagulant drug often given to patients after a heart attack dramatically increases the chances of serious postoperative bleeding. Based on that information, Sequoia was able to put a protocol in place to stop the drug at least five days prior to surgery to allow the patient’s platelets to recover and significantly reduce bleeding events."

Why can business analytics play such an important role in healthcare?

Statistics is at the core of modern medicine. Whether it’s measuring the response of a sample group against a control group to decide whether a new drug outperforms a placebo, or tracking through a national database of asthma sufferers in search of factors that lead to more frequent attacks, the medical profession relies heavily on statistics-based predictions. Software such as IBM SPSS has the tooling to crawl through this data and help medical professionals make more informed decisions.
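The sample-versus-control comparison mentioned above is often done with a two-proportion z-test. Here is a minimal sketch using only Python’s standard library; the trial counts are hypothetical, and this is a textbook test rather than a description of what IBM SPSS does internally.

```python
from math import erfc, sqrt


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two proportions (normal approximation).

    Returns (z statistic, p-value).
    """
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)      # common rate if there is no real difference
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))  # standard error under that assumption
    z = (p_a - p_b) / se
    p_value = erfc(abs(z) / sqrt(2))                         # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical trial: 60 of 200 patients respond to the new drug vs 40 of 200 on placebo.
z, p = two_proportion_z_test(60, 200, 40, 200)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Here z is about 2.31 and p about 0.021, so at the conventional 5% level the drug’s better response rate is unlikely to be down to chance alone.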

As more hospitals and medical facilities switch to electronic record systems, the amount of medical data that computing systems can access mushrooms. This requires more sophisticated, powerful applications (such as those that can tie together unstructured data from various sources). The payback is larger sample groups, a smaller margin of error and better performance in the delivery of healthcare. Whereas in the past a hospital would have had only its own records as evidence when deciding on a course of action, now state-wide or nation-wide information can be mined.
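The link between larger samples and a smaller margin of error can be made concrete. For a proportion p estimated from n cases, the usual 95% normal-approximation margin of error is 1.96 × √(p(1 − p)/n), so it shrinks with the square root of the sample size. The 10% rate and the two sample sizes below are hypothetical.

```python
from math import sqrt

Z_95 = 1.96  # standard normal critical value for a 95% confidence level


def margin_of_error(p, n, z=Z_95):
    """Half-width of the normal-approximation confidence interval for a proportion p seen in n cases."""
    return z * sqrt(p * (1 - p) / n)


# Same hypothetical 10% complication rate: one hospital's records vs a state-wide database.
for n in (400, 40_000):
    print(f"n = {n:>6}: ±{margin_of_error(0.10, n):.4f}")
```

A hundredfold increase in records (400 to 40,000) cuts the margin of error tenfold, from about 2.9 to about 0.3 percentage points.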

More information on how IBM SPSS solutions are transforming the medical profession

What would YouTube want with a recommendation engine?

TechCrunch recently reported that Google (as the owner of YouTube) is looking at purchasing the Twitter-based movie recommendation site Fflick. Judging by the Fflick site today, this is more than just idle rumor:


What are the implications for YouTube?

On the one hand, it signals a more concerted effort from the beefy video-sharing site to play nicer with the other social networks in the playground (or at least hold hands with Twitter whilst working out a relationship with arch-rival Facebook). It could mean YouTube gains the kind of functionality already present on other video networks like Livestream: a display of all the Twitter backchannel related to a piece of content. For instance:


See the running commentary down the right? This is more ‘chat’ than ‘comments’, with tight integration with Facebook and Twitter.

On the other hand, it also opens up the possibility for YouTube to start mining user data to offer recommendations. How useful is this on a video site? Just look at the Netflix story. The popular US video rental service has made a big deal of its ability to guess what movie you want to add to your rental wishlist. It bases its recommendations on what you’ve seen in the past, how you rated it, what others like you have seen (and a bunch of other variables, even including the day of the week on which you’re viewing the site!). Netflix prizes this technology enough to have made it a central part of the site navigation, and in 2009 it even paid $1 million to a team that included AT&T researchers for coming up with a winning algorithm.

YouTube has a much bigger collection of content and, thanks to its huge viewing figures, a wealth of behavioural data. But it generally knows less about its visitors than Netflix does, because the site doesn’t require you to log in to engage. Potentially, that’s where the Twitter piece comes into play: you give up some information about yourself each time you tweet. Fflick provides the service to tie the tweet back to the video, and it can also pick through your tweets to determine what content you might like to see next.

This kind of application of predictive analytics is hot right now in the social media space. Foursquare is believed to be using predictive analytics to keep Facebook at bay in the location-based-services sector.

Social media is making us increasingly impatient and we are starting to demand more from our interfaces. Add to that the growing market for hand-held devices that offer precious little space for content, let alone navigation, and you have a compelling case for services using whatever technology they can to pinpoint what you probably want to do next, and serve that up. If they don’t engage you, the next video-sharing site is only a short URL away.

More on the Fflick acquisition

IBM Watson: is artificial cleverness the same as AI?

Let’s start with the obvious: this is the opinion of one mere human. Someone who would fail miserably at the US quiz show Jeopardy: it’s that ‘start-with-the-answer’ approach that just screws me up every time. Not being a native of this soil, I claim it’s just not part of my DNA.

But an IBM supercomputer called Watson (which was indeed conceived on US soil) appears to be performing awfully well at the contest and is attracting a lot of media attention, much of it centered on the field of artificial intelligence (AI) and IBM’s involvement in this area.

As PC World reports, Watson overcame two Jeopardy all-time champs in a practice round recently. How does it do this? The silicon contestant has read countless encyclopedias and other tomes, contains natural language processing capabilities and can even determine how confident it is in its response. Couple this with industry-leading computational power and you have one efficient competitor.

IBM has a history of pitting computers against humans on the cerebral battlefield. In the late ’nineties, Deep Blue defeated chess grandmaster Garry Kasparov (although Kasparov disputes that he was indeed beaten). However, the team behind the Watson project are quick to point out that the computing required for the high-level semantic reasoning they are up against is different from the logic-bound nature of chess. Chess is a game of limited moves on an 8×8 grid; Jeopardy a game of infinite words.

I can’t help but think back to my Philosophy of the Mind classes, where we studied the Turing test, the black-box approach to measuring AI proposed by Alan Turing in 1950. Sometimes called the ‘imitation game’, the concept was that if someone could put questions to a black box and not discern whether a computer or a person was inside, you could attribute to the machine intelligence on a par with that which we humans enjoy. This Stanford article does a good job of discussing the Turing test and the objections to it in some detail.

One objection that stands out is that of origination: could a computer do more than just perform tasks (or deal with questions) set by humans? In the case of Watson, it was a team of people within IBM Research that came up with the idea to build a supercomputer to compete in Jeopardy. The motivations? Showcase technology. A fun work-related project. Team-building. The question is whether a computer could have had the ‘wisdom’ (foolhardiness) to come up with the idea of the project in the first place.

I’d suggest this level of decision-making is a quantum leap beyond the semantic analysis of IBM Watson.

Jonah Lehrer, in the provocatively-titled Proust was a Neuroscientist, uses the filter of art to illustrate what neuroscience is uncovering about the complexity of our intelligence. Within the poetry of Walt Whitman you find the idea that feelings and emotions are born in our bodies, not our minds:

"Antonio Damascio, a neuroscientist who has done extensive work on the etiology of feeling, calls this process the body loop. In his view the mind stalks the flesh; from our muscles we steal our moods."

You can’t separate our thought process from our bodily existence. This could be a problem for a computer lacking flesh and bones.

I don’t just bring this up in the vein of being a contrarian or mean-spirited towards what is quite an astounding piece of computing. I think there is a message here that relates to the technology at the core of Watson: business analytics.

Decision-making within the enterprise happens at different levels and business analytics doesn’t necessarily apply at all of them. For instance, business analytics is ideal for helping a marketer pinpoint prospects who might be interested in a particular offer. It’s less good at determining whether that same marketer should run a conference program if they’ve never run one before. We’re still not close to being able to automate that intuitive part of the decision-making process in business.

Last year I sat in on a discussion around decision management and heard from a product marketing manager that a barrier to adoption of business analytics systems is decision-makers’ fear that this technology will take away their jobs (and they are the very same people who normally sign the check for these kinds of purchases). This suggests we in the field of business analytics need to do a better job of explaining that some decisions can be automated and others cannot. Business analytics is a set of tools that we humans can use to make smarter decisions, but like all tools, it has limits.

So whilst IBM Watson shows what computers can achieve in the human realm, it’s worth bearing in mind (no pun intended) that computers pose little threat to the human realm. The Jeopardy contest that is coming up on February 14 is a battle of one computer against 2 humanoids. If Watson wins, we’re not talking about the dawn of a new era where Jeopardy is played out by tin robots bearing the IBM insignia. We are talking about a triumph of a technology that has applications in healthcare and customer service and beyond – a technology that remains a tool in the hands of us mere humans.

More about IBM Watson, including some wonderful videos on its construction

(Image courtesy of The Doctor Fun Archive)

The growing role of predictive analytics in data center management

Crisis management is generally a costly business. Switching gears away from your forward-thinking strategy and pulling in resources to deal with issues not on the radar can really stymie growth and efficiency.

Especially in the IT management space.

In the past, the role of the IT manager was largely reactive: as soon as a problem occurred, they would have to jump in and manage the crisis. This was, and continues to be, a costly exercise for IT departments, often costing organizations millions of dollars annually.

Investment in predictive analytics has the potential to drastically reduce the surprises faced by IT management. In a recent article in Enterprise Networking Planet, Drew Robb shows how predictive analytics can be used to monitor networks across enterprises and mine behavioral patterns to get out in front of potential issues like usage spikes and plan for them before they occur. As IT moves towards virtualization and cloud models which allow for flexibility in terms of resource allocation, predictive analytics really comes into its own as a tool to help manage these spaces. For instance, with a cloud-based installation, resources can be deployed or changed in minutes, rather than weeks. If you have multiple users and applications on the installation, predictive analytics can be used to determine where resources should be apportioned prior to any impact on service levels.
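Drew Robb’s article does not prescribe a particular forecasting technique, so as one illustrative choice, the sketch below uses Holt’s linear (double exponential) smoothing to project the next hour’s load and then rounds up the number of servers to provision ahead of time. The load figures, smoothing constants, per-server capacity and 20% headroom are all assumptions.

```python
import math


def holt_forecast(series, alpha=0.5, beta=0.3, steps=1):
    """Holt's linear (double exponential) smoothing: track a level and a trend, then extrapolate.

    Needs at least two observations; the first difference seeds the trend.
    """
    level, trend = series[0], series[1] - series[0]
    for value in series[1:]:
        previous_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return level + steps * trend


def servers_needed(forecast_load, capacity_per_server, headroom=0.2):
    """Servers to provision in advance for the forecast load plus a safety margin."""
    return math.ceil(forecast_load * (1 + headroom) / capacity_per_server)


# Hypothetical hourly request rates (thousands/hour) for one application in a shared cloud pool.
recent_load = [100, 110, 120, 130, 140]
expected = holt_forecast(recent_load)  # load expected in the next hour
print(expected, servers_needed(expected, capacity_per_server=40))
```

With the steady climb of 10 per hour in the sample data, the forecast is 150 and five servers of capacity 40 are provisioned before the spike arrives. Holt’s method is used here because it tracks a trend as well as a level; load with daily or weekly cycles would call for a seasonal variant such as Holt-Winters.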

Maintenance isn’t the only area where predictive analytics plays a role.

Steven Sams, IBM’s vice president of Global Site and Facilities Services, points out that by 2012 global data storage capacity will need to be 6.5 times what it is today (fueled largely by internet cloud-based services). He recently explained to Forbes’ Quentin Hardy how data center managers can use predictive analytics to plan for this growth:

"Tech planners need the same kind of big pattern-finding software more commonly used by designers, chief executives, and finance types. Among the new analytic offerings from IBM are cash flow-based scenario software, for figuring out whether to build, consolidate, or do nothing"

Obviously these decisions can have serious implications for business operations and costs. Sams highlights a Chinese bank that used this technology to go from 38 data centers to 2, saving $180 million a year. To better serve this market, IBM has launched a predictive analytics tool for use by its Global Business Services division on data center engagements.
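The build, consolidate or do-nothing question in the quote is, at its simplest, a discounted cash flow comparison. The sketch below computes the net present value of each hypothetical option and picks the best; the cash flows, the 8% discount rate and the scenario names are invented for illustration and say nothing about IBM’s actual tool or the Chinese bank’s figures.

```python
def npv(rate, cash_flows):
    """Net present value of yearly cash flows; cash_flows[0] happens now (year 0)."""
    return sum(cf / (1 + rate) ** year for year, cf in enumerate(cash_flows))


# Hypothetical net cash flows ($ millions) over five years for each option,
# measured against today's data center budget (negative = extra spending).
SCENARIOS = {
    "do nothing": [0, -20, -25, -30, -35],  # growing demand met with ever more leased capacity
    "build": [-60, 5, 10, 15, 20],          # large upfront outlay, then operating savings
    "consolidate": [-30, 10, 15, 15, 15],   # smaller outlay, savings from closing sites
}
DISCOUNT_RATE = 0.08  # assumption


def best_option(scenarios, rate=DISCOUNT_RATE):
    """Return the scenario with the highest NPV, plus every scenario's NPV for comparison."""
    values = {name: npv(rate, flows) for name, flows in scenarios.items()}
    return max(values, key=values.get), values


choice, values = best_option(SCENARIOS)
print(choice, {name: round(v, 1) for name, v in values.items()})
```

With these made-up numbers consolidation comes out ahead (an NPV of roughly +$15 million, versus about -$20 million for building and about -$89 million for doing nothing). Changing the discount rate or the cash-flow estimates can flip the answer, which is why scenario software is useful for this kind of decision.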

As we move into 2011 and beyond, predictive analytics can play a major role in the way IT departments manage data centers and their operations. Given what’s at stake, expect to see a lot more interest in this area.

Foursquare to use predictive analytics to beat Facebook?

There’s a growing battle in the location-based services business between Foursquare and Facebook. Foursquare, with its past emphasis on gaming and status building (who wants to be the mayor of the local laundromat?), is now focusing on a more functional aspect: helping people decide where they should go next. According to a report in Brandweek (backed up by this article on a recent job ad), Foursquare sees offering recommendations as its chance to avoid being squeezed out of existence by Facebook, which, with over 500 million users, is the proverbial gorilla in the room.

How does it plan to do this? Brandweek suggests it will adopt predictive services which are common on sites like Amazon and Netflix:

"Those services crunch behavior data—what movies you watch and books you read—to suggest new products. Foursquare wants to do the same, only with recommendations of real-world activities."

For instance, let’s say you are a sushi freak living in Chicago who’s been active on Foursquare for the last year. You’ve been using Foursquare to capture badges for most of the top local Japanese eateries. Foursquare can see your penchant for fine sushi in the Windy City and look across its network for others in your area who share the same passion. If it notices that they have started checking in at a new joint downtown, it can suggest you check it out too.

How does this crunching work? The data is mined along a process which runs something like this for each individual visitor:

  • What past actions have you recorded?
  • What patterns can be determined from your actions?
  • Who else in the network is like you?
  • Where are the gaps between your actions and theirs?
  • Offer as predictions those actions that people like you have performed but you have not
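The five steps above are essentially user-based collaborative filtering. Here is a minimal sketch using cosine similarity over check-in counts; the users, venues and counts are hypothetical, and Brandweek’s report does not say which technique Foursquare actually uses.

```python
from math import sqrt

# Hypothetical check-in counts per user and venue (step 1: recorded past actions).
CHECKINS = {
    "you": {"sushi-ko": 12, "tokyo-bar": 8, "ramen-house": 3},
    "alice": {"sushi-ko": 10, "tokyo-bar": 6, "new-omakase": 9},
    "bob": {"sushi-ko": 2, "pizza-place": 15, "burger-joint": 11},
}


def cosine_similarity(a, b):
    """How alike two check-in patterns are: 1.0 means the same mix of venues."""
    dot = sum(a[v] * b[v] for v in set(a) & set(b))
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(user, data, top_n=2):
    """User-based collaborative filtering over check-in counts."""
    mine = data[user]
    scores = {}
    for other, theirs in data.items():
        if other == user:
            continue
        sim = cosine_similarity(mine, theirs)  # steps 2-3: compare your pattern with theirs
        for venue, count in theirs.items():
            if venue not in mine:              # step 4: gaps, places they go and you don't
                scores[venue] = scores.get(venue, 0.0) + sim * count
    # Step 5: offer the highest-scoring gaps as predictions.
    return sorted(scores, key=lambda v: (-scores[v], v))[:top_n]


print(recommend("you", CHECKINS))
```

`new-omakase` comes out on top because Alice’s check-in pattern closely matches yours. Bob’s pizza habit still leaks in with a small score; real systems usually restrict scoring to the k most similar neighbours to avoid that.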

Note that this obviates the need for users to fill in a vast registration form listing all their likes and interests: the system can figure these out by looking at past behavior.

In terms of making predictions, systems need to be smart enough to factor in elements that can cause shifts in our patterns of behavior:

  • Seasonality (no taste for raw fish when it’s snowing)
  • Change in tastes (e.g. pregnancy pushes sushi off the menu)
  • Removing system bias (e.g. not only favoring well-established popular places, but allowing new entrants a chance to prove themselves)
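These adjustments can be approximated with simple scoring tweaks: exponential decay so that old check-ins fade (change in tastes), a temporary bonus for new venues (reducing popularity bias) and a seasonal multiplier, which here is simply supplied by the caller rather than learned. The 90-day half-life, the bonus size and the 30-day ‘new venue’ window are assumptions chosen for illustration.

```python
from math import exp, log

HALF_LIFE_DAYS = 90    # assumption: a check-in counts half as much after 90 days
NEW_VENUE_DAYS = 30    # assumption: venues younger than this count as new entrants
NEW_VENUE_BONUS = 0.5  # assumption: fixed boost so new places get a chance to be tried


def recency_weight(age_days, half_life=HALF_LIFE_DAYS):
    """Exponential decay: a check-in one half-life old counts 0.5, two half-lives old 0.25."""
    return exp(-log(2) * age_days / half_life)


def interest(checkin_ages_days):
    """Current strength of an interest: the decayed weights of past check-ins, summed."""
    return sum(recency_weight(age) for age in checkin_ages_days)


def adjusted_score(base_score, venue_age_days, seasonal_factor=1.0):
    """Apply a caller-supplied seasonal multiplier, then boost venues that are still new."""
    score = base_score * seasonal_factor  # e.g. 0.5 for raw fish during a snowstorm
    if venue_age_days < NEW_VENUE_DAYS:
        score += NEW_VENUE_BONUS          # counter the bias towards established favourites
    return score


# A sushi habit that stopped about a year ago barely registers next to recent check-ins.
print(round(interest([365, 380, 400]), 3), round(interest([5, 12, 20]), 3))
```

Decay lets a pregnancy-sized change in habits show up within a few months without anyone editing a profile, while the new-venue bonus gives the fresh sushi joint a chance against long-established favourites.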

Whether Foursquare makes a concerted move in this direction remains to be seen, but as web and mobile applications creep further into every aspect of our existence (with their inherent ability to track behavior), expect to see an increasing use of business intelligence and predictive analytics to create smarter systems offering us more relevant information.

Using structured data analytics to make better business decisions

In the current edition of Analytics, a cross-brand team from IBM (Irv Lustig, Brenda Dietrich, Christer Johnson and Christopher Dziekan) explain IBM’s view of the structured data analytics landscape.

Key to this model are three categories of structured data analysis:

1. Descriptive Analytics: A set of technologies and processes that use data to understand and analyze business performance
2. Predictive Analytics: The extensive use of data and mathematical techniques to uncover explanatory and predictive models of business performance representing the inherent relationship between data inputs and outputs/outcomes.
3. Prescriptive Analytics: A set of mathematical techniques that computationally determine a set of high-value alternative actions or decisions given a complex set of objectives, requirements, and constraints, with the goal of improving business performance.

As the authors explain, this model can help businesses make better decisions, rather than simply automate standardized processes.

Let’s use the example of a fictional global shoe manufacturer we’ll call ‘Footloose’ to see how each category could be used to increase business performance.

Descriptive analytics

These are your flexible dashboards that let you focus in on key areas of the business. For Footloose, this could be all the standard operations dashboards, e.g. the one showing monthly shoe sales by region. Footloose should be able to see how actual sales fared against the forecast. Where there are deviations (say sales of sandals in Spain have gone through the roof), they can use descriptive analytics to drill down into the data. They may see that the growth is coming from Madrid and is possibly related to a major marketing push during a hot spell in that region.

IBM Cognos solutions offer this kind of descriptive analytics (including business intelligence), which can be implemented to measure and explore how a company is performing.
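The Footloose roll-up and drill-down can be sketched in a few lines: aggregate actual sales against forecast at country level, flag large deviations, then repeat at city level within the flagged country. The sales rows and the 25% limit are hypothetical; in practice this would be a dashboard in a BI tool such as Cognos rather than hand-written code.

```python
from collections import defaultdict
from typing import NamedTuple


class SalesRow(NamedTuple):
    country: str
    city: str
    actual: int    # units sold this month
    forecast: int  # units forecast for this month


# Hypothetical monthly sandal sales for Footloose.
SALES = [
    SalesRow("Spain", "Madrid", 9_000, 4_000),
    SalesRow("Spain", "Barcelona", 3_100, 3_000),
    SalesRow("France", "Paris", 2_900, 3_000),
]

DEVIATION_LIMIT = 0.25  # assumption: flag anything more than 25% away from forecast


def deviation_by(rows, level):
    """Roll actual and forecast up to `level` ('country' or 'city'); return relative deviation."""
    actual, forecast = defaultdict(int), defaultdict(int)
    for row in rows:
        key = getattr(row, level)
        actual[key] += row.actual
        forecast[key] += row.forecast
    return {key: (actual[key] - forecast[key]) / forecast[key] for key in actual}


def flagged(rows, level):
    """Names at this level whose actual sales deviate from forecast beyond the limit."""
    return sorted(k for k, dev in deviation_by(rows, level).items() if abs(dev) > DEVIATION_LIMIT)


print(flagged(SALES, "country"))                                    # where to look
print(flagged([r for r in SALES if r.country == "Spain"], "city"))  # drill down into Spain
```

Spain is flagged at country level (about 73% above forecast); drilling down shows that Madrid (125% above forecast) is the source, while Barcelona is close to plan.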

Predictive analytics

Here we use data from the past to make predictions about the future. For Footloose, this could include combining seasonal sales variations for a sports shoe with the longer term uptrend they have been seeing for the last few years. Footloose can also use predictive analytics to improve their web presence: they can launch a recommendation engine to suggest what a visitor might want to view next based on what they (and people like them) have looked at in the past (like the book suggestion service Amazon offers).

IBM SPSS offers a set of predictive analytic tools which allow business users to employ predictive insights at the point where decisions are being made.
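Combining a seasonal pattern with a long-term uptrend can be sketched as a simple multiplicative model: estimate each quarter’s seasonal index, estimate compound annual growth, and multiply. The three years of quarterly sales are invented (built with a 10% annual uptrend so the result is easy to check), and this is a deliberately simplified stand-in for the models a tool like SPSS would fit.

```python
from statistics import mean

# Hypothetical quarterly unit sales (thousands) of one sports shoe over three years.
HISTORY = [
    [80, 120, 110, 90],
    [88, 132, 121, 99],
    [96.8, 145.2, 133.1, 108.9],
]


def seasonal_indices(history):
    """Average ratio of each quarter to its year's mean (1.2 = 20% above a typical quarter)."""
    ratios = [[q / mean(year) for q in year] for year in history]
    return [mean(r[i] for r in ratios) for i in range(len(history[0]))]


def annual_growth(history):
    """Compound growth of the yearly mean from the first to the last year (the long-term uptrend)."""
    first, last = mean(history[0]), mean(history[-1])
    return (last / first) ** (1 / (len(history) - 1))


def forecast_next_year(history):
    """Trend sets next year's typical quarter; seasonal indices shape it quarter by quarter."""
    base = mean(history[-1]) * annual_growth(history)
    return [base * s for s in seasonal_indices(history)]


print([round(q, 1) for q in forecast_next_year(HISTORY)])
```

For the sample data this forecasts roughly 106, 160, 146 and 120 thousand units for next year’s four quarters.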

Prescriptive analytics

How can we achieve the best outcome whilst addressing any uncertainty in the data? Prescriptive analytics can help us answer this question. Let’s say Footloose has made its prediction about what shoe sales are likely to be over the coming year. Now they just need to figure out how to respond to those predictions. Sales of sandals are expected to remain high in Spain, so they need to increase distribution capacity there. How should they achieve this? Increase the fleet of vehicles, or build more (costly) distribution centers?

Footloose can plug the data (the costs of building a new distribution center, buying new trucks, gas) into an optimization model to calculate the most efficient supply chain for delivering the extra capacity required.

IBM ILOG Optimization has technologies specialized for these kinds of calculations, where there are large data sets with potential uncertainty.
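The trucks-versus-distribution-centers question is a small integer program: choose how many of each to add so that capacity meets demand at minimum cost. With only two decision variables, the sketch below simply enumerates every combination; the capacities, costs and limits are made-up assumptions, and it ignores the uncertainty that dedicated optimization tools such as those in IBM ILOG Optimization are built to handle.

```python
from itertools import product

# Hypothetical figures for the extra Spanish capacity Footloose needs (pallets per week).
REQUIRED_CAPACITY = 1_000
TRUCK_CAPACITY, TRUCK_ANNUAL_COST = 120, 90_000     # assumed: purchase amortized, plus gas and driver
CENTER_CAPACITY, CENTER_ANNUAL_COST = 600, 400_000  # assumed: building amortized, plus staff
MAX_TRUCKS, MAX_CENTERS = 10, 3                     # assumed upper bounds on what can be added


def cheapest_plan():
    """Brute-force a tiny integer program: minimize annual cost subject to meeting capacity.

    Returns (annual cost, trucks, centers), or None if no combination is feasible.
    """
    best = None
    for trucks, centers in product(range(MAX_TRUCKS + 1), range(MAX_CENTERS + 1)):
        capacity = trucks * TRUCK_CAPACITY + centers * CENTER_CAPACITY
        if capacity < REQUIRED_CAPACITY:
            continue  # infeasible: does not deliver the extra capacity required
        cost = trucks * TRUCK_ANNUAL_COST + centers * CENTER_ANNUAL_COST
        if best is None or cost < best[0]:
            best = (cost, trucks, centers)
    return best


print(cheapest_plan())
```

With these figures the cheapest plan is one new distribution center plus four trucks, at $760,000 a year, beating both nine trucks alone ($810,000) and two centers ($800,000). Real supply-chain models have far too many variables to enumerate like this, which is where mathematical programming solvers come in.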

I’ve used this example to present a simplified view of IBM’s approach to structured data analysis and how IBM technologies can be used in tandem to improve business performance. A key advantage of these technologies is that their utility stretches across various industries and applications.

For a more complete explanation of this field, I’d definitely recommend reading the full article in Analytics Magazine.