Friday, July 25, 2008

The Brain, Semantic Web technology, and what it will mean for you!


I was recently watching a lecture by Jeff Hawkins titled "Computing beyond Turing", and it inspired me to write something about the state of affairs between Computational Neuroscience, Semantic Web technology, and the future of Web, Desktop and Mobile applications. In the end, I hope this helps explain how we’re working to tie these concepts together with regard to the web, desktop and mobile environments at deeda Inc.

Although the link to this particular inspirational lecture is posted above, here is a quick embedded video primer on the subject. This talk is from TED 2003 and was also given by Jeff Hawkins. If you have some time, definitely try to check it out:






"Coaxing computers to perform basic acts of perception and robotics, let alone high-level thought, has been difficult. No existing computer can recognize pictures, understand language, or navigate through a cluttered room with anywhere near the facility of a child. Hawkins and his colleagues have developed a model of how the neocortex performs these and other tasks. The theory, called Hierarchical Temporal Memory, explains how the hierarchical structure of the neocortex builds a model of its world and uses this model for inference and prediction."





The above image represents the hierarchical data structure that the neocortex uses to store information about the world. The fundamental question we need to answer is how does this memory structure work? It turns out that the lower level of this inverted tree structure is where the widest range of visual information is first processed. You can think of this wide area as the point of first contact with sensory information. There’s a lot of it, and it can be seemingly overwhelming. However, the cells in this area have a very broad range of pattern recognition abilities. This is what allows them to make sense out of the data chaos. They are programmed to recognize spatial patterns that occur at the same time, as well as the sequences of these temporal patterns.

These cells then memorize which patterns are occurring, retain the ability to predict which patterns will follow, and pass these patterns up to their parent. Each parent looks at its child nodes, so it is in turn processing patterns of patterns. As you progress from the top down, increasingly broad patterns and sequences are being received. As you move up the hierarchy, the temporal component of the patterns and sequences becomes smaller. Each parent has the ability to pass down predictions about the pattern and sequence each child node should receive next. In the example of a musical melody, the prediction sent could be the notes that are expected to be heard next. Together, the 30 billion neurons in the neocortex are constantly managing, retaining, and predicting the inputs from various sensory systems.
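As a rough illustration of the memorize-and-predict behaviour described above, here is a minimal Python sketch. It is not Numenta's HTM implementation: the `SequenceNode` class and its methods are hypothetical names, and it only counts which note followed which (a first-order transition table), which is far simpler than a real HTM node with its hierarchy and feedback.

```python
from collections import Counter, defaultdict


class SequenceNode:
    """Toy node that remembers which pattern tends to follow which.

    Hypothetical illustration only; this is not the HTM algorithm.
    """

    def __init__(self):
        # transitions[a][b] = how many times pattern b followed pattern a
        self.transitions = defaultdict(Counter)

    def learn(self, sequence):
        # Memorize each adjacent pair of patterns in the observed sequence.
        for current, following in zip(sequence, sequence[1:]):
            self.transitions[current][following] += 1

    def predict(self, current):
        # Return the most frequently seen successor, or None if unseen.
        followers = self.transitions.get(current)
        if not followers:
            return None
        return followers.most_common(1)[0][0]


node = SequenceNode()
node.learn(["C", "D", "E", "C", "D", "E", "C", "D", "G"])
print(node.predict("C"))  # the note most often heard after C
print(node.predict("D"))  # the note most often heard after D
```

In the melody example, a parent node would play the role of `predict`, telling its child which note to expect next based on what it has heard before.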




There are three fascinating and wonderful components to this HTM model.
  1. This model accounts for learning and prediction of events, patterns and sequences.

  2. This model represents an entirely new way of looking at memory and storage.

  3. These hierarchical systems represent a highly distributed computing model.
While silicon (hardware) representations of this type of memory and processing model are still under development, there are many ways this model has been successfully applied, developed, and tested through software. Jeff Hawkins' Redwood Center for Theoretical Neuroscience (formerly the Redwood Neuroscience Institute) and his group at Numenta have already provided some great software tools for experimenting and developing systems based on the HTM model.

I love seeing examples of technology that are based on natural biological processes. It has always been my goal for deeda to help bridge the gap between the two, and to make real world examples that showcase the efficiency and value of such systems. In the mobile environment we're typically working with fairly small screens and buttons. Even in the touch environment, not every handset can access Flash or other types of Web-specific content. It is therefore imperative that our future handsets have some level of intelligence that reduces the number of commands, websites, or pieces of information we have to navigate through to find what we need. Even if predictive analysis only reduces the complexity of interactions, it will be a great boon for mobile applications. Let's say I'm visiting Boston and I'm hungry. It's 10 PM, and I'm in Copley Park. Why should I have to navigate to a dozen websites to find recommendations on restaurants in my area, then navigate to a restaurant's website to see if they are open? Sites like WhatsOpen.com will do all of the narrowing down for you, but is this what it has come down to? We have to wait for a website to come around that aggregates some amount of useful information so we don't have to waste time? Why can't we have a system that understands where we are, what we like to eat, what our friends have recommended in the area, as well as what is still open - and then show it to us - all with one click or tap?

Processing data in new and novel ways is only part of the solution. How do we get data from different sources to play nicely together? How do we provide context and understanding behind the data that is being analyzed? With respect to our work at deeda there is another component to this vision that I'd like to quickly discuss: the Semantic Web.

Tim Berners-Lee, the father of the World Wide Web, described the Semantic Web as follows, "The Web was designed as an information space, with the goal that it should be useful not only for human-human communication, but also that machines would be able to participate and help. One of the major obstacles to this has been the fact that most information on the Web is designed for human consumption, and even if it was derived from a database with well defined meanings (in at least some terms) for its columns, that the structure of the data is not evident to a robot browsing the web. Leaving aside the artificial intelligence problem of training machines to behave like people, the Semantic Web approach instead develops languages for expressing information in a machine processable form."

The fact that Tim wrote this 10 years ago is a testament to just how much work needed to be done before Semantic technology would be ready to be implemented in a meaningful and widespread way. While I wouldn't say that semantic markup or standards are widespread, there are currently a lot of great startups that are beginning to leverage the technology, and real-world examples of the Semantic Web's value are becoming more prevalent every day.

A great book that helps introduce the concepts and technology behind the Semantic Web is "Semantic Web for the Working Ontologist", by Dean Allemang and Jim Hendler. This book was recommended to me by Henry Story of Sun Microsystems (originally of BabelFish fame), who has been studying and working on Semantic Web technology since 2004.

Over the past few weeks I’ve had the pleasure of becoming friends with Henry as well as getting a chance to take a closer look at some of his work on the Semantic front. The first question most people have when the phrase “Semantic Web” is used is, “What exactly does it mean?” The short answer is that it is a model for making data shareable by machines in a meaningful way. What Hypertext did for linking related web pages of information, the Semantic Web does with Hyperdata: data that can actually be aggregated and linked in a meaningful way.

Another answer that is commonly given is that the Semantic Web is the model for making applications more “intelligent”. OK, so now what does that mean? After all, the web is currently full of increasingly intelligent applications. Commerce sites can personalize information and deliver suggestions in uncanny ways (think of Amazon’s “other people who bought this, also bought X, Y, Z” feature). Search engines have also become increasingly better at delivering intuitive and seemingly deep matches that are relevant to simple search queries. It would seem that “intelligence” is just a matter of building smarter algorithms, relational databases, XML stores, or object stores to make the data appear better connected and consistent. So when we say “Semantic Web” are we talking about a new web technology, a different type of Web infrastructure, or both?

The answer lies somewhere in the middle, and if not explained properly can lead to a bit of confusion. What we have accomplished today with intelligent applications is a reflection of the best we can do with the data available to us in the traditional HTML format. To allow smart applications to perform to their full potential we must improve the Web infrastructure to provide results that are not confusing, disconnected, or “dumb”. As Dean Allemang and Jim Hendler state in their book, “The Semantic Web doesn’t make data smart because smart data isn’t what the Semantic Web needs. The Semantic Web just needs to get the right data to the right place so the smart applications can do their work. So the question to ask is not ‘How can we make the Web infrastructure smarter?’ but ‘What can the Web infrastructure provide to improve the consistency and availability of Web data?’”

What's the rationale for such a system? Data that is generally hidden away in HTML files is often useful in some contexts, but not in others. We provide all this great information about ourselves to share, but we only share it in a human-readable format. Machines have trouble processing the same information because it is often (sometimes intentionally) kept out of their grasp. The problem with the majority of data on the Web in its current form is that it is difficult to use on a large scale, because there is no global system for publishing data in such a way that it can be easily processed by anyone. We don’t need a smart Web infrastructure, but we need a Web infrastructure that lets us connect data to smart Web applications so the whole Web experience is enhanced.

For a quick example let’s take a look at the current state of affairs on the web. Let’s say the majority of your personal profile information is on Facebook – including your work affiliations, group events, where you live, and maybe even your favorite books, and movies. When you visit Amazon it has no idea about all the latest books you’ve added to your Facebook profile. Maybe you recently moved to a new city, but the information in your Amazon account still says you live at your old address. You might find it helpful to share some of your Facebook information with Amazon so you can get some great suggestions, but the only way to do that is if you put all of your latest information into another closed, centralized system (Amazon). Amazon doesn't even have the same fields or profile setup as Facebook so doing this is hardly an option. Not only is this frustrating, but it's also an enormous waste of time. Wouldn’t it be nice if Facebook, or any major website you were a part of, acted as if you actually owned your personal data? You should be able to take your profile information with you wherever you go. And wouldn't it be great if other websites, like Amazon, allowed you to easily submit your relevant information? The end result would be applications that are intelligent enough to read this shared information in a way that is automatic, and instantly useful.

Since the value of centralized and closed systems like Facebook lies in hoarding all of your personal data (so they can increase their ad revenue), Amazon was forced to come to Facebook and build two Facebook applications: Amazon Giver and Amazon Grapevine. These applications allow you to solve the above problem while still staying inside the walled garden of Facebook. Amazon Grapevine will go through your favorite books, movies and other profile information, as well as look at people and groups you are a fan of, and then make suggestions for purchases. You can make all of your purchases through Amazon from within Facebook without ever leaving. As in the "I'm hungry" example, is this the best it's going to get? We all have to join Facebook, and then wait for all of our other favorite sites to make a compatible Facebook application? Where's the freedom? Where are the choices?

On the surface this seems like an issue of data portability. But that’s mainly the ‘human’ side of the problem. The machine side is where the real challenge resides. If all of your data could merge and play nicely together, all web and mobile applications - not just custom Facebook applications - could be more intelligent and useful.

The main idea of the Semantic Web is to support a distributed Web at the level of the data rather than at the level of the presentation. Instead of having one webpage point to another, one data item can point to another, using global references called Uniform Resource Identifiers (URIs). The Web infrastructure provides a data model whereby information about a single entity can be distributed over the Web. This distribution could allow Amazon to access what you want to share with it even though the information is distributed over websites controlled by more than one organization. The single, coherent data model for the application is not held inside one application but rather is part of the Web infrastructure. When you publish information about your profile it shouldn’t just be published as a human-readable presentation of this information that is trapped on Facebook, but instead as a distributable, machine-readable description of the data. The data model that the Semantic Web infrastructure uses to represent this distributed web of data is called the Resource Description Framework (RDF).
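To make the idea of data distributed across sites a little more concrete, here is a minimal Python sketch. It uses plain tuples to stand in for RDF triples (subject, predicate, object) rather than a real RDF library. The FOAF namespace is real, but the `example.org` and `example.com` URIs, the `shipsTo` property, and the data itself are made up for illustration.

```python
FOAF = "http://xmlns.com/foaf/0.1/"  # the real FOAF namespace
ME = "http://example.org/people/alice#me"  # hypothetical URI for one person

# Triples (subject, predicate, object) as two different sites might publish them.
social_site = {
    (ME, FOAF + "name", "Alice"),
    (ME, FOAF + "interest", "http://example.org/books/semantic-web"),
}
bookstore_site = {
    (ME, FOAF + "name", "Alice"),
    (ME, "http://example.com/shop#shipsTo", "Boston"),  # hypothetical property
}

# Because both sites use the same URI for the subject, merging is a set union.
merged = social_site | bookstore_site


def describe(graph, subject):
    """Collect every (predicate, object) pair known about one subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


for predicate, obj in describe(merged, ME):
    print(predicate, "->", obj)
```

The point of the sketch is that neither site holds the whole picture, yet no custom integration is needed: the shared URI is what lets the two descriptions line up.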

“This single distributed model of information is the contribution that the Semantic Web infrastructure brings to a smarter web. Just as the case with the data-backed Web applications, the Semantic Web infrastructure allows the data to drive the presentation so that various webpages (presentations) can provide views into a consistent body of information. In this way, the Semantic Web helps data not be so dumb.”

Up to this point I’ve tried my best to explain the concepts and ideas surrounding the Semantic Web. But Henry Story has gone a step further by creating a wonderful working example with his Beatnik AddressBook application. You can check it out here.


What Henry’s AddressBook does is aggregate the personal profiles (FOAF files) into a social network that continuously adds deeper and deeper information about your friends and colleagues. Location information is represented in a beautiful NASA mapping component, and makes visualizing where your friends are currently located much easier. Each person in your social network may have unique information about you, and as these files are aggregated a deeper and clearer picture of your relationships and your own profile becomes apparent.
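The aggregation step can be sketched in a few lines of Python. This is not Beatnik's code, and real FOAF files are RDF documents that need a proper parser. Here each "file" is a hypothetical, already-parsed dictionary with invented people and locations, just to show how many small profiles combine into one network.

```python
from collections import defaultdict

# Each entry is a hypothetical, already-parsed FOAF profile:
# who published it, whom they say they know, and (optionally) where they are.
foaf_files = [
    {"owner": "alice", "knows": ["bob", "carol"], "based_near": "Boston"},
    {"owner": "bob", "knows": ["alice", "dave"], "based_near": "Paris"},
    {"owner": "carol", "knows": ["dave"]},
]


def aggregate(files):
    """Merge many small profiles into one network and one location table."""
    network = defaultdict(set)
    locations = {}
    for profile in files:
        owner = profile["owner"]
        network[owner].update(profile["knows"])
        if "based_near" in profile:
            locations[owner] = profile["based_near"]
    return network, locations


def friends_of_friends(network, person):
    """People reachable in two hops who are not already direct contacts."""
    direct = network.get(person, set())
    second = set()
    for friend in direct:
        second |= network.get(friend, set())
    return second - direct - {person}


network, locations = aggregate(foaf_files)
print(sorted(friends_of_friends(network, "alice")))
print(locations)
```

No single file mentions that Alice is two hops away from Dave; that only becomes visible once the files are combined, which is the "deeper picture" effect described above.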

Although this is still a work in progress, I encourage anyone interested in helping with Henry's project to contact him. Oftentimes the best way to get the critical mass to move from the research field to the real world is with the help of an enthusiastic community of supporters. Since deeda has a significant focus on aggregating social graph data across the mobile, desktop and web environments, we're also very interested in working with Henry to develop a Semantic Web solution that fits with deeda's vision.

Along these lines, where does all this ‘talk’ leave us? Whether it’s applying Neuroscience to computer science, or exploring the promises of the Semantic Web, the same questions remain: “Where are the applications?” “Where are the working examples?” “If this is so much better, why isn’t anyone else doing it?”

“There are so many things the Web might usefully do in the future, that it is sometimes hard to see how we can get there from here. W3C's RDF has been around since 1997, yet while it has been adopted in a number of applications (for example by Mozilla, Open Directory, Adobe, RSS 1.0), people often ask why there is as yet no killer app for RDF. While we're not sure that 'killer app' is the right way to think about the problem, it is true that there is relatively little RDF data 'out there in the public Web', in the way that HTML is 'out there'.”

To quote and paraphrase Dan Brickley, 'The (soon to be "Semantic") Web, if it is to reach its full potential, needs to become a lot more automatic. We hope that it will be able to do things (offer us services) based on combining data and services scattered around the Web. It might, for example, be able to find the phone numbers or AOL screen names of all your friends and professional collaborators. Or show you the photos, names and recent publications and shared bookmarks for everyone attending the next party in your deeda calendar. Best of all it should be able to deliver these services instantly across mobile handsets, and allow you to automatically reference your current location as a node in the Semantic data hierarchy.'

Where HTMs and Semantic Web technologies overlap, we see a very valuable model begin to develop:
  • A Model that helps people communicate

  • A Model that explains and makes predictions

  • A Model that mediates among multiple viewpoints

  • A Model that is spread across a distributed system

  • A Model that represents a new structure for memory
But for people to adopt these features deeda must make its system as easy to understand and operate as possible. In fact, it should "just work" out of the box. From the development standpoint we must provide:
  • An ontology modeling module

  • Intuitive web interfaces for everyday people for managing ontologies in a collaborative mode

  • Ontology import and export modules

  • Truth and Reasoning Maintenance, Reporting and Editing modules
From the user's standpoint everything should be as familiar as it currently is in a traditional centralized system (Facebook, Twitter, Amazon, etc.). Rather than force average people to learn RDF, FOAF, OWL or other Semantic formats, we must do the conversion of their data for them - quickly, efficiently, and in the background. Furthermore, skilled users should have access to their raw RDF/FOAF files for editing, storing, or transporting elsewhere.
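As a rough idea of what "doing the conversion for them" could look like, here is a small Python sketch that turns an ordinary profile form into N-Triples lines. It is an assumption-laden illustration, not deeda's actual system: the field names, the `FIELD_TO_PROPERTY` mapping, and the `example.org` URIs are hypothetical, while `foaf:name`, `foaf:nick` and `foaf:homepage` are real FOAF properties.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical mapping from ordinary form fields to FOAF properties.
FIELD_TO_PROPERTY = {
    "name": FOAF + "name",
    "nickname": FOAF + "nick",
    "homepage": FOAF + "homepage",
}


def escape_literal(text):
    # Escape backslashes and double quotes so the literal stays valid.
    return text.replace("\\", "\\\\").replace('"', '\\"')


def profile_to_ntriples(person_uri, form_data):
    """Turn a plain dict of form fields into N-Triples lines."""
    lines = []
    for field, value in form_data.items():
        prop = FIELD_TO_PROPERTY.get(field)
        if prop is None:
            continue  # skip fields we have no mapping for
        if field == "homepage":
            obj = f"<{value}>"  # a homepage is a resource, not a string
        else:
            obj = f'"{escape_literal(value)}"'
        lines.append(f"<{person_uri}> <{prop}> {obj} .")
    return lines


form = {
    "name": "Alice",
    "nickname": "ali",
    "homepage": "http://example.org/alice",
    "shoe_size": "7",  # no mapping, so it is silently dropped
}
ntriples = profile_to_ntriples("http://example.org/alice#me", form)
for line in ntriples:
    print(line)
```

The user only ever sees the familiar form; the resulting lines are the kind of raw file a skilled user could later open, edit, or move elsewhere.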

Handset companies have traditionally been 'hardware only' companies. With Apple on the scene they are reluctantly trying to catch up and understand the importance of the software side. Personally, I believe the real wake-up call for handset manufacturers was when RIM appeared on the scene with their BlackBerry handsets. BlackBerries are one of the best examples of how custom, purpose-built Network Operations Centers (Infrastructure), Middleware (RIM Enterprise Software), and Handsets (with RIM software) are what lead to an exemplary user experience. Most companies, however, are too afraid to risk investing on all three fronts, and typically see their value in only one of these three areas. This is what leads to handsets and applications that never quite live up to their full potential.

deeda understands the value in optimizing data across the web, desktop and mobile environments. We hope you continue to support us as we develop our solutions and understand just how much is at stake. Unlike most applications, there is quite a bit of thought and hard work that is going into our system. We hope to have something fun for you to test and play with by the end of the year.
