Thursday, April 10, 2008

CiteULike helps you find papers!!!

As a graduate student, you are expected to read tons of research papers, journal articles and other material related to your coursework and research. Many times it is up to the student to search for these articles and verify their authenticity. We often find PDF files online that talk about the exact same topic we have been searching for, but when you look to verify their authenticity - NO CLUE!!! Usually conference websites, publishers, and services like CiteSeer and IEEE Xplore will help you identify the source, but all of that is a lot of hard work and takes time.


CiteULike hopes to ease some of this pain and attempts to help you organize your reading material in a way that will help you locate new material, without having to worry about the veracity of its source.

The premise is simple. Quoting Wikipedia:
“CiteULike is based on the principle of social bookmarking and is aimed to promote and to develop the sharing of scientific references amongst researchers. In the same way that it is possible to catalog web pages (with Furl and
del.icio.us) or photographs (with Flickr), scientists can share information on academic papers with specific tools developed for that purpose.”

You sign up with the service and start adding your articles, papers and other journal references into your ‘library’. Every addition requires you to provide corresponding tags. These tags and other metadata can then be used to link up your library articles with ones that others have been reading.

CiteULike gives access to personal or shared bibliographies directly from the web. It allows one to see what other people posted publicly, which tags they added, and how they commented on and rated a paper. It is also possible to browse the public libraries of people with similar interests to discover interesting papers. Groups allow individual users to collaborate with other users to build a library of references. The data are backed up daily from the central server.
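To give a feel for how tag overlap could link two libraries, here is a tiny Python sketch I put together. The Jaccard measure and the sample data are my own illustration; CiteULike does not document its actual matching logic.

def jaccard(tags_a, tags_b):
    """Fraction of shared tags between two sets of tags."""
    a, b = set(tags_a), set(tags_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Each library maps an article title to the tags its owner assigned (made-up data).
my_library = {
    "Item-based collaborative filtering": {"recommender-systems", "collaborative-filtering"},
    "PageRank": {"web-search", "link-analysis"},
}
their_library = {
    "GroupLens": {"recommender-systems", "collaborative-filtering", "usenet"},
    "Latent semantic indexing": {"information-retrieval"},
}

# Rank the other user's articles by how well their tags overlap with mine.
my_tags = set().union(*my_library.values())
for title, tags in sorted(their_library.items(),
                          key=lambda kv: jaccard(my_tags, kv[1]), reverse=True):
    print(f"{jaccard(my_tags, tags):.2f}  {title}")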

The creator of CiteULike, Richard Cameron (London), is a scientist himself who found it very cumbersome to author bibliographies, and in his experiments with the bibliography software available at the time, he found it highly inadequate.

The source code for the project is closed, but the repository of metadata they collect is available for free download. Plug-ins for the project can be downloaded using Subversion.

Does it work?
So the bottom line is, CiteULike should help me discover related literature from multiple sources, and should also tell me whether those articles are actually being read by other people, and whether they are any good.
Time to put it to the test! I signed up with the service, and fed into the library references to some of the research papers that we have been studying in the PRS class.

Once I had my library up, I looked to see if other people are actually reading any of the material we have been covering in our class. Turns out quite a few people do read and research recommender systems!!!! I found a handful of groups and people who are interested in the papers I have in my library.

To conclude, CiteULike is a simple tool with a very simple interface that helps you build up a library of literature which you can then link up with other public libraries to find material that matches your interests. The concept of using metadata to link things up is not new, but using it to solve the problem of composing bibliographies and finding interesting (and reliable) research articles is an innovative one. In fact I might actually use this a lot in the coming few weeks for some of my coursework and individual research!!!


Links:
CiteULike
Wikipedia talks more on CiteULike

Thursday, April 3, 2008

'Seeq' your Music

Downloading music and burning it to CDs is slowly falling out of fashion. The web now hosts more music, videos and merchandise than any one person, artist or organization can amass. And with tools like Google, searching this vast online collection of media and related paraphernalia is getting easier by the day.

What is SeeqPod? It lets you search for music. It finds playable content (songs and music you can play instantly without having to sign up or register) and lets you build an iPod-like playlist from it, which is impressive at first and makes SeeqPod easy and fun to use. Like all other search engines, SeeqPod has crawlers out on the WWW searching and indexing online content. The progress of this indexing process can be watched on their homepage. Quoting RWW: 'The homepage is reminiscent of Google's original unadorned page with just a simple search form. The vital difference is that SeeqPod also displays a sample of current music being indexed by its engine. These songs are meant to draw you in, and succeed at it. There's something mesmerizing about watching track after track scroll by.'

Not very innovative, is what you might think. It looks like a mash-up between an iPod and Google; in fact that's what many online blogs call it. But there is more to SeeqPod than being another Google rip-off.

SeeqPod's 'Discover' feature:

Before I explain how this works, let’s look at what developers of SeeqPod have to say about this feature.

“We have created a totally unique algorithm that finds the hidden relationships between playable topics, not unlike the way our minds make relevant associations between subjects of interest to us. Our technology mines the deepest crevices of the Web, returning useful, precise results.

Our Discovery engine generates related search results for the query entered in the search box. Discoveries are powered by our patented language-independent algorithms that analyze and mimic the way people make associations in everyday life.”

So SeeqPod tries to build a correlation between a particular search topic and related items on the web. The algorithms responsible for this work the same way you or I would relate an artist with a particular object or topic on the web. A good example would be searching for an artist and finding other artists who have a similar style or maybe share a band member. How does SeeqPod build this understanding? I don't have a clue!!! (As with all proprietary work, this one too keeps its secrets locked behind a veil of secrecy.)

But we can safely conclude that SeeqPod is trying to mimic a content-based, collaborative recommendation system.
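Since the real algorithm is locked away, here is a rough Python sketch of what an association-by-co-occurrence approach could look like: artists that show up together on many of the same crawled pages get treated as related. The pages and the scoring are entirely made up by me; this is not SeeqPod's patented method.

from collections import Counter
from itertools import combinations

# Toy sketch: count how often two artists appear on the same crawled page,
# then 'discover' the artists most often seen alongside a query artist.
crawled_pages = [
    {"Audioslave", "Soundgarden", "Chris Cornell"},
    {"Audioslave", "Rage Against the Machine"},
    {"Soundgarden", "Chris Cornell", "Pearl Jam"},
    {"Audioslave", "Soundgarden"},
]

co_counts = Counter()
for page in crawled_pages:
    for a, b in combinations(sorted(page), 2):
        co_counts[(a, b)] += 1

def discover(artist, top_n=3):
    """Artists most often seen on the same pages as `artist`, strongest first."""
    related = Counter()
    for (a, b), n in co_counts.items():
        if a == artist:
            related[b] += n
        elif b == artist:
            related[a] += n
    return related.most_common(top_n)

print(discover("Audioslave"))   # [('Soundgarden', 2), ('Chris Cornell', 1), ...]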

Testing SeeqPod:

We are concerned with the discover feature, as this is what provides ‘recommendations’ to us for a given topic. Let’s go with my favorite artist Audioslave.

Search throws up 179 'playable media' results and 114 discoveries. First on the list of discoveries is 'Good Charlotte - The Young and the Hopeless'. I can't explain this result. But a few results down, I see Soundgarden - 'Burden in My Hand'. The current lead vocalist of Audioslave, Chris Cornell, was once the lead vocalist of Soundgarden.

In totality, the results do make sense. Artists who have a history with Audioslave, songs that have similar characteristics to what Audioslave plays, and of course links to merchandise, all make for good discoveries.

I can definitely find all this related stuff on Google too, but I would have to extend my search query to reflect what I want. SeeqPod throws up these recommendations even if you don't ask for them. I found an Audioslave t-shirt which I would love to buy, even though I was not looking for one. This shows how the technology helps you 'discover' more about a topic, artist or album.

SeeqPod does a good job of organizing its search results and presenting the relevant content to its users. By not hosting any music or video files, it neatly avoids the ire of the RIAA and other DMCA-adhering record labels. What makes SeeqPod interesting to use is the fact that it provides instant playback of media, and provides some useful recommendations without taxing the user by asking him to search with a more specific intent. With SeeqPod you will definitely not find conventional results at the top of your list: the Audioslave homepage was nowhere to be found in the SeeqPod search results, while Google lists it at the top of its finds. SeeqPod also lets you help it find more relevant content by letting you submit sites which you think might be relevant to a particular topic.

There are a number of ways to try to recommend items to a user. But building a correlation model which emulates the way a human would link objects is interesting, and it can be applied to a lot of business models to simulate the human thinking process.

There are a lot of unanswered questions... like what factors decide the ordering of the search results? Does SeeqPod consider user hits and other usage statistics while recommending a particular page/item? Its internal workings and algorithms would definitely be interesting to explore.

Links:

  1. SeeqPod Home
  2. SeeqPod FAQ
  3. RWW Article on SeeqPod
  4. Wikipedia on SeeqPod

Thursday, March 27, 2008

Find your smatch!!!

When we are searching for a new book or a movie, the first thing we do is call up friends. And not just any friends - we call those people whose taste is very similar to ours. Smatchy.com tries to capture this same idea in the form of a recommendation engine.

With smatchy.com, a user answers a set of questions, based on which the system finds other smatchy users whose answers match the user's own. The results of this matching are used to power other features on the site, like providing movie and book recommendations.

Now this concept might not seem very innovative, and frankly speaking, people usually don't like being asked a lot of questions. But smatchy.com gets around this by making the whole question-and-answer process fun!

Firstly, the questions are anything but typical. Some of the questions I had to answer were:
I would like to own a Lexus
I don’t mind being licked by dogs
Linear algebra is fun
I think people need to be aggressive if they want to survive in the workplace.
I like taking risks.
All religions have equal worth
I love coffee
I care too much about what other people think.
I like to IM my friends all the time.
I like the book 'da vinci code'

The question set is a mix of fun, wacky and sometimes totally serious questions. This makes the process of answering the questions really addictive - partly because you want to see more of these weird questions ('I would rather have a root canal than live in Texas'??!!!), and partly because you want to know what other people think about those same questions.

Answering the questions is also fun. There are two scales on which you rate every question, which is quite different from the traditional thumbs-up/down or ranking approach. One scale is the agreement scale, which you use to express your answer (on a scale of 1-7). The second scale reflects how 'good' a question is. These two scales are combined into a grid-like system.

After you answer a question, the system throws up a statistic which states the total number of people who responded to that question, their responses on the 1-7 scale, and where you stand with respect to other people's opinions.

Recommendations:
Firstly, the smatches. Based on your responses, smatchy.com finds people who think like you do. These matches to your profile are called smatches. So each smatch is a smatchy.com user who has answered the questions you have, and whose responses match yours. Not only can you see a list of all your smatches, but you can also see the degree (a percentage) to which your profile matches each smatch. You can also choose how smatchy.com finds these smatches - there are 14 different categories to choose from (books, movies, attitude, love!!!) - and you can filter the smatches based on how 'similar' they are to you.

Now using the profiles of all these smatches, the system calculates movies and books that you may like. I have yet to get any recommendations, but from what I have read about the system in other blogs, these recommendations are pretty decent.

There are some other features too, like making up your own questions, sending messages to other people etc.

Working:
We can only guess how this system works, as the site does not provide any documentation on how it derives its recommendations. It looks like a collaborative recommendation engine which does a user-to-user profile match based on the questions the users answer.
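As a guess at what the smatch percentage could be, here is a small Python sketch that scores two users by how closely their 1-7 answers agree on the questions they have both answered. Smatchy.com does not publish its algorithm, so the distance-based score and the sample answers below are purely my own illustration.

def smatch_percent(answers_a, answers_b):
    """Similarity (0-100) over the questions both users answered on the 1-7 scale."""
    shared = set(answers_a) & set(answers_b)
    if not shared:
        return 0.0
    # Per-question disagreement is at most 6 (|7 - 1|); average it and invert.
    avg_diff = sum(abs(answers_a[q] - answers_b[q]) for q in shared) / len(shared)
    return 100.0 * (1.0 - avg_diff / 6.0)

# Made-up answers keyed by question id, on the 1 (disagree) to 7 (agree) scale.
me    = {"lexus": 2, "licked_by_dogs": 6, "linear_algebra_fun": 7, "love_coffee": 5}
other = {"lexus": 3, "licked_by_dogs": 5, "linear_algebra_fun": 6, "taking_risks": 4}

print(f"{smatch_percent(me, other):.0f}% smatch")   # 83% over the 3 shared questions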

Another interesting feature to analyze is the answer scale. Remember, we have two scales to work with while answering every question. I believe the 'good question' scale is used by the system to gauge whether users like answering a particular question; based on this, the system can refine its question set.

According to the owners of smatchy.com (MBA graduates from Wharton), there is a complex algorithm behind the system (about which they don't talk much, nor provide any documentation). I hope to find more information on the technical aspects of this system, which could help me with the movie recommender system I am currently building for my class project.

Links:
Home page:
http://smatchy.com/ (login id: zarthos password: zarthos)
Article on smatchy.com:
http://www.squidoo.com/recs
Smatchy Blog: http://blog.smatchy.com/
Smatchy.com FAQ: http://www.smatchy.com/home/help

Thursday, March 20, 2008

TalkMine lets you talk to your documents!!!!

Distributed information sources (DIS) pose several problems for personalized and customized information retrieval:

· Passive Environments. There is no genuine interaction between user and system, the former pulls information from a passive database and therefore needs to know how to query relevant information with appropriate keywords. Furthermore, such impersonal interfaces cannot respond to queries in a user-specific fashion because they do not keep user-specific information, or user profiles. The net result is that users must know in advance how to characterize the information they need before pulling it from the environment.
· Idle Structure. Structural relationships between documents, keywords, and information retrieval patterns are not utilized. Different kinds of structural relationships are available, but not typically used, for different DIS, e.g. citation structure in scientific library databases, the link structure in the WWW, the clustering of keyword relationships into different meanings of keywords, temporal patterns of retrieval, etc.
· Fixed Semantics. Keywords are initially provided by document authors (or publishers, librarians, and indexers), and do not necessarily reflect the evolving semantic expectations of users.
· Isolated Information Resources. No relationships are created or information is exchanged among documents and/or keywords in different information resources such as databases, web sites, etc. Each resource is accessed with a private set of keywords and query language.


The Solution!!!!!

TalkMine is an adaptive recommendation system which is both collaborative and content-based, and exploits currently untapped sources of information in DIS. In particular, it integrates information from the patterns of usage of groups of users, and also categorizes DIS content or semantics in a manner relevant to those groups. It is currently being developed as a test-bed environment for the Research Library at the Los Alamos National Laboratory, more specifically for its Library Without Walls project (see link 1 below) under the Adaptive Recommendation Project (ARP).

TalkMine categorizes information sources based on keyword organization. For a given information source, the system builds a network of keywords consisting of nodes and edges. Nodes represent the keywords themselves, and edges represent the proximity of keywords to one another, based on the number of documents they share.
By taking a conversational approach, the system finds out the user's interests and models them as an evidence set. Using these evidence sets, the user's interest in a particular information source can be quantified.
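Here is a small Python sketch of the keyword-network idea: the weight of the edge between two keywords grows with the number of documents they share. The real TalkMine proximity is a semantic semi-metric, so plain shared-document counts are only a stand-in I am using to keep the example short; the documents below are made up.

from collections import defaultdict
from itertools import combinations

# Build a keyword network: nodes are keywords, edge weights count shared documents.
documents = {
    "doc1": {"recommender", "collaborative", "filtering"},
    "doc2": {"recommender", "content-based"},
    "doc3": {"collaborative", "filtering", "usenet"},
}

edge_weight = defaultdict(int)   # (kw_a, kw_b) -> number of shared documents
for keywords in documents.values():
    for a, b in combinations(sorted(keywords), 2):
        edge_weight[(a, b)] += 1

def neighbours(keyword):
    """Keywords linked to `keyword`, the ones sharing the most documents first."""
    links = [(b if a == keyword else a, w)
             for (a, b), w in edge_weight.items() if keyword in (a, b)]
    return sorted(links, key=lambda kw_w: -kw_w[1])

print(neighbours("filtering"))   # [('collaborative', 2), ('recommender', 1), ('usenet', 1)]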

The system works as follows:
· User is presented with all the information sources he has at his disposal.
· User weighs these information sources based on his preference.
· User then inputs an initial keyword.
· The system uses this keyword and its proximity to other keywords to build an evidence set that emulates the user's interests in a particular information source. This evidence set is also called the learned set.
· The uncertainty of this evidence set is calculated (fuzziness, nonspecificity and conflict). If it is below a certain value, the system stops (low uncertainty indicates that the evidence set clearly depicts the user's tastes).
· The user is then presented with another keyword, selected based on its proximity to the first keyword.
· If the user expresses interest in this keyword, another evidence set is formed for it, and its union with the earlier learned set gives a new learned set.
· Again, the uncertainty of this new learned set is calculated, and if it is below the threshold value, the system stops.

Following this approach, the system builds a set of keywords which reflects the user's preferences. The user is finally presented with the documents associated with these keywords. Each document carries multiple keywords, and the network of these keywords represents the document's semantics. The relationship between this keyword network and the learned set defines the relevance of the document to the user.
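To make the loop concrete, here is a heavily simplified Python stand-in for the stopping rule. The real system works with fuzzy evidence sets and measures their fuzziness, nonspecificity and conflict; in my sketch, 'uncertainty' is just the entropy over the documents that still match every confirmed keyword, the dialogue step is faked, and all the data is invented.

import math

documents = {
    "d1": {"recommender", "collaborative", "filtering"},
    "d2": {"recommender", "content-based"},
    "d3": {"collaborative", "filtering"},
    "d4": {"recommender", "usenet"},
}
# Toy proximity ordering for the initial keyword, e.g. derived from shared documents.
proximity = {"recommender": ["collaborative", "filtering", "usenet"]}

def uncertainty(confirmed):
    """Entropy of a uniform distribution over documents matching all confirmed keywords."""
    matches = [d for d, kws in documents.items() if confirmed <= kws]
    return math.log(len(matches)) if matches else 0.0

def ask_user(keyword):
    return keyword == "collaborative"    # placeholder for the real dialogue step

confirmed = {"recommender"}              # evidence from the user's initial keyword
threshold = 0.5
for candidate in proximity["recommender"]:
    if uncertainty(confirmed) < threshold:   # low uncertainty: interests are clear, stop
        break
    if ask_user(candidate):                  # user expresses interest, so fold it in
        confirmed.add(candidate)

print(sorted(confirmed))   # ['collaborative', 'recommender'] -- only 'd1' still matches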

The system also uses long-term usage patterns to refine the keyword network. By associating relatively close keywords with documents that were not previously tagged with those keywords, and by increasing or decreasing the distances between keywords based on how often they are correlated, the system refines the keyword network to accurately reflect the evolving user semantics.

System Architecture:

The architecture of TalkMine has both user-side and system-side components. Each user owns a browser (or a plug-in to an existing Internet browser), which functions as a consolidated interface to all the information resources searched. This individual browser stores user preferences and tracks information retrieval patterns and relationships, which it utilizes to adapt to the user. User preferences are stored as a set of local knowledge contexts which the user has constructed while using the system under different sets of interests. These local knowledge contexts store both semantic (semi-metric) and structural proximity information.

Advantages:
· There is recommendation, as the system pro-actively pushes relevant documents about related topics to users who may have been unaware of them. This is achieved through the structural and semantic proximity information kept in the distributed memory (section 3.3.1 of the paper), its integration with user-specific (also structural and semantic) information in the categorization process (section 3.3.2), and finally the document retrieval operations (section 3.3.3).
· There is conversation between users and information resources, and among information resources (and indirectly among users), as a mechanism to exchange or cross over knowledge among them is established. As categories are constructed through the question-answering process (section 3.3.2), a list of documents is produced (section 3.3.3) and communicated not only to users but also to information resources that did not contain them, and the semantics of all parties involved are adapted (section 3.3.4).
· There is creativity, as new semantic and structural associations are set up by TalkMine. The categorization process brings together knowledge from the different contexts of the information resources. This not only adapts existing local semantics, but also combines knowledge not locally available to individual information resources. In this sense, because of the conversation process, information resources gain new knowledge previously unavailable to them.

The researchers conclude that, through this system, they have established a human-machine symbiosis which can be used for the automatic, adaptive organization of knowledge in DIS.
Links:
1. Talkmine literature: http://informatics.indiana.edu/rocha/dl99.html
2. More on the project and its underlying logic: http://informatics.indiana.edu/rocha/ijhms_pask.html

Thursday, February 21, 2008

Google Co-op: A personalized search engine

Google Co-op, launched in May 2006, is a platform that allows web developers to feature specialized information in web searches, refine and categorize queries, and create customized search engines based on Google Web Search.

One of the services provided by Google Co-op is the ability to create 'your own search engine'. Using his Google account, a user can set up a customized version of the Google search engine to look for very specific resources on the web. Essentially, the user is allowed to define a sandbox for the search engine, which restricts the search results to the resources defined within that sandbox.

There are two ways in which a user can define this sandbox. He can use a list of keywords, which act as a filter for the customized search engine to locate the most relevant results for a given search string. He can also give very specific URLs, and the search engine will scour the pages of those websites to recover any and all information pertaining to a given search string.

The user can use one of the above customization techniques, or combine both, to create a topic-based search engine for his website, which is one obvious application of this Co-op technology. So say I want people to search for music info on my website: I create a customized search engine with the keyword 'music' and give my website's URL as one of the locations where it can search. Voila!!! I have a search tool for my website. Google lets users host their search engines on a separate Google search homepage, and also provides code which can be embedded within a web page to show the customized Google search box anywhere you want.
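Just to illustrate the sandbox idea (this is not the actual Co-op API or its configuration format), here is a Python sketch where a result only survives if it comes from one of my chosen sites and mentions one of my chosen keywords. The results below are invented.

from urllib.parse import urlparse

ALLOWED_SITES = {"amazon.com"}
KEYWORDS = {"coldplay"}

def in_sandbox(result):
    # Keep a result only if its host matches an allowed site and its text hits a keyword.
    host = urlparse(result["url"]).netloc.lower()
    site_ok = any(host == s or host.endswith("." + s) for s in ALLOWED_SITES)
    text = (result["title"] + " " + result["snippet"]).lower()
    return site_ok and any(k in text for k in KEYWORDS)

raw_results = [
    {"url": "http://en.wikipedia.org/wiki/Clock", "title": "Clock",
     "snippet": "an instrument for measuring time"},
    {"url": "http://www.amazon.com/Clocks-Coldplay/dp/B00008OESI",
     "title": "Clocks: Music: Coldplay", "snippet": "the radio edit of Clocks by Coldplay"},
]

print([r["url"] for r in raw_results if in_sandbox(r)])
# ['http://www.amazon.com/Clocks-Coldplay/dp/B00008OESI']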

I decided to test this Co-op technology by creating a search engine customized to look for all information on the rock band Coldplay. Using a dummy Gmail account to sign in to Google, I started off by creating a customizable search engine based on the keyword Coldplay. I didn't restrict my engine to any specific URL. I then used the engine to search for 'clocks'. 'Clocks' is one of the tracks on the album A Rush of Blood to the Head by Coldplay (released 2002). The following were the top three results:
1. YouTube - Coldplay "Clocks" music video
The music video for Coldplay's song "Clocks." ... Views: 682684. How to play Clocks by Coldplay Ryan on piano. 04:43 From: gotitans999. Views: 426101 ...
www.youtube.com
www.youtube.com/watch?v=c9j_RZDqYc4
clipped from Google - 2/2008
2. Coldplay - Clocks Lyrics
Clocks Lyrics by Coldplay. ... Visitors: 24816 visitors have hited Clocks Lyrics since Dec 28, 2007. Print: Coldplay - Clocks Lyrics print version ...
www.lyrics007.com
www.lyrics007.com/Coldplay%20Lyrics/Clocks%20Lyrics.html
clipped from Google - 2/2008
3. Coldplay – Clocks – Music at Last.fm
Listen to Coldplay – Clocks. Clocks appears on the album A Rush of Blood to the Head and has been tagged as: rock, alternative, british.
www.last.fm
www.last.fm/music/Coldplay/_/Clocks
clipped from Google - 2/2008

Compare this to the plain-vanilla Google search results:
1. Clock - Wikipedia, the free encyclopedia
A clock is an instrument for measuring, indicating and maintaining the time. The word "clock" is derived ultimately (via Dutch, Northern French, ...en.wikipedia.org/wiki/Clock - 129k - Cached - Similar pages - Note this
2. Clocks : Free Shipping on Wall Clocks at ClockStyle.com
Clock : Clockstyle.com is the premier online retailer of alarm and atomic clocks. We also offer a large selection of mantel and wall clocks which you can ...www.clockstyle.com/ - 60k - Cached - Similar pages - Note this
3. Clocks - Wall Clocks to Mantel Clock with Authorized Service Center
Clocks of every type, from a wall clock to a mantel, with information and assistance by our Experts and Authorized Service Center.www.giftoftimeclocks.com/ - Similar pages - Note this

The results speak for themselves. Next I decided to tune my search engine further by giving it a specific URL to work with. Rebuilding my search engine, this time with amazon.com as the preferred URL, I ran my search again. The results:
1. Amazon.com: Clocks: Music: Coldplay
The version of "Clocks" here is the radio-edit, somewhat shorter than the album version, but still one of the most exciting songs of Coldplay. ...
www.amazon.com
www.amazon.com/Clocks-Coldplay/dp/B00008OESI
clipped from Google - 2/2008
2. Amazon.com: Clocks: Music: Coldplay
In parts it's almost like a draft version of "Clocks", so I can see why it's on here. The first line goes, "It could be worse, I could be alone" and from ...
www.amazon.com
www.amazon.com/Clocks-Coldplay/dp/B00008OP2Z
clipped from Google - 2/2008
3. Amazon.com: Stop the Clocks: Music: Oasis
1 singles, and--unique to Oasis--instantly familiar B-sides into one 18-track double album, entitled Stop the Clocks. Furthermore, this collection has been ...
www.amazon.com
www.amazon.com/Stop-Clocks-Oasis/dp/B000IMV4IW
clipped from Google - 2/2008

So we can now personalize the Google search engine to search for what we want. But can we make some money off it? At the end of the day, I have created a very specific tool that lets people search for what they want, and by allowing Google to show its ads next to the results of my customized search engine, I am promoting those ads, or attracting people towards them. So am I not supposed to get a share of the profit Google makes off these ads? Well, Google gives a big nod to this: if your customized engine attracts people to click on the ads it shows, Google will share the revenue generated from those ads with you.

I can foresee many web developers and site owners using this tool to create personalized search tools that serve their own interests. The business model behind this Co-op technology gives people an incentive to use the tool to promote their e-commerce.

In addition to the customization facility, Co-op also provides the ability to tag web pages, which will help build meta-knowledge for the vast amount of information floating out there in cyberspace. Google calls this 'Topics' technology, and compares it to the notion of a filter which users can apply to refine their searches. Topics, combined with contributions by domain experts, can really help categorize and prioritize globs of data into meaningful knowledge.

I have provided a link below to my search engine, which searches for information related to Coldplay on amazon.com and the entire web, giving more preference to results from amazon.com. You can also use the embedded search box below to try out my customized search engine.

Links:
Customized Coldplay search engine
Google Co-op on Wikipedia
Launch of Google Co-op
Google Topics




Thursday, February 14, 2008

Personalized Driving Instructions


‘Personalization for Route Guidance’ is a research effort at the Computational Learning Laboratory, Stanford, CA, which aims to develop an adaptive route advisor that builds a model of a driver's preferences over time. Imagine a driving assistance system which learns your driving habits and route preferences to suggest better driving routes for you. That’s exactly what this system is trying to achieve!!!

The system comprises a routing algorithm which determines a route between two points on a map, taking the user's preferences into consideration. The system works by first calculating multiple routes that would get the user to his destination. These routes are presented to the user, and he is asked to choose one of them. Based on the user's choice, the different routes are rated, and the cumulative understanding from these ratings is used to build a user model which can predict which routes the user might like in the future.

Inner workings:
The system contains a map, which is composed of geographic locations (starting and destination points) and routes connecting them. These routes are composed of roads, also called edges.
The routing algorithm attempts to find the shortest path from a starting point to a destination point on this planar map using Dijkstra's shortest-path algorithm. The route, or path, is traced along edges in the map. Each edge is weighted, and the algorithm tries to determine the route whose edges carry the least total weight.

So how does the algorithm go about doing this? Well let’s first understand the concept of weights in this system.
Each route is treated as a vector of attributes collected from its edges: estimated total time, total distance, number of turns and number of intersections. A route's cost is the product of this attribute vector and a global weight vector, so the weights express how much the driver cares about each attribute.
Now, once the user selects a destination, the system throws up multiple alternative routes, each with a cost associated with it. When the user makes a selection, that route's cost is checked to make sure it is the lowest of all the alternatives. If it is not, the weight vector is modified to reflect the user's preference, and this change is folded back into the global weight vector.
What we get at the end of the day is a weight vector that tries to predict the user's route preferences. To determine whether a user will like a route, we take his global weight vector, multiply it with the route's attribute vector and look at the resulting cost. If it's low, we have a winner!!!
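Here is a short Python sketch of that cost model as I understand it: a route's cost is the dot product of its attribute vector (time, distance, turns, intersections) with the global weight vector, and the weights get nudged whenever the user picks a route that was not the cheapest. The learning rate and the additive update are my own simplification of the paper's rule, shown only to illustrate the idea.

def cost(route, weights):
    return sum(w * x for w, x in zip(weights, route))

# Weights express how much the driver dislikes (minutes, km, turns, intersections).
weights = [1.0, 1.0, 1.0, 1.0]

routes = {
    "highway":  [22.0, 18.0, 4.0, 2.0],   # fast but long
    "downtown": [25.0, 12.0, 9.0, 7.0],   # short but slow and twisty
}

chosen = "downtown"                        # the user preferred the shorter route
cheapest = min(routes, key=lambda r: cost(routes[r], weights))
if chosen != cheapest:
    # Nudge the weights so the chosen route looks cheaper next time.
    lr = 0.1
    weights = [w + lr * (x_cheap - x_chosen)
               for w, x_chosen, x_cheap in zip(weights, routes[chosen], routes[cheapest])]

print([round(w, 2) for w in weights])                        # [0.7, 1.6, 0.5, 0.5]
print(min(routes, key=lambda r: cost(routes[r], weights)))   # now 'downtown'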

Testing the system:
The research group tested this system on a sample data set. Their results show that route preferences vary widely across people. People are willing to trade off a range of route attributes to achieve their goal, and that goal can be different for different people.

Kicking the personalization aspect up a notch!!!
The research group plans to use GPS to determine certain attributes of a driver which might otherwise be difficult to determine because of the impersonal nature of the route descriptors, for example the speeds a user drives at on edges of a particular type. The familiarity of a route is also being worked into the system: the more familiar a user is with a route, the higher its preference. But this preference might be overridden by other attributes such as the time taken to complete the route, and it is very subjective in nature.


What this project has managed to do is prove that it is possible to derive a cost function that predicts driver preferences. The cost function thus acts as a user model, generating routes that will satisfy the user's needs. Better street descriptions and more detailed user feedback (not the binary yes/no response used now) can help the system construct a better cost function. The group is also looking at other inductive methods to adapt the user model, such as regression over preference rankings, neural networks and principal component analysis, which will help refine the user model to the point where its predictions hit exceptional accuracy levels.

Links:
Project Homepage (get project literature here)

Thursday, February 7, 2008

Audioscrobbler/last.fm - Leveraging paradigms of social networking to provide recommendations for music lovers.


People like to talk about their music preferences. We have all, at some point, had a heated debate about whose favorite band is better, or which Pink Floyd album is the best one. Every time we come across a piece of music or a soundtrack that we like, we wish someone could tell us who composed that track, or at least its name. We all spend countless hours scouring our own and our friends' music collections searching for that one elusive artist/song/album.

What is evident from the above is that social networking plays an important role in people finding the music they prefer. What if we had a social networking system built for this very specific purpose of finding and recommending music? A place where people can share their music preferences and in the process open other people's eyes to music they never knew existed!!!

This is where last.fm comes into the picture. Last.fm claims to be a 'social revolution in music'. Using a recommender system called Audioscrobbler, last.fm builds a profile of the musical tastes of every user by keeping track of all the music that user listens to, be it music streaming onto his machine from a remote URL or an MP3 playing in his local media player. Last.fm also offers the user the option to add other last.fm users to his 'friends list'. All this information is then fed into the last.fm database, where it is compared with data from other last.fm users. This throws up recommendations made by other people who have been listening to similar music, which can be seen and played by the user through his profile page. Last.fm also offers other social networking features.
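Audioscrobbler's actual algorithm is not public, but here is a toy Python sketch of the general collaborative idea: find users whose listening overlaps with mine and suggest the artists they play that I do not. The play counts and the overlap measure are invented for illustration.

from collections import Counter

plays = {  # user -> {artist: play count}, all made up
    "me":    {"Audioslave": 120, "Soundgarden": 40, "Pink Floyd": 60},
    "alice": {"Audioslave": 90, "Soundgarden": 70, "RHCP": 80},
    "bob":   {"Slipknot": 150, "Korn": 60},
}

def overlap(a, b):
    """Shared listening: sum of the smaller play count for every common artist."""
    return sum(min(plays[a][art], plays[b][art]) for art in plays[a].keys() & plays[b].keys())

def recommend(user, top_n=3):
    scores = Counter()
    for other in plays:
        if other == user:
            continue
        w = overlap(user, other)             # how similar the other listener is to me
        if not w:
            continue
        for artist, count in plays[other].items():
            if artist not in plays[user]:
                scores[artist] += w * count  # weight their artists by that similarity
    return scores.most_common(top_n)

print(recommend("me"))   # [('RHCP', 10400)] -- only alice's listening overlaps with mine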


History:
Audioscrobbler, the recommender system behind last.fm, was the brainchild of Richard Jones. What began as a computer science project at the University of Southampton, UK, blossomed into a popular music recommendation service. Last.fm was fully integrated with Audioscrobbler in 2003, and in 2007 CBS bought last.fm for £140 million.


Users on Last.fm:
To use last.fm, a user first has to sign up for an account. Registered users can then download and install one of the last.fm plug-ins. These plug-ins record the names of all the tracks the user listens to on his local machine (they call this automatic track logging 'scrobbling'). This information is then sent back to the last.fm server, which feeds it into the Audioscrobbler recommender engine, which calculates recommendations for that user. Every registered user has a personal profile page which lists his musical preferences and the music he has been listening to recently. In addition, the user can access a feature called the dashboard, which shows the recommendations calculated by Audioscrobbler. The user can preview and listen to tracks from the dashboard which might not be in his profile.


Tagging:
Users are also allowed to tag their music, and they can search the last.fm catalogue for music that has been tagged in a certain way. Tagging can be by genre, mood, artist, or any other characteristic.


Charts:
Last.fm also generates charts which indicate the popularity of the tracks/artists/albums a user has in his profile. So by looking at these charts, a user can determine the popularity of the music he is listening to, as well as the general music preferences of his peers in his social networking community. In addition to personal music charts, last.fm also generates weekly global charts, which indicate the overall music tastes of the entire last.fm user base.

Compared to commercial music charts, which are based on the number of sales of a particular album, last.fm charts reflect the music people are actually listening to.


So, how good is last.fm?
To answer this question, I set up an account with last.fm (search for user 'zarthos'; that's my profile name on last.fm). I then queued up 1001 tracks by my favorite artists in Winamp. By installing the Winamp plug-in for last.fm, I had Winamp hooked up with Audioscrobbler to record all my musical preferences. I then let Winamp play these tracks for 3 days. In parallel, I also installed the last.fm software on another machine, where I was playing and rating all tracks by the artist Audioslave.

After a few days, I decided to check up on the recommendations last.fm could offer me based on the data I had fed it over the past few days. The results were satisfying for my taste and musical preferences. Some of the artists I was recommended were RHCP, the Eagles, the Rolling Stones and Eric Clapton. What is interesting to note here is that last.fm not only gives me these recommendations, but also gives me the name of the user who made each recommendation. This way I can now find people who have similar music tastes. Sometimes these recommendations are weird, to say the least (I was recommended Slipknot once), but I can then go and look up that person's profile to figure out why he made that recommendation. When you make a recommendation to someone, you also have the option to give an explanation behind it, which can be very useful to people who are exploring new musical avenues.

My next experiment with last.fm will be to feed it with different types of music and see what recommendations it comes up with. I also have yet to explore a lot of its social networking features, which might help me better understand the workings of Audioscrobbler. Come back next week for more on Audioscrobbler and last.fm. Also don’t forget to leave a comment or two on what you think about Audioscrobbler and any ideas as to how I can better explore this recommender system.