Friday, April 15, 2011

Lexalytics Analyzes Wikipedia to Understand How Humans Think

Academics may frown on citations from Wikipedia because of its social media origins, but for text search and sentiment analysis firm Lexalytics, the sprawling community-created encyclopedia was the ideal reference for teaching software to understand the world.

At its user conference this week in New York, Lexalytics announced that the Salience 5.0 release of its software, due out this summer, can better understand concepts and the relationships between concepts, thanks to a careful reading of the entire content of Wikipedia. Because of the open nature of the Web encyclopedia, Lexalytics was free to index it as a source. A note in the press release cautions that no endorsement by the Wikimedia Foundation is implied.

"Wikipedia represents a very, very large corpus and, above all, it is human edited, which means that it shows how humans think about information," said CEO Jeff Catlin. "We used it as a source for how people think about the organization of information and for insight into how bits of information are linked to each other."

Lexalytics is best known for technology that produces automated summaries of documents, and for sentiment analysis capabilities that can be used for social media monitoring. Catlin said its technology is used "behind the scenes" by companies like Radian6 (recently acquired by Salesforce.com) and is also licensed directly by some websites, such as TripAdvisor. But the core Lexalytics technology is general purpose, much like a search engine that can be tailored to search specialized types of content.

The "concept matrix" Lexalytics created from its analysis of Wikipedia can factor into improved sentiment analysis, but it is broader than that, Catlin said. In some respects, the effort was closest to the work that went into creating Watson, IBM's Jeopardy-playing computer, which also had to be fed large amounts of news articles and reference sources. One thing the Watson team had in its favor was that answering trivia questions is a very precise task, based on the kind of narrow detail that computers are good at handling. "So if they can understand the question, there is a good chance that they will have the correct answer," he said.

Just as the process of building Watson's knowledge base began well before Alex Trebek took the stage, compiling the Lexalytics concept matrix was a distributed computing job run across many servers, many of them purchased through Amazon's cloud services. "Essentially we boiled the ocean, so it required a lot of equipment behind the scenes and a lot of compute time on Amazon," he said. But by the end of the process, his team had boiled it down to a summary of concepts that fits on a laptop or a modest-size server.

The result is a piece of software that "understands that a rose and a daisy are both flowers," which until now has been a really difficult thing to model, Catlin said. If someone writes that a device runs for three days without a recharge, the system can work out that "runs for three days without a recharge" is about the battery, even if the word "battery" was never mentioned. With this technology, a marketing application could read hundreds of news articles about a company to see how many of the key messages from its most recent press release made their way into the coverage, even if each of the authors used different words and phrases to tell the story.
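
To make the rose-and-daisy and battery examples concrete, here is a minimal Python sketch of a concept lookup. It is only a toy illustration: the `CONCEPT_TERMS` table and the `concepts_for` function are hypothetical names with hand-written entries, standing in for a concept matrix that Lexalytics says it derived from Wikipedia. Nothing here reflects how Salience actually works.

```python
import re

# Hypothetical, hand-written associations standing in for a concept matrix.
# The real matrix is described as derived from Wikipedia; these entries are invented.
CONCEPT_TERMS = {
    "flower": {"rose", "daisy", "tulip"},
    "battery": {"recharge", "charge", "charger"},
}

def concepts_for(text):
    """Return the sorted concepts whose associated terms appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(
        concept for concept, terms in CONCEPT_TERMS.items() if words & terms
    )

print(concepts_for("This device runs for three days without a recharge"))
print(concepts_for("She picked a rose and a daisy"))
```

The first call matches the "battery" concept through the word "recharge" even though "battery" never appears, and the second maps both flower names to the single concept "flower".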

Sentiment analysis is a relatively mature branch of text analytics, but automated systems still get confused by things like sarcasm and double meanings. One improvement Lexalytics is making in this next version of its software is a filter for subjective versus objective statements, or second-hand versus first-hand knowledge. For example, "I heard this movie was great" (a comment from someone who has not actually seen the film) could be marked differently from "This movie was great!" even though both express positive sentiment.
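
The movie example can be sketched as a simple cue-phrase check. This is a rough Python illustration under loud assumptions: `HEARSAY_CUES` and `knowledge_source` are hypothetical names, the cue list is invented, and a real system would rely on far richer signals than substring matching.

```python
# Hypothetical cue phrases that suggest the writer is repeating someone else's view.
HEARSAY_CUES = ("i heard", "i've heard", "people say", "apparently", "supposedly")

def knowledge_source(comment):
    """Label a comment as second-hand if it contains a hearsay cue, else first-hand."""
    text = comment.lower()
    if any(cue in text for cue in HEARSAY_CUES):
        return "second-hand"
    return "first-hand"

for comment in ("I heard this movie was great", "This movie was great!"):
    print(comment, "->", knowledge_source(comment))
```

Both comments are positive, but only the first is labeled second-hand, so an application could weight or report the two separately.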

The Wikipedia concept matrix "is just one more piece that we use to try to crank up the accuracy of these things, and it's wonderful, because it is very good for general knowledge and gives us a broad and deep view of the world," Catlin said.


View the original article here

