Text-mining vs./and/or Linked Data?

Reading Jonathan Rochkind’s musings on using Wikipedia as an authority file (something I’m all for), I was struck by his comment that

I think wikipedia-miner, by applying statistical analysis text-mining ‘best guess’ type techniques, provides more relationships than dbpedia alone does. I know that wikipedia-miner’s XML interface is more comprehensible and easily usable by me than dbpedia’s (sorry linked data folks).

XML-over-REST vs. SPARQL debates aside, I think there is an interesting issue here regarding the kind of relationships that statistical text-mining produces vs. the kind typically found in Linked Data. Linked Data favors “factoids” like date-and-place-of-birth, while statistical text-mining produces (at least in this case) distributions interpretable as “relationship strength”. The wikipedia-miner results aren’t “facts” in any normal sense, but as Rochkind suggests they may be more useful. Now sure, you could represent the wikipedia-miner results as Linked Data, but what I’m trying to get at here isn’t a question of data models or syntax. It’s about how and when we choose to treat the patterns in our data as facts, and when we are content to treat them as patterns. Thoughts?
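To make the distinction a bit more concrete, here is a rough Python sketch of the two kinds of relationship and of what "treating a pattern as a fact" could look like. Everything in it is an assumption for illustration: the `Fact` and `Relatedness` types, the `promote_to_facts` helper, the `relatedTo` predicate, and the strength scores are all made up, and none of it reflects the actual wikipedia-miner or dbpedia interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """A Linked Data style 'factoid': it is either asserted or it is not."""
    subject: str
    predicate: str
    obj: str


@dataclass(frozen=True)
class Relatedness:
    """A text-mining style result: a pair of topics plus a strength score."""
    source: str
    target: str
    strength: float  # hypothetical score between 0.0 and 1.0


def promote_to_facts(patterns, threshold):
    """Assert a (hypothetical) 'relatedTo' fact for each pattern at or above the threshold."""
    return [
        Fact(p.source, "relatedTo", p.target)
        for p in patterns
        if p.strength >= threshold
    ]


# A factoid of the date-and-place-of-birth kind.
birth = Fact("Ada Lovelace", "birthPlace", "London")

# Invented relatedness scores, for illustration only.
patterns = [
    Relatedness("Ada Lovelace", "Charles Babbage", 0.82),
    Relatedness("Ada Lovelace", "Analytical Engine", 0.74),
    Relatedness("Ada Lovelace", "Lord Byron", 0.41),
]

# The same patterns yield different sets of "facts" depending on the cutoff.
loose = promote_to_facts(patterns, threshold=0.5)
strict = promote_to_facts(patterns, threshold=0.8)
```

The threshold is the interesting part: the data does not change between `loose` and `strict`, only our decision about when a pattern is strong enough to assert. Keeping the `Relatedness` records around instead leaves that decision to whoever consumes the data.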

Digging into Money

er… Data. The LOD-LAM Summit will be a great place to explore ideas and collaborations, but we’ll also be looking at concrete and actionable ways to move this field forward in the year to come. I can’t think of a better venue for teams to form up or solidify plans for the International Digging Into Data Challenge, and I’m hoping there will be some strong Linked Data applicants this year. If you’re not already working on something, bring your project ideas in search of partners. Of course, there’s a pretty quick turnaround time for getting proposals in by June 16th, but hey, what’s summer without cramming for at least one grant proposal?

Smithsonian and Powerhouse doing some linking up

Great to hear that Luke Dearnley, web manager at the Powerhouse Museum (Sydney, Australia), has been invited to join the LOD-LAM Summit! Luke and Dan Collins (IT Manager, Powerhouse Museum) visited the Smithsonian on their way home from Museums and the Web. We had a great conversation around a lot of topics, and I’m really excited that the conversation can continue with Luke in San Francisco.

From left: Luke, Dan Collins (IT Manager, Powerhouse Museum), Suzanne Pilsk (Smithsonian Libraries), Günter Waibel (Smithsonian, Office of the Chief Information Officer), Thorny Staples (Smithsonian, Office of the Chief Information Officer). As usual, I’m reflected in the window taking the picture!