The Two Missing Pieces From the Semantic News Puzzle
Two weeks before the Future of News conference, there’s a new reminder of the news industry’s freefall. According to a recently published report, newspaper readership dropped several percentage points over the last year, and Sunday circulation fell even more than the average. That last bit surprised me: I still prefer my Sunday Times à la tree pulp.
Now I’d like to shift attention back to the semantic model of journalism that I’ve been evangelizing to Medill students and faculty for a couple of years. It’s true: in the past, the many loosely connected “linked data” projects had discouraged me for several reasons. However, after last week’s (intermittently disappointing) Web 2.0 conference, where Yahoo’s SearchMonkey was the exceptional highlight, I’m more convinced than ever that the semantic picture is coming into focus.
That brings me to Metaweb’s Freebase project. Thanks to a community of data mobs and a number of open ontological databases such as DBpedia, Freebase now lists over three million subjects, and as more data is triplified, the project’s potential grows exponentially. MQL (the Metaweb Query Language) is easy to pick up after playing around for a bit with the supplied query editor.
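MQL works by query-by-example. You write the JSON shape of the answer you want, leaving `null` where you expect a single value and `[]` where you expect a list, and the service returns that same shape with the blanks filled in. The sketch below does not call any Freebase service. It mimics the idea with a toy in-memory matcher, and the `run_query` helper and sample records are illustrative assumptions, not Metaweb’s engine or data.

```python
import json

# Illustrative stand-ins for Freebase topics (not real Freebase data).
TOPICS = [
    {"type": "/music/artist", "name": "The Police",
     "album": ["Zenyatta Mondatta", "Synchronicity"]},
    {"type": "/music/artist", "name": "Blondie",
     "album": ["Parallel Lines"]},
]

def is_placeholder(value):
    """In MQL, null asks for one value and [] asks for a list of values."""
    return value is None or value == []

def run_query(query, records):
    """Toy query-by-example: match on concrete values, fill in placeholders."""
    results = []
    for record in records:
        # Every non-placeholder value in the query acts as a constraint.
        constraints = {k: v for k, v in query.items() if not is_placeholder(v)}
        if all(record.get(k) == v for k, v in constraints.items()):
            filled = dict(query)
            for key, value in query.items():
                if value is None:
                    filled[key] = record.get(key)
                elif value == []:
                    filled[key] = list(record.get(key, []))
            results.append(filled)
    return results

query = json.loads('{"type": "/music/artist", "name": "The Police", "album": []}')
print(json.dumps(run_query(query, TOPICS), indent=2))
```

Real MQL does much more (nested sub-queries, `limit`, sorting), so treat this only as a picture of the query shape.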
I’ve argued in the past that, in an environment of information abundance, embracing the semantic web would be “good news for good news.” I’ve dissected the issue at a somewhat Cartesian level of atomization in my academic work, and it’s the subject of my ChangeMedill submission:
The next twenty years of professional news reporting will be dominated by the shift toward ontologies. We are now willing to pay professional reporters because we trust them.
That won’t change. Trustworthy reporters of news will be paid by consumers of news, but the “news” will look more and more like large linked data sets. By moving early in this space, Freebase could become both a platform for correcting homophilic news consumption habits and a foundation for a plurality of journalistic research tools.
However, I see two areas where Freebase could improve to prepare itself for a place in the news reporting ecosystem. The first has to do with how data comes into Freebase, and the second has to do with how it goes out.
1. Sources
From the Freebase FAQ: “Because Freebase lets anyone edit the data, there’s always a chance that somebody has—intentionally or unintentionally—introduced a mistake.”
In regard to sources of data, I’m not so worried about malice, or even errors, as I am about the kind of influence gaming that has afflicted Wikipedia (which supplies data for Freebase). Even data can carry biases, if not in content, then in context. What data gets readily published, who is publishing it, and what are their own sources? I’ve equated transparency with trust before, and that certainly applies in the case of Freebase.
With this in mind, the “creator” property could be quite handy, and if school priorities weren’t so demanding, I’d like to take a stab at a web app for retrieving and visualizing creator stats. But an obvious problem crops up: the “user agent” uploading the data isn’t necessarily the individual behind the data. For instance, we can query the history of the Wikipedia bot, but I cannot retrieve data about who edited the underlying Wikipedia content, and when.
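A creator-stats tool could start by requesting `creator` and `timestamp` alongside each topic and then tallying accounts. The sketch below assumes an MQL read shaped like the hypothetical `CREATOR_QUERY` and works on a hand-written result list in which every name and user id is made up. It also illustrates the problem: everything imported by a bot lands on a single account, so the tally says nothing about the people behind the original Wikipedia edits.

```python
from collections import Counter

# Assumed shape of an MQL read asking who created each film topic and when.
# Assumption: "creator" and "timestamp" are readable object properties.
CREATOR_QUERY = [{"type": "/film/film", "name": None,
                  "creator": None, "timestamp": None, "limit": 100}]

# Hypothetical results; every name and user id here is made up.
RESULTS = [
    {"name": "Film A", "creator": "/user/example_wikipedia_bot",
     "timestamp": "2008-09-01T12:00:00Z"},
    {"name": "Film B", "creator": "/user/example_wikipedia_bot",
     "timestamp": "2008-09-01T12:00:05Z"},
    {"name": "Film C", "creator": "/user/example_wikipedia_bot",
     "timestamp": "2008-09-01T12:00:09Z"},
    {"name": "Film D", "creator": "/user/example_volunteer",
     "timestamp": "2008-10-14T08:30:00Z"},
]

def creator_stats(rows):
    """Count how many topics each creator account is credited with."""
    return Counter(row["creator"] for row in rows)

def share_of(rows, creator_id):
    """Fraction of rows credited to one account (0.0 when there are no rows)."""
    if not rows:
        return 0.0
    return creator_stats(rows)[creator_id] / len(rows)

# Print accounts from most to least prolific.
for creator, count in creator_stats(RESULTS).most_common():
    print(f"{creator}: {count}")
```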
The same problem would arise for news organizations submitting new data statements. There’s no such thing as too much information, or too much information about information. I hope I won’t have to start a company called MetaMetaWeb, because trust is crucial at the intersection of professional news reporting and the semantic web:
This trust depends not just on the whats, but on the whoms as well. Freebase could help fix this issue by letting API programs attribute each data item to an individual creator, submitted as a type/user/sub-user object.
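The sub-user idea can be sketched as a data structure. Nothing here is part of Freebase’s API: `SubUser`, `AttributedStatement`, and every id below are hypothetical names for illustration. The point is that each statement carries both the agent that submitted it (say, a newsroom’s import bot) and the individual contributor behind it, so attribution survives a round trip through JSON.

```python
import json
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class SubUser:
    """Hypothetical attribution: who actually stands behind a statement."""
    agent: str        # account that calls the API, e.g. an import bot
    contributor: str  # individual who authored the underlying fact
    source_url: str   # where the contributor's claim can be checked

@dataclass(frozen=True)
class AttributedStatement:
    """A subject/predicate/value triple plus its hypothetical attribution."""
    subject: str
    predicate: str
    value: str
    attribution: SubUser

    def to_json(self) -> str:
        # asdict() recurses into the nested SubUser dataclass.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "AttributedStatement":
        data = json.loads(text)
        data["attribution"] = SubUser(**data["attribution"])
        return cls(**data)

stmt = AttributedStatement(
    subject="/hypothetical/city_council_vote",
    predicate="/hypothetical/vote/outcome",
    value="passed",
    attribution=SubUser(
        agent="/user/hypothetical_newsroom_bot",
        contributor="/user/hypothetical_reporter",
        source_url="http://example.com/hypothetical-story",
    ),
)
print(stmt.to_json())
```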
2. Natural Language Output
There are plenty of services that can process natural language into semantic concepts. But how about spitting it back out, as well as sucking it in? This issue may be less directly related to news reporting, but I’m sure API developers would generally benefit from a library that converts JSON results into strings of human-readable text. In fact, this would be another fun project to work on, if it weren’t for my packed schedule of school-related commitments.
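The simplest version of such a library maps each type to a sentence template and renders list values as natural-language lists. This is a minimal sketch under that assumption; the `TEMPLATES` table, `to_sentence`, and `join_list` are hypothetical, and the result record is an illustrative stand-in for a query result.

```python
import json

# Hypothetical sentence templates keyed by type id. String templates ignore
# plurals, tense, and articles; a real library would need grammar rules.
TEMPLATES = {
    "/music/artist": "{name} is a musical artist whose albums include {album}.",
}

def join_list(items):
    """Render ['a', 'b', 'c'] as 'a, b and c'."""
    items = [str(item) for item in items]
    if not items:
        return ""
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + " and " + items[-1]

def to_sentence(result):
    """Turn one JSON result into a sentence, falling back to key: value pairs."""
    fields = {k: join_list(v) if isinstance(v, list) else v
              for k, v in result.items()}
    template = TEMPLATES.get(result.get("type"))
    if template is None:
        return "; ".join(f"{k}: {v}" for k, v in fields.items())
    return template.format(**fields)

result = json.loads('{"type": "/music/artist", "name": "Blondie", '
                    '"album": ["Parallel Lines", "Eat to the Beat"]}')
print(to_sentence(result))
```

For the record above this prints "Blondie is a musical artist whose albums include Parallel Lines and Eat to the Beat." Unknown types fall back to a plain listing rather than failing.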
We’re still early in the race, but for now Google has a head start in semantic news. Just look at their new Quote Finder.
Still, if Metaweb isn’t the most exciting all-in-one semantic solution out there today, that would be news to me.
(/me slaps forehead)
