Friday, 5 November 2010

Semantic ambiguity and how human communication fails - except by accident

It's been a busy week on the topicmapmail list, where a thread of more than 50 messages developed, starting off as an announcement of the Afghanistan War Diary as a topic map in Maiana (made from WikiLeaks data).

The discussion went off in several directions, and spun into a discussion of typing and the inherent messiness of trying to model the world.

We're kind of back at the start: in a messy, semi-structured world with information overflow, what kind of technology can help us find a way?

Steve Pepper:
The categories of human knowledge are better expressed using a prototype model than a criterial-attribute (or "Aristotelian") model. In such a model, roles and types are not sharply differentiated, but rather exist on a continuum.
Alexander Johannesen questioned the basis of linked data and subject identification:
So the question becomes; can we still rely on our TM way of subject identification? I'm not so sure. Things change. And here's the catch; the more you describe that thing, the more you try to pin it down its definition, the less likely it is for that thing to fit whatever thing you need in what you're modelling. And the less likely it is that that model truly represents reality, so there's a whole scale of inherit dis-ambiguity that you need to have in mind when you knowingly have to make a million compromises while modelling.
To what degree do we need things to be correct vs. useful? And, in the end, is it useful that things aren't correct?
Andrew S. Townley had thoughts on scalability and how this can work outside a controlled environment:
Again, once you start trying to correlate statements about things made by millions of people each with thousands of overlapping but inconsistent assumptions, this stuff matters.  In a controlled environment or walled garden, you have a lot more leeway with "useful", but I don't think that's good enough in today's world with over 1 billion addressable pages added to the Web every day.
I think it's important to (continue to) talk about these issues now while there's still a chance of influencing how people try to deal with a world with that much data.  The less retrofitting and rectifying that needs to be done, the easier it will make things for everyone.  Most of the people churning out all that content have no idea these problems exist.  After all, they have Google and the magic search box.  All they need is just a little bit more link juice and social proof... ;)
Patrick Durusau followed up with a blogpost about Semantic ambiguity:
Since we are trying to communicate with other people, there isn’t any escape from semantic ambiguity. Ever.
It all led me back to Wiio's laws, which I have revisited many times before. So here's some Friday edutainment for those of you who haven't read Wiio's laws on

How all human communication fails, except by accident:
  1. Communication usually fails, except by accident.
  2. If a message can be interpreted in several ways, it will be interpreted in a manner that maximizes the damage.
  3. There is always someone who knows better than you what you meant with your message.
  4. The more we communicate, the worse communication succeeds.
I see the laws both as a serious warning about how messy human communication is, and as black humor for people trying to do this for a living.

Ironically, most of Wiio's work, and most of the information about it, is in Finnish, which I think most people on this planet don't understand very well...

Jukka Korpela has, however, written an excellent commentary on Wiio's laws:

Professor Osmo A. Wiio is a Finnish researcher of human communication. He studied, among other things, readability of texts, organizations and communication within them, and the general theory of communication. 

In addition to his academic career, he has authored books, articles, and radio and TV programs on technology, the future, society, and politics. He formulated "Wiio's laws" when he was a member of the Finnish parliament.

Monday, 13 September 2010

Food traceability system using RFID and Topic Maps


My Google alert found me an article about a food traceability system combining RFID and Topic Maps.

The system is for Spanish ham from the Teruel province (which is supposed to be excellent, and has a Denomination of Origin status):

Free Traceability Management Using RFID and Topic Maps

The article is from ECIME 2010 (the 4th European Conference on Information Management and Evaluation), but I have not found any info besides the conference program and the abstract.

According to the conference website "The proceedings of the above conference are now available to purchase in CD-ROM format only".

However interesting it may be, I'm not that keen on spending £50 to get a CD-ROM in the mail.

Open Access publishing is the way, that's for sure...

Friday, 27 March 2009

Wikipedia as a good PSI-source?

I stumbled upon an interesting discussion in the blogpost Wikipedia - A Democratic Gold Standard for Topic Maps, where Vegard Sandvold suggests that the Topic Maps community "should adopt Wikipedia as it’s democratic and user-generated repository of topic PSI’s". (Lars Marius Garshol wrote a good blogpost about the general idea behind PSIs.)

Steve Pepper disagrees, and argues that ideally the PSD (Published Subject Descriptor) should include the minimum of information needed to unambiguously identify the subject.

Robert Engels then enters the discussion and argues from the RDF point of view.

My view is that I would currently use Wikipedia, because on some subjects it's the best source I've got. I agree with Steve Pepper, but imagine that it could be useful in some contexts to be a bit fuzzy on purpose. A widely defined and slightly fuzzy subject might be exactly what we want, to be able to "start a conversation".
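As background to this discussion: in Topic Maps, topics that share a subject identifier are taken to represent the same subject and are merged. Below is a minimal sketch of that idea in Python. The topic data, the dictionary layout and the function name `merge_by_identifier` are made up for illustration, and the Wikipedia URLs merely stand in for PSIs; this is not the API of any Topic Maps engine.

```python
def merge_by_identifier(topics):
    """Merge topics that share at least one subject identifier.

    Each topic is a dict with two sets: "identifiers" and "names".
    This layout is a hypothetical simplification, not a real engine's model.
    """
    merged = []
    for topic in topics:
        new = {
            "identifiers": set(topic["identifiers"]),
            "names": set(topic["names"]),
        }
        keep = []
        for existing in merged:
            if existing["identifiers"] & new["identifiers"]:
                # Same subject: fold the existing topic into the new one.
                new["identifiers"] |= existing["identifiers"]
                new["names"] |= existing["names"]
            else:
                keep.append(existing)
        keep.append(new)
        merged = keep
    return merged


# Illustrative data: two topics from different sources use the same
# Wikipedia URL as their identifier, a third topic uses another one.
topics = [
    {"identifiers": {"http://en.wikipedia.org/wiki/Oslo"}, "names": {"Oslo"}},
    {"identifiers": {"http://en.wikipedia.org/wiki/Oslo"}, "names": {"Christiania"}},
    {"identifiers": {"http://en.wikipedia.org/wiki/Bergen"}, "names": {"Bergen"}},
]

result = merge_by_identifier(topics)
for topic in result:
    print(sorted(topic["names"]), sorted(topic["identifiers"]))
```

The point of the sketch is that the merge depends only on the identifier, not on how much the page behind it says about the subject, which is why the choice of PSI source matters more than the names attached to each topic.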

Friday, 13 March 2009

Exploring Semantic Mashups in the Wandora Workshop at Topic Maps Norway 2009

I really look forward to the Wandora workshop at Topic Maps Norway 2009 / Emnekart 2009 on March 18, as I have wanted for some time to play a bit with Wandora.

Wandora is an Open Source Java application made mainly for building and managing topic maps, but I think of it as a more general semantic toolbox, and think that exploring Wandora as a semantic extraction tool will be fun.

Wandora has a graphical user interface and several data storage options. Wandora both reads and exports the Topic Maps formats LTM and XTM along with the N3 RDF-format, which should make it a very useful toolbox.

The workshop will explore Wandora as a tool for extracting information from open web sources using some of the many built-in extractors to generate topic maps. It will demonstrate how to use Wandora to do semantic mashups. This is a hands-on workshop, which I imagine should be interesting to TM developers, Semantic Web developers and developers who know web 2.0 style mash-ups.

I have a dream of one day converting my well-tagged mp3 collection to a topic map, mashing it up with open music information, and exploring the exciting new possibilities for navigation and search, which would make iTunes look rather dull.

The workshop will focus on a few of the many interesting Wandora extractors to generate and merge topic maps. The list of available Wandora extractors is impressive, and keeps on growing with every new release:

  • MP3 ID3 metadata
  • JPEG metadata
  • PDF metadata
  • FreeDB (music CD metadata)
  • Last.fm XML feeds
  • Internet Movie Database datafiles
  • Converts and imports any SQL database to a topic map
  • BibTeX
  • Flickr
  • YouTube
  • Digg and Del.icio.us
  • Geonames
  • Wikipedia extractor and a more general MediaWiki extractor
  • Wordnet
  • OpenCalais classifier
  • OpenCyc extractor
  • RSS 2.0 and Atom feeds
  • Convert emails and email repositories to a topic map
  • Convert file system structures to a topic map
  • Microformat extractors:
    • Convert geo microformat snippets to topic maps
    • Convert hcalendar microformat snippets to topic maps
    • Convert hcard microformat snippets to topic maps

A Vision for a Topic Maps World

Graham Moore is giving a presentation at Topic Maps Norway 2009 / Emnekart 2009 next week, which is not to be missed:

A Vision for a Topic Maps World

Graham Moore, NetworkedPlanet

Topic Maps has been successful in delivering value in the context of content management, intranets and web publishing. In these contexts it has provided value in terms of improved navigation and findability of content. However, the scope of these projects has been limited, and it could be argued that Topic Maps has simply created better managed, and more useful silos of content. This talk presents a vision and concept for enabling Topic Maps in a global context.

We describe how the fundamental concept of Topic Maps, the separation of identity from addressing, can be taken and utilised in a global scale. This vision includes how people, who have invested in Topic Maps in the small, can contribute and benefit from this step change in the scope of Topic Maps usage.

Saturday, 17 May 2008

Published Subjects and global identifiers

The Norwegian Computer Society (Dataforeningen) is holding a meeting about Published Subjects and global identifiers on Tuesday, May 27th, from 16:00 to 18:00. The program is quite exciting, with four lightning talks, but the presentations are planned to be in Norwegian. (We could probably reconsider this and speak English if somebody who doesn't understand Norwegian would like to join us.)

The meeting will be held in the electronic classroom at the University Library in Oslo.

Steve Pepper (Ontopedia) starts with a quick introduction to the need for shared global identifiers and an introduction to Published Subjects, where he also explains the terminology (PSI, PSD, ...).

Are Gulbrandsen (USIT) presents known published PSI sets and a few unresolved publication and discovery issues. He also discusses potential sources of PSIs (for instance GREP, ISBN, Wikipedia, LinkedIn and existing thesauruses).

Alexander Johannesen (Bekk) continues where he left off at Topic Maps 2008, Visions for a Topic Mapped Library, and wants to discuss the use of PSIs from a library perspective. He has also promised to give us a quick overview of what they have managed to do at The New Zealand Electronic Text Centre (NZETC), Victoria University of Wellington. (NZETC got the Topic Maps Project of the Year 2008 award.)

Stian Danenbarger (Bouvet) also continues from his Topic Maps 2008 presentation: Published Subjects: Small Pieces, Meaningfully Joined, and wants to focus on how we can add context to the discovery of PSI sets. - Who is using a PSI set, and how is it used?

More info in Norwegian

Tuesday, 6 May 2008

Amazon.com recommends Lutz Maicher and American football

I got a recommendation from Amazon.com today that at first left me a bit puzzled. I think it's a good example of the homonym problem and some of the challenges Topic Maps and semantic search try to solve.


Dear Amazon.com Customer,

We've noticed that customers who have purchased or rated books by Lutz Maicher have also purchased Ohio State University Football Vault by Jack Park. For this reason, you might like to know that Ohio State University Football Vault will be released on May 20, 2008. You can pre-order yours by following the link below.

  Ohio State University Football Vault
  Jack Park

   Price: $49.95

   Release Date: May 20, 2008

Product Description
In the Ohio State University® Football Vault™: The History of the Buckeyes®, Ohio State Football Radio Network commentator and football speaker Jack Park takes you on a memorable journey through more than 100 years of Buckeye football. The detailed scrapbook narrative contains never-before-published vintage photographs, artwork and memorabilia drawn from OSU's extensive campus archives. Tucked into dozens of sleeves and pockets, fans will find reproductions of old game programs, historic tickets, bumper stickers and more. These fascinating replicas include a formation diagram for the band's famous Script Ohio, a letter from President Gerald Ford to Woody Hayes and those classic Buckeye helmet stickers. No Ohio State fan should be without this home archive of OSU's long and illustrious history. Illustrated; Hardcover; 144 Pages.




It's correct that I like to read books by Lutz Maicher. I would also be interested if Jack Park published a book. I am however not interested in mixing in what the other Jack Parks have written.

I have always thought very highly of Amazon's system, and sometimes their recommendations work incredibly well. How could they be so wrong this time?

It seems like a deeper problem. Amazon doesn't appear to have any concept of the author apart from a text string. There doesn't seem to be any unique id. And it gets worse...

Clicking the author-link on a book-page (or the corresponding links on a record- or DVD-page) must be one of the most used navigational aids in Amazon:

 

These were the top ranked hits when clicking Jack Park on a book page:

  1. XML Topic Maps: Creating and Using Topic Maps for the Web by Jack Park and Sam Hunting (Paperback - Jul 26, 2002)

  2. Watchmen on the Wall by Miriam Rodyn Park, Jack W. Hayford, and Robert Stearns (Spiral-bound - Aug 1, 2007)

  3. The Official Ohio State Football Encyclopedia by Jack Park (Hardcover - April 20, 2003)

  4. Home Wind Power by Jack Park (Paperback - Jun 1981)

  5. Sport and Exercise Science: ESSAYS IN THE HISTORY OF SPORTS MEDICINE (Sport and Society) by Jack W Berryman and Roberta J Park (Hardcover - Jul 1, 1992)

  6. Wind Power Book by Jack Park (Hardcover - Jun 1982)

  7. The Ohio State Football Encyclopedia by Jack L Park (Hardcover - Oct 8, 2001)

  8. 52 Fishing Hotspots: A Guide to Angling Every Week of the Year: Compiled By the Editors of Western Outdoors by Kevin Dawson, Terry Rudnick, Dave Hughes, and Mike Sawyers (Paperback - 1985)

  9. Hunter by George Dickerson, Tony Epper, Sonny Gibson, and Harriet Medin (Video Download - Feb 5, 2008)

  10. The Beats : An Anthology of 'Beat' Writing by Park Honan, Allen Ginsberg, Gregory Corso, and Lawrence Ferlinghetti (Paperback - 1987)

  11. Charting the Topic Maps Research and Applications Landscape: First International Workshop on Topic Map Research and Applications, TMRA 2005, Leipzig, Germany, ... Papers (Lecture Notes in Computer Science) by Lutz Maicher and Jack Park (Paperback - April 11, 2006)


Hit number 2 doesn't even match the string "Jack Park", but Amazon is returning this book as the second most relevant hit because of the authors Miriam Rodyn Park and Jack W. Hayford.

I didn't understand why a search for the author "Jack Park" would return book 8, until I had a closer look, and saw that the book had around 40 authors and other contributors, a couple named Jack and one named Park.

As for why Amazon returned the video download of season 1 of Hunter, I'm still clueless...

After this I'm not surprised that The Beats : An Anthology of 'Beat' Writing is on the list. Park Honan is the editor, and Jack Kerouac is one of the authors.
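One guess at what could produce hits like these is an author search that pools the name tokens of all contributors to a book and only checks that each query word occurs somewhere among them. The sketch below illustrates that guess; it is an assumption for illustration, not a description of how Amazon's search actually works. The function names are made up, and the contributor lists are shortened from the hit list above.

```python
def tokens(name):
    """Lowercased name tokens, with full stops stripped from initials."""
    return set(name.lower().replace(".", "").split())


def loose_match(query, contributors):
    """True if every query token occurs somewhere among ALL contributors' names."""
    pooled = set()
    for name in contributors:
        pooled |= tokens(name)
    return tokens(query) <= pooled


def strict_match(query, contributors):
    """True only if a single contributor's full name equals the query."""
    return any(name.lower() == query.lower() for name in contributors)


# Contributors taken (and shortened) from the hit list above.
books = {
    "XML Topic Maps": ["Jack Park", "Sam Hunting"],
    "Watchmen on the Wall": ["Miriam Rodyn Park", "Jack W. Hayford", "Robert Stearns"],
    "The Beats": ["Park Honan", "Allen Ginsberg", "Jack Kerouac"],
    "Hunter": ["George Dickerson", "Tony Epper", "Sonny Gibson", "Harriet Medin"],
}

loose_hits = [title for title, people in books.items() if loose_match("Jack Park", people)]
strict_hits = [title for title, people in books.items() if strict_match("Jack Park", people)]

print("loose: ", loose_hits)
print("strict:", strict_hits)
```

With pooled tokens, Watchmen on the Wall and The Beats match "Jack Park" even though nobody by that name contributed to them, while the strict comparison only keeps XML Topic Maps. Neither variant explains Hunter, and the strict one would also miss "Jack L Park", so exact string comparison is no real fix either.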

Now let's go back to the beginning. Amazon recommended Ohio State University Football Vault because I have given Lutz Maicher's book a high rating.

Amazon claims to have "noticed that customers who have purchased or rated books by Lutz Maicher have also purchased Ohio State University Football Vault by Jack Park". 

- Is this bullshit, or has one of you Topic Mappers out there actually got both the TMRA 2005 proceedings and the Ohio State University Football Vault?

The source of my Amazon recommendation is book number 11, and the recommended book doesn't even appear on the list. I wonder when I will get recommendations for all of the other more relevant books on the list.
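The homonym problem behind the recommendation can be sketched in a few lines of Python. The example assumes a tiny catalogue in which every book carries both the author's name as a string and an author identifier. The identifiers are invented for illustration (nothing suggests Amazon has such ids), and the titles are shortened from the ones mentioned above.

```python
from collections import defaultdict

# Hypothetical catalogue rows: (title, author name string, made-up author id).
catalogue = [
    ("Charting the Topic Maps Research and Applications Landscape",
     "Jack Park", "author:jack-park-topicmaps"),
    ("XML Topic Maps",
     "Jack Park", "author:jack-park-topicmaps"),
    ("Ohio State University Football Vault",
     "Jack Park", "author:jack-park-football"),
]


def group_titles(key_index):
    """Group titles by the chosen column: 1 = name string, 2 = author id."""
    groups = defaultdict(list)
    for row in catalogue:
        groups[row[key_index]].append(row[0])
    return dict(groups)


by_name = group_titles(1)  # identity is just the text string
by_id = group_titles(2)    # identity is an explicit identifier

print(by_name["Jack Park"])
print(by_id["author:jack-park-topicmaps"])
```

Grouped by the name string, all three books end up under one "Jack Park", so a reader of the TMRA proceedings looks like a potential football fan. Grouped by identifier, the football book stays with its own author.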