Friday, 27 March 2009

Wikipedia as a good PSI-source?

I stumbled upon an interesting discussion in the blogpost Wikipedia - A Democratic Gold Standard for Topic Maps, where Vegard Sandvold suggests that the Topic Maps community "should adopt Wikipedia as it’s democratic and user-generated repository of topic PSI’s". (Lars Marius Garshol wrote a good blogpost about the general idea behind PSIs.)

Steve Pepper disagrees, and argues that ideally the PSD (Published Subject Descriptor) should include the minimum of information needed to unambiguously identify the subject.

Robert Engels then enters the discussion and argues from the RDF point of view.

My view is that I would currently use Wikipedia, because on some subjects it's the best source I have. I agree with Steve Pepper, but imagine that it could be useful in some contexts to be a bit fuzzy on purpose. A widely defined and slightly fuzzy subject might be exactly what we want, to be able to "start a conversation".
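To make the idea a bit more concrete, here is a minimal Python sketch of what using Wikipedia URLs as subject identifiers could buy us: topics from different sources that share an identifier get merged, while topics that merely share a name do not. The `Topic` class, the `merge_topics` function and the example topics are my own illustrative assumptions, not part of any Topic Maps engine.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    # Hypothetical, stripped-down topic: just names and subject identifiers.
    names: set = field(default_factory=set)
    subject_identifiers: set = field(default_factory=set)


def merge_topics(topics):
    """Merge topics that share at least one subject identifier."""
    merged = []
    for topic in topics:
        combined = Topic(set(topic.names), set(topic.subject_identifiers))
        remaining = []
        for existing in merged:
            if existing.subject_identifiers & combined.subject_identifiers:
                # Same subject: fold the existing topic into the combined one.
                combined.names |= existing.names
                combined.subject_identifiers |= existing.subject_identifiers
            else:
                remaining.append(existing)
        merged = remaining + [combined]
    return merged


# Illustrative data: two sources describe the Norwegian city, a third
# describes Mons, which is also called "Bergen" on Belgian road signs.
topics = [
    Topic({"Bergen"}, {"http://en.wikipedia.org/wiki/Bergen"}),
    Topic({"Bergen, Norway"}, {"http://en.wikipedia.org/wiki/Bergen"}),
    Topic({"Mons", "Bergen"}, {"http://en.wikipedia.org/wiki/Mons"}),
]
result = merge_topics(topics)
```

Here `result` holds two topics: the name "Bergen" alone does not cause a merge, only the shared identifier does.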

Friday, 13 March 2009

Exploring Semantic Mashups in the Wandora Workshop at Topic Maps Norway 2009

I really look forward to the Wandora workshop at Topic Maps Norway 2009 / Emnekart 2009 on March 18, as I have wanted for some time to play a bit with Wandora.

Wandora is an Open Source Java application made mainly for building and managing topic maps, but I think of it as a more general semantic toolbox, and think that exploring Wandora as a semantic extraction tool will be fun.

Wandora has a graphical user interface and several data storage options. Wandora both reads and exports the Topic Maps formats LTM and XTM along with the N3 RDF-format, which should make it a very useful toolbox.

The workshop will explore Wandora as a tool for extracting information from open web sources using some of the many built-in extractors to generate topic maps. It will demonstrate how to use Wandora to do semantic mashups. This is a hands-on workshop, which I imagine should be interesting to TM developers, Semantic Web developers and developers who know web 2.0 style mash-ups.

I have a dream of one day converting my well-tagged mp3-collection to a topic map, mashing it up with open music information, and exploring the new exciting possibilities for navigation and search, which would make iTunes look rather dull.

The workshop will focus on a few of the many interesting Wandora extractors to generate and merge topic maps. The list of available Wandora extractors is impressive, and keeps on growing with every new release:

  • MP3 ID3 metadata
  • JPEG metadata
  • PDF metadata
  • FreeDB (music CD metadata)
  • Last.fm XML feeds
  • Internet Movie Database datafiles
  • Converts and imports any SQL database to a topic map
  • BibTeX
  • Flickr
  • YouTube
  • Digg and Del.icio.us
  • Geonames
  • Wikipedia extractor and a more general MediaWiki extractor
  • Wordnet
  • OpenCalais classifier
  • OpenCyc extractor
  • RSS 2.0 and Atom feeds
  • Convert emails and email repositories to a topic map
  • Convert file system structures to a topic map
  • Microformat extractors:
    • Convert geo microformat snippets to topic maps
    • Convert hcalendar microformat snippets to topic maps
    • Convert hcard microformat snippets to topic maps

A Vision for a Topic Maps World

Graham Moore is giving a presentation at Topic Maps Norway 2009 / Emnekart 2009 next week, which is not to be missed:

A Vision for a Topic Maps World

Graham Moore, NetworkedPlanet

Topic Maps has been successful in delivering value in the context of content management, intranets and web publishing. In these contexts it has provided value in terms of improved navigation and findability of content. However, the scope of these projects has been limited, and it could be argued that Topic Maps has simply created better managed, and more useful silos of content. This talk presents a vision and concept for enabling Topic Maps in a global context.

We describe how the fundamental concept of Topic Maps, the separation of identity from addressing, can be taken and utilised in a global scale. This vision includes how people, who have invested in Topic Maps in the small, can contribute and benefit from this step change in the scope of Topic Maps usage.

Saturday, 17 May 2008

Published Subjects and global identifiers

Dataforeningen arrangerer et møte om Published Subjects og globale identifikatorer i universitetsbiblioteket tirsdag 27. mai kl 16-18.

English translation:

The Norwegian Computer Society is planning a meeting about Published Subjects and global identifiers from 16:00 to 18:00 on Tuesday, May 27th. The program is quite exciting with four lightning talks, but the presentations are planned to be in Norwegian. (We would probably be able to reconsider this and speak English if somebody who does not understand Norwegian would like to join us.)

The meeting will be held in the electronic classroom at the University Library in Oslo.

Steve Pepper (Ontopedia) starts with a quick introduction of the need for shared global identifiers and an introduction to Published Subjects, where he also explains the terminology (PSI, PSD, ...).

Are Gulbrandsen (USIT) presents known published PSI sets and a few unresolved publication and discovery issues. He also discusses potential sources of PSIs (for instance GREP, ISBN, Wikipedia, LinkedIn and existing thesauruses).

Alexander Johannesen (Bekk) continues where he left off at Topic Maps 2008, Visions for a Topic Mapped Library, and wants to discuss the use of PSIs from a library perspective. (He has also promised to give us a quick overview of what they have managed to do at The New Zealand Electronic Text Centre (NZETC), Victoria University of Wellington. NZETC got the Topic Maps Project of the Year 2008 award.)

Stian Danenbarger (Bouvet) also continues from his Topic Maps 2008 presentation: Published Subjects: Small Pieces, Meaningfully Joined, and wants to focus on how we can add context to the discovery of PSI sets. - Who is using a PSI set, and how is it used?

More info in Norwegian

Tuesday, 6 May 2008

Amazon.com recommends Lutz Maicher and American football

I got a recommendation from Amazon.com today that at first left me a bit puzzled. I think it's a good example of the homonym problem and some of the challenges Topic Maps and semantic search try to solve.


Dear Amazon.com Customer,

We've noticed that customers who have purchased or rated books by Lutz Maicher have also purchased Ohio State University Football Vault by Jack Park. For this reason, you might like to know that Ohio State University Football Vault will be released on May 20, 2008. You can pre-order yours by following the link below.

  Ohio State University Football Vault
  Jack Park

   Price: $49.95

   Release Date: May 20, 2008

Product Description
In the Ohio State University® Football Vault™: The History of the Buckeyes®, Ohio State Football Radio Network commentator and football speaker Jack Park takes you on a memorable journey through more than 100 years of Buckeye football. The detailed scrapbook narrative contains never-before-published vintage photographs, artwork and memorabilia drawn from OSU's extensive campus archives. Tucked into dozens of sleeves and pockets, fans will find reproductions of old game programs, historic tickets, bumper stickers and more. These fascinating replicas include a formation diagram for the band's famous Script Ohio, a letter from President Gerald Ford to Woody Hayes and those classic Buckeye helmet stickers. No Ohio State fan should be without this home archive of OSU's long and illustrious history. Illustrated; Hardcover; 144 Pages.




It's correct that I like to read books by Lutz Maicher. I would also be interested if Jack Park published a book. I am however not interested in mixing in what the other Jack Parks have written.

I have always thought very highly of Amazon's system, and sometimes their recommendations work incredibly well. How could they be so wrong this time?

It seems like a deeper problem. Amazon doesn't appear to have any concept of the author apart from a text string. There doesn't seem to be any unique id. And it gets worse...
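To illustrate the difference between a text string and a unique id, here is a small Python sketch. The author ids are entirely hypothetical (I have no idea what Amazon stores internally); the book titles are the ones mentioned in this post.

```python
from collections import defaultdict

# (title, author name as a text string, hypothetical unique author id)
books = [
    ("XML Topic Maps", "Jack Park", "author:jack-park-topic-maps"),
    ("Charting the Topic Maps Research and Applications Landscape",
     "Jack Park", "author:jack-park-topic-maps"),
    ("The Official Ohio State Football Encyclopedia",
     "Jack Park", "author:jack-park-ohio-state"),
    ("Ohio State University Football Vault",
     "Jack Park", "author:jack-park-ohio-state"),
]


def group_titles(books, key_index):
    """Group book titles by the field at key_index (1 = name, 2 = id)."""
    groups = defaultdict(list)
    for book in books:
        groups[book[key_index]].append(book[0])
    return dict(groups)


by_name = group_titles(books, 1)  # one big bucket: the homonym problem
by_id = group_titles(books, 2)    # one bucket per actual person
```

Grouping by name gives a single "Jack Park" with four books, while grouping by id keeps the Topic Maps author and the Ohio State author apart.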

Clicking the author-link on a book-page (or the corresponding links on a record- or DVD-page) must be one of the most used navigational aids in Amazon.

These were the top ranked hits when clicking Jack Park on a book page:

  1. XML Topic Maps: Creating and Using Topic Maps for the Web by Jack Park and Sam Hunting (Paperback - Jul 26, 2002)

  2. Watchmen on the Wall by Miriam Rodyn Park, Jack W. Hayford, and Robert Stearns (Spiral-bound - Aug 1, 2007)

  3. The Official Ohio State Football Encyclopedia by Jack Park (Hardcover - April 20, 2003)

  4. Home Wind Power by Jack Park (Paperback - Jun 1981)

  5. Sport and Exercise Science: ESSAYS IN THE HISTORY OF SPORTS MEDICINE (Sport and Society) by Jack W Berryman and Roberta J Park (Hardcover - Jul 1, 1992)

  6. Wind Power Book by Jack Park (Hardcover - Jun 1982)

  7. The Ohio State Football Encyclopedia by Jack L Park (Hardcover - Oct 8, 2001)

  8. 52 Fishing Hotspots: A Guide to Angling Every Week of the Year: Compiled By the Editors of Western Outdoors by Kevin Dawson, Terry Rudnick, Dave Hughes, and Mike Sawyers (Paperback - 1985)

  9. Hunter by George Dickerson, Tony Epper, Sonny Gibson, and Harriet Medin (Video Download - Feb 5, 2008)

  10. The Beats : An Anthology of 'Beat' Writing by Park Honan, Allen Ginsberg, Gregory Corso, and Lawrence Ferlinghetti (Paperback - 1987)

  11. Charting the Topic Maps Research and Applications Landscape: First International Workshop on Topic Map Research and Applications, TMRA 2005, Leipzig, Germany, ... Papers (Lecture Notes in Computer Science) by Lutz Maicher and Jack Park (Paperback - April 11, 2006)


Hit number 2 doesn't even match the string "Jack Park", but Amazon is returning this book as the second most relevant hit because of the authors Miriam Rodyn Park and Jack W. Hayford.

I didn't understand why a search for the author "Jack Park" would return book 8, until I had a closer look, and saw that the book had around 40 authors and other contributors, a couple named Jack and one named Park.

As for why Amazon returned the video download of season 1 of Hunter, I'm still clueless...

After this I'm not surprised that The Beats : An Anthology of 'Beat' Writing is on the list. Park Honan is the editor, and Jack Kerouac is one of the authors.
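The hits above are consistent with a search that treats "Jack" and "Park" as two independent words and looks for them anywhere among a book's contributors. This is only my guess at the behaviour, not Amazon's actual algorithm, but a small Python sketch shows how such loose matching produces hits 2 and 10, and how matching within a single contributor name would avoid them. The function names are my own.

```python
def tokens(text):
    """Lower-case a name and split it into words, ignoring full stops."""
    return set(text.lower().replace(".", "").split())


def loose_match(query, contributors):
    """True if every query word appears somewhere among all contributors."""
    available = set()
    for name in contributors:
        available |= tokens(name)
    return tokens(query) <= available


def strict_match(query, contributors):
    """True only if all query words appear within one contributor's name."""
    wanted = tokens(query)
    return any(wanted <= tokens(name) for name in contributors)


hit_1 = ["Jack Park", "Sam Hunting"]
hit_2 = ["Miriam Rodyn Park", "Jack W. Hayford", "Robert Stearns"]
hit_7 = ["Jack L Park"]
hit_10 = ["Park Honan", "Allen Ginsberg", "Gregory Corso",
          "Lawrence Ferlinghetti", "Jack Kerouac"]

loose_results = [loose_match("Jack Park", h) for h in (hit_1, hit_2, hit_7, hit_10)]
strict_results = [strict_match("Jack Park", h) for h in (hit_1, hit_2, hit_7, hit_10)]
```

With the loose rule all four books match; with the strict rule only hits 1 and 7 do. Note that even the strict rule cannot tell the different Jack Parks apart, which is where a unique identifier would be needed.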

Now let's go back to the beginning. Amazon recommended Ohio State University Football Vault because I have given Lutz Maicher's book a high rating.

Amazon claims to have "noticed that customers who have purchased or rated books by Lutz Maicher have also purchased Ohio State University Football Vault by Jack Park". 

- Is this bullshit, or has one of you Topic Mappers out there actually got both the TMRA 2005 proceedings and the Ohio State University Football Vault?

The source of my Amazon recommendation is book number 11, and the recommended book doesn't even appear on the list. I wonder when I will get recommendations for all of the other more relevant books on the list.

Wednesday, 9 April 2008

Travelling to Bergen - a metapragmatic digression with semantic underpinnings

In this blog everything is a subject, - remember? And as a Topic Mapper I sometimes get some weird associations that need more than one sentence explaining. Everything is about context, and semantics often needs some context to be understood (ref pragmatics).

When writing another blogpost I desperately wanted to digress a little and tell some stories related to the city of Bergen.

If you ever go to Bergen I can recommend listening to Michael Jackson.

Be sure to sample the fantastic variety of local beer. You can safely leave out the most popular beer, which tastes almost, but not quite, entirely unlike good beer. (You have probably tasted something very similar in most other cities you have visited, - that is if you like beer. I mean if you really like beer it's another story, but you would probably start to see where I am going with this.)

When we're talking about local beer, Brussels is definitely worth a trip, especially if you find yourself in Bergen. The Cantillon brewery beer museum there is one of the few places in the universe where they still believe in spontaneous fermentation. This is also a good reason why it's improbable that the museum will ever relocate or that the top floor will be redecorated. The mash is exposed to the wild yeasts and bacteria that are said to be native to the Senne valley, in which Brussels lies. This is the historic way of making beers, and all the beer in the world before 1800 was really lambic. (Be aware that Brussels and Belgium have more beer museums. Some people joke that Belgium has more beer types than people.)

It is however not trivial to find your way there, and that's one of the reasons for this blogpost. Now that I think about it, given the way I have started, it's probably better to describe the journey the other way, - from Brussels to Bergen.

The best way to travel from Brussels to Bergen is by car (you will miss all the fun by going on the train). From the centre you start following E19 southwest. Then after a while you will see highway signs for 'Bergen'. Then after a while there are suddenly no more signs for 'Bergen'. - You have entered the French-speaking part of Belgium, the province of Hainaut, of which Mons is the capital. The city got its name because it was located on a hill, and in Belgium any hill is large, so they decided to call it a mountain (context again I suppose). Mons is Latin for mountain.

Mons is the location of the museum where Paul Otlet's Mundaneum is kept. After listening to Alex Wright's keynote at Topic Maps 2008 I have an excuse for going back to Mons, - apart from the beer of course. Which reminds me of what Michael Jackson has to say on the subject. Bergen is also a very nice city however, so there are several good reasons for going there.

Calling the hills of Mons mountains is however stretching it too far, - it's actually an insult to Bergen. (I won't even mention the fjords, which make quite a difference.)

The fun part of this is looking up some webpages for the city of Bergen where you get all kinds of offers for traveling to and staying in hotels in Bergen.

Information History and Early Visions of the Web

I had the pleasure of listening to the starting keynote of the Topic Maps 2008 conference:

Hierarchies, Networks, and the Web that Wasn't
by Alex Wright

I think the keynote was fantastic, giving us some perspective and food for thought discussing the history leading up to hypertext and the World Wide Web. 

He talked about visions of hypertext pioneers (and earlier ideas) that predated the web. In the end he summed up some of the fantastic features that didn't make it into the web, like typed links and multi-directional links. Features we actually can use in a topic map. He also cited Ted Nelson saying "The Web isn't hypertext, it's decorated directories!"
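As a rough illustration of what typed, multi-directional links could look like as data, here is a minimal Python sketch loosely modelled on topic map associations. The `Association` class, the type names and the role names are my own assumptions for the example, not taken from any standard or from the keynote.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    # The link itself has a type, and each end plays a named role.
    type: str
    roles: tuple  # (role type, player) pairs


def associations_of(player, associations):
    """Find every association a player takes part in, whatever its role."""
    return [a for a in associations
            if any(p == player for _, p in a.roles)]


located_in = Association(
    "located-in", (("container", "Mons"), ("containee", "Mundaneum")))
authorship = Association(
    "authorship", (("author", "Alex Wright"), ("work", "Glut")))
all_associations = [located_in, authorship]

from_mons = associations_of("Mons", all_associations)
from_mundaneum = associations_of("Mundaneum", all_associations)
```

Unlike an HTML link, the same association is found from either end, and it says what kind of relationship it is.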

David Weinberger continued this discussion in his own keynote the day after, and argued that one of the reasons the web succeeded was precisely that it left out these complex features. David is a marketing guy (and philosopher among other things) and I think he is right. The web succeeded because it was simplistic. Easy to understand and easy for anybody to just start using. (What implications might this have for the Topic Maps standards?)

I think both Wright and Weinberger have a point. Weinberger's perspective is that the "center of the web should be empty", but he also says that the web is extendable. I think it's possible that we will get closer to Ted Nelson's visions, but probably not all the way.

One highlight of the presentation was about The Mother of All Demos:

On December 9, 1968, Douglas C. Engelbart and the group of 17 researchers working with him in the Augmentation Research Center at Stanford Research Institute in Menlo Park, CA, presented a 90-minute live public demonstration of the online system, NLS, they had been working on since 1962. The public presentation was a session in the of the Fall Joint Computer Conference held at the Convention Center in San Francisco, and it was attended by about 1,000 computer professionals. This was the public debut of the computer mouse. But the mouse was only one of many innovations demonstrated that day, including hypertext, object addressing and dynamic file linking, as well as shared-screen collaboration involving two persons at different sites communicating over a network with audio and video interface [1].

This demo in December 1968 and the moon landing in July 1969. - Must have been quite a year!

Another highlight of the keynote was the story of Paul Otlet, an almost forgotten Belgian forefather of the web. He somehow managed to get funding from the Belgian government and set off to build something resembling an analog web in the 1920s. According to the Wikipedia article his Permanent Encyclopedia grew from 400,000 entries in 1895 to over 15 million in 1934. The museum hosting his collection is located in the city of Mons, in Belgium. (I have actually been there. Good beer! - And this gives me some strange and digressing associations, that I spin off in another blogpost).

I can really recommend Alex Wright's book too: Glut - Mastering Information Through The Ages

Through the ages is an understatement. - Alex Wright goes way back:

The information age started not with microchips or movable type, but with the first flowering of complex life. To approach the history of information systems from a purely human-centered perspective is to overlook the lessons of billions of years' worth of evolutionary history. Just as our brains carry around some very old reptilian equipment, so our collective strategies for managing information bear the traces of patterns that took shape a long time ago.

He then goes on to discuss the transformation from single-cell organisms to the first multicellular organisms.

You can read his Boxes and Arrows article The Sociobiology of Information Architecture as a good sample.

This is only the first chapter of 12, where chapter 11 is The Web That Wasn't.