Friday 18 May 2007 12:50:38 pm
Hi Everyone,

(I'm in Tasmania, Australia, about as far away from Norway as possible, so online is my best chance for sharing ideas at the moment.)

Felix, the ideas in your second (missing?) post sound more sophisticated. It might be useful to briefly explain some of my EZ and other work.

A lot of my work has been directed towards developing useful knowledge for action in complex and dynamic situations (e.g. industry analysis, understanding national education/training needs, regional business/industry adoption of e-commerce and internet technologies). One of the fundamental problems in these situations is making sense of the vast amount of information available. One strategy is unstructured text analysis, as used by intelligence agencies and data-mining/marketing companies (see text mining here: http://en.wikipedia.org/wiki/Text_mining).

On this theme, I started investigating text analysis using GATE (see www.gate.ac.uk), a "General Architecture for Text Engineering". It analyses documents (e.g. imagine web / EZ content) to automatically draw out concepts and relationships (e.g. People, Locations, Time, Money, and more). It can read from and write to a MySQL database, or generate topic-classified XML of the content. One automated pathway for using GATE is via Eclipse (does that name ring any bells?). I have experimented with some (basic, non-EZ) PHP code to analyse and present the GATE output for my industry and other work. I got to a proof-of-concept stage with a semi-automated process going from web/MySQL, through the GATE text analysis engine, and back to MySQL/web presentation. I guess a key issue is what to do about presentation and user interaction with this rich categorisation of topics.
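As a rough illustration of the round trip described above (database, then text analysis, then back to the database), here is a minimal Python sketch. It uses the standard-library `sqlite3` module as a stand-in for MySQL and two simple regular expressions as a stand-in for GATE's annotators. The table names, column names, patterns and sample sentence are all hypothetical; real GATE annotation is far richer (grammar rules, gazetteer lists and more).

```python
import re
import sqlite3

# Hypothetical stand-in for a GATE-style annotator: two regexes covering
# the Money and Time annotation types mentioned above.
PATTERNS = {
    "Money": re.compile(r"\$\d[\d,]*(?:\.\d+)?(?:\s?(?:million|billion))?"),
    "Time": re.compile(r"\b(?:19|20)\d{2}\b"),
}


def annotate(text):
    """Return (type, matched_text, start, end) tuples ordered by position."""
    results = []
    for ann_type, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            results.append((ann_type, m.group(0), m.start(), m.end()))
    return sorted(results, key=lambda r: r[2])


def run_pipeline(conn):
    """Read every row of `documents`, annotate it, store rows in `annotations`."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS annotations ("
        "doc_id INTEGER, ann_type TEXT, ann_text TEXT, "
        "start_pos INTEGER, end_pos INTEGER)"
    )
    docs = conn.execute("SELECT id, body FROM documents").fetchall()
    for doc_id, body in docs:
        rows = [(doc_id, *ann) for ann in annotate(body)]
        conn.executemany("INSERT INTO annotations VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT)")
    conn.execute(
        "INSERT INTO documents (body) VALUES (?)",
        ("The project was announced in 2004 at a cost of $1.4 billion.",),
    )
    run_pipeline(conn)
    query = "SELECT ann_type, ann_text FROM annotations ORDER BY start_pos"
    for row in conn.execute(query):
        print(row)
```

Keeping the annotations in their own table, keyed by document id and character offsets, means the presentation layer can query them independently of the original content.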
I've been looking at using treemaps (e.g. http://www.cs.umd.edu/hcil/treemap-history/), Self-Organising Maps / clustering, and other visual means (e.g. see the various methods/code at http://www.publicwhip.org.uk/mpsee.php and http://iv.slis.indiana.edu/sw/). While such (potentially) industrial-strength approaches may seem like overkill for EZ sites in themselves, I suggest it is useful to look towards the Web 2.0 possibilities of EZ. For me this means going beyond EZ's core Web 1.0 content-presentation capabilities and starting to think about rich user interaction and value-adding/information-sensemaking services.

For example, in Tasmania at the moment there's a lot of community concern about a "world scale" (i.e. very, very big) pulp mill project. I used some of these techniques to analyse 780 public documents (around 4500 pages) and present them as a topic treemap. It's not online, but it helps reveal themes and relationships people just don't see when reading through the content page by page, which is why I produced it.

So, bringing all of this back to the current thread, what does this mean? Well, I wanted to share some of my thinking, and in relation to Felix's post (18/05/2007 4:53 pm, missing from the EZ site?) on how to do step (1), I note:
- It is pretty heavy processing work, but GATE does some of this type of analysis; and/or
- Even if GATE is overkill, its methods of grammar/keyword analysis may be useful for you (it uses some flat text files to match locations, for example).

BTW, the attribute relation method Kristof describes sounds excellent. It means tagging is possible on multiple dimensions (around a datatype/attribute) rather than through the relation to a parent, which only allows one. Makes a lot of sense.

Apologies for the length. Hope this helps.
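To give a flavour of the treemap idea mentioned earlier, here is a minimal sketch of the original slice-and-dice layout from the treemap work linked above: each child gets an area proportional to its weight, and the split direction alternates between levels. The `Node` tree, topic names and sizes (think of them as document counts per topic) are hypothetical, and this is not necessarily the method behind the pulp mill treemap described above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical topic tree: leaves carry a size (e.g. a document count)."""
    name: str
    size: float = 0.0
    children: List["Node"] = field(default_factory=list)

    def total(self) -> float:
        # A leaf's weight is its own size; an inner node's is the sum of its leaves.
        if not self.children:
            return self.size
        return sum(child.total() for child in self.children)


def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Lay out `node` inside the rectangle (x, y, w, h).

    Children get areas proportional to their totals; the split direction
    alternates between levels (vertical strips at even depths, horizontal
    strips at odd depths). Returns (name, x, y, w, h) tuples for the leaves.
    """
    if out is None:
        out = []
    if not node.children:
        out.append((node.name, x, y, w, h))
        return out
    total = node.total()
    offset = 0.0  # fraction of the parent rectangle already used
    for child in node.children:
        frac = child.total() / total if total else 0.0
        if depth % 2 == 0:
            # Even depth: split left-to-right along the x axis.
            slice_and_dice(child, x + offset * w, y, frac * w, h, depth + 1, out)
        else:
            # Odd depth: split top-to-bottom along the y axis.
            slice_and_dice(child, x, y + offset * h, w, frac * h, depth + 1, out)
        offset += frac
    return out


if __name__ == "__main__":
    topics = Node("documents", children=[
        Node("Environment", children=[Node("Water", 30), Node("Air", 10)]),
        Node("Economy", 60),
    ])
    for leaf in slice_and_dice(topics, 0, 0, 100, 100):
        print(leaf)
```

Slice-and-dice is the simplest layout and tends to produce long, thin rectangles for deep or unbalanced trees, which is why later variants such as squarified treemaps were developed.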
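The flat-text-file matching mentioned in the second point can be illustrated with a small gazetteer lookup: place names are read one per line from a plain text list and matched against the tokens of a document, preferring the longest match. This is only a sketch in the spirit of that list-based matching, not GATE's actual file format or API; the function names, the comment convention and the example place list are assumptions made for illustration.

```python
import re

TOKEN = re.compile(r"\w+")


def load_gazetteer(lines):
    """Return a set of lowercased token tuples, one per gazetteer entry.

    `lines` can be any iterable of strings, such as an open flat text file
    with one place name per line (an assumed format; lines starting with
    '#' are treated as comments).
    """
    entries = set()
    for line in lines:
        name = line.strip()
        if name and not name.startswith("#"):
            tokens = tuple(TOKEN.findall(name.lower()))
            if tokens:
                entries.add(tokens)
    return entries


def find_locations(text, entries):
    """Greedy longest-match lookup of gazetteer entries in `text`."""
    tokens = [(m.group(0).lower(), m.start(), m.end()) for m in TOKEN.finditer(text)]
    max_len = max((len(e) for e in entries), default=0)
    found = []
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest possible entry first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tok for tok, _, _ in tokens[i:i + n])
            if key in entries:
                match_len = n
                break
        if match_len:
            start, end = tokens[i][1], tokens[i + match_len - 1][2]
            found.append(text[start:end])  # keep the original spelling/case
            i += match_len
        else:
            i += 1
    return found


if __name__ == "__main__":
    places = load_gazetteer(["# Tasmanian places", "Tasmania", "Hobart", "New Norfolk"])
    text = "Residents of New Norfolk and Hobart, Tasmania, made submissions."
    print(find_locations(text, places))
```

Preferring the longest match stops a multi-word name such as "New Norfolk" from being missed or split when a shorter entry also matches its first word.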
Regards - Paul