2 December 2005
Allowing chaos
One of my cube neighbors, a new-ish employee, said that he didn't want to keep his desk clean because he did not yet have a clear understanding of the product he was working on. I understood what he meant, and I think it's important. Only after he understands the system can he organize his environment to fit that system.
My note-taking process begins on a small stack of paper-to-be-recycled, white side up, sitting in front of my keyboard. I scribble notes and drawings and UML diagrams as needed. From there, if they're valuable and not just scribbles, I move them to the appropriate location in my development wiki and HTML-ify them with wiki links and external links. Eventually, I may add further notes, link other articles to them, or move them to a more appropriate location as I get a better understanding of the domain...
Clay Shirky has an article criticizing the goals of the Semantic Web as, to put it mildly, flawed (he may have used the phrase "utter failure"). He argues that the semantic content of RDF triples, and the syllogisms that can be built from them, provide far less than what we are already approaching in the messy world of HTML. His primary point is that imposing a strict ontology pushes out the possibility of generalization. There are points in his argument that I disagree with--significantly, he assumes that larger knowledge can't emerge from thousands of pieces of smaller knowledge--but I share his appreciation of vagueness and messiness.
I had recently gotten re-interested in natural language processing, and was thinking about semantic extraction and transformations. What methods are there to process domain-specific documents and extract the list of domain topics they contain? For example: you have a collection of resumes and want to tag them with jobs (sales, graphic artist, etc.), experience, and so on. One approach is to build a lexicon of the domain, parse the text of each document into phrases, then transform the deep meaning of the phrases into their respective topics. This requires building and maintaining a rigid ontology that maps ontological concepts to topics. Another approach is to use word clustering against "model" documents that represent the topics. Matching a document against the model documents gives a measure of similarity, and therefore of the likelihood that the document belongs to each topic.
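As a rough illustration of the second approach, here is a minimal Python sketch that scores a document against one "model" document per topic using bag-of-words cosine similarity. Everything in it is an assumption made for the example: the `MODEL_DOCS` texts, the function names, and the choice of plain word counts with cosine similarity, which is just one simple way to measure how close two documents are.

```python
import math
import re
from collections import Counter

# Hypothetical model documents: one short text standing in for each topic.
MODEL_DOCS = {
    "sales": "sales quota accounts customers revenue territory negotiation closing deals",
    "graphic artist": "design illustration typography layout photoshop logo branding color",
}


def bag_of_words(text):
    """Count lowercase alphabetic words in the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_topics(document, models=MODEL_DOCS):
    """Return (topic, score) pairs sorted from most to least similar."""
    doc_vec = bag_of_words(document)
    scores = {topic: cosine(doc_vec, bag_of_words(text)) for topic, text in models.items()}
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


resume = "Managed customers accounts and exceeded revenue quota by closing deals"
for topic, score in rank_topics(resume):
    print(f"{topic}: {score:.2f}")
```

No ontology is maintained here; the only "structure" is the set of model documents, which can be swapped or extended without touching the matching code. A real system would likely want stemming, stop-word removal, and some term weighting.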
This last approach is one that search engines take and is best illustrated in clustering search engines (such as Clusty or Vivisimo). With clustering, the engine dynamically separates results containing different homographs or different semantic domains. "Turkey" the bird is separated from "Turkey" the country, and "Turkey" farming is separated from "Turkey" recipes for Thanksgiving. Examination of the raw text can provide the semantic ontology. Or at least part of it.
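To make the idea of separating results by their surrounding words concrete, here is a toy Python sketch. It is not how Clusty or Vivisimo work internally; it is a simple greedy grouping by Jaccard overlap of the words in each snippet. The snippets, the stop-word list (which also drops the query term itself, since it appears in every result), and the `threshold` value are all invented for the example.

```python
import re

# Assumed stop-word list; the query term "turkey" is dropped because every result contains it.
STOPWORDS = {"the", "a", "an", "of", "in", "for", "and", "to", "is", "turkey"}


def terms(text):
    """Set of lowercase words in the text, minus stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(snippets, threshold=0.2):
    """Greedily put each snippet in the most similar existing cluster, or start a new one."""
    clusters = []  # each cluster: {"terms": set of words, "members": list of snippets}
    for snippet in snippets:
        snippet_terms = terms(snippet)
        best, best_score = None, 0.0
        for candidate in clusters:
            score = jaccard(snippet_terms, candidate["terms"])
            if score > best_score:
                best, best_score = candidate, score
        if best is not None and best_score >= threshold:
            best["members"].append(snippet)
            best["terms"] |= snippet_terms
        else:
            clusters.append({"terms": set(snippet_terms), "members": [snippet]})
    return [c["members"] for c in clusters]


results = [
    "Turkey roasting recipe for Thanksgiving dinner",
    "Istanbul is the largest city in Turkey",
    "Thanksgiving turkey recipe with stuffing",
    "Travel guide to Istanbul, the largest city of Turkey",
]
for group in cluster(results):
    print(group)
```

The groups are never named in advance. They fall out of which words happen to appear together, which is the sense in which the raw text supplies at least part of the ontology.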
All this comes down to the question: when is it beneficial to allow chaos? The choices are either to pre-define a structure that information should fit into, or to allow information to manifest itself and then post-define rules that order that information into a structure.