Garycmartin

Joined 10 May 2008
2,216 bytes removed ,  10:55, 13 July 2008
Moved SOM work to community page
Line 1: Line 1: −
Gary C. Martin (gary at garycmartin dot com) is a freelance software developer based in Edinburgh/UK, with a focus on design, UI/HCI, analytics and, information visualisation. His main contributions to the OLPC project so far are the [http://wiki.laptop.org/go/Moon Moon] activity, and a set of SVG toolbar icons for [http://wiki.laptop.org/go/Calculate Calculate]. His homepage is over at http://www.garycmartin.com/
+
Gary C. Martin (gary at garycmartin dot com) is a freelance software developer based in Edinburgh/UK, with a focus on design, UI/HCI, analytics and, information visualisation. His main contributions to the OLPC & Sugar Labs project so far are:
   −
== IAEP Kohonen SOM Visualisations ==
+
* [http://wiki.laptop.org/go/Moon Moon] activity
 +
* a set of SVG toolbar icons for [http://wiki.laptop.org/go/Calculate Calculate]
 +
* [[Community/SOM#It.27s_An_Education_Project_Mailing_List_.28Weekly.29|Self Organising Map (SOM) generation for the IAEP]] mailing list
 +
* [[Community/SOM#Sugar_Mailing_List_.28Monthly.29|SOM generation for the Sugar]] mailing lists
   −
These Self Organising Maps (SOMs) acts as 2D spatial summariser visualisations of (multidimensional) text distance metrics generated from the weekly content on its.an.education.project. Using a geographic like landscape metaphor for visualisation,  the height (colour gradient) indicates features with strong associations to all other features; proximity represents association between specific features (i.e related words), and label size is a rough guide to basic frequency of a feature. There are many "correct" map layouts for the same set of data, each generation will usually settle into a slightly different set of local minima, but the associations are no less valid for each. After removing linguistic junk words, and word stemming, the map is currently picking the top ~200 features by frequency. The map surface is continuous and wraps around north/south and east/west (surface of a torus).
+
His external homepage is over at http://www.garycmartin.com/
 
  −
<gallery>
  −
Image:2008-May-01-09-som.jpg|2008 May 1st to 9th
  −
Image:2008-May-10-16-som.jpg|2008 May 10th to 16th
  −
Image:2008-May-17-23-som.jpg|2008 May 17th to 23rd
  −
Image:2008-May-24-30-som.jpg|2008 May 24th to 30th
  −
Image:2008-May-31-June-06-som.jpg|2008 May 31st to June 6th
  −
Image:2008-June-07-13-som.jpg|2008 June 7th to 13th
  −
Image:2008-June-14-20-som.jpg|2008 June 14th to 20th
  −
Image:2008-June-21-27-som.jpg|2008 June 21st to 27th
  −
Image:2008-June-28-July-4-som.jpg|2008 June 28th to July 4th
  −
Image:2008-July-05-11-som.jpg|2008 July 5th to 11th
  −
</gallery>
  −
 
  −
== What Do They Show? ==
  −
 
  −
[[Image:SOM_legend.jpg|thumb|172px|Legend]]
  −
Well, you could just treat them like tag clouds, showing the top 200 word features used on the list for a given week, but the maps also hold spacial information. Word features that appear close together on the map were used closely (on average) in text content from the list. A height metaphor is also used to indicate the features with the strongest mean associations. The map auto centres on the hot pink peak features, these have the strongest association with all the rest of the features on the map; word features in the blue and green areas have weaker associations, but should not be considered negatively.
  −
 
  −
== Future ==
  −
 
  −
I'll continue to refine the mapping algorithms and visualisation style, and will post details here on any significant modifications. The code base was originally designed for bulk text documents from a single author, tested on works of literature from Project Gutenberg.
 
2,354

edits