Changes

396 bytes removed, 22:23, 6 August 2012
Line 3: Line 3:     
[[Image:SOM_legend.jpg|thumb|172px|Legend]]
 
[[Image:SOM_legend.jpg|thumb|172px|Legend]]
Self Organising Maps (SOMs) can act as 2D spatial summariser visualisations of multidimensional data. In the maps shown here, text distance metrics are generated from the weekly/monthly content on some of the more active mailing lists. Using a geographic-like landscape metaphor, the height (colour gradient) indicates features with strong associations to all other features, proximity represents association between specific features (e.g. related terms), and label size gives a guide to the basic frequency of a feature. There are many "correct" 2D map layouts for the same set of data (due to the multidimensional nature of the data); each map generation will usually settle into a slightly different set of local minima, but the associations are no less valid for each. After removing linguistic junk words and stemming the remaining words, the maps currently pick the week's/month's top ~200 features by frequency. Each is a continuous, tileable surface and wraps around north/south and east/west (surface of a torus); so if you find an interesting label to one side, remember to check its neighbours on the opposite side.
+
Self Organising Maps (SOMs) can be used as 2D spatial summariser visualisations of multidimensional data. In the maps shown here, text distance metrics are generated from the weekly/monthly content on some of the more active mailing lists. Using a geographic-like landscape metaphor, the height (colour gradient) indicates features with strong associations to all other features, proximity represents association between specific features (e.g. related terms), and label size gives a guide to the basic frequency of a feature. There are many "correct" 2D map layouts for the same set of data (due to the multidimensional nature of the data); each map generation will usually settle into a slightly different set of local minima, but the associations are no less valid for each. After removing linguistic junk words and stemming the remaining words, the maps currently pick the week's/month's top ~200 features by frequency. Each map is a continuous, tileable surface, and wraps around north/south and east/west (surface of a torus); so if you find an interesting label to one edge, remember to check its neighbours on the opposite side.
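The wrap-around behaviour described above can be illustrated with a minimal sketch of a distance measure on a toroidal grid. This is not the code used to generate these maps; the function name `torus_distance` and the grid dimensions are assumptions chosen purely for illustration.

```python
def torus_distance(a, b, width, height):
    """Euclidean distance between grid cells a and b on a wrap-around map.

    a and b are (column, row) tuples. width and height are the grid
    dimensions (hypothetical values, not those of the maps shown here).
    """
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    # On a torus the shorter route may cross the east/west or north/south edge.
    dx = min(dx, width - dx)
    dy = min(dy, height - dy)
    return (dx * dx + dy * dy) ** 0.5


# Two labels on opposite east/west edges of a 10 x 6 grid are direct neighbours.
print(torus_distance((0, 0), (9, 0), width=10, height=6))
```

With a measure like this, a label at the far left of a map and a label at the far right count as adjacent, which is why labels near an edge should be read together with those on the opposite edge.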
    
== What Do They Show? ==
 
== What Do They Show? ==
   −
Well, you could just treat them like tag clouds, showing the top 200 word features used on the list for a given week/month, but the maps also hold spatial information. Word features that appear close together on the map were used close together (on average) in text content from the list. A height metaphor is also used to indicate the features with the strongest mean associations: the map auto-centres on the highest pink peak features, and these words have the strongest associations with all the rest of the features on the map. Word features in the blue and green areas have weaker mean associations relative to the pink highs, but should not be considered negatively, as they are still in the top ~200 terms and will often be tightly associated with surrounding neighbours.
+
Well, you could just treat them as tag clouds, showing the top two hundred word features used on the mail-list for a given week/month, but the maps also hold spatial information that provides context. Word features that appear close together on the map were used close together (on average) in email content from the mail-list. A height metaphor is used to indicate the combination of feature association strength and feature frequency; word features in the blue and green areas have weaker mean associations and lower frequency relative to the hot orange and pink highs, but should not be considered negatively, as they are still in the top ~200 terms and will often be tightly associated with surrounding neighbours.
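As a rough illustration of the "tag cloud" step (junk-word removal, stemming, then picking the most frequent features), here is a minimal sketch. It is not the project's actual pipeline: the stop-word list, the `crude_stem` suffix stripper and the `top_features` function are all hypothetical stand-ins.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real junk-word list would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "for"}


def crude_stem(word):
    """Very rough suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def top_features(texts, n=200):
    """Return the n most frequent stemmed, non-stop-word tokens with counts."""
    counts = Counter()
    for text in texts:
        for token in re.findall(r"[a-z]+", text.lower()):
            if token not in STOP_WORDS:
                counts[crude_stem(token)] += 1
    return counts.most_common(n)


emails = ["Sugar deployments and the sugar deployment", "testing sugar"]
print(top_features(emails, n=2))
```

The frequency counts from a step like this would correspond to label size on the maps; the spatial layout and height come from the association data, which this sketch does not attempt to model.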
    
== SOM Related Research Papers ==
 
== SOM Related Research Papers ==
Line 18: Line 18:  
== It's An Education Project Mailing List ==
 
== It's An Education Project Mailing List ==
   −
Weekly maps generated with posts from the [http://lists.sugarlabs.org/listinfo/iaep IAEP mailing list]. Most recent maps shown first - for older maps please see the [[Sugar_Labs/SOM/IAEP|IAEP map history]] page.
+
Monthly maps generated with posts from the [http://lists.sugarlabs.org/listinfo/iaep IAEP mailing list]. Most recent maps shown first - for older maps please see the [[Sugar_Labs/SOM/IAEP|IAEP map history map archive]] page.
    
<gallery widths="275" heights="150" perrow="2">
 
<gallery widths="275" heights="150" perrow="2">
Image:2009-August-8-14-som.jpg|'''2009 Aug 8th-14th''' Sugar deployment feedback clearly a hot topic this week.
+
File:2012-July-som.png|'''2012 July''' (31 emails)
Image:2009-August-1-7-som.jpg|'''2009 Aug 1st-7th''' map aspect ratio made non-square to improve SOM stability, adjustments to training cycles and topographic colour palette.
+
File:2012-June-som.png|'''2012 June''' (81 emails)
</gallery>
  −
 
  −
<gallery>
  −
Image:2009-July-25-31-som.jpg|'''2009 July 25th-31st'''
  −
Image:2009-July-18-24-som.jpg|'''2009 July 18th-24th'''
  −
Image:2009-July-11-17-som.jpg|'''2009 July 11th-17th'''
  −
Image:2009-July-4-10-som.jpg|'''2009 July 4th-10th'''
  −
Image:2009-Jun-27-Jul-3-som.jpg|'''2009 Jun 27th-Jul 3rd'''
  −
Image:2009-June-20-26-som.jpg|'''2009 June 20th-26th'''
  −
Image:2009-June-13-19-som.jpg|'''2009 June 13th-19th'''
   
</gallery>
 
</gallery>
   Line 49: Line 39:  
Image:2008-September-Sugar-devel-som.jpg|'''2008 September'''
 
Image:2008-September-Sugar-devel-som.jpg|'''2008 September'''
 
</gallery>
 
</gallery>
 +
 +
== Technology in Education Academic Research Papers ==
 +
 +
A selection of SOMs for technology in education [http://wiki.laptop.org/go/Academic_papers research papers] relating to the One Laptop Per Child project can be found on the laptop.org wiki.
    
== Future ==
 
== Future ==
    
The mapping algorithms and visualisation style will continue to be refined; details of any significant modifications will be posted (see comments under images for changes). The code base was originally designed for bulk text documents from a single author, and was tested on works of literature from Project Gutenberg.
 
The mapping algorithms and visualisation style will continue to be refined; details of any significant modifications will be posted (see comments under images for changes). The code base was originally designed for bulk text documents from a single author, and was tested on works of literature from Project Gutenberg.