Sugar Labs/SOM: Difference between revisions

Garycmartin (talk | contribs)


[[Image:SOM_legend.jpg|thumb|172px|Legend]]
Self Organising Maps (SOMs) can be used as 2D spatial summaries for visualising multidimensional data. In the maps shown here, text distance metrics are generated from the weekly/monthly content of some of the more active mailing lists. Using a geographic-like landscape metaphor, height (the colour gradient) indicates features with strong associations to all other features, proximity represents association between specific features (e.g. related terms), and label size is a rough guide to the basic frequency of a feature. There are many "correct" 2D map layouts for the same set of data (due to its multidimensional nature); each map generation will usually settle into a slightly different set of local minima, but the associations are no less valid for each. After removing linguistic junk words and applying word stemming, the maps currently pick the week's/month's top ~200 features by frequency. Each map is a continuous, tileable surface that wraps around north/south and east/west (the surface of a torus), so if you find an interesting label near one edge, remember to check its neighbours on the opposite edge.
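To make the wrap-around concrete, here is a minimal sketch of how distance between two map cells could be measured on a toroidal grid. It is an illustration only, not the code used to generate these maps; the function name <code>torus_distance</code>, the <code>(x, y)</code> cell coordinates, and the grid dimensions are all assumptions made for the example.

```python
def torus_distance(a, b, width, height):
    """Euclidean distance between grid cells a and b on a map whose
    east/west and north/south edges wrap around (a torus).

    a, b          -- hypothetical (x, y) cell coordinates
    width, height -- hypothetical map size in cells
    """
    # Raw separation along each axis, folded into the map size.
    dx = abs(a[0] - b[0]) % width
    dy = abs(a[1] - b[1]) % height
    # On a torus the shorter route may cross an edge.
    dx = min(dx, width - dx)
    dy = min(dy, height - dy)
    return (dx * dx + dy * dy) ** 0.5
```

On an assumed 20 by 10 map, <code>torus_distance((0, 5), (19, 5), 20, 10)</code> gives 1.0 rather than 19.0: labels on the far west and far east edges are direct neighbours.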


== What Do They Show? ==