Self Organizing Maps Using R Studio

Nur Mutmainnah Djafar
Jul 25, 2021

What is a self-organizing map?

A self-organizing map (SOM) or self-organizing feature map (SOFM) is a type of artificial neural network (ANN) that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map, and is therefore a method to do dimensionality reduction. Self-organizing maps differ from other artificial neural networks as they apply competitive learning as opposed to error-correction learning (such as backpropagation with gradient descent), and in the sense that they use a neighborhood function to preserve the topological properties of the input space.

This makes SOMs useful for visualization by creating low-dimensional views of high-dimensional data, akin to multidimensional scaling. The artificial neural network introduced by the Finnish professor Teuvo Kohonen in the 1980s is sometimes called a Kohonen map or network. The Kohonen net is a computationally convenient abstraction building on biological models of neural systems from the 1970s and morphogenesis models dating back to Alan Turing in the 1950s.

While it is typical to consider this type of network structure as related to feedforward networks where the nodes are visualized as being attached, this type of architecture is fundamentally different in arrangement and motivation.
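The competitive learning and neighborhood function described above can be sketched in a few lines of base R. This is only a toy illustration under stated assumptions: the data are random numbers, the 4x4 rectangular grid, the decay schedules, and the Gaussian neighborhood are arbitrary choices, and it is not how the kohonen package implements training.

```r
# Toy online SOM in base R. Illustrative only: the data are random and this
# is not how the kohonen package implements training.
set.seed(1)
X <- matrix(rnorm(200 * 3), ncol = 3)                 # 200 made-up samples
grid_xy <- as.matrix(expand.grid(x = 1:4, y = 1:4))   # node positions, 4 x 4 grid
codes <- X[sample(nrow(X), nrow(grid_xy)), ]          # initial codebook vectors

n_iter <- 2000
for (t in seq_len(n_iter)) {
  frac   <- 1 - t / n_iter
  alpha  <- 0.01 + 0.04 * frac     # learning rate shrinks over time
  radius <- 0.5 + 2 * frac         # neighborhood radius shrinks over time

  x <- X[sample(nrow(X), 1), ]     # pick one training sample

  # Competition: the best matching unit (BMU) is the closest codebook vector.
  bmu <- which.min(rowSums(sweep(codes, 2, x)^2))

  # Neighborhood: Gaussian weight from each node's grid distance to the BMU.
  grid_d2 <- rowSums(sweep(grid_xy, 2, grid_xy[bmu, ])^2)
  h <- exp(-grid_d2 / (2 * radius^2))

  # Update: move every codebook vector toward x, scaled by its weight.
  codes <- codes - alpha * h * sweep(codes, 2, x)
}

dim(codes)   # one codebook vector per node
```

Because nodes close to the winner on the grid are pulled along with it, neighboring nodes end up with similar codebook vectors, which is what preserves the topology of the input space.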

Analysis

This analysis applies a self-organizing map to the wines data. These data are the results of chemical analyses of wines grown in the same region in Italy (Piedmont) but derived from three different cultivars: Nebbiolo, Barbera and Grignolino grapes. The wine from the Nebbiolo grape is called Barolo. The data contain the quantities of several constituents found in each of the three types of wines, as well as some spectroscopic variables.

This analysis requires the “kohonen” package, which also provides the wines data. Install it with the command:

install.packages("kohonen")

To load the package, run the command:

library(kohonen)

First, load the data to be used:

data(wines)
head(wines) # show the first six rows
Wines Data

It can be seen that the wines data has 13 variables, namely alcohol, malic acid, ash, ash alkalinity, magnesium, tot. phenols, flavonoids, non-flav. phenols, proanth, col.int., col.hue, OD ratio, and proline.

The six rows above show that some variables span much larger ranges than others, so the data are standardized:

scale(wines)
head(scale(wines)) # show the first six rows
Data after standardization

After standardization, each variable is centered at 0 with a standard deviation of 1.
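A quick way to confirm the effect of scale() is to check the column means and standard deviations. The sketch below assumes the kohonen package is installed; wines.sc is a name introduced here only for illustration.

```r
library(kohonen)
data(wines)

wines.sc <- scale(wines)            # center each column, divide by its sd

round(colMeans(wines.sc), 10)       # every column mean is (numerically) 0
apply(wines.sc, 2, sd)              # every column standard deviation is 1

# The original ranges differ a lot, which is why scaling matters here.
apply(wines, 2, range)
```

Without this step, variables measured on a large scale would dominate the distances between observations and codebook vectors, and so dominate the map.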

Next, create a variable named “grid” that defines a 5x5 grid with a hexagonal topology for the self-organizing map.

grid <- somgrid(xdim = 5, ydim = 5, topo = "hexagonal")

Then train the self-organizing map and store the result in a variable named som.wines:

som.wines <- som(scale(wines), grid = grid)
str(som.wines)
plot(som.wines, type = "mapping") # display the SOM result plot

In the plot above, the horizontal direction is the x-axis and the vertical direction is the y-axis. The map has 25 nodes (5 x 5), and the 177 observations in the wines data are spread over these 25 nodes. The next step is to find out which observations belong to node 1, node 2, and so on. Before that, the position of each node can be printed:

som.wines$grid$pts

The output above gives the coordinates of each node: circle 1 is located at x (horizontal) = 1.5 and y (vertical) = 0.8660254, circle 2 is located at x = 2.5 and y = 0.8660254, and so on.

The circle to which each observation is assigned can be seen with the command
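The coordinates can be reproduced by hand, which makes the hexagonal layout easier to understand. The sketch below is an inference from the printed values, not code taken from the package: odd-numbered rows are shifted right by 0.5, and rows are spaced sqrt(3)/2 apart.

```r
# Rebuild 5 x 5 hexagonal grid coordinates by hand (a sketch; the layout rule
# is inferred from the printed coordinates, not taken from the package source).
xdim <- 5
ydim <- 5
pts <- as.matrix(expand.grid(x = 1:xdim, y = 1:ydim))

pts[, "x"] <- pts[, "x"] + 0.5 * (pts[, "y"] %% 2)   # shift odd rows right by 0.5
pts[, "y"] <- pts[, "y"] * sqrt(3) / 2               # row spacing of a hex lattice

head(pts, 2)   # first two nodes: (1.5, 0.8660254) and (2.5, 0.8660254)
```

The value 0.8660254 is sqrt(3)/2, the vertical distance between rows when neighboring nodes are one unit apart.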

som.wines$unit.classif

Recall that there are 177 observations. The output above means that the first observation is assigned to circle 9, the second to circle 7, the third to circle 2, and so on. No value exceeds 25, because the map has only 25 circles.
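To go from this assignment vector to the members of each circle, the values can be tabulated. The following is a minimal sketch, assuming the kohonen package is installed; the seed is an arbitrary choice, so the assignments will not match the numbers quoted above, and node.counts is a name introduced here for illustration.

```r
library(kohonen)
data(wines)

set.seed(7)   # arbitrary seed; SOM training is random, so results vary by seed
som.wines <- som(scale(wines), grid = somgrid(xdim = 5, ydim = 5, topo = "hexagonal"))

# Number of observations mapped to each of the 25 nodes (empty nodes show 0).
node.counts <- table(factor(som.wines$unit.classif, levels = 1:25))
node.counts

# Row indices of the observations assigned to node 1.
which(som.wines$unit.classif == 1)

# The same counts drawn on the map.
plot(som.wines, type = "counts")
```

Setting a seed before calling som() is also what makes a result such as "the first observation is in circle 9" repeatable from one run to the next.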

Next, look at the overall plot of som.wines.
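The command for this plot is not shown in the original. A plausible reconstruction, assuming the codes plot of the kohonen package was used (the seed is arbitrary and only makes the sketch repeatable), is:

```r
library(kohonen)
data(wines)

set.seed(7)   # arbitrary seed, only so the sketch is repeatable
som.wines <- som(scale(wines), grid = somgrid(xdim = 5, ydim = 5, topo = "hexagonal"))

# Codes plot: one fan per node, one colored segment per variable.
plot(som.wines, type = "codes")
```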

The plot above is a colored version of the previous map, which showed only the observations inside each circle. Here each node displays its codebook vector, so all thirteen variables are retained and none is dropped. The legend gives the color of each variable: dark green for alcohol, a somewhat lighter green for malic acid, and so on through proline.

Next, group the 25 nodes into 5 clusters by applying hierarchical clustering to their codebook vectors.

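som.wines$codes[[1]] # codebook vectors, one row per node
dist(som.wines$codes[[1]]) # pairwise distances between the nodes
hclust(dist(som.wines$codes[[1]])) # hierarchical clustering of the nodes

The hierarchical clustering uses complete linkage on Euclidean distances (the defaults of hclust and dist), and the 25 objects being clustered are the 25 nodes.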
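The defaults mentioned above can be made explicit and checked. The sketch below uses a random 25 x 13 matrix as a stand-in for the codebook vectors, so it runs without the kohonen package; the names codes, hc.default and hc.explicit are introduced here for illustration.

```r
# Stand-in for the 25 x 13 codebook matrix (random numbers, for illustration).
set.seed(1)
codes <- matrix(rnorm(25 * 13), nrow = 25)

hc.default  <- hclust(dist(codes))
hc.explicit <- hclust(dist(codes, method = "euclidean"), method = "complete")

# Both calls build the same tree, so the defaults are Euclidean + complete linkage.
identical(hc.default$merge, hc.explicit$merge)

# Dendrogram with the 5-cluster cut marked.
plot(hc.explicit)
rect.hclust(hc.explicit, k = 5)
```

Writing the methods out is useful when trying alternatives such as method = "average" or method = "ward.D2", which may group the nodes differently.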

Then display the grouping results.

peta <- cutree(hclust(dist(som.wines$codes[[1]])), 5)
plot(peta)

The plot above shows the cluster number (vertical axis) of each of the 25 nodes (horizontal axis). The size of a cluster can be read by counting the points on one horizontal line: the first cluster has 7 members, the second cluster has 8 members, and so on.

Show the clusters on the map
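Counting points on a plot is error-prone, so the cluster sizes can also be tabulated directly. This is a minimal sketch, assuming the kohonen package is installed; the seed is arbitrary, so the counts will differ from the ones quoted above, and wine.cluster is a name introduced here for illustration.

```r
library(kohonen)
data(wines)

set.seed(7)   # arbitrary seed; counts will differ from the ones quoted above
som.wines <- som(scale(wines), grid = somgrid(xdim = 5, ydim = 5, topo = "hexagonal"))
peta <- cutree(hclust(dist(som.wines$codes[[1]])), 5)

table(peta)                                   # nodes per cluster

# Cluster of each wine: look up the cluster of the node it was mapped to.
wine.cluster <- peta[som.wines$unit.classif]
table(wine.cluster)                           # observations per cluster
```

The second table answers a slightly different question from the first: it counts the 177 wines in each cluster rather than the 25 nodes.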

plot(som.wines, type = "codes", bgcol = rainbow(5)[peta])

The plot above shows the clusters as background colors taken from rainbow(5), in cluster order: cluster 1 (red) has 7 members, cluster 2 (yellow) has 8 members, cluster 3 (green) has 5 members, cluster 4 (blue) has 4 members, and cluster 5 (purple) has 1 member.
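To check which color belongs to which cluster number, the palette can be inspected on its own. rainbow(5) returns its colors in a fixed order (red, yellow-green, green, blue, purple), and rainbow(5)[peta] picks one of them per node by cluster number. The names pal and rgb.parts are introduced here for illustration.

```r
pal <- rainbow(5)
pal                      # hex codes in cluster order 1 to 5

# Red, green and blue components of each color, one column per cluster.
rgb.parts <- col2rgb(pal)
colnames(rgb.parts) <- paste("cluster", 1:5)
rgb.parts

# Swatches labelled by cluster number.
barplot(rep(1, 5), col = pal, names.arg = 1:5, yaxt = "n")
```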

To make the clusters easier to see, boundaries can be added to the plot:

add.cluster.boundaries(som.wines, peta)

A dividing line now separates the clusters.

FINISH!

Hopefully this is useful for you…

Sources :

https://en.wikipedia.org/wiki/Self-organizing_map

https://medium.com/@986110101
