Word Embeddings

The first version of the chatbot was built with TFLearn (a high-level TensorFlow wrapper) and Python: a simple layered neural network that classifies intents from a bag-of-words representation. The disadvantage, however, is that the intents file has to be written from scratch. Synonyms outside it cannot be understood by the model, and covering them means adding more and more examples, which takes more work and is never exhaustive.
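As a rough illustration of the bag-of-words representation fed to that network, here is a minimal sketch. The vocabulary and sentence are made-up examples, not the original intents file:

```python
def bag_of_words(sentence, vocab):
    """Binary bag of words: 1 if the vocab word appears in the sentence."""
    tokens = sentence.lower().split()
    return [1 if word in tokens else 0 for word in vocab]

# Hypothetical mini-vocabulary for a greetings intent
vocab = ["hi", "hello", "bye", "thanks", "help"]
print(bag_of_words("Hello there", vocab))  # [0, 1, 0, 0, 0]
```

Any word not in the vocabulary is simply dropped, which is exactly why synonyms the intents file never mentions are invisible to the model.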

To improve on this version, I decided to explore word2vec, having heard about it. In word2vec, each word is represented by vectors (a centre vector and a context vector), and these vectors are trained so that a centre word assigns high probability to the words surrounding it. Minimizing the cost function is done by stochastic gradient descent.
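The training data for the skip-gram flavour of word2vec is just (centre, context) word pairs taken from a sliding window over the text. A small sketch of how those pairs are generated (the window size here is an arbitrary example):

```python
def skipgram_pairs(tokens, window=2):
    """Generate (centre, context) pairs from a token list using a sliding window."""
    pairs = []
    for i, center in enumerate(tokens):
        # look at neighbours up to `window` positions away on each side
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

print(skipgram_pairs(["the", "cat", "sat"], window=1))
# [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat')]
```

Each pair becomes one training example: the model nudges the vectors so the centre word predicts that context word a little better.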

The softmax function is used here to turn raw scores into probabilities (a probability distribution): the exponential ensures all the values are positive, while the normalization ensures they sum to 1.
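A minimal softmax in plain Python (subtracting the max is a standard trick for numerical stability, not part of the definition):

```python
import math

def softmax(scores):
    """Turn a list of scores into a probability distribution."""
    m = max(scores)                              # for numerical stability
    exps = [math.exp(s - m) for s in scores]     # exponentials: all positive
    total = sum(exps)
    return [e / total for e in exps]             # normalize: sums to 1

probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))  # all positive, sum is 1.0
```

The largest score gets the largest probability, but every word keeps a non-zero share.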

There is also the concept of negative sampling, which gives word2vec a different cost function from the one in the YouTube video.
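As I currently understand it, the negative-sampling objective replaces the full softmax with a handful of binary decisions: push the true context word towards the centre word, and push a few randomly sampled "negative" words away. A sketch of that loss for one training pair, using plain lists as stand-in embedding vectors (the function name and shapes are my own, not from any library):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def neg_sampling_loss(center, context, negatives):
    """Negative-sampling loss for one (centre, context) pair.

    center, context: embedding vectors as plain lists.
    negatives: list of embedding vectors for sampled negative words.
    """
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    # reward similarity with the true context word...
    loss = -math.log(sigmoid(dot(context, center)))
    # ...and penalize similarity with each sampled negative word
    for neg in negatives:
        loss -= math.log(sigmoid(-dot(neg, center)))
    return loss
```

This only scores a few sampled words per update instead of the whole vocabulary, which is what makes training fast.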


The amazing power of word vectors

More work has to be done to understand:
– what exactly the word vectors represent
– what negative sampling is
– what word2vec can be used for

Lastly, an interesting startup that brings AI to industrial applications like robotics, which do not necessarily have lots of data to train models on. It's called Bonsai, and its OpenAI Gym demo of training an agent to balance a pole is cool.

