Why does word2vec use 2 representations for each word?


I am trying to understand why word2vec's skip-gram model has 2 representations for each word: the hidden representation (the word embedding) and the output representation (also called the context word embedding). Is this just for generality, so that the context could be something other than words, or is there a more fundamental reason?

I recommend reading this article about word2vec: http://arxiv.org/pdf/1402.3722v1.pdf

They give an intuition for why there are 2 representations in a footnote: a word is not likely to appear in its own context, so you would want to minimize the probability p(w|w). But if you use the same vectors for w as context and for w as center word, you cannot minimize p(w|w) (which is computed via a dot product) if you keep the word embeddings on the unit circle.

But this is just an intuition; I don't know if there is a clear justification for it...
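To make that footnote concrete, here is a minimal numpy sketch (toy vocabulary size and dimensions, not the actual word2vec code): with a single, unit-norm embedding matrix the self-score is always the largest possible dot product on the sphere, so p(w|w) can never be pushed below the other probabilities, whereas separate center and context matrices leave the model free to make it small.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim = 5, 8

def softmax(scores):
    e = np.exp(scores - scores.max())
    return e / e.sum()

# --- tied, unit-norm embeddings: one vector per word ---
W = rng.normal(size=(vocab, dim))
W /= np.linalg.norm(W, axis=1, keepdims=True)   # keep every vector on the unit sphere

center = 0
p_tied = softmax(W @ W[center])                  # score(w, c) = v_c . v_w
# W[center] . W[center] == 1, which is the largest dot product achievable on the
# unit sphere, so p(center | center) is always a maximum -- it cannot be minimized.
print("tied:     p(w|w) =", p_tied[center], " max =", p_tied.max())

# --- separate center / context matrices, as in skip-gram ---
V = rng.normal(size=(vocab, dim))                # center-word embeddings
U = rng.normal(size=(vocab, dim))                # context-word embeddings
p_sep = softmax(U @ V[center])                   # score(w, c) = u_c . v_w
# Nothing ties u_w to v_w here, so training is free to push u_w . v_w down
# and make p(w|w) small.
print("separate: p(w|w) =", p_sep[center])
```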

IMHO, the real reason you use different representations is that you are manipulating entities of a different nature: "dog" as a context is not considered the same as "dog" as a center word, because it is not. You are basically manipulating a big matrix of (word, context) co-occurrences, trying to maximize the probability of the pairs that actually happen. In theory you could use bigrams as contexts, trying to maximize, for instance, the probability of (word="for", context="to maximize"), and you would then assign a vector representation to "to maximize". We don't do this because there would be far too many representations to compute and the matrix would be extremely sparse, but I think the idea is there: the fact that we use 1-grams as contexts is just a particular case of all the kinds of context we could use.
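As an illustration of that view, here is a toy sketch (the helper name skipgram_pairs and the context_ngram parameter are made up for this example, not part of word2vec) that extracts (word, context) pairs where the size of the context unit is configurable; the usual single-word contexts are just the context_ngram=1 special case.

```python
def skipgram_pairs(tokens, window=2, context_ngram=1):
    """Collect (center word, context) pairs; a context can span several tokens."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi - context_ngram + 1):
            if i in range(j, j + context_ngram):
                continue  # skip context spans that contain the center word itself
            pairs.append((center, " ".join(tokens[j:j + context_ngram])))
    return pairs

tokens = "we try to maximize the probability of observed pairs".split()
print(skipgram_pairs(tokens, window=2, context_ngram=1))  # usual single-word contexts
print(skipgram_pairs(tokens, window=2, context_ngram=2))  # bigram contexts, e.g. ("maximize", "try to")
```

Each distinct context string would get its own row in the context (output) embedding matrix, which is exactly why richer contexts blow up the number of representations and the sparsity.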

That's how I see it, and if it's wrong please correct me!

