Source: MachineLearningMastery.com
Context vectors are a powerful tool for advanced NLP tasks. They allow you to capture the contextual meaning of words, such as identifying the correct sense of a word in a sentence when it has multiple meanings. In this post, we will explore some example applications of context vectors. Specifically:
- You will learn how to extract contextual keywords from a document
- You will learn how to generate a summary of a document using context vectors
Let’s get started.
Applications with Context Vectors
Photo by Erik Karits. Some rights reserved.
Overview
This post is divided into two parts; they are:
- Contextual Keyword Extraction
- Contextual Text Summarization
Contextual Keyword Extraction
Contextual keyword extraction is a technique for identifying the most important words in a document based on their contextual relevance. Imagine that you have a document and want to highlight the most representative words. One way to do this is by finding the words that are most semantically similar to the document. This technique is useful for a wide range of NLP tasks, such as information retrieval, document clustering, and text summarization.
Let’s implement a simple contextual keyword extraction system by comparing each word in the document to the document as a whole:
```python
import numpy as np
import torch
from transformers import BertTokenizer, BertModel


def get_context_vectors(sentence, model, tokenizer):
    inputs = tokenizer(sentence, return_tensors="pt", add_special_tokens=True)
    input_ids = inputs["input_ids"]
    attention_mask = inputs["attention_mask"]

    # Get the tokens (for reference)
    tokens = tokenizer.convert_ids_to_tokens(input_ids[0])

    # Forward pass, get all hidden states from each layer
    with torch.no_grad():
        outputs = model(input_ids, attention_mask=attention_mask, output_hidden_states=True)
    hidden_states = outputs.hidden_states

    # Each element in hidden_states has shape (batch_size, sequence_length, hidden_size)
    # Take the first element in the batch from the last layer
    last_layer_vectors = hidden_states[-1][0].numpy()  # Shape: (sequence_length, hidden_size)

    return tokens, last_layer_vectors


def cosine_similarity(vec1, vec2):
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))


def extract_contextual_keywords(document, model, tokenizer, top_n=5):
    """Extract contextual keywords from a document"""
    # Split the document into sentences (simple split by period)
    sentences = [s.strip() for s in document.split(".") if s.strip()]

    # Process each sentence to get context vectors
    all_tokens = []
    all_vectors = []
    for sentence in sentences:
        if not sentence:
            continue  # Skip empty sentences

        # Get context vectors
        tokens, vectors = get_context_vectors(sentence, model, tokenizer)

        # Store tokens and vectors (excluding special tokens [CLS] and [SEP])
        all_tokens.extend(tokens[1:-1])
        all_vectors.extend(vectors[1:-1])

    # Convert to numpy arrays, then calculate the document vector as average of all token vectors
    all_vectors = np.array(all_vectors)
    doc_vector = np.mean(all_vectors, axis=0)

    # Calculate similarity between each token vector and the document vector
    similarities = []
    for token, vec in zip(all_tokens, all_vectors):
        # Skip special tokens, punctuation, and common words
        if token in ["[CLS]", "[SEP]", ".", ",", "!", "?", "the", "a", "an", "is", "are", "was", "were"]:
            continue
        # Compute similarity, then remember it with the token
        sim = cosine_similarity(vec, doc_vector)
        similarities.append((sim, token))

    # Sort the similarities and get the top N
    top_similarities = sorted(similarities, reverse=True)[:top_n]
    return top_similarities


# Example document
document = """
Artificial intelligence is transforming industries around the world.
Machine learning algorithms can analyze vast amounts of data to identify patterns and make predictions.
Natural language processing enables computers to understand and generate human language.
Computer vision systems can recognize objects and interpret visual information.
These technologies are driving innovation in healthcare, finance, transportation, and many other sectors.
"""

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

# Extract contextual keywords and print the result
top_keywords = extract_contextual_keywords(document, model, tokenizer, top_n=10)
print("Top contextual keywords:")
for similarity, token in top_keywords:
    print(f"{token}: {similarity:.4f}")
```
In this example, the BERT model is used to generate context vectors for each word in the document. The document vector is computed as the average of all token vectors. Alternatively, you could obtain the document vector from the embedding of the [CLS] token after feeding the entire document into the model. However, that approach is not used here because the input document may be too long for the model to process at once (BERT accepts at most 512 tokens). Instead, the document is split into sentences, and each sentence is processed separately.
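For a short document that fits within that limit, the [CLS]-based alternative might look like the following minimal sketch. The helper name get_document_vector_cls() is made up for illustration, truncation is enabled as a safeguard, and the model and tokenizer are the ones loaded in the listing above.

```python
# A minimal sketch (not used in the listing above) of a [CLS]-based
# document vector. Truncation guards against BERT's 512-token input limit.
import torch

def get_document_vector_cls(document, model, tokenizer):
    inputs = tokenizer(document, return_tensors="pt", add_special_tokens=True,
                       truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    # The [CLS] embedding sits at position 0 of the last hidden state
    return outputs.last_hidden_state[0, 0].numpy()
```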
With a vector for each word and a vector for the whole document, you compute the cosine similarity between each word and the document. The function extract_contextual_keywords() returns the top N words with the highest similarity scores, and these results are then printed.
Cosine similarity measures how close two vectors are to each other. In this case, if a word vector is close to the document vector, it is assumed to be a good representative of the document. This works because the word vectors are context-aware, as generated by the transformer model. Unlike traditional keyword extraction methods that rely on frequency (such as TF-IDF) or predefined rules (such as RAKE), this approach leverages the semantic understanding captured by the transformer model.
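To build a little intuition for the similarity measure itself, here is a tiny standalone check that uses the same cosine_similarity() function as the listing above. The vectors a, b, and c are toy values, not model outputs.

```python
import numpy as np

def cosine_similarity(vec1, vec2):
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

a = np.array([1.0, 0.0])   # toy "word" vector
b = np.array([0.9, 0.1])   # points in nearly the same direction as a
c = np.array([0.0, 1.0])   # orthogonal to a

print(cosine_similarity(a, b))  # roughly 0.99: very similar
print(cosine_similarity(a, c))  # 0.0: no similarity
```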
When you run the keyword extraction code above, you will get:
```
Top contextual keywords:
to: 0.7961
can: 0.7909
can: 0.7804
of: 0.7551
human: 0.7365
analyze: 0.7354
enables: 0.7345
computers: 0.7310
in: 0.7282
systems: 0.7153
```
To improve the result, you may consider implementing stop word removal so that common words such as “to”, “of”, and “in” are excluded from the output, as sketched below.
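One way to do this, assuming NLTK is available (it is not used in the original listing), is to filter tokens against a stop word list before computing similarities. The helper name is_keyword_candidate() is hypothetical; the check could replace the small hard-coded token list inside extract_contextual_keywords().

```python
# A sketch of stop word filtering, assuming NLTK is installed (pip install nltk)
import nltk
from nltk.corpus import stopwords

nltk.download("stopwords", quiet=True)
stop_words = set(stopwords.words("english"))

def is_keyword_candidate(token):
    # Keep alphabetic, non-stop-word tokens and skip WordPiece
    # continuation pieces such as "##ing"
    return token.isalpha() and token not in stop_words and not token.startswith("##")
```

With a filter like this applied before the top-N selection, function words such as “to”, “of”, and “in” would no longer dominate the keyword list.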
Contextual Text Summarization
Summarizing a document can be done in different ways. One of the most common approaches is to select the most representative sentences from the document—a method known as extractive summarization.
One way to perform extractive summarization is by generating a vector for each sentence and a vector for the entire document. The sentences most similar to the document are then selected. With context vectors, it is straightforward to implement this approach. Let’s do this:
```python
import numpy as np
import torch
from transformers import BertTokenizer, BertModel


def cosine_similarity(vec1, vec2):
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))


def get_sentence_embedding(sentence, model, tokenizer):
    """Sentence embedding extracted from the [CLS] prefix token"""
    # Tokenize the input
    inputs = tokenizer(sentence, return_tensors="pt", add_special_tokens=True,
                       truncation=True, max_length=512)

    # Forward pass, get hidden states
    with torch.no_grad():
        outputs = model(**inputs)

    # Get the [CLS] token embedding at position 0 from the last layer
    cls_embedding = outputs.last_hidden_state[0, 0].numpy()
    return cls_embedding


def extractive_summarize(document, model, tokenizer, num_sentences=3):
    # Split the document into sentences
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    if len(sentences) <= num_sentences:
        return document

    # Get embeddings for all sentences
    sentence_embeddings = []
    for sentence in sentences:
        embedding = get_sentence_embedding(sentence, model, tokenizer)
        sentence_embeddings.append(embedding)

    # Calculate the document embedding (average of all sentence embeddings)
    # then find the most similar sentences
    document_embedding = np.mean(sentence_embeddings, axis=0)
    similarities = []
    for idx, embedding in enumerate(sentence_embeddings):
        sim = cosine_similarity(embedding, document_embedding)
        similarities.append((sim, idx))
    top_sentences = sorted(similarities, reverse=True)[:num_sentences]

    # Extract the sentences, preserving the original order
    top_indices = sorted([x[1] for x in top_sentences])
    summary_sentences = [sentences[i] for i in top_indices]

    # Join the sentences to form the summary
    summary = ". ".join(summary_sentences) + "."
    return summary


# Example document
document = """
Transformer models have revolutionized natural language processing by introducing
mechanisms that can effectively capture contextual relationships in text. One of the
most powerful aspects of transformers is their ability to generate context-aware vector
representations, often referred to as context vectors. Unlike traditional word embeddings
that assign a fixed vector to each word regardless of context, transformer models generate
dynamic representations that depend on the surrounding words. This allows them to capture
the nuanced meanings of words in different contexts. For example, in the sentences
"I'm going to the bank to deposit money" and "I'm going to sit by the river bank,"
the word "bank" has different meanings. A traditional word embedding would assign the
same vector to "bank" in both sentences, but a transformer model generates different
context vectors that capture the distinct meanings based on the surrounding words.
This contextual understanding enables transformers to excel at a wide range of NLP tasks,
from question answering and sentiment analysis to machine translation and text summarization.
"""

# Generate a summary
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
summary = extractive_summarize(document, model, tokenizer, num_sentences=3)

# Print the original document and the summary
print("Original Document:")
print(document)
print("Summary:")
print(summary)
```
If you run this code, you will get:
```
Original Document:

Transformer models have revolutionized natural language processing by introducing
mechanisms that can effectively capture contextual relationships in text. One of the
most powerful aspects of transformers is their ability to generate context-aware vector
representations, often referred to as context vectors. Unlike traditional word embeddings
that assign a fixed vector to each word regardless of context, transformer models generate
dynamic representations that depend on the surrounding words. This allows them to capture
the nuanced meanings of words in different contexts. For example, in the sentences
“I’m going to the bank to deposit money” and “I’m going to sit by the river bank,”
the word “bank” has different meanings. A traditional word embedding would assign the
same vector to “bank” in both sentences, but a transformer model generates different
context vectors that capture the distinct meanings based on the surrounding words.
This contextual understanding enables transformers to excel at a wide range of NLP tasks,
from question answering and sentiment analysis to machine translation and text summarization.

Summary:
One of the most powerful aspects of transformers is their ability to generate context-aware
vector representations, often referred to as context vectors. Unlike traditional word
embeddings that assign a fixed vector to each word regardless of context, transformer models
generate dynamic representations that depend on the surrounding words. A traditional word
embedding would assign the same vector to “bank” in both sentences, but a transformer model
generates different context vectors that capture the distinct meanings based on the
surrounding words.
```
In this example, the function get_sentence_embedding()
is used to generate an embedding for an entire sentence by using the [CLS]
token embedding from the last layer of the transformer. The [CLS]
token is a special token prepended to the sentence, and the transformer is trained to produce an embedding that represents the entire input.
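If you want to see this for yourself, you can inspect the tokenizer output directly. The short check below is illustrative only and reuses the tokenizer loaded in the listing above.

```python
# Illustrative check (not in the original listing): with add_special_tokens=True,
# the tokenizer prepends [CLS] and appends [SEP] to the input.
ids = tokenizer("The cat sat on the mat.", add_special_tokens=True)["input_ids"]
print(tokenizer.convert_ids_to_tokens(ids))
# The first token should be '[CLS]' and the last '[SEP]'
```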
In the function extractive_summarize()
, you generate sentence embeddings for each sentence in the document and compute the document embedding as the average of all sentence embeddings. Then, you calculate the cosine similarity between the document embedding and each sentence embedding, selecting the top N sentences with the highest similarity scores.
The summary is formed by joining these top N sentences in their original order within the document. This assumes that the most semantically similar sentences are the most representative of the document.
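As a variation you might experiment with (it is not the approach used in this post), sentence embeddings can also be obtained by mean-pooling the last-layer token vectors instead of taking the [CLS] embedding. The helper name get_sentence_embedding_mean() is made up for illustration; swapping it in for get_sentence_embedding() leaves the rest of extractive_summarize() unchanged.

```python
# A hedged alternative sketch: mean-pool the last-layer token vectors
# instead of using the [CLS] embedding as the sentence vector.
import torch

def get_sentence_embedding_mean(sentence, model, tokenizer):
    inputs = tokenizer(sentence, return_tensors="pt", add_special_tokens=True,
                       truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    # Average over the sequence dimension after dropping the batch axis
    return outputs.last_hidden_state[0].mean(dim=0).numpy()
```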
Further Reading
Below are some further readings that you may find useful:
- Rose et al. (2010), “Automatic Keyword Extraction from Individual Documents” (the RAKE algorithm paper)
- Mihalcea and Tarau (2004), “TextRank: Bringing Order into Text”
- Wikipedia: TF-IDF
- Wikipedia: BM25 algorithm
- Introduction to Extractive and Abstractive Summarization
Summary
In this post, you saw how context vectors can be used in various applications. In particular, you learned:
- How to generate context vectors for a document, sentence, or word
- How to perform contextual keyword extraction to find important keywords in a document
- How to perform extractive summarization
These applications demonstrate the power and versatility of context vectors for advanced NLP tasks. By understanding and leveraging these vectors, you can build sophisticated NLP systems that capture rich semantic relationships in text.