Demystify Text Summarization with Hugging Face Open Source Model

LaxmiKumar Reddy Sammeta
2 min read · Mar 13, 2024


Introduction:
Text summarization is a core task in natural language processing (NLP): it condenses a long text into a short summary while preserving the key information. In this article, we’ll explore how to perform text summarization with an open source model, focusing on facebook/bart-large-cnn, a BART model fine-tuned on the CNN/Daily Mail news dataset. I’ll provide a step-by-step guide and include executable code examples to demonstrate the process.

Step 1: Setting Up the Environment
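Before we begin, let’s set up our Python environment and install the necessary libraries: transformers for the model and tokenizer, and PyTorch (torch), which the return_tensors="pt" setting in Step 3 relies on. The leading ! runs the command from a notebook cell; drop it when installing from a terminal:

!pip install transformers torch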

Step 2: Loading the Pretrained Model
Next, let’s load an open source model for text summarization, along with its matching tokenizer:

from transformers import BartForConditionalGeneration, BartTokenizer
model_name = "facebook/bart-large-cnn"
model = BartForConditionalGeneration.from_pretrained(model_name)
tokenizer = BartTokenizer.from_pretrained(model_name)

Step 3: Preprocessing the Data
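Now, let’s define a function that tokenizes the input text and truncates it to the 1,024-token limit used for this model:

def preprocess_text(text):
    # Tokenize to PyTorch tensors; input beyond 1,024 tokens is truncated.
    inputs = tokenizer([text], max_length=1024, return_tensors="pt", truncation=True)
    return inputs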
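
Because the tokenizer call above truncates anything past 1,024 tokens, text beyond that point never reaches the model. One possible workaround, sketched here under assumptions, is to split a long document into sentence-aligned chunks and summarize each chunk separately. The names split_into_chunks and summarize_long_text and the max_words budget are hypothetical, and the word count is only a rough stand-in for the real token count.

```python
import re


def split_into_chunks(text, max_words=700):
    """Group whole sentences into chunks of at most max_words words.

    Word count only approximates token count. A single sentence longer
    than max_words still becomes its own (oversized) chunk.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def summarize_long_text(text, summarize_fn, max_words=700):
    """Summarize each chunk with summarize_fn and join the partial summaries."""
    chunks = split_into_chunks(text, max_words)
    return " ".join(summarize_fn(chunk) for chunk in chunks)
```

Here summarize_fn is any callable that maps a string to a summary string. With the functions from this article it could be lambda chunk: generate_summary(preprocess_text(chunk)), once generate_summary from Step 4 is defined.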

Step 4: Generating Summaries
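With the preprocessed inputs, we can now generate summaries using the facebook/bart-large-cnn model. Here num_beams=4 keeps four candidate sequences at each decoding step (beam search), max_length=150 caps the summary at 150 tokens, and early_stopping=True ends the search once enough complete candidates have been found:

def generate_summary(inputs):
    # Beam search over 4 candidates; the summary is capped at 150 tokens.
    summary_ids = model.generate(inputs["input_ids"], attention_mask=inputs["attention_mask"], num_beams=4, max_length=150, early_stopping=True)
    # Decode the top-ranked sequence, dropping special tokens such as <s> and </s>.
    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    return summary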
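
To build some intuition for what num_beams changes, here is a toy beam search in plain Python. It is a simplified sketch, not the transformers implementation: the TOY_MODEL table and its probabilities are invented for illustration, and details such as length penalties are left out. It shows how keeping more than one candidate can find a sequence with higher overall probability than greedy decoding does.

```python
import math

# Invented next-token probabilities, for illustration only.
TOY_MODEL = {
    "<s>": [("the", 0.6), ("a", 0.4)],
    "the": [("tower", 0.55), ("city", 0.45)],
    "a": [("tower", 0.9), ("city", 0.1)],
    "tower": [("</s>", 1.0)],
    "city": [("</s>", 1.0)],
}


def beam_search(num_beams, max_length=5):
    """Keep the num_beams highest-scoring partial sequences at each step."""
    beams = [(["<s>"], 0.0)]  # (tokens, cumulative log-probability)
    for _ in range(max_length):
        candidates = []
        for tokens, score in beams:
            if tokens[-1] == "</s>":
                # Finished sequences are carried over unchanged.
                candidates.append((tokens, score))
                continue
            for next_token, prob in TOY_MODEL[tokens[-1]]:
                candidates.append((tokens + [next_token], score + math.log(prob)))
        # Rank all extensions and keep only the best num_beams of them.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:num_beams]
        if all(tokens[-1] == "</s>" for tokens, _ in beams):
            break
    return beams[0][0]


print(beam_search(num_beams=1))  # greedy decoding: one candidate per step
print(beam_search(num_beams=2))  # two candidates per step
```

With one beam, the search commits to "the" because it is the most likely first token, and ends with a sequence of probability 0.6 × 0.55 = 0.33. With two beams, "a" survives the first step, and "a tower" wins with probability 0.4 × 0.9 = 0.36.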

Step 5: Putting It All Together
Finally, let’s put everything together and generate a summary for a sample text:

text = "The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct."
inputs = preprocess_text(text)
summary = generate_summary(inputs)
print("Original Text:", text)
print("Summary:", summary)
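
When trying different generation settings, it can help to put a number on how much shorter each summary is than its source. The helper below is a minimal sketch. The name compression_report is hypothetical, and it counts whitespace-separated words rather than model tokens.

```python
def compression_report(original, summary):
    """Return word counts and the summary-to-original length ratio."""
    original_words = len(original.split())
    summary_words = len(summary.split())
    # Guard against an empty original to avoid dividing by zero.
    ratio = summary_words / original_words if original_words else 0.0
    return {
        "original_words": original_words,
        "summary_words": summary_words,
        "ratio": round(ratio, 3),
    }
```

Calling compression_report(text, summary) after the code above gives a quick way to compare runs with, say, different max_length or num_beams values.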

Conclusion:
In this article, we learned how to perform text summarization using the open source facebook/bart-large-cnn model. By following the steps outlined above and using the provided code examples, you can easily generate concise summaries of text data. Experiment with different inputs and parameters to tailor the summaries to your specific needs. Happy summarizing!
