Condensing long-form content into concise summaries is useful whether you are quickly scanning articles or pulling the key points out of research papers. The Hugging Face Transformers library offers a capable model for this task: BART. In this article, we will explore how you can use a pre-trained checkpoint hosted on Hugging Face, facebook/bart-large-cnn, to summarize long articles and text.
Getting Started with Hugging Face's BART Model
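Hugging Face provides a variety of models for NLP tasks like text classification, translation, and summarization. One of the most popular models for summarization is BART (Bidirectional and Auto-Regressive Transformers), a sequence-to-sequence model that pairs a bidirectional encoder with an autoregressive decoder. The facebook/bart-large-cnn checkpoint used here is BART fine-tuned on CNN/Daily Mail news articles, so it is well suited to generating short, coherent summaries of news-style text.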
Step 1: Install Hugging Face Transformers Library
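To get started with Hugging Face models, you’ll need to install the transformers library, along with a deep learning backend such as PyTorch, which the pipeline needs in order to run the model. You can do this using pip:
pip install transformers torch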
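Before loading a model, it can help to confirm that the packages are importable in the environment you are running. The following is a minimal sketch using only the standard library; the helper name `is_installed` is illustrative, and checking for `torch` assumes PyTorch is the backend you chose.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found, without importing it."""
    return importlib.util.find_spec(package_name) is not None


# Report the status of the packages this tutorial relies on.
# "torch" is an assumption: swap it for whichever backend you installed.
for name in ("transformers", "torch"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If either package is reported as missing, rerun the pip command in the same Python environment that will run the script.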
Step 2: Importing the Summarization Pipeline
Once the library is installed, you can easily load a pre-trained model for summarization. Hugging Face’s pipeline API provides a high-level interface for using models like facebook/bart-large-cnn, which has been fine-tuned for summarization tasks.
from transformers import pipeline
# Load the summarization model
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
Step 3: Running the Summarizer
Now that you have the summarizer ready, you can feed in any long text to generate a summary. Below is an example using a sample article about Dame Maggie Smith, a well-known British actress.
ARTICLE = """ Dame Margaret Natalie Smith (28 December 1934 – 27 September 2024) was a British actress. Known for her wit in both comedic and dramatic roles, she had an extensive career on stage and screen for over seven decades and was one of Britain's most recognisable and prolific actresses. She received numerous accolades, including two Academy Awards, five BAFTA Awards, four Emmy Awards, three Golden Globe Awards and a Tony Award, as well as nominations for six Olivier Awards. Smith is one of the few performers to earn the Triple Crown of Acting.
Smith began her stage career as a student, performing at the Oxford Playhouse in 1952, and made her professional debut on Broadway in New Faces of '56. Over the following decades Smith established herself alongside Judi Dench as one of the most significant British theatre performers, working for the National Theatre and the Royal Shakespeare Company. On Broadway, she received the Tony Award for Best Actress in a Play for Lettice and Lovage (1990). She was Tony-nominated for Noël Coward's Private Lives (1975) and Tom Stoppard's Night and Day (1979).
Smith won Academy Awards for Best Actress for The Prime of Miss Jean Brodie (1969) and Best Supporting Actress for California Suite (1978). She was Oscar-nominated for Othello (1965), Travels with My Aunt (1972), A Room with a View (1985) and Gosford Park (2001). She portrayed Professor Minerva McGonagall in the Harry Potter film series (2001–2011). She also acted in Death on the Nile (1978), Hook (1991), Sister Act (1992), The Secret Garden (1993), The Best Exotic Marigold Hotel (2012), Quartet (2012) and The Lady in the Van (2015).
Smith received newfound attention and international fame for her role as Violet Crawley in the British period drama Downton Abbey (2010–2015). The role earned her three Primetime Emmy Awards; she had previously won one for the HBO film My House in Umbria (2003). Over the course of her career she was the recipient of numerous honorary awards, including the British Film Institute Fellowship in 1993, the BAFTA Fellowship in 1996 and the Society of London Theatre Special Award in 2010. Smith was made a dame by Queen Elizabeth II in 1990.
"""
# Generate the summary
summary = summarizer(ARTICLE, max_length=130, min_length=30, do_sample=False)
# Print the summary
print(summary)
Output:
[{'summary_text': 'Dame Margaret Natalie Smith (28 December 1934 – 27 September 2024) was a British actress. Known for her wit in both comedic and dramatic roles, she had an extensive career on stage and screen for over seven decades. She received numerous accolades, including two Academy Awards, five BAFTA Awards, four Emmy Awards, three Golden Globe Awards and a Tony Award.'}]
As you can see from the output, the summarizer condenses the main points of the article into a short, readable format, highlighting key facts like her career longevity and accolades.
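The pipeline returns a list containing one dictionary per input, with the text stored under the `summary_text` key, as the printed output shows. To work with the summary as a plain string, you can index into that structure. The sketch below uses a hypothetical helper, `extract_summaries`, and a `sample_result` that stands in for a real pipeline result (shortened to the first sentence of the output above).

```python
def extract_summaries(result):
    """Return the plain summary strings from a summarization pipeline result."""
    return [item["summary_text"].strip() for item in result]


# Stand-in for the list returned by summarizer(...); shortened for illustration.
sample_result = [
    {"summary_text": "Dame Margaret Natalie Smith (28 December 1934 – 27 September 2024) was a British actress."}
]

summaries = extract_summaries(sample_result)
print(summaries[0])
```

With the real pipeline, you would pass `summary` from the script above in place of `sample_result`.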
Another Approach: Summarizing Text from a File
In some use cases, you might want to read the text from a file rather than a hardcoded string. Below is an updated Python script that reads an article from a text file and generates a summary.
from transformers import pipeline
# Load the summarizer pipeline
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
# Function to read the article from a text file
def read_article_from_file(file_path):
    # Read as UTF-8 so characters such as "–" and "ë" are decoded correctly
    with open(file_path, 'r', encoding='utf-8') as file:
        return file.read()
# Path to the text file containing the article
file_path = 'article.txt' # Change this to your file path
# Read the article from the file
ARTICLE = read_article_from_file(file_path)
# Get the summary
summary = summarizer(ARTICLE, max_length=130, min_length=30, do_sample=False)
# Print the summary
print(summary)
File Input:
In this case, you would need to save the article to a text file (article.txt in the example), and the script will read the content and summarize it.
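Articles read from disk are often longer than the sample above, and facebook/bart-large-cnn accepts at most 1024 input tokens, so longer text has to be truncated or split. One simple option is to split the text into chunks, summarize each chunk, and join the results. The sketch below is one way to do that, not the library's own method: the helper names are hypothetical, the 400-word chunk size is a rough assumption chosen to stay well under the token limit, and splitting on word counts can cut a sentence in half.

```python
def split_into_chunks(text, max_words=400):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_long_text(text, summarizer, max_words=400, **generate_kwargs):
    """Summarize each chunk separately and join the partial summaries.

    `summarizer` is any callable that returns a list of dictionaries with a
    "summary_text" key, such as the pipeline object created earlier.
    """
    partial_summaries = []
    for chunk in split_into_chunks(text, max_words):
        result = summarizer(chunk, **generate_kwargs)
        partial_summaries.append(result[0]["summary_text"].strip())
    return " ".join(partial_summaries)
```

With the pipeline from the script above, a call would look like `summarize_long_text(ARTICLE, summarizer, max_length=130, min_length=30, do_sample=False)`. Each chunk is summarized independently, so the combined text may repeat points or read less smoothly than a single-pass summary.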
Conclusion
BART, used through the Hugging Face Transformers library, is a practical tool for automatic text summarization. Whether you’re processing articles, research papers, or other lengthy text, the model can help you distill the information into a concise summary.
This article demonstrated how you can integrate Hugging Face’s pre-trained summarization model into your projects, both with hardcoded text and file input. With just a few lines of code, you can have an efficient summarization pipeline up and running in your Python projects.