Arabic Auto Summarize Class


Khaled Al-Sham'aa
This class identifies the key points in an Arabic document for you to share with others or quickly scan. The class determines key points by analyzing an Arabic document and assigning a score to each sentence. Sentences that contain words used frequently in the document are given a higher score. You can then choose a percentage of the highest-scoring sentences to display in the summary. The "ArAutoSummarize" class works best on well-structured documents such as reports, articles, and scientific papers.

The "ArAutoSummarize" class cuts wordy copy to the bone by counting words and ranking sentences. First, it identifies the most common words in the document (barring stop words such as "هو", "هي", "في", "حتى", "من" and the like) and assigns a "score" to each word: the more frequently a word is used, the higher its score.

Then, it "averages" each sentence by adding the scores of its words and dividing the sum by the number of words in the sentence: the higher the average, the higher the rank of the sentence. The class can summarize a text to a specific number of sentences or to a percentage of the original copy.
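The two steps above can be sketched in a few lines of Python. This is a minimal illustration of the frequency-and-average idea only, not the class's actual implementation: the function names, the five-word stop list, the sentence-splitting rule, and the default ratio are all assumptions made for the example.

```python
import re
from collections import Counter

# Assumed, illustrative stop list: only the five words quoted in the text.
STOP_WORDS = {"هو", "هي", "في", "حتى", "من"}


def split_sentences(text):
    """Split on Latin and Arabic sentence terminators (a simplification)."""
    parts = re.split(r"[.!?؟\n]+", text)
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence):
    """Split a sentence into words on whitespace and common punctuation."""
    return re.findall(r"[^\s،؛,;:\"'()]+", sentence)


def summarize(text, count=None, ratio=0.25):
    """Return the top-ranked sentences, kept in their original order.

    count: number of sentences to keep; if None, ratio (a fraction of the
    sentence total) is used instead, keeping at least one sentence.
    """
    sentences = split_sentences(text)
    tokens = [tokenize(s) for s in sentences]

    # Word score: how often the word occurs in the document (stop words skipped).
    freq = Counter(w for words in tokens for w in words if w not in STOP_WORDS)

    # Sentence score: sum of its word scores divided by its word count.
    # Stop words contribute 0 but still count toward the sentence length.
    scores = [
        sum(freq[w] for w in words) / len(words) if words else 0.0
        for words in tokens
    ]

    if count is None:
        count = max(1, round(len(sentences) * ratio))

    # Rank by score (ties keep document order), then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:count])]
```

Calling summarize(text, count=3) keeps the three highest-ranked sentences, while summarize(text, ratio=0.1) keeps roughly a tenth of them. Returning the chosen sentences in document order, rather than in rank order, is a design choice of this sketch so that the summary still reads as running text.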

We use a statistical approach, with some attention paid to:

  • Location: leading sentences of paragraph/document, title, introduction, and conclusion.
  • Fixed phrases: "خصوصا", "نتيجة", "خلاصة", "تحقيقات", "هام", in-text summaries, etc.
  • Frequencies of words, phrases, proper names.
  • Contextual material: query, title, headline, initial paragraph.
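As a rough illustration of how location and fixed phrases might be combined with a frequency-based score, the sketch below adds a bonus to a sentence's base score. The function name, the bonus weights, and the choice to treat the first and last sentences as stand-ins for the introduction and conclusion are assumptions for the example; the source does not state how the class weights these signals.

```python
# Cue phrases quoted in the list above; how the class uses them is not stated.
CUE_PHRASES = ("خصوصا", "نتيجة", "خلاصة", "تحقيقات", "هام")


def adjust_score(base_score, sentence, index, total,
                 position_bonus=0.5, cue_bonus=0.5):
    """Raise a sentence's base score for position and cue phrases.

    index/total: zero-based position of the sentence and the sentence count.
    position_bonus, cue_bonus: hypothetical weights chosen for illustration.
    """
    score = base_score

    # Location: first and last sentences stand in for introduction/conclusion.
    if index == 0 or index == total - 1:
        score += position_bonus

    # Fixed phrases: plain substring match, so a cue word embedded in a
    # longer word also counts (a known limitation of this sketch).
    if any(phrase in sentence for phrase in CUE_PHRASES):
        score += cue_bonus

    return score
```

A sentence in the middle of the document with no cue phrase keeps its base score unchanged, so the adjustment only ever promotes sentences; it never demotes them.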

The motivation for this class is the range of applications for key phrases. The point of the list below is that there are many uses for key phrases, so a class for automatically generating good key phrases should have a sizable market.

  • Mini-summary: Automatic key phrase extraction can provide a quick mini-summary for a long document. For example, it could be a feature of a web site; just click the summarize button when browsing a long web page.

  • Highlights: It can highlight key phrases in a long document, to facilitate skimming the document.

  • Author Assistance: Automatic key phrase extraction can help an author or editor who wants to supply a list of key phrases for a document. For example, the administrator of a web site might want to have a key phrase list at the top of each web page. The automatically extracted phrases can be a starting point for further manual refinement by the author or editor.

  • Text Compression: On a device with limited display capacity or limited bandwidth, key phrases can be a substitute for the full text. For example, an email message could be reduced to a set of key phrases for display on a pager; a web page could be reduced for display on a portable wireless web browser.

This list is not intended to be exhaustive, and there may be some overlap in the items.