Topic on Talk:Page Previews/API Specification

Is Sentence Boundary Detection (SBD) required?

Phuedx (WMF) (talkcontribs)

Given the current definition of an intro, it's not clear that we'll need to return a number of sentences as before. AFAIK the apps request 5 sentences and Page Previews requests 525 characters.

Should the intro be limited to 5 sentences if the first paragraph of the lead section is longer?

Jdlrobson (talkcontribs)

I was working on the basis that we will only consider the first paragraph, which means sentence detection is not necessary.

I'm fairly confident the first paragraph is enough for a summary, and I would hate and push back strongly against introducing this kind of technical complexity.
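
To make the first-paragraph idea concrete, here is a minimal sketch in Python. It assumes the lead section is already available as plain text with paragraphs separated by blank lines; the function name `first_paragraph` and that input format are illustrative assumptions, not part of any existing API.

```python
import re


def first_paragraph(lead_text: str) -> str:
    """Return the first non-empty paragraph of a plain-text lead section.

    Assumption: paragraphs are separated by one or more blank lines.
    """
    for block in re.split(r"\n\s*\n", lead_text.strip()):
        # Collapse line breaks and repeated spaces inside the paragraph.
        paragraph = " ".join(block.split())
        if paragraph:
            return paragraph
    return ""
```

No sentence boundaries are inspected at any point; the only structure relied on is the paragraph break.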

Phuedx (WMF) (talkcontribs)

@Jdlrobson:

I would hate and push back strongly against introducing this kind of technical complexity.

I hope that your pushback would involve weighing up the pros and cons of the approach (e.g. an obvious pro is minimising the amount of data that we're sending and that clients are receiving, which is a genuine concern) and investigating how complex existing solutions actually are.
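
As a rough reference point for that investigation, the sketch below shows a deliberately naive, regex-only sentence splitter in Python. The function name `naive_sentences` and the default limit of 5 are illustrative assumptions; this is not how any existing extension or client does it.

```python
import re
from typing import List


def naive_sentences(text: str, limit: int = 5) -> List[str]:
    """Split text after '.', '!' or '?' followed by whitespace.

    Deliberately naive: abbreviations such as "Dr." or "St." are
    treated as sentence ends.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part][:limit]


# The abbreviations cause two spurious splits here.
print(naive_sentences("Dr. Smith was born in St. Louis. He wrote books."))
# → ['Dr.', 'Smith was born in St.', 'Louis.', 'He wrote books.']
```

The splitter itself is a few lines long; the complexity being debated comes from handling cases like the abbreviations above correctly.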

Jdlrobson (talkcontribs)

The examples on http://jdlrobson.com/summaries show that relying on the first paragraph only generates, for the most part, shorter summaries than the existing approach.

At worst they may be double the length.