User:Brooke Vibber/Media rendering encapsulation

Per notes on bug 56304, it would be nice to have a clean interface for getting the HTML rendering of a media file used inline on-wiki. This would allow Parsoid to include appropriate renderings without having to replicate MediaWiki's entire MediaHandler stack and all potential plugins.

Use cases

Fetching a rendering placeholder when creating a Parsoid rendering for use in VisualEditor or a viewer
Fetching a rendering placeholder to add in VisualEditor when adding a media file
- Client should be able to pass filename & parameters to MediaWiki and get an HTML rendering back
- May need to expose the ResourceLoader modules that VE must ensure are loaded for fancier media, or should it just show fallback content?
- Parsoid needs to be able to reparse the rendering back to wikitext!
- VisualEditor needs to be able to extract the target media and parameters so they can be modified!
~~Providing output for embedding on blogs or social networks~~
- Note we currently expose embedding for audio/video by adding a parameter on the File: description page and embedding that in an iframe; it's probably fine to just expose that more explicitly for most offsite embedding.
  - that means we can just output <iframe>s for machine-readable embedding instructions, as we do in the human-readable cut-n-paste in TMH
~~Providing output for rendering in a Wikipedia reader application~~
- the upcoming Wikipedia mobile app will want special native handling of images, audio, and video, but should still be able to handle "other" media plugins that may come in the future and have fancy custom HTML rendering
  - that can probably piggyback on an iframe embedding interface too though
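As a rough sketch of the machine-readable embedding instructions mentioned above, the snippet below builds an <iframe> pointing at a File: description page with an embed parameter added. The base URL, the helper name iframe_embed_html, and the default parameter embedplayer=yes are assumptions for illustration, not a confirmed interface.

```python
from html import escape
from urllib.parse import quote, urlencode


def iframe_embed_html(base_url, filename, width, height, embed_params=None):
    """Build an <iframe> snippet pointing at a File: description page.

    embed_params is whatever query parameter switches the page into
    embed mode; the default below is an assumption, not a confirmed name.
    """
    if embed_params is None:
        embed_params = {"embedplayer": "yes"}  # assumed parameter name
    # Page titles use underscores in URLs; quote() escapes the rest.
    title = quote("File:" + filename.replace(" ", "_"), safe=":")
    src = f"{base_url}/wiki/{title}?{urlencode(embed_params)}"
    return (
        f'<iframe src="{escape(src, quote=True)}" '
        f'width="{int(width)}" height="{int(height)}" '
        f'frameborder="0" allowfullscreen></iframe>'
    )


# wiki.example.org is a placeholder host.
snippet = iframe_embed_html("https://wiki.example.org", "My video.webm", 640, 360)
print(snippet)
```

A native app or offsite embedder would only need the filename and a size to produce this, which is what makes the iframe route attractive as a fallback for media types it does not handle itself.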

Media cases

Simple images
- output an <img>
  - break down the parameters, which may include customizations (size, page of PDF/DjVu/TIFF, SVG language)
  - selection of backend resource(s) to put in the @src / @srcset attributes
  - size selection
  - Do we need to add any semantic data for Parsoid on the <img>, or in a wrapper? (eg, what marks this as belonging to a certain media file)
Audio/video
- output an <audio> or <video> depending on the media type
  - note that media type may vary within a file type -- .ogg files may be audio or video.
  - break down the parameters, which may include customizations (size, thumb time, ?)
  - selection of backend resource(s) to put in the <source> elements
  - selection of backend resource(s) to put in an <img> thumbnail
  - size selection
  - Do we need to add any semantic data for Parsoid on the <audio>/<video>, or in a wrapper? (eg, what marks this as belonging to a certain media file)
  - Do we need a hook to indicate that certain ResourceLoader modules must be loaded? (e.g., to ensure that TimedMediaHandler has a chance to process the video if it's loaded dynamically during a page preview or edit)
"other custom rendering"
- could be <div>, <iframe>, <embed> or who knows what!
- embedded 3D model viewers in Java applets? offsite video in an iframe? interactive JavaScript game??
- Need to be able to hook into indicating that certain ResourceLoader modules must be loaded?
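The cases above could be summarized as a dispatch on media type that returns both the markup and any ResourceLoader modules it needs. This is a minimal sketch under stated assumptions: render_media, the URLs, and the module name ext.example.player are all hypothetical, and nothing here reflects how MediaHandler or Linker actually divide the work.

```python
from html import escape


def attr(value):
    """Escape a value for use inside a double-quoted HTML attribute."""
    return escape(str(value), quote=True)


def render_media(media_type, width, thumbs=None, sources=None, poster=None):
    """Return (html, modules) for one inline media use.

    thumbs:  {density: url} for images, e.g. {1: ..., 2: ...}
    sources: [(url, mime_type)] for audio/video
    """
    if media_type == "image":
        # The 1x thumbnail goes in src; higher densities go in srcset.
        srcset = ", ".join(
            f"{url} {density}x"
            for density, url in sorted(thumbs.items())
            if density != 1
        )
        html = f'<img src="{attr(thumbs[1])}" width="{int(width)}"'
        if srcset:
            html += f' srcset="{attr(srcset)}"'
        return html + ">", []
    if media_type in ("audio", "video"):
        # The caller decides audio vs. video; an .ogg file may be either.
        inner = "".join(
            f'<source src="{attr(url)}" type="{attr(mime)}">'
            for url, mime in sources
        )
        extra = ""
        if media_type == "video":
            extra = f' width="{int(width)}"'
            if poster:
                extra += f' poster="{attr(poster)}"'
        html = f"<{media_type}{extra} controls>{inner}</{media_type}>"
        return html, ["ext.example.player"]  # hypothetical module name
    # "Other custom rendering" would need a plugin hook, not sketched here.
    raise ValueError(f"no built-in rendering for {media_type!r}")


img_html, img_modules = render_media(
    "image", 220, thumbs={1: "/thumb/220px-A.png", 2: "/thumb/440px-A.png"}
)
vid_html, vid_modules = render_media(
    "video", 320, sources=[("/transcoded/A.webm", "video/webm")], poster="/thumb/A.jpg"
)
print(img_html, img_modules)
print(vid_html, vid_modules)
```

Returning the module list alongside the HTML is one possible answer to the ResourceLoader question; the caller can then decide whether to load the modules or settle for fallback content.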

Machine-readability

Parsoid will need to be able to take these renderings and turn them back into wikitext, and VisualEditor will need to expose editing of the parameters in a dialog box UI, so there needs to be enough semantic info in there to reconstruct the filename and parameters etc.

See existing handling of images for details; roughly something like this:

<figure or span typeof="mw:Image"> <!-- or mw:Image/Thumb, mw:Image/Frame etc -->
 <a or span><img src="..."></a or span>
 <figcaption (optional)>....</figcaption>
</figure or span>

We can imagine an audio/video entry looking similar:

<figure or span typeof="mw:Image"> <!-- or mw:Image/Thumb, mw:Image/Frame etc -->
 <span>
  <video or audio resource="..." ...>
   <source src="..." type="..." ...>
   ...
   (fallback content with download and help link)
  </video or audio>
 </span>
 <figcaption (optional)>....</figcaption>
</figure or span>

Things to note for audio/video:

there's never a clickable <a> link around audio/video elements, since they must themselves be clickable
thumbnail image URL appears as 'poster' attribute, not 'src'
actual media sources appear in multiple <source> elements; there will usually be multiple entries due to presence of transcodes
availability of all those derived resources may change over time (during processing or after a new version of the media file is uploaded)
some fallback text content (with download & help links) appears inside the audio/video element.
at runtime, more markup may be added via JavaScript to make the player fancier, or to provide compatibility with browsers that lack native support. We need to make sure this processing doesn't run in the editor and that the added markup isn't saved back by mistake.
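
To illustrate the machine-readability requirement, here is a minimal sketch that reads a rendering shaped like the audio/video example above and recovers the media target, poster, and sources. The sample markup, its URLs, and the MediaExtractor class are assumptions for illustration; the typeof value is simply copied from the example above, and the real attribute layout is still an open question.

```python
from html.parser import HTMLParser


class MediaExtractor(HTMLParser):
    """Collect the fields an editor would need from one media rendering."""

    def __init__(self):
        super().__init__()
        self.typeof = None
        self.element = None
        self.resource = None
        self.poster = None
        self.sources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("figure", "span") and "typeof" in attrs:
            # Wrapper element carrying the semantic type.
            self.typeof = attrs["typeof"]
        elif tag in ("audio", "video"):
            self.element = tag
            self.resource = attrs.get("resource")
            self.poster = attrs.get("poster")
        elif tag == "source":
            # One entry per original file or transcode.
            self.sources.append((attrs.get("src"), attrs.get("type")))


# Hypothetical rendering; file names and paths are placeholders.
sample = """
<figure typeof="mw:Image/Thumb">
 <span>
  <video resource="./File:Example.webm" poster="/thumb/Example.jpg">
   <source src="/transcoded/Example.360p.webm" type="video/webm">
   <source src="/transcoded/Example.360p.ogv" type="video/ogg">
   Download and help links go here.
  </video>
 </span>
 <figcaption>An example clip</figcaption>
</figure>
"""

extractor = MediaExtractor()
extractor.feed(sample)
print(extractor.typeof, extractor.element, extractor.resource)
print(extractor.poster, extractor.sources)
```

Since the set of <source> entries can change as transcodes appear, a round-trip to wikitext would presumably key on the resource attribute and the parameters rather than on the source list.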

Questions:

Where do all the parameters go??? (size, start time, etc)
What about other arbitrary media types? Use iframes, or divs, or mystery arbitrary stuff?

API

We could add a parameter to prop=imageinfo to provide an HTML rendering
- note that it already exposes thumbnail rendering but assumes that "all the world's an image" and only gives you back a 1x-density thumbnail URL and size in CSS pixels -- insufficient for fancier media types or high-density displays.
- iiprop=thumbhtml -> html: "<img src=\"blah\">" or "<video><source blah></video>"?
- (what about ResourceLoader modules?)
  - alternately, we could have a standard registry for "content enhancement modules" that checks for certain classes or elements, then loads specific modules as necessary, and just expose the stub JS in embedding?
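
A minimal sketch of what such a request might look like from a client. action=query, prop=imageinfo, iiprop, and iiurlwidth are existing API parameters; thumbhtml is the value proposed above and does not exist yet, and the endpoint URL and helper name are placeholders.

```python
from urllib.parse import urlencode


def thumbhtml_request_url(api_url, filename, width):
    """Build a prop=imageinfo query asking for the proposed HTML rendering."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "imageinfo",
        "titles": "File:" + filename,
        # 'thumbhtml' is the proposal from this page, not an existing value.
        "iiprop": "url|thumbhtml",
        "iiurlwidth": int(width),
    }
    return api_url + "?" + urlencode(params)


# wiki.example.org is a placeholder host.
url = thumbhtml_request_url("https://wiki.example.org/w/api.php", "Example.webm", 320)
print(url)
```

How the response would carry the HTML string, and whether it would also list ResourceLoader modules, is left open here just as it is in the notes above.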

Internals

Need to look further into the details:
How much of standard image rendering is in MediaHandler and how much is in Linker?
Do we need to rearrange anything to make encapsulable HTML fetchable?
Do we need to rearrange anything to make list of RL modules etc fetchable?
Double-check that handling of frame generation is outside of the actual image handling.... or is it?