Jump to content

User:SBailey (WMF)

From mediawiki.org

Role at Wikimedia Foundation:

  • Parsoid and extensions developer

Hopes for Wikimedia platform futures:

  • help gather requirements, design and create a way to allow editors to author interactive models and simulations in the mediawiki platform

Areas of professional interest:

  • empowering non-programmers to create programs
  • programming language design
  • electronics and embedded systems
  • mechatronics, robotics and automation
  • agriculture automation systems for aquaponics and other farming techniques

Experience:

  • cross-platform, distributed computing, functional array programming language and platform
  • spreadsheet cell engine internals developer (Lotus 123 and Borland Quattro Pro)
  • math education platform and geometry manipulation learning environment for IBM
  • X86 CPU test generating and cosimulation control C interpreter language for Transmeta
  • graphics primitive libraries
  • embedded systems development of hardware, firmware and cloud control processes

Current hobbies:

  • Farm and garden construction and operation including automation
  • Alternative energy systems including photo voltaic solar, solar thermal capture, ground source heat storage and battery inverter systems
  • various PC and playstation video games

Pets:

  • Chocolate Labrador Retriever (Bodhi) - born 4/2018 - adopted 6/2018 from AKC breeder
  • Staffordshire Terrier/Pit Bull - born 11/2021 - rescued 6/2022 from Rescue Ranch

Residence:

  • modest home on 40 acres in Fort Jones California, 26 acres of which is currently a wheat field, the rest sparse oaks on sloping hill side


Important considerations for wikimedia development:

For wikimedia maintenance scripts which need to operate on whole tables, the use of batching is essential and a recommended method is the keyset pagination method, not the LIMIT OFFSET method.

The LIMIT OFFSET approach slows query down when a size of the data is very large because it requires the database read through the records until it reaches the offset #, which cannot use an index or just skip ahead a certain number of records as deleted records will not count as part of the offset. The approach which can use an index to aid in batching efficiently is called Keyset pagination. It requires a unique id in your query, which you can use as a bookmark to point to the last row of the previous page. The next page is fetched using the last bookmark which uses the indexed key field as the start point. For instance:

$lastID = 0;
$batchSize = 1000;

SELECT linter_id, linter_params
FROM linter
WHERE linter_id > $lastID
ORDER BY linter_id ASC
LIMIT $batchSize;

Which when rendered using the wikimedia database abstraction class is:

$queryLinterTable ->table( 'linter' )
    ->fields( [ 'linter_id', 'linter_params' ] )
    ->where( [ 'linter_id > ' . $lastElement ] )
    ->orderBy( 'linter_id', selectQueryBuilder::SORT_ASC )
    ->limit( $batchSize )
    ->caller( __METHOD__ );
$results = $queryLinterTable->fetchResultSet();

foreach ( $results as $row ) {
    $linter_id = intval( $row->linter_id );
    $lastElement = $linter_id;
    // do the maintenance logic and update on each row
}

If the result set above returns the last row with linter_id as 12345, you can use it to fetch the next page which will then use the $lastElement value of 12345:

SELECT linter_id, linter_params
FROM linter
WHERE user_id > 12345
ORDER BY user_id ASC
LIMIT 1000;

This code section will live inside an enclosing loop to process each batch and probably sleep a second between batches and call $lbFactory->waitForReplication(); to allow the table updates to be propagated to the database replicas.