Wikimedia Developer Summit/2018/Knowledge as a Service
Dev Summit '18 https://etherpad.wikimedia.org/p/devsummit18
Knowledge as a Service https://phabricator.wikimedia.org/T183314
- See task for session description, goals, and pre-reading notes
DevSummit event
- Day & Time: Monday, 10:10 am – 12:30 pm
- Room: Tamalpais
- Facilitator: Deb
- Notetaker(s): quiddity, halfak, Leila, Franziska, anne, subbu, SJ
Recap: During this discussion, as well as a follow-up discussion, we identified the following as particularly salient points.
- We need better APIs to access content. The Platform Evolution program is taking steps in this direction.
- We can benefit from much better documentation, examples, and technical writing. There's an increasing emphasis in this area by way of the work of Technical Writers (cf. Build Technical Community program)
- We perceive a need for better structuring of the content with a standardized means of capturing semantic information in wikitext. One way of addressing this might be a TechCom discussion.
- There are still open questions about who builds what. A good start would be to better understand use cases.
- There's an opportunity to build into discussions with partners the ability for them to contribute and play a role in the contribution workflow. This is more a front-of-mind consideration as discussions occur with current or future partners.
- An opportunity exists to survey other organizations who provide their content as open data, to find out how they do it and what lessons they have learned. We haven't taken an action step here, but it's documented here for future consideration.
Session Notes:
- Get more knowledge out there, to more people
- Many orgs do this: Wikimedia, Freebase, search engines, Wolfram Alpha
- e.g. People query Wikidata and use that in their projects. E.g. [?], [?], Wikigenomes, Monumental shows locations of historic interest, Eurowings shows info about places you're flying over, Inventaire shows info from [?], Politifact, YLE (Finnish broadcasting agency) using Wikidata to tag their articles in multiple languages, Quora using our ontology to clean up their own and fill in gaps, Ask Platypus, WikiShootMe shows places with articles that need images, Guessr, a geolocation guessing game, Dungeon of Knowledge, a roguelike game that walks through domains of knowledge
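- [editorial sketch, not said in the session: most of the reuses above start from a query against the public Wikidata Query Service SPARQL endpoint. A minimal Python example; the item/property IDs (Q4989906 "monument", P31 "instance of", P625 "coordinate location") and the User-Agent string are illustrative]
    import requests

    WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

    # Ten monuments with coordinates -- roughly the raw input a map tool
    # like WikiShootMe or Monumental needs.
    QUERY = """
    SELECT ?item ?itemLabel ?coord WHERE {
      ?item wdt:P31 wd:Q4989906 ;
            wdt:P625 ?coord .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    LIMIT 10
    """

    response = requests.get(
        WDQS_ENDPOINT,
        params={"query": QUERY, "format": "json"},
        headers={"User-Agent": "KaaS-notes-example/0.1"},
    )
    response.raise_for_status()
    for row in response.json()["results"]["bindings"]:
        print(row["itemLabel"]["value"], row["coord"]["value"])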
- Our contents: text, images, videos, sounds, structured data, ontologies, tags, lua modules, templates, gadgets
Questions for this session
- How do we make this easy?
- Moritz: As a Wikidata user, the biggest problem is how to find what is already there, and relate to it. Big growth in the number of properties, but they are not currently very connected. Need a new UI for creating items. E.g. perhaps I want to create a new connection instead of a new item. Would need to implement something for every property -- would not scale. We need some notion of concept sharpness. E.g. coming from mathematics -- there we have a definition for everything. Wikidata items are created from Wikipedia pages, which do not have strong definitions. Eigenvalues & eigenvectors are very different. Does a Wikidata item represent a Wikipedia article or something more specific? An interface should help people contribute [in this context].
- Summary by Adam: I think I heard you say: Modeling user-interfaces would be helpful for building out structure of Wikidata. You want to get more clarity around the modeling component, the specificity ... May be you can talk about why this is important?
- Moritz: We are building a Q&A system for maps, and to find out that things are related, it's important not only to know what these things are, but also how they are related. In theory, we have a lot of properties that could be used, but they are not used consistently. Different people use different ways to represent the models and input the data.
- Adam: Is this a broadly shared view? Is there [?]
- Lucie: Not only are the properties a problem, but also the class structure. This is quite a big problem, and I'm not sure how to address it. It's more a community problem, not a big technology problem.
- Lydia: Wikidata is very centered around [?] concept. It's hard to help people see the bigger picture. If you have ideas about how to make this clearer...
- tpt: [?] [he said something along the lines of: it would be too complex to add a description to every item; we would be building something that resembles an AI, which would be way too complex?]. We need to expand the discussion to other areas, e.g. structured data on Commons, Wikipedia articles, Wikisource, and beyond.
- Lucie: Interconnected with the other content, since we can't even identify all the people in wikidata, e.g. all humans are humans, adding more content to that could create a mess when we try to interlink it with Commons data.
- tpt: have to find a balance between flexibility & precision
- Lydia: to some extent we can fix the mess, but there will always be a mess.
- Subbu: I want to slightly switch topics. Wikitext will continue to be important for writing content on our projects. How do you liberate data from content? You have the entire encyclopedia. How do you liberate data from wikitext? How do you liberate the metadata from these pages [??] One thing Parsoid has done is to impose a structure at the very basic level, which helps clients extract some structure / data from WP content without needing to parse wikitext. What can we do to wikitext as a language to make it possible to expose data from the article? For example, templates have exposed structure and brought consistency. What can we do with wikitext as a language to expose more data?
- tpt: I think you're saying: we want to move wikitext from presentation to structure. A user writes about a specific person, and links to [?]
- Subbu: Very broad questions. I have a lot of thoughts around templates ( <--- This. Almost every template captures structured data... ) , but I am also asking if there are other ideas around wikitext as a language.
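- [editorial sketch: Subbu's point that Parsoid already imposes extractable structure can be made concrete. Parsoid HTML from the public REST API marks each transclusion with typeof="mw:Transclusion" and stores the template name and parameters as JSON in data-mw. A minimal Python example, assuming requests and beautifulsoup4 are installed; the page title is illustrative]
    import json
    import requests
    from bs4 import BeautifulSoup

    title = "Douglas_Adams"  # illustrative page
    html = requests.get(
        f"https://en.wikipedia.org/api/rest_v1/page/html/{title}",
        headers={"User-Agent": "KaaS-notes-example/0.1"},
    ).text

    soup = BeautifulSoup(html, "html.parser")
    # Parsoid marks transclusions and stores template name + params in data-mw.
    for node in soup.find_all(attrs={"typeof": lambda v: v and "mw:Transclusion" in v}):
        data_mw = json.loads(node.get("data-mw", "{}"))
        for part in data_mw.get("parts", []):
            if isinstance(part, dict) and "template" in part:
                tpl = part["template"]
                print(tpl["target"]["wt"], sorted(tpl.get("params", {}).keys()))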
- Giuseppe: Subbu said part of what I wanted to say. If you look at the examples shown, a lot of them were focused on Wikidata and Commons. Think about Hovercards in a WP page. I think other people are not using these tools because we're not structuring and publicizing those APIs for external use. We should prepare everything we do to be used by third parties. For example, a unified API that would expose this kind of information. This would empower other people to use WP data. This is something we can do easily, but it requires a shift in how we think internally.
- Adam: There's extraction happening already. How do we make it better? [editorial: To what extent should editors be doing something to ease those extractions? Is that scalable?] Also talking about an API, do we consider re-use as a first-class consideration?
- Giuseppe: Basically yes. If we think of MCS (mobile), a lot of what it does is aggregate info from articles into smaller pieces that can be used in the mobile app. We should be thinking about how our services can be useful to everyone else, and exposing those things in a documented, clear, public API. We are not doing that consistently now.
- Adam: We definitely don't do that in a consistent manner. Back to the question about asking editors to rewrite things for this purpose: is it sustainable/scalable to do that?
- Aaron: 2 things. 1) I can show you a cool example of knowledge as a service in use, built by one of my collaborators: http://nokomis.macalester.edu/wmf_en/static/iui2017.html Shows relationships between browsing patterns for different topics on Wikipedia. Shows gaps and quality around a certain topic. This was done by a researcher, free to us, we can use it. Working on public APIs to help editors prioritize work. 2) Riffing off Giuseppe: certain aspects of our data have been around for a long time and are well used. We know people love dumps. Direct SQL access was a huge hit. People who make tools want more access to the APIs that allow people to query and make changes to the content. We could take these kinds of affordances, group them together, and build requirements for people building new types of Wikimedia content stores.
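- [editorial sketch: the "public APIs to help editors prioritize work" Aaron mentions include ORES. A minimal call; the revision ID and model names are illustrative]
    import requests

    resp = requests.get(
        "https://ores.wikimedia.org/v3/scores/enwiki",
        params={"models": "damaging|articlequality", "revids": "123456789"},
        headers={"User-Agent": "KaaS-notes-example/0.1"},
    )
    resp.raise_for_status()
    # The response is keyed by wiki, then revision ID, then model.
    scores = resp.json()["enwiki"]["scores"]["123456789"]
    for model, result in scores.items():
        print(model, result.get("score", {}).get("prediction"))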
- James: Top-level, 5 years time, how're we going to achieve the strategy goals? What role do we see for WMF, Affiliates, and the movement? "You should come to our tools, our content, our site"? Or are we aiming at global dispersion, where content appears everywhere, with small tags/logos crediting us at bottom? - for that we'd need more APIs and documentation and outreach and educational materials. Or our APIs but on our platform (Toolforge)? Somewhere in the middle would be [a bit of both?]
- Lydia: I see so many things people do with Wikidata and the other content we have. We didn't sit down and build this technology, someone else did. And we will not know what people can do with our content. So we need to open up the data so people can do what they want to do with it. Example: Quora is using the Wikidata ontology to clean up their ontology. We had no prediction of this, but they did it because what we have is valuable for exactly this task. If we hadn't made it available, they couldn't do that.
- Adam: trust that opening the access will yield interesting results
- Corey: we think there's value in making things structured, but what's the use case? If we're trying to power 3rd parties, that directs us and helps us get there faster. We have to prioritize certain things, so we have to figure out what are the things we want to empower people to do, first. I hope we can get some sort of agreement.
- Dan A: Think about process: it started as someone saying it would be cool to release this dataset. After we realized the good idea was there, it took a year to productionize it. Need to understand our limitations; we have small resources. What kind of stuff would we have to prioritize to make that happen? Important exercise to do, even if it seems far in the future. We don't want to rely on serendipity for good things to arise.
- Ariel: I'm going to make the hard sell that knowledge as a service needs to be, first and foremost, knowledge reused. By that I mean, people might come to our site or they might go to a million other sites. And people will start innovating on that, in ways that we would never do on our sites. Our job should be to support and facilitate producing knowledge that has the flexibility and the structures for people to do this. If that means structuring wikitext in a really loose way, we should be doing that. For example, I don't think we need to see what the perfect map will be, but we do need to understand the minimum they need from us to go do that and start experimenting.
- Lucie: I wanted to add to this with two examples, the first as a question. Most people I work with who use NLP first use DBpedia, to get the parsed-out version of Wikipedia. Do we want to support third parties to offer some services that we don't want to offer, or do we want to take over the ownership of such initiatives? E.g. Wikidata works with triples, and I ask [Person] to provide me with info many times a week. Other people do not have this direct access.
- Lydia: One very good example was WDQS that was done by Magnus as a proof of concept showing that querying Wikidata is key. The team was very busy at the time, so Magnus developed it, lots of people used it, and eventually someone like Stas stood up and sat with Magnus and took over.
- Matt F: We talked about Hovercards and Maps visualization. We should welcome all these, but need to focus on core mission, the creating and sharing of knowledge. We're never going to be able to reach everyone in the world - limitations of internet access or video-impossibility. Should support things like kiwix and dumps. We're a wiki - the creation/curation of knowledge. That affects [?]. E.g. people using wikiwand should be able to edit and discuss those pages - they're unlikely to ever implement it themselves, so we need structured editing and discussion systems so that people can contribute back, no matter where they're reading it from.
- SJ: An essential piece is having a place where everyone can share their data. I wake up regularly thinking about the Freebase migration. There's still a huge backlog of content to integrate. From the perspective of a person who wants to find data of the world, it would be nice if it was all the same thing and it was automatically mapped. If Wikidata means to be *the* way that people access free data, then that's something different and will need different people helping with schema migration, etc.
- See also: Adding high-volume facets like translations [from a translation firehose/network], sourced properties [from a public db; at libraries, in OSM, &c]
- tpt: OSM has content and contributions and APIs; I believe these projects have APIs for getting info from the database into many locations [???]. Do we want to go in a similar direction, or continue with a single website/interface/etc, or something in the middle?
- Lydia: are there other similar examples we should look at? Musicbrainz, OSM, what else?
- Newsrooms -- BBC, &c using permalinks to WP infoboxes as where they store structured results of research
- tpt: maybe google with their crowd-sourced aspects...
- Giuseppe: Google is a player in the field. I do think we should be using Algorithms, but that shouldn't be our primary focus. Look at the technology and resources available to us, make community our biggest asset. <- +1
- Aaron: We can't compete, but when it comes to sharing our resources and knowledge effectively, we can do more. When I go to conferences, I show off our APIs and tools and advocate what people can do with them. Consider all the academics working in the space -- we have more people than Google does. Google doesn't have open historical APIs, because that would affect shareholder perspectives. When it comes to KaaS, it's not only the community that is creating the tools, but the few allies we already have, and the many more that we should be reaching out to.
- Matt F: Counterpoint: algorithms vs people: We don't have the resources to compete with Google, but we also don't have as many human contributors as people think. Compared to our vision, we need to help more humans to contribute. The middle ground is making humans be and feel more efficient. The fundamental point is that "if you can see a mistake you can fix it". [?] [global blacklist, and flags?] [PageImages helps avoid gruntwork?]
- Madhu: We talk about us (engineers) vs volunteers (content). There are lots of low-barrier access methods that we can support like WDQS. We'll never be able to produce all APIs ever, but we should be able to provide some amount of low-barrier user interface that opens the door to more people. [Better mixing of content & dev communities might dissipate this us v. them frame :) Run workshops for super-active content creators [who are already script-runners] to learn to work on dev...]
- Corey: It seems like a lot of us are talking about empowering all these other communities, and I think that's a good direction for us to figure out what we want to build for our service. It should be a force-multiplier for them.
- Leila: Aaron and I heard in a roundtable, from a researcher, that SV and the tech industry are focused on "human in the loop": humans will verify or use that content. We could be thinking in reverse. A human understands [?]. The potential we have to grow, and where we should interject the algorithms and technology. +1 I like that framing of flipping the human <-> machine relationship. +1
- Lucie: It depends on the community itself too, e.g. huge wikis vs medium to small wikis. Some have more ability to adapt to the needs of the community.
- Leila: Re: size of community, the mid-small are the best place to introduce this, because we can reduce their workload.
- Aaron: Wanted to advocate for a general idea. My team is one and 4/5s of a person. The reason we were able to be so successful is that we found a solution for the very small team size we have and how we can create impact: we went around talking to people, trying to learn what they are doing, where the bottlenecks are, where they spend a lot of their resources (time), and built an algorithm to help. We don't build the whole product. We rely on others to experiment with that. I think in 5 or 10 years we will have more teams engaging that way. Before product people can get started we need that critical piece of infrastructure; they need these algorithms in place. ORES use exploded after the basic functionality became publicly available -- dozens of things I'd never thought of. In the future we need more people to adopt this way of working.
- Giuseppe: We shouldn't work on abstracting the current content with algorithms, post-edit. Google and other multibillion dollar companies are doing this. We cannot offer an equal quality of abstraction, if we stay with a pure freeform text system, and trying to add structure afterwards.
- Corey: Just on that: I agree, but there is a practical concern. A lot of times, a feature is planned, and implementing that feature requires a lot of ??. There is a need for us to build a feature that is really useful, but then shipping it becomes a problem because it requires a lot of changes to make this happen.
- Giuseppe: This is a special use-case. What I'm talking about is building an interface where the content editors enter can be structured.
- Dan A.: Injecting some potential optimism: Erik in Search has been doing human-assisted search improvements. We have the infrastructure, knowledge, and experience to address this kind of problem. To Lydia's question about what other projects we can look at: look at food recipes online. They started very structured; currently they focus more on story. It's interesting to see how this has evolved -- starting from the structured form and moving to the story form. There are projects like Quora that have started with the structured form, and WP is a non-structured form.
- Mingli Yuan: I'm a relative newbie with Wikidata. We use Wikidata for our translation services. I don't know how to use the API to edit. The feedback to the services needs a process [?]
- Lydia: That is something we want to do this year: to talk to people like you to figure out how we can help you do what you just explained. This is not WD specific, it applies to all projects.
- Unknown: We want to edit an item and the process to do that is not very good.
- Leila: WP editors have the same problems upstream. They get data, do quality checks, sometimes do error correction feedback [?]
- Unknown: We change an item and somebody else edits it back. How can we be informed about it?
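- [editorial sketch: the usual programmatic path for the kind of item edits Mingli describes is the Wikibase action API, most easily via pywikibot. A minimal example that assumes a configured bot login; the IDs (Q42, P31 "instance of", Q5 "human") are illustrative]
    import pywikibot

    site = pywikibot.Site("wikidata", "wikidata")
    repo = site.data_repository()

    item = pywikibot.ItemPage(repo, "Q42")            # item to enrich (illustrative)
    claim = pywikibot.Claim(repo, "P31")              # property: instance of
    claim.setTarget(pywikibot.ItemPage(repo, "Q5"))   # value: human
    item.addClaim(claim, summary="Adding an example statement")
    # Watching the item (or polling its revision history) is how an API user
    # currently learns that someone has changed the value back.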
- Corey: I'm not ready to summarize the session. I'm not sure if we've got to all the questions in slide 21. For example, "how do we make this sustainable?".
- How can we make this sustainable?
- Something that lives for a while.
- Lydia: If we build an API and it is not there anymore after a week, nobody will use it. [...]
- Aaron: From the non-technical standpoint, we have to have a good narrative about what we're doing and why, otherwise, we will have to stop in a couple of years.
- Matt F.: Deprecation is one thing we should do more of. Doing more experiments: setting expectations with experimental APIs, experimenting fast, moving them to production if they pass the tests, and stopping them if they don't.
- Dan A: Re. opening keynote. We need to like each other. And we need to talk. Not about putting ops in a dark room and working them to death. We can't release a ton of APIs with different specs. We need to talk. We're small enough that we should be able to. We should like each other.
- tpt: Difference between core APIs and [??] Core API = save a Wikipedia article, interact with content, etc. Most resources focused on core. Think the OpenStreetMap way of doing things, e.g. users create their own APIs on top of the core API.
- Adam: Something that was talked about a lot is APIs. So there is definitely interest in APIs.
- SJ reads the summary which is at the bottom of the etherpad.
- Adam: Did this capture everyone's understanding? Anything missing?
- SJ: please add a couple of notes at the bottom of the notes.
- Subbu: I want to make sure I'm not the only one who thinks wikitext should be more structured and should expose that structure
- Adam: My position statement was around schema.org and [?]; there's a bunch of design work around here. Aaron was essentially talking about Design Thinking around APIs. Clearly we're doing that in Wikidata. Not sure about Wikipedia, which has a bunch of templates but isn't very structured beyond that.
- Moritz: Not sure if we can resolve (at least at this meeting) the duplication of structured data between the projects.
- Adam: Maybe that's a question we should go and ask. It's a charged topic, but needs to be worked out. Though perhaps not now?
- Victoria: Over the break, session facilitators could go through notes and identify next questions.
[SEE BELOW FOR PART 2 CONTINUATION]
- Major tech risks and how can they be mitigated?
- What should we avoid?
- What should we stop?
- Technical needs to realize it?
- Which tech should we explore?
- What resources need to be committed?
- How do WMF, WMDE and other large Affiliates contribute and guide?
Summary
- Standards to coordinate work of many different groups. Consider schema.org and what similar standardizing efforts we could contribute to / draw from / incorporate as part of WD (or use the Wikidata schema?)
- This is also step 0 of decentralization.
- Can we standardize a set of entity IDs that can help cross-link internal knowledge graphs (at YD,G,B, &c)? That would be a large + widely used authority file
- Ways to post data: via API for small volumes. Large volumes: ?? Called "donation" when air-dropped in; may linger unmerged for years. Contrast with bulk uploads of media to Commons (e.g. BL), entries to WP (via bot)
- Use cases:
- Bulk data [re]users: Researchers (dbpedia!), Posters of large db's (Freebase, cartograph & others) https://www.wikidata.org/wiki/Wikidata:Data_access#MediaWiki_API, http://cartograph.info
- Search users: Compare WQS vs WolframAlpha ... Freebase search API https://developers.google.com/knowledge-graph/ WQS https://phabricator.wikimedia.org/project/board/891/ (an awesome board)
- Consider easier ways than just API calls: More direct ways to generate queries, letting [re]users have easy access to currently-internal tools (to build their own APIs)
- Clients: mobile users, 3rd parties
- Integrators: Musicbrainz, OSM, Quora ontology, Newsrooms
- Design possibilities
- Simplify tech: cf. Dan McKinley's suggestions. Focus on intended result, not toolchain [LINK?]
- Simple / flexible creation UI: items, classes, sharpness of concept [What does "creation" mean here? Creation of items? Input of data?]
- Centralise finding APIs/tools?: Catalogue of love? Awesome uses of the week? Hey look at this cool new API? Etc. A developer portal!!
- Decentralize experiments, technical risk, more: What options are there for a small data [re]user or contributor to build something Wikidata-adjacent & -compatible?
- Ways to announce experimental features; preferred use of Labs
- Evolving wikitext and other mediums / interfaces for capturing & building out knowledge
- Ideas: (a) Expose structure in the page (b) Expose metadata and schema information that is part of templates
- Measures of success
- Being a force multiplier of communities we care about
Point to discuss: What do we need to provide to get the kind of innovation we want in 3 to 5 years?
5 mins on each point, then 5 to wrap up.
- Specific things to ease consumption and contribution (documentation specialists, unified API...)
- Aaron: Standards for publishing
- CScott: Standard for publishing. Standard for reuse & integration (inline, & deeplinks to a single data pt). (Examples of Wikidata, Commons, etc.). For anything we do for Commons, we should also aim for "Knowledge as a Service". First-order treatment?
- Adam: what do we need to do to get to what Aaron said? Documenting standards.
- Aaron: It's probably a committee thing. We should review what we are doing and iterate. Ask: what are all the people doing? What is good and should be kept? What is missing?
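- [editorial sketch: one existing building block for the "deeplinks to a single data pt" CScott mentions is Special:EntityData, which serves one Wikidata entity per URL in several formats; Q42 is illustrative]
    import requests

    for ext in ("json", "ttl"):
        url = f"https://www.wikidata.org/wiki/Special:EntityData/Q42.{ext}"
        resp = requests.get(url, headers={"User-Agent": "KaaS-notes-example/0.1"})
        print(ext, resp.status_code, len(resp.content), "bytes")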
- Corey: Taking our desktop UI and separating from the core and building an API that our desktop UI and Extensions all use. This enables any 3rd party to use the same data and build the same feature set that we have built (free our data)
- Anne: Adding to what Aaron said. Identifying the people who are trying to do something that is close to what we have but can't due to some barrier, learning what they want to do, and extending what we do to help them. There are probably small things we can do (low-hanging fruit) that will open the gate to participation way wider.
- SJ: Streamline data import. As soon as someone says "I have a freely licensed dataset to contribute", capture that on a WM server. [Currently: HARD to post to Commons, not integrated into WD] Let others chip in to massage it into Wikidata
- SJ: you can't upload any kind of data format onto Commons. Please fix this :) It makes it hard to capture large contributions. cf. https://commons.wikimedia.org/wiki/Commons:File_types#Other_formats
- Matt F.: Commons now has a Data namespace with data tables. cf. https://www.mediawiki.org/wiki/Help:Tabular_Data
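- [editorial sketch: what reading one of those tabular Data: pages looks like through the action API; the page title is illustrative, any existing Data:*.tab page works]
    import json
    import requests

    resp = requests.get(
        "https://commons.wikimedia.org/w/api.php",
        params={
            "action": "query",
            "prop": "revisions",
            "rvprop": "content",
            "rvslots": "main",
            "titles": "Data:Sandbox/Example.tab",  # illustrative title
            "format": "json",
            "formatversion": "2",
        },
        headers={"User-Agent": "KaaS-notes-example/0.1"},
    )
    page = resp.json()["query"]["pages"][0]
    table = json.loads(page["revisions"][0]["slots"]["main"]["content"])
    # A .tab page is JSON with a license, a schema (field names/types), and row data.
    print([f["name"] for f in table["schema"]["fields"]])
    print(table["data"][:3])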
- Madhu: Building infrastructure that allow people to collaborate is key. Tools like Quarry and PAWS are not just about allowing people to query things, but about enabling sharing of the data from the data-services to enable further collaboration.
- Adam: am I hearing increasing people resources and server resources?
- Madhu: Yes (for Quarry and PAWS) and yes, that would be cool. But also about exposing the services in both programmatic ways and low-barrier ways.
- Adam: What's the next action step to get to that kind of thing?
- Madhu: One of the things I have experience with, SWAP - the equivalent of jupyterhub that I built based on some of the Analytics data sources. This is not public, it's not a solution for all data sources. We should think about Knowledge as a Service as building collaborative data access infrastructure on top of our datasets.
- Adam: Do you think of it as a shift in thinking when it comes to this problem?
- Madhu: yes
- Moritz: The APIs and dumps services that we create are standardized. What can we do with the API and dumps? It's the first decision users make: API vs dumps.
- Zainan Victor: It would be helpful to have a way to give feedback on the structured data about the quality of the data. Specifically, giving people an API to give back their judgement about data quality, e.g. on Wikidata we've identified that this value could be wrong. Like a hesitant "flag" system.
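- [editorial sketch, purely hypothetical: no such feedback API exists today; every endpoint and field below is invented only to illustrate the kind of "hesitant flag" call being proposed]
    import requests

    feedback = {
        "entity": "Q42",              # hypothetical: the item whose statement looks wrong
        "property": "P569",           # hypothetical: the property being questioned
        "judgement": "possibly-wrong",
        "note": "Value conflicts with the cited source",
    }
    # Hypothetical endpoint; the real design would need to be worked out.
    requests.post("https://data-feedback.example.org/v1/flags", json=feedback)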
- Adam: are you ...
- [It's on Lydia's list]
- API depth vs. breadth
- Adam: Is there a particular philosophy around this? Is the way we do APIs today the way we want to go?
- Ariel: do the minimum you can do to help other people build on them.
- tpt: Multi-layer can be another solution. We can have a basic API for more general use-cases, and a more specialized API for deeper use-cases.
- Adam: Does the existing API infrastructure support what we need? If we want to have more people using them, do APIs and dumps work the way they are today?
- Matt: Re: API systems defined on other sites with annotations and RESTful APIs: we can't convert to that infrastructure because we have use cases that won't convert to REST. --- Adam: yeah, generators and similar
- Corey: Back to Adam's question, the honest answer is that we don't know. People may be looking for some things, but we don't even know about those. Auditing, unifying, and documenting our APIs can help us understand.
- Daren Welsh: Skipping to the last question may help you answer the second (API) question. Reach out to the people outside of Wikimedia who are trying to use the data, to learn what's missing. Basically, talk with your customers.
- What is our next step for enabling partnerships (for both reusers and data providers)?
- Aaron: We're doing many relevant things, like hackathons, developing tool APIs and reaching out to tool devs, research and analytics folk. We don't reach out to people who develop for-profit apps on our data -- maybe because of queasiness about their non-profit-ishness?
- What about people who don't currently use WM data or Wikidata, but are heavy users of comparable data-providers (Wolfram Alpha, GKG, &c)? [in tension w/ the sense of {lack of capacity}, but a broader universe of potential use]
- Leila: Researchers use our dumps and APIs, and we understand their needs, but we don't have the capacity to respond to many. We capture needs at conferences but don't have the resources to address them.
- [Next steps?] Leila: I can point to a few highest priority. Check: https://phabricator.wikimedia.org/T182351 for an example.
- TheDJ: We do know what people are looking for. "How often are my images being used by other people?" GLAMs want to know. All we can currently do is say "here's Cloud Services, and here's a volunteer who might be able to help you". We have tools that people make, but then the original developer leaves. [We need centralization of both the list of?]
- Dan A: We could do something similar to the wishlist endeavours. But that hits the resource limit again.
- Adam: Could we better allocate tasks at things like hackathons?
- Leila: ? Better prioritize things with an emphasis on repeatability.
- Corey: Teams try to tackle individual problems by themselves, essentially putting out fires, but no one has time to sit down and think about long-term solutions.
- Jan D: Slightly adding to this direction: what information we provide and how we provide it. For example, good documentation can help with the productization or usefulness of the work we do. It would be good to have more conversation on this topic.
- Adam: Can you give a specific example?
- TheDJ: Giving clear guidelines on what you need from the community, and assuring that if those are met, you can satisfy the need. We're basically talking about how to write a bug report.
- Dan A.: The Community Tech team wasn't there before. The Analytics team now has a different audience and attention. Maybe when a team wraps up they can be reassigned.
- Nick: do teams ever wrap up?
- TheDJ: teams sometimes do, but products don't
- [Post-meeting clarification: Dan meant that we could have people who rotate between teams, providing extra resources to work on specific hard problems. It's been tried a few times with specific teams.]
- Giuseppe: We keep talking about documentation. To a complete outsider: someone comes in and doesn't find good documentation. You don't want engineers to write documentation for third parties. Do we have anyone with this role?
- Adam: this is loosely enforced through code-review...?
- Giuseppe: But that's not documentation.
- Madhu: Sarah R is working with Cloud Services and ORES on documentation, and we're kicking off a Documentation Special Interest Group
- Erika: We have identified the need in Tech Mgmt to hire for this position.
- Leila: Tech writing - there are two types of people: those who do tech writing in terms of story-telling about tech, which is important, and people who document code, APIs, products, etc.
- Capturing semantic information with wikitext?
- subbu: Tied to improving structure on a page. Hard to represent / extract semantics robustly without structured info. Start conversations with the template editor community about this; structured templates are probably a good place to start. I find typing to be a useful abstraction for thinking about both these problems. Some links to notes / thoughts (see also the sketch after the links):
- https://www.mediawiki.org/wiki/Parsing/Notes/Wikitext_2.0, https://www.mediawiki.org/wiki/Parsing/Notes/Wikitext_2.0/Typed_Templates, https://www.mediawiki.org/wiki/User:SSastry_(WMF)/Notes/Document_Composability (is the motivating use case -- fine-grain addressability, composability, editability etc.)
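- [editorial sketch: the closest existing mechanism to the typed/structured templates discussed above is TemplateData, whose declared parameter names and types are queryable via the action API; the template title is illustrative]
    import requests

    resp = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "templatedata",
            "titles": "Template:Infobox person",  # illustrative template
            "format": "json",
        },
        headers={"User-Agent": "KaaS-notes-example/0.1"},
    )
    # Each page entry lists the template's declared parameters and their types.
    for page in resp.json().get("pages", {}).values():
        for name, info in page.get("params", {}).items():
            print(name, info.get("type"))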
- Moritz: suggest to start looking at semantic MediaWiki, to learn what was good and what was not, to figure out which parts we want to keep and adapt.
- Matt F: Rather than SMW, when we're really talking about database information, that's about (e.g.) the actual person (such as Barack Obama's dates of office), it should be on Wikidata. Infoboxes should not repeat this data in every project, they should query it from Wikidata and render it. This technology is already in place, though not every project has adopted it. There is definitely a role for more structured wikitext with regards to presentation. E.g. page images (which images to show as e.g. lead) could benefit from formal presentational structure (as I talked about earlier). Another example could be markup to show different media (e.g. which videos, which images) depending on screen size. Partly about presentation.
- Dan A.: To tie some of that together, it's a good time to look at something aspirational. What should this look like? In 2030, I think we should be able to write in wikitext and everything should be woven together. For example, the text should adapt to the user's input as the user is entering that information. If I write "the population of France is ..." the number should be updatable automatically.
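- [editorial sketch: the number in Dan's "the population of France is ..." example can already be read from Wikidata today; a minimal lookup of population (P1082) on France (Q142)]
    import requests

    resp = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={"action": "wbgetentities", "ids": "Q142", "props": "claims", "format": "json"},
        headers={"User-Agent": "KaaS-notes-example/0.1"},
    )
    claims = resp.json()["entities"]["Q142"]["claims"]
    # Population statements are quantity snaks; take the first statement's amount.
    amount = claims["P1082"][0]["mainsnak"]["datavalue"]["value"]["amount"]
    print("Population of France:", amount)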
- Jan D: Wikitext has evolved (as in: not structured clearly, but more features are added, so it is hard to parse); we should look at the possible needs behind "Capturing Semantic Info in Wikitext". This might be reframed as "Integrating Wikidata and Wikipedia better". There will be features like client editing (editing structured data via a user interface). Also, Multi-Content Revisions can help. Perhaps not [do it] at the wikitext level, but reach integration with other technology which is meant for structured data.
- Mingli Yuan: Parsing natural language tags into triples. SLING from Google developed object-oriented ? programming to do this. https://github.com/google/sling
- tpt: data extraction from [?]. Data format is [?]. For example, Wikidata with wikitext and semantic MediaWiki is probably going to be much more complicated.
- Adam: maybe time for one more question
- SJ: To bridge this w/ the Q on consistent interfaces: If you want to go straight to what people are doing in 3-5 years, design for embedded devices [for input & preview] and embedded facts [reuse w/in some other publication]. Most uses are increasingly single clauses, sentences, images w/ short captions -- but may still want basic formatting. Find a simple subset of markup & interface that works for this, across {copyediting, wikidata entry, adding a cited claim}. A single edit might capture a number of wikidata statements.
- TheDJ: We can create a lot more flexibility there. Part of the reason wikitext is so successful is that it's readable normally, except for enwiki. People appreciate that it's one block of data. A lot of conflicts arise when you distribute the responsibility by breaking down the data-capturing mechanism. Something like MCR can address that.
- "I think TheDJ said it best"
- Interfaces for consistency (Wikidata, across projects)