User:Jeblad/plans and constituents
The class/property search
The class/property search traverses the hierarchy upward, trying to find all entities that branch off to a message. A penalty is applied on every branch, and every branch whose weight drops below a certain level is pruned.
There can be several claims, and therefore several starting points for the upward traversal of the hierarchy. Each starting point is weighted according to the rank of its statement, so a preferred statement weighs somewhat more than a normal statement. Several preferred statements might add up to make a distant class with a plan the overall winner. Deprecated statements are not traversed.
It is probably better to do a breadth-first search, and to prioritize branches with higher weight before those with lower weight.
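The search described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual implementation: the dictionaries `parents` and `plans`, the rank weights in `RANK_WEIGHT`, the multiplicative `penalty` and the pruning `threshold` are all hypothetical, and the weight-first ordering is realised here with a priority queue.

```python
import heapq
from collections import defaultdict

# Hypothetical rank weights; deprecated statements are not traversed at all.
RANK_WEIGHT = {"preferred": 1.5, "normal": 1.0}


def search_plans(claims, parents, plans, penalty=0.8, threshold=0.3):
    """Walk upward from every claim and score the plans that are reached.

    claims  -- list of (class_id, rank) starting points
    parents -- dict mapping a class to its superclasses
    plans   -- dict mapping a class to the plans it branches off to
    """
    scores = defaultdict(float)
    for start, rank in claims:
        if rank not in RANK_WEIGHT:  # e.g. "deprecated"
            continue
        # Negated weights turn heapq's min-heap into a max-heap,
        # so the heaviest branch is always expanded first.
        queue = [(-RANK_WEIGHT[rank], start)]
        best = {}
        while queue:
            neg_weight, node = heapq.heappop(queue)
            weight = -neg_weight
            if best.get(node, 0.0) >= weight:
                continue  # already reached with at least this weight
            best[node] = weight
            for parent in parents.get(node, ()):
                reduced = weight * penalty  # penalty for each step upward
                if reduced >= threshold:  # prune branches that drop too low
                    heapq.heappush(queue, (-reduced, parent))
        # Contributions from different starting points add up.
        for node, weight in best.items():
            for plan in plans.get(node, ()):
                scores[plan] += weight
    return dict(scores)
```

With these assumed numbers, one preferred and one normal claim that share a superclass can together lift a plan on that superclass above a plan attached directly to one of the starting classes.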
Content determination
The actual data comes from statements in ordinary items, that is, the statements about the topic described in a Wikipedia article. Such items have statements, and those statements use properties (predicates) that themselves have statements pointing to one or more messages.
The possible messages are collected by an upward search in the hierarchy, trying to find all properties that branch off to a message.
At least one message must exist for the statement to be considered for inclusion. Which one to use must be considered during optimization of the content generation.
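The selection rule above can be illustrated with a small sketch. It assumes a hypothetical in-memory layout where `parents` maps a property to its superproperties and `messages` maps a property to the messages it points to; the `penalty` and `threshold` values are likewise assumptions, borrowed from the pruning idea in the class/property search.

```python
from collections import deque


def candidate_messages(prop, parents, messages, penalty=0.8, threshold=0.3):
    """Breadth-first upward search for messages reachable from a property."""
    found = {}
    seen = {prop}
    queue = deque([(prop, 1.0)])
    while queue:
        node, weight = queue.popleft()
        for message in messages.get(node, ()):
            # Breadth-first order means the first hit is the closest one.
            found.setdefault(message, weight)
        reduced = weight * penalty
        if reduced < threshold:
            continue  # prune: further ancestors would weigh too little
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, reduced))
    return found


def select_statements(statements, parents, messages):
    """Keep only statements whose property leads to at least one message."""
    selected = []
    for prop, value in statements:
        candidates = candidate_messages(prop, parents, messages)
        if candidates:
            selected.append((prop, value, candidates))
    return selected
```

The sketch returns every candidate message with its weight and leaves the choice between them open, since that choice belongs to the later optimization step.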
Examples
Page 62
- Instance of message
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix wdt: <http://www.wikidata.org/prop/direct/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix schema: <http://schema.org/> .
@prefix wd: <http://www.wikidata.org/entity/> .
@prefix wikibase: <http://wikiba.se/ontology#> .
wd:Q1234567890 a wikibase:Item ; # fake URI
rdfs:label "message a"@en ;
skos:prefLabel "message a"@en ;
schema:name "message a"@en ;
schema:description "message to use for data values"@en;
wdt:P31 # "instance of"
wd:Q1234567891 . # "NLG Message"
The generated structure will have pointers to the actual data.
Document structuring
The actual data comes from ordinary items, that is, those that represent articles in Wikipedia. Such an item has an instance of statement, and that class (resource) possibly has statements pointing to one or more plans.
The possible plans are collected by an upward search in the hierarchy, trying to find all items that branch off to a plan.
At least one of the plans must be a document plan. This is used to build the first iteration of the document, that is the article, for the item's class. If multiple document plans are found, a tie-break is done on the weight of the plans, and the one with the highest weight is used.
All other document plans pointed to by the found classes are then stripped of their titles and merged to build an overall plan.
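A minimal sketch of the selection and merge step might look like this. The `Plan` record and its fields are hypothetical, and the merge strategy (appending the constituents of the remaining document plans in order of decreasing weight, skipping duplicates) is an assumption, as the notes do not specify how the merge is done.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Plan:
    # Hypothetical record; the real plans would be items in the repository.
    name: str
    weight: float
    title: Optional[str] = None
    constituents: List[str] = field(default_factory=list)
    is_document: bool = False


def build_overall_plan(plans):
    """Pick the heaviest document plan and merge the other document plans into it."""
    documents = [p for p in plans if p.is_document]
    if not documents:
        raise ValueError("at least one document plan is required")
    # Tie-break on weight: the document plan with the highest weight wins.
    ordered = sorted(documents, key=lambda p: p.weight, reverse=True)
    primary, others = ordered[0], ordered[1:]
    merged = list(primary.constituents)
    for other in others:
        # The title of the other plan is dropped; only constituents are kept.
        for constituent in other.constituents:
            if constituent not in merged:
                merged.append(constituent)
    return Plan(primary.name, primary.weight, primary.title, merged, True)
```

Only the winning plan contributes a title, so the merged document keeps a single heading while still gaining the constituents of the other plans.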
Examples
Page 64
- Instance of plan
@prefix wd: <http://wikiba.se/ontology-beta#> .
<http://www.wikidata.org/entity/Q42> # fake URI
wd:label "plan A" ;
wd:description "some generic plan covering the topic" ;
wd:instance_of "NLG Plan" ; # this is really an item
wd:constituent # could have any number of constituents, but is expected to have at least one
<http://www.wikidata.org/entity/Q142> . # fake URI
- Instance of constituent
@prefix wd: <http://wikiba.se/ontology-beta#> .
<http://www.wikidata.org/entity/Q142> # fake URI
wd:label "constituent A" ;
wd:description "some generic constituent covering a small part of the topic" ;
wd:instance_of "NLG Constituent" . # this is really an item
- Instance of satellite
@prefix wd: <http://wikiba.se/ontology-beta#> .
<http://www.wikidata.org/entity/Q142> # fake URI
wd:label "satellite A" ;
wd:description "some generic satellite and its rhetorical relation to the nucleus" ;
wd:instance_of "NLG Satellite" ; # this is really an item
wd:relation <http://www.wikidata.org/entity/Q342> ; # fake URI
wd:satellite <http://www.wikidata.org/entity/Q242> . # fake URI, instance of Plan or Message
- Extended constituent (rhetoric)
@prefix wd: <http://wikiba.se/ontology-beta#> .
<http://www.wikidata.org/entity/Q142> # fake URI
wd:label "rhetoric constituent A" ;
wd:description "some generic rhetoric covering the topic" ;
wd:instance_of "NLG Constituent" ; # this is really an item
wd:instance_of "NLG Rhetoric" ; # this is really an item, necessary if nucleus/satellite is not given, but should be enforced
wd:nucleus <http://www.wikidata.org/entity/Q42> ; # fake URI, instance of Plan or Message
wd:satellite # could have any number of satellites, but is expected to have at least one
<http://www.wikidata.org/entity/Q242> . # fake URI
- Extended constituent (set)
@prefix wd: <http://wikiba.se/ontology-beta#> .
<http://www.wikidata.org/entity/Q142> # fake URI
wd:label "set of constituent A" ;
wd:description "some generic rhetoric covering the topic" ;
wd:instance_of "NLG Constituent" ; # this is really an item
wd:instance_of "NLG Set" ; # this is really an item, necessary if relation/constituent is not given, but should be enforced
wd:relation <http://www.wikidata.org/entity/Q242> ; # fake URI, instance of Plan or Message
wd:constituent # could have any number of constituents, but is expected to have at least one
<http://www.wikidata.org/entity/Q242> . # fake URI
- Extended plan (paragraph)
@prefix wd: <http://wikiba.se/ontology-beta#> .
<http://www.wikidata.org/entity/Q142> # fake URI
wd:label "document plan A" ;
wd:description "some generic document plan covering the topic" ;
wd:instance_of "NLG Plan" ; # this is really an item
wd:instance_of "NLG Paragraph" . # this is really an item, necessary as it can't be derived from a property, but should be enforced
- Extended plan (document)
@prefix wd: <http://wikiba.se/ontology-beta#> .
<http://www.wikidata.org/entity/Q142> # fake URI
wd:label "document plan A" ;
wd:description "some generic document plan covering the topic" ;
wd:instance_of "NLG Plan" ; # this is really an item
wd:instance_of "NLG Document" ; # this is really an item, necessary if title is not given, but should be enforced
wd:title <http://www.wikidata.org/entity/Q242> . # fake URI, instance of PhraseSpec or Message