User:Stefahn/Solr Docu

From mediawiki.org

My own docu about Solr and SolrStore.

General

Indexing and updating

  • You put documents into Solr (called "indexing") by sending XML, JSON, CSV or binary data over HTTP.
  • You can modify a Solr index by POSTing XML documents containing instructions to add (or update) documents, delete documents, commit pending adds and deletes, and optimize your index.
  • schema.xml can declare a "uniqueKey" field (typically called "id"). Whenever you POST an instruction to add a document with the same uniqueKey value as an existing document, Solr automatically replaces the existing document.
  • Index changes are not visible to searches until they are committed and a new searcher is opened.
  • Commit can be an expensive operation, so it's best to make many changes to an index in a batch and send a single commit command at the end.
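The batching advice above can be sketched with Python's standard library. This is a minimal sketch, not SolrStore's own code: the update URL (Tomcat on port 8080, core "core0") and the field names "id" and "title" are assumptions. build_add_message packs several documents into one <add> message, and build_commit_message produces the separate <commit/> that is sent once at the end.

```python
# Minimal sketch: batch several document adds into one Solr XML update
# message and commit once at the end.
# ASSUMPTIONS (hypothetical): Solr runs in Tomcat at the URL below and the
# schema has an "id" uniqueKey field plus a "title" field.
import urllib.request
import xml.etree.ElementTree as ET

SOLR_UPDATE_URL = "http://localhost:8080/solr/core0/update"  # hypothetical


def build_add_message(docs):
    """Return one <add> message with a <doc> element per dict in `docs`.

    Re-adding a doc whose "id" (uniqueKey) already exists replaces it.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def build_commit_message():
    """Return a <commit/> message; pending changes become visible after it."""
    return ET.tostring(ET.Element("commit"), encoding="unicode")


def post_xml(url, body):
    """POST an XML update message to Solr and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Usage, with Solr running: call post_xml(SOLR_UPDATE_URL, build_add_message(docs)) once for the whole batch, then post_xml(SOLR_UPDATE_URL, build_commit_message()) once, instead of committing after every document.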

Query

Installation

Restarting Solr

Do the following as root (or sudo):

cd /opt
./tomcat/bin/shutdown.sh
./tomcat/bin/startup.sh

Note: the shutdown script stops the whole Tomcat server, not only Solr!

schema.xml

  • Info: http://wiki.apache.org/solr/SchemaXml
  • located in:
    • SolrStore: solr/core0/conf/
    • Solr example: solr/example/solr/conf
  • Defines the fields of the documents in the index and their field types, i.e. what type of analysis is applied to each field.
    Example:
    <field name="subject" type="text_general" indexed="true" stored="true"/>
    "subject" = field name, "text_general" = field type (it defines the analyzer) that is applied to the field called "subject"
  • The current schema your server is using may be accessed via the [SCHEMA] link on the admin page.
  • Attention: XML comments cannot be nested; a comment inside another comment makes schema.xml invalid and leads to an error.
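To see which fields a schema.xml defines, and to catch errors such as nested comments before restarting Solr, the file can be parsed with Python's standard library. A minimal sketch, assuming the usual layout where fields are <field> elements; the sample schema is a hypothetical minimal one, not SolrStore's.

```python
# Minimal sketch: list the fields of a schema.xml and check well-formedness.
# ASSUMPTION: fields are declared as <field> elements (usual schema.xml
# layout); SAMPLE_SCHEMA is a hypothetical minimal example.
import xml.etree.ElementTree as ET

SAMPLE_SCHEMA = """<schema name="example" version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="subject" type="text_general" indexed="true" stored="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
</schema>"""


def list_fields(schema_text):
    """Map each <field> name to its type and flags.

    A flag is None when the attribute is not set on the field itself
    (Solr then falls back to the field type's default).
    """
    root = ET.fromstring(schema_text)
    fields = {}
    for field in root.iter("field"):
        fields[field.get("name")] = {
            "type": field.get("type"),
            "indexed": field.get("indexed"),
            "stored": field.get("stored"),
            "multiValued": field.get("multiValued"),
        }
    return fields


def is_well_formed(schema_text):
    """Return False if the XML cannot be parsed (e.g. nested comments)."""
    try:
        ET.fromstring(schema_text)
    except ET.ParseError:
        return False
    return True
```

For a real file, pass its text, e.g. open("solr/core0/conf/schema.xml", encoding="utf-8").read(). A nested comment such as <!-- outer <!-- inner --> --> makes is_well_formed return False, because "--" is not allowed inside an XML comment.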

Tips and tricks

  • If you want to sort by an attribute with values like "1 - rookie", "2 - advanced", "3 - expert", don't choose "text_general" as the field type but, for example, "string". If you choose text_general, the results are sorted like this: advanced, expert, rookie (apparently because text_general tokenizes the value, so the "1 -" prefix is not taken into account).
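A "string" field is compared as a whole value, character by character, so the numeric prefix decides the order. The Python sketch below only imitates that with plain string sorting; it is not Solr code. The second sort is a hypothetical imitation of the order observed with text_general (prefix ignored), and "10 - master" is a made-up value to show a pitfall of string ordering.

```python
# Sketch (not Solr code): plain string ordering of values with a numeric
# prefix, imitating how a "string" field compares whole values.
levels = ["3 - expert", "1 - rookie", "2 - advanced"]

# Whole-value comparison: the numeric prefix decides the order.
string_order = sorted(levels)

# Hypothetical imitation of the order observed with text_general:
# only the word after the prefix counts.
word_order = sorted(levels, key=lambda value: value.split(" - ", 1)[1])

# Pitfall of string ordering: "10" sorts before "2" because '1' < '2'.
two_digit_order = sorted(["2 - advanced", "10 - master"])
```

string_order is ["1 - rookie", "2 - advanced", "3 - expert"], while word_order reproduces the observed advanced, expert, rookie. Once there are ten or more levels, zero-padded prefixes ("01", "02", ...) keep the intended order.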

SolrStore

  • You don't need to define the SMW attributes as fields in your schema.xml. You only need to define fields if you want to do one of the following:
    • You want to sort results by an attribute.
    • You want to have a search input that searches in more than one attribute (for example search in wikitext and pagetitle at the same time).

multivalued

  • multiValued="true" means that the field may contain multiple values per document, i.e. it can appear multiple times in a document.
  • With SolrStore you can sort by any field of the Solr system; the only requirement is that the field must not be multivalued. Usually all the fields that SolrStore generates from the wiki are multivalued.

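Trick to use multivalued fields for sorting: use copyField to copy the content of one or several fields into another field that is not multivalued. This only works if each document ends up with at most one value in the destination field; otherwise Solr rejects the document.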
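Once a single-valued copy exists, sorting is just a "sort" request parameter on the select handler. A minimal Python sketch that builds such a query URL, assuming (hypothetically) that schema.xml contains <copyField source="level" dest="sort_level"/> with "sort_level" declared non-multivalued, and that Solr answers at the URL below:

```python
# Minimal sketch: build a Solr select URL that sorts on a single-valued field.
# ASSUMPTIONS (hypothetical): the core URL below, and a non-multivalued field
# "sort_level" filled via <copyField source="level" dest="sort_level"/>.
from urllib.parse import urlencode

SOLR_SELECT_URL = "http://localhost:8080/solr/core0/select"  # hypothetical


def build_sorted_query_url(query, sort_field, descending=False):
    """Return a /select URL for `query`, sorted on `sort_field`.

    Solr refuses to sort on multivalued fields, so `sort_field` should be
    the single-valued copyField destination.
    """
    direction = "desc" if descending else "asc"
    params = {"q": query, "sort": f"{sort_field} {direction}"}
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"
```

build_sorted_query_url("*:*", "sort_level") gives a URL ending in "?q=%2A%3A%2A&sort=sort_level+asc", which can be opened in a browser or fetched with urllib.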

Changing and reindexing

When you change schema.xml, you not only have to restart Solr but also rebuild the index.

Way to go:

  1. Stop your application server
  2. Change your schema.xml file
  3. Delete the index directory in your data directory (Stefan: in the core directory)
  4. Start your application server (Solr will detect that there is no existing index and make a new one)
  5. Re-index your data
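Step 3 can be scripted. A minimal Python sketch, assuming Solr's default layout where the index lives in an "index" subdirectory of the core's data directory; the data directory path is hypothetical, and the function must only be run while the application server is stopped (step 1).

```python
# Minimal sketch of step 3: delete the index directory so that Solr builds
# a new, empty index on the next start (step 4).
# Run it only while the application server is stopped (step 1).
# ASSUMPTION: the index lives in <data dir>/index (Solr's default layout);
# the data dir path is hypothetical and depends on your installation.
import shutil
from pathlib import Path


def delete_index_dir(data_dir):
    """Remove `data_dir`/index; return True if something was deleted."""
    index_dir = Path(data_dir) / "index"
    if not index_dir.is_dir():
        return False
    shutil.rmtree(index_dir)
    return True
```

Example call for a SolrStore core (hypothetical path): delete_index_dir("/opt/solr/core0/data"). A False result means no index directory was found, which usually points to a wrong path.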

Ways to reindex:

  • For SMW: Use the following two commands on a shell:
php SMW_refreshData.php -ftpv
php SMW_refreshData.php -v
See [1] for more info.

Misc:

  • Quitting XAMPP seems to cause no problems: the data is still there the next time XAMPP is launched (reason: the index is saved on disk).
  • In general, you need to be very careful when you change the schema without reindexing - see [2]
  • Alternative to stopping the application server: use multiple cores (multicore) - see [3]

Misc

Multicore

  • Multicore means one has more than one Solr core
  • Purpose: you can have a single Solr instance with separate configurations and indexes - while having the convenience of unified administration. More info: http://wiki.apache.org/solr/CoreAdmin
  • Cores are defined in solr.xml
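To check which cores a solr.xml defines, a minimal Python sketch can list them. It assumes the legacy layout <solr><cores><core .../></cores></solr> used by older Solr releases (newer releases discover cores via core.properties files instead); the sample content is hypothetical.

```python
# Minimal sketch: list the cores defined in a legacy-style solr.xml.
# ASSUMPTION: the <solr><cores><core .../></cores></solr> layout of older
# Solr releases; SAMPLE_SOLR_XML is hypothetical.
import xml.etree.ElementTree as ET

SAMPLE_SOLR_XML = """<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0"/>
    <core name="core1" instanceDir="core1"/>
  </cores>
</solr>"""


def list_cores(solr_xml_text):
    """Return (name, instanceDir) for every <core> element."""
    root = ET.fromstring(solr_xml_text)
    return [(core.get("name"), core.get("instanceDir"))
            for core in root.findall("./cores/core")]
```

Each instanceDir holds that core's own configuration (conf/schema.xml) and index, which is what lets one Solr instance serve separately configured indexes.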