User:Stefahn/Solr Docu
My own docu about Solr and SolrStore.
General
Indexing and updating
- You put documents into Solr (called "indexing") via XML, JSON, CSV or binary over HTTP.
- You can modify a Solr index by POSTing XML documents containing instructions to add (or update) documents, delete documents, commit pending adds and deletes, and optimize your index.
- schema.xml can specify a "uniqueKey" field called "id". Whenever you POST instructions to Solr to add a document with the same value for the uniqueKey as an existing document, Solr automatically replaces the existing document.
- Index changes are not visible until they are committed and a new searcher is opened.
- Commit can be an expensive operation so it's best to make many changes to an index in a batch and then send the commit command at the end.
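To illustrate the batching advice, here is a minimal Python sketch that builds one XML add message for several documents and sends a single commit afterwards. The update URL, the field names and the helper names (build_add_xml, post_xml) are assumptions for illustration, not something SolrStore provides; adjust host, port and core to your setup.

```python
import urllib.request
from xml.sax.saxutils import escape, quoteattr

# Assumed update endpoint; adjust host, port and core to your installation.
UPDATE_URL = "http://localhost:8983/solr/update"


def build_add_xml(docs):
    """Build one <add> message containing all given documents (dicts)."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            # quoteattr adds the surrounding quotes, escape handles &, < and >
            parts.append(
                "<field name=%s>%s</field>" % (quoteattr(name), escape(str(value)))
            )
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


def post_xml(xml, url=UPDATE_URL):
    """POST an XML update message to Solr and return the raw response body."""
    request = urllib.request.Request(
        url,
        data=xml.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

With a running Solr one would call post_xml(build_add_xml(docs)) once for the whole batch and then post_xml("<commit/>") a single time at the end. Re-sending a document with an existing "id" replaces the old one.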
Query
- You query it via HTTP GET and receive XML, JSON, CSV or binary results.
- Basics: http://lucene.apache.org/solr/api-3_6_2/doc-files/tutorial.html#Querying+Data
- Test and debug queries within your Solr: http://localhost:8080/solr/core0/admin/form.jsp
- Example search UI: http://localhost:8983/solr/browse
- http://wiki.apache.org/solr/SolrQuerySyntax
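A query is just a URL with parameters, so it can be put together in a few lines of Python. This is a sketch under assumptions: the select URL below and the helper names (build_query_url, fetch_json) are made up for illustration, and the field "subject" is taken from the schema example further down.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumed select endpoint; adjust host, port and core to your installation.
SELECT_URL = "http://localhost:8983/solr/select"


def build_query_url(query, rows=10, sort=None, wt="json", base_url=SELECT_URL):
    """Return a GET URL for the given query string and common parameters."""
    params = [("q", query), ("rows", rows), ("wt", wt)]
    if sort:
        params.append(("sort", sort))  # e.g. "id asc"
    return base_url + "?" + urlencode(params)


def fetch_json(url):
    """Send the GET request and decode the JSON response (needs a running Solr)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)
```

For example, build_query_url("subject:solr", rows=5, sort="id asc") produces a URL that can also be pasted into the browser for testing.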
Installation
- Extension:SolrStore/Install_Solr#Install_Apache_Solr_under_Windows
- http://www.icuriousmedia.com/blog/how-to-install-apache-solr-on-windows-xp-1439.php
- The folder solr in tomcat/webapps is generated automatically. One doesn't need to copy it from other locations.
Restarting Solr
Do the following as root (or sudo):
cd /opt
./tomcat/bin/shutdown.sh
./tomcat/bin/startup.sh
Attention: the system command "shutdown" (not ./tomcat/bin/shutdown.sh) turns off the whole server!
schema.xml
- Info: http://wiki.apache.org/solr/SchemaXml
- Located in:
  - SolrStore: solr/core0/conf/
  - Solr example: solr/example/solr/conf
- The schema defines the fields in the index and what type of analysis (field types) is applied to them.
Example: <field name="subject" type="text_general" indexed="true" stored="true"/>
"subject" = field, "text_general" = field type (analyzer) that is applied to the field called "subject"
- The current schema your server is using may be accessed via the [SCHEMA] link on the admin page.
- Attention: a comment nested within another comment (<!-- ... <!-- ... --> ... -->) leads to an error.
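To get a quick overview of which field uses which field type, the schema can be read with Python's standard XML parser. This is only a sketch: the schema snippet is a shortened, made-up example (not the real SolrStore schema) and list_fields is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Shortened, made-up schema for illustration; a real schema.xml is much longer.
SCHEMA_SNIPPET = """
<schema name="example" version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="subject" type="text_general" indexed="true" stored="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
</schema>
"""


def list_fields(schema_xml):
    """Map each <field> name to its field type."""
    root = ET.fromstring(schema_xml.strip())
    return {field.get("name"): field.get("type") for field in root.iter("field")}


def unique_key(schema_xml):
    """Return the content of <uniqueKey>, or None if the schema has none."""
    root = ET.fromstring(schema_xml.strip())
    return root.findtext("uniqueKey")
```

For a real file, pass the content of solr/core0/conf/schema.xml instead of the snippet.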
Tips and tricks
- If you want to sort by an attribute with values like "1 - rookie", "2 - advanced", "3 - expert", don't choose "text_general" as field type, but "string" for example. If you choose text_general, results are sorted in this way: advanced, expert, rookie (apparently because the value is tokenized and the leading "1 -" no longer counts for sorting).
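The sorting behaviour described above can be imitated in plain Python. Note that this is a rough simulation and not Solr's actual code: it assumes that on a tokenized field the sort effectively uses a single word token per document (here: the alphabetically largest one) instead of the whole value. That assumption reproduces the observed order, but the real mechanism may differ.

```python
import re

values = ["3 - expert", "1 - rookie", "2 - advanced"]


def string_sort(values):
    """Sort by the complete, untokenized value (like a "string" field)."""
    return sorted(values)


def tokenized_sort(values):
    """Simulated sort on a tokenized field (assumption, see text above)."""
    def key(value):
        # Split into lowercase word tokens; "-" is dropped as punctuation.
        tokens = re.findall(r"\w+", value.lower())
        # Assumed sort key: the alphabetically largest token of the value.
        return max(tokens)
    return sorted(values, key=key)
```

string_sort(values) keeps the intended order 1, 2, 3, while tokenized_sort(values) returns advanced, expert, rookie.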
SolrStore
- You don't need to define the SMW attributes as fields in your schema.xml. You only need to define fields if you want to do one of the following:
- You want to sort results by an attribute.
- You want to have a search input that searches in more than one attribute (for example search in wikitext and pagetitle at the same time).
multivalued
- multiValued = this field may contain multiple values per document, i.e. it can appear multiple times in a document.
- With SolrStore one can sort by every field of the Solr system - only requirement: the field must not be multivalued. Usually all the fields that SolrStore generates out of the wiki are multivalued.
Trick to use multivalued fields for sorting: use copy fields (copyField in schema.xml) to copy the content of one or several fields into another field that is not multivalued.
Changing and reindexing
When you change schema.xml you not only have to restart Solr, but also have to rebuild the index.
Way to go:
- Stop your application server
- Change your schema.xml file
- Delete the index directory in your data directory (Stefan: in the core directory)
- Start your application server (Solr will detect that there is no existing index and make a new one)
- Re-Index your data
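The "delete the index directory" step can be scripted. The sketch below assumes the index lives in data/index below the core directory (e.g. solr/core0/data/index); check your installation before deleting anything. delete_index is a hypothetical helper name, and the application server should be stopped before it is called.

```python
import shutil
from pathlib import Path


def delete_index(core_dir):
    """Remove <core_dir>/data/index (assumed layout); return True if it existed."""
    index_dir = Path(core_dir) / "data" / "index"
    if index_dir.is_dir():
        shutil.rmtree(index_dir)  # deletes the directory and all index files
        return True
    return False
```

Only the index folder is removed, the data directory itself stays, so Solr can create a fresh index in it on the next start.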
Ways to reindex:
- For SMW: Use the following two commands on a shell:
php SMW_refreshData.php -ftpv
php SMW_refreshData.php -v
- See [1] for more info.
- Script (I don't know how to use it so far; Simon: doesn't work with SMW): http://www.jason-palmer.com/2011/05/how-to-reindex-a-solr-database/
- Modify articles and save afterwards
Misc:
- There seems to be no problem if one quits XAMPP - the data is still there the next time one launches XAMPP (reason: the index is saved on disk)
- In general, you need to be very careful when you change the schema without reindexing - see [2]
- Alternative to stopping application server: use multi-core - see [3]
Misc
Multicore
- Multicore means one has more than one Solr core
- Purpose: you can have a single Solr instance with separate configurations and indexes - while having the convenience of unified administration. More info: http://wiki.apache.org/solr/CoreAdmin
- Cores are defined in solr.xml
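To see which cores are defined, solr.xml can be read the same way as any other XML file. This is a sketch: the snippet follows the older solr.xml layout with a <cores> element, the second core "core1" is made up, and list_cores is a hypothetical helper name. Your own solr.xml may look different.

```python
import xml.etree.ElementTree as ET

# Made-up example in the older solr.xml layout; "core1" is hypothetical.
SOLR_XML = """
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0"/>
    <core name="core1" instanceDir="core1"/>
  </cores>
</solr>
"""


def list_cores(solr_xml):
    """Return (name, instanceDir) for every <core> element."""
    root = ET.fromstring(solr_xml.strip())
    return [(core.get("name"), core.get("instanceDir")) for core in root.iter("core")]
```

Each core has its own instance directory with its own conf folder (schema.xml) and index.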