- Search API
- ----------
- This module provides a framework for easily creating searches on any entity
- known to Drupal, using any kind of search engine. For site administrators, it is
- a great alternative to other search solutions, since it already incorporates
- faceting support and the ability to use the Views module for displaying search
- results, filters, etc. Also, with the Apache Solr integration [1], a
- high-performance search engine is available for use with the Search API.
- If you need help with the module, please post to the project's issue queue [2].
- [1] http://drupal.org/project/search_api_solr
- [2] http://drupal.org/project/issues/search_api
- Content:
- - Glossary
- - Information for users
- - Information for developers
- - Included components
- Glossary
- --------
- Terms as used in this module.
- - Service class:
- A type of search engine, e.g. using the database, Apache Solr,
- Sphinx or any other professional or simple indexing mechanism. Takes care of
- the details of all operations, especially indexing or searching content.
- - Server:
- One specific place for indexing data, using a set service class. Can
- e.g. be some tables in a database, a connection to a Solr server or other
- external services, etc.
- - Index:
- A configuration object for indexing data of a specific type. What and how data
- is indexed is determined by its settings. Also keeps track of which items
- still need to be indexed (or re-indexed, if they were updated). Needs to lie
- on a server in order to be really used (although configuration is independent
- of a server).
- - Item type:
- A type of data which can be indexed (i.e., for which indexes can be created).
- Most entity types (like Content, User, Taxonomy term, etc.) are available, but
- possibly also other types provided by contrib modules.
- - Entity:
- One object of data, usually stored in the database. Might for example
- be a node, a user or a file.
- - Field:
- A defined property of an entity, like a node's title or a user's mail address.
- All fields have defined datatypes. However, for indexing purposes the user
- might choose to index a property under a different data type than defined.
- - Data type:
- Determines how a field is indexed. While "Fulltext" fields can be completely
- searched for keywords, other fields can only be used for filtering. They will
- also be converted to fit their respective value ranges.
- How types other than "Fulltext" are handled depends on the service class used.
- Its documentation should state how the type selection affects the indexed
- content. However, service classes will always be able to handle all data
- types. It is just possible that the type doesn't affect the indexing at all
- (apart from "Fulltext vs. the rest").
- - Boost:
- Number determining how important a certain field is when searching for
- fulltext keywords. The higher the value, the more important the field is.
- E.g., when the node title has a boost of 5.0 and the node body a boost of 1.0,
- a keyword found in the title will increase the score as much as five keywords
- found in the body. Of course, this only has an effect when the score is used
- (for sorting or other purposes). It has no effect on other parts of the search
- result.
- - Data alteration:
- A component that is used when indexing data. It can add additional fields to
- the indexed entity or prevent certain entities from being indexed. Fields
- added by callbacks have to be enabled on the "Fields" page to be of any use,
- but this is done by default.
- - Processor:
- An object that is used for preprocessing indexed data as well as search
- queries, and for postprocessing search results. Processors usually only work
- on fulltext fields to control how content is indexed and searched. E.g.,
- processors can be used to make searches case-insensitive, to filter markup
- out of indexed content, etc.
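The "Boost" entry above can be illustrated with a small sketch. It assumes a simple linear scoring model (score = sum of boost times keyword hits per field), which is only an assumption for illustration: how the score is actually computed depends on the service class. The function and field names are hypothetical.

```python
def score(hits_per_field, boosts):
    """Sum the keyword hits of each field, weighted by that field's boost.

    Fields without an explicit boost are assumed to have a boost of 1.0.
    """
    return sum(boosts.get(field, 1.0) * hits
               for field, hits in hits_per_field.items())


# Boost values taken from the glossary example: title 5.0, body 1.0.
boosts = {"title": 5.0, "body": 1.0}

one_title_hit = score({"title": 1, "body": 0}, boosts)
five_body_hits = score({"title": 0, "body": 5}, boosts)
```

Under this model, one keyword hit in the title contributes as much to the score as five hits in the body.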
- Information for users
- ---------------------
- IMPORTANT: Access checks
- In general, the Search API doesn't contain any access checks for search
- results. It is your responsibility to ensure that only accessible search
- results are displayed – either by only indexing such items, or by filtering
- appropriately at search time.
- For search on general site content (item type "Node"), this is already
- supported by the Search API. To enable this, go to the index's "Filters" tab
- and activate the "Node access" data alteration. This will add the necessary
- field, "Node access information", to the index (which you have to leave as
- "indexed"). If both this field and "Published" are set to be indexed, access
- checks will automatically be executed at search time, showing only those
- results that a user can view. Some search types (e.g., search views) also
- provide the option to disable these access checks for individual searches.
- Please note, however, that these access checks use the indexed data, while
- usually the current data is displayed to users. Therefore, users might still
- see inappropriate content as long as items aren't indexed in their latest
- state. If you can't allow this for your site, please use the index's "Index
- immediately" feature (explained below) or possibly custom solutions for
- specific search types, if available.
- You will need at least one other module to use the Search API, namely one
- that defines a service class (e.g., search_api_db ("Database search"), which
- can be found at [3]).
- [3] http://drupal.org/project/search_api_db
- - Creating a server
- (Configuration > Search API > Add server)
- The most basic thing you have to create is a search server for indexing content.
- Go to Configuration > Search API in the administration pages and select
- "Add server". Name and description are usually only shown to administrators and
- can be used to differentiate between several servers, or to explain a server's
- use to other administrators (for larger sites). Disabling a server makes it
- unusable for indexing and searching and can e.g. be used if the underlying
- search engine is temporarily unavailable.
- The "service class" is the most important option here, since it lets you select
- which backend the search server will use. This cannot be changed after the
- server is created.
- Depending on the selected service class, further, service-specific settings will
- be available. For details on those settings, consult the respective service's
- documentation.
- - Creating an index
- (Configuration > Search API > Add index)
- For adding a search index, choose "Add index" on the Search API administration
- page. Name, description and "enabled" status serve the exact same purpose as
- for servers.
- The most important option in this form is the indexed entity type. Every index
- contains data on only a single type of entities, e.g. nodes, users or taxonomy
- terms. This is therefore the only option that cannot be changed afterwards.
- The server on which the index lies determines where the data will actually be
- indexed. It doesn't affect any other settings of the index and can later be
- changed with the only drawback being that the index's content will have to be
- indexed again. You can also select a server that is at the moment disabled, or
- choose to let the index lie on no server at all, for the time being. Note,
- however, that you can only create enabled indexes on an enabled server. Also,
- disabling a server will disable all indexes that lie on it.
- The "Index items immediately" option specifies that you want items to be
- directly re-indexed after being changed, instead of waiting for the next cron
- run. Use this if it is important that users see no stale data in searches, and
- only when your setup enables relatively fast indexing.
- Lastly, the "Cron batch size" option allows you to set whether items will be
- indexed when cron runs (as long as the index is enabled), and how many items
- will be indexed in a single batch. The best value for this setting depends on
- how time-consuming indexing is for your setup, which in turn depends mostly on
- the server used and the enabled data alterations. You should set it to a number
- of items which can easily be indexed in 10 seconds' time. Items can also be
- indexed manually, or directly when they are changed, so even if this is set to
- 0, the index can still be used.
- - Indexed fields
- (Configuration > Search API > [Index name] > Fields)
- Here you can select which of the entities' fields will be indexed, and how.
- Fields added by (enabled) data alterations will be available here, too.
- Without selecting fields to index, the index will be useless and also won't be
- available for searches. Select the "Fulltext" data type for fields which you
- want to search for keywords, and other data types when you want to use the field
- for filtering (e.g., as facets). The "Item language" field will always be
- indexed as it contains important information for processors and hooks.
- You can also add fields of related entities here, via the "Add related fields"
- form at the bottom of the page. For instance, you might want to add the
- author's username to the indexed data of a node, and you need to add the "Body"
- entity to the node when you want to index the actual text it contains.
- - Indexing workflow
- (Configuration > Search API > [Index name] > Filters)
- This page lets you customize how the created index works, and what metadata will
- be available, by selecting data alterations and processors (see the glossary for
- further explanations).
- Data alterations usually only add one or more fields to the entity and their
- order is mostly irrelevant.
- The order of processors, however, often is important. Read the processors'
- descriptions or consult their documentation for determining how to use them most
- effectively.
- - Index status
- (Configuration > Search API > [Index name] > Status)
- On this page you can view how many of the entities are already indexed and also
- control indexing. With the "Index now" button (displayed only when there are
- still unindexed items) you can directly index a certain number of "dirty" items
- (i.e., items not yet indexed in their current state). Setting "-1" as the number
- will index all of those items, similar to the cron batch size setting.
- When you change settings that could affect indexing, and the index is not
- automatically marked for re-indexing, you can do this manually with the
- "Re-index content" button. All items in the index will be marked as dirty and be
- re-indexed when subsequently indexing items (either manually or via cron runs).
- Until all content is re-indexed, the old data will still show up in searches.
- This is different with the "Clear index" button. All items will be marked as
- dirty and additionally all data will be removed from the index. Therefore,
- searches won't show any results until items are re-indexed, after clearing an
- index. Use this only if completely wrong data has been indexed. It is also done
- automatically when the index scheme or server settings change too drastically to
- keep on using the old data.
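The difference between "Index now", "Re-index content" and "Clear index" can be sketched with a toy tracker. This is a hypothetical in-memory model written for illustration, not the module's implementation; the class and method names are invented.

```python
class IndexTracker:
    """Toy model of how an index tracks "dirty" items (illustrative only)."""

    def __init__(self, item_ids):
        self.items = set(item_ids)
        self.dirty = set(item_ids)  # items not yet indexed in their current state
        self.indexed = {}           # item id -> data currently stored on the server

    def index_now(self, source, limit=-1):
        """Index up to `limit` dirty items; -1 indexes all of them."""
        batch = sorted(self.dirty)
        if limit >= 0:
            batch = batch[:limit]
        for item_id in batch:
            self.indexed[item_id] = source[item_id]
            self.dirty.discard(item_id)
        return len(batch)

    def reindex(self):
        """Mark all items as dirty; the old data stays searchable."""
        self.dirty = set(self.items)

    def clear(self):
        """Mark all items as dirty and remove all data from the index."""
        self.dirty = set(self.items)
        self.indexed.clear()


source = {1: "old a", 2: "old b", 3: "old c"}
tracker = IndexTracker(source)
first_run = tracker.index_now(source, limit=2)  # indexes items 1 and 2
second_run = tracker.index_now(source)          # -1: indexes the remaining item

source[1] = "new a"
tracker.reindex()
data_after_reindex = dict(tracker.indexed)      # old data is still present

tracker.clear()
data_after_clear = dict(tracker.indexed)        # nothing left to search
```

After `reindex()` the stale value for item 1 is still returned until the item is indexed again, whereas after `clear()` the index is empty until items are re-indexed.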
- - Hidden settings
- search_api_index_worker_callback_runtime:
- By changing this variable, you can determine the time (in seconds) the Search
- API will spend indexing (for all indexes combined) in each cron run. The
- default is 15 seconds.
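How the cron batch size and the runtime limit interact can be sketched as follows. This is a simplified model under stated assumptions: batches are taken from a queue until the queue is empty or the time budget is used up, and a running batch is never interrupted. The function, the fake clock and the 4-second batch duration are hypothetical and only serve the illustration.

```python
import time


def run_cron_indexing(queue, batch_size, runtime=15.0,
                      index_batch=None, clock=time.monotonic):
    """Index batches from `queue` until it is empty or `runtime` seconds passed.

    A batch size of 0 disables cron indexing; -1 indexes everything at once.
    Returns the number of items indexed in this run.
    """
    if batch_size == 0:
        return 0
    if batch_size < 0:
        batch_size = len(queue)
    deadline = clock() + runtime
    indexed = 0
    while queue and clock() < deadline:
        batch = queue[:batch_size]
        del queue[:batch_size]
        if index_batch is not None:
            index_batch(batch)
        indexed += len(batch)
    return indexed


class FakeClock:
    """Deterministic clock so the example does not depend on real time."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


clock = FakeClock()
queue = list(range(100))
# Assume, for illustration, that indexing one batch of 10 items takes 4 seconds.
done = run_cron_indexing(queue, batch_size=10, runtime=15.0,
                         index_batch=lambda batch: clock.advance(4.0),
                         clock=clock)
remaining = len(queue)
```

In this model a batch that starts just before the deadline still runs to completion, which is one reason to choose a batch size that can comfortably be indexed within a few seconds.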
- Information for developers
- --------------------------
- | NOTE:
- | For modules providing new entities: In order for your entities to become
- | searchable with the Search API, your module will need to implement
- | hook_entity_property_info() in addition to the normal hook_entity_info().
- | hook_entity_property_info() is documented in the entity module.
- | For making certain non-entities searchable, see "Item type" below.
- | For custom field types to be available for indexing, provide a
- | "property_type" key in hook_field_info(), and optionally a callback at the
- | "property_callbacks" key.
- | Both processes are explained in [4].
- |
- | [4] http://drupal.org/node/1021466
- Apart from improving the module itself, developers can extend search
- capabilities provided by the Search API by providing implementations for one (or
- several) of the following classes. Detailed documentation on the methods that
- need to be implemented is always available as doc comments in the respective
- interface definition (all found in their respective files in the includes/
- directory). The details for hooks can be looked up in the search_api.api.php
- file. Note that all hooks provided by the Search API use the "search_api" hook
- group. Therefore, implementations of the hook can be moved into a
- MODULE.search_api.inc file in your module's directory.
- For all interfaces there are handy base classes which can (but don't need to) be
- used to ease custom implementations, since they provide sensible generic
- implementations for many methods. They, too, should be documented well enough
- with doc comments for a developer to find the right methods to override or
- implement.
- - Service class
- Interface: SearchApiServiceInterface
- Base class: SearchApiAbstractService
- Hook: hook_search_api_service_info()
- The service classes are the heart of the API, since they allow data to be
- indexed on different search servers. Since they take considerable work to get
- right, you should probably make sure a service class for a specific search
- engine doesn't exist already before programming it yourself.
- When your module supplies a service class, please make sure to provide
- documentation (at least a README.txt) that clearly states the datatypes it
- supports (and in what manner), how a direct query (a query where the keys are
- a single string, instead of an array) is parsed and possible limitations of the
- service class.
- The central methods here are the indexItems() and the search() methods, which
- always have to be overridden manually. The configurationForm() method allows
- services to provide custom settings for the user.
- See the SearchApiDbService class provided by [5] for an example implementation.
- [5] http://drupal.org/project/search_api_db
- - Query class
- Interface: SearchApiQueryInterface
- Base class: SearchApiQuery
- You can also override the query class' behaviour for your service class. You
- can, for example, change key parsing behaviour, add additional parse modes
- specific to your service, or override methods so the information is stored in
- a way more suitable for your service.
- For the query class to become available (other than through manual creation),
- you need a custom service class where you override the query() method to return
- an instance of your query class.
- - Item type
- Interface: SearchApiDataSourceControllerInterface
- Base class: SearchApiAbstractDataSourceController
- Hook: hook_search_api_item_type_info()
- If you want to index some data which is not defined as an entity, you can
- specify it as a new item type here. For defining a new item type, you have to
- create a data source controller for the type and track new, changed and deleted
- items of the type by calling the search_api_track_item_*() functions.
- An instance of the data source controller class will then be used by indexes
- when handling items of your newly-defined type.
- If you want to make external data that is indexed on some search server
- available to the Search API, there is a handy base class for your data source
- controller (SearchApiExternalDataSourceController in
- includes/datasource_external.inc) which you can extend. For a minimal use case,
- you will then only have to define the available fields that can be retrieved by
- the server.
- - Data type
- Hook: hook_search_api_data_type_info()
- You can specify new data types for indexing fields. These new types can then be
- selected on indexes' "Fields" tabs. You just have to implement the hook,
- returning some information on your data type, and specify in your module's
- documentation the format of your data type and how it should be used.
- For a custom data type to have an effect, in most cases the server's service
- class has to support that data type. A service class can advertise its support
- of a data type by declaring support for the "search_api_data_type_TYPE" feature
- in its supportsFeature() method. If this support isn't declared, a fallback data
- type is automatically used instead of the custom one.
- If a field is indexed with a custom data type, its entry in the index's options
- array will have the selected type in "real_type", while "type" contains the
- fallback type (which is always one of the default data types, as returned by
- search_api_default_field_types()).
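The fallback mechanism described above can be sketched as follows. This is a conceptual Python model, not the module's PHP code: the set of default types is an illustrative subset, the custom "location" type and its "fallback" key are assumptions made for the example, and the helper names are hypothetical. Only the "type" / "real_type" keys and the "search_api_data_type_TYPE" feature name are taken from the text.

```python
# Illustrative subset of the default data types (assumption for this sketch).
DEFAULT_TYPES = {"text", "string", "integer", "decimal", "date", "boolean"}

# Hypothetical custom data type with a declared fallback type.
CUSTOM_TYPES = {"location": {"name": "Location", "fallback": "string"}}


def field_entry(selected_type):
    """Build a field's entry in the index options for the selected type."""
    if selected_type in DEFAULT_TYPES:
        return {"type": selected_type}
    return {
        "type": CUSTOM_TYPES[selected_type]["fallback"],
        "real_type": selected_type,
    }


def effective_type(entry, supports_feature):
    """Return the type a service would use when indexing this field."""
    real_type = entry.get("real_type")
    if real_type and supports_feature("search_api_data_type_" + real_type):
        return real_type
    return entry["type"]


entry = field_entry("location")

# One service declares support for the custom type, the other does not.
with_support = effective_type(
    entry, lambda feature: feature == "search_api_data_type_location")
without_support = effective_type(entry, lambda feature: False)
```

A service that does not declare the feature never needs to know about the custom type, because "type" always holds one of the default types.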
- - Data-alter callbacks
- Interface: SearchApiAlterCallbackInterface
- Base class: SearchApiAbstractAlterCallback
- Hook: hook_search_api_alter_callback_info()
- Data alter callbacks can be used to change the field data of indexed items, or
- to prevent certain items from being indexed. They are only used when indexing,
- or when selecting the fields to index. For adding additional information to
- search results, you have to use a processor.
- Data-alter callbacks are called "data alterations" in the UI.
- - Processors
- Interface: SearchApiProcessorInterface
- Base class: SearchApiAbstractProcessor
- Hook: hook_search_api_processor_info()
- Processors are used for altering the data when indexing or searching. The exact
- specifications are available in the interface's doc comments. Just note that the
- processor description should clearly state assumptions or restrictions on input
- types (e.g. only tokenized text), item language, etc. and explain concisely what
- effect it will have on searches.
- See the processors in includes/processor.inc for examples.
- Included components
- -------------------
- - Data alterations
- * URL field
- Provides a field with the URL for displaying the entity.
- * Aggregated fields
- Offers the ability to add additional fields to the entity, containing the
- data from one or more other fields. Use this, e.g., to have a single field
- containing all data that should be searchable, or to make the text from a
- string field, like a taxonomy term, also fulltext-searchable.
- The type of aggregation can be selected from a set of values: you can, e.g.,
- collect the text data of all contained fields, or add them up, count their
- values, etc.
- * Bundle filter
- Enables the admin to prevent entities from being indexed based on their
- bundle (content type for nodes, vocabulary for taxonomy terms, etc.).
- * Complete entity view
- Adds a field containing the whole HTML content of the entity as it is viewed
- on the site. The view mode used can be selected.
- Note, however, that this might not work for entities of all types. All core
- entities except files are supported, though.
- * Index hierarchy
- Allows you to index a hierarchical field along with all its parents. Most
- importantly, this can be used to index taxonomy term references along with
- all parent terms. This way, when an item, e.g., has the term "New York", it
- will also be matched when filtering for "USA" or "North America".
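The "Index hierarchy" idea can be sketched in a few lines. The sketch assumes that every term has at most one parent, which is a simplification (taxonomy terms may have several parents), and the function and the parent mapping are hypothetical.

```python
def with_parents(term, parent_of):
    """Return `term` followed by all of its ancestors, nearest first."""
    result = []
    seen = set()  # guards against accidental cycles in the parent mapping
    while term is not None and term not in seen:
        result.append(term)
        seen.add(term)
        term = parent_of.get(term)
    return result


# Parent relation taken from the example in the text.
parent_of = {"New York": "USA", "USA": "North America"}

indexed_terms = with_parents("New York", parent_of)
```

Because the item is indexed with all three values, a filter on "USA" or "North America" also matches an item that only references "New York".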
- - Processors
- * Ignore case
- Makes all fulltext searches (and, optionally, also filters on string values)
- case-insensitive. Some servers might do this automatically, for others this
- should probably always be activated.
- * HTML filter
- Strips HTML tags from fulltext fields and decodes HTML entities. If you are
- indexing HTML content (like node bodies) and the search server doesn't
- handle HTML on its own, this should be activated to avoid indexing HTML
- tags, as well as to give e.g. terms appearing in a heading a higher boost.
- * Tokenizer
- This processor allows you to specify how indexed fulltext content is split
- into separate tokens – which characters are ignored and which treated as
- white-space that separates words.
- * Stopwords
- Enables the admin to specify a stopwords file, the words contained in which
- will be filtered out of the text data indexed. This can be used to exclude
- too common words from indexing, for servers not supporting this natively.
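The four processors above can be sketched as a small preprocessing pipeline. These are simplified stand-ins written for illustration, not the module's implementations: the regular expressions, the function names and the stopword list are assumptions, and the heading boost of the HTML filter is left out.

```python
import html
import re


def html_filter(text):
    """Replace HTML tags with spaces, then decode HTML entities."""
    return html.unescape(re.sub(r"<[^>]+>", " ", text))


def ignore_case(text):
    """Lower-case the text so that matching becomes case-insensitive."""
    return text.lower()


def tokenize(text, whitespace=r"[^\w]+"):
    """Split text into tokens, treating non-word characters as white-space."""
    return [token for token in re.split(whitespace, text) if token]


def remove_stopwords(tokens, stopwords):
    """Drop all tokens that appear in the stopword list."""
    return [token for token in tokens if token not in stopwords]


stopwords = {"the", "and"}
text = "<h2>The Search &amp; Index API</h2>"

# Recommended order: strip markup, normalize case, tokenize, drop stopwords.
tokens = remove_stopwords(tokenize(ignore_case(html_filter(text))), stopwords)

# Without case normalization first, "The" does not match the stopword "the".
tokens_wrong_order = remove_stopwords(tokenize(html_filter(text)), stopwords)
```

The second call shows why the order of processors matters: a lower-case stopword list only works if case normalization has already run.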
- - Additional modules
- * Search views
- This integrates the Search API with the Views module [6], enabling the user
- to create views which display search results from any Search API index.
- * Search facets
- For service classes supporting this feature (e.g. Solr search), this module
- automatically provides configurable facet blocks on pages that execute
- a search query.
- [6] http://drupal.org/project/views