Global Search with django-haystack
Overview
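A haystack is the usual metaphor for the collection being searched, and the needle for the item being sought, so "finding a needle in a haystack" is a fitting image for site-wide search. This post briefly shows how to use django-haystack + Whoosh + jieba to search across several models at once, with fuzzy matching, per-model search, and highlighting of the query keywords in the results, and how to combine django-haystack + django-hvad for global search on a multilingual site.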
1. Install the required packages (django-haystack, Whoosh, jieba, and django-hvad for multilingual sites)
2. Update settings.py
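import os  # usually already imported at the top of settings.py

# Search service ('haystack' must also be listed in INSTALLED_APPS)
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'common.whoosh_cn_backend.ChineseWhooshEngine',
        'PATH': os.path.join(os.path.dirname(__file__), 'whoosh_index'),
    },
}
# Update the search index automatically whenever a model instance is saved or deleted
HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'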
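The ENGINE value is a dotted path to a class, so it only works if the module can be imported under exactly that name. The following is a minimal sketch of how such a path is typically resolved, using only the standard library. load_class is a hypothetical helper written for illustration and is not Haystack's own loader; Django ships django.utils.module_loading.import_string for the same purpose.

```python
import importlib


def load_class(dotted_path):
    """Import 'package.module.ClassName' and return the class object."""
    module_path, _, class_name = dotted_path.rpartition(".")
    if not module_path:
        raise ImportError("%r is not a dotted path to a class" % dotted_path)
    module = importlib.import_module(module_path)
    try:
        return getattr(module, class_name)
    except AttributeError:
        raise ImportError(
            "module %r has no attribute %r" % (module_path, class_name)
        )


# Demonstrated with a standard-library class so the sketch runs anywhere.
# In the project the path would be
# 'common.whoosh_cn_backend.ChineseWhooshEngine'.
ordered_dict_cls = load_class("collections.OrderedDict")
print(ordered_dict_cls.__name__)
```

If the folder or module is renamed, the import step fails, which is why the ENGINE string has to be kept in sync with the file location.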
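3. Add jieba word segmentation to the search engine
Create the file common/whoosh_cn_backend.py (if you change the folder or file name, update the ENGINE path under HAYSTACK_CONNECTIONS in settings.py accordingly) with the following content: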
from haystack.constants import DJANGO_CT, DJANGO_ID, ID
from haystack.exceptions import SearchBackendError
from whoosh.fields import ID as WHOOSH_ID
from whoosh.fields import BOOLEAN, DATETIME, IDLIST, KEYWORD, NGRAM, NGRAMWORDS, NUMERIC, Schema, TEXT
from jieba.analyse import ChineseAnalyzer
from haystack.backends.whoosh_backend import WhooshSearchBackend, WhooshEngine


class ChineseWhooshSearchBackend(WhooshSearchBackend):
    def build_schema(self, fields):
        schema_fields = {
            ID: WHOOSH_ID(stored=True, unique=True),
            DJANGO_CT: WHOOSH_ID(stored=True),
            DJANGO_ID: WHOOSH_ID(stored=True),
        }
        # Grab the number of keys that are hard-coded into Haystack.
        # We'll use this to (possibly) fail slightly more gracefully later.
        initial_key_count = len(schema_fields)
        content_field_name = ''

        for field_name, field_class in fields.items():
            if field_class.is_multivalued:
                if field_class.indexed is False:
                    schema_fields[field_class.index_fieldname] = IDLIST(stored=True, field_boost=field_class.boost)
                else:
                    schema_fields[field_class.index_fieldname] = KEYWORD(stored=True, commas=True, scorable=True,
                                                                         field_boost=field_class.boost)
            elif field_class.field_type in ['date', 'datetime']:
                schema_fields[field_class.index_fieldname] = DATETIME(stored=field_class.stored, sortable=True)
            elif field_class.field_type == 'integer':
                schema_fields[field_class.index_fieldname] = NUMERIC(stored=field_class.stored, numtype=int,
                                                                     field_boost=field_class.boost)
            elif field_class.field_type == 'float':
                schema_fields[field_class.index_fieldname] = NUMERIC(stored=field_class.stored, numtype=float,
                                                                     field_boost=field_class.boost)
            elif field_class.field_type == 'boolean':
                # Field boost isn't supported on BOOLEAN as of 1.8.2.
                schema_fields[field_class.index_fieldname] = BOOLEAN(stored=field_class.stored)
            elif field_class.field_type == 'ngram':
                schema_fields[field_class.index_fieldname] = NGRAM(minsize=3, maxsize=15, stored=field_class.stored,
                                                                   field_boost=field_class.boost)
            elif field_class.field_type == 'edge_ngram':
                schema_fields[field_class.index_fieldname] = NGRAMWORDS(minsize=2, maxsize=15, at='start',
                                                                        stored=field_class.stored,
                                                                        field_boost=field_class.boost)
            else:
                # The only change from the stock backend: ChineseAnalyzer instead of StemmingAnalyzer.
                schema_fields[field_class.index_fieldname] = TEXT(stored=True, analyzer=ChineseAnalyzer(),
                                                                  field_boost=field_class.boost, sortable=True)

            if field_class.document is True:
                content_field_name = field_class.index_fieldname
                schema_fields[field_class.index_fieldname].spelling = True

        # Fail more gracefully than relying on the backend to die if no fields
        # are found.
        if len(schema_fields) <= initial_key_count:
            raise SearchBackendError(
                "No fields were found in any search_indexes. Please correct this before attempting to search.")

        return (content_field_name, Schema(**schema_fields))


class ChineseWhooshEngine(WhooshEngine):
    backend = ChineseWhooshSearchBackend
The key change is that ChineseAnalyzer (from jieba.analyse import ChineseAnalyzer) replaces the original StemmingAnalyzer, which gives much better word segmentation for Chinese text.
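To see why the analyzer matters, here is a small standard-library sketch. It contrasts a regex word tokenizer, which is roughly how an analyzer built for space-delimited languages behaves, with a toy dictionary-based segmenter. The max_match function and the three-word VOCAB are assumptions made for illustration; this is forward maximum matching, not jieba's actual algorithm, which is considerably more sophisticated.

```python
import re


def regex_tokens(text):
    """Split on runs of word characters, as a regex word tokenizer would."""
    return re.findall(r"\w+", text)


def max_match(text, vocab, max_len=4):
    """Toy forward maximum matching: greedily take the longest known word."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when no dictionary word fits.
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


VOCAB = {"全局", "搜索", "新闻"}  # hypothetical mini dictionary
sentence = "全局搜索新闻"

print(regex_tokens(sentence))      # ['全局搜索新闻']
print(max_match(sentence, VOCAB))  # ['全局', '搜索', '新闻']
```

With the regex tokenizer the whole sentence becomes one token, so a query for 搜索 alone would not match it; once the text is segmented into words, each word is indexed separately.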
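4. Create search indexes for the models
The index file lives in the root directory of each model's app and is named search_indexes.py by default. Taking news and recruitment as examples: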
news/search_indexes.py
from haystack import indexes
from .models import News


class NewsIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    title = indexes.CharField(null=True, model_attr='title')
    summary = indexes.CharField(null=True, model_attr='summary')
    content = indexes.CharField(null=True, model_attr='content')
    publish_time = indexes.DateTimeField(model_attr='publish_time')
    lang = indexes.CharField(model_attr='language_code')

    def get_model(self):
        return News

    def index_queryset(self, using=None):
        """Used when the entire index for model is updated."""
        return self.get_model().objects.language('all').all()

    def read_queryset(self, using=None):
        return self.get_model().objects.language()
recruit/search_indexes.py
from haystack import indexes
from .models import Recruit


class RecruitIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    job = indexes.CharField(null=True, model_attr='job')
    place = indexes.CharField(null=True, model_attr='place')
    content = indexes.CharField(null=True, model_attr='content')
    publish_time = indexes.DateTimeField(model_attr='publish_time')
    lang = indexes.CharField(model_attr='language_code')

    def get_model(self):
        return Recruit

    def index_queryset(self, using=None):
        """Used when the entire index for model is updated."""
        return self.get_model().objects.language('all').all()

    def read_queryset(self, using=None):
        return self.get_model().objects.language()
In NewsIndex and RecruitIndex, the line lang = indexes.CharField(model_attr='language_code') and the .language() calls in index_queryset / read_queryset are there for compatibility with django-hvad. If you do not use django-hvad, delete the lang field and replace the .language() calls with .all().
Add a search directory under templates, with the following tree:
├─templates
│  └─search
│      └─indexes
│          ├─case
│          ├─download
│          ├─flatpage
│          ├─goods
│          ├─news
│          ├─product
│          ├─recruit
│          ├─service
│          ├─solution
│          └─staff
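Each folder under indexes is named after the app in lowercase. In the news folder, add a file news_text.txt; it is the template used to render the text field of NewsIndex. In the recruit folder, add a file recruit_text.txt, which does the same for RecruitIndex.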
Then run python manage.py rebuild_index to build the index. If the index already exists, run python manage.py update_index to update it.
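The folder and file naming described above follows the pattern Haystack uses to look up the data template of a use_template=True field: app label, then model name, then field name. A minimal sketch of that convention follows; data_template_path is a hypothetical helper written for illustration and is not part of Haystack's API.

```python
def data_template_path(app_label, model_name, field_name):
    """Build the template path Haystack is expected to look up for a field."""
    return "search/indexes/{}/{}_{}.txt".format(
        app_label.lower(), model_name.lower(), field_name
    )


# The two indexes from this post, both using a document field named 'text'.
for app_label, model_name in [("news", "News"), ("recruit", "Recruit")]:
    print(data_template_path(app_label, model_name, "text"))
```

If rebuild_index reports a missing template, comparing the reported path against this pattern is a quick way to spot a misnamed folder or file.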
5. Multilingual search
urls.py
views.py
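from haystack.generic_views import SearchView

from .forms import LangSearchForm  # assumes forms.py sits in the same app


# Multilingual search.
# BaseMixin is a project-specific view mixin defined elsewhere in the project.
class LangSearchView(BaseMixin, SearchView):
    form_class = LangSearchForm
    paginate_by = 10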
forms.py
from django.utils.translation import get_language
from haystack.forms import HighlightedSearchForm


class LangSearchForm(HighlightedSearchForm):
    def search(self):
        sqs = super(LangSearchForm, self).search()
        # Keep only the results indexed for the currently active language.
        sqs = sqs.filter(lang=get_language())
        return sqs
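Because index_queryset indexes every translation, each object presumably appears in the index once per language, and the lang filter is what keeps the other languages out of the result page. The idea can be sketched without Django as follows; Hit and filter_by_language are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical stand-in for one indexed translation of an object."""
    title: str
    lang: str


def filter_by_language(hits, active_lang):
    """Keep only the hits whose lang field equals the active language code."""
    return [hit for hit in hits if hit.lang == active_lang]


hits = [
    Hit("公司新闻", "zh-hans"),
    Hit("Company news", "en"),
    Hit("招聘信息", "zh-hans"),
]
print([hit.title for hit in filter_by_language(hits, "en")])  # ['Company news']
```

The comparison is an exact string match, so the language codes stored in the index need to be the same strings that get_language() returns.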
6. Highlight search results
Add the page /templates/search/search.html:
{% extends 'web/base.html' %}
{% load i18n %}
{% load highlight %}
{% block seo %}
{% trans "搜索结果" as default_seo_title %}
{% include "web/seo.html" with default_seo_title=default_seo_title %}
{% endblock %}
{% block css %}
<style>
span.highlighted {
color: #22a7c6;
}
</style>
{% endblock %}
{% block main %}
<div id="news_search">
<div class="container base news_search_container">
<h2>{% trans '搜索' %}</h2>
<form method="get" action="{% url 'web_search' %}" class="news_search_form">
<input class="news_search_input" type="text" name="q"
{% if query %} value="{{ query }}" {% else %} value="" {% endif %}
placeholder="{% trans '搜索你想要的关键字' %}">
<input type="submit" value="" class="back_submit">
<input type="submit" value="" id="search"
style="background:url('/static/web/img/search_white.png') no-repeat center center;width: 60px;height: 40px;">
</form>
<div class="thumb-container">
<span><img src="/static/web/img/location.png" alt=""></span>
<a href="{% url 'web_search' %}">{% trans '关键词搜索' %}</a>
</div>
{% if query %}
<div class="information-list wow fadeInUp ">
<h3>{% trans '搜索结果:' %}</h3>
{% for result in object_list %}
{% if result.model_name == 'news' %}
<a class="information-item" href="{{ result.object.get_absolute_url }}"
{% if result.object.url %}target="_blank"{% endif %}>
<div class="information-content">
<div class="information_div">
<p class="information-title">{% highlight result.object.title with query %}</p>
<p class="information-date">{{ result.object.publish_time.date }}</p>
</div>
<div class="information-summary-div">
<p class="information-summary">{% highlight result.object.summary with query %}</p>
</div>
</div>
</a>
{% elif result.model_name == 'recruit' %}
<a class="information-item" href="{{ result.object.get_absolute_url }}"
{% if result.object.url %}target="_blank" {% endif %}>
<div class="information-content">
<div class="information_div">
<p class="information-title">{% highlight result.object.job with query %}</p>
<p class="information-date">{{ result.object.update_time.date }}</p>
</div>
<div>
<p class="information-summary">{% highlight result.object.place with query %}</p>
</div>
</div>
</a>
{% endif %}
{% empty %}
<p>{% trans '没有找到您想要搜索的结果...' %}</p>
{% endfor %}
</div>
{% include 'web/pagination.html' %}
{% else %}
{# Show some example queries to run, maybe query syntax, something else? #}
{% endif %}
</div>
</div>
{% endblock %}
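The highlight tag, combined with the custom span.highlighted CSS rule, highlights the keywords in the search results, but the search form must inherit from HighlightedSearchForm. {{ result.model_name }} returns the model name of the current result, so different kinds of results can be rendered differently. {{ result.object }} is the original model instance, from which any of its attributes can be read. To allow searching by model, use ModelSearchForm or HighlightedModelSearchForm; in that case the search form should no longer be hand-written in the template and should be rendered with {{ form.as_table }} instead.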
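The effect of the highlight tag can be approximated in a few lines of standard-library Python: every occurrence of a query word is wrapped in a span carrying the highlighted class, which the CSS rule in the template then colors. This is a simplified sketch, not Haystack's implementation; the highlight function below is an assumption for illustration and does not attempt the excerpt trimming that a real highlighter may perform.

```python
import html
import re


def highlight(text, query, css_class="highlighted"):
    """Wrap each occurrence of a query word in <span class="...">."""
    words = [re.escape(word) for word in query.split()]
    if not words:
        return html.escape(text)
    pattern = re.compile("(" + "|".join(words) + ")", re.IGNORECASE)
    pieces = []
    # With one capturing group, re.split puts the matches at odd indexes.
    for index, part in enumerate(pattern.split(text)):
        safe = html.escape(part)
        if index % 2:
            safe = '<span class="{}">{}</span>'.format(css_class, safe)
        pieces.append(safe)
    return "".join(pieces)


print(highlight("Django 全局搜索", "搜索"))
# Django 全局<span class="highlighted">搜索</span>
```

Escaping each piece separately keeps markup in the source text from leaking into the page while leaving the inserted span tags intact.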