django-haystack_elasticsearch_pyelasticsearch


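Environment: CentOS 6.2, Python 2.6.6, Django 1.4 || django-haystack v2.0.0-beta, pyelasticsearch (toastdriven) 0.0.5, Elasticsearch 0.18.5

Download the latest django-haystack. At the time of writing that is the development version, v2.0.0-beta, which is needed because earlier versions do not support Elasticsearch. See the tutorial: http://django-haystack.readthedocs.org/en/latest/tutorial.html

pyelasticsearch and Elasticsearch must be installed first. The Haystack documentation describes the Elasticsearch backend as follows: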

Elasticsearch
Complete & included with Haystack.
Full SearchQuerySet support
Automatic query building
“More Like This” functionality
Term Boosting
Faceting
Stored (non-indexed) fields
Highlighting
Spatial search
Requires: pyelasticsearch (toastdriven git master) & Elasticsearch 0.17.7+

I. Installing Elasticsearch

For Elasticsearch I downloaded the binary distribution. The installing_search_engines page of the django-haystack documentation (v2.0.0-beta, the same version as the code) says:

Elasticsearch is Java but comes in a pre-packaged form that requires very little other than the JRE. It’s also very performant, scales easily and has an advanced featureset. Haystack requires at least version 0.17.7 (0.18.6 is current as of writing).

I downloaded 0.18.5.

To start it, run the elasticsearch script in the bin directory:

./elasticsearch -f

To check that Elasticsearch is running properly, add a few documents to an index and then query them. See http://www.elasticsearchtutorial.com/elasticsearch-in-5-minutes.html

Opening http://localhost:9200 shows something like:

{
  "ok" : true,
  "name" : "Midas",
  "version" : {
    "number" : "0.18.5",
    "snapshot_build" : false
  },
  "tagline" : "You Know, for Search",
  "cover" : "DON'T PANIC",
  "quote" : {
    "book" : "The Hitchhiker's Guide to the Galaxy",
    "chapter" : "Chapter 12",
    "text1" : "If there's anything bigger than my ego around, I want it caught and shot now."
  }
}
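The version requirement quoted above (at least 0.17.7) can be checked against this response. The following is a minimal sketch, assuming Python 3 and a response body shaped like the one shown. The helper names `parse_version` and `meets_minimum` are my own, not part of Haystack or Elasticsearch.

```python
import json

# Minimum Elasticsearch version stated in the Haystack docs quoted above.
MINIMUM_VERSION = (0, 17, 7)


def parse_version(number):
    """Turn a dotted version string such as "0.18.5" into a tuple of ints."""
    return tuple(int(part) for part in number.split("."))


def meets_minimum(root_response, minimum=MINIMUM_VERSION):
    """Return True if the root endpoint's JSON reports a new enough version."""
    info = json.loads(root_response)
    return parse_version(info["version"]["number"]) >= minimum


# Trimmed copy of the response shown above (hypothetical sample input).
sample = '{"ok": true, "version": {"number": "0.18.5", "snapshot_build": false}}'
print(meets_minimum(sample))
```

Comparing tuples of integers avoids the pitfall of comparing version strings, where "0.9.0" would sort after "0.18.5".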

Index some documents with curl:

curl -XPUT 'http://localhost:9200/blog/user/dilbert' -d '{ "name" : "Dilbert Brown" }'

curl -XPUT 'http://localhost:9200/blog/post/1' -d '
{
    "user": "dilbert",
    "postDate": "2011-12-15",
    "body": "Search is hard. Search should be easy." ,
    "title": "On search"
}'

curl -XPUT 'http://localhost:9200/blog/post/2' -d '
{
    "user": "dilbert",
    "postDate": "2011-12-12",
    "body": "Distribution is hard. Distribution should be easy." ,
    "title": "On distributed search"
}'

curl -XPUT 'http://localhost:9200/blog/post/3' -d '
{
    "user": "dilbert",
    "postDate": "2011-12-10",
    "body": "Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat" ,
    "title": "Lorem ipsum"
}'
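The same PUT requests can be prepared from Python. This is only a sketch, assuming Python 3 and the standard library. `build_index_request` is a hypothetical helper name, and the request is built but not sent, so the snippet runs without a live server.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # same address as the curl commands above


def build_index_request(index, doc_type, doc_id, document, base_url=BASE_URL):
    """Build (but do not send) the PUT request that stores one document."""
    url = "{0}/{1}/{2}/{3}".format(base_url, index, doc_type, doc_id)
    body = json.dumps(document).encode("utf-8")
    return Request(url, data=body, method="PUT")


request = build_index_request("blog", "user", "dilbert", {"name": "Dilbert Brown"})
print(request.get_method(), request.full_url)
```

To actually send it, pass the request to `urllib.request.urlopen(request)` while Elasticsearch is running.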

Search with curl (the same URL also works in a browser):

curl 'http://localhost:9200/blog/post/_search?q=user:dilbert&pretty=true'

{
  "took" : 96,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "blog",
      "_type" : "post",
      "_id" : "1",
      "_score" : 1.0, "_source" : {
    "user": "dilbert",
    "postDate": "2011-12-15",
    "body": "Search is hard. Search should be easy." ,
    "title": "On search"
}
    }, {
      "_index" : "blog",
      "_type" : "post",
      "_id" : "2",
      "_score" : 0.30685282, "_source" : {
    "user": "dilbert",
    "postDate": "2011-12-12",
    "body": "Distribution is hard. Distribution should be easy." ,
    "title": "On distributed search"
}
    }, {
      "_index" : "blog",
      "_type" : "post",
      "_id" : "3",
      "_score" : 0.30685282, "_source" : {
    "user": "dilbert",
    "postDate": "2011-12-10",
    "body": "Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat" ,
    "title": "Lorem ipsum"
}
    } ]
  }
}
This shows that Elasticsearch is working properly.
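For scripting the same check, the search URL and the interesting parts of the response can be handled in a few lines. This is a sketch under assumptions: Python 3, standard library only, and a response shaped like the one above. `build_search_url` and `summarize_hits` are hypothetical helper names, and `sample_response` is a trimmed copy of the real output.

```python
from urllib.parse import urlencode


def build_search_url(index, doc_type, query, base_url="http://localhost:9200"):
    """Build the URI-search URL used by the curl command above."""
    params = urlencode({"q": query, "pretty": "true"})
    return "{0}/{1}/{2}/_search?{3}".format(base_url, index, doc_type, params)


def summarize_hits(response):
    """Return (total, [(id, score, title), ...]) from a decoded search response."""
    hits = response["hits"]
    rows = [(h["_id"], h["_score"], h["_source"]["title"]) for h in hits["hits"]]
    return hits["total"], rows


# Trimmed version of the response shown above (only the fields used here).
sample_response = {
    "hits": {
        "total": 3,
        "max_score": 1.0,
        "hits": [
            {"_id": "1", "_score": 1.0, "_source": {"title": "On search"}},
            {"_id": "2", "_score": 0.30685282, "_source": {"title": "On distributed search"}},
            {"_id": "3", "_score": 0.30685282, "_source": {"title": "Lorem ipsum"}},
        ],
    }
}

print(build_search_url("blog", "post", "user:dilbert"))
total, rows = summarize_hits(sample_response)
print(total, [title for _id, _score, title in rows])
```

Note that `urlencode` percent-encodes the colon in `user:dilbert`; the server decodes it back before parsing the query.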

You can also use the head plugin. Install it with the plugin script in the bin directory; see http://mobz.github.com/elasticsearch-head/

./plugin -install mobz/elasticsearch-head

Then open http://localhost:9200/_plugin/head/ to see the documents that were just indexed.

II. Installing pyelasticsearch

See http://django-haystack.readthedocs.org/en/latest/installing_search_engines.html

You’ll also need an Elasticsearch binding, pyelasticsearch (NOT pyes). The unofficial, Haystack-compatible pyelasticsearch package, hosted on GitHub, is the best version to use. Place pyelasticsearch.py somewhere on your PYTHONPATH (usually python setup.py install).

At first I downloaded it with

git clone https://github.com/rhec/pyelasticsearch
which installed what appeared to be version 0.0.6. That produced this error:

`from_python` and `to_python` methods don't exist in PyElasticSearch

Looking for a fix, I found:
https://github.com/toastdriven/django-haystack/issues/514
https://github.com/toastdriven/pyelasticsearch/blob/master/pyelasticsearch.py#L424-469

I added from_python and to_python by hand.
Then, when running python manage.py rebuild_index (see III.6), I got:

    self.conn.bulk_index(self.index_name, 'modelresult', prepped_docs, id_field=ID)
AttributeError: 'ElasticSearch' object has no attribute 'bulk_index'
It turned out that pyelasticsearch.py had no bulk_index method, because I was using the wrong version of the package.

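The version to download is toastdriven's (https://github.com/toastdriven/pyelasticsearch). It has many branches, and I no longer remember whether I picked master or bulk_index. I downloaded it as a zip archive. It contains the from_python, to_python and bulk_index methods, and after installation it turned out to be version 0.0.5. It works correctly.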
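Both errors above come down to methods missing from the installed client, so it can help to check for them before rebuilding the index. Below is a minimal sketch, assuming Python 3. `missing_methods` and `IncompleteClient` are hypothetical names used for illustration; the list of required methods is just the three that came up in this article, not an official list of what Haystack calls.

```python
# The three methods whose absence caused the errors described above.
REQUIRED_METHODS = ("from_python", "to_python", "bulk_index")


def missing_methods(client_class, required=REQUIRED_METHODS):
    """Return the names in `required` that `client_class` does not provide."""
    return [
        name for name in required
        if not callable(getattr(client_class, name, None))
    ]


class IncompleteClient(object):
    """Hypothetical stand-in for a client that lacks bulk_index."""

    def from_python(self, value):
        return value

    def to_python(self, value):
        return value


print(missing_methods(IncompleteClient))

# Against the real package (assumes it is installed and exposes a class
# named ElasticSearch, as the traceback above suggests):
# from pyelasticsearch import ElasticSearch
# print(missing_methods(ElasticSearch))
```

An empty list means all three methods are present.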
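III. Installing and using django-haystack

Download v2.0.0-beta and install it with python setup.py install, then configure it by following the tutorial (http://django-haystack.readthedocs.org/en/latest/tutorial.html). The tutorial creates a new model, builds an Elasticsearch index from that model's data, and then searches it.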

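I used an existing Django project. The main steps are:

1. Add the Note model

2. Edit settings.py to connect to Elasticsearch

3. Create search_indexes.py so that the Note data is indexed into Elasticsearch

4. Configure the view and URL by editing urls.py

5. Create the template file

6. Build the index

7. Search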

==

1. Add the Note model

from django.db import models
from django.contrib.auth.models import User


class Note(models.Model):
    user = models.ForeignKey(User)
    pub_date = models.DateTimeField()
    title = models.CharField(max_length=200)
    body = models.TextField()

    def __unicode__(self):
        return self.title
2. Edit settings.py to connect to Elasticsearch

Add haystack to INSTALLED_APPS:

INSTALLED_APPS = (
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.sites',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'haystack',
    'tjob',
    # Uncomment the next line to enable the admin:
    'django.contrib.admin',
    # Uncomment the next line to enable admin documentation:
    # 'django.contrib.admindocs',
)
Connect to Elasticsearch:

HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'INDEX_NAME': 'haystack',
    },
}

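Haystack reaches Elasticsearch at http://127.0.0.1:9200.

3. Create search_indexes.py so that the Note data is indexed into Elasticsearch

Create search_indexes.py in the application directory.

My project is tdjproj and the application is tjob, so the file is tdjproj/tjob/search_indexes.py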

import datetime
from haystack import indexes
from tjob.models import Note


class NoteIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    author = indexes.CharField(model_attr='user')
    pub_date = indexes.DateTimeField(model_attr='pub_date')

    def get_model(self):
        return Note

    def index_queryset(self):
        """Used when the entire index for model is updated."""
        return self.get_model().objects.filter(pub_date__lte=datetime.datetime.now())
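The filter in index_queryset means that only notes whose pub_date is not in the future get indexed. The sketch below mimics that rule in plain Python 3, without Django, purely to illustrate the effect. `notes_to_index` and the sample notes are hypothetical, and the fixed `now` is there only to make the example deterministic.

```python
import datetime


def notes_to_index(notes, now=None):
    """Keep only notes published at or before `now` (mirrors pub_date__lte)."""
    if now is None:
        now = datetime.datetime.now()
    return [note for note in notes if note["pub_date"] <= now]


# Hypothetical sample data: one note in the past, one in the future.
sample_notes = [
    {"title": "published", "pub_date": datetime.datetime(2012, 8, 5, 14, 5)},
    {"title": "scheduled", "pub_date": datetime.datetime(2012, 8, 12, 5, 0)},
]

fixed_now = datetime.datetime(2012, 8, 10)
print([note["title"] for note in notes_to_index(sample_notes, now=fixed_now)])
```

A note dated in the future will therefore not appear in search results until the index is rebuilt or updated after its pub_date has passed.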

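Because the text field above sets use_template=True, a data template is needed. Create a note_text.txt file under the templates directory.

File location:

templates/search/indexes/tjob/note_text.txt
where tjob is my application. Its contents are: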

{{ object.title }}
{{ object.user.get_full_name }}
{{ object.body }}
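To see what this template contributes to the index, here is a rough stand-in for the rendering step in plain Python 3. It assumes the template holds one variable per line with a trailing newline. `render_note_text` is a hypothetical helper, not how Haystack renders templates internally.

```python
def render_note_text(title, full_name, body):
    """Approximate the rendered note_text.txt: one value per line."""
    return "{0}\n{1}\n{2}\n".format(title, full_name, body)


# A note whose user has no first or last name set, so get_full_name is empty.
text = render_note_text("forsearch", "", "this is the body of for_search.")
print(repr(text))
```

With an empty full name the middle line is blank, which matches the "text" value stored for tjob.note.3 in the search output under step 7.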
4. Configure the view and URL by editing urls.py

    url(r'^search/', include('haystack.urls')),

5. Create the template file

File location:

templates/search/search.html

{% extends 'base.html' %}

{% block content %}
    <h2>Search</h2>

    <form method="get" action=".">
        <table>
            {{ form.as_table }}
            <tr>
                <td>&nbsp;</td>
                <td>
                    <input type="submit" value="Search">
                </td>
            </tr>
        </table>

        {% if query %}
            <h3>Results</h3>

            {% for result in page.object_list %}
                <p>
                    <a href="{{ result.object.get_absolute_url }}">{{ result.object.title }}</a>
                </p>
            {% empty %}
                <p>No results found.</p>
            {% endfor %}

            {% if page.has_previous or page.has_next %}
                <div>
                    {% if page.has_previous %}<a href="?q={{ query }}&amp;page={{ page.previous_page_number }}">{% endif %}&laquo; Previous{% if page.has_previous %}</a>{% endif %}
                    |
                    {% if page.has_next %}<a href="?q={{ query }}&amp;page={{ page.next_page_number }}">{% endif %}Next &raquo;{% if page.has_next %}</a>{% endif %}
                </div>
            {% endif %}
        {% else %}
            {# Show some example queries to run, maybe query syntax, something else? #}
        {% endif %}
    </form>
{% endblock %}

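Opening http://localhost/search/ gives: [screenshot of the search page]

I serve the site through Apache on the default port 80. After Search is clicked, django-haystack talks to Elasticsearch over HTTP on port 9200, as configured above. To use a different port, change Elasticsearch's HTTP port and update the URL in HAYSTACK_CONNECTIONS to match.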
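The host and port that Haystack will contact come straight from the URL entry in HAYSTACK_CONNECTIONS. A small sketch, assuming Python 3 and the settings shown in step 2, can pull them out to confirm which port is in use. `elasticsearch_endpoint` is a hypothetical helper name.

```python
from urllib.parse import urlsplit

# Copy of the connection settings from step 2.
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'INDEX_NAME': 'haystack',
    },
}


def elasticsearch_endpoint(connections, alias='default'):
    """Return the (host, port) that the given connection alias points at."""
    parts = urlsplit(connections[alias]['URL'])
    return parts.hostname, parts.port


host, port = elasticsearch_endpoint(HAYSTACK_CONNECTIONS)
print(host, port)
```

This port is independent of the port Apache serves the Django site on.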

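6. Build the index
First run python manage.py syncdb to create the Note table, then add data through the database API or the Django admin. Finally, run python manage.py rebuild_index to build the index.

7. Search

Open

http://localhost:9200/haystack/modelresult/_search?q=author:djroot&pretty=true
in a browser. The result: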

{
  "took" : 11,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.30685282,
    "hits" : [ {
      "_index" : "haystack",
      "_type" : "modelresult",
      "_id" : "tjob.note.3",
      "_score" : 0.30685282, "_source" : {"django_id": "3", "author": "djroot", "text": "forsearch\n\nthis is the body of for_search.\n", "django_ct": "tjob.note", "pub_date": "2012-08-05T14:05:00", "id": "tjob.note.3"}
    }, {
      "_index" : "haystack",
      "_type" : "modelresult",
      "_id" : "tjob.note.2",
      "_score" : 0.30685282, "_source" : {"django_id": "2", "author": "djroot", "text": "\u4e2d\u56fd\u5965\u8fd0\u4f1a\n\n\u51a0\u519b\u4eba\u65701000\n", "django_ct": "tjob.note", "pub_date": "2012-08-12T05:00:00", "id": "tjob.note.2"}
    }, {
      "_index" : "haystack",
      "_type" : "modelresult",
      "_id" : "tjob.note.1",
      "_score" : 0.30685282, "_source" : {"django_id": "1", "author": "djroot", "text": "For nothing\n\nWhat's that?\n", "django_ct": "tjob.note", "pub_date": "2012-08-02T14:39:48", "id": "tjob.note.1"}
    } ]
  }
}
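In this output each document id appears to be built as app_label.model_name.pk, and it agrees with the django_ct and django_id fields stored alongside it. The sketch below checks that relationship on one of the hits. It assumes Python 3; `split_identifier` and `id_is_consistent` are hypothetical helper names, and `sample_source` is a trimmed copy of a `_source` object from above.

```python
def split_identifier(identifier):
    """Split "tjob.note.3" into (app_label, model_name, pk)."""
    app_label, model_name, pk = identifier.split(".", 2)
    return app_label, model_name, pk


def id_is_consistent(source):
    """Check that id equals django_ct + "." + django_id for one _source dict."""
    return source["id"] == "{0}.{1}".format(source["django_ct"], source["django_id"])


# Trimmed copy of one _source object from the response above.
sample_source = {
    "django_id": "3",
    "author": "djroot",
    "django_ct": "tjob.note",
    "id": "tjob.note.3",
}

print(split_identifier(sample_source["id"]))
print(id_is_consistent(sample_source))
```

This is also why every model shares the single modelresult type in the haystack index: the model is identified by the django_ct field rather than by the Elasticsearch type.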
Result in the head plugin: [screenshot]

Result in django-haystack: [screenshot]

References:

http://django-haystack.readthedocs.org/en/latest/tutorial.html

http://django-haystack.readthedocs.org/en/latest/installing_search_engines.html

http://www.elasticsearchtutorial.com/elasticsearch-in-5-minutes.html

http://mobz.github.com/elasticsearch-head/

https://github.com/toastdriven/pyelasticsearch

http://stackoverflow.com/questions/11963513/attributeerror-elasticsearch-object-has-no-attribute-bulk-index

http://django-haystack.readthedocs.org/en/latest/migration_from_1_to_2.html