Implementing Filters in Lucene 4.0 (with a Lucene 3.6 Version)

Source: Internet  Editor: 程序博客网  Time: 2024/06/08 23:27

1: Implementing a filter in Lucene 3.6

Taking spatial search as the example, here is the code:

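    import java.io.IOException;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermDocs;
    import org.apache.lucene.search.DocIdSet;
    import org.apache.lucene.search.Filter;
    import org.apache.lucene.util.OpenBitSet;

    // Nested class; the enclosing class is not shown.
    public static final class DistanceFilter extends Filter
    {
        private static final long serialVersionUID = 1L;

        // Assumption: the original post does not show EARTH_RADIUS.
        // 6378.137 km is used here, so Radius is interpreted in kilometres.
        private static final double EARTH_RADIUS = 6378.137;

        private float Radius;
        private double x; // treated as latitude (degrees) by getDistance
        private double y; // treated as longitude (degrees) by getDistance

        // location has the form "x,y"
        public DistanceFilter(String location, float radius)
        {
            this.Radius = radius;
            String[] parts = location.split(",");
            this.x = Double.parseDouble(parts[0]);
            this.y = Double.parseDouble(parts[1]);
        }

        private static double rad(double d)
        {
            return d * Math.PI / 180.0;
        }

        @Override
        public DocIdSet getDocIdSet(IndexReader reader) throws IOException
        {
            OpenBitSet result = new OpenBitSet(reader.maxDoc());
            // Enumerate only the documents whose "type" field contains "restaurant".
            TermDocs td = reader.termDocs(new Term("type", "restaurant"));
            try
            {
                while (td.next())
                {
                    Document doc = reader.document(td.doc());
                    String[] loca = doc.get("location").split(",");
                    double destination_x = Double.parseDouble(loca[0]);
                    double destination_y = Double.parseDouble(loca[1]);
                    if (getDistance(x, y, destination_x, destination_y, Radius))
                    {
                        result.set(td.doc()); // mark this document as accepted
                    }
                }
            }
            finally
            {
                td.close();
            }
            return result;
        }

        // Haversine great-circle distance; true if the two points are within radius.
        public boolean getDistance(double x, double y, double dx, double dy, float radius)
        {
            double lat1 = rad(x);
            double lon1 = rad(y);
            double lat2 = rad(dx);
            double lon2 = rad(dy);
            double a = lat1 - lat2; // latitude difference
            double b = lon1 - lon2; // longitude difference
            double s = 2 * Math.asin(Math.sqrt(Math.pow(Math.sin(a / 2), 2)
                    + Math.cos(lat1) * Math.cos(lat2) * Math.pow(Math.sin(b / 2), 2)));
            return s * EARTH_RADIUS <= radius;
        }
    }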
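The distance test in getDistance is the haversine great-circle formula. The following is a minimal standalone sketch of that calculation, independent of Lucene. The class name HaversineSketch, the method names, and the Earth radius of 6378.137 km are assumptions for illustration; the post does not show the value of its EARTH_RADIUS constant or state the unit of the radius.

```java
public final class HaversineSketch {
    // Assumed Earth radius in kilometres (not given in the post).
    static final double EARTH_RADIUS_KM = 6378.137;

    /** Great-circle distance in kilometres between two points given in degrees. */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = phi1 - phi2;                     // latitude difference
        double dLambda = Math.toRadians(lon1 - lon2);  // longitude difference
        double h = Math.pow(Math.sin(dPhi / 2), 2)
                + Math.cos(phi1) * Math.cos(phi2) * Math.pow(Math.sin(dLambda / 2), 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
    }

    /** Same decision as the filter: accept when the distance does not exceed the radius. */
    static boolean withinRadius(double lat1, double lon1, double lat2, double lon2, double radiusKm) {
        return distanceKm(lat1, lon1, lat2, lon2) <= radiusKm;
    }

    public static void main(String[] args) {
        // One degree of longitude along the equator.
        System.out.println(distanceKm(0.0, 0.0, 0.0, 1.0));
        System.out.println(withinRadius(0.0, 0.0, 0.0, 1.0, 200.0));
    }
}
```

Note that the first pair of arguments must be the latitudes: the formula multiplies the cosines of the two latitudes, so swapping latitude and longitude silently gives wrong distances away from the equator.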


 

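In Lucene 3.6 the TermDocs interface is very useful: obtained from an IndexReader, it enumerates the documents that contain a given term. The filter is built on top of it.

getDocIdSet(IndexReader reader) is the abstract method inherited from Filter that every filter must implement. It returns a DocIdSet. For each document that satisfies the condition, the set method of result marks that document as one to return; documents left unmarked are not returned, and that is how the filtering works.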
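The marking idea can be sketched without Lucene. This is only an illustration: java.util.BitSet stands in for OpenBitSet, array indices stand in for document IDs, and the class name BitSetFilterSketch and the precomputed distances are hypothetical.

```java
import java.util.BitSet;

public final class BitSetFilterSketch {
    /**
     * Sets the bit for every document whose distance is within the radius.
     * The array index plays the role of the document ID.
     */
    static BitSet accept(double[] distancesKm, double radiusKm) {
        BitSet result = new BitSet(distancesKm.length);
        for (int docId = 0; docId < distancesKm.length; docId++) {
            if (distancesKm[docId] <= radiusKm) {
                result.set(docId); // marked: this document passes the filter
            }
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet hits = accept(new double[] {0.4, 7.2, 1.9, 3.0}, 3.0);
        System.out.println(hits); // prints {0, 2, 3}
    }
}
```

Document 1 is never set, so a search restricted by this set would not return it, whatever its score.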



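2: Implementing a filter in Lucene 4.0

In Lucene 4.0 the TermDocs interface was removed, and other classes are provided in its place. Specifically, TermsEnum, FieldsEnum and DocsEnum take over from TermDocs, TermEnum and TermPositions (TermEnum becomes TermsEnum, TermDocs becomes DocsEnum, and positions are read through DocsAndPositionsEnum). Implementing the filter is therefore slightly more work. Here is the code: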

    import org.apache.lucene.index.AtomicReaderContext;
    import org.apache.lucene.index.DocsEnum;
    import org.apache.lucene.util.Bits;
    import org.apache.lucene.util.BytesRef;

        @Override
        public DocIdSet getDocIdSet(AtomicReaderContext context, Bits livedocs) throws IOException
        {
            OpenBitSet result = new OpenBitSet(context.reader().maxDoc());
            BytesRef term = new BytesRef("restaurant");
            // Last argument (needsFreqs) is false: only document IDs are needed.
            DocsEnum docEnum = context.reader().termDocsEnum(livedocs, "type", term, false);
            if (docEnum == null)
            {
                // The field or term does not occur in this segment.
                return result;
            }
            int document;
            while ((document = docEnum.nextDoc()) != DocsEnum.NO_MORE_DOCS)
            {
                Document doc = context.reader().document(document);
                double destination_x = Double.parseDouble(doc.get("X"));
                double destination_y = Double.parseDouble(doc.get("Y"));
                if (getDistance(x, y, destination_x, destination_y, Radius))
                {
                    result.set(document); // mark this document as accepted
                }
            }
            return result;
        }

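The code that is identical to the Lucene 3.6 version is omitted here. Implementing a filter in 4.0 looks somewhat more involved than in 3.6.

The main changes are that IndexReader is replaced by AtomicReaderContext and TermDocs is replaced by DocsEnum; see the code above for the details.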
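One consequence of receiving an AtomicReaderContext is that getDocIdSet is called once per index segment, and the document IDs it sets are relative to that segment. The sketch below illustrates the idea in plain Java, without Lucene. The Segment class, its docBase field and the precomputed distances are hypothetical stand-ins chosen for illustration; they only mimic the role that a per-segment context plays.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public final class PerSegmentSketch {
    /** Hypothetical stand-in for one segment: its ID offset plus per-document distances. */
    static final class Segment {
        final int docBase;
        final double[] distancesKm;

        Segment(int docBase, double[] distancesKm) {
            this.docBase = docBase;
            this.distancesKm = distancesKm;
        }
    }

    /** Per-segment filtering: bit positions are segment-local document IDs. */
    static BitSet filterSegment(Segment seg, double radiusKm) {
        BitSet result = new BitSet(seg.distancesKm.length);
        for (int localId = 0; localId < seg.distancesKm.length; localId++) {
            if (seg.distancesKm[localId] <= radiusKm) {
                result.set(localId);
            }
        }
        return result;
    }

    /** Converts segment-local hits into index-wide IDs by adding each segment's docBase. */
    static List<Integer> globalHits(List<Segment> segments, double radiusKm) {
        List<Integer> out = new ArrayList<>();
        for (Segment seg : segments) {
            BitSet local = filterSegment(seg, radiusKm);
            for (int i = local.nextSetBit(0); i >= 0; i = local.nextSetBit(i + 1)) {
                out.add(seg.docBase + i);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Segment> segments = new ArrayList<>();
        segments.add(new Segment(0, new double[] {0.5, 9.0}));
        segments.add(new Segment(2, new double[] {4.0, 1.0, 2.5}));
        System.out.println(globalHits(segments, 3.0)); // prints [0, 3, 4]
    }
}
```

This is why the 4.0 code sizes its bit set with context.reader().maxDoc(), the document count of the current segment, rather than that of the whole index.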
