MongoDB full-text search solution (Lucene + IKAnalyzer)

Full-text search is no small problem in MongoDB.

You can match with regular expressions, but this is very inefficient, and searches over large data sets often end in query timeouts.
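To see why this gets slow, here is a minimal Java sketch of what a regex search amounts to when no index can help, as is generally the case for an unanchored pattern: every document's text is tested in turn, so the cost grows with the size of the collection. It runs against an in-memory list rather than MongoDB, and the names `RegexScanSketch` and `scan` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative stand-in for a regex query: every document is tested in turn.
class RegexScanSketch {

    // Returns the positions of the texts that contain the keyword.
    static List<Integer> scan(List<String> texts, String keyword) {
        // Pattern.quote treats the keyword as a literal, not as regex syntax.
        Pattern pattern = Pattern.compile(Pattern.quote(keyword));
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < texts.size(); i++) {
            // find() looks for the keyword anywhere in the text (unanchored).
            if (pattern.matcher(texts.get(i)).find()) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> texts = Arrays.asList("我是上海人", "北京欢迎你", "上海人爱喝咖啡");
        System.out.println(scan(texts, "上海人")); // prints [0, 2]
    }
}
```

The loop visits all of the texts no matter how few of them match, which is the behaviour that turns into timeouts on a large collection.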

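You can of course follow the officially recommended approach (add a field to the MongoDB document, store the word-segmentation result in it, and then match against that field), but when I tried it, it seemed to be even slower than the regex approach.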


http://www.oschina.net/question/200745_61968
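The keyword-field idea can be pictured with a small in-memory sketch. Each stored document carries its text plus an extra field holding the segmentation result, and a search keeps the documents whose keyword field contains every search term. The `Doc` class, the `keywords` field name, and the hand-written segmentation are all assumptions for illustration; in MongoDB itself the keywords would be an array field on the document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative model of "store the segmentation result in an extra field".
class KeywordFieldSketch {

    // One stored document: the original text plus its (hypothetical) keywords field.
    static final class Doc {
        final String text;
        final Set<String> keywords;

        Doc(String text, String... keywords) {
            this.text = text;
            this.keywords = new HashSet<>(Arrays.asList(keywords));
        }
    }

    // Returns the texts of the documents whose keywords contain every search term.
    static List<String> findAll(List<Doc> docs, String... terms) {
        List<String> hits = new ArrayList<>();
        for (Doc doc : docs) {
            if (doc.keywords.containsAll(Arrays.asList(terms))) {
                hits.add(doc.text);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // The segmentation here is written by hand; a real system would use an analyzer.
        List<Doc> docs = Arrays.asList(
                new Doc("我是上海人", "我", "是", "上海", "上海人"),
                new Doc("北京欢迎你", "北京", "欢迎", "你"),
                new Doc("上海人爱喝咖啡", "上海", "上海人", "爱", "喝", "咖啡"));
        System.out.println(findAll(docs, "上海人"));
        System.out.println(findAll(docs, "上海", "咖啡"));
    }
}
```

Matching a term is now a set lookup instead of a pattern match, but the sketch still walks every document. Whether the real query is fast depends on the keyword field being indexed.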

Later I tried Lucene + IKAnalyzer and found that performance improved.

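How it works: Lucene runs the large text data through an analyzer (tokenizer) and builds an index over it in newly created index files.

At query time, the results are read from the index files.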
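The principle can be sketched in plain Java as a tiny inverted index: at indexing time each text is split into terms and every term records which documents contain it, and at query time only those term lists are consulted. This is a toy model, not how Lucene or IKAnalyzer is implemented. The tokenizer simply emits overlapping two-character terms as a stand-in for dictionary-based segmentation, and `InvertedIndexSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: term -> ids of the documents that contain the term.
class InvertedIndexSketch {
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final List<String> stored = new ArrayList<>();

    // Stand-in tokenizer: overlapping two-character terms.
    // Texts shorter than two characters produce no terms.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (int i = 0; i + 2 <= text.length(); i++) {
            terms.add(text.substring(i, i + 2));
        }
        return terms;
    }

    // Stores the text and records each of its terms in the postings map.
    int add(String text) {
        int id = stored.size();
        stored.add(text);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    // Returns the stored texts that contain every term of the query.
    List<String> search(String query) {
        Set<Integer> result = null;
        for (String term : tokenize(query)) {
            Set<Integer> ids = postings.getOrDefault(term, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(ids);
            } else {
                result.retainAll(ids); // keep only documents that have all terms so far
            }
        }
        List<String> texts = new ArrayList<>();
        if (result != null) {
            for (int id : result) {
                texts.add(stored.get(id));
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add("我是上海人");
        index.add("北京欢迎你");
        index.add("上海人爱喝咖啡");
        System.out.println(index.search("上海人")); // prints [我是上海人, 上海人爱喝咖啡]
    }
}
```

A search touches only the entries for the query's terms, so its cost depends on how many documents share those terms rather than on the total amount of text. Requiring all two-character terms is looser than a true phrase match, which is one reason a real analyzer and query parser are used in the listings that follow.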

Fetch the data from MongoDB and create the index:

package sample3;

import java.io.File;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriter.MaxFieldLength;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.wltea.analyzer.lucene.IKAnalyzer;

import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.Mongo;

/**
 * Creates the index.
 *
 * @author zhanghaijun
 */
public class Demo1 {

    public static void main(String[] args) throws Exception {
        // First fetch the data to be indexed from the database
        Mongo mongo = new Mongo();
        DB db = mongo.getDB("zhang");
        DBCollection msg = db.getCollection("test3");
        DBCursor cursor = msg.find();

        // Whether to recreate the index files; false appends to the existing index
        boolean create = true;

        // Create the index
        Directory directory = FSDirectory.open(new File("E:\\lucene\\index"));
        Analyzer analyzer = new IKAnalyzer(); // IK Chinese analyzer
        // Pass the create flag so that the setting above actually takes effect
        IndexWriter indexWriter = new IndexWriter(directory, analyzer, create, MaxFieldLength.LIMITED);

        while (cursor.hasNext()) {
            Object text = cursor.next().get("text");
            if (text == null) {
                continue; // skip documents that have no "text" field
            }
            Document doc = new Document();
            Field fieldText = new Field("text", text.toString(), Field.Store.YES,
                    Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS);
            doc.add(fieldText);
            indexWriter.addDocument(doc);
        }
        cursor.close();

        // optimize() optimizes the index
        indexWriter.optimize();

        // Finally, close the index writer and the database connection
        indexWriter.close();
        mongo.close();
    }
}

Searching the data (directly from the index files):

package sample3;

import java.io.File;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.wltea.analyzer.lucene.IKAnalyzer;
import org.wltea.analyzer.lucene.IKQueryParser;
import org.wltea.analyzer.lucene.IKSimilarity;

/**
 * Searches the index.
 */
public class Demo2 {

    public static void main(String[] args) throws Exception {
        long starttime = System.currentTimeMillis();

        // Only searching, so read-only=true
        IndexReader reader = IndexReader.open(FSDirectory.open(new File("E:\\lucene\\index")), true);
        IndexSearcher searcher = new IndexSearcher(reader);
        searcher.setSimilarity(new IKSimilarity()); // use the IKSimilarity similarity scorer in the searcher

        //String[] keys = {"4","testtest"}; // keywords
        //String[] fields = {"id","title"}; // fields to search
        //BooleanClause.Occur[] flags = {BooleanClause.Occur.MUST,BooleanClause.Occur.MUST}; // relationship between the conditions
        // Build a multi-field, multi-condition query with IKQueryParser.parseMultiField
        //Query query = IKQueryParser.parseMultiField(fields,keys, flags);

        Query query = IKQueryParser.parse("text", "上海人"); // IK query on a single field

        IKAnalyzer analyzer = new IKAnalyzer();
        //Query query = MultiFieldQueryParser.parse(Version.LUCENE_CURRENT, keys, fields, flags, analyzer); // build the query with MultiFieldQueryParser

        // The original listing is cut off at this point. The lines below are a minimal
        // reconstruction inferred from the imports (TopDocs, ScoreDoc, Document);
        // the hit limit of 10 is an assumption.
        TopDocs topDocs = searcher.search(query, 10);
        for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
            Document doc = searcher.doc(scoreDoc.doc);
            System.out.println(doc.get("text"));
        }
        System.out.println("Search took " + (System.currentTimeMillis() - starttime) + " ms");

        searcher.close();
        reader.close();
    }
}
