向日葵Solros 2016-11-29
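When Solr has to export a very large amount of data, use streaming export. The export is stream-based: as soon as the server matches the first document, it starts flushing data to the client.
Every field to be exported must have docValues enabled (docValues="true" on its field element in the schema), and the export request handler must be configured in solrconfig.xml: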
<requestHandler name="/export" class="solr.SearchHandler">
  <lst name="invariants">
    <str name="rq">{!xport}</str>
    <str name="wt">xsort</str>
    <str name="distrib">false</str>
  </lst>
  <arr name="components">
    <str>query</str>
  </arr>
</requestHandler>
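The export handler can also be queried over plain HTTP. Below is a minimal sketch of building such a request URL in Java. It assumes a core at the hypothetical address http://localhost:8983/solr/collection1 and hypothetical fields id and price, both with docValues enabled. The class and method names are also hypothetical. The /export handler expects both a sort and an fl parameter, so the sketch always sets them.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class ExportUrl {

    // Builds a GET URL for an /export handler under the given core URL (hypothetical helper).
    // Every field named in sort and fl must have docValues="true" in the schema.
    static String buildExportUrl(String coreUrl, String q, String sort, String fl) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("q", q);
        params.put("sort", sort);
        params.put("fl", fl);
        StringBuilder sb = new StringBuilder(coreUrl).append("/export");
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep).append(e.getKey()).append('=').append(encode(e.getValue()));
            sep = '&';
        }
        return sb.toString();
    }

    // Form-encodes a parameter value as UTF-8 (spaces become '+').
    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available on the JVM
        }
    }

    public static void main(String[] args) {
        System.out.println(buildExportUrl(
                "http://localhost:8983/solr/collection1", "*:*", "id asc", "id,price"));
    }
}
```

The response format is fixed by the wt invariant in the handler configuration, so the client does not pass wt itself.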
The client-side query code (SolrJ) is as follows:
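// Assumes SolrJ 5.x/6.x plus Apache Commons StringUtils/FileUtils on the classpath;
// fields (comma-separated names), url, outfile and query (a SolrQuery) are defined elsewhere.
final String[] fl = StringUtils.split(fields, ",");
SolrClient client = new HttpSolrClient(url);
query.setDistrib(false);
query.setFields(fields);
query.setRows(9999999); // fetch all hits in one streamed request
final PrintWriter writer = new PrintWriter(new OutputStreamWriter(
        FileUtils.openOutputStream(outfile), Charset.forName("utf8")));
writer.println(String.join(",", fl)); // header line, no trailing comma
final AtomicInteger count = new AtomicInteger(0);
QueryResponse result = client.queryAndStreamResponse(query, new StreamingResponseCallback() {
    @Override
    public void streamSolrDocument(SolrDocument doc) {
        // Called once per document as it arrives; write its values in fl order
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < fl.length; i++) {
            if (i > 0) line.append(',');
            Object v = doc.getFieldValue(fl[i]);
            line.append(v == null ? "" : v);
        }
        writer.println(line);
        count.incrementAndGet();
    }

    @Override
    public void streamDocListInfo(long numFound, long start, Float maxScore) {
        // Called once before the documents; numFound is the total hit count
    }
});
writer.close();
System.out.println("numFound:" + result.getResults().getNumFound() + ", written:" + count.get());
client.close();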
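When each streamed document is written as a CSV line, values containing commas, double quotes or line breaks need quoting, and multi-valued fields need a separator of their own. Below is a minimal sketch of RFC 4180-style escaping. The class CsvRows, its methods, and the '|' separator for multi-valued fields are hypothetical choices. Because SolrDocument implements Map<String, Object>, a streamed document could be passed to toCsvRow directly.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CsvRows {

    // Renders one value as a CSV cell, quoting it when it contains ',', '"' or a line break.
    static String escape(Object value) {
        if (value == null) {
            return ""; // missing field -> empty cell
        }
        String s;
        if (value instanceof Collection) {
            // Multi-valued field: join values with '|' (an arbitrary choice for this sketch)
            List<String> parts = new ArrayList<>();
            for (Object v : (Collection<?>) value) {
                parts.add(String.valueOf(v));
            }
            s = String.join("|", parts);
        } else {
            s = String.valueOf(value);
        }
        if (s.contains(",") || s.contains("\"") || s.contains("\n") || s.contains("\r")) {
            return "\"" + s.replace("\"", "\"\"") + "\""; // double embedded quotes
        }
        return s;
    }

    // Builds one CSV line from a document, in the column order given by fl.
    static String toCsvRow(Map<String, Object> doc, String[] fl) {
        List<String> cells = new ArrayList<>();
        for (String f : fl) {
            cells.add(escape(doc.get(f)));
        }
        return String.join(",", cells);
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "42");
        doc.put("title", "hello, \"world\"");
        System.out.println(toCsvRow(doc, new String[] {"id", "title", "missing"}));
    }
}
```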
Relevant Solr server-side code:
Query parser (QP):
ExportQParserPlugin, the QP used by the export handler (referenced as rq={!xport} in the handler configuration)
Streaming, sorted output of query results:
SortingResponseWriter
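To emit every matching document in sort order while holding only a bounded number of them in memory, one general approach is to export in batches. Each pass scans the remaining matches, keeps the batchSize best in a bounded heap, emits them in order, and removes them from the match set. The sketch below is a simplified illustration of that technique, not the SortingResponseWriter source. It assumes matches are held in a BitSet and each document has a single long sort key (standing in for a docValues column). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Consumer;

class BatchedSortedExport {

    // Emits all docs in 'matches' in ascending order of sortValues[doc], batchSize at a time.
    static void export(BitSet matches, long[] sortValues, int batchSize, Consumer<Integer> out) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        BitSet remaining = (BitSet) matches.clone();
        while (!remaining.isEmpty()) {
            // Max-heap: the head is the worst candidate kept so far in this pass
            PriorityQueue<Integer> heap = new PriorityQueue<>((a, b) -> cmp(sortValues, b, a));
            for (int doc = remaining.nextSetBit(0); doc >= 0; doc = remaining.nextSetBit(doc + 1)) {
                if (heap.size() < batchSize) {
                    heap.add(doc);
                } else if (cmp(sortValues, doc, heap.peek()) < 0) {
                    heap.poll(); // evict the worst, keep the better doc
                    heap.add(doc);
                }
            }
            List<Integer> batch = new ArrayList<>(heap);
            batch.sort((a, b) -> cmp(sortValues, a, b));
            for (int doc : batch) {
                out.accept(doc); // a real writer would write and flush each batch here
                remaining.clear(doc);
            }
        }
    }

    // Orders by sort value, breaking ties by doc id so the order is total.
    private static int cmp(long[] v, int a, int b) {
        int c = Long.compare(v[a], v[b]);
        return c != 0 ? c : Integer.compare(a, b);
    }

    public static void main(String[] args) {
        BitSet matches = new BitSet();
        matches.set(0);
        matches.set(2);
        matches.set(3);
        matches.set(5);
        long[] price = {50, 99, 10, 30, 99, 20};
        List<Integer> order = new ArrayList<>();
        export(matches, price, 2, order::add);
        System.out.println(order); // prints [2, 5, 3, 0]
    }
}
```

Each pass rescans the remaining matches, so a smaller batch size trades lower memory use for more passes over the match set.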