12/19 Programming Notes: Sending a Message to a Web Chatbot and Returning Its Reply


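Overall flow:

1. Get the text you want to send.
2. Send it with an HTTP POST request and receive the HTML page that comes back.
3. Process the returned HTML page with Java regular expressions.
4. Extract the chatbot's reply text.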

Key points:

Sending and receiving with POST

Use the org.apache.http package (Apache HttpClient).
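// Imports needed (java.util and Apache HttpClient 4.x):
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.NameValuePair;
import org.apache.http.client.HttpClient;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.client.params.CookiePolicy;
import org.apache.http.client.params.HttpClientParams;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;

// Method body. url (String) and map (Map<String, String>, the form fields)
// are parameters of the enclosing method.
HttpClient httpClient = null;
HttpPost httpPost = null;
String result = null;
try {
    httpClient = new DefaultHttpClient();
    // The line above triggers a deprecation warning. The suggested replacement is
    //   CloseableHttpClient httpClient = HttpClients.createDefault();
    // but I could not use it: it produced a cookie error that I was unable to resolve.

    // Important! Setting this cookie policy stops the cookie error.
    HttpClientParams.setCookiePolicy(httpClient.getParams(), CookiePolicy.BROWSER_COMPATIBILITY);
    httpPost = new HttpPost(url);

    // Copy the form fields from the map into name/value pairs.
    List<NameValuePair> list = new ArrayList<NameValuePair>();
    for (Map.Entry<String, String> elem : map.entrySet()) {
        list.add(new BasicNameValuePair(elem.getKey(), elem.getValue()));
    }
    httpPost.setEntity(new UrlEncodedFormEntity(list));

    // Send the request and read the response body as a string.
    HttpResponse response = httpClient.execute(httpPost);
    HttpEntity resEntity = response.getEntity();
    result = EntityUtils.toString(resEntity);
} catch (Exception ex) {
    ex.printStackTrace();
}
return result; // null if the request failed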
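
To make it clearer what the form entity puts on the wire, here is a minimal sketch that builds an application/x-www-form-urlencoded body from a map using only the JDK. The class name FormBodySketch and the field names "message" and "q" are assumptions for illustration only; the real field names depend on the form on the chatbot page.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Apache HttpClient.
class FormBodySketch {

    // Percent-encode one key or value as UTF-8 form data.
    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every JVM.
            throw new IllegalStateException(e);
        }
    }

    // Join the entries as key=value pairs separated by '&'.
    public static String encode(Map<String, String> params) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("message", "hello there"); // assumed field name
        System.out.println(encode(params));   // prints message=hello+there
    }
}
```

Spaces become "+" and reserved characters such as "&" are percent-encoded, which is why the raw message text can be placed in the map without escaping it by hand.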

Processing the returned HTML page

Use java.util.regex regular expressions.

// Imports needed:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Method body. html (String) is the page source passed in by the enclosing method.
String htmlStr = html;
String textStr = "";
try {
    // Regex for <script> blocks (a simpler alternative: <script[^>]*?>[\\s\\S]*?<\\/script>)
    String regEx_script = "<[\\s]*?script[^>]*?>[\\s\\S]*?<[\\s]*?\\/[\\s]*?script[\\s]*?>";
    // Regex for <style> blocks (a simpler alternative: <style[^>]*?>[\\s\\S]*?<\\/style>)
    String regEx_style = "<[\\s]*?style[^>]*?>[\\s\\S]*?<[\\s]*?\\/[\\s]*?style[\\s]*?>";
    // Regex for any remaining HTML tag
    String regEx_html = "<[^>]+>";

    Pattern p_script = Pattern.compile(regEx_script, Pattern.CASE_INSENSITIVE);
    Matcher m_script = p_script.matcher(htmlStr);
    htmlStr = m_script.replaceAll(""); // remove script blocks

    Pattern p_style = Pattern.compile(regEx_style, Pattern.CASE_INSENSITIVE);
    Matcher m_style = p_style.matcher(htmlStr);
    htmlStr = m_style.replaceAll(""); // remove style blocks

    Pattern p_html = Pattern.compile(regEx_html, Pattern.CASE_INSENSITIVE);
    Matcher m_html = p_html.matcher(htmlStr);
    htmlStr = m_html.replaceAll(""); // remove the remaining HTML tags

    textStr = htmlStr;
} catch (Exception e) {
    System.err.println("Html2Text: " + e.getMessage());
}

// Collapse runs of spaces and drop blank lines.
textStr = textStr.replaceAll("[ ]+", " ");
textStr = textStr.replaceAll("(?m)^\\s*$(\\n|\\r\\n)", "");

// Find the reply in the remaining text. Group 1 captures only letters and
// whitespace; the trailing . matches any single character after it.
String matStr = "Μitsuku - ([a-zA-Z\\s]{1,}).";
Pattern p = Pattern.compile(matStr);
Matcher m = p.matcher(textStr);
while (m.find()) {
    textStr = m.group(1); // each match overwrites the previous one, so the last reply wins
}
return textStr; // the reply, or the whole stripped text if nothing matched
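
The stripping and extraction steps can be tried on a small input without contacting the site. The sketch below is a condensed variant under stated assumptions: the class name ReplyExtractSketch, the speaker label "Bot", and the sample HTML are all made up for illustration, and the real page layout may differ. Unlike the code above, it returns null when no reply is found, so the caller can tell "no match" apart from a real reply.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the patterns are simplified versions of the ones above.
class ReplyExtractSketch {
    private static final Pattern SCRIPT = Pattern.compile(
            "<\\s*script[^>]*>[\\s\\S]*?<\\s*/\\s*script\\s*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern STYLE = Pattern.compile(
            "<\\s*style[^>]*>[\\s\\S]*?<\\s*/\\s*style\\s*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    // Remove script blocks, style blocks, then every remaining tag.
    public static String stripHtml(String html) {
        String text = SCRIPT.matcher(html).replaceAll("");
        text = STYLE.matcher(text).replaceAll("");
        text = TAG.matcher(text).replaceAll("");
        return text.replaceAll("[ ]+", " ").trim();
    }

    // Return the last "<speaker> - <words>" reply, or null if there is none.
    // Only letters and whitespace are captured, as in the original pattern.
    public static String lastReply(String text, String speaker) {
        Pattern p = Pattern.compile(Pattern.quote(speaker) + " - ([a-zA-Z\\s]+)");
        Matcher m = p.matcher(text);
        String reply = null;
        while (m.find()) {
            reply = m.group(1).trim();
        }
        return reply;
    }

    public static void main(String[] args) {
        // Made-up sample page, not the real site's markup.
        String html = "<html><head><script>var x = 1;</script></head>"
                + "<body><b>You</b> - Hi<br>\n<b>Bot</b> - Hello there.</body></html>";
        String text = stripHtml(html);
        System.out.println(lastReply(text, "Bot")); // prints Hello there
    }
}
```

Because the capture group accepts only letters and whitespace, a reply containing digits or punctuation such as an apostrophe is cut off at the first such character; widening the character class would be needed to capture those replies in full.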










