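Aliyun Text Keyword Filtering and Detection
Source: Internet  Published by: 维生素软件  Editor: 程序博客网  Time: 2024/06/03 12:53
Feature description
Keyword detection is the simplest form of text detection. This article matches a piece of text against keywords to determine whether it contains specific keywords.
HTTP API description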
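Scene name          Scene (scene)   Label (label)   Notes
Keyword detection   keyword         normal          Normal text
Keyword detection   keyword         spam            Contains prohibited content
Keyword detection   keyword         ad              Advertising
Keyword detection   keyword         politics        Political content
Keyword detection   keyword         terrorism       Violence or terrorism
Keyword detection   keyword         porn            Pornography
Keyword detection   keyword         contraband      Contraband
Keyword detection   keyword         customized      Custom (for example, a custom keyword was hit)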
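The labels listed above can be modelled in client code so that a mistyped label string is caught in one place. The following is only a minimal sketch: `KeywordLabel` and `fromCode` are hypothetical names, not part of the Aliyun SDK, and mapping unlisted values to `UNKNOWN` is an assumption made here because the service may return labels that this article does not list.

```java
import java.util.Locale;

/** Hypothetical helper (not part of the Aliyun SDK): the labels listed in this article. */
enum KeywordLabel {
    NORMAL("normal", "normal text"),
    SPAM("spam", "contains prohibited content"),
    AD("ad", "advertising"),
    POLITICS("politics", "political content"),
    TERRORISM("terrorism", "violence or terrorism"),
    PORN("porn", "pornography"),
    CONTRABAND("contraband", "contraband"),
    CUSTOMIZED("customized", "custom, for example a custom keyword was hit"),
    UNKNOWN("", "label not listed in this article");

    final String code;
    final String description;

    KeywordLabel(String code, String description) {
        this.code = code;
        this.description = description;
    }

    /** Maps the "label" string of a result to a constant; unlisted values become UNKNOWN (assumption). */
    static KeywordLabel fromCode(String code) {
        if (code == null) {
            return UNKNOWN;
        }
        String normalized = code.trim().toLowerCase(Locale.ROOT);
        for (KeywordLabel label : values()) {
            if (label != UNKNOWN && label.code.equals(normalized)) {
                return label;
            }
        }
        return UNKNOWN;
    }
}
```

For example, `KeywordLabel.fromCode("porn")` yields `KeywordLabel.PORN`, while a value that is not in the list yields `KeywordLabel.UNKNOWN`.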
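1.2 Text keyword detection (uri: /green/text/scan)
Checks whether the text hits specific keywords. The request body is a struct, described as follows:
Each element of the JSON array is a struct with the following fields:
The Data field in the response body is a JSON array; each element has the following fields:
Elements contained in results in the table above:
Preparing the SDK dependencies and development environment for each language
Follow https://develop.aliyun.com/sdk/java?spm=5176.doc28430.2.1.Qp3LL4 to set up the Aliyun SDK dependencies before you start developing.
Add the Maven dependencies
The content detection API SDK consists of the common part of the Aliyun Java SDK and a content detection part. The common part depends on aliyun-java-sdk-core, and the content detection part depends on aliyun-java-sdk-green.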
Maven Dependencies
<dependency>
<groupId>com.aliyun</groupId>
<artifactId>aliyun-java-sdk-core</artifactId>
<version>3.0.7</version>
</dependency>
<dependency>
<groupId>com.aliyun</groupId>
<artifactId>aliyun-java-sdk-green</artifactId>
<version>3.0.0</version>
</dependency>
Implementation
1.1 config.properties
# Aliyun accessKeyId and accessKeySecret
accessKeyId=your accessKeyId
accessKeySecret=your accessKeySecret
# regionId used to call the Aliyun Green service. The service currently runs on two clusters: cn-hangzhou and cn-shanghai
regionId=cn-hangzhou
1.2 BaseSample.java
import java.io.IOException;
import java.util.Properties;
/**
 * Created by liqingfeng.lh on 17/01/12.
 */
public class BaseSample {

    protected static String accessKeyId;
    protected static String accessKeySecret;
    protected static String regionId;

    // Load the credentials and the region from config.properties on the classpath.
    static {
        Properties properties = new Properties();
        try {
            properties.load(BaseSample.class.getResourceAsStream("config.properties"));
            accessKeyId = properties.getProperty("accessKeyId");
            accessKeySecret = properties.getProperty("accessKeySecret");
            regionId = properties.getProperty("regionId");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Service domain for the configured region; defaults to the cn-hangzhou cluster.
    protected static String getDomain() {
        if ("cn-shanghai".equals(regionId)) {
            return "green.cn-shanghai.aliyuncs.com";
        }
        return "green.cn-hangzhou.aliyuncs.com";
    }

    protected static String getEndPointName() {
        return regionId;
    }
}
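BaseSample only prints the stack trace when loading fails, so a missing or incomplete config.properties leaves the static fields null and the problem surfaces later, inside the SDK call. One possible alternative is to fail fast at load time. The sketch below uses only the JDK; `ConfigLoader` and its `load` method are hypothetical names introduced for illustration, not part of the original sample or the SDK.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical helper (not part of the SDK): loads settings and fails fast on missing keys. */
final class ConfigLoader {
    private ConfigLoader() {
    }

    /** Reads properties from the source and rejects any required key that is missing or blank. */
    static Properties load(Reader source, String... requiredKeys) {
        Properties properties = new Properties();
        try {
            properties.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException("Could not read configuration", e);
        }
        for (String key : requiredKeys) {
            String value = properties.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                throw new IllegalStateException("Missing required setting: " + key);
            }
        }
        return properties;
    }

    public static void main(String[] args) {
        // Placeholder values for demonstration only; a real file holds your own credentials.
        String sample = "accessKeyId=demo-id\naccessKeySecret=demo-secret\nregionId=cn-hangzhou\n";
        Properties p = load(new StringReader(sample), "accessKeyId", "accessKeySecret", "regionId");
        System.out.println(p.getProperty("regionId")); // prints cn-hangzhou
    }
}
```

Taking a `Reader` rather than a resource name keeps the sketch independent of the classpath layout; in the sample's setting the reader would wrap the stream of config.properties.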
1.3 TextKeywordScanSample.java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import com.aliyuncs.DefaultAcsClient;
import com.aliyuncs.IAcsClient;
import com.aliyuncs.exceptions.ClientException;
import com.aliyuncs.exceptions.ServerException;
import com.aliyuncs.green.model.v20170112.TextScanRequest;
import com.aliyuncs.http.FormatType;
import com.aliyuncs.http.HttpResponse;
import com.aliyuncs.profile.DefaultProfile;
import com.aliyuncs.profile.IClientProfile;
/**
* Created by liqingfeng on 16/3/2.
 * Text detection
*/
public class TextKeywordScanSample extends BaseSample {
    public static void main(String[] args) throws Exception {
        // Replace with your own accessKeyId and accessKeySecret
        IClientProfile profile = DefaultProfile.getProfile(regionId, accessKeyId, accessKeySecret);
        DefaultProfile.addEndpoint(getEndPointName(), regionId, "Green", getDomain());
        IAcsClient client = new DefaultAcsClient(profile);

        TextScanRequest textScanRequest = new TextScanRequest();
        textScanRequest.setAcceptFormat(FormatType.JSON); // response format of the API
        textScanRequest.setContentType(FormatType.JSON);
        textScanRequest.setMethod(com.aliyuncs.http.MethodType.POST); // request method
        textScanRequest.setEncoding("UTF-8");
        textScanRequest.setRegionId(regionId);

        // One task per piece of text; dataId identifies the task in the response.
        List<Map<String, Object>> tasks = new ArrayList<Map<String, Object>>();
        Map<String, Object> task1 = new LinkedHashMap<String, Object>();
        task1.put("dataId", UUID.randomUUID().toString());
        task1.put("content", "the text you want to check");
        tasks.add(task1);

        JSONObject data = new JSONObject();
        data.put("scenes", Arrays.asList("keyword"));
        data.put("tasks", tasks);
        textScanRequest.setContent(data.toJSONString().getBytes("UTF-8"), "UTF-8", FormatType.JSON);

        /**
         * Be sure to set the timeouts.
         */
        textScanRequest.setConnectTimeout(3000);
        textScanRequest.setReadTimeout(6000);

        try {
            HttpResponse httpResponse = client.doAction(textScanRequest);
            if (httpResponse.isSuccess()) {
                JSONObject scrResponse = JSON.parseObject(new String(httpResponse.getContent(), "UTF-8"));
                System.out.println(JSON.toJSONString(scrResponse, true));
                if (200 == scrResponse.getInteger("code")) {
                    JSONArray taskResults = scrResponse.getJSONArray("data");
                    for (Object taskResult : taskResults) {
                        if (200 == ((JSONObject) taskResult).getInteger("code")) {
                            JSONArray sceneResults = ((JSONObject) taskResult).getJSONArray("results");
                            for (Object sceneResult : sceneResults) {
                                String scene = ((JSONObject) sceneResult).getString("scene");
                                String suggestion = ((JSONObject) sceneResult).getString("suggestion");
                                // Handle the result according to scene and suggestion
                                // do something
                                System.out.println("args = [" + scene + "]");
                                System.out.println("args = [" + suggestion + "]");
                            }
                        } else {
                            System.out.println("task process fail:" + ((JSONObject) taskResult).getInteger("code"));
                        }
                    }
                } else {
                    System.out.println("detect not success. code:" + scrResponse.getInteger("code"));
                }
            } else {
                System.out.println("response not success. status:" + httpResponse.getStatus());
            }
        } catch (ServerException e) {
            e.printStackTrace();
        } catch (ClientException e) {
            e.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Request body example:
{
"scenes":[
"keyword"
],
"tasks":[
{
"dataId":"f14cde88-a5d3-44f7-b1a1-80d95b474f99",
"content":"balala1"
},
{
"dataId":"4a57e971-62b1-4a75-9563-cc2703b28244",
"content":"balala2"
}
]
}
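Each task carries a caller-chosen dataId, and the response example in this article echoes the same dataId back, so keeping a map from dataId to the submitted text makes it straightforward to match results to inputs when several tasks are sent at once. The sketch below assumes a hypothetical `TaskBatch` helper that is not part of the SDK; it uses only the JDK and produces the same `List<Map<String, Object>>` shape that the sample puts under `tasks`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Hypothetical helper (not part of the SDK): builds tasks and remembers which text each dataId refers to. */
final class TaskBatch {
    private final List<Map<String, Object>> tasks = new ArrayList<>();
    private final Map<String, String> contentByDataId = new LinkedHashMap<>();

    /** Adds one text to the batch and returns the generated dataId. */
    String add(String content) {
        String dataId = UUID.randomUUID().toString();
        Map<String, Object> task = new LinkedHashMap<>();
        task.put("dataId", dataId);
        task.put("content", content);
        tasks.add(task);
        contentByDataId.put(dataId, content);
        return dataId;
    }

    /** The list to place under the "tasks" key of the request body. */
    List<Map<String, Object>> tasks() {
        return tasks;
    }

    /** Looks up the submitted text for a dataId taken from the response; null if unknown. */
    String contentFor(String dataId) {
        return contentByDataId.get(dataId);
    }
}
```

In the sample's main method, `batch.tasks()` would take the place of the hand-built `tasks` list, and `batch.contentFor(dataId)` could be called inside the result loop.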
Response example:
{
"msg":"OK",
"code":200,
"data":[
{
"msg":"OK",
"code":200,
"dataId":"f14cde88-a5d3-44f7-b1a1-80d95b474f99",
"results":[
{
"rate":99.91,
"suggestion":"block",
"extras":{
"hitContext":"balala"
},
"label":"porn",
"scene":"keyword"
}
],
"content":"balala",
"taskId":"6cb22909-bb61-4848-8fab-0d4bc8dc4b9c-1494295749828"
},
{
"msg":"OK",
"code":200,
"dataId":"4a57e971-62b1-4a75-9563-cc2703b28244",
"results":[
{
"rate":99.91,
"suggestion":"block",
"extras":{
"hitContext":"balala"
},
"label":"spam",
"scene":"keyword"
}
],
"content":"balala",
"taskId":"1249a8c2-5bb3-477c-98d9-f78dad8ae15b-1494295749828"
}
],
"requestId":"96B928F4-1668-4F43-A3ED-4480ACBDA24C"
}
Summary
Handle each returned result according to its scene and its suggestion (pass or block).
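One way to turn the suggestion values into an application decision is sketched below. `SuggestionPolicy` and its `Action` values are hypothetical names, not part of the SDK. Only pass and block are mentioned in this article; sending any other value, or an empty result list, to manual review is an assumption of this sketch rather than documented service behavior.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical policy (not part of the SDK): maps suggestions to an application decision. */
final class SuggestionPolicy {
    enum Action { ALLOW, REJECT, REVIEW }

    private SuggestionPolicy() {
    }

    /** Decision for one scene result. */
    static Action forSuggestion(String suggestion) {
        if ("pass".equals(suggestion)) {
            return Action.ALLOW;
        }
        if ("block".equals(suggestion)) {
            return Action.REJECT;
        }
        return Action.REVIEW; // assumption: anything unexpected goes to a human
    }

    /** Decision for one task: a single block rejects the text, any unclear result asks for review. */
    static Action combine(List<String> suggestions) {
        if (suggestions.isEmpty()) {
            return Action.REVIEW; // assumption: no results means nothing was confirmed
        }
        Action overall = Action.ALLOW;
        for (String suggestion : suggestions) {
            Action action = forSuggestion(suggestion);
            if (action == Action.REJECT) {
                return Action.REJECT;
            }
            if (action == Action.REVIEW) {
                overall = Action.REVIEW;
            }
        }
        return overall;
    }

    public static void main(String[] args) {
        System.out.println(combine(Arrays.asList("pass", "block"))); // prints REJECT
    }
}
```

In the sample's result loop, the suggestion strings of one task could be collected into a list and passed to `combine` in place of the `do something` comment.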