Computing Keyword Weights for Job Descriptions on a Recruitment Site

Source: Internet  Editor: 程序博客网  Date: 2024/05/04 17:07

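The previous post, "D3.js - Visualizing Statistics from a Recruitment Site", gave some simple statistics on the proportions of positions and companies. Some automated résumé-screening systems filter on keywords, so it is worth measuring how much weight each keyword carries for a given type of position.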

Preparing the data

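Keywords are extracted from the job description text. All matching records are pulled from the database in one pass and concatenated into a single text file. Records are selected with a simple LIKE match on the title. PHP, "the best language in the world", serves as the example: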

select title, content from job where title like "%php%";

Write the results to a file:

# res holds the rows returned by the query above
print(len(res))
with open('php.txt', 'wt', encoding='utf-8') as f:
    for r in res:
        f.write(r['content'])
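The query and the write step above can be combined into one runnable sketch. The post does not say which database or driver was used, so the standard-library sqlite3 module is used here purely as a stand-in, and the function name dump_job_contents is hypothetical. Unlike the snippet above, this version writes a newline after each record so that the last word of one posting does not fuse with the first word of the next.

```python
import sqlite3


def dump_job_contents(conn, keyword, out):
    """Write the content of every job whose title matches keyword to out.

    conn is an open sqlite3 connection with a job(title, content) table
    (an assumption; the original database is not specified).
    out is any object with a write() method, e.g. an open text file.
    Returns the number of matching rows.
    """
    conn.row_factory = sqlite3.Row  # allows access by column name
    rows = conn.execute(
        "SELECT title, content FROM job WHERE title LIKE ?",
        (f"%{keyword}%",),
    ).fetchall()
    for r in rows:
        # One record per line keeps neighbouring postings separated.
        out.write(r["content"] + "\n")
    return len(rows)
```

With a real file this would be called as dump_job_contents(conn, "php", f) inside a with open('php.txt', 'wt', encoding='utf-8') as f: block.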

Word segmentation and statistics

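The weights are computed with the TF-IDF-based keyword extraction in jieba, the Python Chinese word segmentation library. Compared with raw occurrence counts, weights are somewhat easier to interpret.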
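To make the idea of a TF-IDF weight concrete, here is a minimal pure-Python sketch, not jieba's actual code. It assumes the weight of a word is its term frequency (count divided by total word count) multiplied by an IDF value looked up in a table. The IDF values, the fallback value, and the regex tokenizer are all made up for illustration. jieba performs real word segmentation and ships its own IDF table.

```python
import re
from collections import Counter

# Hypothetical IDF values for illustration only.
TOY_IDF = {"php": 6.0, "mysql": 5.0, "linux": 4.0}
DEFAULT_IDF = 3.0  # assumed fallback for words missing from the table


def tfidf_weights(text, idf=TOY_IDF, default_idf=DEFAULT_IDF, top_k=10):
    """Return up to top_k (word, weight) pairs, highest weight first."""
    # Stand-in tokenizer: lowercase ASCII words only, so Chinese text is skipped.
    words = re.findall(r"[a-z][a-z0-9+#]*", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        return []
    # weight = term frequency * inverse document frequency
    weights = {w: (c / total) * idf.get(w, default_idf) for w, c in counts.items()}
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```

A word that appears often in the text but is rare in general (high IDF) ends up with a high weight, which is why the language name itself dominates each list in the results.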

The demo script extract_tags_with_weight.py that ships with jieba is sufficient for this.

Take the top 10 keyword weights (34882 PHP job postings):

python3 extract_tags_with_weight.py php.txt -k 10 -w 1


The output includes Chinese words, which are of little interest here. Filter them out and take the top 10 technical keywords:

python3 extract_tags_with_weight.py php.txt -k 999999 -w 1 | grep 'tag: [a-z]' | sed -n '1,10p'

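Here the -k parameter is set to a very large value so that all keywords are returned. The text should also be converted entirely to lowercase (or uppercase) beforehand, so that the same keyword is not counted separately under different capitalizations.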
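For readers who prefer to do the filtering in Python rather than with grep and sed, the following sketch mirrors the pipeline above: it keeps only lines whose tag starts with a lowercase ASCII letter and returns the first few. The function name top_latin_tags is hypothetical, and the line format is assumed to match the "tag: ... weight: ..." output shown in this post.

```python
import re

# Matches lines such as "tag: php    weight: 0.241653" (spaces or tabs).
TAG_LINE = re.compile(r"^tag:\s+(\S+)\s+weight:\s+([0-9.]+)")


def top_latin_tags(lines, limit=10):
    """Return the first limit (tag, weight) pairs whose tag starts with a-z."""
    result = []
    for line in lines:
        if len(result) >= limit:
            break
        m = TAG_LINE.match(line.strip())
        # Same test as grep 'tag: [a-z]': the tag must begin with a lowercase letter.
        if m and re.match(r"[a-z]", m.group(1)):
            result.append((m.group(1), float(m.group(2))))
    return result
```

Because the filter only accepts lowercase tags, the input text would need to be lowercased before extraction (for example with str.lower()), as noted above. Otherwise uppercase variants such as "PHP" are dropped instead of merged.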


Other positions

The weights below are computed over all keywords. Chinese keywords have been removed from the lists.

php top 50

Records: 34882

tag: php                  weight: 0.241653
tag: mysql                weight: 0.127151
tag: web                  weight: 0.070640
tag: linux                weight: 0.062477
tag: css                  weight: 0.061224
tag: javascript           weight: 0.060166
tag: html                 weight: 0.052278
tag: ajax                 weight: 0.040207
tag: jquery               weight: 0.033997
tag: mvc                  weight: 0.024620
tag: sql                  weight: 0.024506
tag: thinkphp             weight: 0.021231
tag: lamp                 weight: 0.020800
tag: xml                  weight: 0.020014
tag: div                  weight: 0.018878
tag: redis                weight: 0.018336
tag: apache               weight: 0.017596
tag: js                   weight: 0.016749
tag: yii                  weight: 0.015015
tag: nginx                weight: 0.014331
tag: html5                weight: 0.011799
tag: nosql                weight: 0.011088
tag: xhtml                weight: 0.010838
tag: app                  weight: 0.010734
tag: http                 weight: 0.010595
tag: unix                 weight: 0.010556
tag: shell                weight: 0.010319
tag: json                 weight: 0.010156
tag: memcache             weight: 0.010130
tag: oop                  weight: 0.010030
tag: lnmp                 weight: 0.009845
tag: java                 weight: 0.008475
tag: ci                   weight: 0.008348
tag: mongodb              weight: 0.007690
tag: smarty               weight: 0.007537
tag: api                  weight: 0.007303
tag: python               weight: 0.007112
tag: memcached            weight: 0.007014
tag: zend                 weight: 0.006881
tag: svn                  weight: 0.006774
tag: ecshop               weight: 0.006349
tag: bug                  weight: 0.006180
tag: git                  weight: 0.005200
tag: discuz               weight: 0.005051
tag: oracle               weight: 0.004645
tag: w3c                  weight: 0.004577
tag: c++                  weight: 0.004544
tag: cms                  weight: 0.004515
tag: framework            weight: 0.004012
tag: css3                 weight: 0.003905

python top 50

Records: 3229

tag: python               weight: 0.217281
tag: web                  weight: 0.107951
tag: linux                weight: 0.085868
tag: mysql                weight: 0.061585
tag: django               weight: 0.060428
tag: redis                weight: 0.030252
tag: javascript           weight: 0.029916
tag: css                  weight: 0.029580
tag: html                 weight: 0.028722
tag: tornado              weight: 0.028125
tag: mongodb              weight: 0.027454
tag: flask                weight: 0.024022
tag: java                 weight: 0.021075
tag: git                  weight: 0.020255
tag: http                 weight: 0.019471
tag: php                  weight: 0.017905
tag: shell                weight: 0.016562
tag: sql                  weight: 0.015741
tag: nginx                weight: 0.015480
tag: c++                  weight: 0.014809
tag: api                  weight: 0.013503
tag: jquery               weight: 0.012981
tag: nosql                weight: 0.012086
tag: js                   weight: 0.011713
tag: mvc                  weight: 0.011228
tag: postgresql           weight: 0.010668
tag: app                  weight: 0.010594
tag: github               weight: 0.009586
tag: tcp                  weight: 0.009437
tag: ip                   weight: 0.008691
tag: html5                weight: 0.007908
tag: unix                 weight: 0.007460
tag: ruby                 weight: 0.007199
tag: restful              weight: 0.007087
tag: apache               weight: 0.006267
tag: ajax                 weight: 0.006043
tag: py                   weight: 0.006006
tag: openstack            weight: 0.005968
tag: hadoop               weight: 0.005670
tag: mac                  weight: 0.005670
tag: memcache             weight: 0.005483
tag: com                  weight: 0.005260
tag: socket               weight: 0.005036
tag: xml                  weight: 0.004961
tag: wecash               weight: 0.004887
tag: oracle               weight: 0.004513
tag: www                  weight: 0.004103
tag: svn                  weight: 0.003991
tag: experience           weight: 0.003991
tag: bug                  weight: 0.003954

java top 50

Records: 53599

tag: java                 weight: 0.173577
tag: web                  weight: 0.069864
tag: spring               weight: 0.069113
tag: mysql                weight: 0.065630
tag: oracle               weight: 0.048071
tag: j2ee                 weight: 0.046025
tag: sql                  weight: 0.045887
tag: javascript           weight: 0.045806
tag: linux                weight: 0.040972
tag: hibernate            weight: 0.038958
tag: jquery               weight: 0.033819
tag: html                 weight: 0.032508
tag: tomcat               weight: 0.031317
tag: css                  weight: 0.029937
tag: struts               weight: 0.028279
tag: ajax                 weight: 0.026473
tag: jsp                  weight: 0.022588
tag: mvc                  weight: 0.020513
tag: ibatis               weight: 0.020188
tag: mybatis              weight: 0.019560
tag: servlet              weight: 0.015777
tag: xml                  weight: 0.015701
tag: js                   weight: 0.014403
tag: redis                weight: 0.013724
tag: eclipse              weight: 0.013452
tag: weblogic             weight: 0.012535
tag: springmvc            weight: 0.011370
tag: jboss                weight: 0.011296
tag: struts2              weight: 0.011183
tag: http                 weight: 0.010566
tag: server               weight: 0.010471
tag: svn                  weight: 0.009402
tag: ssh                  weight: 0.009260
tag: nosql                weight: 0.008990
tag: sqlserver            weight: 0.008626
tag: html5                weight: 0.008624
tag: uml                  weight: 0.008324
tag: apache               weight: 0.008202
tag: maven                weight: 0.008192
tag: app                  weight: 0.007383
tag: mongodb              weight: 0.007200
tag: shell                weight: 0.007161
tag: nginx                weight: 0.007028
tag: jdbc                 weight: 0.006908
tag: unix                 weight: 0.006865
tag: json                 weight: 0.006723
tag: webservice           weight: 0.006723
tag: tcp                  weight: 0.006310
tag: websphere            weight: 0.006182
tag: io                   weight: 0.005851

android top 50

Records: 33434

tag: android              weight: 0.477328
tag: java                 weight: 0.092136
tag: app                  weight: 0.057003
tag: ui                   weight: 0.047871
tag: sdk                  weight: 0.039509
tag: http                 weight: 0.037914
tag: tcp                  weight: 0.025357
tag: socket               weight: 0.024767
tag: ip                   weight: 0.020937
tag: c++                  weight: 0.020032
tag: ios                  weight: 0.019612
tag: xml                  weight: 0.015327
tag: api                  weight: 0.014370
tag: json                 weight: 0.013632
tag: framework            weight: 0.012522
tag: linux                weight: 0.012276
tag: eclipse              weight: 0.009863
tag: ndk                  weight: 0.009170
tag: sqlite               weight: 0.007842
tag: html5                weight: 0.007360
tag: web                  weight: 0.007267
tag: jni                  weight: 0.005627
tag: andriod              weight: 0.005464
tag: bug                  weight: 0.005176
tag: os                   weight: 0.004833
tag: svn                  weight: 0.004469
tag: git                  weight: 0.004406
tag: gui                  weight: 0.003779
tag: github               weight: 0.003699
tag: javascript           weight: 0.003682
tag: https                weight: 0.003599
tag: sql                  weight: 0.003401
tag: html                 weight: 0.003359
tag: com                  weight: 0.003169
tag: udp                  weight: 0.003079
tag: service              weight: 0.003075
tag: mysql                weight: 0.002957
tag: wifi                 weight: 0.002829
tag: j2me                 weight: 0.002683
tag: studio               weight: 0.002631
tag: www                  weight: 0.002548
tag: mvc                  weight: 0.002517
tag: google               weight: 0.002309
tag: js                   weight: 0.002305
tag: lbs                  weight: 0.002257
tag: im                   weight: 0.002233
tag: objective            weight: 0.002177
tag: server               weight: 0.002125
tag: activity             weight: 0.002118
tag: iphone               weight: 0.002115

ios top 50

Records: 27632

tag: ios                  weight: 0.366072
tag: app                  weight: 0.084167
tag: objective            weight: 0.071731
tag: iphone               weight: 0.069397
tag: sdk                  weight: 0.051040
tag: xcode                weight: 0.047470
tag: c++                  weight: 0.045456
tag: ui                   weight: 0.042223
tag: http                 weight: 0.039140
tag: ipad                 weight: 0.032945
tag: android              weight: 0.025376
tag: tcp                  weight: 0.021865
tag: socket               weight: 0.021082
tag: json                 weight: 0.020783
tag: xml                  weight: 0.020325
tag: store                weight: 0.019605
tag: mac                  weight: 0.018228
tag: cocoa                weight: 0.016381
tag: ip                   weight: 0.015349
tag: object               weight: 0.014816
tag: os                   weight: 0.012732
tag: appstore             weight: 0.011205
tag: java                 weight: 0.010568
tag: swift                weight: 0.009782
tag: web                  weight: 0.008317
tag: api                  weight: 0.007768
tag: touch                weight: 0.007061
tag: interface            weight: 0.006853
tag: builder              weight: 0.006794
tag: html5                weight: 0.006774
tag: sqlite               weight: 0.006387
tag: gui                  weight: 0.006187
tag: mvc                  weight: 0.005925
tag: uikit                weight: 0.005908
tag: bug                  weight: 0.005272
tag: github               weight: 0.005055
tag: udp                  weight: 0.004772
tag: instruments          weight: 0.004752
tag: core                 weight: 0.004327
tag: ue                   weight: 0.004152
tag: git                  weight: 0.004057
tag: framework            weight: 0.003807
tag: macos                weight: 0.003582
tag: service              weight: 0.003483
tag: runtime              weight: 0.003233
tag: javascript           weight: 0.003037
tag: com                  weight: 0.003025
tag: im                   weight: 0.002896
tag: apple                weight: 0.002863
tag: oop                  weight: 0.002642