First Demo of Xspider Mycode
来源:互联网 发布:最新能用的免费域名 编辑:程序博客网 时间:2024/06/05 09:05
News of This Demo
2016-11-07, First demo released, mainly included below usages:
1. search a *.tar.gz package from a open source page and download it.
2. detect a given page if it's a free software page.
3. a prototype of expert system of free software.
Share the clisp code with your guys:
xspider-mycode-client.lisp:
;;;;;; the client of xspider mycode;;;(defstruct (s-url) "URL = Uniform Resource Locator protocol://user#password@host:port/path" (protocol nil :type symbol) (user "" :type simple-string) (pass "" :type simple-string) (host "" :type simple-string) (port 0 :type fixnum) (path "" :type simple-string))(defun parse-url (url) (when (< (length url) (length "http://")) (return-from parse-url nil)) ;(format t "parse url: '~a' ~%" url) (setf url-info (make-s-url)) (when (string= #1="http://" url :end2 (min (length url) #2=#.(length #1#))) ;(format t "http protocol ~%"); (setf (s-url-protocol url-info) :http)) (setf host-port-end (position #\/ url :start #2#)) (setf port-start (position #\: url :start #2# :end host-port-end)) (setf (s-url-host url-info)(subseq url #2# (or port-start host-port-end))) (if port-start (setf (s-url-port url-info) (parse-integer url :start (1+ port-start) :end host-port-end))) (setf (s-url-path url-info)(if host-port-end (subseq url host-port-end) "/")) (format t "~a ~%" url-info) url-info)(defun is_path (path) (and (string= #1="/" path:end2 (min (length path) #2=#.(length #1#)))))(defun url-string-p (string) (let ((non-alpha (position-if-not #'alpha-char-p string))host) (and non-alpha (char= #\: (char string non-alpha)) (setq host (nstring-upcase (subseq string 0 non-alpha))) (or (string= host "HTTP")))))(defconstant *http-port* 80)(defun open-http2 (url &key (if-does-not-exist :error) ((:log *http-log-stream*) *http-log-stream*)) (setf s 0) (when (string-equal "http://" url :end2 (length "http://")) (setf s (length "http://"))) (when (string-equal "https://" url :end2 (length "https://")) (setf s (length "https://"))) ;(unless (string-equal #1="http://" url ; :end2 (min (length url) #2=#.(length #1#))) ;(format t "~S: ~S is not an HTTP URL" 'open-http url)) (format *http-log-stream* "~&;; connecting to ~S..." url) (force-output *http-log-stream*) (http-proxy) (let* ((host-port-end (position #\/ url :start s)) (port-start (position #\: url :start s :end host-port-end)) (url-host (subseq url s (or port-start host-port-end))) (host (if *http-proxy* (second *http-proxy*) url-host)) (url-port (if port-start (parse-integer url :start (1+ port-start) :end host-port-end) *http-port*)) (port (if *http-proxy* (third *http-proxy*) url-port)) (path (if *http-proxy* url (if host-port-end (subseq url host-port-end) "/"))) (sock (ignore-errors(socket:socket-connect port host :external-format :dos))) ;(sock (handler-bind ((error (lambda (c) ;(unless (eq if-does-not-exist :error) ;(format *http-log-stream* ;"cannot connect to ~S:~D: ~A~%" ;host port c) ;(return-from open-http2 nil))))) ;(socket:socket-connect port host :external-format :dos))) status code content-length) (unless sock (return-from open-http2)) (format *http-log-stream* "connected...") (force-output *http-log-stream*) ;; http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.23 ;; http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.43 (format sock "GET ~A HTTP/1.0~%User-agent: ~A ~A~%Host: ~A:~D~%" path (lisp-implementation-type) (lisp-implementation-version) url-host url-port) #+unicode ; base64 requires unicode for some weird infrastructure reasons (when (first *http-proxy*) ; auth: http://www.ietf.org/rfc/rfc1945.txt ;; http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.34 (format sock "Proxy-Authorization: Basic ~A~%" (convert-string-from-bytes (convert-string-to-bytes (first *http-proxy*) *http-encoding*) charset:base64))) ;; http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.1 ;; http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.10 (format sock "Accept: */*~%Connection: close~2%") ; finish request (write-string (setq status (read-line sock))) (force-output) (let* ((pos1 (position #\Space status)) (pos2 (position #\Space status :start (1+ pos1)))) (setq code (parse-integer status :start pos1 :end pos2))) ;; dump headers ;(loop :for line = (read-line sock nil nil) :while line ; :do (format *http-log-stream* "~&;; ~S~%" line)) ;(case if-does-not-exist ; (:error (error (TEXT "~S: error ~D: ~S") 'open-http code status)) ; (t (close sock) ; (return-from open-http nil)))) (if (and (< code 400) (>= code 300)) ; redirection (loop :for res = (read-line sock) :until (string-equal #3="Location: " res :end2 (min (length res) #4=#.(length #3#))) :finally (let ((new-url (subseq res #4#))) (format *http-log-stream* " --> ~S~%" new-url) (unless (string= url new-url) ;; bug to be fixed (when (string= "https" new-url :end2 5) (setf new-url url))))) ;(unless (string-equal #1# new-url ;:end2 (min (length new-url) #2#)) ;(setq new-url (string-concat #1# host new-url))) ;(return-from open-http2 (open-http2 new-url)))) ;; drop response headers (loop :for line = (read-line sock) :while (plusp (length line)) :do (when (string-equal #5="Content-Length: " line :end2 (min (length line) #6=#.(length #5#))) (format *http-log-stream* "...~:D bytes" (setq content-length (parse-integer line :start #6#)))) :finally (terpri))) (when (>= code 400) (close sock) (setf sock nil) (setf content-length 0)) (list sock content-length)))(defmacro with-http-input2 ((var url) &body body) (if (symbolp var) `(with-open-stream (,var (first (open-http2 ,url))) ,@body) (multiple-value-bind (body-rest declarations) (SYSTEM::PARSE-BODY body) `(multiple-value-bind ,var (first (open-http2 ,url)) (DECLARE (READ-ONLY ,@var) ,@declarations) (UNWIND-PROTECT (MULTIPLE-VALUE-PROG1 (PROGN ,@body-rest) (when ,(first var) (CLOSE ,(first var))))(when ,(first var) (CLOSE ,(first var) :ABORT T)))))))(defun download-file (url file) (setq file-info (open-http2 url)) (setf socket (first file-info)) (setf content-length (second file-info)) (format t ";; file: '~a', size: ~abytes(~fkb)(~fmb) ~%" file content-length (/ content-length 1024.0) (/ content-length (* 1024.0 1024))) (format t ";; downloading '~a'... please be patient to wait...~%" file) (let ((data (make-array content-length :element-type '(unsigned-byte 8)))) (setf (stream-element-type socket) '(unsigned-byte 8)) (ext:read-byte-sequence data socket) (with-open-file (out file :direction :output :element-type '(unsigned-byte 8)) (ext:write-byte-sequence data out))) (format t "~a downloaded, done.~%" file));;;; load the page;; search the open code package;; download the package;;(defun search-package-and-download (home-page &optional (depth 3)) (let ((r)(info)(socket)(content-length)(url-info)(new-depth)(url-s)(url-e)) (format t "depth ~a ~%" depth) (when (<= depth 0) (format t "the max depth to cease the search ~%") (return-from search-package-and-download)) (setq info (open-http2 home-page)) (setf socket (first info)) (setf content-length (second info)) (setf url-info (parse-url home-page)) (when (null socket) (return)) (format t "~a opened. ~%" home-page) (format t "search the code package... ~%") ;(with-open-file ;(out (incf file-name-id) :direction :output) (loop :for line = (read-line socket nil nil) :while line :do ;(write-line line out) (format t "~a ~%len:~a ~%" line (length line)) (setf r (regexp:match "href=\"" line :end (length line))) (setf url-s -1) (setf url-e -1) (unless (null r) (when (> (length line) (regexp:match-end r)) (setf url-s (regexp:match-end r)) (format t "url-s:~a ~%" url-s))) (when (>= url-s 0) (format t "s: ~a ~%" url-s) (setf a (position #\" line :start url-s)) (unless (null a) (setf url-e a))) ;(setf r (regexp:match "\"" line :start url-s)) ;(unless (null r) ;(setf url-e (regexp:match-start r)) ;(format t "e: ~a ~%" url-e))) (when (> url-e url-s) (format t "e: ~a ~%" url-e) (setq url (subseq line url-s url-e)) (format t "~a s:~a e:~a ~%" url url-s url-e) (when (and (is_path url) (> (length url) 1)) (format t "~a is a path of the host. ~%" url) (setf url (string-concat "http://" (s-url-host url-info) url))) (when (url-string-p url) (format t "~a ~%" url) (setf r (regexp:match ".tar.gz" url)) (when (null r)(setf new-depth (- depth 1))(search-package-and-download url new-depth)(format t "a search was done.~%")) (unless (null r)(when (= (length url) (regexp:match-end r)) (setf file-s (+ 1 (position #\/ url :from-end (length url)))) (setf file (subseq url file-s)) (download-file url file))))))));(search-package-and-download; "http://packages.ubuntu.com/precise/clisp");(search-package-and-download; "http://packages.ubuntu.com/search?keywords=clisp");(search-package-and-download; "http://packages.ubuntu.com/precise/clisp");(defun split-seq (seq pred &key (start 0) end key strict) "Return a list of subseq's of SEQ, split on predicate PRED.Start from START, end with END. If STRICT is non-nil, collectzero-length subsequences too. (split-seq SEQ PRED &key (start 0) end key strict)" (declare (sequence seq) (type (function (t t) t) pred) (fixnum start)) (loop :for st0 = (if strict start (position-if-not pred seq :start start :end end :key key)) :then (if strict (if st1 (1+ st1)) (position-if-not pred seq :start (or st1 st0) :end end :key key)) :with st1 = 0 :while (and st0 st1) :do (setq st1 (position-if pred seq :start st0 :end end :key key)) :collect (subseq seq st0 (or st1 end))))(defun split-string (str chars &rest opts) "Split the string on chars." (declare (string str) (sequence chars)) (apply #'split-seq str (lambda (ch) (declare (character ch)) (find ch chars)) opts))(defun search-sentences (line) (let ((s)(e)(len (length line))(lst nil)) (loop :for i :from 0 :below len :do (setf s nil) (setf e nil) (cond ((char= #\" (aref line i)) (setf s (1+ i)) (setf e (position #\" line :start s)) (when e ;(format t "~a ~%" (subseq line s e)) )) ((char= #\> (aref line i)) (setf s (1+ i)) (setf e (position #\< line :start s)) (when e ;(format t "~a ~%" (subseq line s e)) ))) :when (and s e) :collect (subseq line s e))))(defun search-words (line) ;(format t "search words in '~a' ~%" line) (let ((sentences (search-sentences line))(words)(r)) (when sentences (dolist (s sentences)(when (> (length s) 0) (setf r (split-string s '(#\ ))) (when r (setf words (append words r)))))) words)) (defun search-keywords (line) (let ((words (search-words line))(keywords)) (when words ;(format t "words: ~a ~%" words) (dolist (w words)(when (> (length w) 0) (setf w (string-downcase w)) ;(format t "~a ~%" w) (setf r (gethash w *elements-htb-of-fs*)) (when r ;(format t "found: ~a ~%" w) (setf keywords (append keywords (list w))))))) ;(when keywords ;(format t ";; ~a, count: ~a ~%" keywords (length keywords))) keywords));;;; -get the content in "?" as sentence;; -get the content in >?< as sentence;; -sentences as a list;; -parse the sentence into words list;; -match the words in the elements hashtable of common sense;; -count the matched words of each page;; -build the tree for all pages searched;; (do not search the page has been searched or under being searched);; (a tree to save the valid pages, a hashtable for page matching);; -save the tree into xml file;; -save the result of this detection into a txt file as db;;(defun search-keywords-in-page (home-page) (let ((r)(info)(socket)(content-length)(keyword-cnt 0)) (setq info (open-http2 home-page)) (setf socket (first info)) (setf content-length (second info)) ;(format t ";; socket: ~a, content-length: ~a ~%" socket content-length) (when (null socket) (return-from search-keywords-in-page 0)) (format t ";; ~a opened. ~%" home-page) (format t ";; search keywords... ~%") (loop :for line = (ignore-errors (read-line socket nil nil)) :while line :do ;(format t ";; ~a ~%len:~a ~%" line (length line)) (setf r (search-keywords line)) (when r (setf keyword-cnt (+ keyword-cnt (length r))))) (when (> keyword-cnt 0) (format t ";; count of keywords found: ~a ~%" keyword-cnt)) keyword-cnt))(setf *min-keyword-count-of-fs* 30)(defun crawl-free-software-page (home-page htb tree &optional (depth 2)) (let (r info socket content-length url-info new-depth url-s url-e (p 0.0)) (format t ";; ~a: depth:~a ~%" 'crawl-free-software-page depth) (unless (null (gethash home-page htb)) (format t ";; page '~a' has been searched ~%" home-page) (return-from crawl-free-software-page)) (setf r (search-keywords-in-page home-page)) (when (<= r *min-keyword-count-of-fs*) (return-from crawl-free-software-page)) (push home-page (car tree)) ;(format t ";; ~a ~%" tree) (setf (gethash home-page htb) t) (when (<= depth 0) ;(format t ";; the max depth to cease the search ~%") (return-from crawl-free-software-page)) (setq info (open-http2 home-page)) (setf socket (first info)) (setf content-length (second info)) (setf url-info (parse-url home-page)) (when (null socket) (return-from crawl-free-software-page)) (format t ";; ~a opened. ~%" home-page) (format t ";; crawling... ~%") (loop :for line = (read-line socket nil nil) :while line :do (setf r (regexp:match "href=\"" line :end (length line))) (setf url-s -1) (setf url-e -1) (unless (null r) (when (> (length line) (regexp:match-end r)) (setf url-s (regexp:match-end r)))) (when (>= url-s 0) (setf a (position #\" line :start url-s)) (unless (null a) (setf url-e a))) (when (> url-e url-s) (setq url (subseq line url-s url-e)) (when (and (is_path url) (> (length url) 1)) (format t ";; ~a is a path of the host. ~%" url) (setf url (string-concat "http://" (s-url-host url-info) url))) (when (url-string-p url) (format t ";; ~a ~%" url) (setf r (regexp:match ".tar.gz" url)) (when (null r)(setf new-depth (- depth 1))(push '(nil) (cdr tree))(crawl-free-software-page url htb (cdr tree) new-depth)) (unless (null r)(when (= (length url) (regexp:match-end r)) (setf file-s (+ 1 (position #\/ url :from-end (length url)))) (setf file (subseq url file-s)) (setf p 1)))))))) ;(search-sentences "<h1>Welcome to <a href=\"http://clisp.org\">CLISP</a></h1>");(search-sentences "<a href=\"https://sourceforge.net/projects/clisp\">CLISP</a>");(search-words "<h1>Welcome to <a href=\"http://clisp.org\">CLISP</a></h1>");(search-keywords "<h1>Welcome to <a href=\"http://clisp.org\">CLISP open source</a></h1>")(defun crawl-page (home-page) (let ((htb (make-hash-table :test #'equal))(tree '(nil))) (crawl-free-software-page home-page htb tree 2) (maphash #'(lambda (k v) (format t "~a = ~a ~%" k v)) htb) (format t ";; ~a ~%" tree)))(crawl-page "http://clisp.org/");(crawl-page "http://freshcode.club/");(crawl-page "http://packages.ubuntu.com/precise/clisp");(crawl-free-software-page "http://clisp.org/");(crawl-free-software-page "http://freshcode.club/");;(crawl-free-software-page "http://www.openhub.net/");;(crawl-free-software-page "http://sourceforge.net");;(crawl-free-software-page "http://github.com");(crawl-free-software-page "http://directory.fsf.org/wiki/Main_Page");(crawl-free-software-page "http://packages.ubuntu.com/precise/clisp")
expert-system.lisp:
;;;;;; expert system of xspider mycode;;;;;;; elements of common sense of free software;;;; -create the database (simply a *.txt file);; -read the words from database to a hashtable;; -insert a new word into hashtable and update the database;; -delete a word from hashtable and update the database;;(defmacro string-beg-with (beg strv &optional (lenv `(length ,strv))) "Check whether the string STRV starts with BEG." (if (stringp beg) (let ((len (length beg))) `(and (>= ,lenv ,len) (string-equal ,beg ,strv :end2 ,len))) (with-gensyms ("SBW-" len) `(let ((,len (length ,beg))) (and (>= ,lenv ,len) (string-equal ,beg ,strv :end2 ,len))))))(defun load-elements-of-free-software (file) (with-open-file (in file :direction :input) (loop :for line = (read-line in nil nil) :while line :do ;(format t "~a ~%" line) (unless (string-beg-with ";" line) (setf line (string-downcase line)) (setf (gethash line *elements-htb-of-fs*) t) (unless (gethash line *elements-htb-of-fs*) (format t ";; insert '~a' to hashtable failed~%" line))))))(defun exist-element-of-free-software (key) (gethash key *elements-htb-of-fs*))(defun expert-system-init () (setq *elements-htb-of-fs* (make-hash-table :test #'equal)) (load-elements-of-free-software "elements-of-free-software.txt") (format t ";; size of elements-of-free-software hashtable: ~a ~%" (hash-table-size *elements-htb-of-fs*)) (format t ";; show all elements of this hashtable... ~%") (maphash #'(lambda (k v) (format t "~a = ~a ~%" k v)) *elements-htb-of-fs*))(expert-system-init);;;; forms of homepage of free software project;;;;;; forms of code downloading page of free software project;;;;;; forms of user guide page of free software project;;;;;; evaluate the certain degree of detecting free software page;; -given the P;; -how to update the P to a better cretain degree;; -if always exist a best degree, there already be a common form;; -search the forms, build the P network, calculate the P degree;; -given test cases of pages, verify the degree, improve and loop;;elements-of-free-software.txt:
;;;;;; common words of free software community;;;hackhackerhackinggnulinuxfsffree software foundationfree softwarecommunitylpfleague for programming freedomMozilla Foundation;; free software lincesesFree Documentation LicenseGeneral Public Licensegplgnu gplgnu fdlgplv2gplv3ApacheArtisticBerkeley DatabaseBoost SoftwareBSDCeCILLCryptixEducational CommunityEU DataGridFreeBSDFreetypeimlib2jpeg groupintel acpiiscMozillamplPublic DomainPythonSGIUnicodeUPLUnlicensevimwebmwtfplWxWidgetsX11XFree86zlibzope;;;;;; common words of software program;;;softwarefirmwaredriverslibrarylibrarieslibsystemmoduleframeworkinterfaceapiAPIsosoperation systemkernelgraphical user interfaceguiuser interfaceuiappappsapplicationdatadatabasedbwebcodecodedcodersourcesourcessource codeprogramprogramedprogramsprogrammingcodingdesigndesigneddevelopdevelopeddeveloperdevelopingdevelopmentimplementimplementedimplementsimplementationopen sourceopen-sourceopensourceopen source softwareossprojectprojectsversionnewsbugbugslanguagemanualguideuser guideplatformfaqmailing listpackagenotestooltoolsUtilityUtilitiesdocumentationprogrammermalwaredesktopbrowserubuntuunixweb pageweb serverclientserverpluginpluginsextendextendsfeaturefeaturesembeddedcreatebuilddebugmaintainsoftware engineerArchitecturelayerlayersloadloadedinstructioninstructionsmemorystackmallocheaphoopcallbackramromexecuteexecutesexecutionruncrasherrorerrorstesttestedtestingperformanceunit testingregression testingcommandreleasereleasedfixeddiffpatchLicensealgorithmalgorithmssortsortingData structureaiUser experienceend-userprocessMultiprocessMulti-processMultiprocessingthreadmultithreadmulti-threadMultithreadingtasktasksiotermnialttysocketfile systemstreamprotocolfront-endback-end;; Software standard organizationsw3cietf;; software standardstcp/iptcpipxmlhttphtml;; common words of software projectpre-alpharequirement;design;development;unit testingMilestone versionalphawhite boxwhite-boxblack boxblack-box;featurebeta;releaseissueissuesrtmga;; where and how to share the codesgithubSourceForgeohlohOpen Hubfreecodefree codefreshcodefresh codegitsvncvs;; types of software;applicatinMiddlewaredistributed systemProgramming languageSystem software;firmware;driversProgramming toolCompilerDebuggerInterpreterLinkerIDEoopgccg++makefilemake all;;;;;; common words of programming language;;;syntaxsyntacticsemanticsfunctionlispcommon lispclschemeclispshellsbclcc++c#Objective-Cjavaperlpythonjavascriptrubyluaphpcss;;;;;; common words of data structures;;;hashtablehash tablearraytreelistqueue;;;;;; common words of database language;;;mysqlMariaDBsqlitesqlsql servernosqlMongoDBPostgreSQLBigTable;;;;;; common words of types of package;;;zipgzrartgz;;;;;; common words of great hackers;;;freedomProgramming Freedom
Run This Demo
Step 1: clisp -q
Step 2: (load "expert-system.lisp")
Step 3: (load "xspider-mycode-client.lisp")
Step 4: modify above codes to crawl the free-software-like web pages as you want...
The result shown like below:
...
Enjoy~
I.Mars- Nov. 7 2016
1 0
- First Demo of Xspider Mycode
- First Demo of OGRE
- mycode
- PyQT first demo
- TinyRenderEngine------First Demo
- First Velocity Demo
- First Portal Demo
- My first struts2 demo
- nutz first demo
- my first highchart demo
- Spark First Sample Demo
- Weather ( First Android demo )
- ActiveMQ First Demo
- OpenMP First Demo
- SpringBoot first Demo
- My First SmartDevice Application ---Demo
- 《 Head First 设计模式》Demo
- JFinal First Demo "Hello World"
- android 面试题笔试题总结
- tomcat中javabean的位置
- docker 实例设置自动重启
- Windows 下配置 Common Lisp 环境
- java中值传递的详解
- First Demo of Xspider Mycode
- 《深入理解Java虚拟机》读书笔记5
- Player Settings导出设置
- python数据结构学习笔记-2016-11-07-01-双链表
- AndroidStudio 混淆打包
- CSS入门系列(二)基本选择器&优先级
- Mybatis框架搭建
- Prim与Kruskal求解带权图的最小生成树C/C++
- linux 之 I/O模型和I/O函数