First Demo of Xspider Mycode

来源:互联网 发布:最新能用的免费域名 编辑:程序博客网 时间:2024/06/05 09:05

News of This Demo

2016-11-07, First demo released, mainly included below usages:
1. search a *.tar.gz package from a open source page and download it.
2. detect a given page if it's a free software page.
3. a prototype of expert system of free software.

Share the clisp code with your guys:

xspider-mycode-client.lisp:

;;;;;; the client of xspider mycode;;;(defstruct (s-url)  "URL = Uniform Resource Locator protocol://user#password@host:port/path"  (protocol nil :type symbol)  (user "" :type simple-string)  (pass "" :type simple-string)  (host "" :type simple-string)  (port 0 :type fixnum)  (path "" :type simple-string))(defun parse-url (url)  (when (< (length url) (length "http://")) (return-from parse-url nil))  ;(format t "parse url: '~a' ~%" url)  (setf url-info (make-s-url))  (when (string= #1="http://" url :end2 (min (length url) #2=#.(length #1#)))    ;(format t "http protocol ~%");    (setf (s-url-protocol url-info) :http))  (setf host-port-end (position #\/ url :start #2#))  (setf port-start (position #\: url :start #2# :end host-port-end))  (setf (s-url-host url-info)(subseq url #2# (or port-start host-port-end)))  (if port-start      (setf (s-url-port url-info)    (parse-integer url :start (1+ port-start) :end host-port-end)))    (setf (s-url-path url-info)(if host-port-end    (subseq url host-port-end) "/"))  (format t "~a ~%" url-info)  url-info)(defun is_path (path)  (and (string= #1="/" path:end2 (min (length path) #2=#.(length #1#)))))(defun url-string-p (string)  (let ((non-alpha (position-if-not #'alpha-char-p string))host)    (and     non-alpha     (char= #\: (char string non-alpha))     (setq host (nstring-upcase (subseq string 0 non-alpha)))     (or      (string= host "HTTP")))))(defconstant *http-port* 80)(defun open-http2 (url &key (if-does-not-exist :error)       ((:log *http-log-stream*) *http-log-stream*))  (setf s 0)  (when (string-equal "http://" url :end2 (length "http://"))    (setf s (length "http://")))  (when (string-equal "https://" url :end2 (length "https://"))    (setf s (length "https://")))    ;(unless (string-equal #1="http://" url   ;                     :end2 (min (length url) #2=#.(length #1#)))    ;(format t "~S: ~S is not an HTTP URL" 'open-http url))      (format *http-log-stream* "~&;; connecting to ~S..." url)  (force-output *http-log-stream*)  (http-proxy)  (let* ((host-port-end (position #\/ url :start s))         (port-start (position #\: url :start s :end host-port-end))         (url-host (subseq url s (or port-start host-port-end)))         (host (if *http-proxy* (second *http-proxy*) url-host))         (url-port (if port-start                       (parse-integer url :start (1+ port-start)                                      :end host-port-end)                       *http-port*))         (port (if *http-proxy* (third *http-proxy*) url-port))         (path (if *http-proxy* url (if host-port-end (subseq url host-port-end) "/"))) (sock (ignore-errors(socket:socket-connect port host :external-format :dos)))         ;(sock (handler-bind ((error (lambda (c)                                       ;(unless (eq if-does-not-exist :error)                                         ;(format *http-log-stream*                                                 ;"cannot connect to ~S:~D: ~A~%"                                                 ;host port c)                                         ;(return-from open-http2 nil)))))                 ;(socket:socket-connect port host :external-format :dos)))         status code content-length)    (unless sock (return-from open-http2))    (format *http-log-stream* "connected...") (force-output *http-log-stream*)    ;; http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.23    ;; http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.43    (format sock "GET ~A HTTP/1.0~%User-agent: ~A ~A~%Host: ~A:~D~%" path            (lisp-implementation-type) (lisp-implementation-version)            url-host url-port)    #+unicode ; base64 requires unicode for some weird infrastructure reasons    (when (first *http-proxy*) ; auth: http://www.ietf.org/rfc/rfc1945.txt      ;; http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.34      (format sock "Proxy-Authorization: Basic ~A~%"              (convert-string-from-bytes               (convert-string-to-bytes (first *http-proxy*)                                        *http-encoding*)               charset:base64)))    ;; http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.1    ;; http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.10    (format sock "Accept: */*~%Connection: close~2%") ; finish request    (write-string (setq status (read-line sock))) (force-output)    (let* ((pos1 (position #\Space status))           (pos2 (position #\Space status :start (1+ pos1))))      (setq code (parse-integer status :start pos1 :end pos2)))          ;; dump headers      ;(loop :for line = (read-line sock nil nil) :while line      ;  :do (format *http-log-stream* "~&;; ~S~%" line))      ;(case if-does-not-exist      ;  (:error (error (TEXT "~S: error ~D: ~S") 'open-http code status))      ;  (t (close sock)      ;     (return-from open-http nil))))    (if (and (< code 400) (>= code 300))        ; redirection        (loop :for res = (read-line sock)          :until (string-equal #3="Location: " res                               :end2 (min (length res) #4=#.(length #3#)))          :finally (let ((new-url (subseq res #4#)))                     (format *http-log-stream* " --> ~S~%" new-url)     (unless (string= url new-url)            ;; bug to be fixed     (when (string= "https" new-url    :end2 5)       (setf new-url url)))))                          ;(unless (string-equal #1# new-url                                           ;:end2 (min (length new-url) #2#))                       ;(setq new-url (string-concat #1# host new-url)))                     ;(return-from open-http2 (open-http2 new-url))))        ;; drop response headers        (loop :for line = (read-line sock) :while (plusp (length line)) :do          (when (string-equal #5="Content-Length: " line                              :end2 (min (length line) #6=#.(length #5#)))            (format *http-log-stream* "...~:D bytes"                    (setq content-length (parse-integer line :start #6#))))          :finally (terpri)))    (when (>= code 400)      (close sock)      (setf sock nil)      (setf content-length 0))        (list sock content-length)))(defmacro with-http-input2 ((var url) &body body)  (if (symbolp var)      `(with-open-stream (,var (first (open-http2 ,url))) ,@body)      (multiple-value-bind (body-rest declarations) (SYSTEM::PARSE-BODY body)        `(multiple-value-bind ,var (first (open-http2 ,url))           (DECLARE (READ-ONLY ,@var) ,@declarations)           (UNWIND-PROTECT                (MULTIPLE-VALUE-PROG1 (PROGN ,@body-rest)                  (when ,(first var) (CLOSE ,(first var))))(when ,(first var) (CLOSE ,(first var) :ABORT T)))))))(defun download-file (url file)  (setq file-info (open-http2 url))  (setf socket (first file-info))  (setf content-length (second file-info))  (format t ";; file: '~a', size: ~abytes(~fkb)(~fmb) ~%"  file content-length (/ content-length 1024.0)  (/ content-length (* 1024.0 1024)))  (format t ";; downloading '~a'... please be patient to wait...~%" file)  (let ((data (make-array content-length :element-type '(unsigned-byte 8))))    (setf (stream-element-type socket) '(unsigned-byte 8))    (ext:read-byte-sequence data socket)    (with-open-file     (out file :direction :output :element-type '(unsigned-byte 8))     (ext:write-byte-sequence data out)))  (format t "~a downloaded, done.~%" file));;;; load the page;; search the open code package;; download the package;;(defun search-package-and-download (home-page &optional (depth 3))  (let ((r)(info)(socket)(content-length)(url-info)(new-depth)(url-s)(url-e))    (format t "depth ~a ~%" depth)      (when (<= depth 0)    (format t "the max depth to cease the search ~%")    (return-from search-package-and-download))  (setq info (open-http2 home-page))  (setf socket (first info))  (setf content-length (second info))  (setf url-info (parse-url home-page))    (when (null socket) (return))   (format t "~a opened. ~%" home-page)   (format t "search the code package... ~%")   ;(with-open-file    ;(out (incf file-name-id) :direction :output)   (loop :for line = (read-line socket nil nil) :while line :do  ;(write-line line out)  (format t "~a ~%len:~a ~%" line (length line))  (setf r (regexp:match "href=\"" line :end (length line)))  (setf url-s -1)  (setf url-e -1)  (unless (null r)    (when (> (length line) (regexp:match-end r))    (setf url-s (regexp:match-end r))    (format t "url-s:~a ~%" url-s)))  (when (>= url-s 0)    (format t "s: ~a ~%" url-s)    (setf a (position #\" line :start url-s))    (unless (null a)      (setf url-e a)))      ;(setf r (regexp:match "\"" line :start url-s))    ;(unless (null r)      ;(setf url-e (regexp:match-start r))      ;(format t "e: ~a ~%" url-e)))    (when (> url-e url-s)      (format t "e: ~a ~%" url-e)    (setq url (subseq line url-s url-e))    (format t "~a s:~a e:~a ~%" url url-s url-e)    (when (and (is_path url)       (> (length url) 1))      (format t "~a is a path of the host. ~%" url)      (setf url (string-concat "http://" (s-url-host url-info) url)))    (when (url-string-p url)      (format t "~a ~%" url)      (setf r (regexp:match ".tar.gz" url))      (when (null r)(setf new-depth (- depth 1))(search-package-and-download url new-depth)(format t "a search was done.~%"))      (unless (null r)(when (= (length url) (regexp:match-end r))  (setf file-s (+ 1 (position #\/ url :from-end (length url))))  (setf file (subseq url file-s))    (download-file url file))))))));(search-package-and-download; "http://packages.ubuntu.com/precise/clisp");(search-package-and-download; "http://packages.ubuntu.com/search?keywords=clisp");(search-package-and-download; "http://packages.ubuntu.com/precise/clisp");(defun split-seq (seq pred &key (start 0) end key strict)  "Return a list of subseq's of SEQ, split on predicate PRED.Start from START, end with END.  If STRICT is non-nil, collectzero-length subsequences too.  (split-seq SEQ PRED &key (start 0) end key strict)"  (declare (sequence seq) (type (function (t t) t) pred) (fixnum start))  (loop :for st0 = (if strict start                       (position-if-not pred seq :start start                                        :end end :key key))        :then (if strict (if st1 (1+ st1))                  (position-if-not pred seq :start (or st1 st0)                                   :end end :key key))        :with st1 = 0 :while (and st0 st1) :do        (setq st1 (position-if pred seq :start st0 :end end :key key))        :collect (subseq seq st0 (or st1 end))))(defun split-string (str chars &rest opts)  "Split the string on chars."  (declare (string str) (sequence chars))  (apply #'split-seq str (lambda (ch) (declare (character ch)) (find ch chars))         opts))(defun search-sentences (line)  (let ((s)(e)(len (length line))(lst nil))    (loop :for i :from 0 :below len :do  (setf s nil)  (setf e nil)  (cond   ((char= #\" (aref line i))  (setf s (1+ i))  (setf e (position #\" line :start s))  (when e    ;(format t "~a ~%" (subseq line s e))    ))   ((char= #\> (aref line i))  (setf s (1+ i))  (setf e (position #\< line :start s))  (when e    ;(format t "~a ~%" (subseq line s e))    )))  :when (and s e)  :collect (subseq line s e))))(defun search-words (line)  ;(format t "search words in '~a' ~%" line)  (let ((sentences (search-sentences line))(words)(r))    (when sentences      (dolist (s sentences)(when (> (length s) 0)  (setf r (split-string s '(#\ )))  (when r    (setf words (append words r))))))    words))    (defun search-keywords (line)  (let ((words (search-words line))(keywords))    (when words      ;(format t "words: ~a ~%" words)      (dolist (w words)(when (> (length w) 0)  (setf w (string-downcase w))  ;(format t "~a ~%" w)  (setf r (gethash w *elements-htb-of-fs*))  (when r    ;(format t "found: ~a ~%" w)    (setf keywords (append keywords (list w)))))))    ;(when keywords      ;(format t ";; ~a, count: ~a ~%" keywords (length keywords)))    keywords));;;; -get the content in "?" as sentence;; -get the content in >?< as sentence;; -sentences as a list;; -parse the sentence into words list;; -match the words in the elements hashtable of common sense;; -count the matched words of each page;; -build the tree for all pages searched;;  (do not search the page has been searched or under being searched);;  (a tree to save the valid pages, a hashtable for page matching);; -save the tree into xml file;; -save the result of this detection into a txt file as db;;(defun search-keywords-in-page (home-page)  (let ((r)(info)(socket)(content-length)(keyword-cnt 0))    (setq info (open-http2 home-page))    (setf socket (first info))    (setf content-length (second info))    ;(format t ";; socket: ~a, content-length: ~a ~%" socket content-length)    (when (null socket) (return-from search-keywords-in-page 0))    (format t ";; ~a opened. ~%" home-page)    (format t ";; search keywords... ~%")    (loop :for line = (ignore-errors (read-line socket nil nil)) :while line :do ;(format t ";; ~a ~%len:~a ~%" line (length line)) (setf r (search-keywords line)) (when r (setf keyword-cnt (+ keyword-cnt (length r)))))    (when (> keyword-cnt 0) (format t ";; count of keywords found: ~a ~%" keyword-cnt))    keyword-cnt))(setf *min-keyword-count-of-fs* 30)(defun crawl-free-software-page (home-page htb tree &optional (depth 2))  (let (r info socket content-length url-info new-depth url-s url-e  (p 0.0))    (format t ";; ~a: depth:~a ~%" 'crawl-free-software-page depth)    (unless (null (gethash home-page htb))      (format t ";; page '~a' has been searched ~%" home-page)      (return-from crawl-free-software-page))    (setf r (search-keywords-in-page home-page))    (when (<= r *min-keyword-count-of-fs*) (return-from crawl-free-software-page))        (push home-page (car tree))    ;(format t ";; ~a ~%" tree)    (setf (gethash home-page htb) t)    (when (<= depth 0)      ;(format t ";; the max depth to cease the search ~%")      (return-from crawl-free-software-page))        (setq info (open-http2 home-page))    (setf socket (first info))    (setf content-length (second info))    (setf url-info (parse-url home-page))    (when (null socket) (return-from crawl-free-software-page))    (format t ";; ~a opened. ~%" home-page)    (format t ";; crawling... ~%")    (loop :for line = (read-line socket nil nil)  :while line :do  (setf r (regexp:match "href=\"" line :end (length line)))  (setf url-s -1)  (setf url-e -1)  (unless (null r)    (when (> (length line) (regexp:match-end r))    (setf url-s (regexp:match-end r))))  (when (>= url-s 0)    (setf a (position #\" line :start url-s))    (unless (null a) (setf url-e a)))    (when (> url-e url-s)      (setq url (subseq line url-s url-e))      (when (and (is_path url)       (> (length url) 1))      (format t ";; ~a is a path of the host. ~%" url)      (setf url (string-concat "http://" (s-url-host url-info) url)))    (when (url-string-p url)      (format t ";; ~a ~%" url)      (setf r (regexp:match ".tar.gz" url))      (when (null r)(setf new-depth (- depth 1))(push '(nil) (cdr tree))(crawl-free-software-page url htb (cdr tree) new-depth))      (unless (null r)(when (= (length url) (regexp:match-end r))  (setf file-s (+ 1 (position #\/ url :from-end (length url))))  (setf file (subseq url file-s))  (setf p 1))))))))  ;(search-sentences "<h1>Welcome to <a href=\"http://clisp.org\">CLISP</a></h1>");(search-sentences "<a href=\"https://sourceforge.net/projects/clisp\">CLISP</a>");(search-words "<h1>Welcome to <a href=\"http://clisp.org\">CLISP</a></h1>");(search-keywords "<h1>Welcome to <a href=\"http://clisp.org\">CLISP open source</a></h1>")(defun crawl-page (home-page)  (let ((htb (make-hash-table :test #'equal))(tree '(nil)))    (crawl-free-software-page home-page htb tree 2)    (maphash #'(lambda (k v) (format t "~a = ~a ~%" k v))     htb)    (format t ";; ~a ~%" tree)))(crawl-page "http://clisp.org/");(crawl-page "http://freshcode.club/");(crawl-page "http://packages.ubuntu.com/precise/clisp");(crawl-free-software-page "http://clisp.org/");(crawl-free-software-page "http://freshcode.club/");;(crawl-free-software-page "http://www.openhub.net/");;(crawl-free-software-page "http://sourceforge.net");;(crawl-free-software-page "http://github.com");(crawl-free-software-page "http://directory.fsf.org/wiki/Main_Page");(crawl-free-software-page "http://packages.ubuntu.com/precise/clisp")

expert-system.lisp:

;;;;;; expert system of xspider mycode;;;;;;; elements of common sense of free software;;;; -create the database (simply a *.txt file);; -read the words from database to a hashtable;; -insert a new word into hashtable and update the database;; -delete a word from hashtable and update the database;;(defmacro string-beg-with (beg strv &optional (lenv `(length ,strv)))  "Check whether the string STRV starts with BEG."  (if (stringp beg)      (let ((len (length beg)))        `(and (>= ,lenv ,len) (string-equal ,beg ,strv :end2 ,len)))      (with-gensyms ("SBW-" len)        `(let ((,len (length ,beg)))   (and (>= ,lenv ,len) (string-equal ,beg ,strv :end2 ,len))))))(defun load-elements-of-free-software (file)  (with-open-file   (in file :direction :input)   (loop :for line = (read-line in nil nil)   :while line   :do   ;(format t "~a ~%" line)   (unless (string-beg-with ";" line)     (setf line (string-downcase line))     (setf (gethash line *elements-htb-of-fs*) t)     (unless (gethash line  *elements-htb-of-fs*)       (format t ";; insert '~a' to hashtable failed~%" line))))))(defun exist-element-of-free-software (key)  (gethash key *elements-htb-of-fs*))(defun expert-system-init ()  (setq *elements-htb-of-fs* (make-hash-table :test #'equal))  (load-elements-of-free-software "elements-of-free-software.txt")  (format t ";; size of elements-of-free-software hashtable: ~a ~%"  (hash-table-size *elements-htb-of-fs*))  (format t ";; show all elements of this hashtable... ~%")  (maphash #'(lambda (k v)       (format t "~a = ~a ~%" k v))   *elements-htb-of-fs*))(expert-system-init);;;; forms of homepage of free software project;;;;;; forms of code downloading page of free software project;;;;;; forms of user guide page of free software project;;;;;; evaluate the certain degree of detecting free software page;; -given the P;; -how to update the P to a better cretain degree;; -if always exist a best degree, there already be a common form;; -search the forms, build the P network, calculate the P degree;; -given test cases of pages, verify the degree, improve and loop;;

elements-of-free-software.txt:
;;;;;; common words of free software community;;;hackhackerhackinggnulinuxfsffree software foundationfree softwarecommunitylpfleague for programming freedomMozilla Foundation;; free software lincesesFree Documentation LicenseGeneral Public Licensegplgnu gplgnu fdlgplv2gplv3ApacheArtisticBerkeley DatabaseBoost SoftwareBSDCeCILLCryptixEducational CommunityEU DataGridFreeBSDFreetypeimlib2jpeg groupintel acpiiscMozillamplPublic DomainPythonSGIUnicodeUPLUnlicensevimwebmwtfplWxWidgetsX11XFree86zlibzope;;;;;; common words of software program;;;softwarefirmwaredriverslibrarylibrarieslibsystemmoduleframeworkinterfaceapiAPIsosoperation systemkernelgraphical user interfaceguiuser interfaceuiappappsapplicationdatadatabasedbwebcodecodedcodersourcesourcessource codeprogramprogramedprogramsprogrammingcodingdesigndesigneddevelopdevelopeddeveloperdevelopingdevelopmentimplementimplementedimplementsimplementationopen sourceopen-sourceopensourceopen source softwareossprojectprojectsversionnewsbugbugslanguagemanualguideuser guideplatformfaqmailing listpackagenotestooltoolsUtilityUtilitiesdocumentationprogrammermalwaredesktopbrowserubuntuunixweb pageweb serverclientserverpluginpluginsextendextendsfeaturefeaturesembeddedcreatebuilddebugmaintainsoftware engineerArchitecturelayerlayersloadloadedinstructioninstructionsmemorystackmallocheaphoopcallbackramromexecuteexecutesexecutionruncrasherrorerrorstesttestedtestingperformanceunit testingregression testingcommandreleasereleasedfixeddiffpatchLicensealgorithmalgorithmssortsortingData structureaiUser experienceend-userprocessMultiprocessMulti-processMultiprocessingthreadmultithreadmulti-threadMultithreadingtasktasksiotermnialttysocketfile systemstreamprotocolfront-endback-end;; Software standard organizationsw3cietf;; software standardstcp/iptcpipxmlhttphtml;; common words of software projectpre-alpharequirement;design;development;unit testingMilestone versionalphawhite boxwhite-boxblack boxblack-box;featurebeta;releaseissueissuesrtmga;; where and how to share the codesgithubSourceForgeohlohOpen Hubfreecodefree codefreshcodefresh codegitsvncvs;; types of software;applicatinMiddlewaredistributed systemProgramming languageSystem software;firmware;driversProgramming toolCompilerDebuggerInterpreterLinkerIDEoopgccg++makefilemake all;;;;;; common words of programming language;;;syntaxsyntacticsemanticsfunctionlispcommon lispclschemeclispshellsbclcc++c#Objective-Cjavaperlpythonjavascriptrubyluaphpcss;;;;;; common words of data structures;;;hashtablehash tablearraytreelistqueue;;;;;; common words of database language;;;mysqlMariaDBsqlitesqlsql servernosqlMongoDBPostgreSQLBigTable;;;;;; common words of types of package;;;zipgzrartgz;;;;;; common words of great hackers;;;freedomProgramming Freedom

Run This Demo

Step 1: clisp -q

Step 2: (load "expert-system.lisp")

Step 3: (load "xspider-mycode-client.lisp")

Step 4: modify above codes to crawl the free-software-like web pages as you want...

The result shown like below:


...


Enjoy~

I.Mars- Nov. 7 2016


1 0