Getting Wet With Common Lisp
November 7th, 2007
Since I first learned programming, I have built things with C, C++, PHP, Java, Ruby, Python, Delphi, and JavaScript, and I am quite familiar with those languages. But I want to build something with a language that is very “different” from them. Those languages all use Algol-like syntax. The very “different” camp includes Erlang, Haskell, Lisp, and Scheme. I chose Lisp, or more precisely Common Lisp, as the first very “different” language to explore. Yes, in my college years I learned CLIPS and developed something simple with it, but that language is limited to expert systems. I want to use a very “different” language to develop something general. There are many reasons I chose Common Lisp first. One of them is Eric Raymond. He said this:
“Lisp is worth learning for the profound enlightenment experience you will have when you finally get it; that experience will make you a better programmer for the rest of your days, even if you never actually use Lisp itself a lot.” from http://www.catb.org/~esr/faqs/hacker-howto.html
The parentheses are so sexy too. Then I learned Common Lisp from the internet. Fortunately there are many good free tutorials out there. Here are some of them: I have finished reading the latter, but not the former. Then I thought I wanted to try Common Lisp. So I decided to use it to find out how popular each programming language is for building the applications listed on the Gnomefiles website. As you know, the GTK+ toolkit has many bindings, so you can use PHP, for example, to build a GTK+ application rather than pure C.
The first thing to do is to find the list of all applications registered on Gnomefiles. Its front page says that 1898 applications are registered, but I could not find a page listing all of them. So at that point I thought I would have to iterate through the subcategory pages one by one. A subcategory page looks like this: http://www.gnomefiles.org/subcategory.php?sub_cat_id=38
I played with the sub_cat_id parameter. The highest number that can be passed in the URL appears to be 157. So I would have to iterate one by one from…. ehm, 1 or 0? Okay, I had to check whether 0 can be passed as the parameter. When I tried this URL: http://www.gnomefiles.org/subcategory.php?sub_cat_id=0, I got a MySQL warning in the web page, but I also got many applications sorted alphabetically. The sad part is that this page lists only 1333 applications rather than 1898. But that is okay. I just want the big picture, and 1333 out of 1898 can give a clear picture of programming language popularity, I think. So I downloaded the web page and named it webpages.html. I needed to filter it so that I have this simple text format:
...
app.php/xoscope
app.php/Xournal
app.php/Xpad
app.php/XPN_-_X_Python_Newsreader
app.php/XQF
app.php/xrdesktop
app.php/XSane
app.php/XSensors
app.php/XTerm-Here
...
rather than this:
...
<TD width=535 BGCOLOR="#EFF0E8"><FONT SIZE=2 FACE="Arial"> <B><A HREF="app.php/GAMMApage">GAMMApage</a></B></font></td></tr>
<tr><td width=535 BGCOLOR="#F6F6F6"><FONT SIZE=1 FACE="Verdana, Arial"> Monitor calibration tool</font></td></tr></table>
<table cellPadding=2 cellSpacing=0 width=575 border=1 BORDERCOLOR="#eeDDDD" class="table">
<TD WIDTH=40 HEIGHT=40 BGCOLOR="#FFFFFF" ROWSPAN=2 ALIGN="center"><IMG SRC="shots/generic.gif" WIDTH=32 HEIGHT=32 BORDER=0></TD>
<TD width=535 BGCOLOR="#EFF0E8"><FONT SIZE=2 FACE="Arial"> <B><A HREF="app.php/GanttPV">GanttPV</a></B></font></td></tr>
<tr><td width=535 BGCOLOR="#F6F6F6"><FONT SIZE=1 FACE="Verdana, Arial"> Project Scheduling Software</font></td></tr></table>
<table cellPadding=2 cellSpacing=0 width=575 border=1 BORDERCOLOR="#eeDDDD" class="table">
...
This is the lisp code to make that happen.
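;filter_gnomefiles.lisp
(with-open-file (stream "webpages.html")
  (with-open-file (out-stream "applications.html" :direction :output)
    ;; read-line returns NIL at end of file, which ends the loop
    (loop for line = (read-line stream nil)
          while line
          do (when (search "app.php" line)
               ;; the path starts at column 77 and ends at the next double quote;
               ;; it is passed as an argument so a ~ in it cannot break format
               (format out-stream "~a~%"
                       (subseq line 77 (search "\"" line :start2 77)))))))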
I hard-coded the number 77. It is the column where the important information starts in each matching line. The code is very easy to understand. You open webpages.html for reading and applications.html for writing the simple text format, then loop from line to line. When a line contains the substring “app.php”, we extract “app.php/TheNameOfApplication” from it and write that to applications.html.
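The hard-coded column only works as long as every matching line has exactly the same layout. A possible alternative, sketched here under the assumption that each link is written as HREF="app.php/..." with an upper-case HREF and double quotes, is to search for that marker instead of counting columns. The function name extract-app-path is made up for this example and is not part of the original script.

```lisp
;; Hypothetical helper, not part of the original script.
;; Returns the app.php/... path found in LINE, or NIL if LINE has no such link.
;; Assumes the link is written exactly as HREF="app.php/..." (case-sensitive).
(defun extract-app-path (line)
  (let ((pos (search "HREF=\"app.php" line)))
    (when pos
      (let* ((start (+ pos (length "HREF=\"")))       ; first character after the opening quote
             (end (position #\" line :start start)))  ; closing quote, or NIL if it is missing
        (subseq line start end)))))

(extract-app-path "<B><A HREF=\"app.php/GanttPV\">GanttPV</a></B>")
;; => "app.php/GanttPV"
```

With a helper like this, leading whitespace or a slightly different table cell in front of the link would no longer matter.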
The URL of an application page on Gnomefiles looks like this: http://www.gnomefiles.org/app.php/Galaxium_Messenger
Now our applications.html file consists of:
...
app.php/xoscope
app.php/Xournal
app.php/Xpad
app.php/XPN_-_X_Python_Newsreader
app.php/XQF
app.php/xrdesktop
app.php/XSane
app.php/XSensors
app.php/XTerm-Here
...
Using this lisp code, we iterate line by line to download the application web pages.
;spider_gnomefiles.lisp
(with-open-file (stream "applications.html")
  (with-open-file (out-stream "written_app.html" :direction :output)
    (loop for line = (read-line stream nil)
          for index from 1
          while line
          do (let ((url (concatenate 'string "http://www.gnomefiles.org/" line))
                   (filename (format nil "app_gnomefiles/~d.html" index)))
               ;; be polite: wait 3 to 5 seconds between requests
               (sleep (+ 3 (random 3)))
               ;; shell is a CLISP extension; it runs curl to fetch the page
               (shell (concatenate 'string "curl " url " -o " filename))
               ;; record the line we have just finished
               (format out-stream "~a~%" line)))))
The script reads applications.html line by line. The file written_app.html records our progress in downloading the application web pages, so when we interrupt the process, we don’t have to start from the beginning. The pages are downloaded into the app_gnomefiles folder. I chose to name the files by number, so they will be 1.html, 2.html, 3.html and so on. I use the sleep function because I try to be polite when spidering somebody’s server. As for the shell function, I don’t know how to download something from the internet using only pure Common Lisp. There are third-party libraries, but I was confused about how to install them on Windows XP without Cygwin. So I call a third-party tool, curl, from the lisp code.
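As written, the script always starts again at the first line of applications.html, so resuming means skipping by hand the lines already recorded in written_app.html. A minimal sketch of how that skipping could work is shown below. The function name remaining-downloads and its two parameters are assumptions made for this example; the URL prefix and the numbered file names follow the script above.

```lisp
;; Hypothetical helper, not part of the original script.
;; LINES is the list of app.php/... lines from applications.html.
;; ALREADY-DONE is the number of lines already recorded in written_app.html.
;; Returns a list of (url . filename) pairs for the pages still to be fetched,
;; keeping the same numbering the script would have used.
(defun remaining-downloads (lines already-done)
  (loop for line in (nthcdr already-done lines)
        for index from (1+ already-done)
        collect (cons (concatenate 'string "http://www.gnomefiles.org/" line)
                      (format nil "app_gnomefiles/~d.html" index))))

(remaining-downloads '("app.php/xoscope" "app.php/Xournal" "app.php/Xpad") 1)
;; => (("http://www.gnomefiles.org/app.php/Xournal" . "app_gnomefiles/2.html")
;;     ("http://www.gnomefiles.org/app.php/Xpad" . "app_gnomefiles/3.html"))
```

Each pair could then be handed to the same sleep and curl steps as before.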
After you have all the application web pages in the app_gnomefiles folder, it’s time to extract the information we need. Every application web page has this part that we need:
...
<B>Requirements</B><BR>
This application requires GTK+ version 2.6.x. Other dependencies include:<BR>
gtkmm, libglademm, gconfmm
<br>
</font></TD></TR>
<TR><TD VALIGN="TOP" bgcolor="#DDDDDD" align="left"><FONT SIZE=2 FACE="Arial"><B> Latest Version: 0.2.1</B></font></TD></TR>
...
From that part of the page, we know that this application uses C++ because gtkmm is the C++ binding of GTK+.
To make the information easy to query, we gather all the requirements information into one file. That is the job of this lisp code.
;grouping_gnomefiles.lisp
(let ((files-list
        (directory (make-pathname :name :wild
                                  :directory '(:relative "app_gnomefiles")))))
  (with-open-file (out-stream "db_gnomefiles.txt" :direction :output)
    (dolist (file files-list)
      (with-open-file (stream file)
        (let ((save-text nil))
          (loop for line = (read-line stream nil)
                while line
                do (cond
                     ;; not inside a block yet: look for its start
                     ((not save-text)
                      (when (search "<B>Requirements</B>" line)
                        (setf save-text t)))
                     ;; end of the block: write the blank separator line
                     ((search "Latest Version:" line)
                      (format out-stream "~%")
                      (setf save-text nil))
                     ;; inside the block: copy the line as it is
                     (t (format out-stream "~a~%" line)))))))))
We put all the files we need to process in the files-list variable, and we write the gathered information to db_gnomefiles.txt. We iterate from file to file in the app_gnomefiles folder. When processing one file, we iterate from line to line as usual. The logic is like this. When we find the line containing “<B>Requirements</B>”, we set the save-text flag to true. From then on, we save each line to db_gnomefiles.txt. When we encounter the line containing “Latest Version:”, we set the save-text flag to false and print a blank line in db_gnomefiles.txt. Then we ignore the rest of the file.
Now that we have the information in a single file, we can query it conveniently.
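The save-text logic can be tried on a list of lines without touching any files. The sketch below is only an illustration: the function name requirements-block is made up, and unlike the script it stops at the first “Latest Version:” line instead of reading on to the end of the file.

```lisp
;; Hypothetical helper, not part of the original script.
;; Returns the lines between the "<B>Requirements</B>" line and the
;; "Latest Version:" line, excluding both marker lines.
(defun requirements-block (lines)
  (let ((save-text nil)
        (result '()))
    (dolist (line lines (nreverse result))
      (cond
        ;; not inside the block yet: look for its start
        ((not save-text)
         (when (search "<B>Requirements</B>" line)
           (setf save-text t)))
        ;; end marker reached: stop and return what was collected
        ((search "Latest Version:" line)
         (return (nreverse result)))
        ;; inside the block: keep the line
        (t (push line result))))))

(requirements-block
 '("<B>Requirements</B><BR>"
   "This application requires GTK+ version 2.6.x. Other dependencies include:<BR>"
   "gtkmm, libglademm, gconfmm"
   "<br>"
   "<B> Latest Version: 0.2.1</B>"
   "this line is ignored"))
;; => ("This application requires GTK+ version 2.6.x. Other dependencies include:<BR>"
;;     "gtkmm, libglademm, gconfmm"
;;     "<br>")
```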
;question_db_gnomefiles.lisp
(defun split-by-one-space (string)
  (loop for i = 0 then (1+ j)
        as j = (position #\Space string :start i)
        collect (subseq string i j)
        while j))
;; a ~ at the end of a line makes format skip the line break and the indentation
(format t "Use double quote (\") to enclose your keywords.~%~
           For example: \"c++ gtkmm glademm\".~%~
           Remember to use SINGLE space to separate the keywords.~%~
           All must be typed with lower case.~%")
;; defvar declares the global variables before they are used
(defvar keywords (split-by-one-space (read)))
(defvar counts 0)
(defvar block-search nil)
(defun search-strings (p-line)
  (mapcar (lambda (x) (search x (string-downcase p-line))) keywords))
(with-open-file (stream "db_gnomefiles.txt")
  (loop for line = (read-line stream nil)
        while line
        do (if (search "This application requires GTK+ version" line)
               ;; a new block starts here
               (setf block-search t)
               (when (> (length (remove nil (search-strings line))) 0)
                 ;; count each block only once
                 (when block-search (incf counts))
                 (setf block-search nil)))))
(print (/ (* 100 counts) 1331.0))
First we define a function that splits a string on single spaces into a list of strings. Then we tell the user how we want the input to be given. The variable counts holds the number of blocks of information that contain at least one of the keywords the user gave. The variable block-search is a flag. If it is true when we find a keyword in a block, we increase counts and then set block-search to false (nil). So when we are still in the same block and encounter a keyword again, we do not increase counts again. When we encounter a new block, indicated by a line containing “This application requires GTK+ version”, we set block-search to true. The function search-strings just checks whether a line has one of the keywords the user gave. For example, say we have these keywords: “c# gtk# mono”, and this is the line we encounter: “tested mono version 1.2.5.1 and GTK# 2.” Then we get this list: (nil 32 7). The list contains the position of each keyword in the line. “c#” gets nil because there is no substring “c#” in the line. To check whether the line has one of the keywords, we only need to remove the nils from the list and take the length of what remains. If the length is greater than 0, then we have a match. To make it clear, this is what our db_gnomefiles.txt looks like:
...
This application requires GTK+ version 2.6.x. Other dependencies include:<BR>
GConf >= 2.4
<br>
</font></TD></TR>

This application requires GTK+ version 2.0.x.
<br>
</font></TD></TR>

This application requires GTK+ version 2.6.x. Other dependencies include:<BR>
libmysqlclient, libgnomeui, gconf
<br>
</font></TD></TR>

This application requires GTK+ version 2.10.x. Other dependencies include:<BR>
python-gtk2, ppp
<br>
</font></TD></TR>

This application requires GTK+ version 2.2.x. Other dependencies include:<BR>
- Poslib DNS library, version 1.1.0-pre*: http://www.posadis.org/wiki/doc:poslib
- Libglade: http://www.jamesh.id.au/software/libglade/
<br>
</font></TD></TR>
...
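The keyword example can be checked at the REPL. In the sketch below, split-by-one-space is the same function as in the script, while keyword-positions is an illustrative variant of search-strings that takes the keywords as an argument instead of reading the global variable, so that the example stands on its own.

```lisp
(defun split-by-one-space (string)
  (loop for i = 0 then (1+ j)
        as j = (position #\Space string :start i)
        collect (subseq string i j)
        while j))

;; Illustrative variant of search-strings: KEYWORDS is passed in as an
;; argument (an assumption made only to keep this example self-contained).
(defun keyword-positions (keywords line)
  (mapcar (lambda (keyword) (search keyword (string-downcase line)))
          keywords))

(split-by-one-space "c# gtk# mono")
;; => ("c#" "gtk#" "mono")

(keyword-positions (split-by-one-space "c# gtk# mono")
                   "tested mono version 1.2.5.1 and GTK# 2.")
;; => (NIL 32 7)

(length (remove nil (keyword-positions '("c#" "gtk#" "mono")
                                       "tested mono version 1.2.5.1 and GTK# 2.")))
;; => 2, which is greater than 0, so the line matches
```

The line is lower-cased before the search, which is why “GTK#” in the text matches the keyword “gtk#”.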
After reading the file line by line, we print the percentage of counts. But what does the number “1331.0” mean? When I gathered the information into one file, I encountered an encoding error. The error looks like this:
character cannot be represented in character set charset:cp437
I had to find the file (I added specific code to mark the file that gave me this error) and manually remove the unrecognized character. Vim does not recognize it either (that’s how I knew which character I had to delete). Firefox recognizes the characters, by the way.
After the cleansing process, I encountered a character encoding warning.
*** - invalid byte #x81 in CHARSET:CP1252 conversion
But I got this warning after the processing was done, so I had no clue which files caused it. Because of this, two files are missing from db_gnomefiles.txt, which is why the total is 1331 rather than 1333. I have to write the number as a float (1331.0) so that the result of the division is a float.
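The reason for the float is that Common Lisp divides integers exactly and returns a rational number instead of a decimal. A small sketch of the difference follows. The function name percentage and the count 87 are made up for illustration.

```lisp
;; Hypothetical helper: COUNTS and TOTAL are illustrative parameter names.
;; Converting TOTAL to a float forces a floating-point result.
(defun percentage (counts total)
  (/ (* 100 counts) (float total)))

(/ (* 100 87) 1331)    ; => 8700/1331, an exact rational
(/ (* 100 87) 1331.0)  ; a single-float, roughly 6.54
(percentage 87 1331)   ; the same float, without writing 1331.0 by hand
```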
Okay, it’s time to test our beloved lisp code. Let’s find the popularity of the C# programming language:
D:\Documents and Settings\torvald\My Documents\projects\lisp_gnomefiles>clisp question_db_gnomefiles.lisp
Use double quote (") to enclose your keywords.
For example: "c++ gtkmm glademm".
Remember to use SINGLE space to separate the keywords.
All must be typed with lower case.
"gtk# mono c#"
6.536439
D:\Documents and Settings\torvald\My Documents\projects\lisp_gnomefiles>
6.5%.... Ehm, that is quite a high number. What about the popularity of other programming languages? I put the results of my investigation in another post. All of this lisp code was tested with GNU CLISP.
Akbar