programming

Project Idea: "Programming library for curse words"

When programming, there are occasionally times when you need to detect or block curse words. At CourtListener, for example, we make URLs with ID numbers in them that are formed by converting an ID number to letters (so 1 → a, 2 → b, 27 → A, etc.). Higher numbers create longer strings of letters, so over time, this creates curse words in the URL. Currently, the site only has a few four-letter strings, but I will rue the day when any of the seven dirty words is shown to users on my site.
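A minimal sketch of that kind of number-to-letters conversion, assuming the alphabet is a-z followed by A-Z and that IDs start at 1. The function name and the exact scheme are assumptions for illustration, not CourtListener's actual code.

```python
import string

# Assumed alphabet: lowercase first, then uppercase, so 1 -> a and 27 -> A.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase


def encode_id(number: int) -> str:
    """Convert a positive integer to letters (bijective base 52)."""
    if number < 1:
        raise ValueError("IDs are assumed to start at 1")
    letters = []
    while number > 0:
        # Subtract 1 because there is no "zero" letter in this scheme.
        number, remainder = divmod(number - 1, len(ALPHABET))
        letters.append(ALPHABET[remainder])
    return "".join(reversed(letters))
```

With this scheme, every ID above 52 produces a string of two or more letters, which is where unwanted words can start to appear.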

There are many lists of curse words on the web, but none that is maintained or curated. Having that alone would be a useful project. What would make it better would be libraries in popular programming languages that efficiently told you if a string contained a curse word.

The next feature would be to add additional languages, and then to add words like pen1s, which aren't normally curse words, but are certainly words you'd want to eliminate.
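Here is a rough sketch of what such a library's core check might look like in Python. The word list is a harmless placeholder, and the names (BLOCKLIST, SUBSTITUTIONS, contains_blocked_word) and the character substitutions are assumptions for illustration, not an existing library.

```python
from typing import FrozenSet

# Placeholder entries; a real list would be curated and maintained.
BLOCKLIST: FrozenSet[str] = frozenset({"darn", "heck"})

# Assumed mapping for common character swaps such as the 1 in "pen1s".
SUBSTITUTIONS = str.maketrans({
    "0": "o", "1": "i", "3": "e", "4": "a",
    "5": "s", "7": "t", "@": "a", "$": "s",
})


def normalize(text: str) -> str:
    """Lowercase the text and undo the assumed character swaps."""
    return text.lower().translate(SUBSTITUTIONS)


def contains_blocked_word(text: str,
                          blocklist: FrozenSet[str] = BLOCKLIST) -> bool:
    """Return True if any blocked word appears anywhere in the text."""
    normalized = normalize(text)
    return any(word in normalized for word in blocklist)
```

Substring matching suits generated strings like URL IDs, but on ordinary prose it would also flag innocent words that happen to contain a blocked word.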

It'd be a pretty simple project, so I may just go for it.

Only question is, what do I name it?

Swimlane Diagram Generator Written in XSLT

For the past couple of years, I've been wanting to make a swimlane diagram showing all of my roommates and which room they lived in. I considered drawing it out by hand with a charting program, but the idea of updating it whenever somebody moved in or out seemed daunting, and I decided that the best thing would be to make a program that could generate a chart from XML. My new job at Recommind requires that I learn and use XSLT, so I took the opportunity to write an XSLT script that converts the XML data to HTML, JavaScript and SVG.

The final product looks something like this (anonymized with Presidential info for good measure):
Swimlane diagram demo
(click through for complete demo)

For the technically inclined: The program is an XSLT script, which converts the XML into HTML and JavaScript. The JavaScript is then interpreted by the Raphael library, which finally generates the SVG you see. It's overly complex, but it was a fun mishmash of technologies to play with and the point was learning new things as much as anything.

The transformation should work to make all kinds of swimlane diagrams, so if you're interested in the code, let me know.

Projects & Papers

Papers and Essays

  • CourtListener.com: A platform for researching and staying abreast of the latest in the law, Michael Lissner, 07 May 2010. [pdf]
  • Exploratory Analysis of Service Recipients of the Community Services Bureau, Michael Lissner, 27 February 2010. [html]
  • Breaking ReCAPTCHA, Michael Lissner, 9 December 2009. [pdf]
  • Proactive Methods for Secure Design, Michael Lissner, 9 December 2009. [pdf]
  • Facebook's Battle Sign, Michael Lissner, 16 November 2009. [pdf]
  • Wikipedia Article on Jacobsen v. Katzer, Michael Lissner, et al, 03 October 2009. [html]
  • The Difficulties of Managing Online Estates, Michael Lissner, 15 May 2009. [pdf]
  • Online Grieving by Default, Michael Lissner, 12 May 2009. [pdf]
  • The Layered FTC Approach to Online Behavioral Advertising, Michael Lissner, 02 April 2009. [pdf]
  • Technology Revolution and the Fourth Amendment, Michael Lissner, 22 May 2009. [pdf]
  • Wikipedia Article on Zeran v. AOL, Michael Lissner, et al, 18 March 2009. [html]
  • Sustainability Metrics for the Energy Sector, Michael Lissner, Hazel Onsrud, Sharmila Ravula, 10 December 2008. [pdf]
  • TuneRepublic Democratic Jukebox, Ryan Greenberg, Michael Lissner, Zain Syed, 07 January 2009. [pdf]

Programming Projects

  • Swimlane Diagram Generator, 06 September 2010, XSLT. [html]
  • Mercurial Hook to Automatically Copyright Pushed Files, 24 January 2010, Python. [html]
  • F-spot Photo Management Database Cleaner, 14 October 2009, bash, SQL. [html]
  • Yelp Scraper, 21 December 2008, Python. [html]
  • Twitter Credentials Verification Script, 03 April 2009, Python. [html]
  • Pacific Crest Trail Temperature Analysis Visualization, 12 December 2007, Java. [html]

Presentations

  • Interface Design Final Project, 07 May 2009. [ppt]
  • HTML Basics, 09 July 2010. [html]
  • Search, 28 July 2010. [html]
  • Browsers, 30 July 2010. [html]
  • Mechanize, Beautiful Soup and Regular Expressions, 11 April 2009. [pdf]

Websites

A Python Function to Verify Twitter Credentials

Thought I'd post this for the future generations, since I had a hard time finding a template anywhere on the web when I needed one. It's nothing revolutionary, but a useful snippet nonetheless. This is for one of my projects this semester.

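import pycurl


def verifyTwitterCredentials(username, password):
    """Return True if Twitter accepts the username and password."""
    c = pycurl.Curl()
    c.setopt(c.URL, 'http://twitter.com/account/verify_credentials.xml')
    # Send the credentials using HTTP basic authentication.
    c.setopt(c.USERPWD, username + ":" + password)
    try:
        # perform() returns None, so there is nothing to capture here.
        c.perform()
        # A 200 response means the credentials were accepted.
        verified = c.getinfo(c.HTTP_CODE) == 200
    finally:
        c.close()

    return verified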

Working with matplotlib and pycairo

I spent a good part of my winter break working on learning Python and using it for projects. One project was the Yelp scraper that I posted about previously, and another was a report for my old work.

The report is a statistical analysis of the development of about 2,000 children aged three and four. For those interested, I'll try to post it here once the final version is ready to go. In the past when making the report, I had been frustrated because there was no easy way to script the creation of the 30 or so charts that need to be made. Excel had been our data analysis tool, and as such, we were stuck with either using VBA to create the charts or making them by hand. Since nobody knew VBA, we always just buckled down and did the work by hand.

This time around, I discovered the matplotlib Python library, and used that to create the charts. It was a pretty rough experience all in all. While simple graphs can be created in about five lines of code, creating complicated ones took a good amount of work. For example, changing the tick markers on a graph requires that you create tick objects, and then manipulate each of them individually in a for loop. Granted, I couldn't customize them at all in Excel, but figuring out that kind of change was a pain indeed.

The report itself required about 1,000 lines of code, and each chart required about 100-200 lines. For custom charts, I didn't find the library that useful. However, towards the end of the report there are 30 charts, all of which are identical except for the data. For these charts, I was able to write a for loop that created them all in about 20 minutes, whereas previously they took me a few hours to make by hand.
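To give a feel for that kind of loop, here is a small sketch rather than the report's actual code. The function name, the bar-chart style, and the file naming are all assumptions; it also shows the per-tick loop mentioned above.

```python
import os
from typing import Dict, List, Sequence

import matplotlib
matplotlib.use("Agg")  # draw straight to files, no display needed
import matplotlib.pyplot as plt


def save_bar_charts(datasets: Dict[str, Sequence[float]],
                    labels: Sequence[str],
                    out_dir: str) -> List[str]:
    """Save one identically styled bar chart per dataset; return the paths."""
    paths = []
    for title, values in datasets.items():
        fig, ax = plt.subplots(figsize=(6, 4))
        positions = list(range(len(values)))
        ax.bar(positions, values)
        ax.set_title(title)
        ax.set_xticks(positions)
        ax.set_xticklabels(labels)
        # Tick labels are individual objects, so each one is styled in turn.
        for tick_label in ax.get_xticklabels():
            tick_label.set_rotation(45)
            tick_label.set_fontsize(8)
        # Assumes each title is safe to use as a file name.
        path = os.path.join(out_dir, title + ".png")
        fig.savefig(path)
        plt.close(fig)  # free the figure before starting the next chart
        paths.append(path)
    return paths
```

Only the data changes between iterations, which is why a loop like this pays off for a batch of identical charts and much less so for one-off custom ones.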

Another library I spent some time learning was the pycairo library, which allows pixel by pixel editing of pictures. I had planned to use it to do any editing to the charts that I was unable to accomplish with the matplotlib library, but in the end, it was unnecessary. I have another project coming up though that will use the pycairo library, so look for that soon.

Yelp Scraper to Get Business Info in a Geographic Area

I spent the past couple days on one of my first Python projects - using the Yelp API to compile a list of restaurants in a defined geographic area.

It's been a good project. Because of some limitations of the API, I had to do some interesting tricks to make it work. One problem with the API is that it only allows 20 hits per query, so if you want to do a big query, you have to divide it up into tiny queries that have fewer than 20 hits each.

To accomplish that, the program starts with two points that define the corners of a rectangle. If a query for that rectangle gets 20 hits, it divides the longer dimension of the rectangle in half and performs a query on each of the two new rectangles. For each of those, if there are 20 hits, it again divides the rectangle in two and performs two new queries, and so forth until fewer than 20 hits are found for the rectangle. Once fewer than 20 hits are found, the data is entered into a database. Once all the results have been added to the database, a comma-separated file is created, and the program ends.
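The subdivision can be sketched roughly as follows. This is not the original script: the search argument stands in for whatever function performs the API query for one rectangle, and the id field, the box layout, and the min_size guard are assumptions for illustration.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (south, west, north, east)
Business = Dict[str, object]

MAX_HITS = 20  # the per-query cap described above


def collect_businesses(box: Box,
                       search: Callable[[Box], List[Business]],
                       min_size: float = 1e-6) -> Dict[str, Business]:
    """Query a rectangle, splitting it in two whenever the cap is hit."""
    south, west, north, east = box
    hits = search(box)
    height, width = north - south, east - west
    # Under the cap means nothing was cut off; min_size stops runaway splits.
    if len(hits) < MAX_HITS or max(height, width) < min_size:
        return {str(b["id"]): b for b in hits}
    if height >= width:
        # Split the longer (north-south) dimension in half.
        mid = (south + north) / 2
        halves = [(south, west, mid, east), (mid, west, north, east)]
    else:
        # Split the longer (east-west) dimension in half.
        mid = (west + east) / 2
        halves = [(south, west, north, mid), (south, mid, north, east)]
    found: Dict[str, Business] = {}
    for half in halves:
        # Keying by id drops duplicates that sit on a shared edge.
        found.update(collect_businesses(half, search, min_size))
    return found
```

Returning a dictionary keyed by ID makes it easy to write each business to a database or CSV file exactly once.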

It was pretty incredible switching to Python for this project from my usual Java, and also using an official API for the first time. This project ended up being about 200 lines (half of which are comments). I can't imagine how long it would be with Java, since I used some rather powerful Python modules to accomplish this (namely, csv, urllib & json).

If anybody is interested in seeing/using the code, let me know. It should be useful if you need a list of restaurants or other businesses in a certain area. Worthy causes only please!

PCT Data Project - DONE

I'm happy to announce that the PCT data project is complete!

Over the past several weeks/months, I have been slaving away over my computer writing this program. When used, it will generate a dynamic graphing area that will load up temperature data for one to six PCT hikers.

All those that are interested in the most complicated programming assignment I have ever worked on are welcome to check it out at michaeljaylissner.com/pct-temperatures.

I am officially a free man once again! Thanks to all who made this possible with their encouragement and patience!

Pacific Crest Temperature Project


Introduction

As mentioned in my previous blog posting, in 2005 and 2006 six hikers carried iButton thermocron devices 2,650 miles from Mexico to Canada along the Pacific Crest Trail. Each device is a sealed canister about the size of five stacked dimes. Inside the canister are the following:

  • A piece of memory
  • A clock
  • A thermometer

Each hour, these devices were programmed to check the temperature and record it to the memory. Upon returning from the journey, the devices were connected to a computer and the data was extracted. All told, there are about 18,522 data points - too many to be plotted in any one graph. As a result of the struggle to make meaningful use of this data, the applet below was created.

Applet and Java Project

This applet was created as a final project for Java: Discovering its Power, a class offered by University of California Berkeley Extension. The source code is attached to this posting, and modifications or updates are more than welcome. Additionally, the source data is attached to this posting for all six hikers. You will see that it is in .csv format (the fields go like this: month, day, year, hour, temperature).

To use the applet, simply select the date range that you are interested in displaying, the time of day that interests you, the hikers whose data you want to see, and press the 'generate' button.
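For anyone who would rather work with the .csv files directly, the same kind of selection can be sketched in a few lines of Python. This is not the applet's Java code; it assumes the field order given above, four-digit years, hours numbered 0-23, and no header row.

```python
import csv
from datetime import date
from typing import Iterable, List, Tuple

Reading = Tuple[date, int, float]  # (day, hour, temperature)


def load_readings(lines: Iterable[str]) -> List[Reading]:
    """Parse rows of month, day, year, hour, temperature."""
    readings = []
    for row in csv.reader(lines):
        if len(row) != 5:
            continue  # skip blank or malformed rows
        month, day, year, hour = (int(value) for value in row[:4])
        readings.append((date(year, month, day), hour, float(row[4])))
    return readings


def select(readings: List[Reading], start: date, end: date,
           first_hour: int, last_hour: int) -> List[Reading]:
    """Keep readings inside the date range and the hour range (inclusive)."""
    return [
        r for r in readings
        if start <= r[0] <= end and first_hour <= r[1] <= last_hour
    ]
```

Passing an open file object to load_readings works as well as a list of lines, since csv.reader accepts any iterable of strings.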

Caveats and Warnings

There are a few caveats about using this data:

  • The primary caveat is that these results are all passive data, which is to say that these measurements were not taken by a careful experiment, but rather by a device that was carried somewhere in a backpack for the length of a five month journey. As a result, the figures shown can vary greatly depending on how the device was treated, where it was when it took its measurement, and any number of other factors.
  • Ground temperatures and solar energy can be very extreme. Many of these hikers carried their iButton in a pack that might have been set within one or two inches of the ground or directly in the sun, where the temperature can seem unreal. I have seen measurements ranging from about 10 to 160 degrees F. These are actual measurements.
  • Different hikers move at different paces, and take off-trail days at different times. There is no guarantee that the figures you are looking at were measured while the hiker was on the trail.

Hiker Start and End Dates

Adam Bradley: 5/15/06 to 9/24/06
Matt Church: 4/28/06 to 6/22/06
Robert Francisco: 4/25/06 to 9/26/06
Michael Lissner: 4/21/05 to 9/12/05
Jeff Singewald: 4/22/06 to 9/6/06

The Applet


Bugs and Questions

Inevitably, we shall find bugs and problems with this applet. When that happens, it would be great if they were sent to me for analysis and correction.

Any questions about the use of this applet are more than welcome. Just send me a jot.

The Great Temperature Data Project

Back in '05 when I hiked from Mexico to Canada on the Pacific Crest Trail, I carried a little device called an iButton. This little device contains essentially three things: a clock, a bit of memory and a thermometer. It's waterproof, accurate to .1 degree Celsius, and is about the size of five dimes stacked one upon another. There are a bunch of silly things you can do with these, but what I chose to do with mine was to have it record the temperature every hour on the hour for the entire time I was hiking, with the idea being to get some good data about the temperature out there on the PCT.

All in all, you can figure that the temperature was recorded 24 times a day for about 150 days, for an astounding 3,600 data points, and about 150 oscillations from the daytime high to the nighttime low. I've spent some time working with the data, and it's pretty much impossible to make much use of... unless you write a program to interpret it. You can see it for yourself if you're interested.

Well, as fate should have it, I am currently enrolled in a Java programming class, and I have the option of doing a final project of my own choosing. Having not put this data to good use has been a burden on my soul for a couple of years now, and I've decided to make my final project an applet that will allow a user to plot this data on a graph for any date range and any time range that they choose (e.g. 5pm to 10pm for September 20th to 23rd).

Once this is done, I will attempt to post it here, but here's my question to you, dear reader: do you have any suggestions for features that you would be interested in seeing in an applet of this sort? Thoughts?

I'm quite excited about getting this info out there. Finally.
