Project idea

Project Idea: "Programming library for curse words"

When programming, there are occasionally times when you need to detect or block curse words. At CourtListener, for example, we build URLs containing ID numbers that have been converted to letters (so 1 → a, 2 → b, 27 → A, etc.). Higher numbers produce longer strings of letters, so over time this will start creating curse words in our URLs. Currently the site only has a few four-letter strings, but I will rue the day when any of the seven dirty words is shown to users on my site.
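A minimal Python sketch of the URL problem. The exact CourtListener scheme isn't spelled out here, so this assumes bijective base-52 with the alphabet a-z followed by A-Z, which fits the examples given (a for 1, b for 2, A for 27). The `BLOCKED` set holds placeholder words, and `encode_id`, `is_clean`, and `next_clean_id` are hypothetical names.

```python
import string

# Assumed alphabet: a-z then A-Z, so 1 -> "a", 26 -> "z", 27 -> "A", 52 -> "Z".
ALPHABET = string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 52

# Placeholder blocklist; a real one would come from a curated word list.
BLOCKED = {"darn", "heck"}


def encode_id(n: int) -> str:
    """Convert a positive integer ID to letters using bijective base-52."""
    if n < 1:
        raise ValueError("IDs start at 1")
    letters = []
    while n > 0:
        n -= 1  # shift to 0-based: bijective numeration has no zero digit
        n, remainder = divmod(n, BASE)
        letters.append(ALPHABET[remainder])
    return "".join(reversed(letters))


def is_clean(slug: str) -> bool:
    """True if no blocked word appears anywhere in the slug, ignoring case."""
    lowered = slug.lower()
    return not any(word in lowered for word in BLOCKED)


def next_clean_id(n: int) -> int:
    """Smallest ID >= n whose letter form contains no blocked word."""
    while not is_clean(encode_id(n)):
        n += 1
    return n
```

The trade-off of skipping dirty IDs this way is that the skipped numbers leave gaps in the sequence.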

There are many lists of curse words on the web, but none of them is maintained or curated. Having that alone would be a useful project. Even better would be libraries in popular programming languages that efficiently tell you whether a string contains a curse word.
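For the "efficiently" part, a standard technique for matching many words at once is the Aho-Corasick automaton, which scans the text once no matter how many words are on the list. Below is a compact Python sketch of it; `WordMatcher` is a hypothetical name, not an existing library, and it does plain substring matching after lowercasing.

```python
from collections import deque


class WordMatcher:
    """Aho-Corasick automaton: finds every listed word in one pass over the text."""

    def __init__(self, words):
        self.goto = [{}]       # goto[state][char] -> next state
        self.fail = [0]        # failure link for each state
        self.output = [set()]  # listed words that end at each state
        for word in words:
            self._add(word.lower())
        self._build_failure_links()

    def _add(self, word):
        state = 0
        for ch in word:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.output.append(set())
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.output[state].add(word)

    def _build_failure_links(self):
        # Breadth-first, so a state's failure target is finished before it is used.
        queue = deque(self.goto[0].values())  # depth-1 states fail back to the root
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                self.output[nxt] |= self.output[self.fail[nxt]]

    def find_all(self, text):
        """Return the set of listed words that occur in text, ignoring case."""
        found = set()
        state = 0
        for ch in text.lower():
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            found |= self.output[state]
        return found
```

Because it matches substrings, this flags words hidden inside longer strings (which is what the URL case needs), but it will also flag innocent words that happen to contain a listed word, so a real library would likely offer a whole-word mode as well.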

The next feature would be to add additional languages, and then to add words like pen1s, which aren't normally curse words, but are certainly words you'd want to eliminate.
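Catching variants like pen1s can start with undoing common character substitutions before matching. A minimal Python sketch, assuming a small hand-picked substitution table (`LEET_MAP` is illustrative, not a standard list):

```python
# Illustrative substitution table; a curated project would maintain this alongside the word list.
LEET_MAP = str.maketrans({
    "0": "o", "1": "i", "3": "e", "4": "a",
    "5": "s", "7": "t", "@": "a", "$": "s",
})


def normalize(text: str) -> str:
    """Lowercase the text and undo common character substitutions."""
    return text.lower().translate(LEET_MAP)


def contains_listed_word(text: str, words) -> bool:
    """True if any listed word appears in the normalized text."""
    normalized = normalize(text)
    return any(word in normalized for word in words)
```

Some characters are ambiguous (1 can stand for i or l), so a fuller version might try each alternative rather than committing to a single mapping.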

It'd be a pretty simple project, so I may just go for it.

Only question is, what do I name it?

Project Idea: "Community-Curated Data Repository"

There's an interesting problem that I've run into a number of times. It goes like this: you want to start a new project studying some dump of data X, and you have a great idea of how to do Y with it. You download the data, but then you spend hours (or days, or weeks) manipulating it, manicuring it, and stuffing it neatly into a database. The problem is that the data is in the provider's format, and they probably haven't told you much about it, much less put it into a format that is useful to other people. You have no option but to figure it out, optimize it, make it queryable, etc., when what you really wish you were doing was simply working with it.

In other words, the data's format and quality keep you from working with the data itself. I've run into this a number of times, most notably when trying to work with the Recovery Data. I've also had fun working with census data, geographic data, and the list goes on. Non-profits and government bodies provide any number of useful data sources, such as population, economic, health, and agricultural data.

The solution to this problem is simple. A community needs to be built around curating the data and providing it in useful formats, and a repository of some sort needs to be created so people can download and install the data. Similar ideas have come up a few times in various forms. Most notably, Google has taken a stab at solving this with its public data sets, and back around the turn of the millennium, Debian considered creating a repository for data.
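One way a data repository could borrow from software package repositories like Debian's is to ship each dataset with a manifest describing its source, the formats on offer, and a checksum, so that installers can verify what they downloaded. A minimal Python sketch; the `DatasetPackage` fields and the `verify` function are hypothetical, not an existing format:

```python
import hashlib
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DatasetPackage:
    """Hypothetical manifest for one community-curated dataset release."""
    name: str
    version: str
    source: str               # where the raw data originally came from
    formats: Tuple[str, ...]  # formats the community provides, e.g. ("csv", "sqlite")
    sha256: str               # checksum of the packaged file


def verify(package: DatasetPackage, payload: bytes) -> bool:
    """Return True if downloaded bytes match the checksum recorded in the manifest."""
    return hashlib.sha256(payload).hexdigest() == package.sha256
```

Keeping the provenance (`source`) in the manifest also gives the community something concrete to debate when a data source is proposed.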

Neither of these solutions is good enough, though. Google's is a one-way street: they choose the data source, they tune up the data, and they provide the data. If there's a source you don't like, or if it's in a format you don't like, well, too bad. In Debian's case, they decided not to go for it, but they should have. They had the right idea, but weren't prepared to give it its due.

The right solution will be one in which the community can suggest and debate data sources, and which treats the data with the respect it deserves. I think we'll see a data source like this eventually, but I fear that until we do, researchers around the world will be stuck doing unnecessary data transformations.

Project Idea: "User contribution aggregator"

As a frequent contributor to various open-source projects, I often want to know just how much I have contributed over the years, and to which projects. With enough time, I could track down every bug I've filed, every comment I've posted, every patch I've submitted (there aren't many), and every other contribution I've made. But it would take a LOT of effort, and before long I'd be knee-deep in records and notes about where I had been.

For people who contribute to and work on such projects, knowing these things is valuable for building an online reputation. It lets people know whether you are helpful, what you find interesting, and where your expertise lies. If you're looking for work in such a field, it's great to be able to point to a record of contributions and say, "Yes, I am interested in this field, and I have a track record to prove it." It also creates competition among contributors.

But because the current ecosystem of online contribution is so fragmented, it is very hard to determine a person's overall online reputation. Some sites do admirable work building algorithms to calculate the value of their users, and this is good. But if you're interested in many different projects, or have been working on open-source projects for a long time, such site-specific systems will more likely than not fall short.

What we need is an aggregated, centralized system that uses public APIs to build global "meta"-reputations. Building it is probably not that hard, since many of the more common systems for tracking user contributions already offer APIs and RSS feeds for many kinds of activity. I'm sure it's more complicated than simply plugging into a few APIs, but such a system seems feasible, and it would create great value for the open-source community.
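A minimal sketch of the aggregation step, assuming each site exposes a user's contributions as an RSS 2.0 feed. The feed contents below are made-up samples (fetching real feeds, e.g. with `urllib.request`, is left out), and the per-source weights passed to the hypothetical `meta_score` function are arbitrary placeholders for whatever scoring a real system would use.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_items(rss_xml: str) -> int:
    """Count the <item> entries in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)  # root is the <rss> element
    return len(root.findall("./channel/item"))


def aggregate(feeds: dict) -> Counter:
    """Map each source name to the number of contributions in its feed."""
    totals = Counter()
    for source, rss_xml in feeds.items():
        totals[source] += count_items(rss_xml)
    return totals


def meta_score(totals: Counter, weights: dict) -> float:
    """Combine per-source counts into one number; unlisted sources get weight 1."""
    return sum(count * weights.get(source, 1) for source, count in totals.items())


# Made-up sample feeds standing in for real per-site activity feeds.
feeds = {
    "bug-tracker": "<rss><channel><item><title>Bug 1</title></item>"
                   "<item><title>Bug 2</title></item></channel></rss>",
    "forum": "<rss><channel><item><title>Reply</title></item></channel></rss>",
}
totals = aggregate(feeds)
score = meta_score(totals, {"bug-tracker": 3})
```

Counting feed items is the crudest possible signal; the harder part a real system would face is deciding how much a filed bug, a forum reply, or a merged patch should each be worth.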

Project Idea: "Bug Trackers for Cities"

Well, today's project idea was going to be about using bug trackers to manage city problems, but as it turns out, I'm behind the curve on this one, so I'll just explain the concept and point to people who have live implementations or have already blogged about it. When I first researched this idea about six months ago, I didn't find anything, but it seems that steam is building behind it.

Essentially, the idea is this: cities have problems that citizens know about, such as potholes, busted lampposts, gang activity, etc. Citizens want to report these things to the city, but reporting problems by phone or navigating city websites is usually an awful, time-consuming, and unrewarding experience. It goes like this: first you get bumped from one department to another, eventually finding somebody who seems to care. You tell them about the problem and feel satisfied that you've done your part, but you don't know whether it's really in their system, when it's going to get fixed, or anything else. You hang up the phone, and the problem is still a part of your daily life. You know that if you call again, you won't be able to get an update, so you resign yourself to simply hoping the problem will eventually be resolved. The next time you notice something that needs fixing, you're less likely to try to help. As this goes on, the people who once cared stop caring, and getting a city's residents engaged in their community's problems becomes increasingly difficult.

In the software world, there is a similar phenomenon, except that instead of infrastructure and safety problems, the problems are errors in the software that need to be fixed: bugs. The usual way to get these bugs triaged and managed is a bug tracker. These systems let the programmers behind the software respond to problems that people find and triage them appropriately. They also let other people vote on bugs and help solve them. They allow careful prioritization of bugs, and they make it possible to build reports and visualizations such as how quickly bugs are fixed by each department, or which bug is the oldest in the system.
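A minimal Python sketch of the tracker features just described (voting, prioritization, and simple reports such as the oldest open issue and time-to-fix per department). The `Issue` fields, function names, and sample issues are hypothetical; a real tracker would add comments, assignment, and status history.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import Optional


@dataclass
class Issue:
    title: str
    department: str
    opened: date
    votes: int = 0
    closed: Optional[date] = None


def open_issues(issues):
    return [i for i in issues if i.closed is None]


def triage_queue(issues):
    """Open issues ordered by votes (highest first), oldest first among ties."""
    return sorted(open_issues(issues), key=lambda i: (-i.votes, i.opened))


def oldest_open(issues):
    """The open issue that has waited longest, or None if everything is fixed."""
    return min(open_issues(issues), key=lambda i: i.opened, default=None)


def median_days_to_fix(issues):
    """Median days from report to fix for each department (closed issues only)."""
    by_dept = {}
    for i in issues:
        if i.closed is not None:
            by_dept.setdefault(i.department, []).append((i.closed - i.opened).days)
    return {dept: median(days) for dept, days in by_dept.items()}


# Made-up sample data.
issues = [
    Issue("Pothole on Main St", "Public Works", date(2010, 1, 1), votes=5),
    Issue("Broken lamppost", "Utilities", date(2010, 1, 10), votes=5),
    Issue("Graffiti", "Public Works", date(2010, 1, 5), closed=date(2010, 1, 15)),
    Issue("Blocked drain", "Public Works", date(2010, 1, 2), closed=date(2010, 1, 20)),
]
```

Publishing numbers like `median_days_to_fix` is what would give residents the follow-up that a phone call never does.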

If such a system were used by citizens to track problems they find in their city, it would have all kinds of benefits, and indeed a few such systems have been created. The most popular one I have found is SeeClickFix; judging by its page for Berkeley, it is used by at least some Berkeley residents. Another popular one is http://www.fixmystreet.com/. Of course, for such a system to be truly effective, it would have to be endorsed by the city itself and used by city employees as well, which is something I have yet to find an example of.

Other people have also written about this idea, and Portland appears to be considering it, so it seems this idea is ripe on the vine and ready to be picked.

The question now is what will it take to implement it correctly, and what system will be the one that gains usage. I fully expect to see more cities using this type of technology in the next few years.

Project Idea: "Breaking the Cycle: Isolating Easy Solutions to the Bike Theft Problem"

I've decided that I should start blogging my project ideas so that they may be aired more widely in public. I have amassed quite a number of these, and have been sitting on them for some time, but more and more, it's looking like I won't have time to get to all of my ideas. Starting today, I'll be writing out ideas that I have had. If you have project ideas of your own that you think might be interesting to share here, let me know, and we'll get yours posted too. If you're interested in pursuing one of these ideas, go for it!

And so, without further ado, I present...

Breaking the Cycle: Isolating Solutions to the Bike Theft Problem
This is something I have been thinking about for a good while, and more seriously of late. Basically, the solution is 90% social/political and 10% programming and system design.

Here's the problem: last year, during the recession, about 15 million new bikes were sold in the United States, and according to the FBI, about 220,000 bikes were reported stolen in 2008. Obviously, both of these numbers are suspect. The former doesn't include the many thousands of used bikes that were purchased during 2009, and the FBI's number clearly misses the vast majority of stolen bikes. Other estimates of the number of bikes stolen are much higher than the reported number. One estimate is that more than five million bikes are stolen every year in the U.S. The National Crime Victimization Survey is less pessimistic, with a 2006 estimate of 1.3 million stolen bikes per year. Despite these differences, and the problem of underreporting, it is clear that bike theft is a major problem in the United States.

Solutions: Honey pots and databases
There are at least three simple and cost-effective solutions to this problem. I'll start with the most fun one, which is to hide a GPS unit deep in the bowels of a nice bike and lock that bike up poorly in a high-theft area. In theory, this will tempt thieves to steal the bike, leading to their arrest. Such sting operations have been done in the past with great success, since many bike thieves are repeat offenders who are also wanted for other illegal activity [ref]. There are worries that this may amount to inducement to steal (and thus may be illegal), and also that linking whoever has the bike after the fact to the person who stole it in the first place may be difficult. But both of these problems are fairly easy to solve if the operation is done carefully.
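The tracking side of a bait bike can be as simple as a geofence: record where the bike was locked, and raise an alert once a GPS fix lands farther away than some radius. A minimal Python sketch using the haversine great-circle distance; `bait_bike_moved`, the 50 m default radius, and the sample coordinates are assumptions, and since real GPS fixes are noisy, a deployment would probably want several consecutive fixes outside the radius before alerting.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points, in meters."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def bait_bike_moved(lock_point, fix, radius_m=50):
    """True once a GPS fix is outside the radius around where the bike was locked."""
    return haversine_m(*lock_point, *fix) > radius_m


# Hypothetical lock location and a later fix about 1.1 km north of it.
lock_point = (37.87, -122.27)
latest_fix = (37.88, -122.27)
alert = bait_bike_moved(lock_point, latest_fix)
```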

The second solution to this problem is to create a LoJack system for bikes. As far as I can tell, such a system has not yet been created. As the Freakonomics blog has pointed out, such a system creates a positive externality: placing a GPS device in your bike also reduces theft of other bikes in the area, since thieves can't be sure which bikes have the system. There are challenges in fitting such a system into a bike, such as battery life and getting the satellite signal in and out of the frame, but again, these can be worked out. There is demand for such a system: when working on another project related to bike theft, I asked a number of people about LoJack for bikes, and they were all excited about creating and using one.
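Battery life is largely a question of duty cycling: the GPS and radio sleep most of the time and wake briefly on a schedule. A back-of-the-envelope Python sketch; every current, timing, and capacity figure below is a made-up placeholder, not a measurement of any real device.

```python
def average_current_ma(active_ma, sleep_ma, active_s, period_s):
    """Time-weighted average current for a device awake active_s out of every period_s seconds."""
    if not 0 <= active_s <= period_s:
        raise ValueError("active_s must be between 0 and period_s")
    sleep_s = period_s - active_s
    return (active_ma * active_s + sleep_ma * sleep_s) / period_s


def battery_life_days(capacity_mah, avg_ma):
    """Idealized runtime in days; ignores self-discharge, temperature, and aging."""
    return capacity_mah / avg_ma / 24


# Placeholder figures: 50 mA for 30 s every hour, 0.05 mA asleep, 2000 mAh cell.
avg = average_current_ma(active_ma=50, sleep_ma=0.05, active_s=30, period_s=3600)
days = battery_life_days(2000, avg)  # roughly 179 days with these numbers
```

With figures like these the active bursts dominate the average, so how often the device wakes up matters far more than its sleep current.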

The third, and perhaps most important, step in breaking the cycle of bike theft is to create a better national registry of bikes. At present, there are a number of registration systems: cities have their own implementations, a for-profit organization does registrations nationally (this is where my bikes are registered), and there is even a registry of bikes that have been stolen. What we need is a single national registry. It has to be good, and it has to be used. All new bikes sold in the United States need to be entered into the system before sale, and anyone buying a used bike needs to look it up in the system first. This is a cultural shift, and it can be brought about in a number of ways. For example, sites like Craigslist and eBay can encourage linking to the system when bikes are sold; manufacturers and bike shops can be legally required to check the system for each bike; and a paperwork trail can be created and enforced, similar to the system for car sales. These are just ideas, but the point is that the registry needs to be built, and it needs to be supported. Some states already have laws relating to bike registration, but they aren't enforced. The assumption needs to shift from "This bike isn't registered, oh well" to "This bike isn't registered in your name, so it is not yours."
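The core check such a registry would offer buyers and sellers can be sketched in a few lines: normalize the serial number, then report whether the bike is unregistered, reported stolen, registered to someone other than the seller, or fine. A minimal Python sketch; `BikeRegistry` and its status strings are hypothetical, and a real national registry would sit behind a database and an authenticated API.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Registration:
    serial: str
    owner: str
    stolen: bool = False


def normalize_serial(serial: str) -> str:
    """Serials get typed inconsistently; compare them uppercased without spaces or dashes."""
    return serial.upper().replace(" ", "").replace("-", "")


class BikeRegistry:
    def __init__(self):
        self._records: Dict[str, Registration] = {}

    def register(self, serial: str, owner: str) -> None:
        key = normalize_serial(serial)
        self._records[key] = Registration(key, owner)

    def report_stolen(self, serial: str) -> None:
        # Raises KeyError for an unregistered bike: there is no record to flag.
        self._records[normalize_serial(serial)].stolen = True

    def check_sale(self, serial: str, seller: str) -> str:
        """Return 'unregistered', 'stolen', 'not-seller', or 'ok'."""
        record = self._records.get(normalize_serial(serial))
        if record is None:
            return "unregistered"
        if record.stolen:
            return "stolen"
        if record.owner != seller:
            return "not-seller"
        return "ok"
```

The `not-seller` result is the "isn't registered in your name" case: under the proposed norm, that alone would be enough to stop a sale.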

Conclusions
Some clear conclusions emerge when looking at this problem. First, bike theft is huge. Millions of bikes are stolen each year. And, judging by the number of thefts that are reported and trickle up to the FBI's database, people don't feel that reporting the theft is worth the effort. If we assume that five million bikes are stolen each year, and that of those, 250,000 are reported, that's a reporting rate of only 5%.
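The reporting-rate figure is a simple division. Under the assumption above, and for comparison using the lower NCVS estimate of 1.3 million thefts per year cited earlier:

```latex
\[
\text{reporting rate} = \frac{250{,}000}{5{,}000{,}000} = 0.05 = 5\%,
\qquad
\frac{250{,}000}{1{,}300{,}000} \approx 0.19 .
\]
```

Even under the lower estimate, roughly four out of five thefts would go unreported.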

A second conclusion we can draw from the above is that this problem is solvable. Using social and technical approaches, it can be solved quickly and relatively inexpensively. Furthermore, many of these solutions could well be profitable for the organizations implementing them, as well as saving money for the cyclists whose bikes are no longer stolen.

In parting, I will conclude by pointing you to the best resource I've found on this problem, which is the Center for Problem-Oriented Policing's report on bicycle theft. It's brief, to the point, and informative. Enjoy.

References
A lot of the information for this post was gleaned from the following excellent resources:

  1. Problem-Oriented Guides for Police, Problem-Specific Guides Series, Guide No. 52: Bicycle Theft (Sponsored by the Department of Justice)
  2. The National Bike Registry (A for-profit organization)
  3. National Bicycle Dealers Association
  4. Federal Bureau of Investigation Uniform Crime Reporting Program
  5. National Crime Victimization Survey

Research Idea - The Age of the Internet


I blogged a while back about a Firefox command that tells you the last-modified date of the page you're looking at, and it got me thinking... what is the age of the Internet as a whole?

I've been thinking about it a bit, and it seems like this kind of information could prove pretty useful in certain circles. If there were a way to summarize the last-modified date of every page on the Internet, we could pretty easily get a sense of how current, and therefore how useful, the information on it is.
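A minimal Python sketch of the summarizing step, assuming you have already collected the HTTP `Last-Modified` header from a sample of pages. That header is often missing, and dynamic pages frequently report the current time, so any result is rough at best. The function names and sample dates are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from statistics import median


def page_age_days(last_modified: str, now: datetime) -> float:
    """Age in days of a page, given its HTTP Last-Modified header value."""
    modified = parsedate_to_datetime(last_modified)  # "GMT" dates parse as UTC-aware
    return (now - modified).total_seconds() / 86400


def summarize(headers, now):
    """Count, median age, and oldest age (days) over pages that sent a Last-Modified header."""
    ages = [page_age_days(h, now) for h in headers if h]  # skip pages with no header
    if not ages:
        return None
    return {"pages": len(ages), "median_days": median(ages), "oldest_days": max(ages)}


# Made-up sample: two pages with the header, one without.
now = datetime(2010, 1, 11, tzinfo=timezone.utc)
sample = ["Fri, 01 Jan 2010 00:00:00 GMT", None, "Sun, 10 Jan 2010 00:00:00 GMT"]
report = summarize(sample, now)
```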

Firefox add-on?
