Known Sender Finder

Download KnownSenderFinder.Zip (You might also need to install this Microsoft update to make it work)

KnownSenderFinder is a little program I wrote to help me evaluate some spam filters. Specifically, I wanted to find out if the programs had generated any false positive spam matches. I get thousands of spams a day, so doing this manually was not practical.

When you run the program, it first asks you to select Outlook folders that contain known-good emails and folders that contain emails that you sent. Using the emails in these folders, the program builds a white-list of known-good email addresses based on messages you've sent and messages you've kept in your inbox.

The program then asks for folders to look for false-positives in. Using the newly generated white-list, the program scans through the selected folders looking for messages where the FROM address is known-good. Any matches found are then displayed.
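The two steps above can be sketched roughly as follows. This is not the program's actual code (that is in J# and talks to Outlook); it is a minimal Python illustration that assumes messages are plain dictionaries with hypothetical "from" and "to" keys, and that sent messages contribute their recipient addresses to the white-list. The function names are made up for this sketch.

```python
from email.utils import parseaddr


def normalize(address):
    """Pull the bare address out of a header value and lowercase it."""
    return parseaddr(address)[1].strip().lower()


def build_whitelist(kept_messages, sent_messages):
    """Collect known-good addresses from kept mail and from mail you sent.

    Assumption: a kept message vouches for its sender, and a sent
    message vouches for each of its recipients.
    """
    whitelist = set()
    for msg in kept_messages:
        whitelist.add(normalize(msg["from"]))
    for msg in sent_messages:
        for recipient in msg["to"]:
            whitelist.add(normalize(recipient))
    whitelist.discard("")  # drop anything that failed to parse
    return whitelist


def find_false_positives(whitelist, suspect_messages):
    """Return the suspect messages whose FROM address is white-listed."""
    return [m for m in suspect_messages if normalize(m["from"]) in whitelist]
```

With a set for the white-list, each lookup is a constant-time check, so scanning thousands of junk messages stays fast.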

I always delete any spams I see while going through my emails, so any email left in my inbox is known-good. I have rules set up so that all spams that get tagged by a filter are automatically moved into my Junk E-Mail folder. By periodically running the program against these two folders, I can quickly find emails that the spam filters or I may have miscategorized.

This strategy is certainly not perfect. It will not show a false-positive that came from an address that you never got an email from before. It will show spam emails where the spammer forged a FROM address that matches an address that you get good emails from. Still, it is a very simple way to potentially find some false positives.

While you are looking at the list of matches, clicking on any address will remove all matches with that FROM address from the list. This makes it easy to quickly get rid of the spams with forged good FROM addresses.
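The click-to-dismiss behavior amounts to filtering the match list by sender. A minimal Python sketch, assuming each match is a dictionary with a hypothetical "from" key holding a bare address (the real program does this through its UI, and the function name here is made up):

```python
def dismiss_sender(matches, address):
    """Return the matches that were NOT sent from the given address."""
    target = address.strip().lower()
    return [m for m in matches if m["from"].strip().lower() != target]
```

For example, dismissing a forged address drops every match from it in one step and leaves the rest of the list in its original order.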

I know that I could have hacked up a VBA script in a few minutes that would do basically the same thing as this program, but the new Microsoft Visual Studio had just come out and this program seemed like a good excuse to try it. I made a point to really try to use every possible new Microsoft feature, including...

I have not written a Windows program since Windows 2.0 when you had to do *everything* by hand in C, so I was excited to try the new visual stuff. I've posted the project source code for anyone who wants to see how any of this was done.

Download a zip file of the source code

In all, I think the concept of visual programming is fantastic. Clearly, as the OS gets more complicated and UIs have to work better, it gets harder to actually write code, and you really should not have to. Visual Studio has some amazing ways of doing things. Drawing forms and controls is so much better than laying them out on graph paper and then coding them up. Adding a field to a DataSet is as easy as right-clicking on the table and clicking "add". The event metaphor of sticking your tiny code segments directly into places based on when you want them to get run is very cool. IntelliSense basically makes it so that you don't need to remember any of the API; you can discover it as you type.

Unfortunately, it is not perfect. There were many times when I just could not figure out how to get the tool to do what I wanted it to. Other times I would shut down and restart and then it would work as expected. Worse, the new .NET library is just not that well designed. Now that J# and C# have generics, there is no excuse for forcing me to typecast arguments, not ever. The whole BackgroundWorker pattern is horrible; now I understand why so many Windows apps do not do multithreading correctly. There should be an easy and elegant way to do something that almost every application needs to do, especially when doing it wrong makes the application unresponsive. Also, I found the Microsoft documentation to be very hard to use. The MSDN website has tons of info, but it is almost impossible to actually find what you need.

I'm looking forward to the next iteration of Microsoft development where the program almost becomes a database of UI components and code snippets. Fast development, flatter learning curve, higher quality applications: we can use more of all these.

FAQ

Q: Is this program actually useful?
A: It has turned out to be useful to me. The program remembers which folders you selected the last time it was run, so it is very quick to start it up and do a scan. I use it to periodically scan my Deleted folder just before I purge it, to make sure I didn't accidentally delete any mails I want.

Q: Why do I get a pop-up telling me that a program is trying to access my Outlook information?
A: Microsoft added this to try and stop viruses from reading your contacts and sending out copies of themselves. Alas, it is a bit misguided since if I can get you to run my application, I could conceivably replace Outlook with my own program and do anything I want.

Q: Will this work with other versions of Outlook besides 2003?
A: I don't know. I don't think I used any 2003-specific features in the API, but I have no way to test it. Worst case is that you will get an error message, so give it a try and let me know if it worked.

Q: If you have found a way to find false positives, why not write your own spam filter that never generates false positives?
A: This program is not a spam filter. It is just a tool to help you find false positives by basically generating an on-the-fly white-list and then checking your spam folder to see if there are any white-listed addresses in there. If you want a spam filter that doesn't generate false positives, check out my comparison of Brightmail and Mail-Filters.

Q: Why did you use an in-memory DataSet to store the matches?
A: I wanted to be able to visually design the database and then visually bind it to a DataGridView. This actually works very well - there is almost no code required to do it. This pattern should be better documented and supported.

Q: Why did you use J# and not C#?
A: I really like Java syntax, and was interested to see how J# fit into the .NET framework. I'll probably do my next project in C#.

Updates

5/14/2009 - ClickOnce stopped working so instead just added a link to download a ZIP file with the program in it. You might need to install this Microsoft pack to make the EXE work.

12/24/2006 - New version 1.0.14. Instead of building a special-purpose data table to show the matches, now I just make a new subfolder in Outlook and move all the matches into there. The Microsoft DataGridView control has some problems when you want to re-sort based on columns. The new way lets you use all of Outlook's normal sorting and searching tools to go through the matches. Being able to group by sender is particularly efficient. Note that because of an annoying Microsoft bug, you will need to manually uninstall the old version and install the new one. I also upgraded it to work with Outlook 2007.

Note that I just published a note on my adventures in hacking a Gmail account here.

