This post was syndicated from The Hacker Factor Blog and was written by The Hacker Factor Blog. Original post at The Hacker Factor Blog.
As a software developer, one of my core philosophies is to automate common tasks. If you have to do something more than once, then it is better to automate it.
Of course, there always is that trade-off between the time to automate and the time saved. If a two-minute task takes 20 hours to automate, then it’s probably faster (and a better use of your time) to do it manually when needed. However, if you need to do it hundreds of times, then it’s better to spend 20 hours automating it.
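To put a number on that trade-off, here is a minimal sketch of the break-even arithmetic. The function name and parameters are illustrative, and it assumes the automated version takes no time to run, which is only approximately true.

```python
def break_even_runs(automation_hours: float, task_minutes: float) -> float:
    """Number of manual runs whose total time equals the automation effort."""
    return (automation_hours * 60) / task_minutes


# The example from the text: a two-minute task that takes 20 hours to automate.
runs = break_even_runs(automation_hours=20, task_minutes=2)
print(runs)  # 600.0
```

With these numbers the automation only pays for itself after 600 runs, which is why "hundreds of times" is roughly where the balance tips.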
Sometimes you may not even realize how often you do a task. All of those minutes may not seem like much, but they can really add up.
Work Harder, Not Smarter
FotoForensics is currently receiving over 1,000 unique pictures per day. We’re at the point where we can either (A) hire more administrators, or (B) simplify existing administrative duties.
Recently I’ve been taking a closer look at some of the tasks we manually perform. Things like categorizing content for various research projects, identifying trends, scanning content for “new” features that run the gamut from new devices to new attacks, reviewing flagged content, and responding to user requests. A lot of these tasks are time-consuming and performed more than once. And a few of them can be automated.
Network abuses come in many different forms. Users may upload prohibited content, automate submissions, attack the site with port scans and vulnerability tests, or submit comment-spam to our contact form. It’s always a good idea to check abusers against known blacklists. This tells me whether it is a widespread abuse or if my site is just special.
There are a bunch of servers that run DNS-based blacklists. They all work in similar ways:
- You encode the query as a hostname, like “2.1.9.127.dnsbl.whatever”. This encodes the IP address 127.9.1.2 in reverse notation.
- You perform a DNS hostname lookup.
- The DNS result encodes the response as an IP address. Different DNSBL servers have different encoded values, but they typically report suspicious behavior, known proxies, and spammers.
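The three steps above can be sketched in a few lines of Python. This is only an illustration, not the code running on the site: the function names are made up, the zone "dnsbl.whatever" is the placeholder from the list above, and the meaning of any returned address depends on the particular DNSBL's documentation.

```python
import ipaddress
import socket
from typing import Optional


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL hostname: IPv4 octets reversed, then the list's zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def dnsbl_lookup(ip: str, zone: str) -> Optional[str]:
    """Return the encoded response address, or None if nothing resolves."""
    try:
        return socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # "Not listed" (NXDOMAIN) and resolver failures both end up here.
        return None


print(dnsbl_query_name("127.9.1.2", "dnsbl.whatever"))  # 2.1.9.127.dnsbl.whatever
```

Calling `dnsbl_lookup` performs a real DNS query, so it needs a real DNSBL zone in place of the placeholder.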
Some DNSBL servers seem too focused for my use. For example, if they only report known-spam systems and not proxies or malware, then it will rarely find a match for my non-spam queries. Other DNSBL systems seem to have dated content, with lists of proxies that have not been active for years. (One system will quickly add proxies but won’t remove them without a request. So dead proxies remain listed indefinitely.)
Most DNSBL servers focus on anti-spam. They report whether the address was used to send spam, harvest addresses, or other related actions. Ideally, I’d like a DNSBL that focuses on other hostile activities: network scanners, attackers, and proxies. But for now, looking for other abuses, like harvesters and comment-spam, is good enough.
I believe that anonymous proxies are important. They permit whistle-blowers to make anonymous reports and allow people to discuss personal issues without the fear of direct retribution. Groups like “Alcoholics Anonymous” would not be as successful if members had to be fully outed.
Unfortunately, anonymity also permits abuses. The new automated system downloads the list of TOR nodes daily. This allows us to easily check if a ban is tied to a TOR node. We don’t ban every TOR node. Instead, we only ban the nodes used for uploading prohibited content to the site.
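Checking a banned address against a downloaded node list is a simple set lookup. The sketch below assumes the list has already been fetched and is plain text with one IP address per line; that format, the function names, and the sample addresses (reserved documentation ranges) are all assumptions for illustration.

```python
import ipaddress


def parse_node_list(text: str) -> set:
    """Parse one IP address per line, skipping blanks, comments, and junk."""
    nodes = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        try:
            nodes.add(ipaddress.ip_address(line))
        except ValueError:
            continue  # ignore lines that are not a bare IP address
    return nodes


def is_listed(ip: str, nodes: set) -> bool:
    """True if the address appears in the parsed node list."""
    return ipaddress.ip_address(ip) in nodes


# Hypothetical list contents; these are documentation-only addresses.
sample = "# fetched daily\n192.0.2.10\n198.51.100.7\n\nnot-an-ip\n"
nodes = parse_node_list(sample)
print(is_listed("192.0.2.10", nodes))   # True
print(is_listed("203.0.113.5", nodes))  # False
```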
For beginner TOR users, this may not make sense. Banning one node won’t stop the problem since the user will just change nodes. Except… Not all TOR nodes are equal. Nodes that can handle a higher load are given a higher weight and are more likely to carry traffic. We’ve only banned about 300 of the 6,100 TOR nodes, but that seems to have stopped most abuses from TOR. (And best yet: only about a dozen of these bans were manually performed — most were caught by our auto-ban system.)
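A toy model shows why a small number of bans can cover a large share of traffic. It assumes, as a simplification, that a node's chance of carrying traffic is proportional to its weight. The node names and weights below are invented for the example and are not measurements.

```python
def traffic_share(weights: dict, banned: set) -> float:
    """Fraction of total weight (a stand-in for traffic) on banned nodes."""
    total = sum(weights.values())
    return sum(w for node, w in weights.items() if node in banned) / total


# Made-up weights: two heavy nodes and three light ones.
weights = {"A": 50, "B": 30, "C": 10, "D": 5, "E": 5}
print(traffic_share(weights, {"A", "B"}))  # 0.8
```

In this toy case, banning two of five nodes covers 80% of the weighted traffic.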
The newly automated system also scans the logs for our own ban records and any actions made after being banned. I can tell if the network address is associated with network attacks or if the user just uploaded prohibited content. I can also tell if the user attempted to avoid the ban.
I recently had one person request a ban-removal. He claimed that he didn’t know why he was banned. After looking at the automated history report, I decided to leave the ban in place and not respond to him. But I was very tempted to write something like: “Dude… You were banned three seconds after you uploaded that picture. You saw the ban message that said to read the FAQ, and you read it twelve seconds later. Then you reloaded eight times, switched browsers, switched computers, and then tried to avoid the ban by changing your network address. And now you’re claiming that you don’t know why you were banned? Yeah, you’re still banned.”
Performing a full history search through the logs for information related to a ban used to take minutes. Now it takes one click.
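As a rough idea of what such a history report involves, the sketch below pulls one address's entries out of a log, finds the ban, and lists everything that happened afterward with the elapsed time. The log format (timestamp, address, action), the action names, and the sample entries are hypothetical; the real logs and report are not shown here.

```python
from datetime import datetime


def ban_history(entries, ip):
    """Return (ban_time, [(time_since_ban, action), ...]) for one address."""
    mine = sorted((e for e in entries if e[1] == ip), key=lambda e: e[0])
    ban_time = next((ts for ts, _, action in mine if action == "ban"), None)
    if ban_time is None:
        return None, []
    after = [(ts - ban_time, action) for ts, _, action in mine if ts > ban_time]
    return ban_time, after


# Hypothetical log entries using documentation-only addresses.
log = [
    (datetime(2000, 1, 1, 12, 0, 0), "192.0.2.10", "upload"),
    (datetime(2000, 1, 1, 12, 0, 3), "192.0.2.10", "ban"),
    (datetime(2000, 1, 1, 12, 0, 15), "192.0.2.10", "view_faq"),
    (datetime(2000, 1, 1, 12, 0, 20), "192.0.2.10", "reload"),
    (datetime(2000, 1, 1, 12, 0, 9), "198.51.100.7", "upload"),
]

ban_time, after = ban_history(log, "192.0.2.10")
for delta, action in after:
    print(int(delta.total_seconds()), action)
# 12 view_faq
# 17 reload
```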
The word forensics means “relating to the use of scientific knowledge or methods in solving crimes” or “relating to, used in, or suitable to a court of law”. When you see a forensic system, you know it is geared toward crime detection and legal issues.
And people who deal in child exploitation photos know that their photos are illegal. Yet, some people are stupid enough to upload illegal pictures to FotoForensics.
The laws regarding these pictures are very explicit: we must report pictures related to child abuse and exploitation to the CyberTipline at the National Center for Missing and Exploited Children (NCMEC).
While I don’t mind the reporting requirement, I don’t like the report form. The current online form has dozens of fields and takes me more than 6 minutes to complete each time I need to submit a report. I need to gather the picture(s), information about the submitter, and other related log information. Some reports have a lot of files to attach, so they can take 12 minutes or more to complete. The total time I’ve spent using this form in the last year can be measured in days.
I’ve finally had enough of the manual submission process. I just spent a few days automating it from my side. It’s a PHP script that automatically logs in (for the session tokens), grabs the form (for the fields and any pre-populated values), fills out the data, attaches files, and submits it. It also automatically writes a short report (that I can edit with more information), records the confirmation information, and archives the stuff I am legally required to retain.
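The "grab the form, then fill it out" step can be illustrated with Python's standard library. To be clear, this is not the PHP script described above and it does not reflect the real report form. The HTML, field names, and helper names are invented, and logging in, attaching files, and the actual HTTP submission are left out.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect name/value pairs from <input> tags, keeping pre-populated values."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def fill_form(html: str, answers: dict) -> dict:
    """Merge our answers over the form's defaults, rejecting unknown fields."""
    parser = FormFieldParser()
    parser.feed(html)
    unknown = set(answers) - set(parser.fields)
    if unknown:
        raise KeyError(f"fields not present in the form: {sorted(unknown)}")
    return {**parser.fields, **answers}


# Hypothetical form; a real one would be fetched after logging in.
page = """
<form action="/submit" method="post">
  <input type="hidden" name="session_token" value="abc123">
  <input type="text" name="reporter" value="">
  <input type="text" name="summary">
</form>
"""

payload = fill_form(page, {"reporter": "admin", "summary": "example"})
print(payload)
# {'session_token': 'abc123', 'reporter': 'admin', 'summary': 'example'}
```

Reading the fields from the live form rather than hard-coding them means hidden values such as session tokens come along automatically, and a renamed field fails loudly instead of silently submitting an incomplete report.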
Instead of taking me 6+ minutes for each report, it now takes about 3 seconds. This simplifies the entire reporting process and significantly reduces the ick-factor.
Will Work for Work
A week of programming effort (spread over three weeks) has allowed me to reduce the overhead. Administrative tasks that would take a few hours each day now take minutes.
There’s still a good number of tasks that can be automated. This includes spotting certain types of pictures that are currently being included in specific research projects, and some automated classification. I can probably add in a little more automated NCMEC reporting, for those common cases where there is no need for a manual confirmation.
Eventually I will need to get a more powerful server and maybe bring on more help. But for right now, simply automating common tasks makes the current server very manageable.