{"componentChunkName":"component---src-templates-blog-entry-js","path":"/blog/2006/02/02/spambots_that_drink_coffee.html","result":{"data":{"markdownRemark":{"html":"<p>A long time ago, in a galaxy far, far away&#8230; OK, in\nReading. I used to work on a big ugly pile of PHP. It didn't do much\nfor my liking of the language, and its advocates have tended to bring\nthe worst out in me ever since.</p>\n<p>However, I never got around to unsubscribing from the mailing list,\nand I still read it from time to time. So when the topic of hiding\nemail addresses from spambots came up, I couldn't resist (I'm also a\nbig advocate of accessibility, and I have yet to find a good way to hide\ninformation from \"evil robots\", but not from humans).</p>\n<p>One thing led to another and the inevitable \"Write it out to the\nscreen with complicated JavaScript\" option came up, in this case using\nEnkoder.</p>\n<pre class=\"code\"><code>&lt;script language=\"javascript\"&gt;\nfunction hiveware_enkoder(){var i,j,x,y,x=\n\"x=\\\"783d227a3f24327a343f375e2465383436343835686738383536393837333839663838\" +\n\"38373b383867363936363234386736393839683939343438393b3939383633383937343438\" +\n\"3a363434346735386566383833373434326738393a37393834643835376538343868353866\" +\n\"3337356567343434343834663b3835336438353b3238356564395e24363d387b683f352963\" +\n\"29383d366838713374392a386b383f3b32383d366b363e327a3830366e3867687039693476\" +\n\"396a393d386b332d393f3434382b367d347b672d383f667738703767347567653963377238\" +\n\"67642a372965273429342d347a32303975367738643b7539763674382a656b382e3734352b\" +\n\"662b373d6521347b243d6c3f6778636e2a7a30656a637443762a322b2b3d7a3f7a30757764\" +\n\"7576742a332b3d7b3f29293d6871742a6b3f323d6b3e7a306e677069766a3d6b2d3f342b7d\" +\n\"7b2d3f7a307577647576742a6b2e332b3d216871742a6b3f333d6b3e7a306e677069766a3d\" +\n\"6b2d3f342b7d7b2d3f7a307577647576742a6b2e332b3d217b3f7b307577647576742a6c2b\" +\n\"3d223b793d27273b783d756e6573636170652878293b666f7228693d303b693c782e6c656e\" 
+\n\"6774683b692b2b297b6a3d782e63686172436f646541742869292d323b6966286a3c333229\" +\n\"6a2b3d39343b792b3d537472696e672e66726f6d43686172436f6465286a297d79\\\";y='';\" +\n\"for(i=0;i&lt;x.length;i+=2){y+=unescape('%'+x.substr(i,2));}y\";\nwhile(x=eval(x));}hiveware_enkoder();\n&lt;/script&gt;\n</code></pre>\n<p>Oh look, so good it generates HTML 3.2.</p>\n<p>But in all seriousness, this type of \"solution\" always has two main\nissues:</p>\n<ul>\n <li>There are bots out there gathering email addresses that can parse JavaScript.</li>\n <li>There are users out there who have JavaScript turned off, or browsers that don't support it in the first place.</li>\n</ul>\n<p>Just saying this wasn't enough though! Evidence was demanded!</p>\n<p>Just one problem. I've never tried parsing JavaScript\nprogrammatically before. First stop - <a\nhref=\"http://search.cpan.org/search?query=JavaScript&amp;mode=all\">CPAN</a>.</p>\n<p>I grabbed <a\nhref=\"http://search.cpan.org/~mschilli/JavaScript-SpiderMonkey-0.12/SpiderMonkey.pm\">JavaScript::SpiderMonkey</a>\nas I vaguely recalled hearing good things about it and got it\ninstalled. The documentation was nice and clear, so I hacked away.</p>\n<p>The only issue I had was that since the code was not being executed\nin a browser, there was no <code>document</code> object. 
It turned out\nto be relatively simple to create such an object and add a\n<code>write()</code> method to it.</p>\n<pre class=\"code\"><code>#!/usr/bin/perl\nuse strict;\nuse warnings;\nuse JavaScript::SpiderMonkey;\n\n<code class=\"comment\"># In a real Evil Spam Bot, \n# this would actually do something.</code>\nmy $the_javascript = get_the_javascript_from_somewhere();\n\n<code class=\"comment\"># Get a JavaScript interpreter ready.</code>\nmy $js = JavaScript::SpiderMonkey-&gt;new();\n$js-&gt;init();\n\n<code class=\"comment\"># Browsers can document.write\n# This script should too</code>\nmy $document = $js-&gt;object_by_path(\"document\");\n\n<code class=\"comment\"># We need somewhere to document.write to</code>\nmy $extracted_html = '';\n$js-&gt;function_set(\"write\", sub { \n        $extracted_html .= join('', @_) \n    }, $document);\n\n<code class=\"comment\"># Execute the JavaScript</code>\nmy $rc = $js-&gt;eval($the_javascript);\n\n<code class=\"comment\"># Output the retrieved HTML</code>\nprint $extracted_html;\n</code></pre>\n<p>Now this is a simple proof of concept and won't handle every\nsituation. For example, it won't handle techniques that manipulate the\nDOM rather than document.writing. It also just dumps the generated\nHTML to standard out rather than trying to parse it and find the email\naddress.</p>\n<p>That said, it did only take about 15 minutes to write the script\n(from a \"Never used the module before\" starting point), and I can't\nimagine that adding such features would prove <em>that</em>\ndifficult.</p>\n<p>Finally, a point on morals. Am I helping spammers defeat attempts\nto hide email addresses from them? 
I don't think so; the problem is\ntoo trivial.</p>","frontmatter":{"slug":"spambots_that_drink_coffee.html","title":"Spambots that Drink Coffee","url":"/blog/2006/02/02/spambots_that_drink_coffee.html","date":"02 February 2006"}}},"pageContext":{"url":"/blog/2006/02/02/spambots_that_drink_coffee.html"}},"staticQueryHashes":["2817829322","3649515864"]}