2 years ago I wanted to build a small, pointless module to visualize my WordPress spam.
nothing new or original here: of course the work of Alex Dragulescu ( his website is down -> Google Image ) or http://www.spamrecycling.com/ and more recently the http://www.spamghetto.com/ immediately spring to mind. beautiful pieces in all cases.
as shown in this akismet stats’ screenshot, this very blog has been spammed exponentially for the past 4 years.
in 2011, 58,072 spams were sent (and blocked! \o/) and by september 2012, there were already more spams than that… spam is an abundant resource. it is not steady though. sometimes you’d get a thousand spams over 3 days and then nothing.
CONSTRAINTS
even though the plugin is pointless, pointlessness leads you nowhere unless you give it some constraints. so the plugin should:
- handle an arbitrary amount of spams ( from 1 to X )
- generate a colorful output
- use the spam’s contents to generate random yet reproducible visuals
- be animated and visually pleasing
1 – HANDLE THE SPAM
WordPress is good at that ; it’s pretty easy to create your own plugin. I won’t go too much into details here but I followed a couple of tutorials[1] [2] and pretty quickly I had a plugin up and running with an admin page that allows the user to select the amount of spam to use and pass extra params to the animation if need be.
I’ve done a quick typology of the spams and classified them in 4 categories:
the flatter
no link embedded in the message’s body but one as the user’s website. the message doesn’t contain explicit keywords and is written in the target’s language. the text is usually flattering and the incentive relies on the recipient’s ego. it can also be a direct question to the recipient. usually promotes services: car insurance, travel, games…
I read a lot of blogs recently and yours is among the best. I enjoy reading through your posts — clear as well as well written. Your page will go right to my bookmarks. I got some good inspirational ideas after reading this. via
explicit lyrics
one link is embedded in the message’s body and one is set as the user’s website. it contains explicit keywords and the text is usually written in the target’s language. the incentive relies on health / voyeur / money-making concerns and on the fact that most recipients could be concerned by the keywords. it can promote anything, related to the text or not.
cardio exercise Hello, losing fat genie made it easy and enjoyable to lose weight. via
naked taylor swift pics taylor swift nude picture via ( it’s a trap! I checked :) )
How should I invest my money to make more money in a short amount of time? via
fat bob
the text is no longer structured in sentences. these spams contain a massive amount of links and a massive amount of explicit keywords written in any language, usually with some typos to bypass spam filters.
cheap, cheapest, buy, travel, insurance, car, health, best, viagra, cialis, levitra, doxycycline, tramadol, auto, rates, rating, ratings, medical, blackjack, game, life, casino, slots, affordable[...]
terminator
no more text, just links, lots of them.
http://www.micizibi.com/271.php 95336
http://www.micizibi.com/270.php 68076
http://www.micizibi.com/269.php 43661[...]
so, not only are the spams numerous, they’re also varied in quality, which is a good thing when it comes to creating a rich visual experience. in WP, several extra fields come along with every spam: http://codex.wordpress.org/Function_Reference/get_comment
I chose to keep the richest and most meaningful fields:
- comment_ID
- comment_author
- comment_author_IP
- comment_date
- comment_content
2 – EXTRACT PALETTES
I wanted to create a colorful animation, so I used the IP address, which is made of 4 ints (each between 0 and 0xFF), and looped through them to get 4 main colors. 2 problems emerged:
1 some IPs could give very dark colors. for example the IP 50.115.164.25 would output this:
2 as the bots can send massive amounts of spam from the same IP, we would get series of very close colors.
to make colors brighter, I balance the IP components with the time at which the spam was published. if an RGB component falls under a given threshold ( 0x50 ), then I add the Hour, Minute or Second value. with the above IP, it gives:
I also did a “complementary” palette by subtracting the time from the IP.
here’s a sample palette generated from 20 spams, to the left we have the IP addresses, to the right, the time.
I’m still not satisfied with how it separates IP series but it gives a varied and somewhat eye-pleasing palette. a good thing is that there are vivid colors as well as more “pastel” colors and they keep the same saturation level.
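a minimal sketch of the palette extraction described above, written in Python for brevity rather than in the plugin’s own language. the way the 4 IP bytes are rotated into RGB triplets, the clamping and every function name are assumptions made for illustration ; only the 0x50 threshold and the idea of adding / subtracting the time come from the text.

```python
THRESHOLD = 0x50  # components below this value are considered too dark


def base_colors(ip):
    """Rotate through the 4 IP bytes to build 4 RGB triplets (assumed scheme)."""
    parts = [int(p) for p in ip.split(".")]
    return [(parts[i], parts[(i + 1) % 4], parts[(i + 2) % 4]) for i in range(4)]


def brighten(color, time):
    """Add hour / minute / second to the R / G / B components under the threshold."""
    return tuple(min(0xFF, c + t) if c < THRESHOLD else c for c, t in zip(color, time))


def complement(color, time):
    """Subtract the time from every component, clamped at 0 (clamping is an assumption)."""
    return tuple(max(0, c - t) for c, t in zip(color, time))


def to_hex(color):
    """Format an (r, g, b) triplet as a #rrggbb string."""
    return "#%02x%02x%02x" % color


# hypothetical spam posted at 14:35:52 from the IP used in the article
posted_at = (14, 35, 52)
for color in base_colors("50.115.164.25"):
    print(to_hex(brighten(color, posted_at)), to_hex(complement(color, posted_at)))
```

since the hour never exceeds 23 and minutes / seconds never exceed 59, this only nudges the dark components ; a stronger correction would need a different rule.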
3 – EXTRACT CONTENT, CREATE RANDOMNESS
to create interesting visuals, we need numeric values (numbers, digits, you name it), lots of them. ID, IP and Date are already numbers and we just saw how to convert them into colors. we could also easily turn numbers into text: for example, our previous IP 50.115.164.25 can turn into 2s¤ or fbpqez…
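here is one way to get both strings from that IP, sketched in Python. the two rules are reverse-engineered from the examples (they are not taken from the plugin) and the function names are made up: the first reads each byte as a character code, the second walks through the digits and greedily takes two of them whenever they form a number below 26.

```python
def octets_to_chars(ip):
    """Read each IP byte as a character code: 50 -> '2', 115 -> 's', 164 -> '¤'."""
    return "".join(chr(int(part)) for part in ip.split("."))


def digits_to_letters(ip):
    """Turn the digits into letters (a = 0 ... z = 25), two digits at a time when possible."""
    digits = ip.replace(".", "")
    letters = []
    i = 0
    while i < len(digits):
        pair = digits[i:i + 2]
        if len(pair) == 2 and int(pair) < 26:
            # the next two digits fit in the alphabet, consume both
            letters.append(chr(ord("a") + int(pair)))
            i += 2
        else:
            # otherwise fall back to a single digit
            letters.append(chr(ord("a") + int(digits[i])))
            i += 1
    return "".join(letters)


print(repr(octets_to_chars("50.115.164.25")))  # the last byte (25) is a control character
print(digits_to_letters("50.115.164.25"))
```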
now, we need to turn letters into numbers.
fortunately for us, there’s no need to take the meaning into account. no data mining, no cipher/decipher: very basic string manipulation is all we need, as we’re only concerned with the plasticity of the contents.
some values (metrics) are easy to extract. here are the ones I chose:
private var _words:Array;
private var _wordCount:int;
private var _wordsLengths:Array;
private var _wordLetterAlphaPosition:Array;
private var _wordLetterASCIICode:Array;
private var _lettersCount:int;
private var _lettersAlphaPosition:Array;
private var _lettersASCIICode:Array;
private var _lettersRecurrence:Array;
private var _hasPunctuation:Boolean;
private var _punctuationCount:int = 0;
private var _hasSpecialChars:Boolean;
private var _specialCharsCount:int = 0;
private var _hasLinks:Boolean;
private var _linksCount:int = 0;
for instance this (fake) small one:
hello beauty! want to see [url=http://bigbadwolf.com/]the big bad wolf[/url]?
returns this:
words: hello,beauty!,want,to,see,[url=http://bigbadwolf.com/]the,big,bad,wolf[/url]?
wordCount: 9
wordsLengths: 5,7,4,2,3,31,3,3,11
lettersCount: 69
lettersAlphaPosition: 8, 5, 12, 12, 15, 2, 5, 1, 21, 20, 25, -31, 23, 1, 14, 20, 20, 15, 19, 5, 5, 27, 21, 18, 12, -3, 8, 20, 20, 16, -6, -17, -17, 2, 9, 7, 2, 1, 4, 23, 15, 12, 6, -18, 3, 15, 13, -17, 29, 20, 8, 5, 2, 9, 7, 2, 1, 4, 23, 15, 12, 6, 27, -17, 21, 18, 12, 29, -1
lettersASCIICode: 104, 101, 108, 108, 111, 98, 101, 97, 117, 116, 121, 33, 119, 97, 110, 116, 116, 111, 115, 101, 101, 91, 117, 114, 108, 61, 104, 116, 116, 112, 58, 47, 47, 98, 105, 103, 98, 97, 100, 119, 111, 108, 102, 46, 99, 111, 109, 47, 93, 116, 104, 101, 98, 105, 103, 98, 97, 100, 119, 111, 108, 102, 91, 47, 117, 114, 108, 93, 63
lettersRecurrence: 4, 5, 1, 2, 5, 2, 2, 3, 2, 0, 0, 6, 1, 1, 5, 1, 0, 2, 1, 6, 3, 0, 3, 0, 1, 0
hasPunctuation: true count: 4
hasSpecialChars: true count: 10
hasLinks: true count: 1
this is enough to make each spam unique.
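for the curious, a rough Python sketch that reproduces part of the output above (words, lengths, ASCII codes, alphabetical positions and letter recurrence). it is not the plugin’s code: the function names are hypothetical and the alphabetical position rule (code - 96 for lowercase letters, code - 64 for everything else) is inferred from the sample values. punctuation, special chars and links are left out because the text doesn’t say how they are counted.

```python
from collections import Counter
import string


def alpha_position(ch):
    """Position in the alphabet, following the rule inferred from the sample output."""
    code = ord(ch)
    # lowercase letters: a -> 1 ... z -> 26 ; any other character is offset by 64
    return code - 96 if "a" <= ch <= "z" else code - 64


def extract_metrics(text):
    """Split a spam on whitespace and derive a few numeric series from it."""
    words = text.split()
    letters = "".join(words)  # every character except whitespace
    counts = Counter(letters)
    return {
        "words": words,
        "wordCount": len(words),
        "wordsLengths": [len(w) for w in words],
        "lettersCount": len(letters),
        "lettersAlphaPosition": [alpha_position(c) for c in letters],
        "lettersASCIICode": [ord(c) for c in letters],
        # occurrences of a, b, c ... z (lowercase only, an assumption)
        "lettersRecurrence": [counts[c] for c in string.ascii_lowercase],
    }


spam = "hello beauty! want to see [url=http://bigbadwolf.com/]the big bad wolf[/url]?"
for name, value in extract_metrics(spam).items():
    print(name, value)
```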
now, to get more randomness, we have to process these values ; as they stand, they’re repetitive series of values within narrow ranges (0-26).
THE GRANULARITY OF S.A.N.D
one key feature of any generative piece is the parser: the mechanics that turn a dataset into another. in our case, that means splitting a text into numbers or chunks of numbers. there are many ways to do this and all of them relate more or less to the field of statistics or string metrics. no need to go too deep into it ; some very simple operations will do.
SAND is a (homebrewed) mnemonic standing for Sort Average Normalize Distribute, 4 of the main actions when you deal with data. Sorting lets you classify the data according to one or more criteria, Averaging gives a baseline to the dataset, Normalizing helps you balance the differences between values by giving them the same scale and distributing is the fact of creating new associations (that can in turn be sorted, averaged, normalized and distributed etc.).
Sorting is trivial. normalizing is about making the value of each entry a fraction of 1 so that the sum of all values equals one. to normalize a Vector of Numbers, you basically sum up all the values of the Vector then divide each entry by the sum. as a variant, we can find the minimum and maximum bounds of the Vector and map each entry’s value between them, so that the minimum becomes 0 and the maximum becomes 1.
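both flavours of normalization fit in a few lines. this is a Python sketch, not the plugin’s code ; the function names and the handling of the degenerate cases (a sum of zero, or all values equal) are my own assumptions.

```python
def average(values):
    """Baseline of the series: the arithmetic mean."""
    return sum(values) / len(values)


def normalize_sum(values):
    """Divide each entry by the total so that the results add up to 1."""
    total = sum(values)
    if total == 0:
        return [0.0 for _ in values]  # assumption: nothing to distribute
    return [v / total for v in values]


def normalize_min_max(values):
    """Map each entry between the bounds: minimum -> 0, maximum -> 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # assumption: a flat series maps to 0
    return [(v - lo) / (hi - lo) for v in values]


# the word lengths of the sample spam used earlier
lengths = [5, 7, 4, 2, 3, 31, 3, 3, 11]
print(sorted(lengths))
print(average(lengths))
print(normalize_sum(lengths))
print(normalize_min_max(lengths))
```

the first version keeps the proportions between entries, the second one stretches the series over the full 0-1 range, which is handier when a value has to drive a size or an angle.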
PRNG MY LOVE I LOVE
some time ago, I did a series of tests with a Pseudo Random Number Generator [3].
later I silently improved it but the idea is the same: give it a seed and it will spit out a series of seemingly random numbers. below is an example of a PRNG:
PRNG – wonderfl build flash online
you should notice dead spots ; places where the values don’t seem to change. this is normal: the algorithm works by shifting bits and then applying a modulo, which produces redundant series of values.
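to give an idea of what “shifting bits then applying a modulo” can look like, here is a tiny seeded generator in Python. it is a stand-in, not the generator from [3]: it uses the classic 32-bit xorshift (shifts of 13, 17 and 5), and seeding it with the sum of a spam’s character codes is only an illustration.

```python
class XorShift32:
    """Seeded pseudo random number generator: same seed, same series."""

    def __init__(self, seed):
        # keep the state on 32 bits ; a state of 0 would stay stuck at 0
        self.state = (seed & 0xFFFFFFFF) or 1

    def next_int(self):
        x = self.state
        x ^= (x << 13) & 0xFFFFFFFF
        x ^= x >> 17
        x ^= (x << 5) & 0xFFFFFFFF
        self.state = x
        return x

    def next_range(self, n):
        """Fold the 32-bit value into 0 .. n-1 with a modulo."""
        return self.next_int() % n


def seed_from_text(text):
    """Hypothetical seed: the sum of the character codes of a spam."""
    return sum(ord(c) for c in text)


prng = XorShift32(seed_from_text("hello beauty!"))
print([prng.next_range(256) for _ in range(8)])
```

feeding the same spam always gives the same series, which is what makes the visuals random yet reproducible.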
rule-based systems such as L-System [4] can also create very complex (yet reproducible) series of values.
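as a reminder of how little it takes, here is a bare-bones L-system in Python. the rules are Lindenmayer’s textbook “algae” example (A becomes AB, B becomes A), not something used in the plugin, and the function name is made up.

```python
def lsystem(axiom, rules, iterations):
    """Rewrite every symbol with its rule, `iterations` times in a row."""
    current = axiom
    for _ in range(iterations):
        # symbols without a rule are copied as they are
        current = "".join(rules.get(symbol, symbol) for symbol in current)
    return current


algae = {"A": "AB", "B": "A"}
for n in range(6):
    print(n, lsystem("A", algae, n))
```

the same axiom and rules always produce the same string, and the axiom could just as well be built from the letters of a spam.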
all these operations are aimed at preparing the content to be displayed by a range of animations. but turning the spam into a universal purée doesn’t tell us much about how to render it.
that’s where we’ll use the right side of our brain.
THE RIGHT SIDE IN ACTION
the right hemisphere of the brain is supposedly the cradle of imagination and creativity. that’s where you’d drop your keyboard, use your mouse and do what creatives call a moodboard. a moodboard is a great tool even if it often looks like a childish collage. this is for instance a snapshot of the first Google Image search results for moodboard:
inspiration uses a powerful (yet poorly documented) feature of the brain called divergent thinking [5] and a moodboard helps you feed this aptitude. gathering images, photos, color palettes, videos, animations produced by other artists will gradually build a visual culture from which you’ll pick, combine, transform and re-arrange new ideas, new visuals, new animations. in a word: be creative.
creativity doesn’t exist ex-nihilo, it’s fed by what you saw, what moved you and what you want to share with others.
I won’t inflict my own moodboard on you but ever since I was a kid, I’ve loved chaotic phenomena ; the sky is an endless source of amazement: patterns of pure light and colors. I like stones and natural media, plants, shells, every rule based life form. I love insects (dead ones), for their essential dryness and the way they’re shaped by / adapted to their environment. I also like geometric patterns, structures, fabrics, jewels, very tight mechanics ( clocks, watches, engines… ), toys, small and dense objects in general, boxes, networks… [long list]
it’s funny to notice that just as we turned text into numbers, we now have to turn ideas (concepts) into a form. it is some sort of crystallization and is the actual meaning of information as opposed to an instruction ; turning an idea into a structure.
I chose to create a graphic object, dense, with a strong apparent underlying structure (using linked geometric shapes) yet not self explanatory (the meaning of the original content should be sublimated).
after some research and a lot of failed attempts, I ended up with a first satisfactory result depicted here:
it appears that the process is very close to what Boris Müller did for the poetry on the road festival poster in 2006 http://www.esono.com/boris/projects/poetry06/ (a bit more colourful). his work is a great source of inspiration.
then I did a variation inspired by Tatiana Plakhova’s Music iS Math series that you can see here: http://www.complexitygraphics.com/ along with some other breathtaking works.
here’s what I ended up with:
and a bigger one
and another variation that rotates some smoothed 2D convex hulls in 3D.
and another variation that uses drier lines that made me think of insects somewhat.
etc.
variations are infinite and all sorts of shapes can be generated.
as there was no point in this module, I could play forever. if some people are really good at iterative works (monomaniac sort of), I’m not.
it’s a spam toy, multiplying the variations wouldn’t make it better.
there’s a live version on the right side bar with the latest 25 spams running, click to skip to the next, otherwise I’ve uploaded some bigger pics on Flickr: spamarium.
this wasn’t completely useless ; I’ve started an actual ‘piece’, hope I’ll find some time to finish it properly.
this long article is over,
thanks for reading
resources & links:
[1] http://corpocrat.com/2009/12/27/tutorial-how-to-write-a-wordpress-plugin/
[2] http://net.tutsplus.com/tutorials/wordpress/creating-a-custom-wordpress-plugin-from-scratch/
[3] Seed-based Noise Generator
[4] L-system bis repetita
[5] http://faculty.washington.edu/ezent/imdt.htm