JavaScript/PHP Unicode Notes

UTF-8/Unicode-String To “Byte-String” Notes

JavaScript
unescape(encodeURIComponent("א"))
"×" plus an invisible U+0090 (two characters, one per UTF-8 byte: 0xD7, 0x90)

unescape(encodeURIComponent("א")).length
2

unescape(encodeURIComponent("א")).split('').map(function(c){ return c.charCodeAt(0) })
[215, 144]
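
unescape() is deprecated, so here is a minimal sketch of the same conversion using TextEncoder/TextDecoder, assuming a browser or a recent Node.js where both are globals. The helper names utf8Bytes, toByteString and fromByteString are made up for illustration.

```javascript
// Hypothetical helpers; not part of any library.

// UTF-8 bytes of a string, as a plain array of numbers.
function utf8Bytes(str) {
  // TextEncoder always encodes to UTF-8 and returns a Uint8Array.
  return Array.from(new TextEncoder().encode(str));
}

// "Byte-string": one character per UTF-8 byte, each char code in 0..255.
// Fine for short strings; apply() can hit argument limits on very long ones.
function toByteString(str) {
  return String.fromCharCode.apply(null, utf8Bytes(str));
}

// Reverse direction: read each char code as a byte and decode as UTF-8.
function fromByteString(byteStr) {
  var bytes = Uint8Array.from(byteStr, function (c) { return c.charCodeAt(0); });
  return new TextDecoder("utf-8").decode(bytes);
}

console.log(utf8Bytes("א"));                     // [215, 144]
console.log(toByteString("א").length);           // 2
console.log(fromByteString(toByteString("א")));  // א
```

The old-style reverse of the one-liner above is decodeURIComponent(escape(byteString)), which has the same deprecation caveat.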


PHP
var_dump( 'א' );
string(2) "א"
Hmm, 2? var_dump() reports the length in bytes, not characters,
so the literal *might* already be stored as UTF-8 :)
var_dump( unpack('C*', ('א')) );

array(2) { [1]=> int(215) [2]=> int(144) }
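(Yes, it is! PHP strings are plain byte sequences, and with the file saved as UTF-8 the literal already holds the two UTF-8 bytes.)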
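So if you call utf8_encode() on it, the string, which is already UTF-8, gets encoded a second time (utf8_encode() assumes ISO-8859-1 input, and it is deprecated as of PHP 8.2):
var_dump( unpack('C*', utf8_encode('א')) );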

array(4) { [1]=> int(195) [2]=> int(151) [3]=> int(194) [4]=> int(144) }
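
To see where those four bytes come from, here is a rough JavaScript sketch of the ISO-8859-1 to UTF-8 conversion that utf8_encode() performs. It treats every input byte as a Latin-1 character and encodes that character as UTF-8. The function name latin1ToUtf8 is made up, and this is an illustration of the idea, not PHP's actual implementation.

```javascript
// Hypothetical helper: re-encode each byte as if it were a Latin-1 character.
function latin1ToUtf8(bytes) {
  var out = [];
  bytes.forEach(function (b) {
    if (b < 0x80) {
      // ASCII range: the byte is unchanged in UTF-8.
      out.push(b);
    } else {
      // 0x80..0xFF: becomes a two-byte UTF-8 sequence.
      out.push(0xC0 | (b >> 6), 0x80 | (b & 0x3F));
    }
  });
  return out;
}

var alefUtf8 = [215, 144];                  // "א" already encoded as UTF-8
var doubleEncoded = latin1ToUtf8(alefUtf8);
console.log(doubleEncoded);                 // [195, 151, 194, 144]
```

Each of the two original bytes is above 0x7F, so each one turns into two bytes, which is why the length doubles from 2 to 4.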

Make sure UTF-8 is used consistently everywhere:
1. Save the PHP file itself as UTF-8.
2. Put these lines at the very top of the page:
mb_language("uni");
@mb_internal_encoding('UTF-8');
setlocale(LC_ALL,'en_US.UTF-8');
3. Also, if you are outputting anything, make sure to send a header with the charset:
header('Content-Type: text/plain; charset=UTF-8');