PHP Byte String

String To Byte-String.

In a byte-string, the maximum value of each character is 255.
Standard ASCII strings (such as "abcdef") stay the same, but any non-ASCII Unicode character (in UTF-8, for example) is 'split up' into a pair (or a triplet, or more..) of characters, one per byte.

For example: א is really the two UTF-8 bytes 0xD7 0x90 (215, 144), so in "byte-string'y" it reads \u00D7\u0090 :)
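
A quick way to see the split, as a minimal sketch: strlen() counts bytes, while mb_strlen() with an explicit 'UTF-8' encoding counts characters. The variable names are only illustrative, and the script assumes the mbstring extension is loaded and the file itself is saved as UTF-8.

```php
<?php
  $ascii   = 'abcdef';
  $unicode = 'א'; /* U+05D0, assumes this file is saved as UTF-8 */

  /* strlen() counts bytes, mb_strlen() counts characters */
  var_dump(strlen($ascii));               /* int(6) */
  var_dump(mb_strlen($ascii, 'UTF-8'));   /* int(6) */
  var_dump(strlen($unicode));             /* int(2) */
  var_dump(mb_strlen($unicode, 'UTF-8')); /* int(1) */

  /* the two bytes behind the single character, as hex */
  var_dump(bin2hex($unicode));            /* string(4) "d790" */
```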


On this first attempt, we'll break the string "א" apart and glue it back together.
(The bytes come out of PHP unchanged, though, so with the UTF-8 header the output is displayed as a perfect "א" character again..)

<?php
  mb_language("uni"); /* inner engine to Unicode support */
  @mb_internal_encoding('UTF-8');
  setlocale(LC_ALL,'en_US.UTF-8');
  header('Content-Type: text/plain;charset=UTF-8');

  $string = 'א';
  $byte_array = unpack('C*', $string); /* byte array (character values) [1 => 215, 2 => 144] */
  $char_array = array_map(function($byte){ /* char array (the binary-string, character by character) ["\xD7","\x90"] */
                  return chr($byte);
                }, $byte_array);

  $string = implode('', $char_array); /* binary string */
  var_dump($string);
?>
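
To confirm that the split-and-glue step really changes nothing, a small check along these lines can compare the result with the original, byte for byte. This is only a sketch; $original, $bytes and $rebuilt are names made up for the illustration.

```php
<?php
  $original = 'א'; /* assumes this file is saved as UTF-8 */

  /* split into byte values, then turn every value back into a 1-byte string */
  $bytes   = unpack('C*', $original);          /* [1 => 215, 2 => 144] */
  $rebuilt = implode('', array_map('chr', $bytes));

  var_dump($bytes);
  var_dump($rebuilt === $original);            /* bool(true) */
  var_dump(bin2hex($rebuilt));                 /* string(4) "d790" */
```

Note that unpack('C*', ...) returns an array indexed from 1, not from 0.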


Just for the presentation, the following "fix" lets you display the binary-string: every byte is converted to a UTF-8 character of its own, so the output is valid UTF-8 and the bytes show up as separate characters instead of merging back into "א":

<?php
  mb_language("uni"); /* inner engine to Unicode support */
  @mb_internal_encoding('UTF-8');
  setlocale(LC_ALL,'en_US.UTF-8');
  header('Content-Type: text/plain;charset=UTF-8');

  $string = 'א';
  $byte_array = unpack('C*', $string); /* byte array (character values) [1 => 215, 2 => 144] */
  $char_array = array_map(function($byte){ /* char array: every byte re-encoded as its own UTF-8 character */
                  /* ISO-8859-1 maps every byte 0-255 to the code point of the same value; 'US-ASCII' only defines 0-127 */
                  return mb_convert_encoding(chr($byte), 'UTF-8', 'ISO-8859-1');
                }, $byte_array);

  $string = implode('', $char_array); /* binary string */
  var_dump($string);
?>


But keep in mind, this is only for viewing stuff,
since every byte above 127 is re-encoded as two bytes, which can grow a string to as much as double its length (plain ASCII bytes keep their single byte).
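
The growth is easy to measure. The sketch below uses 'ISO-8859-1' as the source encoding (an assumption of this sketch: it maps every byte value 0-255 to the code point with the same number) and also shows that converting back restores the original bytes. The variable names are hypothetical.

```php
<?php
  $original = "abc\xD7\x90"; /* 'abc' followed by the two UTF-8 bytes of א */

  /* treat every byte as a Latin-1 character and re-encode it as UTF-8 */
  $viewable = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

  var_dump(strlen($original));  /* int(5) */
  var_dump(strlen($viewable));  /* int(7): only the two high bytes grew */
  var_dump(bin2hex($viewable)); /* string(14) "616263c397c290" */

  /* the conversion is reversible */
  $restored = mb_convert_encoding($viewable, 'ISO-8859-1', 'UTF-8');
  var_dump($restored === $original); /* bool(true) */
```

So the viewable form is fine for debugging output, and the original binary-string can still be recovered from it when needed.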