archive

- Random tools & helpful resources for IRC
git clone git://git.acid.vegas/archive.git
Log | Files | Refs | Archive

stb_image.h (260289B)

      1 /* stb_image - v2.19 - public domain image loader - http://nothings.org/stb
      2                                   no warranty implied; use at your own risk
      3 
      4    Do this:
      5       #define STB_IMAGE_IMPLEMENTATION
      6    before you include this file in *one* C or C++ file to create the implementation.
      7 
      8    // i.e. it should look like this:
      9    #include ...
     10    #include ...
     11    #include ...
     12    #define STB_IMAGE_IMPLEMENTATION
     13    #include "stb_image.h"
     14 
     15    You can #define STBI_ASSERT(x) before the #include to avoid using assert.h.
     16    And #define STBI_MALLOC, STBI_REALLOC, and STBI_FREE to avoid using malloc,realloc,free
     17 
     18 
     19    QUICK NOTES:
     20       Primarily of interest to game developers and other people who can
     21           avoid problematic images and only need the trivial interface
     22 
     23       JPEG baseline & progressive (12 bpc/arithmetic not supported, same as stock IJG lib)
     24       PNG 1/2/4/8/16-bit-per-channel
     25 
     26       TGA (not sure what subset, if a subset)
     27       BMP non-1bpp, non-RLE
     28       PSD (composited view only, no extra channels, 8/16 bit-per-channel)
     29 
     30       GIF (*comp always reports as 4-channel)
     31       HDR (radiance rgbE format)
     32       PIC (Softimage PIC)
     33       PNM (PPM and PGM binary only)
     34 
     35       Animated GIF still needs a proper API, but here's one way to do it:
     36           http://gist.github.com/urraka/685d9a6340b26b830d49
     37 
     38       - decode from memory or through FILE (define STBI_NO_STDIO to remove code)
     39       - decode from arbitrary I/O callbacks
     40       - SIMD acceleration on x86/x64 (SSE2) and ARM (NEON)
     41 
     42    Full documentation under "DOCUMENTATION" below.
     43 
     44 
     45 LICENSE
     46 
     47   See end of file for license information.
     48 
     49 RECENT REVISION HISTORY:
     50 
     51       2.19  (2018-02-11) fix warning
     52       2.18  (2018-01-30) fix warnings
     53       2.17  (2018-01-29) bugfix, 1-bit BMP, 16-bitness query, fix warnings
     54       2.16  (2017-07-23) all functions have 16-bit variants; optimizations; bugfixes
     55       2.15  (2017-03-18) fix png-1,2,4; all Imagenet JPGs; no runtime SSE detection on GCC
     56       2.14  (2017-03-03) remove deprecated STBI_JPEG_OLD; fixes for Imagenet JPGs
     57       2.13  (2016-12-04) experimental 16-bit API, only for PNG so far; fixes
     58       2.12  (2016-04-02) fix typo in 2.11 PSD fix that caused crashes
     59       2.11  (2016-04-02) 16-bit PNGS; enable SSE2 in non-gcc x64
     60                          RGB-format JPEG; remove white matting in PSD;
     61                          allocate large structures on the stack;
     62                          correct channel count for PNG & BMP
     63       2.10  (2016-01-22) avoid warning introduced in 2.09
     64       2.09  (2016-01-16) 16-bit TGA; comments in PNM files; STBI_REALLOC_SIZED
     65 
     66    See end of file for full revision history.
     67 
     68 
     69  ============================    Contributors    =========================
     70 
     71  Image formats                          Extensions, features
     72     Sean Barrett (jpeg, png, bmp)          Jetro Lauha (stbi_info)
     73     Nicolas Schulz (hdr, psd)              Martin "SpartanJ" Golini (stbi_info)
     74     Jonathan Dummer (tga)                  James "moose2000" Brown (iPhone PNG)
     75     Jean-Marc Lienher (gif)                Ben "Disch" Wenger (io callbacks)
     76     Tom Seddon (pic)                       Omar Cornut (1/2/4-bit PNG)
     77     Thatcher Ulrich (psd)                  Nicolas Guillemot (vertical flip)
     78     Ken Miller (pgm, ppm)                  Richard Mitton (16-bit PSD)
     79     github:urraka (animated gif)           Junggon Kim (PNM comments)
     80     Christopher Forseth (animated gif)     Daniel Gibson (16-bit TGA)
     81                                            socks-the-fox (16-bit PNG)
     82                                            Jeremy Sawicki (handle all ImageNet JPGs)
     83  Optimizations & bugfixes                  Mikhail Morozov (1-bit BMP)
     84     Fabian "ryg" Giesen                    Anael Seghezzi (is-16-bit query)
     85     Arseny Kapoulkine
     86     John-Mark Allen
     87 
     88  Bug & warning fixes
     89     Marc LeBlanc            David Woo          Guillaume George   Martins Mozeiko
     90     Christpher Lloyd        Jerry Jansson      Joseph Thomson     Phil Jordan
     91     Dave Moore              Roy Eltham         Hayaki Saito       Nathan Reed
     92     Won Chun                Luke Graham        Johan Duparc       Nick Verigakis
     93     the Horde3D community   Thomas Ruf         Ronny Chevalier    github:rlyeh
     94     Janez Zemva             John Bartholomew   Michal Cichon      github:romigrou
     95     Jonathan Blow           Ken Hamada         Tero Hanninen      github:svdijk
     96     Laurent Gomila          Cort Stratton      Sergio Gonzalez    github:snagar
     97     Aruelien Pocheville     Thibault Reuille   Cass Everitt       github:Zelex
     98     Ryamond Barbiero        Paul Du Bois       Engin Manap        github:grim210
     99     Aldo Culquicondor       Philipp Wiesemann  Dale Weiler        github:sammyhw
    100     Oriol Ferrer Mesia      Josh Tobin         Matthew Gregan     github:phprus
    101     Julian Raschke          Gregory Mullen     Baldur Karlsson    github:poppolopoppo
    102     Christian Floisand      Kevin Schmidt                         github:darealshinji
    103     Blazej Dariusz Roszkowski                                     github:Michaelangel007
    104 */
    105 
    106 #ifndef STBI_INCLUDE_STB_IMAGE_H
    107 #define STBI_INCLUDE_STB_IMAGE_H
    108 
    109 // DOCUMENTATION
    110 //
    111 // Limitations:
    112 //    - no 12-bit-per-channel JPEG
    113 //    - no JPEGs with arithmetic coding
    114 //    - GIF always returns *comp=4
    115 //
    116 // Basic usage (see HDR discussion below for HDR usage):
    117 //    int x,y,n;
    118 //    unsigned char *data = stbi_load(filename, &x, &y, &n, 0);
    119 //    // ... process data if not NULL ...
    120 //    // ... x = width, y = height, n = # 8-bit components per pixel ...
    121 //    // ... replace '0' with '1'..'4' to force that many components per pixel
    122 //    // ... but 'n' will always be the number that it would have been if you said 0
    123 //    stbi_image_free(data)
    124 //
    125 // Standard parameters:
    126 //    int *x                 -- outputs image width in pixels
    127 //    int *y                 -- outputs image height in pixels
    128 //    int *channels_in_file  -- outputs # of image components in image file
    129 //    int desired_channels   -- if non-zero, # of image components requested in result
    130 //
    131 // The return value from an image loader is an 'unsigned char *' which points
    132 // to the pixel data, or NULL on an allocation failure or if the image is
    133 // corrupt or invalid. The pixel data consists of *y scanlines of *x pixels,
    134 // with each pixel consisting of N interleaved 8-bit components; the first
    135 // pixel pointed to is top-left-most in the image. There is no padding between
    136 // image scanlines or between pixels, regardless of format. The number of
    137 // components N is 'desired_channels' if desired_channels is non-zero, or
    138 // *channels_in_file otherwise. If desired_channels is non-zero,
    139 // *channels_in_file has the number of components that _would_ have been
    140 // output otherwise. E.g. if you set desired_channels to 4, you will always
    141 // get RGBA output, but you can check *channels_in_file to see if it's trivially
    142 // opaque because e.g. there were only 3 channels in the source image.
    143 //
    144 // An output image with N components has the following components interleaved
    145 // in this order in each pixel:
    146 //
    147 //     N=#comp     components
    148 //       1           grey
    149 //       2           grey, alpha
    150 //       3           red, green, blue
    151 //       4           red, green, blue, alpha
    152 //
    153 // If image loading fails for any reason, the return value will be NULL,
    154 // and *x, *y, *channels_in_file will be unchanged. The function
    155 // stbi_failure_reason() can be queried for an extremely brief, end-user
    156 // unfriendly explanation of why the load failed. Define STBI_NO_FAILURE_STRINGS
    157 // to avoid compiling these strings at all, and STBI_FAILURE_USERMSG to get slightly
    158 // more user-friendly ones.
    159 //
    160 // Paletted PNG, BMP, GIF, and PIC images are automatically depalettized.
    161 //
    162 // ===========================================================================
    163 //
    164 // Philosophy
    165 //
    166 // stb libraries are designed with the following priorities:
    167 //
    168 //    1. easy to use
    169 //    2. easy to maintain
    170 //    3. good performance
    171 //
    172 // Sometimes I let "good performance" creep up in priority over "easy to maintain",
    173 // and for best performance I may provide less-easy-to-use APIs that give higher
    174 // performance, in addition to the easy to use ones. Nevertheless, it's important
    175 // to keep in mind that from the standpoint of you, a client of this library,
    176 // all you care about is #1 and #3, and stb libraries DO NOT emphasize #3 above all.
    177 //
    178 // Some secondary priorities arise directly from the first two, some of which
    179 // make more explicit reasons why performance can't be emphasized.
    180 //
    181 //    - Portable ("ease of use")
    182 //    - Small source code footprint ("easy to maintain")
    183 //    - No dependencies ("ease of use")
    184 //
    185 // ===========================================================================
    186 //
    187 // I/O callbacks
    188 //
    189 // I/O callbacks allow you to read from arbitrary sources, like packaged
    190 // files or some other source. Data read from callbacks are processed
    191 // through a small internal buffer (currently 128 bytes) to try to reduce
    192 // overhead.
    193 //
    194 // The three functions you must define are "read" (reads some bytes of data),
    195 // "skip" (skips some bytes of data), "eof" (reports if the stream is at the end).
    196 //
    197 // ===========================================================================
    198 //
    199 // SIMD support
    200 //
    201 // The JPEG decoder will try to automatically use SIMD kernels on x86 when
    202 // supported by the compiler. For ARM Neon support, you must explicitly
    203 // request it.
    204 //
    205 // (The old do-it-yourself SIMD API is no longer supported in the current
    206 // code.)
    207 //
    208 // On x86, SSE2 will automatically be used when available based on a run-time
    209 // test; if not, the generic C versions are used as a fall-back. On ARM targets,
    210 // the typical path is to have separate builds for NEON and non-NEON devices
    211 // (at least this is true for iOS and Android). Therefore, the NEON support is
    212 // toggled by a build flag: define STBI_NEON to get NEON loops.
    213 //
    214 // If for some reason you do not want to use any of SIMD code, or if
    215 // you have issues compiling it, you can disable it entirely by
    216 // defining STBI_NO_SIMD.
    217 //
    218 // ===========================================================================
    219 //
    220 // HDR image support   (disable by defining STBI_NO_HDR)
    221 //
    222 // stb_image now supports loading HDR images in general, and currently
    223 // the Radiance .HDR file format, although the support is provided
    224 // generically. You can still load any file through the existing interface;
    225 // if you attempt to load an HDR file, it will be automatically remapped to
    226 // LDR, assuming gamma 2.2 and an arbitrary scale factor defaulting to 1;
    227 // both of these constants can be reconfigured through this interface:
    228 //
    229 //     stbi_hdr_to_ldr_gamma(2.2f);
    230 //     stbi_hdr_to_ldr_scale(1.0f);
    231 //
    232 // (note, do not use _inverse_ constants; stbi_image will invert them
    233 // appropriately).
    234 //
    235 // Additionally, there is a new, parallel interface for loading files as
    236 // (linear) floats to preserve the full dynamic range:
    237 //
    238 //    float *data = stbi_loadf(filename, &x, &y, &n, 0);
    239 //
    240 // If you load LDR images through this interface, those images will
    241 // be promoted to floating point values, run through the inverse of
    242 // constants corresponding to the above:
    243 //
    244 //     stbi_ldr_to_hdr_scale(1.0f);
    245 //     stbi_ldr_to_hdr_gamma(2.2f);
    246 //
    247 // Finally, given a filename (or an open file or memory block--see header
    248 // file for details) containing image data, you can query for the "most
    249 // appropriate" interface to use (that is, whether the image is HDR or
    250 // not), using:
    251 //
    252 //     stbi_is_hdr(char *filename);
    253 //
    254 // ===========================================================================
    255 //
    256 // iPhone PNG support:
    257 //
    258 // By default we convert iphone-formatted PNGs back to RGB, even though
    259 // they are internally encoded differently. You can disable this conversion
    260 // by by calling stbi_convert_iphone_png_to_rgb(0), in which case
    261 // you will always just get the native iphone "format" through (which
    262 // is BGR stored in RGB).
    263 //
    264 // Call stbi_set_unpremultiply_on_load(1) as well to force a divide per
    265 // pixel to remove any premultiplied alpha *only* if the image file explicitly
    266 // says there's premultiplied data (currently only happens in iPhone images,
    267 // and only if iPhone convert-to-rgb processing is on).
    268 //
    269 // ===========================================================================
    270 //
    271 // ADDITIONAL CONFIGURATION
    272 //
    273 //  - You can suppress implementation of any of the decoders to reduce
    274 //    your code footprint by #defining one or more of the following
    275 //    symbols before creating the implementation.
    276 //
    277 //        STBI_NO_JPEG
    278 //        STBI_NO_PNG
    279 //        STBI_NO_BMP
    280 //        STBI_NO_PSD
    281 //        STBI_NO_TGA
    282 //        STBI_NO_GIF
    283 //        STBI_NO_HDR
    284 //        STBI_NO_PIC
    285 //        STBI_NO_PNM   (.ppm and .pgm)
    286 //
    287 //  - You can request *only* certain decoders and suppress all other ones
    288 //    (this will be more forward-compatible, as addition of new decoders
    289 //    doesn't require you to disable them explicitly):
    290 //
    291 //        STBI_ONLY_JPEG
    292 //        STBI_ONLY_PNG
    293 //        STBI_ONLY_BMP
    294 //        STBI_ONLY_PSD
    295 //        STBI_ONLY_TGA
    296 //        STBI_ONLY_GIF
    297 //        STBI_ONLY_HDR
    298 //        STBI_ONLY_PIC
    299 //        STBI_ONLY_PNM   (.ppm and .pgm)
    300 //
    301 //   - If you use STBI_NO_PNG (or _ONLY_ without PNG), and you still
    302 //     want the zlib decoder to be available, #define STBI_SUPPORT_ZLIB
    303 //
    304 
    305 
    306 #ifndef STBI_NO_STDIO
    307 #include <stdio.h>
    308 #endif // STBI_NO_STDIO
    309 
    310 #define STBI_VERSION 1
    311 
    312 enum
    313 {
    314    STBI_default = 0, // only used for desired_channels
    315 
    316    STBI_grey       = 1,
    317    STBI_grey_alpha = 2,
    318    STBI_rgb        = 3,
    319    STBI_rgb_alpha  = 4
    320 };
    321 
    322 typedef unsigned char stbi_uc;
    323 typedef unsigned short stbi_us;
    324 
    325 #ifdef __cplusplus
    326 extern "C" {
    327 #endif
    328 
    329 #ifdef STB_IMAGE_STATIC
    330 #define STBIDEF static
    331 #else
    332 #define STBIDEF extern
    333 #endif
    334 
    335 //////////////////////////////////////////////////////////////////////////////
    336 //
    337 // PRIMARY API - works on images of any type
    338 //
    339 
    340 //
    341 // load image by filename, open file, or memory buffer
    342 //
    343 
    344 typedef struct
    345 {
    346    int      (*read)  (void *user,char *data,int size);   // fill 'data' with 'size' bytes.  return number of bytes actually read
    347    void     (*skip)  (void *user,int n);                 // skip the next 'n' bytes, or 'unget' the last -n bytes if negative
    348    int      (*eof)   (void *user);                       // returns nonzero if we are at end of file/data
    349 } stbi_io_callbacks;
    350 
    351 ////////////////////////////////////
    352 //
    353 // 8-bits-per-channel interface
    354 //
    355 
    356 STBIDEF stbi_uc *stbi_load_from_memory   (stbi_uc           const *buffer, int len   , int *x, int *y, int *channels_in_file, int desired_channels);
    357 STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk  , void *user, int *x, int *y, int *channels_in_file, int desired_channels);
    358 #ifndef STBI_NO_GIF
    359 STBIDEF stbi_uc *stbi_load_gif_from_memory(stbi_uc const *buffer, int len, int **delays, int *x, int *y, int *z, int *comp, int req_comp);
    360 #endif
    361 
    362 
    363 #ifndef STBI_NO_STDIO
    364 STBIDEF stbi_uc *stbi_load            (char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
    365 STBIDEF stbi_uc *stbi_load_from_file  (FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
    366 // for stbi_load_from_file, file pointer is left pointing immediately after image
    367 #endif
    368 
    369 ////////////////////////////////////
    370 //
    371 // 16-bits-per-channel interface
    372 //
    373 
    374 STBIDEF stbi_us *stbi_load_16_from_memory   (stbi_uc const *buffer, int len, int *x, int *y, int *channels_in_file, int desired_channels);
    375 STBIDEF stbi_us *stbi_load_16_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *channels_in_file, int desired_channels);
    376 
    377 #ifndef STBI_NO_STDIO
    378 STBIDEF stbi_us *stbi_load_16          (char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
    379 STBIDEF stbi_us *stbi_load_from_file_16(FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
    380 #endif
    381 
    382 ////////////////////////////////////
    383 //
    384 // float-per-channel interface
    385 //
    386 #ifndef STBI_NO_LINEAR
    387    STBIDEF float *stbi_loadf_from_memory     (stbi_uc const *buffer, int len, int *x, int *y, int *channels_in_file, int desired_channels);
    388    STBIDEF float *stbi_loadf_from_callbacks  (stbi_io_callbacks const *clbk, void *user, int *x, int *y,  int *channels_in_file, int desired_channels);
    389 
    390    #ifndef STBI_NO_STDIO
    391    STBIDEF float *stbi_loadf            (char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
    392    STBIDEF float *stbi_loadf_from_file  (FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
    393    #endif
    394 #endif
    395 
    396 #ifndef STBI_NO_HDR
    397    STBIDEF void   stbi_hdr_to_ldr_gamma(float gamma);
    398    STBIDEF void   stbi_hdr_to_ldr_scale(float scale);
    399 #endif // STBI_NO_HDR
    400 
    401 #ifndef STBI_NO_LINEAR
    402    STBIDEF void   stbi_ldr_to_hdr_gamma(float gamma);
    403    STBIDEF void   stbi_ldr_to_hdr_scale(float scale);
    404 #endif // STBI_NO_LINEAR
    405 
    406 // stbi_is_hdr is always defined, but always returns false if STBI_NO_HDR
    407 STBIDEF int    stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user);
    408 STBIDEF int    stbi_is_hdr_from_memory(stbi_uc const *buffer, int len);
    409 #ifndef STBI_NO_STDIO
    410 STBIDEF int      stbi_is_hdr          (char const *filename);
    411 STBIDEF int      stbi_is_hdr_from_file(FILE *f);
    412 #endif // STBI_NO_STDIO
    413 
    414 
    415 // get a VERY brief reason for failure
    416 // NOT THREADSAFE
    417 STBIDEF const char *stbi_failure_reason  (void);
    418 
    419 // free the loaded image -- this is just free()
    420 STBIDEF void     stbi_image_free      (void *retval_from_stbi_load);
    421 
    422 // get image dimensions & components without fully decoding
    423 STBIDEF int      stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp);
    424 STBIDEF int      stbi_info_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp);
    425 STBIDEF int      stbi_is_16_bit_from_memory(stbi_uc const *buffer, int len);
    426 STBIDEF int      stbi_is_16_bit_from_callbacks(stbi_io_callbacks const *clbk, void *user);
    427 
    428 #ifndef STBI_NO_STDIO
    429 STBIDEF int      stbi_info               (char const *filename,     int *x, int *y, int *comp);
    430 STBIDEF int      stbi_info_from_file     (FILE *f,                  int *x, int *y, int *comp);
    431 STBIDEF int      stbi_is_16_bit          (char const *filename);
    432 STBIDEF int      stbi_is_16_bit_from_file(FILE *f);
    433 #endif
    434 
    435 
    436 
    437 // for image formats that explicitly notate that they have premultiplied alpha,
    438 // we just return the colors as stored in the file. set this flag to force
    439 // unpremultiplication. results are undefined if the unpremultiply overflow.
    440 STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply);
    441 
    442 // indicate whether we should process iphone images back to canonical format,
    443 // or just pass them through "as-is"
    444 STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert);
    445 
    446 // flip the image vertically, so the first pixel in the output array is the bottom left
    447 STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip);
    448 
    449 // ZLIB client - used by PNG, available for other purposes
    450 
    451 STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen);
    452 STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header);
    453 STBIDEF char *stbi_zlib_decode_malloc(const char *buffer, int len, int *outlen);
    454 STBIDEF int   stbi_zlib_decode_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
    455 
    456 STBIDEF char *stbi_zlib_decode_noheader_malloc(const char *buffer, int len, int *outlen);
    457 STBIDEF int   stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
    458 
    459 
    460 #ifdef __cplusplus
    461 }
    462 #endif
    463 
    464 //
    465 //
    466 ////   end header file   /////////////////////////////////////////////////////
    467 #endif // STBI_INCLUDE_STB_IMAGE_H
    468 
    469 #ifdef STB_IMAGE_IMPLEMENTATION
    470 
    471 #if defined(STBI_ONLY_JPEG) || defined(STBI_ONLY_PNG) || defined(STBI_ONLY_BMP) \
    472   || defined(STBI_ONLY_TGA) || defined(STBI_ONLY_GIF) || defined(STBI_ONLY_PSD) \
    473   || defined(STBI_ONLY_HDR) || defined(STBI_ONLY_PIC) || defined(STBI_ONLY_PNM) \
    474   || defined(STBI_ONLY_ZLIB)
    475    #ifndef STBI_ONLY_JPEG
    476    #define STBI_NO_JPEG
    477    #endif
    478    #ifndef STBI_ONLY_PNG
    479    #define STBI_NO_PNG
    480    #endif
    481    #ifndef STBI_ONLY_BMP
    482    #define STBI_NO_BMP
    483    #endif
    484    #ifndef STBI_ONLY_PSD
    485    #define STBI_NO_PSD
    486    #endif
    487    #ifndef STBI_ONLY_TGA
    488    #define STBI_NO_TGA
    489    #endif
    490    #ifndef STBI_ONLY_GIF
    491    #define STBI_NO_GIF
    492    #endif
    493    #ifndef STBI_ONLY_HDR
    494    #define STBI_NO_HDR
    495    #endif
    496    #ifndef STBI_ONLY_PIC
    497    #define STBI_NO_PIC
    498    #endif
    499    #ifndef STBI_ONLY_PNM
    500    #define STBI_NO_PNM
    501    #endif
    502 #endif
    503 
    504 #if defined(STBI_NO_PNG) && !defined(STBI_SUPPORT_ZLIB) && !defined(STBI_NO_ZLIB)
    505 #define STBI_NO_ZLIB
    506 #endif
    507 
    508 
    509 #include <stdarg.h>
    510 #include <stddef.h> // ptrdiff_t on osx
    511 #include <stdlib.h>
    512 #include <string.h>
    513 #include <limits.h>
    514 
    515 #if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR)
    516 #include <math.h>  // ldexp, pow
    517 #endif
    518 
    519 #ifndef STBI_NO_STDIO
    520 #include <stdio.h>
    521 #endif
    522 
    523 #ifndef STBI_ASSERT
    524 #include <assert.h>
    525 #define STBI_ASSERT(x) assert(x)
    526 #endif
    527 
    528 
    529 #ifndef _MSC_VER
    530    #ifdef __cplusplus
    531    #define stbi_inline inline
    532    #else
    533    #define stbi_inline
    534    #endif
    535 #else
    536    #define stbi_inline __forceinline
    537 #endif
    538 
    539 
    540 #ifdef _MSC_VER
    541 typedef unsigned short stbi__uint16;
    542 typedef   signed short stbi__int16;
    543 typedef unsigned int   stbi__uint32;
    544 typedef   signed int   stbi__int32;
    545 #else
    546 #include <stdint.h>
    547 typedef uint16_t stbi__uint16;
    548 typedef int16_t  stbi__int16;
    549 typedef uint32_t stbi__uint32;
    550 typedef int32_t  stbi__int32;
    551 #endif
    552 
    553 // should produce compiler error if size is wrong
    554 typedef unsigned char validate_uint32[sizeof(stbi__uint32)==4 ? 1 : -1];
    555 
    556 #ifdef _MSC_VER
    557 #define STBI_NOTUSED(v)  (void)(v)
    558 #else
    559 #define STBI_NOTUSED(v)  (void)sizeof(v)
    560 #endif
    561 
    562 #ifdef _MSC_VER
    563 #define STBI_HAS_LROTL
    564 #endif
    565 
    566 #ifdef STBI_HAS_LROTL
    567    #define stbi_lrot(x,y)  _lrotl(x,y)
    568 #else
    569    #define stbi_lrot(x,y)  (((x) << (y)) | ((x) >> (32 - (y))))
    570 #endif
    571 
    572 #if defined(STBI_MALLOC) && defined(STBI_FREE) && (defined(STBI_REALLOC) || defined(STBI_REALLOC_SIZED))
    573 // ok
    574 #elif !defined(STBI_MALLOC) && !defined(STBI_FREE) && !defined(STBI_REALLOC) && !defined(STBI_REALLOC_SIZED)
    575 // ok
    576 #else
    577 #error "Must define all or none of STBI_MALLOC, STBI_FREE, and STBI_REALLOC (or STBI_REALLOC_SIZED)."
    578 #endif
    579 
    580 #ifndef STBI_MALLOC
    581 #define STBI_MALLOC(sz)           malloc(sz)
    582 #define STBI_REALLOC(p,newsz)     realloc(p,newsz)
    583 #define STBI_FREE(p)              free(p)
    584 #endif
    585 
    586 #ifndef STBI_REALLOC_SIZED
    587 #define STBI_REALLOC_SIZED(p,oldsz,newsz) STBI_REALLOC(p,newsz)
    588 #endif
    589 
    590 // x86/x64 detection
    591 #if defined(__x86_64__) || defined(_M_X64)
    592 #define STBI__X64_TARGET
    593 #elif defined(__i386) || defined(_M_IX86)
    594 #define STBI__X86_TARGET
    595 #endif
    596 
    597 #if defined(__GNUC__) && defined(STBI__X86_TARGET) && !defined(__SSE2__) && !defined(STBI_NO_SIMD)
    598 // gcc doesn't support sse2 intrinsics unless you compile with -msse2,
    599 // which in turn means it gets to use SSE2 everywhere. This is unfortunate,
    600 // but previous attempts to provide the SSE2 functions with runtime
    601 // detection caused numerous issues. The way architecture extensions are
    602 // exposed in GCC/Clang is, sadly, not really suited for one-file libs.
    603 // New behavior: if compiled with -msse2, we use SSE2 without any
    604 // detection; if not, we don't use it at all.
    605 #define STBI_NO_SIMD
    606 #endif
    607 
    608 #if defined(__MINGW32__) && defined(STBI__X86_TARGET) && !defined(STBI_MINGW_ENABLE_SSE2) && !defined(STBI_NO_SIMD)
    609 // Note that __MINGW32__ doesn't actually mean 32-bit, so we have to avoid STBI__X64_TARGET
    610 //
    611 // 32-bit MinGW wants ESP to be 16-byte aligned, but this is not in the
    612 // Windows ABI and VC++ as well as Windows DLLs don't maintain that invariant.
    613 // As a result, enabling SSE2 on 32-bit MinGW is dangerous when not
    614 // simultaneously enabling "-mstackrealign".
    615 //
    616 // See https://github.com/nothings/stb/issues/81 for more information.
    617 //
    618 // So default to no SSE2 on 32-bit MinGW. If you've read this far and added
    619 // -mstackrealign to your build settings, feel free to #define STBI_MINGW_ENABLE_SSE2.
    620 #define STBI_NO_SIMD
    621 #endif
    622 
    623 #if !defined(STBI_NO_SIMD) && (defined(STBI__X86_TARGET) || defined(STBI__X64_TARGET))
    624 #define STBI_SSE2
    625 #include <emmintrin.h>
    626 
    627 #ifdef _MSC_VER
    628 
    629 #if _MSC_VER >= 1400  // not VC6
    630 #include <intrin.h> // __cpuid
    631 static int stbi__cpuid3(void)
    632 {
    633    int info[4];
    634    __cpuid(info,1);
    635    return info[3];
    636 }
    637 #else
    638 static int stbi__cpuid3(void)
    639 {
    640    int res;
    641    __asm {
    642       mov  eax,1
    643       cpuid
    644       mov  res,edx
    645    }
    646    return res;
    647 }
    648 #endif
    649 
    650 #define STBI_SIMD_ALIGN(type, name) __declspec(align(16)) type name
    651 
    652 static int stbi__sse2_available(void)
    653 {
    654    int info3 = stbi__cpuid3();
    655    return ((info3 >> 26) & 1) != 0;
    656 }
    657 #else // assume GCC-style if not VC++
    658 #define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
    659 
    660 static int stbi__sse2_available(void)
    661 {
    662    // If we're even attempting to compile this on GCC/Clang, that means
    663    // -msse2 is on, which means the compiler is allowed to use SSE2
    664    // instructions at will, and so are we.
    665    return 1;
    666 }
    667 #endif
    668 #endif
    669 
    670 // ARM NEON
    671 #if defined(STBI_NO_SIMD) && defined(STBI_NEON)
    672 #undef STBI_NEON
    673 #endif
    674 
    675 #ifdef STBI_NEON
    676 #include <arm_neon.h>
    677 // assume GCC or Clang on ARM targets
    678 #define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
    679 #endif
    680 
    681 #ifndef STBI_SIMD_ALIGN
    682 #define STBI_SIMD_ALIGN(type, name) type name
    683 #endif
    684 
    685 ///////////////////////////////////////////////
    686 //
    687 //  stbi__context struct and start_xxx functions
    688 
    689 // stbi__context structure is our basic context used by all images, so it
    690 // contains all the IO context, plus some basic image information
    691 typedef struct
    692 {
    693    stbi__uint32 img_x, img_y;
    694    int img_n, img_out_n;
    695 
    696    stbi_io_callbacks io;
    697    void *io_user_data;
    698 
    699    int read_from_callbacks;
    700    int buflen;
    701    stbi_uc buffer_start[128];
    702 
    703    stbi_uc *img_buffer, *img_buffer_end;
    704    stbi_uc *img_buffer_original, *img_buffer_original_end;
    705 } stbi__context;
    706 
    707 
    708 static void stbi__refill_buffer(stbi__context *s);
    709 
    710 // initialize a memory-decode context
    711 static void stbi__start_mem(stbi__context *s, stbi_uc const *buffer, int len)
    712 {
    713    s->io.read = NULL;
    714    s->read_from_callbacks = 0;
    715    s->img_buffer = s->img_buffer_original = (stbi_uc *) buffer;
    716    s->img_buffer_end = s->img_buffer_original_end = (stbi_uc *) buffer+len;
    717 }
    718 
    719 // initialize a callback-based context
    720 static void stbi__start_callbacks(stbi__context *s, stbi_io_callbacks *c, void *user)
    721 {
    722    s->io = *c;
    723    s->io_user_data = user;
    724    s->buflen = sizeof(s->buffer_start);
    725    s->read_from_callbacks = 1;
    726    s->img_buffer_original = s->buffer_start;
    727    stbi__refill_buffer(s);
    728    s->img_buffer_original_end = s->img_buffer_end;
    729 }
    730 
    731 #ifndef STBI_NO_STDIO
    732 
    733 static int stbi__stdio_read(void *user, char *data, int size)
    734 {
    735    return (int) fread(data,1,size,(FILE*) user);
    736 }
    737 
    738 static void stbi__stdio_skip(void *user, int n)
    739 {
    740    fseek((FILE*) user, n, SEEK_CUR);
    741 }
    742 
    743 static int stbi__stdio_eof(void *user)
    744 {
    745    return feof((FILE*) user);
    746 }
    747 
    748 static stbi_io_callbacks stbi__stdio_callbacks =
    749 {
    750    stbi__stdio_read,
    751    stbi__stdio_skip,
    752    stbi__stdio_eof,
    753 };
    754 
    755 static void stbi__start_file(stbi__context *s, FILE *f)
    756 {
    757    stbi__start_callbacks(s, &stbi__stdio_callbacks, (void *) f);
    758 }
    759 
    760 //static void stop_file(stbi__context *s) { }
    761 
    762 #endif // !STBI_NO_STDIO
    763 
    764 static void stbi__rewind(stbi__context *s)
    765 {
    766    // conceptually rewind SHOULD rewind to the beginning of the stream,
    767    // but we just rewind to the beginning of the initial buffer, because
    768    // we only use it after doing 'test', which only ever looks at at most 92 bytes
    769    s->img_buffer = s->img_buffer_original;
    770    s->img_buffer_end = s->img_buffer_original_end;
    771 }
    772 
    773 enum
    774 {
    775    STBI_ORDER_RGB,
    776    STBI_ORDER_BGR
    777 };
    778 
    779 typedef struct
    780 {
    781    int bits_per_channel;
    782    int num_channels;
    783    int channel_order;
    784 } stbi__result_info;
    785 
    786 #ifndef STBI_NO_JPEG
    787 static int      stbi__jpeg_test(stbi__context *s);
    788 static void    *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
    789 static int      stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp);
    790 #endif
    791 
    792 #ifndef STBI_NO_PNG
    793 static int      stbi__png_test(stbi__context *s);
    794 static void    *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
    795 static int      stbi__png_info(stbi__context *s, int *x, int *y, int *comp);
    796 static int      stbi__png_is16(stbi__context *s);
    797 #endif
    798 
    799 #ifndef STBI_NO_BMP
    800 static int      stbi__bmp_test(stbi__context *s);
    801 static void    *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
    802 static int      stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp);
    803 #endif
    804 
    805 #ifndef STBI_NO_TGA
    806 static int      stbi__tga_test(stbi__context *s);
    807 static void    *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
    808 static int      stbi__tga_info(stbi__context *s, int *x, int *y, int *comp);
    809 #endif
    810 
    811 #ifndef STBI_NO_PSD
    812 static int      stbi__psd_test(stbi__context *s);
    813 static void    *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc);
    814 static int      stbi__psd_info(stbi__context *s, int *x, int *y, int *comp);
    815 static int      stbi__psd_is16(stbi__context *s);
    816 #endif
    817 
    818 #ifndef STBI_NO_HDR
    819 static int      stbi__hdr_test(stbi__context *s);
    820 static float   *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
    821 static int      stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp);
    822 #endif
    823 
    824 #ifndef STBI_NO_PIC
    825 static int      stbi__pic_test(stbi__context *s);
    826 static void    *stbi__pic_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
    827 static int      stbi__pic_info(stbi__context *s, int *x, int *y, int *comp);
    828 #endif
    829 
    830 #ifndef STBI_NO_GIF
    831 static int      stbi__gif_test(stbi__context *s);
    832 static void    *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
    833 static void    *stbi__load_gif_main(stbi__context *s, int **delays, int *x, int *y, int *z, int *comp, int req_comp);
    834 static int      stbi__gif_info(stbi__context *s, int *x, int *y, int *comp);
    835 #endif
    836 
    837 #ifndef STBI_NO_PNM
    838 static int      stbi__pnm_test(stbi__context *s);
    839 static void    *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
    840 static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp);
    841 #endif
    842 
    843 // this is not threadsafe
    844 static const char *stbi__g_failure_reason;
    845 
    846 STBIDEF const char *stbi_failure_reason(void)
    847 {
    848    return stbi__g_failure_reason;
    849 }
    850 
    851 static int stbi__err(const char *str)
    852 {
    853    stbi__g_failure_reason = str;
    854    return 0;
    855 }
    856 
    857 static void *stbi__malloc(size_t size)
    858 {
    859     return STBI_MALLOC(size);
    860 }
    861 
    862 // stb_image uses ints pervasively, including for offset calculations.
    863 // therefore the largest decoded image size we can support with the
    864 // current code, even on 64-bit targets, is INT_MAX. this is not a
    865 // significant limitation for the intended use case.
    866 //
    867 // we do, however, need to make sure our size calculations don't
    868 // overflow. hence a few helper functions for size calculations that
    869 // multiply integers together, making sure that they're non-negative
    870 // and no overflow occurs.
    871 
    872 // return 1 if the sum is valid, 0 on overflow.
    873 // negative terms are considered invalid.
    874 static int stbi__addsizes_valid(int a, int b)
    875 {
    876    if (b < 0) return 0;
    877    // now 0 <= b <= INT_MAX, hence also
    878    // 0 <= INT_MAX - b <= INTMAX.
    879    // And "a + b <= INT_MAX" (which might overflow) is the
    880    // same as a <= INT_MAX - b (no overflow)
    881    return a <= INT_MAX - b;
    882 }
    883 
    884 // returns 1 if the product is valid, 0 on overflow.
    885 // negative factors are considered invalid.
    886 static int stbi__mul2sizes_valid(int a, int b)
    887 {
    888    if (a < 0 || b < 0) return 0;
    889    if (b == 0) return 1; // mul-by-0 is always safe
    890    // portable way to check for no overflows in a*b
    891    return a <= INT_MAX/b;
    892 }
    893 
    894 // returns 1 if "a*b + add" has no negative terms/factors and doesn't overflow
    895 static int stbi__mad2sizes_valid(int a, int b, int add)
    896 {
    897    return stbi__mul2sizes_valid(a, b) && stbi__addsizes_valid(a*b, add);
    898 }
    899 
    900 // returns 1 if "a*b*c + add" has no negative terms/factors and doesn't overflow
    901 static int stbi__mad3sizes_valid(int a, int b, int c, int add)
    902 {
    903    return stbi__mul2sizes_valid(a, b) && stbi__mul2sizes_valid(a*b, c) &&
    904       stbi__addsizes_valid(a*b*c, add);
    905 }
    906 
    907 // returns 1 if "a*b*c*d + add" has no negative terms/factors and doesn't overflow
    908 #if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR)
    909 static int stbi__mad4sizes_valid(int a, int b, int c, int d, int add)
    910 {
    911    return stbi__mul2sizes_valid(a, b) && stbi__mul2sizes_valid(a*b, c) &&
    912       stbi__mul2sizes_valid(a*b*c, d) && stbi__addsizes_valid(a*b*c*d, add);
    913 }
    914 #endif
    915 
    916 // mallocs with size overflow checking
    917 static void *stbi__malloc_mad2(int a, int b, int add)
    918 {
    919    if (!stbi__mad2sizes_valid(a, b, add)) return NULL;
    920    return stbi__malloc(a*b + add);
    921 }
    922 
    923 static void *stbi__malloc_mad3(int a, int b, int c, int add)
    924 {
    925    if (!stbi__mad3sizes_valid(a, b, c, add)) return NULL;
    926    return stbi__malloc(a*b*c + add);
    927 }
    928 
    929 #if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR)
    930 static void *stbi__malloc_mad4(int a, int b, int c, int d, int add)
    931 {
    932    if (!stbi__mad4sizes_valid(a, b, c, d, add)) return NULL;
    933    return stbi__malloc(a*b*c*d + add);
    934 }
    935 #endif
    936 
    937 // stbi__err - error
    938 // stbi__errpf - error returning pointer to float
    939 // stbi__errpuc - error returning pointer to unsigned char
    940 
    941 #ifdef STBI_NO_FAILURE_STRINGS
    942    #define stbi__err(x,y)  0
    943 #elif defined(STBI_FAILURE_USERMSG)
    944    #define stbi__err(x,y)  stbi__err(y)
    945 #else
    946    #define stbi__err(x,y)  stbi__err(x)
    947 #endif
    948 
    949 #define stbi__errpf(x,y)   ((float *)(size_t) (stbi__err(x,y)?NULL:NULL))
    950 #define stbi__errpuc(x,y)  ((unsigned char *)(size_t) (stbi__err(x,y)?NULL:NULL))
    951 
    952 STBIDEF void stbi_image_free(void *retval_from_stbi_load)
    953 {
    954    STBI_FREE(retval_from_stbi_load);
    955 }
    956 
    957 #ifndef STBI_NO_LINEAR
    958 static float   *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp);
    959 #endif
    960 
    961 #ifndef STBI_NO_HDR
    962 static stbi_uc *stbi__hdr_to_ldr(float   *data, int x, int y, int comp);
    963 #endif
    964 
    965 static int stbi__vertically_flip_on_load = 0;
    966 
    967 STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip)
    968 {
    969     stbi__vertically_flip_on_load = flag_true_if_should_flip;
    970 }
    971 
    972 static void *stbi__load_main(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc)
    973 {
    974    memset(ri, 0, sizeof(*ri)); // make sure it's initialized if we add new fields
    975    ri->bits_per_channel = 8; // default is 8 so most paths don't have to be changed
    976    ri->channel_order = STBI_ORDER_RGB; // all current input & output are this, but this is here so we can add BGR order
    977    ri->num_channels = 0;
    978 
    979    #ifndef STBI_NO_JPEG
    980    if (stbi__jpeg_test(s)) return stbi__jpeg_load(s,x,y,comp,req_comp, ri);
    981    #endif
    982    #ifndef STBI_NO_PNG
    983    if (stbi__png_test(s))  return stbi__png_load(s,x,y,comp,req_comp, ri);
    984    #endif
    985    #ifndef STBI_NO_BMP
    986    if (stbi__bmp_test(s))  return stbi__bmp_load(s,x,y,comp,req_comp, ri);
    987    #endif
    988    #ifndef STBI_NO_GIF
    989    if (stbi__gif_test(s))  return stbi__gif_load(s,x,y,comp,req_comp, ri);
    990    #endif
    991    #ifndef STBI_NO_PSD
    992    if (stbi__psd_test(s))  return stbi__psd_load(s,x,y,comp,req_comp, ri, bpc);
    993    #endif
    994    #ifndef STBI_NO_PIC
    995    if (stbi__pic_test(s))  return stbi__pic_load(s,x,y,comp,req_comp, ri);
    996    #endif
    997    #ifndef STBI_NO_PNM
    998    if (stbi__pnm_test(s))  return stbi__pnm_load(s,x,y,comp,req_comp, ri);
    999    #endif
   1000 
   1001    #ifndef STBI_NO_HDR
   1002    if (stbi__hdr_test(s)) {
   1003       float *hdr = stbi__hdr_load(s, x,y,comp,req_comp, ri);
   1004       return stbi__hdr_to_ldr(hdr, *x, *y, req_comp ? req_comp : *comp);
   1005    }
   1006    #endif
   1007 
   1008    #ifndef STBI_NO_TGA
   1009    // test tga last because it's a crappy test!
   1010    if (stbi__tga_test(s))
   1011       return stbi__tga_load(s,x,y,comp,req_comp, ri);
   1012    #endif
   1013 
   1014    return stbi__errpuc("unknown image type", "Image not of any known type, or corrupt");
   1015 }
   1016 
   1017 static stbi_uc *stbi__convert_16_to_8(stbi__uint16 *orig, int w, int h, int channels)
   1018 {
   1019    int i;
   1020    int img_len = w * h * channels;
   1021    stbi_uc *reduced;
   1022 
   1023    reduced = (stbi_uc *) stbi__malloc(img_len);
   1024    if (reduced == NULL) return stbi__errpuc("outofmem", "Out of memory");
   1025 
   1026    for (i = 0; i < img_len; ++i)
   1027       reduced[i] = (stbi_uc)((orig[i] >> 8) & 0xFF); // top half of each byte is sufficient approx of 16->8 bit scaling
   1028 
   1029    STBI_FREE(orig);
   1030    return reduced;
   1031 }
   1032 
   1033 static stbi__uint16 *stbi__convert_8_to_16(stbi_uc *orig, int w, int h, int channels)
   1034 {
   1035    int i;
   1036    int img_len = w * h * channels;
   1037    stbi__uint16 *enlarged;
   1038 
   1039    enlarged = (stbi__uint16 *) stbi__malloc(img_len*2);
   1040    if (enlarged == NULL) return (stbi__uint16 *) stbi__errpuc("outofmem", "Out of memory");
   1041 
   1042    for (i = 0; i < img_len; ++i)
   1043       enlarged[i] = (stbi__uint16)((orig[i] << 8) + orig[i]); // replicate to high and low byte, maps 0->0, 255->0xffff
   1044 
   1045    STBI_FREE(orig);
   1046    return enlarged;
   1047 }
   1048 
   1049 static void stbi__vertical_flip(void *image, int w, int h, int bytes_per_pixel)
   1050 {
   1051    int row;
   1052    size_t bytes_per_row = (size_t)w * bytes_per_pixel;
   1053    stbi_uc temp[2048];
   1054    stbi_uc *bytes = (stbi_uc *)image;
   1055 
   1056    for (row = 0; row < (h>>1); row++) {
   1057       stbi_uc *row0 = bytes + row*bytes_per_row;
   1058       stbi_uc *row1 = bytes + (h - row - 1)*bytes_per_row;
   1059       // swap row0 with row1
   1060       size_t bytes_left = bytes_per_row;
   1061       while (bytes_left) {
   1062          size_t bytes_copy = (bytes_left < sizeof(temp)) ? bytes_left : sizeof(temp);
   1063          memcpy(temp, row0, bytes_copy);
   1064          memcpy(row0, row1, bytes_copy);
   1065          memcpy(row1, temp, bytes_copy);
   1066          row0 += bytes_copy;
   1067          row1 += bytes_copy;
   1068          bytes_left -= bytes_copy;
   1069       }
   1070    }
   1071 }
   1072 
   1073 static void stbi__vertical_flip_slices(void *image, int w, int h, int z, int bytes_per_pixel)
   1074 {
   1075    int slice;
   1076    int slice_size = w * h * bytes_per_pixel;
   1077 
   1078    stbi_uc *bytes = (stbi_uc *)image;
   1079    for (slice = 0; slice < z; ++slice) {
   1080       stbi__vertical_flip(bytes, w, h, bytes_per_pixel); 
   1081       bytes += slice_size; 
   1082    }
   1083 }
   1084 
   1085 static unsigned char *stbi__load_and_postprocess_8bit(stbi__context *s, int *x, int *y, int *comp, int req_comp)
   1086 {
   1087    stbi__result_info ri;
   1088    void *result = stbi__load_main(s, x, y, comp, req_comp, &ri, 8);
   1089 
   1090    if (result == NULL)
   1091       return NULL;
   1092 
   1093    if (ri.bits_per_channel != 8) {
   1094       STBI_ASSERT(ri.bits_per_channel == 16);
   1095       result = stbi__convert_16_to_8((stbi__uint16 *) result, *x, *y, req_comp == 0 ? *comp : req_comp);
   1096       ri.bits_per_channel = 8;
   1097    }
   1098 
   1099    // @TODO: move stbi__convert_format to here
   1100 
   1101    if (stbi__vertically_flip_on_load) {
   1102       int channels = req_comp ? req_comp : *comp;
   1103       stbi__vertical_flip(result, *x, *y, channels * sizeof(stbi_uc));
   1104    }
   1105 
   1106    return (unsigned char *) result;
   1107 }
   1108 
   1109 static stbi__uint16 *stbi__load_and_postprocess_16bit(stbi__context *s, int *x, int *y, int *comp, int req_comp)
   1110 {
   1111    stbi__result_info ri;
   1112    void *result = stbi__load_main(s, x, y, comp, req_comp, &ri, 16);
   1113 
   1114    if (result == NULL)
   1115       return NULL;
   1116 
   1117    if (ri.bits_per_channel != 16) {
   1118       STBI_ASSERT(ri.bits_per_channel == 8);
   1119       result = stbi__convert_8_to_16((stbi_uc *) result, *x, *y, req_comp == 0 ? *comp : req_comp);
   1120       ri.bits_per_channel = 16;
   1121    }
   1122 
   1123    // @TODO: move stbi__convert_format16 to here
   1124    // @TODO: special case RGB-to-Y (and RGBA-to-YA) for 8-bit-to-16-bit case to keep more precision
   1125 
   1126    if (stbi__vertically_flip_on_load) {
   1127       int channels = req_comp ? req_comp : *comp;
   1128       stbi__vertical_flip(result, *x, *y, channels * sizeof(stbi__uint16));
   1129    }
   1130 
   1131    return (stbi__uint16 *) result;
   1132 }
   1133 
   1134 #if !defined(STBI_NO_HDR) || !defined(STBI_NO_LINEAR)
   1135 static void stbi__float_postprocess(float *result, int *x, int *y, int *comp, int req_comp)
   1136 {
   1137    if (stbi__vertically_flip_on_load && result != NULL) {
   1138       int channels = req_comp ? req_comp : *comp;
   1139       stbi__vertical_flip(result, *x, *y, channels * sizeof(float));
   1140    }
   1141 }
   1142 #endif
   1143 
   1144 #ifndef STBI_NO_STDIO
   1145 
   1146 static FILE *stbi__fopen(char const *filename, char const *mode)
   1147 {
   1148    FILE *f;
   1149 #if defined(_MSC_VER) && _MSC_VER >= 1400
   1150    if (0 != fopen_s(&f, filename, mode))
   1151       f=0;
   1152 #else
   1153    f = fopen(filename, mode);
   1154 #endif
   1155    return f;
   1156 }
   1157 
   1158 
   1159 STBIDEF stbi_uc *stbi_load(char const *filename, int *x, int *y, int *comp, int req_comp)
   1160 {
   1161    FILE *f = stbi__fopen(filename, "rb");
   1162    unsigned char *result;
   1163    if (!f) return stbi__errpuc("can't fopen", "Unable to open file");
   1164    result = stbi_load_from_file(f,x,y,comp,req_comp);
   1165    fclose(f);
   1166    return result;
   1167 }
   1168 
   1169 STBIDEF stbi_uc *stbi_load_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
   1170 {
   1171    unsigned char *result;
   1172    stbi__context s;
   1173    stbi__start_file(&s,f);
   1174    result = stbi__load_and_postprocess_8bit(&s,x,y,comp,req_comp);
   1175    if (result) {
   1176       // need to 'unget' all the characters in the IO buffer
   1177       fseek(f, - (int) (s.img_buffer_end - s.img_buffer), SEEK_CUR);
   1178    }
   1179    return result;
   1180 }
   1181 
   1182 STBIDEF stbi__uint16 *stbi_load_from_file_16(FILE *f, int *x, int *y, int *comp, int req_comp)
   1183 {
   1184    stbi__uint16 *result;
   1185    stbi__context s;
   1186    stbi__start_file(&s,f);
   1187    result = stbi__load_and_postprocess_16bit(&s,x,y,comp,req_comp);
   1188    if (result) {
   1189       // need to 'unget' all the characters in the IO buffer
   1190       fseek(f, - (int) (s.img_buffer_end - s.img_buffer), SEEK_CUR);
   1191    }
   1192    return result;
   1193 }
   1194 
   1195 STBIDEF stbi_us *stbi_load_16(char const *filename, int *x, int *y, int *comp, int req_comp)
   1196 {
   1197    FILE *f = stbi__fopen(filename, "rb");
   1198    stbi__uint16 *result;
   1199    if (!f) return (stbi_us *) stbi__errpuc("can't fopen", "Unable to open file");
   1200    result = stbi_load_from_file_16(f,x,y,comp,req_comp);
   1201    fclose(f);
   1202    return result;
   1203 }
   1204 
   1205 
   1206 #endif //!STBI_NO_STDIO
   1207 
   1208 STBIDEF stbi_us *stbi_load_16_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *channels_in_file, int desired_channels)
   1209 {
   1210    stbi__context s;
   1211    stbi__start_mem(&s,buffer,len);
   1212    return stbi__load_and_postprocess_16bit(&s,x,y,channels_in_file,desired_channels);
   1213 }
   1214 
   1215 STBIDEF stbi_us *stbi_load_16_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *channels_in_file, int desired_channels)
   1216 {
   1217    stbi__context s;
   1218    stbi__start_callbacks(&s, (stbi_io_callbacks *)clbk, user);
   1219    return stbi__load_and_postprocess_16bit(&s,x,y,channels_in_file,desired_channels);
   1220 }
   1221 
   1222 STBIDEF stbi_uc *stbi_load_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
   1223 {
   1224    stbi__context s;
   1225    stbi__start_mem(&s,buffer,len);
   1226    return stbi__load_and_postprocess_8bit(&s,x,y,comp,req_comp);
   1227 }
   1228 
   1229 STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
   1230 {
   1231    stbi__context s;
   1232    stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
   1233    return stbi__load_and_postprocess_8bit(&s,x,y,comp,req_comp);
   1234 }
   1235 
   1236 #ifndef STBI_NO_GIF
   1237 STBIDEF stbi_uc *stbi_load_gif_from_memory(stbi_uc const *buffer, int len, int **delays, int *x, int *y, int *z, int *comp, int req_comp)
   1238 {
   1239    unsigned char *result;
   1240    stbi__context s; 
   1241    stbi__start_mem(&s,buffer,len); 
   1242    
   1243    result = (unsigned char*) stbi__load_gif_main(&s, delays, x, y, z, comp, req_comp);
   1244    if (stbi__vertically_flip_on_load) {
   1245       stbi__vertical_flip_slices( result, *x, *y, *z, *comp ); 
   1246    }
   1247 
   1248    return result; 
   1249 }
   1250 #endif
   1251 
   1252 #ifndef STBI_NO_LINEAR
   1253 static float *stbi__loadf_main(stbi__context *s, int *x, int *y, int *comp, int req_comp)
   1254 {
   1255    unsigned char *data;
   1256    #ifndef STBI_NO_HDR
   1257    if (stbi__hdr_test(s)) {
   1258       stbi__result_info ri;
   1259       float *hdr_data = stbi__hdr_load(s,x,y,comp,req_comp, &ri);
   1260       if (hdr_data)
   1261          stbi__float_postprocess(hdr_data,x,y,comp,req_comp);
   1262       return hdr_data;
   1263    }
   1264    #endif
   1265    data = stbi__load_and_postprocess_8bit(s, x, y, comp, req_comp);
   1266    if (data)
   1267       return stbi__ldr_to_hdr(data, *x, *y, req_comp ? req_comp : *comp);
   1268    return stbi__errpf("unknown image type", "Image not of any known type, or corrupt");
   1269 }
   1270 
   1271 STBIDEF float *stbi_loadf_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
   1272 {
   1273    stbi__context s;
   1274    stbi__start_mem(&s,buffer,len);
   1275    return stbi__loadf_main(&s,x,y,comp,req_comp);
   1276 }
   1277 
   1278 STBIDEF float *stbi_loadf_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
   1279 {
   1280    stbi__context s;
   1281    stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
   1282    return stbi__loadf_main(&s,x,y,comp,req_comp);
   1283 }
   1284 
   1285 #ifndef STBI_NO_STDIO
   1286 STBIDEF float *stbi_loadf(char const *filename, int *x, int *y, int *comp, int req_comp)
   1287 {
   1288    float *result;
   1289    FILE *f = stbi__fopen(filename, "rb");
   1290    if (!f) return stbi__errpf("can't fopen", "Unable to open file");
   1291    result = stbi_loadf_from_file(f,x,y,comp,req_comp);
   1292    fclose(f);
   1293    return result;
   1294 }
   1295 
   1296 STBIDEF float *stbi_loadf_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
   1297 {
   1298    stbi__context s;
   1299    stbi__start_file(&s,f);
   1300    return stbi__loadf_main(&s,x,y,comp,req_comp);
   1301 }
   1302 #endif // !STBI_NO_STDIO
   1303 
   1304 #endif // !STBI_NO_LINEAR
   1305 
   1306 // these is-hdr-or-not is defined independent of whether STBI_NO_LINEAR is
   1307 // defined, for API simplicity; if STBI_NO_LINEAR is defined, it always
   1308 // reports false!
   1309 
   1310 STBIDEF int stbi_is_hdr_from_memory(stbi_uc const *buffer, int len)
   1311 {
   1312    #ifndef STBI_NO_HDR
   1313    stbi__context s;
   1314    stbi__start_mem(&s,buffer,len);
   1315    return stbi__hdr_test(&s);
   1316    #else
   1317    STBI_NOTUSED(buffer);
   1318    STBI_NOTUSED(len);
   1319    return 0;
   1320    #endif
   1321 }
   1322 
   1323 #ifndef STBI_NO_STDIO
   1324 STBIDEF int      stbi_is_hdr          (char const *filename)
   1325 {
   1326    FILE *f = stbi__fopen(filename, "rb");
   1327    int result=0;
   1328    if (f) {
   1329       result = stbi_is_hdr_from_file(f);
   1330       fclose(f);
   1331    }
   1332    return result;
   1333 }
   1334 
   1335 STBIDEF int stbi_is_hdr_from_file(FILE *f)
   1336 {
   1337    #ifndef STBI_NO_HDR
   1338    long pos = ftell(f);
   1339    int res;
   1340    stbi__context s;
   1341    stbi__start_file(&s,f);
   1342    res = stbi__hdr_test(&s);
   1343    fseek(f, pos, SEEK_SET);
   1344    return res;
   1345    #else
   1346    STBI_NOTUSED(f);
   1347    return 0;
   1348    #endif
   1349 }
   1350 #endif // !STBI_NO_STDIO
   1351 
   1352 STBIDEF int      stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user)
   1353 {
   1354    #ifndef STBI_NO_HDR
   1355    stbi__context s;
   1356    stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
   1357    return stbi__hdr_test(&s);
   1358    #else
   1359    STBI_NOTUSED(clbk);
   1360    STBI_NOTUSED(user);
   1361    return 0;
   1362    #endif
   1363 }
   1364 
   1365 #ifndef STBI_NO_LINEAR
   1366 static float stbi__l2h_gamma=2.2f, stbi__l2h_scale=1.0f;
   1367 
   1368 STBIDEF void   stbi_ldr_to_hdr_gamma(float gamma) { stbi__l2h_gamma = gamma; }
   1369 STBIDEF void   stbi_ldr_to_hdr_scale(float scale) { stbi__l2h_scale = scale; }
   1370 #endif
   1371 
   1372 static float stbi__h2l_gamma_i=1.0f/2.2f, stbi__h2l_scale_i=1.0f;
   1373 
   1374 STBIDEF void   stbi_hdr_to_ldr_gamma(float gamma) { stbi__h2l_gamma_i = 1/gamma; }
   1375 STBIDEF void   stbi_hdr_to_ldr_scale(float scale) { stbi__h2l_scale_i = 1/scale; }
   1376 
   1377 
   1378 //////////////////////////////////////////////////////////////////////////////
   1379 //
   1380 // Common code used by all image loaders
   1381 //
   1382 
   1383 enum
   1384 {
   1385    STBI__SCAN_load=0,
   1386    STBI__SCAN_type,
   1387    STBI__SCAN_header
   1388 };
   1389 
   1390 static void stbi__refill_buffer(stbi__context *s)
   1391 {
   1392    int n = (s->io.read)(s->io_user_data,(char*)s->buffer_start,s->buflen);
   1393    if (n == 0) {
   1394       // at end of file, treat same as if from memory, but need to handle case
   1395       // where s->img_buffer isn't pointing to safe memory, e.g. 0-byte file
   1396       s->read_from_callbacks = 0;
   1397       s->img_buffer = s->buffer_start;
   1398       s->img_buffer_end = s->buffer_start+1;
   1399       *s->img_buffer = 0;
   1400    } else {
   1401       s->img_buffer = s->buffer_start;
   1402       s->img_buffer_end = s->buffer_start + n;
   1403    }
   1404 }
   1405 
   1406 stbi_inline static stbi_uc stbi__get8(stbi__context *s)
   1407 {
   1408    if (s->img_buffer < s->img_buffer_end)
   1409       return *s->img_buffer++;
   1410    if (s->read_from_callbacks) {
   1411       stbi__refill_buffer(s);
   1412       return *s->img_buffer++;
   1413    }
   1414    return 0;
   1415 }
   1416 
   1417 stbi_inline static int stbi__at_eof(stbi__context *s)
   1418 {
   1419    if (s->io.read) {
   1420       if (!(s->io.eof)(s->io_user_data)) return 0;
   1421       // if feof() is true, check if buffer = end
   1422       // special case: we've only got the special 0 character at the end
   1423       if (s->read_from_callbacks == 0) return 1;
   1424    }
   1425 
   1426    return s->img_buffer >= s->img_buffer_end;
   1427 }
   1428 
   1429 static void stbi__skip(stbi__context *s, int n)
   1430 {
   1431    if (n < 0) {
   1432       s->img_buffer = s->img_buffer_end;
   1433       return;
   1434    }
   1435    if (s->io.read) {
   1436       int blen = (int) (s->img_buffer_end - s->img_buffer);
   1437       if (blen < n) {
   1438          s->img_buffer = s->img_buffer_end;
   1439          (s->io.skip)(s->io_user_data, n - blen);
   1440          return;
   1441       }
   1442    }
   1443    s->img_buffer += n;
   1444 }
   1445 
   1446 static int stbi__getn(stbi__context *s, stbi_uc *buffer, int n)
   1447 {
   1448    if (s->io.read) {
   1449       int blen = (int) (s->img_buffer_end - s->img_buffer);
   1450       if (blen < n) {
   1451          int res, count;
   1452 
   1453          memcpy(buffer, s->img_buffer, blen);
   1454 
   1455          count = (s->io.read)(s->io_user_data, (char*) buffer + blen, n - blen);
   1456          res = (count == (n-blen));
   1457          s->img_buffer = s->img_buffer_end;
   1458          return res;
   1459       }
   1460    }
   1461 
   1462    if (s->img_buffer+n <= s->img_buffer_end) {
   1463       memcpy(buffer, s->img_buffer, n);
   1464       s->img_buffer += n;
   1465       return 1;
   1466    } else
   1467       return 0;
   1468 }
   1469 
   1470 static int stbi__get16be(stbi__context *s)
   1471 {
   1472    int z = stbi__get8(s);
   1473    return (z << 8) + stbi__get8(s);
   1474 }
   1475 
   1476 static stbi__uint32 stbi__get32be(stbi__context *s)
   1477 {
   1478    stbi__uint32 z = stbi__get16be(s);
   1479    return (z << 16) + stbi__get16be(s);
   1480 }
   1481 
   1482 #if defined(STBI_NO_BMP) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF)
   1483 // nothing
   1484 #else
   1485 static int stbi__get16le(stbi__context *s)
   1486 {
   1487    int z = stbi__get8(s);
   1488    return z + (stbi__get8(s) << 8);
   1489 }
   1490 #endif
   1491 
   1492 #ifndef STBI_NO_BMP
   1493 static stbi__uint32 stbi__get32le(stbi__context *s)
   1494 {
   1495    stbi__uint32 z = stbi__get16le(s);
   1496    return z + (stbi__get16le(s) << 16);
   1497 }
   1498 #endif
   1499 
   1500 #define STBI__BYTECAST(x)  ((stbi_uc) ((x) & 255))  // truncate int to byte without warnings
   1501 
   1502 
   1503 //////////////////////////////////////////////////////////////////////////////
   1504 //
   1505 //  generic converter from built-in img_n to req_comp
   1506 //    individual types do this automatically as much as possible (e.g. jpeg
   1507 //    does all cases internally since it needs to colorspace convert anyway,
   1508 //    and it never has alpha, so very few cases ). png can automatically
   1509 //    interleave an alpha=255 channel, but falls back to this for other cases
   1510 //
   1511 //  assume data buffer is malloced, so malloc a new one and free that one
   1512 //  only failure mode is malloc failing
   1513 
   1514 static stbi_uc stbi__compute_y(int r, int g, int b)
   1515 {
   1516    return (stbi_uc) (((r*77) + (g*150) +  (29*b)) >> 8);
   1517 }
   1518 
   1519 static unsigned char *stbi__convert_format(unsigned char *data, int img_n, int req_comp, unsigned int x, unsigned int y)
   1520 {
   1521    int i,j;
   1522    unsigned char *good;
   1523 
   1524    if (req_comp == img_n) return data;
   1525    STBI_ASSERT(req_comp >= 1 && req_comp <= 4);
   1526 
   1527    good = (unsigned char *) stbi__malloc_mad3(req_comp, x, y, 0);
   1528    if (good == NULL) {
   1529       STBI_FREE(data);
   1530       return stbi__errpuc("outofmem", "Out of memory");
   1531    }
   1532 
   1533    for (j=0; j < (int) y; ++j) {
   1534       unsigned char *src  = data + j * x * img_n   ;
   1535       unsigned char *dest = good + j * x * req_comp;
   1536 
   1537       #define STBI__COMBO(a,b)  ((a)*8+(b))
   1538       #define STBI__CASE(a,b)   case STBI__COMBO(a,b): for(i=x-1; i >= 0; --i, src += a, dest += b)
   1539       // convert source image with img_n components to one with req_comp components;
   1540       // avoid switch per pixel, so use switch per scanline and massive macros
   1541       switch (STBI__COMBO(img_n, req_comp)) {
   1542          STBI__CASE(1,2) { dest[0]=src[0], dest[1]=255;                                     } break;
   1543          STBI__CASE(1,3) { dest[0]=dest[1]=dest[2]=src[0];                                  } break;
   1544          STBI__CASE(1,4) { dest[0]=dest[1]=dest[2]=src[0], dest[3]=255;                     } break;
   1545          STBI__CASE(2,1) { dest[0]=src[0];                                                  } break;
   1546          STBI__CASE(2,3) { dest[0]=dest[1]=dest[2]=src[0];                                  } break;
   1547          STBI__CASE(2,4) { dest[0]=dest[1]=dest[2]=src[0], dest[3]=src[1];                  } break;
   1548          STBI__CASE(3,4) { dest[0]=src[0],dest[1]=src[1],dest[2]=src[2],dest[3]=255;        } break;
   1549          STBI__CASE(3,1) { dest[0]=stbi__compute_y(src[0],src[1],src[2]);                   } break;
   1550          STBI__CASE(3,2) { dest[0]=stbi__compute_y(src[0],src[1],src[2]), dest[1] = 255;    } break;
   1551          STBI__CASE(4,1) { dest[0]=stbi__compute_y(src[0],src[1],src[2]);                   } break;
   1552          STBI__CASE(4,2) { dest[0]=stbi__compute_y(src[0],src[1],src[2]), dest[1] = src[3]; } break;
   1553          STBI__CASE(4,3) { dest[0]=src[0],dest[1]=src[1],dest[2]=src[2];                    } break;
   1554          default: STBI_ASSERT(0);
   1555       }
   1556       #undef STBI__CASE
   1557    }
   1558 
   1559    STBI_FREE(data);
   1560    return good;
   1561 }
   1562 
   1563 static stbi__uint16 stbi__compute_y_16(int r, int g, int b)
   1564 {
   1565    return (stbi__uint16) (((r*77) + (g*150) +  (29*b)) >> 8);
   1566 }
   1567 
   1568 static stbi__uint16 *stbi__convert_format16(stbi__uint16 *data, int img_n, int req_comp, unsigned int x, unsigned int y)
   1569 {
   1570    int i,j;
   1571    stbi__uint16 *good;
   1572 
   1573    if (req_comp == img_n) return data;
   1574    STBI_ASSERT(req_comp >= 1 && req_comp <= 4);
   1575 
   1576    good = (stbi__uint16 *) stbi__malloc(req_comp * x * y * 2);
   1577    if (good == NULL) {
   1578       STBI_FREE(data);
   1579       return (stbi__uint16 *) stbi__errpuc("outofmem", "Out of memory");
   1580    }
   1581 
   1582    for (j=0; j < (int) y; ++j) {
   1583       stbi__uint16 *src  = data + j * x * img_n   ;
   1584       stbi__uint16 *dest = good + j * x * req_comp;
   1585 
   1586       #define STBI__COMBO(a,b)  ((a)*8+(b))
   1587       #define STBI__CASE(a,b)   case STBI__COMBO(a,b): for(i=x-1; i >= 0; --i, src += a, dest += b)
   1588       // convert source image with img_n components to one with req_comp components;
   1589       // avoid switch per pixel, so use switch per scanline and massive macros
   1590       switch (STBI__COMBO(img_n, req_comp)) {
   1591          STBI__CASE(1,2) { dest[0]=src[0], dest[1]=0xffff;                                     } break;
   1592          STBI__CASE(1,3) { dest[0]=dest[1]=dest[2]=src[0];                                     } break;
   1593          STBI__CASE(1,4) { dest[0]=dest[1]=dest[2]=src[0], dest[3]=0xffff;                     } break;
   1594          STBI__CASE(2,1) { dest[0]=src[0];                                                     } break;
   1595          STBI__CASE(2,3) { dest[0]=dest[1]=dest[2]=src[0];                                     } break;
   1596          STBI__CASE(2,4) { dest[0]=dest[1]=dest[2]=src[0], dest[3]=src[1];                     } break;
   1597          STBI__CASE(3,4) { dest[0]=src[0],dest[1]=src[1],dest[2]=src[2],dest[3]=0xffff;        } break;
   1598          STBI__CASE(3,1) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]);                   } break;
   1599          STBI__CASE(3,2) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]), dest[1] = 0xffff; } break;
   1600          STBI__CASE(4,1) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]);                   } break;
   1601          STBI__CASE(4,2) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]), dest[1] = src[3]; } break;
   1602          STBI__CASE(4,3) { dest[0]=src[0],dest[1]=src[1],dest[2]=src[2];                       } break;
   1603          default: STBI_ASSERT(0);
   1604       }
   1605       #undef STBI__CASE
   1606    }
   1607 
   1608    STBI_FREE(data);
   1609    return good;
   1610 }
   1611 
   1612 #ifndef STBI_NO_LINEAR
   1613 static float   *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp)
   1614 {
   1615    int i,k,n;
   1616    float *output;
   1617    if (!data) return NULL;
   1618    output = (float *) stbi__malloc_mad4(x, y, comp, sizeof(float), 0);
   1619    if (output == NULL) { STBI_FREE(data); return stbi__errpf("outofmem", "Out of memory"); }
   1620    // compute number of non-alpha components
   1621    if (comp & 1) n = comp; else n = comp-1;
   1622    for (i=0; i < x*y; ++i) {
   1623       for (k=0; k < n; ++k) {
   1624          output[i*comp + k] = (float) (pow(data[i*comp+k]/255.0f, stbi__l2h_gamma) * stbi__l2h_scale);
   1625       }
   1626       if (k < comp) output[i*comp + k] = data[i*comp+k]/255.0f;
   1627    }
   1628    STBI_FREE(data);
   1629    return output;
   1630 }
   1631 #endif
   1632 
   1633 #ifndef STBI_NO_HDR
   1634 #define stbi__float2int(x)   ((int) (x))
   1635 static stbi_uc *stbi__hdr_to_ldr(float   *data, int x, int y, int comp)
   1636 {
   1637    int i,k,n;
   1638    stbi_uc *output;
   1639    if (!data) return NULL;
   1640    output = (stbi_uc *) stbi__malloc_mad3(x, y, comp, 0);
   1641    if (output == NULL) { STBI_FREE(data); return stbi__errpuc("outofmem", "Out of memory"); }
   1642    // compute number of non-alpha components
   1643    if (comp & 1) n = comp; else n = comp-1;
   1644    for (i=0; i < x*y; ++i) {
   1645       for (k=0; k < n; ++k) {
   1646          float z = (float) pow(data[i*comp+k]*stbi__h2l_scale_i, stbi__h2l_gamma_i) * 255 + 0.5f;
   1647          if (z < 0) z = 0;
   1648          if (z > 255) z = 255;
   1649          output[i*comp + k] = (stbi_uc) stbi__float2int(z);
   1650       }
   1651       if (k < comp) {
   1652          float z = data[i*comp+k] * 255 + 0.5f;
   1653          if (z < 0) z = 0;
   1654          if (z > 255) z = 255;
   1655          output[i*comp + k] = (stbi_uc) stbi__float2int(z);
   1656       }
   1657    }
   1658    STBI_FREE(data);
   1659    return output;
   1660 }
   1661 #endif
   1662 
   1663 //////////////////////////////////////////////////////////////////////////////
   1664 //
   1665 //  "baseline" JPEG/JFIF decoder
   1666 //
   1667 //    simple implementation
   1668 //      - doesn't support delayed output of y-dimension
   1669 //      - simple interface (only one output format: 8-bit interleaved RGB)
   1670 //      - doesn't try to recover corrupt jpegs
   1671 //      - doesn't allow partial loading, loading multiple at once
   1672 //      - still fast on x86 (copying globals into locals doesn't help x86)
   1673 //      - allocates lots of intermediate memory (full size of all components)
   1674 //        - non-interleaved case requires this anyway
   1675 //        - allows good upsampling (see next)
   1676 //    high-quality
   1677 //      - upsampled channels are bilinearly interpolated, even across blocks
   1678 //      - quality integer IDCT derived from IJG's 'slow'
   1679 //    performance
   1680 //      - fast huffman; reasonable integer IDCT
   1681 //      - some SIMD kernels for common paths on targets with SSE2/NEON
   1682 //      - uses a lot of intermediate memory, could cache poorly
   1683 
   1684 #ifndef STBI_NO_JPEG
   1685 
   1686 // huffman decoding acceleration
   1687 #define FAST_BITS   9  // larger handles more cases; smaller stomps less cache
   1688 
   1689 typedef struct
   1690 {
   1691    stbi_uc  fast[1 << FAST_BITS];
   1692    // weirdly, repacking this into AoS is a 10% speed loss, instead of a win
   1693    stbi__uint16 code[256];
   1694    stbi_uc  values[256];
   1695    stbi_uc  size[257];
   1696    unsigned int maxcode[18];
   1697    int    delta[17];   // old 'firstsymbol' - old 'firstcode'
   1698 } stbi__huffman;
   1699 
   1700 typedef struct
   1701 {
   1702    stbi__context *s;
   1703    stbi__huffman huff_dc[4];
   1704    stbi__huffman huff_ac[4];
   1705    stbi__uint16 dequant[4][64];
   1706    stbi__int16 fast_ac[4][1 << FAST_BITS];
   1707 
   1708 // sizes for components, interleaved MCUs
   1709    int img_h_max, img_v_max;
   1710    int img_mcu_x, img_mcu_y;
   1711    int img_mcu_w, img_mcu_h;
   1712 
   1713 // definition of jpeg image component
   1714    struct
   1715    {
   1716       int id;
   1717       int h,v;
   1718       int tq;
   1719       int hd,ha;
   1720       int dc_pred;
   1721 
   1722       int x,y,w2,h2;
   1723       stbi_uc *data;
   1724       void *raw_data, *raw_coeff;
   1725       stbi_uc *linebuf;
   1726       short   *coeff;   // progressive only
   1727       int      coeff_w, coeff_h; // number of 8x8 coefficient blocks
   1728    } img_comp[4];
   1729 
   1730    stbi__uint32   code_buffer; // jpeg entropy-coded buffer
   1731    int            code_bits;   // number of valid bits
   1732    unsigned char  marker;      // marker seen while filling entropy buffer
   1733    int            nomore;      // flag if we saw a marker so must stop
   1734 
   1735    int            progressive;
   1736    int            spec_start;
   1737    int            spec_end;
   1738    int            succ_high;
   1739    int            succ_low;
   1740    int            eob_run;
   1741    int            jfif;
   1742    int            app14_color_transform; // Adobe APP14 tag
   1743    int            rgb;
   1744 
   1745    int scan_n, order[4];
   1746    int restart_interval, todo;
   1747 
   1748 // kernels
   1749    void (*idct_block_kernel)(stbi_uc *out, int out_stride, short data[64]);
   1750    void (*YCbCr_to_RGB_kernel)(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step);
   1751    stbi_uc *(*resample_row_hv_2_kernel)(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs);
   1752 } stbi__jpeg;
   1753 
   1754 static int stbi__build_huffman(stbi__huffman *h, int *count)
   1755 {
   1756    int i,j,k=0;
   1757    unsigned int code;
   1758    // build size list for each symbol (from JPEG spec)
   1759    for (i=0; i < 16; ++i)
   1760       for (j=0; j < count[i]; ++j)
   1761          h->size[k++] = (stbi_uc) (i+1);
   1762    h->size[k] = 0;
   1763 
   1764    // compute actual symbols (from jpeg spec)
   1765    code = 0;
   1766    k = 0;
   1767    for(j=1; j <= 16; ++j) {
   1768       // compute delta to add to code to compute symbol id
   1769       h->delta[j] = k - code;
   1770       if (h->size[k] == j) {
   1771          while (h->size[k] == j)
   1772             h->code[k++] = (stbi__uint16) (code++);
   1773          if (code-1 >= (1u << j)) return stbi__err("bad code lengths","Corrupt JPEG");
   1774       }
   1775       // compute largest code + 1 for this size, preshifted as needed later
   1776       h->maxcode[j] = code << (16-j);
   1777       code <<= 1;
   1778    }
   1779    h->maxcode[j] = 0xffffffff;
   1780 
   1781    // build non-spec acceleration table; 255 is flag for not-accelerated
   1782    memset(h->fast, 255, 1 << FAST_BITS);
   1783    for (i=0; i < k; ++i) {
   1784       int s = h->size[i];
   1785       if (s <= FAST_BITS) {
   1786          int c = h->code[i] << (FAST_BITS-s);
   1787          int m = 1 << (FAST_BITS-s);
   1788          for (j=0; j < m; ++j) {
   1789             h->fast[c+j] = (stbi_uc) i;
   1790          }
   1791       }
   1792    }
   1793    return 1;
   1794 }
   1795 
   1796 // build a table that decodes both magnitude and value of small ACs in
   1797 // one go.
   1798 static void stbi__build_fast_ac(stbi__int16 *fast_ac, stbi__huffman *h)
   1799 {
   1800    int i;
   1801    for (i=0; i < (1 << FAST_BITS); ++i) {
   1802       stbi_uc fast = h->fast[i];
   1803       fast_ac[i] = 0;
   1804       if (fast < 255) {
   1805          int rs = h->values[fast];
   1806          int run = (rs >> 4) & 15;
   1807          int magbits = rs & 15;
   1808          int len = h->size[fast];
   1809 
   1810          if (magbits && len + magbits <= FAST_BITS) {
   1811             // magnitude code followed by receive_extend code
   1812             int k = ((i << len) & ((1 << FAST_BITS) - 1)) >> (FAST_BITS - magbits);
   1813             int m = 1 << (magbits - 1);
   1814             if (k < m) k += (~0U << magbits) + 1;
   1815             // if the result is small enough, we can fit it in fast_ac table
   1816             if (k >= -128 && k <= 127)
   1817                fast_ac[i] = (stbi__int16) ((k * 256) + (run * 16) + (len + magbits));
   1818          }
   1819       }
   1820    }
   1821 }
   1822 
   1823 static void stbi__grow_buffer_unsafe(stbi__jpeg *j)
   1824 {
   1825    do {
   1826       unsigned int b = j->nomore ? 0 : stbi__get8(j->s);
   1827       if (b == 0xff) {
   1828          int c = stbi__get8(j->s);
   1829          while (c == 0xff) c = stbi__get8(j->s); // consume fill bytes
   1830          if (c != 0) {
   1831             j->marker = (unsigned char) c;
   1832             j->nomore = 1;
   1833             return;
   1834          }
   1835       }
   1836       j->code_buffer |= b << (24 - j->code_bits);
   1837       j->code_bits += 8;
   1838    } while (j->code_bits <= 24);
   1839 }
   1840 
   1841 // (1 << n) - 1
   1842 static const stbi__uint32 stbi__bmask[17]={0,1,3,7,15,31,63,127,255,511,1023,2047,4095,8191,16383,32767,65535};
   1843 
   1844 // decode a jpeg huffman value from the bitstream
   1845 stbi_inline static int stbi__jpeg_huff_decode(stbi__jpeg *j, stbi__huffman *h)
   1846 {
   1847    unsigned int temp;
   1848    int c,k;
   1849 
   1850    if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
   1851 
   1852    // look at the top FAST_BITS and determine what symbol ID it is,
   1853    // if the code is <= FAST_BITS
   1854    c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
   1855    k = h->fast[c];
   1856    if (k < 255) {
   1857       int s = h->size[k];
   1858       if (s > j->code_bits)
   1859          return -1;
   1860       j->code_buffer <<= s;
   1861       j->code_bits -= s;
   1862       return h->values[k];
   1863    }
   1864 
   1865    // naive test is to shift the code_buffer down so k bits are
   1866    // valid, then test against maxcode. To speed this up, we've
   1867    // preshifted maxcode left so that it has (16-k) 0s at the
   1868    // end; in other words, regardless of the number of bits, it
   1869    // wants to be compared against something shifted to have 16;
   1870    // that way we don't need to shift inside the loop.
   1871    temp = j->code_buffer >> 16;
   1872    for (k=FAST_BITS+1 ; ; ++k)
   1873       if (temp < h->maxcode[k])
   1874          break;
   1875    if (k == 17) {
   1876       // error! code not found
   1877       j->code_bits -= 16;
   1878       return -1;
   1879    }
   1880 
   1881    if (k > j->code_bits)
   1882       return -1;
   1883 
   1884    // convert the huffman code to the symbol id
   1885    c = ((j->code_buffer >> (32 - k)) & stbi__bmask[k]) + h->delta[k];
   1886    STBI_ASSERT((((j->code_buffer) >> (32 - h->size[c])) & stbi__bmask[h->size[c]]) == h->code[c]);
   1887 
   1888    // convert the id to a symbol
   1889    j->code_bits -= k;
   1890    j->code_buffer <<= k;
   1891    return h->values[c];
   1892 }
   1893 
   1894 // bias[n] = (-1<<n) + 1
   1895 static const int stbi__jbias[16] = {0,-1,-3,-7,-15,-31,-63,-127,-255,-511,-1023,-2047,-4095,-8191,-16383,-32767};
   1896 
   1897 // combined JPEG 'receive' and JPEG 'extend', since baseline
   1898 // always extends everything it receives.
   1899 stbi_inline static int stbi__extend_receive(stbi__jpeg *j, int n)
   1900 {
   1901    unsigned int k;
   1902    int sgn;
   1903    if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
   1904 
   1905    sgn = (stbi__int32)j->code_buffer >> 31; // sign bit is always in MSB
   1906    k = stbi_lrot(j->code_buffer, n);
   1907    STBI_ASSERT(n >= 0 && n < (int) (sizeof(stbi__bmask)/sizeof(*stbi__bmask)));
   1908    j->code_buffer = k & ~stbi__bmask[n];
   1909    k &= stbi__bmask[n];
   1910    j->code_bits -= n;
   1911    return k + (stbi__jbias[n] & ~sgn);
   1912 }
   1913 
   1914 // get some unsigned bits
   1915 stbi_inline static int stbi__jpeg_get_bits(stbi__jpeg *j, int n)
   1916 {
   1917    unsigned int k;
   1918    if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
   1919    k = stbi_lrot(j->code_buffer, n);
   1920    j->code_buffer = k & ~stbi__bmask[n];
   1921    k &= stbi__bmask[n];
   1922    j->code_bits -= n;
   1923    return k;
   1924 }
   1925 
   1926 stbi_inline static int stbi__jpeg_get_bit(stbi__jpeg *j)
   1927 {
   1928    unsigned int k;
   1929    if (j->code_bits < 1) stbi__grow_buffer_unsafe(j);
   1930    k = j->code_buffer;
   1931    j->code_buffer <<= 1;
   1932    --j->code_bits;
   1933    return k & 0x80000000;
   1934 }
   1935 
   1936 // given a value that's at position X in the zigzag stream,
   1937 // where does it appear in the 8x8 matrix coded as row-major?
   1938 static const stbi_uc stbi__jpeg_dezigzag[64+15] =
   1939 {
   1940     0,  1,  8, 16,  9,  2,  3, 10,
   1941    17, 24, 32, 25, 18, 11,  4,  5,
   1942    12, 19, 26, 33, 40, 48, 41, 34,
   1943    27, 20, 13,  6,  7, 14, 21, 28,
   1944    35, 42, 49, 56, 57, 50, 43, 36,
   1945    29, 22, 15, 23, 30, 37, 44, 51,
   1946    58, 59, 52, 45, 38, 31, 39, 46,
   1947    53, 60, 61, 54, 47, 55, 62, 63,
   1948    // let corrupt input sample past end
   1949    63, 63, 63, 63, 63, 63, 63, 63,
   1950    63, 63, 63, 63, 63, 63, 63
   1951 };
   1952 
   1953 // decode one 64-entry block--
   1954 static int stbi__jpeg_decode_block(stbi__jpeg *j, short data[64], stbi__huffman *hdc, stbi__huffman *hac, stbi__int16 *fac, int b, stbi__uint16 *dequant)
   1955 {
   1956    int diff,dc,k;
   1957    int t;
   1958 
   1959    if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
   1960    t = stbi__jpeg_huff_decode(j, hdc);
   1961    if (t < 0) return stbi__err("bad huffman code","Corrupt JPEG");
   1962 
   1963    // 0 all the ac values now so we can do it 32-bits at a time
   1964    memset(data,0,64*sizeof(data[0]));
   1965 
   1966    diff = t ? stbi__extend_receive(j, t) : 0;
   1967    dc = j->img_comp[b].dc_pred + diff;
   1968    j->img_comp[b].dc_pred = dc;
   1969    data[0] = (short) (dc * dequant[0]);
   1970 
   1971    // decode AC components, see JPEG spec
   1972    k = 1;
   1973    do {
   1974       unsigned int zig;
   1975       int c,r,s;
   1976       if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
   1977       c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
   1978       r = fac[c];
   1979       if (r) { // fast-AC path
   1980          k += (r >> 4) & 15; // run
   1981          s = r & 15; // combined length
   1982          j->code_buffer <<= s;
   1983          j->code_bits -= s;
   1984          // decode into unzigzag'd location
   1985          zig = stbi__jpeg_dezigzag[k++];
   1986          data[zig] = (short) ((r >> 8) * dequant[zig]);
   1987       } else {
   1988          int rs = stbi__jpeg_huff_decode(j, hac);
   1989          if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
   1990          s = rs & 15;
   1991          r = rs >> 4;
   1992          if (s == 0) {
   1993             if (rs != 0xf0) break; // end block
   1994             k += 16;
   1995          } else {
   1996             k += r;
   1997             // decode into unzigzag'd location
   1998             zig = stbi__jpeg_dezigzag[k++];
   1999             data[zig] = (short) (stbi__extend_receive(j,s) * dequant[zig]);
   2000          }
   2001       }
   2002    } while (k < 64);
   2003    return 1;
   2004 }
   2005 
   2006 static int stbi__jpeg_decode_block_prog_dc(stbi__jpeg *j, short data[64], stbi__huffman *hdc, int b)
   2007 {
   2008    int diff,dc;
   2009    int t;
   2010    if (j->spec_end != 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
   2011 
   2012    if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
   2013 
   2014    if (j->succ_high == 0) {
   2015       // first scan for DC coefficient, must be first
   2016       memset(data,0,64*sizeof(data[0])); // 0 all the ac values now
   2017       t = stbi__jpeg_huff_decode(j, hdc);
   2018       diff = t ? stbi__extend_receive(j, t) : 0;
   2019 
   2020       dc = j->img_comp[b].dc_pred + diff;
   2021       j->img_comp[b].dc_pred = dc;
   2022       data[0] = (short) (dc << j->succ_low);
   2023    } else {
   2024       // refinement scan for DC coefficient
   2025       if (stbi__jpeg_get_bit(j))
   2026          data[0] += (short) (1 << j->succ_low);
   2027    }
   2028    return 1;
   2029 }
   2030 
   2031 // @OPTIMIZE: store non-zigzagged during the decode passes,
   2032 // and only de-zigzag when dequantizing
   2033 static int stbi__jpeg_decode_block_prog_ac(stbi__jpeg *j, short data[64], stbi__huffman *hac, stbi__int16 *fac)
   2034 {
   2035    int k;
   2036    if (j->spec_start == 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
   2037 
   2038    if (j->succ_high == 0) {
   2039       int shift = j->succ_low;
   2040 
   2041       if (j->eob_run) {
   2042          --j->eob_run;
   2043          return 1;
   2044       }
   2045 
   2046       k = j->spec_start;
   2047       do {
   2048          unsigned int zig;
   2049          int c,r,s;
   2050          if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
   2051          c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
   2052          r = fac[c];
   2053          if (r) { // fast-AC path
   2054             k += (r >> 4) & 15; // run
   2055             s = r & 15; // combined length
   2056             j->code_buffer <<= s;
   2057             j->code_bits -= s;
   2058             zig = stbi__jpeg_dezigzag[k++];
   2059             data[zig] = (short) ((r >> 8) << shift);
   2060          } else {
   2061             int rs = stbi__jpeg_huff_decode(j, hac);
   2062             if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
   2063             s = rs & 15;
   2064             r = rs >> 4;
   2065             if (s == 0) {
   2066                if (r < 15) {
   2067                   j->eob_run = (1 << r);
   2068                   if (r)
   2069                      j->eob_run += stbi__jpeg_get_bits(j, r);
   2070                   --j->eob_run;
   2071                   break;
   2072                }
   2073                k += 16;
   2074             } else {
   2075                k += r;
   2076                zig = stbi__jpeg_dezigzag[k++];
   2077                data[zig] = (short) (stbi__extend_receive(j,s) << shift);
   2078             }
   2079          }
   2080       } while (k <= j->spec_end);
   2081    } else {
   2082       // refinement scan for these AC coefficients
   2083 
   2084       short bit = (short) (1 << j->succ_low);
   2085 
   2086       if (j->eob_run) {
   2087          --j->eob_run;
   2088          for (k = j->spec_start; k <= j->spec_end; ++k) {
   2089             short *p = &data[stbi__jpeg_dezigzag[k]];
   2090             if (*p != 0)
   2091                if (stbi__jpeg_get_bit(j))
   2092                   if ((*p & bit)==0) {
   2093                      if (*p > 0)
   2094                         *p += bit;
   2095                      else
   2096                         *p -= bit;
   2097                   }
   2098          }
   2099       } else {
   2100          k = j->spec_start;
   2101          do {
   2102             int r,s;
   2103             int rs = stbi__jpeg_huff_decode(j, hac); // @OPTIMIZE see if we can use the fast path here, advance-by-r is so slow, eh
   2104             if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
   2105             s = rs & 15;
   2106             r = rs >> 4;
   2107             if (s == 0) {
   2108                if (r < 15) {
   2109                   j->eob_run = (1 << r) - 1;
   2110                   if (r)
   2111                      j->eob_run += stbi__jpeg_get_bits(j, r);
   2112                   r = 64; // force end of block
   2113                } else {
   2114                   // r=15 s=0 should write 16 0s, so we just do
   2115                   // a run of 15 0s and then write s (which is 0),
   2116                   // so we don't have to do anything special here
   2117                }
   2118             } else {
   2119                if (s != 1) return stbi__err("bad huffman code", "Corrupt JPEG");
   2120                // sign bit
   2121                if (stbi__jpeg_get_bit(j))
   2122                   s = bit;
   2123                else
   2124                   s = -bit;
   2125             }
   2126 
   2127             // advance by r
   2128             while (k <= j->spec_end) {
   2129                short *p = &data[stbi__jpeg_dezigzag[k++]];
   2130                if (*p != 0) {
   2131                   if (stbi__jpeg_get_bit(j))
   2132                      if ((*p & bit)==0) {
   2133                         if (*p > 0)
   2134                            *p += bit;
   2135                         else
   2136                            *p -= bit;
   2137                      }
   2138                } else {
   2139                   if (r == 0) {
   2140                      *p = (short) s;
   2141                      break;
   2142                   }
   2143                   --r;
   2144                }
   2145             }
   2146          } while (k <= j->spec_end);
   2147       }
   2148    }
   2149    return 1;
   2150 }
   2151 
   2152 // take a -128..127 value and stbi__clamp it and convert to 0..255
   2153 stbi_inline static stbi_uc stbi__clamp(int x)
   2154 {
   2155    // trick to use a single test to catch both cases
   2156    if ((unsigned int) x > 255) {
   2157       if (x < 0) return 0;
   2158       if (x > 255) return 255;
   2159    }
   2160    return (stbi_uc) x;
   2161 }
   2162 
   2163 #define stbi__f2f(x)  ((int) (((x) * 4096 + 0.5)))
   2164 #define stbi__fsh(x)  ((x) * 4096)
   2165 
   2166 // derived from jidctint -- DCT_ISLOW
   2167 #define STBI__IDCT_1D(s0,s1,s2,s3,s4,s5,s6,s7) \
   2168    int t0,t1,t2,t3,p1,p2,p3,p4,p5,x0,x1,x2,x3; \
   2169    p2 = s2;                                    \
   2170    p3 = s6;                                    \
   2171    p1 = (p2+p3) * stbi__f2f(0.5411961f);       \
   2172    t2 = p1 + p3*stbi__f2f(-1.847759065f);      \
   2173    t3 = p1 + p2*stbi__f2f( 0.765366865f);      \
   2174    p2 = s0;                                    \
   2175    p3 = s4;                                    \
   2176    t0 = stbi__fsh(p2+p3);                      \
   2177    t1 = stbi__fsh(p2-p3);                      \
   2178    x0 = t0+t3;                                 \
   2179    x3 = t0-t3;                                 \
   2180    x1 = t1+t2;                                 \
   2181    x2 = t1-t2;                                 \
   2182    t0 = s7;                                    \
   2183    t1 = s5;                                    \
   2184    t2 = s3;                                    \
   2185    t3 = s1;                                    \
   2186    p3 = t0+t2;                                 \
   2187    p4 = t1+t3;                                 \
   2188    p1 = t0+t3;                                 \
   2189    p2 = t1+t2;                                 \
   2190    p5 = (p3+p4)*stbi__f2f( 1.175875602f);      \
   2191    t0 = t0*stbi__f2f( 0.298631336f);           \
   2192    t1 = t1*stbi__f2f( 2.053119869f);           \
   2193    t2 = t2*stbi__f2f( 3.072711026f);           \
   2194    t3 = t3*stbi__f2f( 1.501321110f);           \
   2195    p1 = p5 + p1*stbi__f2f(-0.899976223f);      \
   2196    p2 = p5 + p2*stbi__f2f(-2.562915447f);      \
   2197    p3 = p3*stbi__f2f(-1.961570560f);           \
   2198    p4 = p4*stbi__f2f(-0.390180644f);           \
   2199    t3 += p1+p4;                                \
   2200    t2 += p2+p3;                                \
   2201    t1 += p2+p4;                                \
   2202    t0 += p1+p3;
   2203 
   2204 static void stbi__idct_block(stbi_uc *out, int out_stride, short data[64])
   2205 {
   2206    int i,val[64],*v=val;
   2207    stbi_uc *o;
   2208    short *d = data;
   2209 
   2210    // columns
   2211    for (i=0; i < 8; ++i,++d, ++v) {
   2212       // if all zeroes, shortcut -- this avoids dequantizing 0s and IDCTing
   2213       if (d[ 8]==0 && d[16]==0 && d[24]==0 && d[32]==0
   2214            && d[40]==0 && d[48]==0 && d[56]==0) {
   2215          //    no shortcut                 0     seconds
   2216          //    (1|2|3|4|5|6|7)==0          0     seconds
   2217          //    all separate               -0.047 seconds
   2218          //    1 && 2|3 && 4|5 && 6|7:    -0.047 seconds
   2219          int dcterm = d[0]*4;
   2220          v[0] = v[8] = v[16] = v[24] = v[32] = v[40] = v[48] = v[56] = dcterm;
   2221       } else {
   2222          STBI__IDCT_1D(d[ 0],d[ 8],d[16],d[24],d[32],d[40],d[48],d[56])
   2223          // constants scaled things up by 1<<12; let's bring them back
   2224          // down, but keep 2 extra bits of precision
   2225          x0 += 512; x1 += 512; x2 += 512; x3 += 512;
   2226          v[ 0] = (x0+t3) >> 10;
   2227          v[56] = (x0-t3) >> 10;
   2228          v[ 8] = (x1+t2) >> 10;
   2229          v[48] = (x1-t2) >> 10;
   2230          v[16] = (x2+t1) >> 10;
   2231          v[40] = (x2-t1) >> 10;
   2232          v[24] = (x3+t0) >> 10;
   2233          v[32] = (x3-t0) >> 10;
   2234       }
   2235    }
   2236 
   2237    for (i=0, v=val, o=out; i < 8; ++i,v+=8,o+=out_stride) {
   2238       // no fast case since the first 1D IDCT spread components out
   2239       STBI__IDCT_1D(v[0],v[1],v[2],v[3],v[4],v[5],v[6],v[7])
   2240       // constants scaled things up by 1<<12, plus we had 1<<2 from first
   2241       // loop, plus horizontal and vertical each scale by sqrt(8) so together
   2242       // we've got an extra 1<<3, so 1<<17 total we need to remove.
   2243       // so we want to round that, which means adding 0.5 * 1<<17,
   2244       // aka 65536. Also, we'll end up with -128 to 127 that we want
   2245       // to encode as 0..255 by adding 128, so we'll add that before the shift
   2246       x0 += 65536 + (128<<17);
   2247       x1 += 65536 + (128<<17);
   2248       x2 += 65536 + (128<<17);
   2249       x3 += 65536 + (128<<17);
   2250       // tried computing the shifts into temps, or'ing the temps to see
   2251       // if any were out of range, but that was slower
   2252       o[0] = stbi__clamp((x0+t3) >> 17);
   2253       o[7] = stbi__clamp((x0-t3) >> 17);
   2254       o[1] = stbi__clamp((x1+t2) >> 17);
   2255       o[6] = stbi__clamp((x1-t2) >> 17);
   2256       o[2] = stbi__clamp((x2+t1) >> 17);
   2257       o[5] = stbi__clamp((x2-t1) >> 17);
   2258       o[3] = stbi__clamp((x3+t0) >> 17);
   2259       o[4] = stbi__clamp((x3-t0) >> 17);
   2260    }
   2261 }
   2262 
   2263 #ifdef STBI_SSE2
   2264 // sse2 integer IDCT. not the fastest possible implementation but it
   2265 // produces bit-identical results to the generic C version so it's
   2266 // fully "transparent".
   2267 static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
   2268 {
   2269    // This is constructed to match our regular (generic) integer IDCT exactly.
   2270    __m128i row0, row1, row2, row3, row4, row5, row6, row7;
   2271    __m128i tmp;
   2272 
   2273    // dot product constant: even elems=x, odd elems=y
   2274    #define dct_const(x,y)  _mm_setr_epi16((x),(y),(x),(y),(x),(y),(x),(y))
   2275 
   2276    // out(0) = c0[even]*x + c0[odd]*y   (c0, x, y 16-bit, out 32-bit)
   2277    // out(1) = c1[even]*x + c1[odd]*y
   2278    #define dct_rot(out0,out1, x,y,c0,c1) \
   2279       __m128i c0##lo = _mm_unpacklo_epi16((x),(y)); \
   2280       __m128i c0##hi = _mm_unpackhi_epi16((x),(y)); \
   2281       __m128i out0##_l = _mm_madd_epi16(c0##lo, c0); \
   2282       __m128i out0##_h = _mm_madd_epi16(c0##hi, c0); \
   2283       __m128i out1##_l = _mm_madd_epi16(c0##lo, c1); \
   2284       __m128i out1##_h = _mm_madd_epi16(c0##hi, c1)
   2285 
   2286    // out = in << 12  (in 16-bit, out 32-bit)
   2287    #define dct_widen(out, in) \
   2288       __m128i out##_l = _mm_srai_epi32(_mm_unpacklo_epi16(_mm_setzero_si128(), (in)), 4); \
   2289       __m128i out##_h = _mm_srai_epi32(_mm_unpackhi_epi16(_mm_setzero_si128(), (in)), 4)
   2290 
   2291    // wide add
   2292    #define dct_wadd(out, a, b) \
   2293       __m128i out##_l = _mm_add_epi32(a##_l, b##_l); \
   2294       __m128i out##_h = _mm_add_epi32(a##_h, b##_h)
   2295 
   2296    // wide sub
   2297    #define dct_wsub(out, a, b) \
   2298       __m128i out##_l = _mm_sub_epi32(a##_l, b##_l); \
   2299       __m128i out##_h = _mm_sub_epi32(a##_h, b##_h)
   2300 
   2301    // butterfly a/b, add bias, then shift by "s" and pack
   2302    #define dct_bfly32o(out0, out1, a,b,bias,s) \
   2303       { \
   2304          __m128i abiased_l = _mm_add_epi32(a##_l, bias); \
   2305          __m128i abiased_h = _mm_add_epi32(a##_h, bias); \
   2306          dct_wadd(sum, abiased, b); \
   2307          dct_wsub(dif, abiased, b); \
   2308          out0 = _mm_packs_epi32(_mm_srai_epi32(sum_l, s), _mm_srai_epi32(sum_h, s)); \
   2309          out1 = _mm_packs_epi32(_mm_srai_epi32(dif_l, s), _mm_srai_epi32(dif_h, s)); \
   2310       }
   2311 
   2312    // 8-bit interleave step (for transposes)
   2313    #define dct_interleave8(a, b) \
   2314       tmp = a; \
   2315       a = _mm_unpacklo_epi8(a, b); \
   2316       b = _mm_unpackhi_epi8(tmp, b)
   2317 
   2318    // 16-bit interleave step (for transposes)
   2319    #define dct_interleave16(a, b) \
   2320       tmp = a; \
   2321       a = _mm_unpacklo_epi16(a, b); \
   2322       b = _mm_unpackhi_epi16(tmp, b)
   2323 
   2324    #define dct_pass(bias,shift) \
   2325       { \
   2326          /* even part */ \
   2327          dct_rot(t2e,t3e, row2,row6, rot0_0,rot0_1); \
   2328          __m128i sum04 = _mm_add_epi16(row0, row4); \
   2329          __m128i dif04 = _mm_sub_epi16(row0, row4); \
   2330          dct_widen(t0e, sum04); \
   2331          dct_widen(t1e, dif04); \
   2332          dct_wadd(x0, t0e, t3e); \
   2333          dct_wsub(x3, t0e, t3e); \
   2334          dct_wadd(x1, t1e, t2e); \
   2335          dct_wsub(x2, t1e, t2e); \
   2336          /* odd part */ \
   2337          dct_rot(y0o,y2o, row7,row3, rot2_0,rot2_1); \
   2338          dct_rot(y1o,y3o, row5,row1, rot3_0,rot3_1); \
   2339          __m128i sum17 = _mm_add_epi16(row1, row7); \
   2340          __m128i sum35 = _mm_add_epi16(row3, row5); \
   2341          dct_rot(y4o,y5o, sum17,sum35, rot1_0,rot1_1); \
   2342          dct_wadd(x4, y0o, y4o); \
   2343          dct_wadd(x5, y1o, y5o); \
   2344          dct_wadd(x6, y2o, y5o); \
   2345          dct_wadd(x7, y3o, y4o); \
   2346          dct_bfly32o(row0,row7, x0,x7,bias,shift); \
   2347          dct_bfly32o(row1,row6, x1,x6,bias,shift); \
   2348          dct_bfly32o(row2,row5, x2,x5,bias,shift); \
   2349          dct_bfly32o(row3,row4, x3,x4,bias,shift); \
   2350       }
   2351 
   2352    __m128i rot0_0 = dct_const(stbi__f2f(0.5411961f), stbi__f2f(0.5411961f) + stbi__f2f(-1.847759065f));
   2353    __m128i rot0_1 = dct_const(stbi__f2f(0.5411961f) + stbi__f2f( 0.765366865f), stbi__f2f(0.5411961f));
   2354    __m128i rot1_0 = dct_const(stbi__f2f(1.175875602f) + stbi__f2f(-0.899976223f), stbi__f2f(1.175875602f));
   2355    __m128i rot1_1 = dct_const(stbi__f2f(1.175875602f), stbi__f2f(1.175875602f) + stbi__f2f(-2.562915447f));
   2356    __m128i rot2_0 = dct_const(stbi__f2f(-1.961570560f) + stbi__f2f( 0.298631336f), stbi__f2f(-1.961570560f));
   2357    __m128i rot2_1 = dct_const(stbi__f2f(-1.961570560f), stbi__f2f(-1.961570560f) + stbi__f2f( 3.072711026f));
   2358    __m128i rot3_0 = dct_const(stbi__f2f(-0.390180644f) + stbi__f2f( 2.053119869f), stbi__f2f(-0.390180644f));
   2359    __m128i rot3_1 = dct_const(stbi__f2f(-0.390180644f), stbi__f2f(-0.390180644f) + stbi__f2f( 1.501321110f));
   2360 
   2361    // rounding biases in column/row passes, see stbi__idct_block for explanation.
   2362    __m128i bias_0 = _mm_set1_epi32(512);
   2363    __m128i bias_1 = _mm_set1_epi32(65536 + (128<<17));
   2364 
   2365    // load
   2366    row0 = _mm_load_si128((const __m128i *) (data + 0*8));
   2367    row1 = _mm_load_si128((const __m128i *) (data + 1*8));
   2368    row2 = _mm_load_si128((const __m128i *) (data + 2*8));
   2369    row3 = _mm_load_si128((const __m128i *) (data + 3*8));
   2370    row4 = _mm_load_si128((const __m128i *) (data + 4*8));
   2371    row5 = _mm_load_si128((const __m128i *) (data + 5*8));
   2372    row6 = _mm_load_si128((const __m128i *) (data + 6*8));
   2373    row7 = _mm_load_si128((const __m128i *) (data + 7*8));
   2374 
   2375    // column pass
   2376    dct_pass(bias_0, 10);
   2377 
   2378    {
   2379       // 16bit 8x8 transpose pass 1
   2380       dct_interleave16(row0, row4);
   2381       dct_interleave16(row1, row5);
   2382       dct_interleave16(row2, row6);
   2383       dct_interleave16(row3, row7);
   2384 
   2385       // transpose pass 2
   2386       dct_interleave16(row0, row2);
   2387       dct_interleave16(row1, row3);
   2388       dct_interleave16(row4, row6);
   2389       dct_interleave16(row5, row7);
   2390 
   2391       // transpose pass 3
   2392       dct_interleave16(row0, row1);
   2393       dct_interleave16(row2, row3);
   2394       dct_interleave16(row4, row5);
   2395       dct_interleave16(row6, row7);
   2396    }
   2397 
   2398    // row pass
   2399    dct_pass(bias_1, 17);
   2400 
   2401    {
   2402       // pack
   2403       __m128i p0 = _mm_packus_epi16(row0, row1); // a0a1a2a3...a7b0b1b2b3...b7
   2404       __m128i p1 = _mm_packus_epi16(row2, row3);
   2405       __m128i p2 = _mm_packus_epi16(row4, row5);
   2406       __m128i p3 = _mm_packus_epi16(row6, row7);
   2407 
   2408       // 8bit 8x8 transpose pass 1
   2409       dct_interleave8(p0, p2); // a0e0a1e1...
   2410       dct_interleave8(p1, p3); // c0g0c1g1...
   2411 
   2412       // transpose pass 2
   2413       dct_interleave8(p0, p1); // a0c0e0g0...
   2414       dct_interleave8(p2, p3); // b0d0f0h0...
   2415 
   2416       // transpose pass 3
   2417       dct_interleave8(p0, p2); // a0b0c0d0...
   2418       dct_interleave8(p1, p3); // a4b4c4d4...
   2419 
   2420       // store
   2421       _mm_storel_epi64((__m128i *) out, p0); out += out_stride;
   2422       _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p0, 0x4e)); out += out_stride;
   2423       _mm_storel_epi64((__m128i *) out, p2); out += out_stride;
   2424       _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p2, 0x4e)); out += out_stride;
   2425       _mm_storel_epi64((__m128i *) out, p1); out += out_stride;
   2426       _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p1, 0x4e)); out += out_stride;
   2427       _mm_storel_epi64((__m128i *) out, p3); out += out_stride;
   2428       _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p3, 0x4e));
   2429    }
   2430 
   2431 #undef dct_const
   2432 #undef dct_rot
   2433 #undef dct_widen
   2434 #undef dct_wadd
   2435 #undef dct_wsub
   2436 #undef dct_bfly32o
   2437 #undef dct_interleave8
   2438 #undef dct_interleave16
   2439 #undef dct_pass
   2440 }
   2441 
   2442 #endif // STBI_SSE2
   2443 
   2444 #ifdef STBI_NEON
   2445 
   2446 // NEON integer IDCT. should produce bit-identical
   2447 // results to the generic C version.
   2448 static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
   2449 {
   2450    int16x8_t row0, row1, row2, row3, row4, row5, row6, row7;
   2451 
   2452    int16x4_t rot0_0 = vdup_n_s16(stbi__f2f(0.5411961f));
   2453    int16x4_t rot0_1 = vdup_n_s16(stbi__f2f(-1.847759065f));
   2454    int16x4_t rot0_2 = vdup_n_s16(stbi__f2f( 0.765366865f));
   2455    int16x4_t rot1_0 = vdup_n_s16(stbi__f2f( 1.175875602f));
   2456    int16x4_t rot1_1 = vdup_n_s16(stbi__f2f(-0.899976223f));
   2457    int16x4_t rot1_2 = vdup_n_s16(stbi__f2f(-2.562915447f));
   2458    int16x4_t rot2_0 = vdup_n_s16(stbi__f2f(-1.961570560f));
   2459    int16x4_t rot2_1 = vdup_n_s16(stbi__f2f(-0.390180644f));
   2460    int16x4_t rot3_0 = vdup_n_s16(stbi__f2f( 0.298631336f));
   2461    int16x4_t rot3_1 = vdup_n_s16(stbi__f2f( 2.053119869f));
   2462    int16x4_t rot3_2 = vdup_n_s16(stbi__f2f( 3.072711026f));
   2463    int16x4_t rot3_3 = vdup_n_s16(stbi__f2f( 1.501321110f));
   2464 
   2465 #define dct_long_mul(out, inq, coeff) \
   2466    int32x4_t out##_l = vmull_s16(vget_low_s16(inq), coeff); \
   2467    int32x4_t out##_h = vmull_s16(vget_high_s16(inq), coeff)
   2468 
   2469 #define dct_long_mac(out, acc, inq, coeff) \
   2470    int32x4_t out##_l = vmlal_s16(acc##_l, vget_low_s16(inq), coeff); \
   2471    int32x4_t out##_h = vmlal_s16(acc##_h, vget_high_s16(inq), coeff)
   2472 
   2473 #define dct_widen(out, inq) \
   2474    int32x4_t out##_l = vshll_n_s16(vget_low_s16(inq), 12); \
   2475    int32x4_t out##_h = vshll_n_s16(vget_high_s16(inq), 12)
   2476 
   2477 // wide add
   2478 #define dct_wadd(out, a, b) \
   2479    int32x4_t out##_l = vaddq_s32(a##_l, b##_l); \
   2480    int32x4_t out##_h = vaddq_s32(a##_h, b##_h)
   2481 
   2482 // wide sub
   2483 #define dct_wsub(out, a, b) \
   2484    int32x4_t out##_l = vsubq_s32(a##_l, b##_l); \
   2485    int32x4_t out##_h = vsubq_s32(a##_h, b##_h)
   2486 
   2487 // butterfly a/b, then shift using "shiftop" by "s" and pack
   2488 #define dct_bfly32o(out0,out1, a,b,shiftop,s) \
   2489    { \
   2490       dct_wadd(sum, a, b); \
   2491       dct_wsub(dif, a, b); \
   2492       out0 = vcombine_s16(shiftop(sum_l, s), shiftop(sum_h, s)); \
   2493       out1 = vcombine_s16(shiftop(dif_l, s), shiftop(dif_h, s)); \
   2494    }
   2495 
   2496 #define dct_pass(shiftop, shift) \
   2497    { \
   2498       /* even part */ \
   2499       int16x8_t sum26 = vaddq_s16(row2, row6); \
   2500       dct_long_mul(p1e, sum26, rot0_0); \
   2501       dct_long_mac(t2e, p1e, row6, rot0_1); \
   2502       dct_long_mac(t3e, p1e, row2, rot0_2); \
   2503       int16x8_t sum04 = vaddq_s16(row0, row4); \
   2504       int16x8_t dif04 = vsubq_s16(row0, row4); \
   2505       dct_widen(t0e, sum04); \
   2506       dct_widen(t1e, dif04); \
   2507       dct_wadd(x0, t0e, t3e); \
   2508       dct_wsub(x3, t0e, t3e); \
   2509       dct_wadd(x1, t1e, t2e); \
   2510       dct_wsub(x2, t1e, t2e); \
   2511       /* odd part */ \
   2512       int16x8_t sum15 = vaddq_s16(row1, row5); \
   2513       int16x8_t sum17 = vaddq_s16(row1, row7); \
   2514       int16x8_t sum35 = vaddq_s16(row3, row5); \
   2515       int16x8_t sum37 = vaddq_s16(row3, row7); \
   2516       int16x8_t sumodd = vaddq_s16(sum17, sum35); \
   2517       dct_long_mul(p5o, sumodd, rot1_0); \
   2518       dct_long_mac(p1o, p5o, sum17, rot1_1); \
   2519       dct_long_mac(p2o, p5o, sum35, rot1_2); \
   2520       dct_long_mul(p3o, sum37, rot2_0); \
   2521       dct_long_mul(p4o, sum15, rot2_1); \
   2522       dct_wadd(sump13o, p1o, p3o); \
   2523       dct_wadd(sump24o, p2o, p4o); \
   2524       dct_wadd(sump23o, p2o, p3o); \
   2525       dct_wadd(sump14o, p1o, p4o); \
   2526       dct_long_mac(x4, sump13o, row7, rot3_0); \
   2527       dct_long_mac(x5, sump24o, row5, rot3_1); \
   2528       dct_long_mac(x6, sump23o, row3, rot3_2); \
   2529       dct_long_mac(x7, sump14o, row1, rot3_3); \
   2530       dct_bfly32o(row0,row7, x0,x7,shiftop,shift); \
   2531       dct_bfly32o(row1,row6, x1,x6,shiftop,shift); \
   2532       dct_bfly32o(row2,row5, x2,x5,shiftop,shift); \
   2533       dct_bfly32o(row3,row4, x3,x4,shiftop,shift); \
   2534    }
   2535 
   2536    // load
   2537    row0 = vld1q_s16(data + 0*8);
   2538    row1 = vld1q_s16(data + 1*8);
   2539    row2 = vld1q_s16(data + 2*8);
   2540    row3 = vld1q_s16(data + 3*8);
   2541    row4 = vld1q_s16(data + 4*8);
   2542    row5 = vld1q_s16(data + 5*8);
   2543    row6 = vld1q_s16(data + 6*8);
   2544    row7 = vld1q_s16(data + 7*8);
   2545 
   2546    // add DC bias
   2547    row0 = vaddq_s16(row0, vsetq_lane_s16(1024, vdupq_n_s16(0), 0));
   2548 
   2549    // column pass
   2550    dct_pass(vrshrn_n_s32, 10);
   2551 
   2552    // 16bit 8x8 transpose
   2553    {
   2554 // these three map to a single VTRN.16, VTRN.32, and VSWP, respectively.
   2555 // whether compilers actually get this is another story, sadly.
   2556 #define dct_trn16(x, y) { int16x8x2_t t = vtrnq_s16(x, y); x = t.val[0]; y = t.val[1]; }
   2557 #define dct_trn32(x, y) { int32x4x2_t t = vtrnq_s32(vreinterpretq_s32_s16(x), vreinterpretq_s32_s16(y)); x = vreinterpretq_s16_s32(t.val[0]); y = vreinterpretq_s16_s32(t.val[1]); }
   2558 #define dct_trn64(x, y) { int16x8_t x0 = x; int16x8_t y0 = y; x = vcombine_s16(vget_low_s16(x0), vget_low_s16(y0)); y = vcombine_s16(vget_high_s16(x0), vget_high_s16(y0)); }
   2559 
   2560       // pass 1
   2561       dct_trn16(row0, row1); // a0b0a2b2a4b4a6b6
   2562       dct_trn16(row2, row3);
   2563       dct_trn16(row4, row5);
   2564       dct_trn16(row6, row7);
   2565 
   2566       // pass 2
   2567       dct_trn32(row0, row2); // a0b0c0d0a4b4c4d4
   2568       dct_trn32(row1, row3);
   2569       dct_trn32(row4, row6);
   2570       dct_trn32(row5, row7);
   2571 
   2572       // pass 3
   2573       dct_trn64(row0, row4); // a0b0c0d0e0f0g0h0
   2574       dct_trn64(row1, row5);
   2575       dct_trn64(row2, row6);
   2576       dct_trn64(row3, row7);
   2577 
   2578 #undef dct_trn16
   2579 #undef dct_trn32
   2580 #undef dct_trn64
   2581    }
   2582 
   2583    // row pass
   2584    // vrshrn_n_s32 only supports shifts up to 16, we need
   2585    // 17. so do a non-rounding shift of 16 first then follow
   2586    // up with a rounding shift by 1.
   2587    dct_pass(vshrn_n_s32, 16);
   2588 
   2589    {
   2590       // pack and round
   2591       uint8x8_t p0 = vqrshrun_n_s16(row0, 1);
   2592       uint8x8_t p1 = vqrshrun_n_s16(row1, 1);
   2593       uint8x8_t p2 = vqrshrun_n_s16(row2, 1);
   2594       uint8x8_t p3 = vqrshrun_n_s16(row3, 1);
   2595       uint8x8_t p4 = vqrshrun_n_s16(row4, 1);
   2596       uint8x8_t p5 = vqrshrun_n_s16(row5, 1);
   2597       uint8x8_t p6 = vqrshrun_n_s16(row6, 1);
   2598       uint8x8_t p7 = vqrshrun_n_s16(row7, 1);
   2599 
   2600       // again, these can translate into one instruction, but often don't.
   2601 #define dct_trn8_8(x, y) { uint8x8x2_t t = vtrn_u8(x, y); x = t.val[0]; y = t.val[1]; }
   2602 #define dct_trn8_16(x, y) { uint16x4x2_t t = vtrn_u16(vreinterpret_u16_u8(x), vreinterpret_u16_u8(y)); x = vreinterpret_u8_u16(t.val[0]); y = vreinterpret_u8_u16(t.val[1]); }
   2603 #define dct_trn8_32(x, y) { uint32x2x2_t t = vtrn_u32(vreinterpret_u32_u8(x), vreinterpret_u32_u8(y)); x = vreinterpret_u8_u32(t.val[0]); y = vreinterpret_u8_u32(t.val[1]); }
   2604 
   2605       // sadly can't use interleaved stores here since we only write
   2606       // 8 bytes to each scan line!
   2607 
   2608       // 8x8 8-bit transpose pass 1
   2609       dct_trn8_8(p0, p1);
   2610       dct_trn8_8(p2, p3);
   2611       dct_trn8_8(p4, p5);
   2612       dct_trn8_8(p6, p7);
   2613 
   2614       // pass 2
   2615       dct_trn8_16(p0, p2);
   2616       dct_trn8_16(p1, p3);
   2617       dct_trn8_16(p4, p6);
   2618       dct_trn8_16(p5, p7);
   2619 
   2620       // pass 3
   2621       dct_trn8_32(p0, p4);
   2622       dct_trn8_32(p1, p5);
   2623       dct_trn8_32(p2, p6);
   2624       dct_trn8_32(p3, p7);
   2625 
   2626       // store
   2627       vst1_u8(out, p0); out += out_stride;
   2628       vst1_u8(out, p1); out += out_stride;
   2629       vst1_u8(out, p2); out += out_stride;
   2630       vst1_u8(out, p3); out += out_stride;
   2631       vst1_u8(out, p4); out += out_stride;
   2632       vst1_u8(out, p5); out += out_stride;
   2633       vst1_u8(out, p6); out += out_stride;
   2634       vst1_u8(out, p7);
   2635 
   2636 #undef dct_trn8_8
   2637 #undef dct_trn8_16
   2638 #undef dct_trn8_32
   2639    }
   2640 
   2641 #undef dct_long_mul
   2642 #undef dct_long_mac
   2643 #undef dct_widen
   2644 #undef dct_wadd
   2645 #undef dct_wsub
   2646 #undef dct_bfly32o
   2647 #undef dct_pass
   2648 }
   2649 
   2650 #endif // STBI_NEON
   2651 
   2652 #define STBI__MARKER_none  0xff
   2653 // if there's a pending marker from the entropy stream, return that
   2654 // otherwise, fetch from the stream and get a marker. if there's no
   2655 // marker, return 0xff, which is never a valid marker value
   2656 static stbi_uc stbi__get_marker(stbi__jpeg *j)
   2657 {
   2658    stbi_uc x;
   2659    if (j->marker != STBI__MARKER_none) { x = j->marker; j->marker = STBI__MARKER_none; return x; }
   2660    x = stbi__get8(j->s);
   2661    if (x != 0xff) return STBI__MARKER_none;
   2662    while (x == 0xff)
   2663       x = stbi__get8(j->s); // consume repeated 0xff fill bytes
   2664    return x;
   2665 }
   2666 
   2667 // in each scan, we'll have scan_n components, and the order
   2668 // of the components is specified by order[]
   2669 #define STBI__RESTART(x)     ((x) >= 0xd0 && (x) <= 0xd7)
   2670 
   2671 // after a restart interval, stbi__jpeg_reset the entropy decoder and
   2672 // the dc prediction
   2673 static void stbi__jpeg_reset(stbi__jpeg *j)
   2674 {
   2675    j->code_bits = 0;
   2676    j->code_buffer = 0;
   2677    j->nomore = 0;
   2678    j->img_comp[0].dc_pred = j->img_comp[1].dc_pred = j->img_comp[2].dc_pred = j->img_comp[3].dc_pred = 0;
   2679    j->marker = STBI__MARKER_none;
   2680    j->todo = j->restart_interval ? j->restart_interval : 0x7fffffff;
   2681    j->eob_run = 0;
   2682    // no more than 1<<31 MCUs if no restart_interal? that's plenty safe,
   2683    // since we don't even allow 1<<30 pixels
   2684 }
   2685 
   2686 static int stbi__parse_entropy_coded_data(stbi__jpeg *z)
   2687 {
   2688    stbi__jpeg_reset(z);
   2689    if (!z->progressive) {
   2690       if (z->scan_n == 1) {
   2691          int i,j;
   2692          STBI_SIMD_ALIGN(short, data[64]);
   2693          int n = z->order[0];
   2694          // non-interleaved data, we just need to process one block at a time,
   2695          // in trivial scanline order
   2696          // number of blocks to do just depends on how many actual "pixels" this
   2697          // component has, independent of interleaved MCU blocking and such
   2698          int w = (z->img_comp[n].x+7) >> 3;
   2699          int h = (z->img_comp[n].y+7) >> 3;
   2700          for (j=0; j < h; ++j) {
   2701             for (i=0; i < w; ++i) {
   2702                int ha = z->img_comp[n].ha;
   2703                if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
   2704                z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
   2705                // every data block is an MCU, so countdown the restart interval
   2706                if (--z->todo <= 0) {
   2707                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
   2708                   // if it's NOT a restart, then just bail, so we get corrupt data
   2709                   // rather than no data
   2710                   if (!STBI__RESTART(z->marker)) return 1;
   2711                   stbi__jpeg_reset(z);
   2712                }
   2713             }
   2714          }
   2715          return 1;
   2716       } else { // interleaved
   2717          int i,j,k,x,y;
   2718          STBI_SIMD_ALIGN(short, data[64]);
   2719          for (j=0; j < z->img_mcu_y; ++j) {
   2720             for (i=0; i < z->img_mcu_x; ++i) {
   2721                // scan an interleaved mcu... process scan_n components in order
   2722                for (k=0; k < z->scan_n; ++k) {
   2723                   int n = z->order[k];
   2724                   // scan out an mcu's worth of this component; that's just determined
   2725                   // by the basic H and V specified for the component
   2726                   for (y=0; y < z->img_comp[n].v; ++y) {
   2727                      for (x=0; x < z->img_comp[n].h; ++x) {
   2728                         int x2 = (i*z->img_comp[n].h + x)*8;
   2729                         int y2 = (j*z->img_comp[n].v + y)*8;
   2730                         int ha = z->img_comp[n].ha;
   2731                         if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
   2732                         z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*y2+x2, z->img_comp[n].w2, data);
   2733                      }
   2734                   }
   2735                }
   2736                // after all interleaved components, that's an interleaved MCU,
   2737                // so now count down the restart interval
   2738                if (--z->todo <= 0) {
   2739                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
   2740                   if (!STBI__RESTART(z->marker)) return 1;
   2741                   stbi__jpeg_reset(z);
   2742                }
   2743             }
   2744          }
   2745          return 1;
   2746       }
   2747    } else {
   2748       if (z->scan_n == 1) {
   2749          int i,j;
   2750          int n = z->order[0];
   2751          // non-interleaved data, we just need to process one block at a time,
   2752          // in trivial scanline order
   2753          // number of blocks to do just depends on how many actual "pixels" this
   2754          // component has, independent of interleaved MCU blocking and such
   2755          int w = (z->img_comp[n].x+7) >> 3;
   2756          int h = (z->img_comp[n].y+7) >> 3;
   2757          for (j=0; j < h; ++j) {
   2758             for (i=0; i < w; ++i) {
   2759                short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
   2760                if (z->spec_start == 0) {
   2761                   if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
   2762                      return 0;
   2763                } else {
   2764                   int ha = z->img_comp[n].ha;
   2765                   if (!stbi__jpeg_decode_block_prog_ac(z, data, &z->huff_ac[ha], z->fast_ac[ha]))
   2766                      return 0;
   2767                }
   2768                // every data block is an MCU, so countdown the restart interval
   2769                if (--z->todo <= 0) {
   2770                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
   2771                   if (!STBI__RESTART(z->marker)) return 1;
   2772                   stbi__jpeg_reset(z);
   2773                }
   2774             }
   2775          }
   2776          return 1;
   2777       } else { // interleaved
   2778          int i,j,k,x,y;
   2779          for (j=0; j < z->img_mcu_y; ++j) {
   2780             for (i=0; i < z->img_mcu_x; ++i) {
   2781                // scan an interleaved mcu... process scan_n components in order
   2782                for (k=0; k < z->scan_n; ++k) {
   2783                   int n = z->order[k];
   2784                   // scan out an mcu's worth of this component; that's just determined
   2785                   // by the basic H and V specified for the component
   2786                   for (y=0; y < z->img_comp[n].v; ++y) {
   2787                      for (x=0; x < z->img_comp[n].h; ++x) {
   2788                         int x2 = (i*z->img_comp[n].h + x);
   2789                         int y2 = (j*z->img_comp[n].v + y);
   2790                         short *data = z->img_comp[n].coeff + 64 * (x2 + y2 * z->img_comp[n].coeff_w);
   2791                         if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
   2792                            return 0;
   2793                      }
   2794                   }
   2795                }
   2796                // after all interleaved components, that's an interleaved MCU,
   2797                // so now count down the restart interval
   2798                if (--z->todo <= 0) {
   2799                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
   2800                   if (!STBI__RESTART(z->marker)) return 1;
   2801                   stbi__jpeg_reset(z);
   2802                }
   2803             }
   2804          }
   2805          return 1;
   2806       }
   2807    }
   2808 }
   2809 
   2810 static void stbi__jpeg_dequantize(short *data, stbi__uint16 *dequant)
   2811 {
   2812    int i;
   2813    for (i=0; i < 64; ++i)
   2814       data[i] *= dequant[i];
   2815 }
   2816 
   2817 static void stbi__jpeg_finish(stbi__jpeg *z)
   2818 {
   2819    if (z->progressive) {
   2820       // dequantize and idct the data
   2821       int i,j,n;
   2822       for (n=0; n < z->s->img_n; ++n) {
   2823          int w = (z->img_comp[n].x+7) >> 3;
   2824          int h = (z->img_comp[n].y+7) >> 3;
   2825          for (j=0; j < h; ++j) {
   2826             for (i=0; i < w; ++i) {
   2827                short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
   2828                stbi__jpeg_dequantize(data, z->dequant[z->img_comp[n].tq]);
   2829                z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
   2830             }
   2831          }
   2832       }
   2833    }
   2834 }
   2835 
   2836 static int stbi__process_marker(stbi__jpeg *z, int m)
   2837 {
   2838    int L;
   2839    switch (m) {
   2840       case STBI__MARKER_none: // no marker found
   2841          return stbi__err("expected marker","Corrupt JPEG");
   2842 
   2843       case 0xDD: // DRI - specify restart interval
   2844          if (stbi__get16be(z->s) != 4) return stbi__err("bad DRI len","Corrupt JPEG");
   2845          z->restart_interval = stbi__get16be(z->s);
   2846          return 1;
   2847 
   2848       case 0xDB: // DQT - define quantization table
   2849          L = stbi__get16be(z->s)-2;
   2850          while (L > 0) {
   2851             int q = stbi__get8(z->s);
   2852             int p = q >> 4, sixteen = (p != 0);
   2853             int t = q & 15,i;
   2854             if (p != 0 && p != 1) return stbi__err("bad DQT type","Corrupt JPEG");
   2855             if (t > 3) return stbi__err("bad DQT table","Corrupt JPEG");
   2856 
   2857             for (i=0; i < 64; ++i)
   2858                z->dequant[t][stbi__jpeg_dezigzag[i]] = (stbi__uint16)(sixteen ? stbi__get16be(z->s) : stbi__get8(z->s));
   2859             L -= (sixteen ? 129 : 65);
   2860          }
   2861          return L==0;
   2862 
   2863       case 0xC4: // DHT - define huffman table
   2864          L = stbi__get16be(z->s)-2;
   2865          while (L > 0) {
   2866             stbi_uc *v;
   2867             int sizes[16],i,n=0;
   2868             int q = stbi__get8(z->s);
   2869             int tc = q >> 4;
   2870             int th = q & 15;
   2871             if (tc > 1 || th > 3) return stbi__err("bad DHT header","Corrupt JPEG");
   2872             for (i=0; i < 16; ++i) {
   2873                sizes[i] = stbi__get8(z->s);
   2874                n += sizes[i];
   2875             }
   2876             L -= 17;
   2877             if (tc == 0) {
   2878                if (!stbi__build_huffman(z->huff_dc+th, sizes)) return 0;
   2879                v = z->huff_dc[th].values;
   2880             } else {
   2881                if (!stbi__build_huffman(z->huff_ac+th, sizes)) return 0;
   2882                v = z->huff_ac[th].values;
   2883             }
   2884             for (i=0; i < n; ++i)
   2885                v[i] = stbi__get8(z->s);
   2886             if (tc != 0)
   2887                stbi__build_fast_ac(z->fast_ac[th], z->huff_ac + th);
   2888             L -= n;
   2889          }
   2890          return L==0;
   2891    }
   2892 
   2893    // check for comment block or APP blocks
   2894    if ((m >= 0xE0 && m <= 0xEF) || m == 0xFE) {
   2895       L = stbi__get16be(z->s);
   2896       if (L < 2) {
   2897          if (m == 0xFE)
   2898             return stbi__err("bad COM len","Corrupt JPEG");
   2899          else
   2900             return stbi__err("bad APP len","Corrupt JPEG");
   2901       }
   2902       L -= 2;
   2903 
   2904       if (m == 0xE0 && L >= 5) { // JFIF APP0 segment
   2905          static const unsigned char tag[5] = {'J','F','I','F','\0'};
   2906          int ok = 1;
   2907          int i;
   2908          for (i=0; i < 5; ++i)
   2909             if (stbi__get8(z->s) != tag[i])
   2910                ok = 0;
   2911          L -= 5;
   2912          if (ok)
   2913             z->jfif = 1;
   2914       } else if (m == 0xEE && L >= 12) { // Adobe APP14 segment
   2915          static const unsigned char tag[6] = {'A','d','o','b','e','\0'};
   2916          int ok = 1;
   2917          int i;
   2918          for (i=0; i < 6; ++i)
   2919             if (stbi__get8(z->s) != tag[i])
   2920                ok = 0;
   2921          L -= 6;
   2922          if (ok) {
   2923             stbi__get8(z->s); // version
   2924             stbi__get16be(z->s); // flags0
   2925             stbi__get16be(z->s); // flags1
   2926             z->app14_color_transform = stbi__get8(z->s); // color transform
   2927             L -= 6;
   2928          }
   2929       }
   2930 
   2931       stbi__skip(z->s, L);
   2932       return 1;
   2933    }
   2934 
   2935    return stbi__err("unknown marker","Corrupt JPEG");
   2936 }
   2937 
   2938 // after we see SOS
   2939 static int stbi__process_scan_header(stbi__jpeg *z)
   2940 {
   2941    int i;
   2942    int Ls = stbi__get16be(z->s);
   2943    z->scan_n = stbi__get8(z->s);
   2944    if (z->scan_n < 1 || z->scan_n > 4 || z->scan_n > (int) z->s->img_n) return stbi__err("bad SOS component count","Corrupt JPEG");
   2945    if (Ls != 6+2*z->scan_n) return stbi__err("bad SOS len","Corrupt JPEG");
   2946    for (i=0; i < z->scan_n; ++i) {
   2947       int id = stbi__get8(z->s), which;
   2948       int q = stbi__get8(z->s);
   2949       for (which = 0; which < z->s->img_n; ++which)
   2950          if (z->img_comp[which].id == id)
   2951             break;
   2952       if (which == z->s->img_n) return 0; // no match
   2953       z->img_comp[which].hd = q >> 4;   if (z->img_comp[which].hd > 3) return stbi__err("bad DC huff","Corrupt JPEG");
   2954       z->img_comp[which].ha = q & 15;   if (z->img_comp[which].ha > 3) return stbi__err("bad AC huff","Corrupt JPEG");
   2955       z->order[i] = which;
   2956    }
   2957 
   2958    {
   2959       int aa;
   2960       z->spec_start = stbi__get8(z->s);
   2961       z->spec_end   = stbi__get8(z->s); // should be 63, but might be 0
   2962       aa = stbi__get8(z->s);
   2963       z->succ_high = (aa >> 4);
   2964       z->succ_low  = (aa & 15);
   2965       if (z->progressive) {
   2966          if (z->spec_start > 63 || z->spec_end > 63  || z->spec_start > z->spec_end || z->succ_high > 13 || z->succ_low > 13)
   2967             return stbi__err("bad SOS", "Corrupt JPEG");
   2968       } else {
   2969          if (z->spec_start != 0) return stbi__err("bad SOS","Corrupt JPEG");
   2970          if (z->succ_high != 0 || z->succ_low != 0) return stbi__err("bad SOS","Corrupt JPEG");
   2971          z->spec_end = 63;
   2972       }
   2973    }
   2974 
   2975    return 1;
   2976 }
   2977 
   2978 static int stbi__free_jpeg_components(stbi__jpeg *z, int ncomp, int why)
   2979 {
   2980    int i;
   2981    for (i=0; i < ncomp; ++i) {
   2982       if (z->img_comp[i].raw_data) {
   2983          STBI_FREE(z->img_comp[i].raw_data);
   2984          z->img_comp[i].raw_data = NULL;
   2985          z->img_comp[i].data = NULL;
   2986       }
   2987       if (z->img_comp[i].raw_coeff) {
   2988          STBI_FREE(z->img_comp[i].raw_coeff);
   2989          z->img_comp[i].raw_coeff = 0;
   2990          z->img_comp[i].coeff = 0;
   2991       }
   2992       if (z->img_comp[i].linebuf) {
   2993          STBI_FREE(z->img_comp[i].linebuf);
   2994          z->img_comp[i].linebuf = NULL;
   2995       }
   2996    }
   2997    return why;
   2998 }
   2999 
   3000 static int stbi__process_frame_header(stbi__jpeg *z, int scan)
   3001 {
   3002    stbi__context *s = z->s;
   3003    int Lf,p,i,q, h_max=1,v_max=1,c;
   3004    Lf = stbi__get16be(s);         if (Lf < 11) return stbi__err("bad SOF len","Corrupt JPEG"); // JPEG
   3005    p  = stbi__get8(s);            if (p != 8) return stbi__err("only 8-bit","JPEG format not supported: 8-bit only"); // JPEG baseline
   3006    s->img_y = stbi__get16be(s);   if (s->img_y == 0) return stbi__err("no header height", "JPEG format not supported: delayed height"); // Legal, but we don't handle it--but neither does IJG
   3007    s->img_x = stbi__get16be(s);   if (s->img_x == 0) return stbi__err("0 width","Corrupt JPEG"); // JPEG requires
   3008    c = stbi__get8(s);
   3009    if (c != 3 && c != 1 && c != 4) return stbi__err("bad component count","Corrupt JPEG");
   3010    s->img_n = c;
   3011    for (i=0; i < c; ++i) {
   3012       z->img_comp[i].data = NULL;
   3013       z->img_comp[i].linebuf = NULL;
   3014    }
   3015 
   3016    if (Lf != 8+3*s->img_n) return stbi__err("bad SOF len","Corrupt JPEG");
   3017 
   3018    z->rgb = 0;
   3019    for (i=0; i < s->img_n; ++i) {
   3020       static const unsigned char rgb[3] = { 'R', 'G', 'B' };
   3021       z->img_comp[i].id = stbi__get8(s);
   3022       if (s->img_n == 3 && z->img_comp[i].id == rgb[i])
   3023          ++z->rgb;
   3024       q = stbi__get8(s);
   3025       z->img_comp[i].h = (q >> 4);  if (!z->img_comp[i].h || z->img_comp[i].h > 4) return stbi__err("bad H","Corrupt JPEG");
   3026       z->img_comp[i].v = q & 15;    if (!z->img_comp[i].v || z->img_comp[i].v > 4) return stbi__err("bad V","Corrupt JPEG");
   3027       z->img_comp[i].tq = stbi__get8(s);  if (z->img_comp[i].tq > 3) return stbi__err("bad TQ","Corrupt JPEG");
   3028    }
   3029 
   3030    if (scan != STBI__SCAN_load) return 1;
   3031 
   3032    if (!stbi__mad3sizes_valid(s->img_x, s->img_y, s->img_n, 0)) return stbi__err("too large", "Image too large to decode");
   3033 
   3034    for (i=0; i < s->img_n; ++i) {
   3035       if (z->img_comp[i].h > h_max) h_max = z->img_comp[i].h;
   3036       if (z->img_comp[i].v > v_max) v_max = z->img_comp[i].v;
   3037    }
   3038 
   3039    // compute interleaved mcu info
   3040    z->img_h_max = h_max;
   3041    z->img_v_max = v_max;
   3042    z->img_mcu_w = h_max * 8;
   3043    z->img_mcu_h = v_max * 8;
   3044    // these sizes can't be more than 17 bits
   3045    z->img_mcu_x = (s->img_x + z->img_mcu_w-1) / z->img_mcu_w;
   3046    z->img_mcu_y = (s->img_y + z->img_mcu_h-1) / z->img_mcu_h;
   3047 
   3048    for (i=0; i < s->img_n; ++i) {
   3049       // number of effective pixels (e.g. for non-interleaved MCU)
   3050       z->img_comp[i].x = (s->img_x * z->img_comp[i].h + h_max-1) / h_max;
   3051       z->img_comp[i].y = (s->img_y * z->img_comp[i].v + v_max-1) / v_max;
   3052       // to simplify generation, we'll allocate enough memory to decode
   3053       // the bogus oversized data from using interleaved MCUs and their
   3054       // big blocks (e.g. a 16x16 iMCU on an image of width 33); we won't
   3055       // discard the extra data until colorspace conversion
   3056       //
   3057       // img_mcu_x, img_mcu_y: <=17 bits; comp[i].h and .v are <=4 (checked earlier)
   3058       // so these muls can't overflow with 32-bit ints (which we require)
   3059       z->img_comp[i].w2 = z->img_mcu_x * z->img_comp[i].h * 8;
   3060       z->img_comp[i].h2 = z->img_mcu_y * z->img_comp[i].v * 8;
   3061       z->img_comp[i].coeff = 0;
   3062       z->img_comp[i].raw_coeff = 0;
   3063       z->img_comp[i].linebuf = NULL;
   3064       z->img_comp[i].raw_data = stbi__malloc_mad2(z->img_comp[i].w2, z->img_comp[i].h2, 15);
   3065       if (z->img_comp[i].raw_data == NULL)
   3066          return stbi__free_jpeg_components(z, i+1, stbi__err("outofmem", "Out of memory"));
   3067       // align blocks for idct using mmx/sse
   3068       z->img_comp[i].data = (stbi_uc*) (((size_t) z->img_comp[i].raw_data + 15) & ~15);
   3069       if (z->progressive) {
   3070          // w2, h2 are multiples of 8 (see above)
   3071          z->img_comp[i].coeff_w = z->img_comp[i].w2 / 8;
   3072          z->img_comp[i].coeff_h = z->img_comp[i].h2 / 8;
   3073          z->img_comp[i].raw_coeff = stbi__malloc_mad3(z->img_comp[i].w2, z->img_comp[i].h2, sizeof(short), 15);
   3074          if (z->img_comp[i].raw_coeff == NULL)
   3075             return stbi__free_jpeg_components(z, i+1, stbi__err("outofmem", "Out of memory"));
   3076          z->img_comp[i].coeff = (short*) (((size_t) z->img_comp[i].raw_coeff + 15) & ~15);
   3077       }
   3078    }
   3079 
   3080    return 1;
   3081 }
   3082 
   3083 // use comparisons since in some cases we handle more than one case (e.g. SOF)
   3084 #define stbi__DNL(x)         ((x) == 0xdc)
   3085 #define stbi__SOI(x)         ((x) == 0xd8)
   3086 #define stbi__EOI(x)         ((x) == 0xd9)
   3087 #define stbi__SOF(x)         ((x) == 0xc0 || (x) == 0xc1 || (x) == 0xc2)
   3088 #define stbi__SOS(x)         ((x) == 0xda)
   3089 
   3090 #define stbi__SOF_progressive(x)   ((x) == 0xc2)
   3091 
   3092 static int stbi__decode_jpeg_header(stbi__jpeg *z, int scan)
   3093 {
   3094    int m;
   3095    z->jfif = 0;
   3096    z->app14_color_transform = -1; // valid values are 0,1,2
   3097    z->marker = STBI__MARKER_none; // initialize cached marker to empty
   3098    m = stbi__get_marker(z);
   3099    if (!stbi__SOI(m)) return stbi__err("no SOI","Corrupt JPEG");
   3100    if (scan == STBI__SCAN_type) return 1;
   3101    m = stbi__get_marker(z);
   3102    while (!stbi__SOF(m)) {
   3103       if (!stbi__process_marker(z,m)) return 0;
   3104       m = stbi__get_marker(z);
   3105       while (m == STBI__MARKER_none) {
   3106          // some files have extra padding after their blocks, so ok, we'll scan
   3107          if (stbi__at_eof(z->s)) return stbi__err("no SOF", "Corrupt JPEG");
   3108          m = stbi__get_marker(z);
   3109       }
   3110    }
   3111    z->progressive = stbi__SOF_progressive(m);
   3112    if (!stbi__process_frame_header(z, scan)) return 0;
   3113    return 1;
   3114 }
   3115 
   3116 // decode image to YCbCr format
   3117 static int stbi__decode_jpeg_image(stbi__jpeg *j)
   3118 {
   3119    int m;
   3120    for (m = 0; m < 4; m++) {
   3121       j->img_comp[m].raw_data = NULL;
   3122       j->img_comp[m].raw_coeff = NULL;
   3123    }
   3124    j->restart_interval = 0;
   3125    if (!stbi__decode_jpeg_header(j, STBI__SCAN_load)) return 0;
   3126    m = stbi__get_marker(j);
   3127    while (!stbi__EOI(m)) {
   3128       if (stbi__SOS(m)) {
   3129          if (!stbi__process_scan_header(j)) return 0;
   3130          if (!stbi__parse_entropy_coded_data(j)) return 0;
   3131          if (j->marker == STBI__MARKER_none ) {
   3132             // handle 0s at the end of image data from IP Kamera 9060
   3133             while (!stbi__at_eof(j->s)) {
   3134                int x = stbi__get8(j->s);
   3135                if (x == 255) {
   3136                   j->marker = stbi__get8(j->s);
   3137                   break;
   3138                }
   3139             }
   3140             // if we reach eof without hitting a marker, stbi__get_marker() below will fail and we'll eventually return 0
   3141          }
   3142       } else if (stbi__DNL(m)) {
   3143          int Ld = stbi__get16be(j->s);
   3144          stbi__uint32 NL = stbi__get16be(j->s);
   3145          if (Ld != 4) return stbi__err("bad DNL len", "Corrupt JPEG");
   3146          if (NL != j->s->img_y) return stbi__err("bad DNL height", "Corrupt JPEG");
   3147       } else {
   3148          if (!stbi__process_marker(j, m)) return 0;
   3149       }
   3150       m = stbi__get_marker(j);
   3151    }
   3152    if (j->progressive)
   3153       stbi__jpeg_finish(j);
   3154    return 1;
   3155 }
   3156 
   3157 // static jfif-centered resampling (across block boundaries)
   3158 
   3159 typedef stbi_uc *(*resample_row_func)(stbi_uc *out, stbi_uc *in0, stbi_uc *in1,
   3160                                     int w, int hs);
   3161 
   3162 #define stbi__div4(x) ((stbi_uc) ((x) >> 2))
   3163 
   3164 static stbi_uc *resample_row_1(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
   3165 {
   3166    STBI_NOTUSED(out);
   3167    STBI_NOTUSED(in_far);
   3168    STBI_NOTUSED(w);
   3169    STBI_NOTUSED(hs);
   3170    return in_near;
   3171 }
   3172 
   3173 static stbi_uc* stbi__resample_row_v_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
   3174 {
   3175    // need to generate two samples vertically for every one in input
   3176    int i;
   3177    STBI_NOTUSED(hs);
   3178    for (i=0; i < w; ++i)
   3179       out[i] = stbi__div4(3*in_near[i] + in_far[i] + 2);
   3180    return out;
   3181 }
   3182 
   3183 static stbi_uc*  stbi__resample_row_h_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
   3184 {
   3185    // need to generate two samples horizontally for every one in input
   3186    int i;
   3187    stbi_uc *input = in_near;
   3188 
   3189    if (w == 1) {
   3190       // if only one sample, can't do any interpolation
   3191       out[0] = out[1] = input[0];
   3192       return out;
   3193    }
   3194 
   3195    out[0] = input[0];
   3196    out[1] = stbi__div4(input[0]*3 + input[1] + 2);
   3197    for (i=1; i < w-1; ++i) {
   3198       int n = 3*input[i]+2;
   3199       out[i*2+0] = stbi__div4(n+input[i-1]);
   3200       out[i*2+1] = stbi__div4(n+input[i+1]);
   3201    }
   3202    out[i*2+0] = stbi__div4(input[w-2]*3 + input[w-1] + 2);
   3203    out[i*2+1] = input[w-1];
   3204 
   3205    STBI_NOTUSED(in_far);
   3206    STBI_NOTUSED(hs);
   3207 
   3208    return out;
   3209 }
   3210 
   3211 #define stbi__div16(x) ((stbi_uc) ((x) >> 4))
   3212 
   3213 static stbi_uc *stbi__resample_row_hv_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
   3214 {
   3215    // need to generate 2x2 samples for every one in input
   3216    int i,t0,t1;
   3217    if (w == 1) {
   3218       out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
   3219       return out;
   3220    }
   3221 
   3222    t1 = 3*in_near[0] + in_far[0];
   3223    out[0] = stbi__div4(t1+2);
   3224    for (i=1; i < w; ++i) {
   3225       t0 = t1;
   3226       t1 = 3*in_near[i]+in_far[i];
   3227       out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
   3228       out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
   3229    }
   3230    out[w*2-1] = stbi__div4(t1+2);
   3231 
   3232    STBI_NOTUSED(hs);
   3233 
   3234    return out;
   3235 }
   3236 
   3237 #if defined(STBI_SSE2) || defined(STBI_NEON)
   3238 static stbi_uc *stbi__resample_row_hv_2_simd(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
   3239 {
   3240    // need to generate 2x2 samples for every one in input
   3241    int i=0,t0,t1;
   3242 
   3243    if (w == 1) {
   3244       out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
   3245       return out;
   3246    }
   3247 
   3248    t1 = 3*in_near[0] + in_far[0];
   3249    // process groups of 8 pixels for as long as we can.
   3250    // note we can't handle the last pixel in a row in this loop
   3251    // because we need to handle the filter boundary conditions.
   3252    for (; i < ((w-1) & ~7); i += 8) {
   3253 #if defined(STBI_SSE2)
   3254       // load and perform the vertical filtering pass
   3255       // this uses 3*x + y = 4*x + (y - x)
   3256       __m128i zero  = _mm_setzero_si128();
   3257       __m128i farb  = _mm_loadl_epi64((__m128i *) (in_far + i));
   3258       __m128i nearb = _mm_loadl_epi64((__m128i *) (in_near + i));
   3259       __m128i farw  = _mm_unpacklo_epi8(farb, zero);
   3260       __m128i nearw = _mm_unpacklo_epi8(nearb, zero);
   3261       __m128i diff  = _mm_sub_epi16(farw, nearw);
   3262       __m128i nears = _mm_slli_epi16(nearw, 2);
   3263       __m128i curr  = _mm_add_epi16(nears, diff); // current row
   3264 
   3265       // horizontal filter works the same based on shifted vers of current
   3266       // row. "prev" is current row shifted right by 1 pixel; we need to
   3267       // insert the previous pixel value (from t1).
   3268       // "next" is current row shifted left by 1 pixel, with first pixel
   3269       // of next block of 8 pixels added in.
   3270       __m128i prv0 = _mm_slli_si128(curr, 2);
   3271       __m128i nxt0 = _mm_srli_si128(curr, 2);
   3272       __m128i prev = _mm_insert_epi16(prv0, t1, 0);
   3273       __m128i next = _mm_insert_epi16(nxt0, 3*in_near[i+8] + in_far[i+8], 7);
   3274 
   3275       // horizontal filter, polyphase implementation since it's convenient:
   3276       // even pixels = 3*cur + prev = cur*4 + (prev - cur)
   3277       // odd  pixels = 3*cur + next = cur*4 + (next - cur)
   3278       // note the shared term.
   3279       __m128i bias  = _mm_set1_epi16(8);
   3280       __m128i curs = _mm_slli_epi16(curr, 2);
   3281       __m128i prvd = _mm_sub_epi16(prev, curr);
   3282       __m128i nxtd = _mm_sub_epi16(next, curr);
   3283       __m128i curb = _mm_add_epi16(curs, bias);
   3284       __m128i even = _mm_add_epi16(prvd, curb);
   3285       __m128i odd  = _mm_add_epi16(nxtd, curb);
   3286 
   3287       // interleave even and odd pixels, then undo scaling.
   3288       __m128i int0 = _mm_unpacklo_epi16(even, odd);
   3289       __m128i int1 = _mm_unpackhi_epi16(even, odd);
   3290       __m128i de0  = _mm_srli_epi16(int0, 4);
   3291       __m128i de1  = _mm_srli_epi16(int1, 4);
   3292 
   3293       // pack and write output
   3294       __m128i outv = _mm_packus_epi16(de0, de1);
   3295       _mm_storeu_si128((__m128i *) (out + i*2), outv);
   3296 #elif defined(STBI_NEON)
   3297       // load and perform the vertical filtering pass
   3298       // this uses 3*x + y = 4*x + (y - x)
   3299       uint8x8_t farb  = vld1_u8(in_far + i);
   3300       uint8x8_t nearb = vld1_u8(in_near + i);
   3301       int16x8_t diff  = vreinterpretq_s16_u16(vsubl_u8(farb, nearb));
   3302       int16x8_t nears = vreinterpretq_s16_u16(vshll_n_u8(nearb, 2));
   3303       int16x8_t curr  = vaddq_s16(nears, diff); // current row
   3304 
   3305       // horizontal filter works the same based on shifted vers of current
   3306       // row. "prev" is current row shifted right by 1 pixel; we need to
   3307       // insert the previous pixel value (from t1).
   3308       // "next" is current row shifted left by 1 pixel, with first pixel
   3309       // of next block of 8 pixels added in.
   3310       int16x8_t prv0 = vextq_s16(curr, curr, 7);
   3311       int16x8_t nxt0 = vextq_s16(curr, curr, 1);
   3312       int16x8_t prev = vsetq_lane_s16(t1, prv0, 0);
   3313       int16x8_t next = vsetq_lane_s16(3*in_near[i+8] + in_far[i+8], nxt0, 7);
   3314 
   3315       // horizontal filter, polyphase implementation since it's convenient:
   3316       // even pixels = 3*cur + prev = cur*4 + (prev - cur)
   3317       // odd  pixels = 3*cur + next = cur*4 + (next - cur)
   3318       // note the shared term.
   3319       int16x8_t curs = vshlq_n_s16(curr, 2);
   3320       int16x8_t prvd = vsubq_s16(prev, curr);
   3321       int16x8_t nxtd = vsubq_s16(next, curr);
   3322       int16x8_t even = vaddq_s16(curs, prvd);
   3323       int16x8_t odd  = vaddq_s16(curs, nxtd);
   3324 
   3325       // undo scaling and round, then store with even/odd phases interleaved
   3326       uint8x8x2_t o;
   3327       o.val[0] = vqrshrun_n_s16(even, 4);
   3328       o.val[1] = vqrshrun_n_s16(odd,  4);
   3329       vst2_u8(out + i*2, o);
   3330 #endif
   3331 
   3332       // "previous" value for next iter
   3333       t1 = 3*in_near[i+7] + in_far[i+7];
   3334    }
   3335 
   3336    t0 = t1;
   3337    t1 = 3*in_near[i] + in_far[i];
   3338    out[i*2] = stbi__div16(3*t1 + t0 + 8);
   3339 
   3340    for (++i; i < w; ++i) {
   3341       t0 = t1;
   3342       t1 = 3*in_near[i]+in_far[i];
   3343       out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
   3344       out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
   3345    }
   3346    out[w*2-1] = stbi__div4(t1+2);
   3347 
   3348    STBI_NOTUSED(hs);
   3349 
   3350    return out;
   3351 }
   3352 #endif
   3353 
   3354 static stbi_uc *stbi__resample_row_generic(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
   3355 {
   3356    // resample with nearest-neighbor
   3357    int i,j;
   3358    STBI_NOTUSED(in_far);
   3359    for (i=0; i < w; ++i)
   3360       for (j=0; j < hs; ++j)
   3361          out[i*hs+j] = in_near[i];
   3362    return out;
   3363 }
   3364 
   3365 // this is a reduced-precision calculation of YCbCr-to-RGB introduced
   3366 // to make sure the code produces the same results in both SIMD and scalar
   3367 #define stbi__float2fixed(x)  (((int) ((x) * 4096.0f + 0.5f)) << 8)
   3368 static void stbi__YCbCr_to_RGB_row(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step)
   3369 {
   3370    int i;
   3371    for (i=0; i < count; ++i) {
   3372       int y_fixed = (y[i] << 20) + (1<<19); // rounding
   3373       int r,g,b;
   3374       int cr = pcr[i] - 128;
   3375       int cb = pcb[i] - 128;
   3376       r = y_fixed +  cr* stbi__float2fixed(1.40200f);
   3377       g = y_fixed + (cr*-stbi__float2fixed(0.71414f)) + ((cb*-stbi__float2fixed(0.34414f)) & 0xffff0000);
   3378       b = y_fixed                                     +   cb* stbi__float2fixed(1.77200f);
   3379       r >>= 20;
   3380       g >>= 20;
   3381       b >>= 20;
   3382       if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
   3383       if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
   3384       if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
   3385       out[0] = (stbi_uc)r;
   3386       out[1] = (stbi_uc)g;
   3387       out[2] = (stbi_uc)b;
   3388       out[3] = 255;
   3389       out += step;
   3390    }
   3391 }
   3392 
   3393 #if defined(STBI_SSE2) || defined(STBI_NEON)
   3394 static void stbi__YCbCr_to_RGB_simd(stbi_uc *out, stbi_uc const *y, stbi_uc const *pcb, stbi_uc const *pcr, int count, int step)
   3395 {
   3396    int i = 0;
   3397 
   3398 #ifdef STBI_SSE2
   3399    // step == 3 is pretty ugly on the final interleave, and i'm not convinced
   3400    // it's useful in practice (you wouldn't use it for textures, for example).
   3401    // so just accelerate step == 4 case.
   3402    if (step == 4) {
   3403       // this is a fairly straightforward implementation and not super-optimized.
   3404       __m128i signflip  = _mm_set1_epi8(-0x80);
   3405       __m128i cr_const0 = _mm_set1_epi16(   (short) ( 1.40200f*4096.0f+0.5f));
   3406       __m128i cr_const1 = _mm_set1_epi16( - (short) ( 0.71414f*4096.0f+0.5f));
   3407       __m128i cb_const0 = _mm_set1_epi16( - (short) ( 0.34414f*4096.0f+0.5f));
   3408       __m128i cb_const1 = _mm_set1_epi16(   (short) ( 1.77200f*4096.0f+0.5f));
   3409       __m128i y_bias = _mm_set1_epi8((char) (unsigned char) 128);
   3410       __m128i xw = _mm_set1_epi16(255); // alpha channel
   3411 
   3412       for (; i+7 < count; i += 8) {
   3413          // load
   3414          __m128i y_bytes = _mm_loadl_epi64((__m128i *) (y+i));
   3415          __m128i cr_bytes = _mm_loadl_epi64((__m128i *) (pcr+i));
   3416          __m128i cb_bytes = _mm_loadl_epi64((__m128i *) (pcb+i));
   3417          __m128i cr_biased = _mm_xor_si128(cr_bytes, signflip); // -128
   3418          __m128i cb_biased = _mm_xor_si128(cb_bytes, signflip); // -128
   3419 
   3420          // unpack to short (and left-shift cr, cb by 8)
   3421          __m128i yw  = _mm_unpacklo_epi8(y_bias, y_bytes);
   3422          __m128i crw = _mm_unpacklo_epi8(_mm_setzero_si128(), cr_biased);
   3423          __m128i cbw = _mm_unpacklo_epi8(_mm_setzero_si128(), cb_biased);
   3424 
   3425          // color transform
   3426          __m128i yws = _mm_srli_epi16(yw, 4);
   3427          __m128i cr0 = _mm_mulhi_epi16(cr_const0, crw);
   3428          __m128i cb0 = _mm_mulhi_epi16(cb_const0, cbw);
   3429          __m128i cb1 = _mm_mulhi_epi16(cbw, cb_const1);
   3430          __m128i cr1 = _mm_mulhi_epi16(crw, cr_const1);
   3431          __m128i rws = _mm_add_epi16(cr0, yws);
   3432          __m128i gwt = _mm_add_epi16(cb0, yws);
   3433          __m128i bws = _mm_add_epi16(yws, cb1);
   3434          __m128i gws = _mm_add_epi16(gwt, cr1);
   3435 
   3436          // descale
   3437          __m128i rw = _mm_srai_epi16(rws, 4);
   3438          __m128i bw = _mm_srai_epi16(bws, 4);
   3439          __m128i gw = _mm_srai_epi16(gws, 4);
   3440 
   3441          // back to byte, set up for transpose
   3442          __m128i brb = _mm_packus_epi16(rw, bw);
   3443          __m128i gxb = _mm_packus_epi16(gw, xw);
   3444 
   3445          // transpose to interleave channels
   3446          __m128i t0 = _mm_unpacklo_epi8(brb, gxb);
   3447          __m128i t1 = _mm_unpackhi_epi8(brb, gxb);
   3448          __m128i o0 = _mm_unpacklo_epi16(t0, t1);
   3449          __m128i o1 = _mm_unpackhi_epi16(t0, t1);
   3450 
   3451          // store
   3452          _mm_storeu_si128((__m128i *) (out + 0), o0);
   3453          _mm_storeu_si128((__m128i *) (out + 16), o1);
   3454          out += 32;
   3455       }
   3456    }
   3457 #endif
   3458 
   3459 #ifdef STBI_NEON
   3460    // in this version, step=3 support would be easy to add. but is there demand?
   3461    if (step == 4) {
   3462       // this is a fairly straightforward implementation and not super-optimized.
   3463       uint8x8_t signflip = vdup_n_u8(0x80);
   3464       int16x8_t cr_const0 = vdupq_n_s16(   (short) ( 1.40200f*4096.0f+0.5f));
   3465       int16x8_t cr_const1 = vdupq_n_s16( - (short) ( 0.71414f*4096.0f+0.5f));
   3466       int16x8_t cb_const0 = vdupq_n_s16( - (short) ( 0.34414f*4096.0f+0.5f));
   3467       int16x8_t cb_const1 = vdupq_n_s16(   (short) ( 1.77200f*4096.0f+0.5f));
   3468 
   3469       for (; i+7 < count; i += 8) {
   3470          // load
   3471          uint8x8_t y_bytes  = vld1_u8(y + i);
   3472          uint8x8_t cr_bytes = vld1_u8(pcr + i);
   3473          uint8x8_t cb_bytes = vld1_u8(pcb + i);
   3474          int8x8_t cr_biased = vreinterpret_s8_u8(vsub_u8(cr_bytes, signflip));
   3475          int8x8_t cb_biased = vreinterpret_s8_u8(vsub_u8(cb_bytes, signflip));
   3476 
   3477          // expand to s16
   3478          int16x8_t yws = vreinterpretq_s16_u16(vshll_n_u8(y_bytes, 4));
   3479          int16x8_t crw = vshll_n_s8(cr_biased, 7);
   3480          int16x8_t cbw = vshll_n_s8(cb_biased, 7);
   3481 
   3482          // color transform
   3483          int16x8_t cr0 = vqdmulhq_s16(crw, cr_const0);
   3484          int16x8_t cb0 = vqdmulhq_s16(cbw, cb_const0);
   3485          int16x8_t cr1 = vqdmulhq_s16(crw, cr_const1);
   3486          int16x8_t cb1 = vqdmulhq_s16(cbw, cb_const1);
   3487          int16x8_t rws = vaddq_s16(yws, cr0);
   3488          int16x8_t gws = vaddq_s16(vaddq_s16(yws, cb0), cr1);
   3489          int16x8_t bws = vaddq_s16(yws, cb1);
   3490 
   3491          // undo scaling, round, convert to byte
   3492          uint8x8x4_t o;
   3493          o.val[0] = vqrshrun_n_s16(rws, 4);
   3494          o.val[1] = vqrshrun_n_s16(gws, 4);
   3495          o.val[2] = vqrshrun_n_s16(bws, 4);
   3496          o.val[3] = vdup_n_u8(255);
   3497 
   3498          // store, interleaving r/g/b/a
   3499          vst4_u8(out, o);
   3500          out += 8*4;
   3501       }
   3502    }
   3503 #endif
   3504 
   3505    for (; i < count; ++i) {
   3506       int y_fixed = (y[i] << 20) + (1<<19); // rounding
   3507       int r,g,b;
   3508       int cr = pcr[i] - 128;
   3509       int cb = pcb[i] - 128;
   3510       r = y_fixed + cr* stbi__float2fixed(1.40200f);
   3511       g = y_fixed + cr*-stbi__float2fixed(0.71414f) + ((cb*-stbi__float2fixed(0.34414f)) & 0xffff0000);
   3512       b = y_fixed                                   +   cb* stbi__float2fixed(1.77200f);
   3513       r >>= 20;
   3514       g >>= 20;
   3515       b >>= 20;
   3516       if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
   3517       if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
   3518       if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
   3519       out[0] = (stbi_uc)r;
   3520       out[1] = (stbi_uc)g;
   3521       out[2] = (stbi_uc)b;
   3522       out[3] = 255;
   3523       out += step;
   3524    }
   3525 }
   3526 #endif
   3527 
   3528 // set up the kernels
   3529 static void stbi__setup_jpeg(stbi__jpeg *j)
   3530 {
   3531    j->idct_block_kernel = stbi__idct_block;
   3532    j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_row;
   3533    j->resample_row_hv_2_kernel = stbi__resample_row_hv_2;
   3534 
   3535 #ifdef STBI_SSE2
   3536    if (stbi__sse2_available()) {
   3537       j->idct_block_kernel = stbi__idct_simd;
   3538       j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
   3539       j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
   3540    }
   3541 #endif
   3542 
   3543 #ifdef STBI_NEON
   3544    j->idct_block_kernel = stbi__idct_simd;
   3545    j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
   3546    j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
   3547 #endif
   3548 }
   3549 
   3550 // clean up the temporary component buffers
   3551 static void stbi__cleanup_jpeg(stbi__jpeg *j)
   3552 {
   3553    stbi__free_jpeg_components(j, j->s->img_n, 0);
   3554 }
   3555 
   3556 typedef struct
   3557 {
   3558    resample_row_func resample;
   3559    stbi_uc *line0,*line1;
   3560    int hs,vs;   // expansion factor in each axis
   3561    int w_lores; // horizontal pixels pre-expansion
   3562    int ystep;   // how far through vertical expansion we are
   3563    int ypos;    // which pre-expansion row we're on
   3564 } stbi__resample;
   3565 
   3566 // fast 0..255 * 0..255 => 0..255 rounded multiplication
   3567 static stbi_uc stbi__blinn_8x8(stbi_uc x, stbi_uc y)
   3568 {
   3569    unsigned int t = x*y + 128;
   3570    return (stbi_uc) ((t + (t >>8)) >> 8);
   3571 }
   3572 
   3573 static stbi_uc *load_jpeg_image(stbi__jpeg *z, int *out_x, int *out_y, int *comp, int req_comp)
   3574 {
   3575    int n, decode_n, is_rgb;
   3576    z->s->img_n = 0; // make stbi__cleanup_jpeg safe
   3577 
   3578    // validate req_comp
   3579    if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
   3580 
   3581    // load a jpeg image from whichever source, but leave in YCbCr format
   3582    if (!stbi__decode_jpeg_image(z)) { stbi__cleanup_jpeg(z); return NULL; }
   3583 
   3584    // determine actual number of components to generate
   3585    n = req_comp ? req_comp : z->s->img_n >= 3 ? 3 : 1;
   3586 
   3587    is_rgb = z->s->img_n == 3 && (z->rgb == 3 || (z->app14_color_transform == 0 && !z->jfif));
   3588 
   3589    if (z->s->img_n == 3 && n < 3 && !is_rgb)
   3590       decode_n = 1;
   3591    else
   3592       decode_n = z->s->img_n;
   3593 
   3594    // resample and color-convert
   3595    {
   3596       int k;
   3597       unsigned int i,j;
   3598       stbi_uc *output;
   3599       stbi_uc *coutput[4];
   3600 
   3601       stbi__resample res_comp[4];
   3602 
   3603       for (k=0; k < decode_n; ++k) {
   3604          stbi__resample *r = &res_comp[k];
   3605 
   3606          // allocate line buffer big enough for upsampling off the edges
   3607          // with upsample factor of 4
   3608          z->img_comp[k].linebuf = (stbi_uc *) stbi__malloc(z->s->img_x + 3);
   3609          if (!z->img_comp[k].linebuf) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }
   3610 
   3611          r->hs      = z->img_h_max / z->img_comp[k].h;
   3612          r->vs      = z->img_v_max / z->img_comp[k].v;
   3613          r->ystep   = r->vs >> 1;
   3614          r->w_lores = (z->s->img_x + r->hs-1) / r->hs;
   3615          r->ypos    = 0;
   3616          r->line0   = r->line1 = z->img_comp[k].data;
   3617 
   3618          if      (r->hs == 1 && r->vs == 1) r->resample = resample_row_1;
   3619          else if (r->hs == 1 && r->vs == 2) r->resample = stbi__resample_row_v_2;
   3620          else if (r->hs == 2 && r->vs == 1) r->resample = stbi__resample_row_h_2;
   3621          else if (r->hs == 2 && r->vs == 2) r->resample = z->resample_row_hv_2_kernel;
   3622          else                               r->resample = stbi__resample_row_generic;
   3623       }
   3624 
   3625       // can't error after this so, this is safe
   3626       output = (stbi_uc *) stbi__malloc_mad3(n, z->s->img_x, z->s->img_y, 1);
   3627       if (!output) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }
   3628 
   3629       // now go ahead and resample
   3630       for (j=0; j < z->s->img_y; ++j) {
   3631          stbi_uc *out = output + n * z->s->img_x * j;
   3632          for (k=0; k < decode_n; ++k) {
   3633             stbi__resample *r = &res_comp[k];
   3634             int y_bot = r->ystep >= (r->vs >> 1);
   3635             coutput[k] = r->resample(z->img_comp[k].linebuf,
   3636                                      y_bot ? r->line1 : r->line0,
   3637                                      y_bot ? r->line0 : r->line1,
   3638                                      r->w_lores, r->hs);
   3639             if (++r->ystep >= r->vs) {
   3640                r->ystep = 0;
   3641                r->line0 = r->line1;
   3642                if (++r->ypos < z->img_comp[k].y)
   3643                   r->line1 += z->img_comp[k].w2;
   3644             }
   3645          }
   3646          if (n >= 3) {
   3647             stbi_uc *y = coutput[0];
   3648             if (z->s->img_n == 3) {
   3649                if (is_rgb) {
   3650                   for (i=0; i < z->s->img_x; ++i) {
   3651                      out[0] = y[i];
   3652                      out[1] = coutput[1][i];
   3653                      out[2] = coutput[2][i];
   3654                      out[3] = 255;
   3655                      out += n;
   3656                   }
   3657                } else {
   3658                   z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
   3659                }
   3660             } else if (z->s->img_n == 4) {
   3661                if (z->app14_color_transform == 0) { // CMYK
   3662                   for (i=0; i < z->s->img_x; ++i) {
   3663                      stbi_uc m = coutput[3][i];
   3664                      out[0] = stbi__blinn_8x8(coutput[0][i], m);
   3665                      out[1] = stbi__blinn_8x8(coutput[1][i], m);
   3666                      out[2] = stbi__blinn_8x8(coutput[2][i], m);
   3667                      out[3] = 255;
   3668                      out += n;
   3669                   }
   3670                } else if (z->app14_color_transform == 2) { // YCCK
   3671                   z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
   3672                   for (i=0; i < z->s->img_x; ++i) {
   3673                      stbi_uc m = coutput[3][i];
   3674                      out[0] = stbi__blinn_8x8(255 - out[0], m);
   3675                      out[1] = stbi__blinn_8x8(255 - out[1], m);
   3676                      out[2] = stbi__blinn_8x8(255 - out[2], m);
   3677                      out += n;
   3678                   }
   3679                } else { // YCbCr + alpha?  Ignore the fourth channel for now
   3680                   z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
   3681                }
   3682             } else
   3683                for (i=0; i < z->s->img_x; ++i) {
   3684                   out[0] = out[1] = out[2] = y[i];
   3685                   out[3] = 255; // not used if n==3
   3686                   out += n;
   3687                }
   3688          } else {
   3689             if (is_rgb) {
   3690                if (n == 1)
   3691                   for (i=0; i < z->s->img_x; ++i)
   3692                      *out++ = stbi__compute_y(coutput[0][i], coutput[1][i], coutput[2][i]);
   3693                else {
   3694                   for (i=0; i < z->s->img_x; ++i, out += 2) {
   3695                      out[0] = stbi__compute_y(coutput[0][i], coutput[1][i], coutput[2][i]);
   3696                      out[1] = 255;
   3697                   }
   3698                }
   3699             } else if (z->s->img_n == 4 && z->app14_color_transform == 0) {
   3700                for (i=0; i < z->s->img_x; ++i) {
   3701                   stbi_uc m = coutput[3][i];
   3702                   stbi_uc r = stbi__blinn_8x8(coutput[0][i], m);
   3703                   stbi_uc g = stbi__blinn_8x8(coutput[1][i], m);
   3704                   stbi_uc b = stbi__blinn_8x8(coutput[2][i], m);
   3705                   out[0] = stbi__compute_y(r, g, b);
   3706                   out[1] = 255;
   3707                   out += n;
   3708                }
   3709             } else if (z->s->img_n == 4 && z->app14_color_transform == 2) {
   3710                for (i=0; i < z->s->img_x; ++i) {
   3711                   out[0] = stbi__blinn_8x8(255 - coutput[0][i], coutput[3][i]);
   3712                   out[1] = 255;
   3713                   out += n;
   3714                }
   3715             } else {
   3716                stbi_uc *y = coutput[0];
   3717                if (n == 1)
   3718                   for (i=0; i < z->s->img_x; ++i) out[i] = y[i];
   3719                else
   3720                   for (i=0; i < z->s->img_x; ++i) *out++ = y[i], *out++ = 255;
   3721             }
   3722          }
   3723       }
   3724       stbi__cleanup_jpeg(z);
   3725       *out_x = z->s->img_x;
   3726       *out_y = z->s->img_y;
   3727       if (comp) *comp = z->s->img_n >= 3 ? 3 : 1; // report original components, not output
   3728       return output;
   3729    }
   3730 }
   3731 
   3732 static void *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
   3733 {
   3734    unsigned char* result;
   3735    stbi__jpeg* j = (stbi__jpeg*) stbi__malloc(sizeof(stbi__jpeg));
   3736    STBI_NOTUSED(ri);
   3737    j->s = s;
   3738    stbi__setup_jpeg(j);
   3739    result = load_jpeg_image(j, x,y,comp,req_comp);
   3740    STBI_FREE(j);
   3741    return result;
   3742 }
   3743 
   3744 static int stbi__jpeg_test(stbi__context *s)
   3745 {
   3746    int r;
   3747    stbi__jpeg* j = (stbi__jpeg*)stbi__malloc(sizeof(stbi__jpeg));
   3748    j->s = s;
   3749    stbi__setup_jpeg(j);
   3750    r = stbi__decode_jpeg_header(j, STBI__SCAN_type);
   3751    stbi__rewind(s);
   3752    STBI_FREE(j);
   3753    return r;
   3754 }
   3755 
   3756 static int stbi__jpeg_info_raw(stbi__jpeg *j, int *x, int *y, int *comp)
   3757 {
   3758    if (!stbi__decode_jpeg_header(j, STBI__SCAN_header)) {
   3759       stbi__rewind( j->s );
   3760       return 0;
   3761    }
   3762    if (x) *x = j->s->img_x;
   3763    if (y) *y = j->s->img_y;
   3764    if (comp) *comp = j->s->img_n >= 3 ? 3 : 1;
   3765    return 1;
   3766 }
   3767 
   3768 static int stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp)
   3769 {
   3770    int result;
   3771    stbi__jpeg* j = (stbi__jpeg*) (stbi__malloc(sizeof(stbi__jpeg)));
   3772    j->s = s;
   3773    result = stbi__jpeg_info_raw(j, x, y, comp);
   3774    STBI_FREE(j);
   3775    return result;
   3776 }
   3777 #endif
   3778 
   3779 // public domain zlib decode    v0.2  Sean Barrett 2006-11-18
   3780 //    simple implementation
   3781 //      - all input must be provided in an upfront buffer
   3782 //      - all output is written to a single output buffer (can malloc/realloc)
   3783 //    performance
   3784 //      - fast huffman
   3785 
   3786 #ifndef STBI_NO_ZLIB
   3787 
   3788 // fast-way is faster to check than jpeg huffman, but slow way is slower
   3789 #define STBI__ZFAST_BITS  9 // accelerate all cases in default tables
   3790 #define STBI__ZFAST_MASK  ((1 << STBI__ZFAST_BITS) - 1)
   3791 
   3792 // zlib-style huffman encoding
   3793 // (jpegs packs from left, zlib from right, so can't share code)
   3794 typedef struct
   3795 {
   3796    stbi__uint16 fast[1 << STBI__ZFAST_BITS];
   3797    stbi__uint16 firstcode[16];
   3798    int maxcode[17];
   3799    stbi__uint16 firstsymbol[16];
   3800    stbi_uc  size[288];
   3801    stbi__uint16 value[288];
   3802 } stbi__zhuffman;
   3803 
   3804 stbi_inline static int stbi__bitreverse16(int n)
   3805 {
   3806   n = ((n & 0xAAAA) >>  1) | ((n & 0x5555) << 1);
   3807   n = ((n & 0xCCCC) >>  2) | ((n & 0x3333) << 2);
   3808   n = ((n & 0xF0F0) >>  4) | ((n & 0x0F0F) << 4);
   3809   n = ((n & 0xFF00) >>  8) | ((n & 0x00FF) << 8);
   3810   return n;
   3811 }
   3812 
   3813 stbi_inline static int stbi__bit_reverse(int v, int bits)
   3814 {
   3815    STBI_ASSERT(bits <= 16);
   3816    // to bit reverse n bits, reverse 16 and shift
   3817    // e.g. 11 bits, bit reverse and shift away 5
   3818    return stbi__bitreverse16(v) >> (16-bits);
   3819 }
   3820 
   3821 static int stbi__zbuild_huffman(stbi__zhuffman *z, const stbi_uc *sizelist, int num)
   3822 {
   3823    int i,k=0;
   3824    int code, next_code[16], sizes[17];
   3825 
   3826    // DEFLATE spec for generating codes
   3827    memset(sizes, 0, sizeof(sizes));
   3828    memset(z->fast, 0, sizeof(z->fast));
   3829    for (i=0; i < num; ++i)
   3830       ++sizes[sizelist[i]];
   3831    sizes[0] = 0;
   3832    for (i=1; i < 16; ++i)
   3833       if (sizes[i] > (1 << i))
   3834          return stbi__err("bad sizes", "Corrupt PNG");
   3835    code = 0;
   3836    for (i=1; i < 16; ++i) {
   3837       next_code[i] = code;
   3838       z->firstcode[i] = (stbi__uint16) code;
   3839       z->firstsymbol[i] = (stbi__uint16) k;
   3840       code = (code + sizes[i]);
   3841       if (sizes[i])
   3842          if (code-1 >= (1 << i)) return stbi__err("bad codelengths","Corrupt PNG");
   3843       z->maxcode[i] = code << (16-i); // preshift for inner loop
   3844       code <<= 1;
   3845       k += sizes[i];
   3846    }
   3847    z->maxcode[16] = 0x10000; // sentinel
   3848    for (i=0; i < num; ++i) {
   3849       int s = sizelist[i];
   3850       if (s) {
   3851          int c = next_code[s] - z->firstcode[s] + z->firstsymbol[s];
   3852          stbi__uint16 fastv = (stbi__uint16) ((s << 9) | i);
   3853          z->size [c] = (stbi_uc     ) s;
   3854          z->value[c] = (stbi__uint16) i;
   3855          if (s <= STBI__ZFAST_BITS) {
   3856             int j = stbi__bit_reverse(next_code[s],s);
   3857             while (j < (1 << STBI__ZFAST_BITS)) {
   3858                z->fast[j] = fastv;
   3859                j += (1 << s);
   3860             }
   3861          }
   3862          ++next_code[s];
   3863       }
   3864    }
   3865    return 1;
   3866 }
   3867 
   3868 // zlib-from-memory implementation for PNG reading
   3869 //    because PNG allows splitting the zlib stream arbitrarily,
   3870 //    and it's annoying structurally to have PNG call ZLIB call PNG,
   3871 //    we require PNG read all the IDATs and combine them into a single
   3872 //    memory buffer
   3873 
   3874 typedef struct
   3875 {
   3876    stbi_uc *zbuffer, *zbuffer_end;
   3877    int num_bits;
   3878    stbi__uint32 code_buffer;
   3879 
   3880    char *zout;
   3881    char *zout_start;
   3882    char *zout_end;
   3883    int   z_expandable;
   3884 
   3885    stbi__zhuffman z_length, z_distance;
   3886 } stbi__zbuf;
   3887 
   3888 stbi_inline static stbi_uc stbi__zget8(stbi__zbuf *z)
   3889 {
   3890    if (z->zbuffer >= z->zbuffer_end) return 0;
   3891    return *z->zbuffer++;
   3892 }
   3893 
   3894 static void stbi__fill_bits(stbi__zbuf *z)
   3895 {
   3896    do {
   3897       STBI_ASSERT(z->code_buffer < (1U << z->num_bits));
   3898       z->code_buffer |= (unsigned int) stbi__zget8(z) << z->num_bits;
   3899       z->num_bits += 8;
   3900    } while (z->num_bits <= 24);
   3901 }
   3902 
   3903 stbi_inline static unsigned int stbi__zreceive(stbi__zbuf *z, int n)
   3904 {
   3905    unsigned int k;
   3906    if (z->num_bits < n) stbi__fill_bits(z);
   3907    k = z->code_buffer & ((1 << n) - 1);
   3908    z->code_buffer >>= n;
   3909    z->num_bits -= n;
   3910    return k;
   3911 }
   3912 
   3913 static int stbi__zhuffman_decode_slowpath(stbi__zbuf *a, stbi__zhuffman *z)
   3914 {
   3915    int b,s,k;
   3916    // not resolved by fast table, so compute it the slow way
   3917    // use jpeg approach, which requires MSbits at top
   3918    k = stbi__bit_reverse(a->code_buffer, 16);
   3919    for (s=STBI__ZFAST_BITS+1; ; ++s)
   3920       if (k < z->maxcode[s])
   3921          break;
   3922    if (s == 16) return -1; // invalid code!
   3923    // code size is s, so:
   3924    b = (k >> (16-s)) - z->firstcode[s] + z->firstsymbol[s];
   3925    STBI_ASSERT(z->size[b] == s);
   3926    a->code_buffer >>= s;
   3927    a->num_bits -= s;
   3928    return z->value[b];
   3929 }
   3930 
   3931 stbi_inline static int stbi__zhuffman_decode(stbi__zbuf *a, stbi__zhuffman *z)
   3932 {
   3933    int b,s;
   3934    if (a->num_bits < 16) stbi__fill_bits(a);
   3935    b = z->fast[a->code_buffer & STBI__ZFAST_MASK];
   3936    if (b) {
   3937       s = b >> 9;
   3938       a->code_buffer >>= s;
   3939       a->num_bits -= s;
   3940       return b & 511;
   3941    }
   3942    return stbi__zhuffman_decode_slowpath(a, z);
   3943 }
   3944 
   3945 static int stbi__zexpand(stbi__zbuf *z, char *zout, int n)  // need to make room for n bytes
   3946 {
   3947    char *q;
   3948    int cur, limit, old_limit;
   3949    z->zout = zout;
   3950    if (!z->z_expandable) return stbi__err("output buffer limit","Corrupt PNG");
   3951    cur   = (int) (z->zout     - z->zout_start);
   3952    limit = old_limit = (int) (z->zout_end - z->zout_start);
   3953    while (cur + n > limit)
   3954       limit *= 2;
   3955    q = (char *) STBI_REALLOC_SIZED(z->zout_start, old_limit, limit);
   3956    STBI_NOTUSED(old_limit);
   3957    if (q == NULL) return stbi__err("outofmem", "Out of memory");
   3958    z->zout_start = q;
   3959    z->zout       = q + cur;
   3960    z->zout_end   = q + limit;
   3961    return 1;
   3962 }
   3963 
   3964 static const int stbi__zlength_base[31] = {
   3965    3,4,5,6,7,8,9,10,11,13,
   3966    15,17,19,23,27,31,35,43,51,59,
   3967    67,83,99,115,131,163,195,227,258,0,0 };
   3968 
   3969 static const int stbi__zlength_extra[31]=
   3970 { 0,0,0,0,0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,0,0,0 };
   3971 
   3972 static const int stbi__zdist_base[32] = { 1,2,3,4,5,7,9,13,17,25,33,49,65,97,129,193,
   3973 257,385,513,769,1025,1537,2049,3073,4097,6145,8193,12289,16385,24577,0,0};
   3974 
   3975 static const int stbi__zdist_extra[32] =
   3976 { 0,0,0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13};
   3977 
   3978 static int stbi__parse_huffman_block(stbi__zbuf *a)
   3979 {
   3980    char *zout = a->zout;
   3981    for(;;) {
   3982       int z = stbi__zhuffman_decode(a, &a->z_length);
   3983       if (z < 256) {
   3984          if (z < 0) return stbi__err("bad huffman code","Corrupt PNG"); // error in huffman codes
   3985          if (zout >= a->zout_end) {
   3986             if (!stbi__zexpand(a, zout, 1)) return 0;
   3987             zout = a->zout;
   3988          }
   3989          *zout++ = (char) z;
   3990       } else {
   3991          stbi_uc *p;
   3992          int len,dist;
   3993          if (z == 256) {
   3994             a->zout = zout;
   3995             return 1;
   3996          }
   3997          z -= 257;
   3998          len = stbi__zlength_base[z];
   3999          if (stbi__zlength_extra[z]) len += stbi__zreceive(a, stbi__zlength_extra[z]);
   4000          z = stbi__zhuffman_decode(a, &a->z_distance);
   4001          if (z < 0) return stbi__err("bad huffman code","Corrupt PNG");
   4002          dist = stbi__zdist_base[z];
   4003          if (stbi__zdist_extra[z]) dist += stbi__zreceive(a, stbi__zdist_extra[z]);
   4004          if (zout - a->zout_start < dist) return stbi__err("bad dist","Corrupt PNG");
   4005          if (zout + len > a->zout_end) {
   4006             if (!stbi__zexpand(a, zout, len)) return 0;
   4007             zout = a->zout;
   4008          }
   4009          p = (stbi_uc *) (zout - dist);
   4010          if (dist == 1) { // run of one byte; common in images.
   4011             stbi_uc v = *p;
   4012             if (len) { do *zout++ = v; while (--len); }
   4013          } else {
   4014             if (len) { do *zout++ = *p++; while (--len); }
   4015          }
   4016       }
   4017    }
   4018 }
   4019 
   4020 static int stbi__compute_huffman_codes(stbi__zbuf *a)
   4021 {
   4022    static const stbi_uc length_dezigzag[19] = { 16,17,18,0,8,7,9,6,10,5,11,4,12,3,13,2,14,1,15 };
   4023    stbi__zhuffman z_codelength;
   4024    stbi_uc lencodes[286+32+137];//padding for maximum single op
   4025    stbi_uc codelength_sizes[19];
   4026    int i,n;
   4027 
   4028    int hlit  = stbi__zreceive(a,5) + 257;
   4029    int hdist = stbi__zreceive(a,5) + 1;
   4030    int hclen = stbi__zreceive(a,4) + 4;
   4031    int ntot  = hlit + hdist;
   4032 
   4033    memset(codelength_sizes, 0, sizeof(codelength_sizes));
   4034    for (i=0; i < hclen; ++i) {
   4035       int s = stbi__zreceive(a,3);
   4036       codelength_sizes[length_dezigzag[i]] = (stbi_uc) s;
   4037    }
   4038    if (!stbi__zbuild_huffman(&z_codelength, codelength_sizes, 19)) return 0;
   4039 
   4040    n = 0;
   4041    while (n < ntot) {
   4042       int c = stbi__zhuffman_decode(a, &z_codelength);
   4043       if (c < 0 || c >= 19) return stbi__err("bad codelengths", "Corrupt PNG");
   4044       if (c < 16)
   4045          lencodes[n++] = (stbi_uc) c;
   4046       else {
   4047          stbi_uc fill = 0;
   4048          if (c == 16) {
   4049             c = stbi__zreceive(a,2)+3;
   4050             if (n == 0) return stbi__err("bad codelengths", "Corrupt PNG");
   4051             fill = lencodes[n-1];
   4052          } else if (c == 17)
   4053             c = stbi__zreceive(a,3)+3;
   4054          else {
   4055             STBI_ASSERT(c == 18);
   4056             c = stbi__zreceive(a,7)+11;
   4057          }
   4058          if (ntot - n < c) return stbi__err("bad codelengths", "Corrupt PNG");
   4059          memset(lencodes+n, fill, c);
   4060          n += c;
   4061       }
   4062    }
   4063    if (n != ntot) return stbi__err("bad codelengths","Corrupt PNG");
   4064    if (!stbi__zbuild_huffman(&a->z_length, lencodes, hlit)) return 0;
   4065    if (!stbi__zbuild_huffman(&a->z_distance, lencodes+hlit, hdist)) return 0;
   4066    return 1;
   4067 }
   4068 
   4069 static int stbi__parse_uncompressed_block(stbi__zbuf *a)
   4070 {
   4071    stbi_uc header[4];
   4072    int len,nlen,k;
   4073    if (a->num_bits & 7)
   4074       stbi__zreceive(a, a->num_bits & 7); // discard
   4075    // drain the bit-packed data into header
   4076    k = 0;
   4077    while (a->num_bits > 0) {
   4078       header[k++] = (stbi_uc) (a->code_buffer & 255); // suppress MSVC run-time check
   4079       a->code_buffer >>= 8;
   4080       a->num_bits -= 8;
   4081    }
   4082    STBI_ASSERT(a->num_bits == 0);
   4083    // now fill header the normal way
   4084    while (k < 4)
   4085       header[k++] = stbi__zget8(a);
   4086    len  = header[1] * 256 + header[0];
   4087    nlen = header[3] * 256 + header[2];
   4088    if (nlen != (len ^ 0xffff)) return stbi__err("zlib corrupt","Corrupt PNG");
   4089    if (a->zbuffer + len > a->zbuffer_end) return stbi__err("read past buffer","Corrupt PNG");
   4090    if (a->zout + len > a->zout_end)
   4091       if (!stbi__zexpand(a, a->zout, len)) return 0;
   4092    memcpy(a->zout, a->zbuffer, len);
   4093    a->zbuffer += len;
   4094    a->zout += len;
   4095    return 1;
   4096 }
   4097 
   4098 static int stbi__parse_zlib_header(stbi__zbuf *a)
   4099 {
   4100    int cmf   = stbi__zget8(a);
   4101    int cm    = cmf & 15;
   4102    /* int cinfo = cmf >> 4; */
   4103    int flg   = stbi__zget8(a);
   4104    if ((cmf*256+flg) % 31 != 0) return stbi__err("bad zlib header","Corrupt PNG"); // zlib spec
   4105    if (flg & 32) return stbi__err("no preset dict","Corrupt PNG"); // preset dictionary not allowed in png
   4106    if (cm != 8) return stbi__err("bad compression","Corrupt PNG"); // DEFLATE required for png
   4107    // window = 1 << (8 + cinfo)... but who cares, we fully buffer output
   4108    return 1;
   4109 }
   4110 
   4111 static const stbi_uc stbi__zdefault_length[288] =
   4112 {
   4113    8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
   4114    8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
   4115    8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
   4116    8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
   4117    8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
   4118    9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
   4119    9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
   4120    9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
   4121    7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7, 7,7,7,7,7,7,7,7,8,8,8,8,8,8,8,8
   4122 };
   4123 static const stbi_uc stbi__zdefault_distance[32] =
   4124 {
   4125    5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5
   4126 };
   4127 /*
   4128 Init algorithm:
   4129 {
   4130    int i;   // use <= to match clearly with spec
   4131    for (i=0; i <= 143; ++i)     stbi__zdefault_length[i]   = 8;
   4132    for (   ; i <= 255; ++i)     stbi__zdefault_length[i]   = 9;
   4133    for (   ; i <= 279; ++i)     stbi__zdefault_length[i]   = 7;
   4134    for (   ; i <= 287; ++i)     stbi__zdefault_length[i]   = 8;
   4135 
   4136    for (i=0; i <=  31; ++i)     stbi__zdefault_distance[i] = 5;
   4137 }
   4138 */
   4139 
   4140 static int stbi__parse_zlib(stbi__zbuf *a, int parse_header)
   4141 {
   4142    int final, type;
   4143    if (parse_header)
   4144       if (!stbi__parse_zlib_header(a)) return 0;
   4145    a->num_bits = 0;
   4146    a->code_buffer = 0;
   4147    do {
   4148       final = stbi__zreceive(a,1);
   4149       type = stbi__zreceive(a,2);
   4150       if (type == 0) {
   4151          if (!stbi__parse_uncompressed_block(a)) return 0;
   4152       } else if (type == 3) {
   4153          return 0;
   4154       } else {
   4155          if (type == 1) {
   4156             // use fixed code lengths
   4157             if (!stbi__zbuild_huffman(&a->z_length  , stbi__zdefault_length  , 288)) return 0;
   4158             if (!stbi__zbuild_huffman(&a->z_distance, stbi__zdefault_distance,  32)) return 0;
   4159          } else {
   4160             if (!stbi__compute_huffman_codes(a)) return 0;
   4161          }
   4162          if (!stbi__parse_huffman_block(a)) return 0;
   4163       }
   4164    } while (!final);
   4165    return 1;
   4166 }
   4167 
   4168 static int stbi__do_zlib(stbi__zbuf *a, char *obuf, int olen, int exp, int parse_header)
   4169 {
   4170    a->zout_start = obuf;
   4171    a->zout       = obuf;
   4172    a->zout_end   = obuf + olen;
   4173    a->z_expandable = exp;
   4174 
   4175    return stbi__parse_zlib(a, parse_header);
   4176 }
   4177 
   4178 STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen)
   4179 {
   4180    stbi__zbuf a;
   4181    char *p = (char *) stbi__malloc(initial_size);
   4182    if (p == NULL) return NULL;
   4183    a.zbuffer = (stbi_uc *) buffer;
   4184    a.zbuffer_end = (stbi_uc *) buffer + len;
   4185    if (stbi__do_zlib(&a, p, initial_size, 1, 1)) {
   4186       if (outlen) *outlen = (int) (a.zout - a.zout_start);
   4187       return a.zout_start;
   4188    } else {
   4189       STBI_FREE(a.zout_start);
   4190       return NULL;
   4191    }
   4192 }
   4193 
   4194 STBIDEF char *stbi_zlib_decode_malloc(char const *buffer, int len, int *outlen)
   4195 {
   4196    return stbi_zlib_decode_malloc_guesssize(buffer, len, 16384, outlen);
   4197 }
   4198 
   4199 STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header)
   4200 {
   4201    stbi__zbuf a;
   4202    char *p = (char *) stbi__malloc(initial_size);
   4203    if (p == NULL) return NULL;
   4204    a.zbuffer = (stbi_uc *) buffer;
   4205    a.zbuffer_end = (stbi_uc *) buffer + len;
   4206    if (stbi__do_zlib(&a, p, initial_size, 1, parse_header)) {
   4207       if (outlen) *outlen = (int) (a.zout - a.zout_start);
   4208       return a.zout_start;
   4209    } else {
   4210       STBI_FREE(a.zout_start);
   4211       return NULL;
   4212    }
   4213 }
   4214 
   4215 STBIDEF int stbi_zlib_decode_buffer(char *obuffer, int olen, char const *ibuffer, int ilen)
   4216 {
   4217    stbi__zbuf a;
   4218    a.zbuffer = (stbi_uc *) ibuffer;
   4219    a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
   4220    if (stbi__do_zlib(&a, obuffer, olen, 0, 1))
   4221       return (int) (a.zout - a.zout_start);
   4222    else
   4223       return -1;
   4224 }
   4225 
   4226 STBIDEF char *stbi_zlib_decode_noheader_malloc(char const *buffer, int len, int *outlen)
   4227 {
   4228    stbi__zbuf a;
   4229    char *p = (char *) stbi__malloc(16384);
   4230    if (p == NULL) return NULL;
   4231    a.zbuffer = (stbi_uc *) buffer;
   4232    a.zbuffer_end = (stbi_uc *) buffer+len;
   4233    if (stbi__do_zlib(&a, p, 16384, 1, 0)) {
   4234       if (outlen) *outlen = (int) (a.zout - a.zout_start);
   4235       return a.zout_start;
   4236    } else {
   4237       STBI_FREE(a.zout_start);
   4238       return NULL;
   4239    }
   4240 }
   4241 
   4242 STBIDEF int stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen)
   4243 {
   4244    stbi__zbuf a;
   4245    a.zbuffer = (stbi_uc *) ibuffer;
   4246    a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
   4247    if (stbi__do_zlib(&a, obuffer, olen, 0, 0))
   4248       return (int) (a.zout - a.zout_start);
   4249    else
   4250       return -1;
   4251 }
   4252 #endif
   4253 
   4254 // public domain "baseline" PNG decoder   v0.10  Sean Barrett 2006-11-18
   4255 //    simple implementation
   4256 //      - only 8-bit samples
   4257 //      - no CRC checking
   4258 //      - allocates lots of intermediate memory
   4259 //        - avoids problem of streaming data between subsystems
   4260 //        - avoids explicit window management
   4261 //    performance
   4262 //      - uses stb_zlib, a PD zlib implementation with fast huffman decoding
   4263 
   4264 #ifndef STBI_NO_PNG
   4265 typedef struct
   4266 {
   4267    stbi__uint32 length;
   4268    stbi__uint32 type;
   4269 } stbi__pngchunk;
   4270 
   4271 static stbi__pngchunk stbi__get_chunk_header(stbi__context *s)
   4272 {
   4273    stbi__pngchunk c;
   4274    c.length = stbi__get32be(s);
   4275    c.type   = stbi__get32be(s);
   4276    return c;
   4277 }
   4278 
   4279 static int stbi__check_png_header(stbi__context *s)
   4280 {
   4281    static const stbi_uc png_sig[8] = { 137,80,78,71,13,10,26,10 };
   4282    int i;
   4283    for (i=0; i < 8; ++i)
   4284       if (stbi__get8(s) != png_sig[i]) return stbi__err("bad png sig","Not a PNG");
   4285    return 1;
   4286 }
   4287 
   4288 typedef struct
   4289 {
   4290    stbi__context *s;
   4291    stbi_uc *idata, *expanded, *out;
   4292    int depth;
   4293 } stbi__png;
   4294 
   4295 
   4296 enum {
   4297    STBI__F_none=0,
   4298    STBI__F_sub=1,
   4299    STBI__F_up=2,
   4300    STBI__F_avg=3,
   4301    STBI__F_paeth=4,
   4302    // synthetic filters used for first scanline to avoid needing a dummy row of 0s
   4303    STBI__F_avg_first,
   4304    STBI__F_paeth_first
   4305 };
   4306 
   4307 static stbi_uc first_row_filter[5] =
   4308 {
   4309    STBI__F_none,
   4310    STBI__F_sub,
   4311    STBI__F_none,
   4312    STBI__F_avg_first,
   4313    STBI__F_paeth_first
   4314 };
   4315 
   4316 static int stbi__paeth(int a, int b, int c)
   4317 {
   4318    int p = a + b - c;
   4319    int pa = abs(p-a);
   4320    int pb = abs(p-b);
   4321    int pc = abs(p-c);
   4322    if (pa <= pb && pa <= pc) return a;
   4323    if (pb <= pc) return b;
   4324    return c;
   4325 }
   4326 
   4327 static const stbi_uc stbi__depth_scale_table[9] = { 0, 0xff, 0x55, 0, 0x11, 0,0,0, 0x01 };
   4328 
   4329 // create the png data from post-deflated data
   4330 static int stbi__create_png_image_raw(stbi__png *a, stbi_uc *raw, stbi__uint32 raw_len, int out_n, stbi__uint32 x, stbi__uint32 y, int depth, int color)
   4331 {
   4332    int bytes = (depth == 16? 2 : 1);
   4333    stbi__context *s = a->s;
   4334    stbi__uint32 i,j,stride = x*out_n*bytes;
   4335    stbi__uint32 img_len, img_width_bytes;
   4336    int k;
   4337    int img_n = s->img_n; // copy it into a local for later
   4338 
   4339    int output_bytes = out_n*bytes;
   4340    int filter_bytes = img_n*bytes;
   4341    int width = x;
   4342 
   4343    STBI_ASSERT(out_n == s->img_n || out_n == s->img_n+1);
   4344    a->out = (stbi_uc *) stbi__malloc_mad3(x, y, output_bytes, 0); // extra bytes to write off the end into
   4345    if (!a->out) return stbi__err("outofmem", "Out of memory");
   4346 
   4347    if (!stbi__mad3sizes_valid(img_n, x, depth, 7)) return stbi__err("too large", "Corrupt PNG");
   4348    img_width_bytes = (((img_n * x * depth) + 7) >> 3);
   4349    img_len = (img_width_bytes + 1) * y;
   4350 
   4351    // we used to check for exact match between raw_len and img_len on non-interlaced PNGs,
   4352    // but issue #276 reported a PNG in the wild that had extra data at the end (all zeros),
   4353    // so just check for raw_len < img_len always.
   4354    if (raw_len < img_len) return stbi__err("not enough pixels","Corrupt PNG");
   4355 
   4356    for (j=0; j < y; ++j) {
   4357       stbi_uc *cur = a->out + stride*j;
   4358       stbi_uc *prior;
   4359       int filter = *raw++;
   4360 
   4361       if (filter > 4)
   4362          return stbi__err("invalid filter","Corrupt PNG");
   4363 
   4364       if (depth < 8) {
   4365          STBI_ASSERT(img_width_bytes <= x);
   4366          cur += x*out_n - img_width_bytes; // store output to the rightmost img_len bytes, so we can decode in place
   4367          filter_bytes = 1;
   4368          width = img_width_bytes;
   4369       }
   4370       prior = cur - stride; // bugfix: need to compute this after 'cur +=' computation above
   4371 
   4372       // if first row, use special filter that doesn't sample previous row
   4373       if (j == 0) filter = first_row_filter[filter];
   4374 
   4375       // handle first byte explicitly
   4376       for (k=0; k < filter_bytes; ++k) {
   4377          switch (filter) {
   4378             case STBI__F_none       : cur[k] = raw[k]; break;
   4379             case STBI__F_sub        : cur[k] = raw[k]; break;
   4380             case STBI__F_up         : cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break;
   4381             case STBI__F_avg        : cur[k] = STBI__BYTECAST(raw[k] + (prior[k]>>1)); break;
   4382             case STBI__F_paeth      : cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(0,prior[k],0)); break;
   4383             case STBI__F_avg_first  : cur[k] = raw[k]; break;
   4384             case STBI__F_paeth_first: cur[k] = raw[k]; break;
   4385          }
   4386       }
   4387 
   4388       if (depth == 8) {
   4389          if (img_n != out_n)
   4390             cur[img_n] = 255; // first pixel
   4391          raw += img_n;
   4392          cur += out_n;
   4393          prior += out_n;
   4394       } else if (depth == 16) {
   4395          if (img_n != out_n) {
   4396             cur[filter_bytes]   = 255; // first pixel top byte
   4397             cur[filter_bytes+1] = 255; // first pixel bottom byte
   4398          }
   4399          raw += filter_bytes;
   4400          cur += output_bytes;
   4401          prior += output_bytes;
   4402       } else {
   4403          raw += 1;
   4404          cur += 1;
   4405          prior += 1;
   4406       }
   4407 
   4408       // this is a little gross, so that we don't switch per-pixel or per-component
   4409       if (depth < 8 || img_n == out_n) {
   4410          int nk = (width - 1)*filter_bytes;
   4411          #define STBI__CASE(f) \
   4412              case f:     \
   4413                 for (k=0; k < nk; ++k)
   4414          switch (filter) {
   4415             // "none" filter turns into a memcpy here; make that explicit.
   4416             case STBI__F_none:         memcpy(cur, raw, nk); break;
   4417             STBI__CASE(STBI__F_sub)          { cur[k] = STBI__BYTECAST(raw[k] + cur[k-filter_bytes]); } break;
   4418             STBI__CASE(STBI__F_up)           { cur[k] = STBI__BYTECAST(raw[k] + prior[k]); } break;
   4419             STBI__CASE(STBI__F_avg)          { cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k-filter_bytes])>>1)); } break;
   4420             STBI__CASE(STBI__F_paeth)        { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],prior[k],prior[k-filter_bytes])); } break;
   4421             STBI__CASE(STBI__F_avg_first)    { cur[k] = STBI__BYTECAST(raw[k] + (cur[k-filter_bytes] >> 1)); } break;
   4422             STBI__CASE(STBI__F_paeth_first)  { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],0,0)); } break;
   4423          }
   4424          #undef STBI__CASE
   4425          raw += nk;
   4426       } else {
   4427          STBI_ASSERT(img_n+1 == out_n);
   4428          #define STBI__CASE(f) \
   4429              case f:     \
   4430                 for (i=x-1; i >= 1; --i, cur[filter_bytes]=255,raw+=filter_bytes,cur+=output_bytes,prior+=output_bytes) \
   4431                    for (k=0; k < filter_bytes; ++k)
   4432          switch (filter) {
   4433             STBI__CASE(STBI__F_none)         { cur[k] = raw[k]; } break;
   4434             STBI__CASE(STBI__F_sub)          { cur[k] = STBI__BYTECAST(raw[k] + cur[k- output_bytes]); } break;
   4435             STBI__CASE(STBI__F_up)           { cur[k] = STBI__BYTECAST(raw[k] + prior[k]); } break;
   4436             STBI__CASE(STBI__F_avg)          { cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k- output_bytes])>>1)); } break;
   4437             STBI__CASE(STBI__F_paeth)        { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k- output_bytes],prior[k],prior[k- output_bytes])); } break;
   4438             STBI__CASE(STBI__F_avg_first)    { cur[k] = STBI__BYTECAST(raw[k] + (cur[k- output_bytes] >> 1)); } break;
   4439             STBI__CASE(STBI__F_paeth_first)  { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k- output_bytes],0,0)); } break;
   4440          }
   4441          #undef STBI__CASE
   4442 
   4443          // the loop above sets the high byte of the pixels' alpha, but for
   4444          // 16 bit png files we also need the low byte set. we'll do that here.
   4445          if (depth == 16) {
   4446             cur = a->out + stride*j; // start at the beginning of the row again
   4447             for (i=0; i < x; ++i,cur+=output_bytes) {
   4448                cur[filter_bytes+1] = 255;
   4449             }
   4450          }
   4451       }
   4452    }
   4453 
   4454    // we make a separate pass to expand bits to pixels; for performance,
   4455    // this could run two scanlines behind the above code, so it won't
   4456    // intefere with filtering but will still be in the cache.
   4457    if (depth < 8) {
   4458       for (j=0; j < y; ++j) {
   4459          stbi_uc *cur = a->out + stride*j;
   4460          stbi_uc *in  = a->out + stride*j + x*out_n - img_width_bytes;
   4461          // unpack 1/2/4-bit into a 8-bit buffer. allows us to keep the common 8-bit path optimal at minimal cost for 1/2/4-bit
   4462          // png guarante byte alignment, if width is not multiple of 8/4/2 we'll decode dummy trailing data that will be skipped in the later loop
   4463          stbi_uc scale = (color == 0) ? stbi__depth_scale_table[depth] : 1; // scale grayscale values to 0..255 range
   4464 
   4465          // note that the final byte might overshoot and write more data than desired.
   4466          // we can allocate enough data that this never writes out of memory, but it
   4467          // could also overwrite the next scanline. can it overwrite non-empty data
   4468          // on the next scanline? yes, consider 1-pixel-wide scanlines with 1-bit-per-pixel.
   4469          // so we need to explicitly clamp the final ones
   4470 
   4471          if (depth == 4) {
   4472             for (k=x*img_n; k >= 2; k-=2, ++in) {
   4473                *cur++ = scale * ((*in >> 4)       );
   4474                *cur++ = scale * ((*in     ) & 0x0f);
   4475             }
   4476             if (k > 0) *cur++ = scale * ((*in >> 4)       );
   4477          } else if (depth == 2) {
   4478             for (k=x*img_n; k >= 4; k-=4, ++in) {
   4479                *cur++ = scale * ((*in >> 6)       );
   4480                *cur++ = scale * ((*in >> 4) & 0x03);
   4481                *cur++ = scale * ((*in >> 2) & 0x03);
   4482                *cur++ = scale * ((*in     ) & 0x03);
   4483             }
   4484             if (k > 0) *cur++ = scale * ((*in >> 6)       );
   4485             if (k > 1) *cur++ = scale * ((*in >> 4) & 0x03);
   4486             if (k > 2) *cur++ = scale * ((*in >> 2) & 0x03);
   4487          } else if (depth == 1) {
   4488             for (k=x*img_n; k >= 8; k-=8, ++in) {
   4489                *cur++ = scale * ((*in >> 7)       );
   4490                *cur++ = scale * ((*in >> 6) & 0x01);
   4491                *cur++ = scale * ((*in >> 5) & 0x01);
   4492                *cur++ = scale * ((*in >> 4) & 0x01);
   4493                *cur++ = scale * ((*in >> 3) & 0x01);
   4494                *cur++ = scale * ((*in >> 2) & 0x01);
   4495                *cur++ = scale * ((*in >> 1) & 0x01);
   4496                *cur++ = scale * ((*in     ) & 0x01);
   4497             }
   4498             if (k > 0) *cur++ = scale * ((*in >> 7)       );
   4499             if (k > 1) *cur++ = scale * ((*in >> 6) & 0x01);
   4500             if (k > 2) *cur++ = scale * ((*in >> 5) & 0x01);
   4501             if (k > 3) *cur++ = scale * ((*in >> 4) & 0x01);
   4502             if (k > 4) *cur++ = scale * ((*in >> 3) & 0x01);
   4503             if (k > 5) *cur++ = scale * ((*in >> 2) & 0x01);
   4504             if (k > 6) *cur++ = scale * ((*in >> 1) & 0x01);
   4505          }
   4506          if (img_n != out_n) {
   4507             int q;
   4508             // insert alpha = 255
   4509             cur = a->out + stride*j;
   4510             if (img_n == 1) {
   4511                for (q=x-1; q >= 0; --q) {
   4512                   cur[q*2+1] = 255;
   4513                   cur[q*2+0] = cur[q];
   4514                }
   4515             } else {
   4516                STBI_ASSERT(img_n == 3);
   4517                for (q=x-1; q >= 0; --q) {
   4518                   cur[q*4+3] = 255;
   4519                   cur[q*4+2] = cur[q*3+2];
   4520                   cur[q*4+1] = cur[q*3+1];
   4521                   cur[q*4+0] = cur[q*3+0];
   4522                }
   4523             }
   4524          }
   4525       }
   4526    } else if (depth == 16) {
   4527       // force the image data from big-endian to platform-native.
   4528       // this is done in a separate pass due to the decoding relying
   4529       // on the data being untouched, but could probably be done
   4530       // per-line during decode if care is taken.
   4531       stbi_uc *cur = a->out;
   4532       stbi__uint16 *cur16 = (stbi__uint16*)cur;
   4533 
   4534       for(i=0; i < x*y*out_n; ++i,cur16++,cur+=2) {
   4535          *cur16 = (cur[0] << 8) | cur[1];
   4536       }
   4537    }
   4538 
   4539    return 1;
   4540 }
   4541 
   4542 static int stbi__create_png_image(stbi__png *a, stbi_uc *image_data, stbi__uint32 image_data_len, int out_n, int depth, int color, int interlaced)
   4543 {
   4544    int bytes = (depth == 16 ? 2 : 1);
   4545    int out_bytes = out_n * bytes;
   4546    stbi_uc *final;
   4547    int p;
   4548    if (!interlaced)
   4549       return stbi__create_png_image_raw(a, image_data, image_data_len, out_n, a->s->img_x, a->s->img_y, depth, color);
   4550 
   4551    // de-interlacing
   4552    final = (stbi_uc *) stbi__malloc_mad3(a->s->img_x, a->s->img_y, out_bytes, 0);
   4553    for (p=0; p < 7; ++p) {
   4554       int xorig[] = { 0,4,0,2,0,1,0 };
   4555       int yorig[] = { 0,0,4,0,2,0,1 };
   4556       int xspc[]  = { 8,8,4,4,2,2,1 };
   4557       int yspc[]  = { 8,8,8,4,4,2,2 };
   4558       int i,j,x,y;
   4559       // pass1_x[4] = 0, pass1_x[5] = 1, pass1_x[12] = 1
   4560       x = (a->s->img_x - xorig[p] + xspc[p]-1) / xspc[p];
   4561       y = (a->s->img_y - yorig[p] + yspc[p]-1) / yspc[p];
   4562       if (x && y) {
   4563          stbi__uint32 img_len = ((((a->s->img_n * x * depth) + 7) >> 3) + 1) * y;
   4564          if (!stbi__create_png_image_raw(a, image_data, image_data_len, out_n, x, y, depth, color)) {
   4565             STBI_FREE(final);
   4566             return 0;
   4567          }
   4568          for (j=0; j < y; ++j) {
   4569             for (i=0; i < x; ++i) {
   4570                int out_y = j*yspc[p]+yorig[p];
   4571                int out_x = i*xspc[p]+xorig[p];
   4572                memcpy(final + out_y*a->s->img_x*out_bytes + out_x*out_bytes,
   4573                       a->out + (j*x+i)*out_bytes, out_bytes);
   4574             }
   4575          }
   4576          STBI_FREE(a->out);
   4577          image_data += img_len;
   4578          image_data_len -= img_len;
   4579       }
   4580    }
   4581    a->out = final;
   4582 
   4583    return 1;
   4584 }
   4585 
   4586 static int stbi__compute_transparency(stbi__png *z, stbi_uc tc[3], int out_n)
   4587 {
   4588    stbi__context *s = z->s;
   4589    stbi__uint32 i, pixel_count = s->img_x * s->img_y;
   4590    stbi_uc *p = z->out;
   4591 
   4592    // compute color-based transparency, assuming we've
   4593    // already got 255 as the alpha value in the output
   4594    STBI_ASSERT(out_n == 2 || out_n == 4);
   4595 
   4596    if (out_n == 2) {
   4597       for (i=0; i < pixel_count; ++i) {
   4598          p[1] = (p[0] == tc[0] ? 0 : 255);
   4599          p += 2;
   4600       }
   4601    } else {
   4602       for (i=0; i < pixel_count; ++i) {
   4603          if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
   4604             p[3] = 0;
   4605          p += 4;
   4606       }
   4607    }
   4608    return 1;
   4609 }
   4610 
   4611 static int stbi__compute_transparency16(stbi__png *z, stbi__uint16 tc[3], int out_n)
   4612 {
   4613    stbi__context *s = z->s;
   4614    stbi__uint32 i, pixel_count = s->img_x * s->img_y;
   4615    stbi__uint16 *p = (stbi__uint16*) z->out;
   4616 
   4617    // compute color-based transparency, assuming we've
   4618    // already got 65535 as the alpha value in the output
   4619    STBI_ASSERT(out_n == 2 || out_n == 4);
   4620 
   4621    if (out_n == 2) {
   4622       for (i = 0; i < pixel_count; ++i) {
   4623          p[1] = (p[0] == tc[0] ? 0 : 65535);
   4624          p += 2;
   4625       }
   4626    } else {
   4627       for (i = 0; i < pixel_count; ++i) {
   4628          if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
   4629             p[3] = 0;
   4630          p += 4;
   4631       }
   4632    }
   4633    return 1;
   4634 }
   4635 
   4636 static int stbi__expand_png_palette(stbi__png *a, stbi_uc *palette, int len, int pal_img_n)
   4637 {
   4638    stbi__uint32 i, pixel_count = a->s->img_x * a->s->img_y;
   4639    stbi_uc *p, *temp_out, *orig = a->out;
   4640 
   4641    p = (stbi_uc *) stbi__malloc_mad2(pixel_count, pal_img_n, 0);
   4642    if (p == NULL) return stbi__err("outofmem", "Out of memory");
   4643 
   4644    // between here and free(out) below, exitting would leak
   4645    temp_out = p;
   4646 
   4647    if (pal_img_n == 3) {
   4648       for (i=0; i < pixel_count; ++i) {
   4649          int n = orig[i]*4;
   4650          p[0] = palette[n  ];
   4651          p[1] = palette[n+1];
   4652          p[2] = palette[n+2];
   4653          p += 3;
   4654       }
   4655    } else {
   4656       for (i=0; i < pixel_count; ++i) {
   4657          int n = orig[i]*4;
   4658          p[0] = palette[n  ];
   4659          p[1] = palette[n+1];
   4660          p[2] = palette[n+2];
   4661          p[3] = palette[n+3];
   4662          p += 4;
   4663       }
   4664    }
   4665    STBI_FREE(a->out);
   4666    a->out = temp_out;
   4667 
   4668    STBI_NOTUSED(len);
   4669 
   4670    return 1;
   4671 }
   4672 
   4673 static int stbi__unpremultiply_on_load = 0;
   4674 static int stbi__de_iphone_flag = 0;
   4675 
   4676 STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply)
   4677 {
   4678    stbi__unpremultiply_on_load = flag_true_if_should_unpremultiply;
   4679 }
   4680 
   4681 STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert)
   4682 {
   4683    stbi__de_iphone_flag = flag_true_if_should_convert;
   4684 }
   4685 
   4686 static void stbi__de_iphone(stbi__png *z)
   4687 {
   4688    stbi__context *s = z->s;
   4689    stbi__uint32 i, pixel_count = s->img_x * s->img_y;
   4690    stbi_uc *p = z->out;
   4691 
   4692    if (s->img_out_n == 3) {  // convert bgr to rgb
   4693       for (i=0; i < pixel_count; ++i) {
   4694          stbi_uc t = p[0];
   4695          p[0] = p[2];
   4696          p[2] = t;
   4697          p += 3;
   4698       }
   4699    } else {
   4700       STBI_ASSERT(s->img_out_n == 4);
   4701       if (stbi__unpremultiply_on_load) {
   4702          // convert bgr to rgb and unpremultiply
   4703          for (i=0; i < pixel_count; ++i) {
   4704             stbi_uc a = p[3];
   4705             stbi_uc t = p[0];
   4706             if (a) {
   4707                stbi_uc half = a / 2;
   4708                p[0] = (p[2] * 255 + half) / a;
   4709                p[1] = (p[1] * 255 + half) / a;
   4710                p[2] = ( t   * 255 + half) / a;
   4711             } else {
   4712                p[0] = p[2];
   4713                p[2] = t;
   4714             }
   4715             p += 4;
   4716          }
   4717       } else {
   4718          // convert bgr to rgb
   4719          for (i=0; i < pixel_count; ++i) {
   4720             stbi_uc t = p[0];
   4721             p[0] = p[2];
   4722             p[2] = t;
   4723             p += 4;
   4724          }
   4725       }
   4726    }
   4727 }
   4728 
   4729 #define STBI__PNG_TYPE(a,b,c,d)  (((unsigned) (a) << 24) + ((unsigned) (b) << 16) + ((unsigned) (c) << 8) + (unsigned) (d))
   4730 
   4731 static int stbi__parse_png_file(stbi__png *z, int scan, int req_comp)
   4732 {
   4733    stbi_uc palette[1024], pal_img_n=0;
   4734    stbi_uc has_trans=0, tc[3];
   4735    stbi__uint16 tc16[3];
   4736    stbi__uint32 ioff=0, idata_limit=0, i, pal_len=0;
   4737    int first=1,k,interlace=0, color=0, is_iphone=0;
   4738    stbi__context *s = z->s;
   4739 
   4740    z->expanded = NULL;
   4741    z->idata = NULL;
   4742    z->out = NULL;
   4743 
   4744    if (!stbi__check_png_header(s)) return 0;
   4745 
   4746    if (scan == STBI__SCAN_type) return 1;
   4747 
   4748    for (;;) {
   4749       stbi__pngchunk c = stbi__get_chunk_header(s);
   4750       switch (c.type) {
   4751          case STBI__PNG_TYPE('C','g','B','I'):
   4752             is_iphone = 1;
   4753             stbi__skip(s, c.length);
   4754             break;
   4755          case STBI__PNG_TYPE('I','H','D','R'): {
   4756             int comp,filter;
   4757             if (!first) return stbi__err("multiple IHDR","Corrupt PNG");
   4758             first = 0;
   4759             if (c.length != 13) return stbi__err("bad IHDR len","Corrupt PNG");
   4760             s->img_x = stbi__get32be(s); if (s->img_x > (1 << 24)) return stbi__err("too large","Very large image (corrupt?)");
   4761             s->img_y = stbi__get32be(s); if (s->img_y > (1 << 24)) return stbi__err("too large","Very large image (corrupt?)");
   4762             z->depth = stbi__get8(s);  if (z->depth != 1 && z->depth != 2 && z->depth != 4 && z->depth != 8 && z->depth != 16)  return stbi__err("1/2/4/8/16-bit only","PNG not supported: 1/2/4/8/16-bit only");
   4763             color = stbi__get8(s);  if (color > 6)         return stbi__err("bad ctype","Corrupt PNG");
   4764             if (color == 3 && z->depth == 16)                  return stbi__err("bad ctype","Corrupt PNG");
   4765             if (color == 3) pal_img_n = 3; else if (color & 1) return stbi__err("bad ctype","Corrupt PNG");
   4766             comp  = stbi__get8(s);  if (comp) return stbi__err("bad comp method","Corrupt PNG");
   4767             filter= stbi__get8(s);  if (filter) return stbi__err("bad filter method","Corrupt PNG");
   4768             interlace = stbi__get8(s); if (interlace>1) return stbi__err("bad interlace method","Corrupt PNG");
   4769             if (!s->img_x || !s->img_y) return stbi__err("0-pixel image","Corrupt PNG");
   4770             if (!pal_img_n) {
   4771                s->img_n = (color & 2 ? 3 : 1) + (color & 4 ? 1 : 0);
   4772                if ((1 << 30) / s->img_x / s->img_n < s->img_y) return stbi__err("too large", "Image too large to decode");
   4773                if (scan == STBI__SCAN_header) return 1;
   4774             } else {
   4775                // if paletted, then pal_n is our final components, and
   4776                // img_n is # components to decompress/filter.
   4777                s->img_n = 1;
   4778                if ((1 << 30) / s->img_x / 4 < s->img_y) return stbi__err("too large","Corrupt PNG");
   4779                // if SCAN_header, have to scan to see if we have a tRNS
   4780             }
   4781             break;
   4782          }
   4783 
   4784          case STBI__PNG_TYPE('P','L','T','E'):  {
   4785             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
   4786             if (c.length > 256*3) return stbi__err("invalid PLTE","Corrupt PNG");
   4787             pal_len = c.length / 3;
   4788             if (pal_len * 3 != c.length) return stbi__err("invalid PLTE","Corrupt PNG");
   4789             for (i=0; i < pal_len; ++i) {
   4790                palette[i*4+0] = stbi__get8(s);
   4791                palette[i*4+1] = stbi__get8(s);
   4792                palette[i*4+2] = stbi__get8(s);
   4793                palette[i*4+3] = 255;
   4794             }
   4795             break;
   4796          }
   4797 
   4798          case STBI__PNG_TYPE('t','R','N','S'): {
   4799             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
   4800             if (z->idata) return stbi__err("tRNS after IDAT","Corrupt PNG");
   4801             if (pal_img_n) {
   4802                if (scan == STBI__SCAN_header) { s->img_n = 4; return 1; }
   4803                if (pal_len == 0) return stbi__err("tRNS before PLTE","Corrupt PNG");
   4804                if (c.length > pal_len) return stbi__err("bad tRNS len","Corrupt PNG");
   4805                pal_img_n = 4;
   4806                for (i=0; i < c.length; ++i)
   4807                   palette[i*4+3] = stbi__get8(s);
   4808             } else {
   4809                if (!(s->img_n & 1)) return stbi__err("tRNS with alpha","Corrupt PNG");
   4810                if (c.length != (stbi__uint32) s->img_n*2) return stbi__err("bad tRNS len","Corrupt PNG");
   4811                has_trans = 1;
   4812                if (z->depth == 16) {
   4813                   for (k = 0; k < s->img_n; ++k) tc16[k] = (stbi__uint16)stbi__get16be(s); // copy the values as-is
   4814                } else {
   4815                   for (k = 0; k < s->img_n; ++k) tc[k] = (stbi_uc)(stbi__get16be(s) & 255) * stbi__depth_scale_table[z->depth]; // non 8-bit images will be larger
   4816                }
   4817             }
   4818             break;
   4819          }
   4820 
   4821          case STBI__PNG_TYPE('I','D','A','T'): {
   4822             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
   4823             if (pal_img_n && !pal_len) return stbi__err("no PLTE","Corrupt PNG");
   4824             if (scan == STBI__SCAN_header) { s->img_n = pal_img_n; return 1; }
   4825             if ((int)(ioff + c.length) < (int)ioff) return 0;
   4826             if (ioff + c.length > idata_limit) {
   4827                stbi__uint32 idata_limit_old = idata_limit;
   4828                stbi_uc *p;
   4829                if (idata_limit == 0) idata_limit = c.length > 4096 ? c.length : 4096;
   4830                while (ioff + c.length > idata_limit)
   4831                   idata_limit *= 2;
   4832                STBI_NOTUSED(idata_limit_old);
   4833                p = (stbi_uc *) STBI_REALLOC_SIZED(z->idata, idata_limit_old, idata_limit); if (p == NULL) return stbi__err("outofmem", "Out of memory");
   4834                z->idata = p;
   4835             }
   4836             if (!stbi__getn(s, z->idata+ioff,c.length)) return stbi__err("outofdata","Corrupt PNG");
   4837             ioff += c.length;
   4838             break;
   4839          }
   4840 
   4841          case STBI__PNG_TYPE('I','E','N','D'): {
   4842             stbi__uint32 raw_len, bpl;
   4843             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
   4844             if (scan != STBI__SCAN_load) return 1;
   4845             if (z->idata == NULL) return stbi__err("no IDAT","Corrupt PNG");
   4846             // initial guess for decoded data size to avoid unnecessary reallocs
   4847             bpl = (s->img_x * z->depth + 7) / 8; // bytes per line, per component
   4848             raw_len = bpl * s->img_y * s->img_n /* pixels */ + s->img_y /* filter mode per row */;
   4849             z->expanded = (stbi_uc *) stbi_zlib_decode_malloc_guesssize_headerflag((char *) z->idata, ioff, raw_len, (int *) &raw_len, !is_iphone);
   4850             if (z->expanded == NULL) return 0; // zlib should set error
   4851             STBI_FREE(z->idata); z->idata = NULL;
   4852             if ((req_comp == s->img_n+1 && req_comp != 3 && !pal_img_n) || has_trans)
   4853                s->img_out_n = s->img_n+1;
   4854             else
   4855                s->img_out_n = s->img_n;
   4856             if (!stbi__create_png_image(z, z->expanded, raw_len, s->img_out_n, z->depth, color, interlace)) return 0;
   4857             if (has_trans) {
   4858                if (z->depth == 16) {
   4859                   if (!stbi__compute_transparency16(z, tc16, s->img_out_n)) return 0;
   4860                } else {
   4861                   if (!stbi__compute_transparency(z, tc, s->img_out_n)) return 0;
   4862                }
   4863             }
   4864             if (is_iphone && stbi__de_iphone_flag && s->img_out_n > 2)
   4865                stbi__de_iphone(z);
   4866             if (pal_img_n) {
   4867                // pal_img_n == 3 or 4
   4868                s->img_n = pal_img_n; // record the actual colors we had
   4869                s->img_out_n = pal_img_n;
   4870                if (req_comp >= 3) s->img_out_n = req_comp;
   4871                if (!stbi__expand_png_palette(z, palette, pal_len, s->img_out_n))
   4872                   return 0;
   4873             } else if (has_trans) {
   4874                // non-paletted image with tRNS -> source image has (constant) alpha
   4875                ++s->img_n;
   4876             }
   4877             STBI_FREE(z->expanded); z->expanded = NULL;
   4878             return 1;
   4879          }
   4880 
   4881          default:
   4882             // if critical, fail
   4883             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
   4884             if ((c.type & (1 << 29)) == 0) {
   4885                #ifndef STBI_NO_FAILURE_STRINGS
   4886                // not threadsafe
   4887                static char invalid_chunk[] = "XXXX PNG chunk not known";
   4888                invalid_chunk[0] = STBI__BYTECAST(c.type >> 24);
   4889                invalid_chunk[1] = STBI__BYTECAST(c.type >> 16);
   4890                invalid_chunk[2] = STBI__BYTECAST(c.type >>  8);
   4891                invalid_chunk[3] = STBI__BYTECAST(c.type >>  0);
   4892                #endif
   4893                return stbi__err(invalid_chunk, "PNG not supported: unknown PNG chunk type");
   4894             }
   4895             stbi__skip(s, c.length);
   4896             break;
   4897       }
   4898       // end of PNG chunk, read and skip CRC
   4899       stbi__get32be(s);
   4900    }
   4901 }
   4902 
   4903 static void *stbi__do_png(stbi__png *p, int *x, int *y, int *n, int req_comp, stbi__result_info *ri)
   4904 {
   4905    void *result=NULL;
   4906    if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
   4907    if (stbi__parse_png_file(p, STBI__SCAN_load, req_comp)) {
   4908       if (p->depth < 8)
   4909          ri->bits_per_channel = 8;
   4910       else
   4911          ri->bits_per_channel = p->depth;
   4912       result = p->out;
   4913       p->out = NULL;
   4914       if (req_comp && req_comp != p->s->img_out_n) {
   4915          if (ri->bits_per_channel == 8)
   4916             result = stbi__convert_format((unsigned char *) result, p->s->img_out_n, req_comp, p->s->img_x, p->s->img_y);
   4917          else
   4918             result = stbi__convert_format16((stbi__uint16 *) result, p->s->img_out_n, req_comp, p->s->img_x, p->s->img_y);
   4919          p->s->img_out_n = req_comp;
   4920          if (result == NULL) return result;
   4921       }
   4922       *x = p->s->img_x;
   4923       *y = p->s->img_y;
   4924       if (n) *n = p->s->img_n;
   4925    }
   4926    STBI_FREE(p->out);      p->out      = NULL;
   4927    STBI_FREE(p->expanded); p->expanded = NULL;
   4928    STBI_FREE(p->idata);    p->idata    = NULL;
   4929 
   4930    return result;
   4931 }
   4932 
   4933 static void *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
   4934 {
   4935    stbi__png p;
   4936    p.s = s;
   4937    return stbi__do_png(&p, x,y,comp,req_comp, ri);
   4938 }
   4939 
   4940 static int stbi__png_test(stbi__context *s)
   4941 {
   4942    int r;
   4943    r = stbi__check_png_header(s);
   4944    stbi__rewind(s);
   4945    return r;
   4946 }
   4947 
   4948 static int stbi__png_info_raw(stbi__png *p, int *x, int *y, int *comp)
   4949 {
   4950    if (!stbi__parse_png_file(p, STBI__SCAN_header, 0)) {
   4951       stbi__rewind( p->s );
   4952       return 0;
   4953    }
   4954    if (x) *x = p->s->img_x;
   4955    if (y) *y = p->s->img_y;
   4956    if (comp) *comp = p->s->img_n;
   4957    return 1;
   4958 }
   4959 
   4960 static int stbi__png_info(stbi__context *s, int *x, int *y, int *comp)
   4961 {
   4962    stbi__png p;
   4963    p.s = s;
   4964    return stbi__png_info_raw(&p, x, y, comp);
   4965 }
   4966 
   4967 static int stbi__png_is16(stbi__context *s)
   4968 {
   4969    stbi__png p;
   4970    p.s = s;
   4971    if (!stbi__png_info_raw(&p, NULL, NULL, NULL))
   4972 	   return 0;
   4973    if (p.depth != 16) {
   4974       stbi__rewind(p.s);
   4975       return 0;
   4976    }
   4977    return 1;
   4978 }
   4979 #endif
   4980 
   4981 // Microsoft/Windows BMP image
   4982 
   4983 #ifndef STBI_NO_BMP
   4984 static int stbi__bmp_test_raw(stbi__context *s)
   4985 {
   4986    int r;
   4987    int sz;
   4988    if (stbi__get8(s) != 'B') return 0;
   4989    if (stbi__get8(s) != 'M') return 0;
   4990    stbi__get32le(s); // discard filesize
   4991    stbi__get16le(s); // discard reserved
   4992    stbi__get16le(s); // discard reserved
   4993    stbi__get32le(s); // discard data offset
   4994    sz = stbi__get32le(s);
   4995    r = (sz == 12 || sz == 40 || sz == 56 || sz == 108 || sz == 124);
   4996    return r;
   4997 }
   4998 
   4999 static int stbi__bmp_test(stbi__context *s)
   5000 {
   5001    int r = stbi__bmp_test_raw(s);
   5002    stbi__rewind(s);
   5003    return r;
   5004 }
   5005 
   5006 
   5007 // returns 0..31 for the highest set bit
   5008 static int stbi__high_bit(unsigned int z)
   5009 {
   5010    int n=0;
   5011    if (z == 0) return -1;
   5012    if (z >= 0x10000) n += 16, z >>= 16;
   5013    if (z >= 0x00100) n +=  8, z >>=  8;
   5014    if (z >= 0x00010) n +=  4, z >>=  4;
   5015    if (z >= 0x00004) n +=  2, z >>=  2;
   5016    if (z >= 0x00002) n +=  1, z >>=  1;
   5017    return n;
   5018 }
   5019 
   5020 static int stbi__bitcount(unsigned int a)
   5021 {
   5022    a = (a & 0x55555555) + ((a >>  1) & 0x55555555); // max 2
   5023    a = (a & 0x33333333) + ((a >>  2) & 0x33333333); // max 4
   5024    a = (a + (a >> 4)) & 0x0f0f0f0f; // max 8 per 4, now 8 bits
   5025    a = (a + (a >> 8)); // max 16 per 8 bits
   5026    a = (a + (a >> 16)); // max 32 per 8 bits
   5027    return a & 0xff;
   5028 }
   5029 
   5030 // extract an arbitrarily-aligned N-bit value (N=bits)
   5031 // from v, and then make it 8-bits long and fractionally
   5032 // extend it to full full range.
   5033 static int stbi__shiftsigned(int v, int shift, int bits)
   5034 {
   5035    static unsigned int mul_table[9] = {
   5036       0,
   5037       0xff/*0b11111111*/, 0x55/*0b01010101*/, 0x49/*0b01001001*/, 0x11/*0b00010001*/,
   5038       0x21/*0b00100001*/, 0x41/*0b01000001*/, 0x81/*0b10000001*/, 0x01/*0b00000001*/,
   5039    };
   5040    static unsigned int shift_table[9] = {
   5041       0, 0,0,1,0,2,4,6,0,
   5042    };
   5043    if (shift < 0)
   5044       v <<= -shift;
   5045    else
   5046       v >>= shift;
   5047    STBI_ASSERT(v >= 0 && v < 256);
   5048    v >>= (8-bits);
   5049    STBI_ASSERT(bits >= 0 && bits <= 8);
   5050    return (int) ((unsigned) v * mul_table[bits]) >> shift_table[bits];
   5051 }
   5052 
   5053 typedef struct
   5054 {
   5055    int bpp, offset, hsz;
   5056    unsigned int mr,mg,mb,ma, all_a;
   5057 } stbi__bmp_data;
   5058 
   5059 static void *stbi__bmp_parse_header(stbi__context *s, stbi__bmp_data *info)
   5060 {
   5061    int hsz;
   5062    if (stbi__get8(s) != 'B' || stbi__get8(s) != 'M') return stbi__errpuc("not BMP", "Corrupt BMP");
   5063    stbi__get32le(s); // discard filesize
   5064    stbi__get16le(s); // discard reserved
   5065    stbi__get16le(s); // discard reserved
   5066    info->offset = stbi__get32le(s);
   5067    info->hsz = hsz = stbi__get32le(s);
   5068    info->mr = info->mg = info->mb = info->ma = 0;
   5069 
   5070    if (hsz != 12 && hsz != 40 && hsz != 56 && hsz != 108 && hsz != 124) return stbi__errpuc("unknown BMP", "BMP type not supported: unknown");
   5071    if (hsz == 12) {
   5072       s->img_x = stbi__get16le(s);
   5073       s->img_y = stbi__get16le(s);
   5074    } else {
   5075       s->img_x = stbi__get32le(s);
   5076       s->img_y = stbi__get32le(s);
   5077    }
   5078    if (stbi__get16le(s) != 1) return stbi__errpuc("bad BMP", "bad BMP");
   5079    info->bpp = stbi__get16le(s);
   5080    if (hsz != 12) {
   5081       int compress = stbi__get32le(s);
   5082       if (compress == 1 || compress == 2) return stbi__errpuc("BMP RLE", "BMP type not supported: RLE");
   5083       stbi__get32le(s); // discard sizeof
   5084       stbi__get32le(s); // discard hres
   5085       stbi__get32le(s); // discard vres
   5086       stbi__get32le(s); // discard colorsused
   5087       stbi__get32le(s); // discard max important
   5088       if (hsz == 40 || hsz == 56) {
   5089          if (hsz == 56) {
   5090             stbi__get32le(s);
   5091             stbi__get32le(s);
   5092             stbi__get32le(s);
   5093             stbi__get32le(s);
   5094          }
   5095          if (info->bpp == 16 || info->bpp == 32) {
   5096             if (compress == 0) {
   5097                if (info->bpp == 32) {
   5098                   info->mr = 0xffu << 16;
   5099                   info->mg = 0xffu <<  8;
   5100                   info->mb = 0xffu <<  0;
   5101                   info->ma = 0xffu << 24;
   5102                   info->all_a = 0; // if all_a is 0 at end, then we loaded alpha channel but it was all 0
   5103                } else {
   5104                   info->mr = 31u << 10;
   5105                   info->mg = 31u <<  5;
   5106                   info->mb = 31u <<  0;
   5107                }
   5108             } else if (compress == 3) {
   5109                info->mr = stbi__get32le(s);
   5110                info->mg = stbi__get32le(s);
   5111                info->mb = stbi__get32le(s);
   5112                // not documented, but generated by photoshop and handled by mspaint
   5113                if (info->mr == info->mg && info->mg == info->mb) {
   5114                   // ?!?!?
   5115                   return stbi__errpuc("bad BMP", "bad BMP");
   5116                }
   5117             } else
   5118                return stbi__errpuc("bad BMP", "bad BMP");
   5119          }
   5120       } else {
   5121          int i;
   5122          if (hsz != 108 && hsz != 124)
   5123             return stbi__errpuc("bad BMP", "bad BMP");
   5124          info->mr = stbi__get32le(s);
   5125          info->mg = stbi__get32le(s);
   5126          info->mb = stbi__get32le(s);
   5127          info->ma = stbi__get32le(s);
   5128          stbi__get32le(s); // discard color space
   5129          for (i=0; i < 12; ++i)
   5130             stbi__get32le(s); // discard color space parameters
   5131          if (hsz == 124) {
   5132             stbi__get32le(s); // discard rendering intent
   5133             stbi__get32le(s); // discard offset of profile data
   5134             stbi__get32le(s); // discard size of profile data
   5135             stbi__get32le(s); // discard reserved
   5136          }
   5137       }
   5138    }
   5139    return (void *) 1;
   5140 }
   5141 
   5142 
   5143 static void *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
   5144 {
   5145    stbi_uc *out;
   5146    unsigned int mr=0,mg=0,mb=0,ma=0, all_a;
   5147    stbi_uc pal[256][4];
   5148    int psize=0,i,j,width;
   5149    int flip_vertically, pad, target;
   5150    stbi__bmp_data info;
   5151    STBI_NOTUSED(ri);
   5152 
   5153    info.all_a = 255;
   5154    if (stbi__bmp_parse_header(s, &info) == NULL)
   5155       return NULL; // error code already set
   5156 
   5157    flip_vertically = ((int) s->img_y) > 0;
   5158    s->img_y = abs((int) s->img_y);
   5159 
   5160    mr = info.mr;
   5161    mg = info.mg;
   5162    mb = info.mb;
   5163    ma = info.ma;
   5164    all_a = info.all_a;
   5165 
   5166    if (info.hsz == 12) {
   5167       if (info.bpp < 24)
   5168          psize = (info.offset - 14 - 24) / 3;
   5169    } else {
   5170       if (info.bpp < 16)
   5171          psize = (info.offset - 14 - info.hsz) >> 2;
   5172    }
   5173 
   5174    s->img_n = ma ? 4 : 3;
   5175    if (req_comp && req_comp >= 3) // we can directly decode 3 or 4
   5176       target = req_comp;
   5177    else
   5178       target = s->img_n; // if they want monochrome, we'll post-convert
   5179 
   5180    // sanity-check size
   5181    if (!stbi__mad3sizes_valid(target, s->img_x, s->img_y, 0))
   5182       return stbi__errpuc("too large", "Corrupt BMP");
   5183 
   5184    out = (stbi_uc *) stbi__malloc_mad3(target, s->img_x, s->img_y, 0);
   5185    if (!out) return stbi__errpuc("outofmem", "Out of memory");
   5186    if (info.bpp < 16) {
   5187       int z=0;
   5188       if (psize == 0 || psize > 256) { STBI_FREE(out); return stbi__errpuc("invalid", "Corrupt BMP"); }
   5189       for (i=0; i < psize; ++i) {
   5190          pal[i][2] = stbi__get8(s);
   5191          pal[i][1] = stbi__get8(s);
   5192          pal[i][0] = stbi__get8(s);
   5193          if (info.hsz != 12) stbi__get8(s);
   5194          pal[i][3] = 255;
   5195       }
   5196       stbi__skip(s, info.offset - 14 - info.hsz - psize * (info.hsz == 12 ? 3 : 4));
   5197       if (info.bpp == 1) width = (s->img_x + 7) >> 3;
   5198       else if (info.bpp == 4) width = (s->img_x + 1) >> 1;
   5199       else if (info.bpp == 8) width = s->img_x;
   5200       else { STBI_FREE(out); return stbi__errpuc("bad bpp", "Corrupt BMP"); }
   5201       pad = (-width)&3;
   5202       if (info.bpp == 1) {
   5203          for (j=0; j < (int) s->img_y; ++j) {
   5204             int bit_offset = 7, v = stbi__get8(s);
   5205             for (i=0; i < (int) s->img_x; ++i) {
   5206                int color = (v>>bit_offset)&0x1;
   5207                out[z++] = pal[color][0];
   5208                out[z++] = pal[color][1];
   5209                out[z++] = pal[color][2];
   5210                if((--bit_offset) < 0) {
   5211                   bit_offset = 7;
   5212                   v = stbi__get8(s);
   5213                }
   5214             }
   5215             stbi__skip(s, pad);
   5216          }
   5217       } else {
   5218          for (j=0; j < (int) s->img_y; ++j) {
   5219             for (i=0; i < (int) s->img_x; i += 2) {
   5220                int v=stbi__get8(s),v2=0;
   5221                if (info.bpp == 4) {
   5222                   v2 = v & 15;
   5223                   v >>= 4;
   5224                }
   5225                out[z++] = pal[v][0];
   5226                out[z++] = pal[v][1];
   5227                out[z++] = pal[v][2];
   5228                if (target == 4) out[z++] = 255;
   5229                if (i+1 == (int) s->img_x) break;
   5230                v = (info.bpp == 8) ? stbi__get8(s) : v2;
   5231                out[z++] = pal[v][0];
   5232                out[z++] = pal[v][1];
   5233                out[z++] = pal[v][2];
   5234                if (target == 4) out[z++] = 255;
   5235             }
   5236             stbi__skip(s, pad);
   5237          }
   5238       }
   5239    } else {
   5240       int rshift=0,gshift=0,bshift=0,ashift=0,rcount=0,gcount=0,bcount=0,acount=0;
   5241       int z = 0;
   5242       int easy=0;
   5243       stbi__skip(s, info.offset - 14 - info.hsz);
   5244       if (info.bpp == 24) width = 3 * s->img_x;
   5245       else if (info.bpp == 16) width = 2*s->img_x;
   5246       else /* bpp = 32 and pad = 0 */ width=0;
   5247       pad = (-width) & 3;
   5248       if (info.bpp == 24) {
   5249          easy = 1;
   5250       } else if (info.bpp == 32) {
   5251          if (mb == 0xff && mg == 0xff00 && mr == 0x00ff0000 && ma == 0xff000000)
   5252             easy = 2;
   5253       }
   5254       if (!easy) {
   5255          if (!mr || !mg || !mb) { STBI_FREE(out); return stbi__errpuc("bad masks", "Corrupt BMP"); }
   5256          // right shift amt to put high bit in position #7
   5257          rshift = stbi__high_bit(mr)-7; rcount = stbi__bitcount(mr);
   5258          gshift = stbi__high_bit(mg)-7; gcount = stbi__bitcount(mg);
   5259          bshift = stbi__high_bit(mb)-7; bcount = stbi__bitcount(mb);
   5260          ashift = stbi__high_bit(ma)-7; acount = stbi__bitcount(ma);
   5261       }
   5262       for (j=0; j < (int) s->img_y; ++j) {
   5263          if (easy) {
   5264             for (i=0; i < (int) s->img_x; ++i) {
   5265                unsigned char a;
   5266                out[z+2] = stbi__get8(s);
   5267                out[z+1] = stbi__get8(s);
   5268                out[z+0] = stbi__get8(s);
   5269                z += 3;
   5270                a = (easy == 2 ? stbi__get8(s) : 255);
   5271                all_a |= a;
   5272                if (target == 4) out[z++] = a;
   5273             }
   5274          } else {
   5275             int bpp = info.bpp;
   5276             for (i=0; i < (int) s->img_x; ++i) {
   5277                stbi__uint32 v = (bpp == 16 ? (stbi__uint32) stbi__get16le(s) : stbi__get32le(s));
   5278                unsigned int a;
   5279                out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mr, rshift, rcount));
   5280                out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mg, gshift, gcount));
   5281                out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mb, bshift, bcount));
   5282                a = (ma ? stbi__shiftsigned(v & ma, ashift, acount) : 255);
   5283                all_a |= a;
   5284                if (target == 4) out[z++] = STBI__BYTECAST(a);
   5285             }
   5286          }
   5287          stbi__skip(s, pad);
   5288       }
   5289    }
   5290 
   5291    // if alpha channel is all 0s, replace with all 255s
   5292    if (target == 4 && all_a == 0)
   5293       for (i=4*s->img_x*s->img_y-1; i >= 0; i -= 4)
   5294          out[i] = 255;
   5295 
   5296    if (flip_vertically) {
   5297       stbi_uc t;
   5298       for (j=0; j < (int) s->img_y>>1; ++j) {
   5299          stbi_uc *p1 = out +      j     *s->img_x*target;
   5300          stbi_uc *p2 = out + (s->img_y-1-j)*s->img_x*target;
   5301          for (i=0; i < (int) s->img_x*target; ++i) {
   5302             t = p1[i], p1[i] = p2[i], p2[i] = t;
   5303          }
   5304       }
   5305    }
   5306 
   5307    if (req_comp && req_comp != target) {
   5308       out = stbi__convert_format(out, target, req_comp, s->img_x, s->img_y);
   5309       if (out == NULL) return out; // stbi__convert_format frees input on failure
   5310    }
   5311 
   5312    *x = s->img_x;
   5313    *y = s->img_y;
   5314    if (comp) *comp = s->img_n;
   5315    return out;
   5316 }
   5317 #endif
   5318 
   5319 // Targa Truevision - TGA
   5320 // by Jonathan Dummer
   5321 #ifndef STBI_NO_TGA
   5322 // returns STBI_rgb or whatever, 0 on error
   5323 static int stbi__tga_get_comp(int bits_per_pixel, int is_grey, int* is_rgb16)
   5324 {
   5325    // only RGB or RGBA (incl. 16bit) or grey allowed
   5326    if (is_rgb16) *is_rgb16 = 0;
   5327    switch(bits_per_pixel) {
   5328       case 8:  return STBI_grey;
   5329       case 16: if(is_grey) return STBI_grey_alpha;
   5330                // fallthrough
   5331       case 15: if(is_rgb16) *is_rgb16 = 1;
   5332                return STBI_rgb;
   5333       case 24: // fallthrough
   5334       case 32: return bits_per_pixel/8;
   5335       default: return 0;
   5336    }
   5337 }
   5338 
   5339 static int stbi__tga_info(stbi__context *s, int *x, int *y, int *comp)
   5340 {
   5341     int tga_w, tga_h, tga_comp, tga_image_type, tga_bits_per_pixel, tga_colormap_bpp;
   5342     int sz, tga_colormap_type;
   5343     stbi__get8(s);                   // discard Offset
   5344     tga_colormap_type = stbi__get8(s); // colormap type
   5345     if( tga_colormap_type > 1 ) {
   5346         stbi__rewind(s);
   5347         return 0;      // only RGB or indexed allowed
   5348     }
   5349     tga_image_type = stbi__get8(s); // image type
   5350     if ( tga_colormap_type == 1 ) { // colormapped (paletted) image
   5351         if (tga_image_type != 1 && tga_image_type != 9) {
   5352             stbi__rewind(s);
   5353             return 0;
   5354         }
   5355         stbi__skip(s,4);       // skip index of first colormap entry and number of entries
   5356         sz = stbi__get8(s);    //   check bits per palette color entry
   5357         if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) {
   5358             stbi__rewind(s);
   5359             return 0;
   5360         }
   5361         stbi__skip(s,4);       // skip image x and y origin
   5362         tga_colormap_bpp = sz;
   5363     } else { // "normal" image w/o colormap - only RGB or grey allowed, +/- RLE
   5364         if ( (tga_image_type != 2) && (tga_image_type != 3) && (tga_image_type != 10) && (tga_image_type != 11) ) {
   5365             stbi__rewind(s);
   5366             return 0; // only RGB or grey allowed, +/- RLE
   5367         }
   5368         stbi__skip(s,9); // skip colormap specification and image x/y origin
   5369         tga_colormap_bpp = 0;
   5370     }
   5371     tga_w = stbi__get16le(s);
   5372     if( tga_w < 1 ) {
   5373         stbi__rewind(s);
   5374         return 0;   // test width
   5375     }
   5376     tga_h = stbi__get16le(s);
   5377     if( tga_h < 1 ) {
   5378         stbi__rewind(s);
   5379         return 0;   // test height
   5380     }
   5381     tga_bits_per_pixel = stbi__get8(s); // bits per pixel
   5382     stbi__get8(s); // ignore alpha bits
   5383     if (tga_colormap_bpp != 0) {
   5384         if((tga_bits_per_pixel != 8) && (tga_bits_per_pixel != 16)) {
   5385             // when using a colormap, tga_bits_per_pixel is the size of the indexes
   5386             // I don't think anything but 8 or 16bit indexes makes sense
   5387             stbi__rewind(s);
   5388             return 0;
   5389         }
   5390         tga_comp = stbi__tga_get_comp(tga_colormap_bpp, 0, NULL);
   5391     } else {
   5392         tga_comp = stbi__tga_get_comp(tga_bits_per_pixel, (tga_image_type == 3) || (tga_image_type == 11), NULL);
   5393     }
   5394     if(!tga_comp) {
   5395       stbi__rewind(s);
   5396       return 0;
   5397     }
   5398     if (x) *x = tga_w;
   5399     if (y) *y = tga_h;
   5400     if (comp) *comp = tga_comp;
   5401     return 1;                   // seems to have passed everything
   5402 }
   5403 
   5404 static int stbi__tga_test(stbi__context *s)
   5405 {
   5406    int res = 0;
   5407    int sz, tga_color_type;
   5408    stbi__get8(s);      //   discard Offset
   5409    tga_color_type = stbi__get8(s);   //   color type
   5410    if ( tga_color_type > 1 ) goto errorEnd;   //   only RGB or indexed allowed
   5411    sz = stbi__get8(s);   //   image type
   5412    if ( tga_color_type == 1 ) { // colormapped (paletted) image
   5413       if (sz != 1 && sz != 9) goto errorEnd; // colortype 1 demands image type 1 or 9
   5414       stbi__skip(s,4);       // skip index of first colormap entry and number of entries
   5415       sz = stbi__get8(s);    //   check bits per palette color entry
   5416       if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) goto errorEnd;
   5417       stbi__skip(s,4);       // skip image x and y origin
   5418    } else { // "normal" image w/o colormap
   5419       if ( (sz != 2) && (sz != 3) && (sz != 10) && (sz != 11) ) goto errorEnd; // only RGB or grey allowed, +/- RLE
   5420       stbi__skip(s,9); // skip colormap specification and image x/y origin
   5421    }
   5422    if ( stbi__get16le(s) < 1 ) goto errorEnd;      //   test width
   5423    if ( stbi__get16le(s) < 1 ) goto errorEnd;      //   test height
   5424    sz = stbi__get8(s);   //   bits per pixel
   5425    if ( (tga_color_type == 1) && (sz != 8) && (sz != 16) ) goto errorEnd; // for colormapped images, bpp is size of an index
   5426    if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) goto errorEnd;
   5427 
   5428    res = 1; // if we got this far, everything's good and we can return 1 instead of 0
   5429 
   5430 errorEnd:
   5431    stbi__rewind(s);
   5432    return res;
   5433 }
   5434 
   5435 // read 16bit value and convert to 24bit RGB
   5436 static void stbi__tga_read_rgb16(stbi__context *s, stbi_uc* out)
   5437 {
   5438    stbi__uint16 px = (stbi__uint16)stbi__get16le(s);
   5439    stbi__uint16 fiveBitMask = 31;
   5440    // we have 3 channels with 5bits each
   5441    int r = (px >> 10) & fiveBitMask;
   5442    int g = (px >> 5) & fiveBitMask;
   5443    int b = px & fiveBitMask;
   5444    // Note that this saves the data in RGB(A) order, so it doesn't need to be swapped later
   5445    out[0] = (stbi_uc)((r * 255)/31);
   5446    out[1] = (stbi_uc)((g * 255)/31);
   5447    out[2] = (stbi_uc)((b * 255)/31);
   5448 
   5449    // some people claim that the most significant bit might be used for alpha
   5450    // (possibly if an alpha-bit is set in the "image descriptor byte")
   5451    // but that only made 16bit test images completely translucent..
   5452    // so let's treat all 15 and 16bit TGAs as RGB with no alpha.
   5453 }
   5454 
   5455 static void *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
   5456 {
   5457    //   read in the TGA header stuff
   5458    int tga_offset = stbi__get8(s);
   5459    int tga_indexed = stbi__get8(s);
   5460    int tga_image_type = stbi__get8(s);
   5461    int tga_is_RLE = 0;
   5462    int tga_palette_start = stbi__get16le(s);
   5463    int tga_palette_len = stbi__get16le(s);
   5464    int tga_palette_bits = stbi__get8(s);
   5465    int tga_x_origin = stbi__get16le(s);
   5466    int tga_y_origin = stbi__get16le(s);
   5467    int tga_width = stbi__get16le(s);
   5468    int tga_height = stbi__get16le(s);
   5469    int tga_bits_per_pixel = stbi__get8(s);
   5470    int tga_comp, tga_rgb16=0;
   5471    int tga_inverted = stbi__get8(s);
   5472    // int tga_alpha_bits = tga_inverted & 15; // the 4 lowest bits - unused (useless?)
   5473    //   image data
   5474    unsigned char *tga_data;
   5475    unsigned char *tga_palette = NULL;
   5476    int i, j;
   5477    unsigned char raw_data[4] = {0};
   5478    int RLE_count = 0;
   5479    int RLE_repeating = 0;
   5480    int read_next_pixel = 1;
   5481    STBI_NOTUSED(ri);
   5482 
   5483    //   do a tiny bit of precessing
   5484    if ( tga_image_type >= 8 )
   5485    {
   5486       tga_image_type -= 8;
   5487       tga_is_RLE = 1;
   5488    }
   5489    tga_inverted = 1 - ((tga_inverted >> 5) & 1);
   5490 
   5491    //   If I'm paletted, then I'll use the number of bits from the palette
   5492    if ( tga_indexed ) tga_comp = stbi__tga_get_comp(tga_palette_bits, 0, &tga_rgb16);
   5493    else tga_comp = stbi__tga_get_comp(tga_bits_per_pixel, (tga_image_type == 3), &tga_rgb16);
   5494 
   5495    if(!tga_comp) // shouldn't really happen, stbi__tga_test() should have ensured basic consistency
   5496       return stbi__errpuc("bad format", "Can't find out TGA pixelformat");
   5497 
   5498    //   tga info
   5499    *x = tga_width;
   5500    *y = tga_height;
   5501    if (comp) *comp = tga_comp;
   5502 
   5503    if (!stbi__mad3sizes_valid(tga_width, tga_height, tga_comp, 0))
   5504       return stbi__errpuc("too large", "Corrupt TGA");
   5505 
   5506    tga_data = (unsigned char*)stbi__malloc_mad3(tga_width, tga_height, tga_comp, 0);
   5507    if (!tga_data) return stbi__errpuc("outofmem", "Out of memory");
   5508 
   5509    // skip to the data's starting position (offset usually = 0)
   5510    stbi__skip(s, tga_offset );
   5511 
   5512    if ( !tga_indexed && !tga_is_RLE && !tga_rgb16 ) {
   5513       for (i=0; i < tga_height; ++i) {
   5514          int row = tga_inverted ? tga_height -i - 1 : i;
   5515          stbi_uc *tga_row = tga_data + row*tga_width*tga_comp;
   5516          stbi__getn(s, tga_row, tga_width * tga_comp);
   5517       }
   5518    } else  {
   5519       //   do I need to load a palette?
   5520       if ( tga_indexed)
   5521       {
   5522          //   any data to skip? (offset usually = 0)
   5523          stbi__skip(s, tga_palette_start );
   5524          //   load the palette
   5525          tga_palette = (unsigned char*)stbi__malloc_mad2(tga_palette_len, tga_comp, 0);
   5526          if (!tga_palette) {
   5527             STBI_FREE(tga_data);
   5528             return stbi__errpuc("outofmem", "Out of memory");
   5529          }
   5530          if (tga_rgb16) {
   5531             stbi_uc *pal_entry = tga_palette;
   5532             STBI_ASSERT(tga_comp == STBI_rgb);
   5533             for (i=0; i < tga_palette_len; ++i) {
   5534                stbi__tga_read_rgb16(s, pal_entry);
   5535                pal_entry += tga_comp;
   5536             }
   5537          } else if (!stbi__getn(s, tga_palette, tga_palette_len * tga_comp)) {
   5538                STBI_FREE(tga_data);
   5539                STBI_FREE(tga_palette);
   5540                return stbi__errpuc("bad palette", "Corrupt TGA");
   5541          }
   5542       }
   5543       //   load the data
   5544       for (i=0; i < tga_width * tga_height; ++i)
   5545       {
   5546          //   if I'm in RLE mode, do I need to get a RLE stbi__pngchunk?
   5547          if ( tga_is_RLE )
   5548          {
   5549             if ( RLE_count == 0 )
   5550             {
   5551                //   yep, get the next byte as a RLE command
   5552                int RLE_cmd = stbi__get8(s);
   5553                RLE_count = 1 + (RLE_cmd & 127);
   5554                RLE_repeating = RLE_cmd >> 7;
   5555                read_next_pixel = 1;
   5556             } else if ( !RLE_repeating )
   5557             {
   5558                read_next_pixel = 1;
   5559             }
   5560          } else
   5561          {
   5562             read_next_pixel = 1;
   5563          }
   5564          //   OK, if I need to read a pixel, do it now
   5565          if ( read_next_pixel )
   5566          {
   5567             //   load however much data we did have
   5568             if ( tga_indexed )
   5569             {
   5570                // read in index, then perform the lookup
   5571                int pal_idx = (tga_bits_per_pixel == 8) ? stbi__get8(s) : stbi__get16le(s);
   5572                if ( pal_idx >= tga_palette_len ) {
   5573                   // invalid index
   5574                   pal_idx = 0;
   5575                }
   5576                pal_idx *= tga_comp;
   5577                for (j = 0; j < tga_comp; ++j) {
   5578                   raw_data[j] = tga_palette[pal_idx+j];
   5579                }
   5580             } else if(tga_rgb16) {
   5581                STBI_ASSERT(tga_comp == STBI_rgb);
   5582                stbi__tga_read_rgb16(s, raw_data);
   5583             } else {
   5584                //   read in the data raw
   5585                for (j = 0; j < tga_comp; ++j) {
   5586                   raw_data[j] = stbi__get8(s);
   5587                }
   5588             }
   5589             //   clear the reading flag for the next pixel
   5590             read_next_pixel = 0;
   5591          } // end of reading a pixel
   5592 
   5593          // copy data
   5594          for (j = 0; j < tga_comp; ++j)
   5595            tga_data[i*tga_comp+j] = raw_data[j];
   5596 
   5597          //   in case we're in RLE mode, keep counting down
   5598          --RLE_count;
   5599       }
   5600       //   do I need to invert the image?
   5601       if ( tga_inverted )
   5602       {
   5603          for (j = 0; j*2 < tga_height; ++j)
   5604          {
   5605             int index1 = j * tga_width * tga_comp;
   5606             int index2 = (tga_height - 1 - j) * tga_width * tga_comp;
   5607             for (i = tga_width * tga_comp; i > 0; --i)
   5608             {
   5609                unsigned char temp = tga_data[index1];
   5610                tga_data[index1] = tga_data[index2];
   5611                tga_data[index2] = temp;
   5612                ++index1;
   5613                ++index2;
   5614             }
   5615          }
   5616       }
   5617       //   clear my palette, if I had one
   5618       if ( tga_palette != NULL )
   5619       {
   5620          STBI_FREE( tga_palette );
   5621       }
   5622    }
   5623 
   5624    // swap RGB - if the source data was RGB16, it already is in the right order
   5625    if (tga_comp >= 3 && !tga_rgb16)
   5626    {
   5627       unsigned char* tga_pixel = tga_data;
   5628       for (i=0; i < tga_width * tga_height; ++i)
   5629       {
   5630          unsigned char temp = tga_pixel[0];
   5631          tga_pixel[0] = tga_pixel[2];
   5632          tga_pixel[2] = temp;
   5633          tga_pixel += tga_comp;
   5634       }
   5635    }
   5636 
   5637    // convert to target component count
   5638    if (req_comp && req_comp != tga_comp)
   5639       tga_data = stbi__convert_format(tga_data, tga_comp, req_comp, tga_width, tga_height);
   5640 
   5641    //   the things I do to get rid of an error message, and yet keep
   5642    //   Microsoft's C compilers happy... [8^(
   5643    tga_palette_start = tga_palette_len = tga_palette_bits =
   5644          tga_x_origin = tga_y_origin = 0;
   5645    //   OK, done
   5646    return tga_data;
   5647 }
   5648 #endif
   5649 
   5650 // *************************************************************************************************
   5651 // Photoshop PSD loader -- PD by Thatcher Ulrich, integration by Nicolas Schulz, tweaked by STB
   5652 
   5653 #ifndef STBI_NO_PSD
   5654 static int stbi__psd_test(stbi__context *s)
   5655 {
   5656    int r = (stbi__get32be(s) == 0x38425053);
   5657    stbi__rewind(s);
   5658    return r;
   5659 }
   5660 
   5661 static int stbi__psd_decode_rle(stbi__context *s, stbi_uc *p, int pixelCount)
   5662 {
   5663    int count, nleft, len;
   5664 
   5665    count = 0;
   5666    while ((nleft = pixelCount - count) > 0) {
   5667       len = stbi__get8(s);
   5668       if (len == 128) {
   5669          // No-op.
   5670       } else if (len < 128) {
   5671          // Copy next len+1 bytes literally.
   5672          len++;
   5673          if (len > nleft) return 0; // corrupt data
   5674          count += len;
   5675          while (len) {
   5676             *p = stbi__get8(s);
   5677             p += 4;
   5678             len--;
   5679          }
   5680       } else if (len > 128) {
   5681          stbi_uc   val;
   5682          // Next -len+1 bytes in the dest are replicated from next source byte.
   5683          // (Interpret len as a negative 8-bit int.)
   5684          len = 257 - len;
   5685          if (len > nleft) return 0; // corrupt data
   5686          val = stbi__get8(s);
   5687          count += len;
   5688          while (len) {
   5689             *p = val;
   5690             p += 4;
   5691             len--;
   5692          }
   5693       }
   5694    }
   5695 
   5696    return 1;
   5697 }
   5698 
   5699 static void *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc)
   5700 {
   5701    int pixelCount;
   5702    int channelCount, compression;
   5703    int channel, i;
   5704    int bitdepth;
   5705    int w,h;
   5706    stbi_uc *out;
   5707    STBI_NOTUSED(ri);
   5708 
   5709    // Check identifier
   5710    if (stbi__get32be(s) != 0x38425053)   // "8BPS"
   5711       return stbi__errpuc("not PSD", "Corrupt PSD image");
   5712 
   5713    // Check file type version.
   5714    if (stbi__get16be(s) != 1)
   5715       return stbi__errpuc("wrong version", "Unsupported version of PSD image");
   5716 
   5717    // Skip 6 reserved bytes.
   5718    stbi__skip(s, 6 );
   5719 
   5720    // Read the number of channels (R, G, B, A, etc).
   5721    channelCount = stbi__get16be(s);
   5722    if (channelCount < 0 || channelCount > 16)
   5723       return stbi__errpuc("wrong channel count", "Unsupported number of channels in PSD image");
   5724 
   5725    // Read the rows and columns of the image.
   5726    h = stbi__get32be(s);
   5727    w = stbi__get32be(s);
   5728 
   5729    // Make sure the depth is 8 bits.
   5730    bitdepth = stbi__get16be(s);
   5731    if (bitdepth != 8 && bitdepth != 16)
   5732       return stbi__errpuc("unsupported bit depth", "PSD bit depth is not 8 or 16 bit");
   5733 
   5734    // Make sure the color mode is RGB.
   5735    // Valid options are:
   5736    //   0: Bitmap
   5737    //   1: Grayscale
   5738    //   2: Indexed color
   5739    //   3: RGB color
   5740    //   4: CMYK color
   5741    //   7: Multichannel
   5742    //   8: Duotone
   5743    //   9: Lab color
   5744    if (stbi__get16be(s) != 3)
   5745       return stbi__errpuc("wrong color format", "PSD is not in RGB color format");
   5746 
   5747    // Skip the Mode Data.  (It's the palette for indexed color; other info for other modes.)
   5748    stbi__skip(s,stbi__get32be(s) );
   5749 
   5750    // Skip the image resources.  (resolution, pen tool paths, etc)
   5751    stbi__skip(s, stbi__get32be(s) );
   5752 
   5753    // Skip the reserved data.
   5754    stbi__skip(s, stbi__get32be(s) );
   5755 
   5756    // Find out if the data is compressed.
   5757    // Known values:
   5758    //   0: no compression
   5759    //   1: RLE compressed
   5760    compression = stbi__get16be(s);
   5761    if (compression > 1)
   5762       return stbi__errpuc("bad compression", "PSD has an unknown compression format");
   5763 
   5764    // Check size
   5765    if (!stbi__mad3sizes_valid(4, w, h, 0))
   5766       return stbi__errpuc("too large", "Corrupt PSD");
   5767 
   5768    // Create the destination image.
   5769 
   5770    if (!compression && bitdepth == 16 && bpc == 16) {
   5771       out = (stbi_uc *) stbi__malloc_mad3(8, w, h, 0);
   5772       ri->bits_per_channel = 16;
   5773    } else
   5774       out = (stbi_uc *) stbi__malloc(4 * w*h);
   5775 
   5776    if (!out) return stbi__errpuc("outofmem", "Out of memory");
   5777    pixelCount = w*h;
   5778 
   5779    // Initialize the data to zero.
   5780    //memset( out, 0, pixelCount * 4 );
   5781 
   5782    // Finally, the image data.
   5783    if (compression) {
   5784       // RLE as used by .PSD and .TIFF
   5785       // Loop until you get the number of unpacked bytes you are expecting:
   5786       //     Read the next source byte into n.
   5787       //     If n is between 0 and 127 inclusive, copy the next n+1 bytes literally.
   5788       //     Else if n is between -127 and -1 inclusive, copy the next byte -n+1 times.
   5789       //     Else if n is 128, noop.
   5790       // Endloop
   5791 
   5792       // The RLE-compressed data is preceeded by a 2-byte data count for each row in the data,
   5793       // which we're going to just skip.
   5794       stbi__skip(s, h * channelCount * 2 );
   5795 
   5796       // Read the RLE data by channel.
   5797       for (channel = 0; channel < 4; channel++) {
   5798          stbi_uc *p;
   5799 
   5800          p = out+channel;
   5801          if (channel >= channelCount) {
   5802             // Fill this channel with default data.
   5803             for (i = 0; i < pixelCount; i++, p += 4)
   5804                *p = (channel == 3 ? 255 : 0);
   5805          } else {
   5806             // Read the RLE data.
   5807             if (!stbi__psd_decode_rle(s, p, pixelCount)) {
   5808                STBI_FREE(out);
   5809                return stbi__errpuc("corrupt", "bad RLE data");
   5810             }
   5811          }
   5812       }
   5813 
   5814    } else {
   5815       // We're at the raw image data.  It's each channel in order (Red, Green, Blue, Alpha, ...)
   5816       // where each channel consists of an 8-bit (or 16-bit) value for each pixel in the image.
   5817 
   5818       // Read the data by channel.
   5819       for (channel = 0; channel < 4; channel++) {
   5820          if (channel >= channelCount) {
   5821             // Fill this channel with default data.
   5822             if (bitdepth == 16 && bpc == 16) {
   5823                stbi__uint16 *q = ((stbi__uint16 *) out) + channel;
   5824                stbi__uint16 val = channel == 3 ? 65535 : 0;
   5825                for (i = 0; i < pixelCount; i++, q += 4)
   5826                   *q = val;
   5827             } else {
   5828                stbi_uc *p = out+channel;
   5829                stbi_uc val = channel == 3 ? 255 : 0;
   5830                for (i = 0; i < pixelCount; i++, p += 4)
   5831                   *p = val;
   5832             }
   5833          } else {
   5834             if (ri->bits_per_channel == 16) {    // output bpc
   5835                stbi__uint16 *q = ((stbi__uint16 *) out) + channel;
   5836                for (i = 0; i < pixelCount; i++, q += 4)
   5837                   *q = (stbi__uint16) stbi__get16be(s);
   5838             } else {
   5839                stbi_uc *p = out+channel;
   5840                if (bitdepth == 16) {  // input bpc
   5841                   for (i = 0; i < pixelCount; i++, p += 4)
   5842                      *p = (stbi_uc) (stbi__get16be(s) >> 8);
   5843                } else {
   5844                   for (i = 0; i < pixelCount; i++, p += 4)
   5845                      *p = stbi__get8(s);
   5846                }
   5847             }
   5848          }
   5849       }
   5850    }
   5851 
   5852    // remove weird white matte from PSD
   5853    if (channelCount >= 4) {
   5854       if (ri->bits_per_channel == 16) {
   5855          for (i=0; i < w*h; ++i) {
   5856             stbi__uint16 *pixel = (stbi__uint16 *) out + 4*i;
   5857             if (pixel[3] != 0 && pixel[3] != 65535) {
   5858                float a = pixel[3] / 65535.0f;
   5859                float ra = 1.0f / a;
   5860                float inv_a = 65535.0f * (1 - ra);
   5861                pixel[0] = (stbi__uint16) (pixel[0]*ra + inv_a);
   5862                pixel[1] = (stbi__uint16) (pixel[1]*ra + inv_a);
   5863                pixel[2] = (stbi__uint16) (pixel[2]*ra + inv_a);
   5864             }
   5865          }
   5866       } else {
   5867          for (i=0; i < w*h; ++i) {
   5868             unsigned char *pixel = out + 4*i;
   5869             if (pixel[3] != 0 && pixel[3] != 255) {
   5870                float a = pixel[3] / 255.0f;
   5871                float ra = 1.0f / a;
   5872                float inv_a = 255.0f * (1 - ra);
   5873                pixel[0] = (unsigned char) (pixel[0]*ra + inv_a);
   5874                pixel[1] = (unsigned char) (pixel[1]*ra + inv_a);
   5875                pixel[2] = (unsigned char) (pixel[2]*ra + inv_a);
   5876             }
   5877          }
   5878       }
   5879    }
   5880 
   5881    // convert to desired output format
   5882    if (req_comp && req_comp != 4) {
   5883       if (ri->bits_per_channel == 16)
   5884          out = (stbi_uc *) stbi__convert_format16((stbi__uint16 *) out, 4, req_comp, w, h);
   5885       else
   5886          out = stbi__convert_format(out, 4, req_comp, w, h);
   5887       if (out == NULL) return out; // stbi__convert_format frees input on failure
   5888    }
   5889 
   5890    if (comp) *comp = 4;
   5891    *y = h;
   5892    *x = w;
   5893 
   5894    return out;
   5895 }
   5896 #endif
   5897 
   5898 // *************************************************************************************************
   5899 // Softimage PIC loader
   5900 // by Tom Seddon
   5901 //
   5902 // See http://softimage.wiki.softimage.com/index.php/INFO:_PIC_file_format
   5903 // See http://ozviz.wasp.uwa.edu.au/~pbourke/dataformats/softimagepic/
   5904 
   5905 #ifndef STBI_NO_PIC
   5906 static int stbi__pic_is4(stbi__context *s,const char *str)
   5907 {
   5908    int i;
   5909    for (i=0; i<4; ++i)
   5910       if (stbi__get8(s) != (stbi_uc)str[i])
   5911          return 0;
   5912 
   5913    return 1;
   5914 }
   5915 
   5916 static int stbi__pic_test_core(stbi__context *s)
   5917 {
   5918    int i;
   5919 
   5920    if (!stbi__pic_is4(s,"\x53\x80\xF6\x34"))
   5921       return 0;
   5922 
   5923    for(i=0;i<84;++i)
   5924       stbi__get8(s);
   5925 
   5926    if (!stbi__pic_is4(s,"PICT"))
   5927       return 0;
   5928 
   5929    return 1;
   5930 }
   5931 
   5932 typedef struct
   5933 {
   5934    stbi_uc size,type,channel;
   5935 } stbi__pic_packet;
   5936 
   5937 static stbi_uc *stbi__readval(stbi__context *s, int channel, stbi_uc *dest)
   5938 {
   5939    int mask=0x80, i;
   5940 
   5941    for (i=0; i<4; ++i, mask>>=1) {
   5942       if (channel & mask) {
   5943          if (stbi__at_eof(s)) return stbi__errpuc("bad file","PIC file too short");
   5944          dest[i]=stbi__get8(s);
   5945       }
   5946    }
   5947 
   5948    return dest;
   5949 }
   5950 
   5951 static void stbi__copyval(int channel,stbi_uc *dest,const stbi_uc *src)
   5952 {
   5953    int mask=0x80,i;
   5954 
   5955    for (i=0;i<4; ++i, mask>>=1)
   5956       if (channel&mask)
   5957          dest[i]=src[i];
   5958 }
   5959 
   5960 static stbi_uc *stbi__pic_load_core(stbi__context *s,int width,int height,int *comp, stbi_uc *result)
   5961 {
   5962    int act_comp=0,num_packets=0,y,chained;
   5963    stbi__pic_packet packets[10];
   5964 
   5965    // this will (should...) cater for even some bizarre stuff like having data
   5966     // for the same channel in multiple packets.
   5967    do {
   5968       stbi__pic_packet *packet;
   5969 
   5970       if (num_packets==sizeof(packets)/sizeof(packets[0]))
   5971          return stbi__errpuc("bad format","too many packets");
   5972 
   5973       packet = &packets[num_packets++];
   5974 
   5975       chained = stbi__get8(s);
   5976       packet->size    = stbi__get8(s);
   5977       packet->type    = stbi__get8(s);
   5978       packet->channel = stbi__get8(s);
   5979 
   5980       act_comp |= packet->channel;
   5981 
   5982       if (stbi__at_eof(s))          return stbi__errpuc("bad file","file too short (reading packets)");
   5983       if (packet->size != 8)  return stbi__errpuc("bad format","packet isn't 8bpp");
   5984    } while (chained);
   5985 
   5986    *comp = (act_comp & 0x10 ? 4 : 3); // has alpha channel?
   5987 
   5988    for(y=0; y<height; ++y) {
   5989       int packet_idx;
   5990 
   5991       for(packet_idx=0; packet_idx < num_packets; ++packet_idx) {
   5992          stbi__pic_packet *packet = &packets[packet_idx];
   5993          stbi_uc *dest = result+y*width*4;
   5994 
   5995          switch (packet->type) {
   5996             default:
   5997                return stbi__errpuc("bad format","packet has bad compression type");
   5998 
   5999             case 0: {//uncompressed
   6000                int x;
   6001 
   6002                for(x=0;x<width;++x, dest+=4)
   6003                   if (!stbi__readval(s,packet->channel,dest))
   6004                      return 0;
   6005                break;
   6006             }
   6007 
   6008             case 1://Pure RLE
   6009                {
   6010                   int left=width, i;
   6011 
   6012                   while (left>0) {
   6013                      stbi_uc count,value[4];
   6014 
   6015                      count=stbi__get8(s);
   6016                      if (stbi__at_eof(s))   return stbi__errpuc("bad file","file too short (pure read count)");
   6017 
   6018                      if (count > left)
   6019                         count = (stbi_uc) left;
   6020 
   6021                      if (!stbi__readval(s,packet->channel,value))  return 0;
   6022 
   6023                      for(i=0; i<count; ++i,dest+=4)
   6024                         stbi__copyval(packet->channel,dest,value);
   6025                      left -= count;
   6026                   }
   6027                }
   6028                break;
   6029 
   6030             case 2: {//Mixed RLE
   6031                int left=width;
   6032                while (left>0) {
   6033                   int count = stbi__get8(s), i;
   6034                   if (stbi__at_eof(s))  return stbi__errpuc("bad file","file too short (mixed read count)");
   6035 
   6036                   if (count >= 128) { // Repeated
   6037                      stbi_uc value[4];
   6038 
   6039                      if (count==128)
   6040                         count = stbi__get16be(s);
   6041                      else
   6042                         count -= 127;
   6043                      if (count > left)
   6044                         return stbi__errpuc("bad file","scanline overrun");
   6045 
   6046                      if (!stbi__readval(s,packet->channel,value))
   6047                         return 0;
   6048 
   6049                      for(i=0;i<count;++i, dest += 4)
   6050                         stbi__copyval(packet->channel,dest,value);
   6051                   } else { // Raw
   6052                      ++count;
   6053                      if (count>left) return stbi__errpuc("bad file","scanline overrun");
   6054 
   6055                      for(i=0;i<count;++i, dest+=4)
   6056                         if (!stbi__readval(s,packet->channel,dest))
   6057                            return 0;
   6058                   }
   6059                   left-=count;
   6060                }
   6061                break;
   6062             }
   6063          }
   6064       }
   6065    }
   6066 
   6067    return result;
   6068 }
   6069 
   6070 static void *stbi__pic_load(stbi__context *s,int *px,int *py,int *comp,int req_comp, stbi__result_info *ri)
   6071 {
   6072    stbi_uc *result;
   6073    int i, x,y, internal_comp;
   6074    STBI_NOTUSED(ri);
   6075 
   6076    if (!comp) comp = &internal_comp;
   6077 
   6078    for (i=0; i<92; ++i)
   6079       stbi__get8(s);
   6080 
   6081    x = stbi__get16be(s);
   6082    y = stbi__get16be(s);
   6083    if (stbi__at_eof(s))  return stbi__errpuc("bad file","file too short (pic header)");
   6084    if (!stbi__mad3sizes_valid(x, y, 4, 0)) return stbi__errpuc("too large", "PIC image too large to decode");
   6085 
   6086    stbi__get32be(s); //skip `ratio'
   6087    stbi__get16be(s); //skip `fields'
   6088    stbi__get16be(s); //skip `pad'
   6089 
   6090    // intermediate buffer is RGBA
   6091    result = (stbi_uc *) stbi__malloc_mad3(x, y, 4, 0);
   6092    memset(result, 0xff, x*y*4);
   6093 
   6094    if (!stbi__pic_load_core(s,x,y,comp, result)) {
   6095       STBI_FREE(result);
   6096       result=0;
   6097    }
   6098    *px = x;
   6099    *py = y;
   6100    if (req_comp == 0) req_comp = *comp;
   6101    result=stbi__convert_format(result,4,req_comp,x,y);
   6102 
   6103    return result;
   6104 }
   6105 
   6106 static int stbi__pic_test(stbi__context *s)
   6107 {
   6108    int r = stbi__pic_test_core(s);
   6109    stbi__rewind(s);
   6110    return r;
   6111 }
   6112 #endif
   6113 
   6114 // *************************************************************************************************
   6115 // GIF loader -- public domain by Jean-Marc Lienher -- simplified/shrunk by stb
   6116 
   6117 #ifndef STBI_NO_GIF
   6118 typedef struct
   6119 {
   6120    stbi__int16 prefix;
   6121    stbi_uc first;
   6122    stbi_uc suffix;
   6123 } stbi__gif_lzw;
   6124 
   6125 typedef struct
   6126 {
   6127    int w,h;
   6128    stbi_uc *out;                 // output buffer (always 4 components)
   6129    stbi_uc *background;          // The current "background" as far as a gif is concerned
   6130    stbi_uc *history; 
   6131    int flags, bgindex, ratio, transparent, eflags;
   6132    stbi_uc  pal[256][4];
   6133    stbi_uc lpal[256][4];
   6134    stbi__gif_lzw codes[8192];
   6135    stbi_uc *color_table;
   6136    int parse, step;
   6137    int lflags;
   6138    int start_x, start_y;
   6139    int max_x, max_y;
   6140    int cur_x, cur_y;
   6141    int line_size;
   6142    int delay;
   6143 } stbi__gif;
   6144 
   6145 static int stbi__gif_test_raw(stbi__context *s)
   6146 {
   6147    int sz;
   6148    if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8') return 0;
   6149    sz = stbi__get8(s);
   6150    if (sz != '9' && sz != '7') return 0;
   6151    if (stbi__get8(s) != 'a') return 0;
   6152    return 1;
   6153 }
   6154 
   6155 static int stbi__gif_test(stbi__context *s)
   6156 {
   6157    int r = stbi__gif_test_raw(s);
   6158    stbi__rewind(s);
   6159    return r;
   6160 }
   6161 
   6162 static void stbi__gif_parse_colortable(stbi__context *s, stbi_uc pal[256][4], int num_entries, int transp)
   6163 {
   6164    int i;
   6165    for (i=0; i < num_entries; ++i) {
   6166       pal[i][2] = stbi__get8(s);
   6167       pal[i][1] = stbi__get8(s);
   6168       pal[i][0] = stbi__get8(s);
   6169       pal[i][3] = transp == i ? 0 : 255;
   6170    }
   6171 }
   6172 
   6173 static int stbi__gif_header(stbi__context *s, stbi__gif *g, int *comp, int is_info)
   6174 {
   6175    stbi_uc version;
   6176    if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8')
   6177       return stbi__err("not GIF", "Corrupt GIF");
   6178 
   6179    version = stbi__get8(s);
   6180    if (version != '7' && version != '9')    return stbi__err("not GIF", "Corrupt GIF");
   6181    if (stbi__get8(s) != 'a')                return stbi__err("not GIF", "Corrupt GIF");
   6182 
   6183    stbi__g_failure_reason = "";
   6184    g->w = stbi__get16le(s);
   6185    g->h = stbi__get16le(s);
   6186    g->flags = stbi__get8(s);
   6187    g->bgindex = stbi__get8(s);
   6188    g->ratio = stbi__get8(s);
   6189    g->transparent = -1;
   6190 
   6191    if (comp != 0) *comp = 4;  // can't actually tell whether it's 3 or 4 until we parse the comments
   6192 
   6193    if (is_info) return 1;
   6194 
   6195    if (g->flags & 0x80)
   6196       stbi__gif_parse_colortable(s,g->pal, 2 << (g->flags & 7), -1);
   6197 
   6198    return 1;
   6199 }
   6200 
   6201 static int stbi__gif_info_raw(stbi__context *s, int *x, int *y, int *comp)
   6202 {
   6203    stbi__gif* g = (stbi__gif*) stbi__malloc(sizeof(stbi__gif));
   6204    if (!stbi__gif_header(s, g, comp, 1)) {
   6205       STBI_FREE(g);
   6206       stbi__rewind( s );
   6207       return 0;
   6208    }
   6209    if (x) *x = g->w;
   6210    if (y) *y = g->h;
   6211    STBI_FREE(g);
   6212    return 1;
   6213 }
   6214 
   6215 static void stbi__out_gif_code(stbi__gif *g, stbi__uint16 code)
   6216 {
   6217    stbi_uc *p, *c;
   6218    int idx; 
   6219 
   6220    // recurse to decode the prefixes, since the linked-list is backwards,
   6221    // and working backwards through an interleaved image would be nasty
   6222    if (g->codes[code].prefix >= 0)
   6223       stbi__out_gif_code(g, g->codes[code].prefix);
   6224 
   6225    if (g->cur_y >= g->max_y) return;
   6226 
   6227    idx = g->cur_x + g->cur_y; 
   6228    p = &g->out[idx];
   6229    g->history[idx / 4] = 1;  
   6230 
   6231    c = &g->color_table[g->codes[code].suffix * 4];
   6232    if (c[3] > 128) { // don't render transparent pixels; 
   6233       p[0] = c[2];
   6234       p[1] = c[1];
   6235       p[2] = c[0];
   6236       p[3] = c[3];
   6237    }
   6238    g->cur_x += 4;
   6239 
   6240    if (g->cur_x >= g->max_x) {
   6241       g->cur_x = g->start_x;
   6242       g->cur_y += g->step;
   6243 
   6244       while (g->cur_y >= g->max_y && g->parse > 0) {
   6245          g->step = (1 << g->parse) * g->line_size;
   6246          g->cur_y = g->start_y + (g->step >> 1);
   6247          --g->parse;
   6248       }
   6249    }
   6250 }
   6251 
   6252 static stbi_uc *stbi__process_gif_raster(stbi__context *s, stbi__gif *g)
   6253 {
   6254    stbi_uc lzw_cs;
   6255    stbi__int32 len, init_code;
   6256    stbi__uint32 first;
   6257    stbi__int32 codesize, codemask, avail, oldcode, bits, valid_bits, clear;
   6258    stbi__gif_lzw *p;
   6259 
   6260    lzw_cs = stbi__get8(s);
   6261    if (lzw_cs > 12) return NULL;
   6262    clear = 1 << lzw_cs;
   6263    first = 1;
   6264    codesize = lzw_cs + 1;
   6265    codemask = (1 << codesize) - 1;
   6266    bits = 0;
   6267    valid_bits = 0;
   6268    for (init_code = 0; init_code < clear; init_code++) {
   6269       g->codes[init_code].prefix = -1;
   6270       g->codes[init_code].first = (stbi_uc) init_code;
   6271       g->codes[init_code].suffix = (stbi_uc) init_code;
   6272    }
   6273 
   6274    // support no starting clear code
   6275    avail = clear+2;
   6276    oldcode = -1;
   6277 
   6278    len = 0;
   6279    for(;;) {
   6280       if (valid_bits < codesize) {
   6281          if (len == 0) {
   6282             len = stbi__get8(s); // start new block
   6283             if (len == 0)
   6284                return g->out;
   6285          }
   6286          --len;
   6287          bits |= (stbi__int32) stbi__get8(s) << valid_bits;
   6288          valid_bits += 8;
   6289       } else {
   6290          stbi__int32 code = bits & codemask;
   6291          bits >>= codesize;
   6292          valid_bits -= codesize;
   6293          // @OPTIMIZE: is there some way we can accelerate the non-clear path?
   6294          if (code == clear) {  // clear code
   6295             codesize = lzw_cs + 1;
   6296             codemask = (1 << codesize) - 1;
   6297             avail = clear + 2;
   6298             oldcode = -1;
   6299             first = 0;
   6300          } else if (code == clear + 1) { // end of stream code
   6301             stbi__skip(s, len);
   6302             while ((len = stbi__get8(s)) > 0)
   6303                stbi__skip(s,len);
   6304             return g->out;
   6305          } else if (code <= avail) {
   6306             if (first) {
   6307                return stbi__errpuc("no clear code", "Corrupt GIF");
   6308             }
   6309 
   6310             if (oldcode >= 0) {
   6311                p = &g->codes[avail++];
   6312                if (avail > 8192) {
   6313                   return stbi__errpuc("too many codes", "Corrupt GIF");
   6314                }
   6315 
   6316                p->prefix = (stbi__int16) oldcode;
   6317                p->first = g->codes[oldcode].first;
   6318                p->suffix = (code == avail) ? p->first : g->codes[code].first;
   6319             } else if (code == avail)
   6320                return stbi__errpuc("illegal code in raster", "Corrupt GIF");
   6321 
   6322             stbi__out_gif_code(g, (stbi__uint16) code);
   6323 
   6324             if ((avail & codemask) == 0 && avail <= 0x0FFF) {
   6325                codesize++;
   6326                codemask = (1 << codesize) - 1;
   6327             }
   6328 
   6329             oldcode = code;
   6330          } else {
   6331             return stbi__errpuc("illegal code in raster", "Corrupt GIF");
   6332          }
   6333       }
   6334    }
   6335 }
   6336 
   6337 // this function is designed to support animated gifs, although stb_image doesn't support it
   6338 // two back is the image from two frames ago, used for a very specific disposal format
   6339 static stbi_uc *stbi__gif_load_next(stbi__context *s, stbi__gif *g, int *comp, int req_comp, stbi_uc *two_back)
   6340 {
   6341    int dispose; 
   6342    int first_frame; 
   6343    int pi; 
   6344    int pcount; 
   6345 
   6346    // on first frame, any non-written pixels get the background colour (non-transparent)
   6347    first_frame = 0; 
   6348    if (g->out == 0) {
   6349       if (!stbi__gif_header(s, g, comp,0))     return 0; // stbi__g_failure_reason set by stbi__gif_header
   6350       g->out = (stbi_uc *) stbi__malloc(4 * g->w * g->h);
   6351       g->background = (stbi_uc *) stbi__malloc(4 * g->w * g->h); 
   6352       g->history = (stbi_uc *) stbi__malloc(g->w * g->h); 
   6353       if (g->out == 0)                      return stbi__errpuc("outofmem", "Out of memory");
   6354 
   6355       // image is treated as "tranparent" at the start - ie, nothing overwrites the current background; 
   6356       // background colour is only used for pixels that are not rendered first frame, after that "background"
   6357       // color refers to teh color that was there the previous frame. 
   6358       memset( g->out, 0x00, 4 * g->w * g->h ); 
   6359       memset( g->background, 0x00, 4 * g->w * g->h ); // state of the background (starts transparent)
   6360       memset( g->history, 0x00, g->w * g->h );        // pixels that were affected previous frame
   6361       first_frame = 1; 
   6362    } else {
   6363       // second frame - how do we dispoase of the previous one?
   6364       dispose = (g->eflags & 0x1C) >> 2; 
   6365       pcount = g->w * g->h; 
   6366 
   6367       if ((dispose == 3) && (two_back == 0)) {
   6368          dispose = 2; // if I don't have an image to revert back to, default to the old background
   6369       }
   6370 
   6371       if (dispose == 3) { // use previous graphic
   6372          for (pi = 0; pi < pcount; ++pi) {
   6373             if (g->history[pi]) {
   6374                memcpy( &g->out[pi * 4], &two_back[pi * 4], 4 ); 
   6375             }
   6376          }
   6377       } else if (dispose == 2) { 
   6378          // restore what was changed last frame to background before that frame; 
   6379          for (pi = 0; pi < pcount; ++pi) {
   6380             if (g->history[pi]) {
   6381                memcpy( &g->out[pi * 4], &g->background[pi * 4], 4 ); 
   6382             }
   6383          }
   6384       } else {
   6385          // This is a non-disposal case eithe way, so just 
   6386          // leave the pixels as is, and they will become the new background
   6387          // 1: do not dispose
   6388          // 0:  not specified.
   6389       }
   6390 
   6391       // background is what out is after the undoing of the previou frame; 
   6392       memcpy( g->background, g->out, 4 * g->w * g->h ); 
   6393    }
   6394 
   6395    // clear my history; 
   6396    memset( g->history, 0x00, g->w * g->h );        // pixels that were affected previous frame
   6397 
   6398    for (;;) {
   6399       int tag = stbi__get8(s); 
   6400       switch (tag) {
   6401          case 0x2C: /* Image Descriptor */
   6402          {
   6403             stbi__int32 x, y, w, h;
   6404             stbi_uc *o;
   6405 
   6406             x = stbi__get16le(s);
   6407             y = stbi__get16le(s);
   6408             w = stbi__get16le(s);
   6409             h = stbi__get16le(s);
   6410             if (((x + w) > (g->w)) || ((y + h) > (g->h)))
   6411                return stbi__errpuc("bad Image Descriptor", "Corrupt GIF");
   6412 
   6413             g->line_size = g->w * 4;
   6414             g->start_x = x * 4;
   6415             g->start_y = y * g->line_size;
   6416             g->max_x   = g->start_x + w * 4;
   6417             g->max_y   = g->start_y + h * g->line_size;
   6418             g->cur_x   = g->start_x;
   6419             g->cur_y   = g->start_y;
   6420 
   6421             g->lflags = stbi__get8(s);
   6422 
   6423             if (g->lflags & 0x40) {
   6424                g->step = 8 * g->line_size; // first interlaced spacing
   6425                g->parse = 3;
   6426             } else {
   6427                g->step = g->line_size;
   6428                g->parse = 0;
   6429             }
   6430 
   6431             if (g->lflags & 0x80) {
   6432                stbi__gif_parse_colortable(s,g->lpal, 2 << (g->lflags & 7), g->eflags & 0x01 ? g->transparent : -1);
   6433                g->color_table = (stbi_uc *) g->lpal;
   6434             } else if (g->flags & 0x80) {
   6435                g->color_table = (stbi_uc *) g->pal;
   6436             } else
   6437                return stbi__errpuc("missing color table", "Corrupt GIF");            
   6438             
   6439             o = stbi__process_gif_raster(s, g);
   6440             if (o == NULL) return NULL;
   6441 
   6442             // if this was the first frame, 
   6443             pcount = g->w * g->h; 
   6444             if (first_frame && (g->bgindex > 0)) {
   6445                // if first frame, any pixel not drawn to gets the background color
   6446                for (pi = 0; pi < pcount; ++pi) {
   6447                   if (g->history[pi] == 0) {
   6448                      g->pal[g->bgindex][3] = 255; // just in case it was made transparent, undo that; It will be reset next frame if need be; 
   6449                      memcpy( &g->out[pi * 4], &g->pal[g->bgindex], 4 ); 
   6450                   }
   6451                }
   6452             }
   6453 
   6454             return o;
   6455          }
   6456 
   6457          case 0x21: // Comment Extension.
   6458          {
   6459             int len;
   6460             int ext = stbi__get8(s); 
   6461             if (ext == 0xF9) { // Graphic Control Extension.
   6462                len = stbi__get8(s);
   6463                if (len == 4) {
   6464                   g->eflags = stbi__get8(s);
   6465                   g->delay = 10 * stbi__get16le(s); // delay - 1/100th of a second, saving as 1/1000ths.
   6466 
   6467                   // unset old transparent
   6468                   if (g->transparent >= 0) {
   6469                      g->pal[g->transparent][3] = 255; 
   6470                   } 
   6471                   if (g->eflags & 0x01) {
   6472                      g->transparent = stbi__get8(s);
   6473                      if (g->transparent >= 0) {
   6474                         g->pal[g->transparent][3] = 0; 
   6475                      }
   6476                   } else {
   6477                      // don't need transparent
   6478                      stbi__skip(s, 1); 
   6479                      g->transparent = -1; 
   6480                   }
   6481                } else {
   6482                   stbi__skip(s, len);
   6483                   break;
   6484                }
   6485             } 
   6486             while ((len = stbi__get8(s)) != 0) {
   6487                stbi__skip(s, len);
   6488             }
   6489             break;
   6490          }
   6491 
   6492          case 0x3B: // gif stream termination code
   6493             return (stbi_uc *) s; // using '1' causes warning on some compilers
   6494 
   6495          default:
   6496             return stbi__errpuc("unknown code", "Corrupt GIF");
   6497       }
   6498    }
   6499 }
   6500 
   6501 static void *stbi__load_gif_main(stbi__context *s, int **delays, int *x, int *y, int *z, int *comp, int req_comp)
   6502 {
   6503    if (stbi__gif_test(s)) {
   6504       int layers = 0; 
   6505       stbi_uc *u = 0;
   6506       stbi_uc *out = 0;
   6507       stbi_uc *two_back = 0; 
   6508       stbi__gif g;
   6509       int stride; 
   6510       memset(&g, 0, sizeof(g));
   6511       if (delays) {
   6512          *delays = 0; 
   6513       }
   6514 
   6515       do {
   6516          u = stbi__gif_load_next(s, &g, comp, req_comp, two_back);
   6517          if (u == (stbi_uc *) s) u = 0;  // end of animated gif marker
   6518 
   6519          if (u) {
   6520             *x = g.w;
   6521             *y = g.h;
   6522             ++layers; 
   6523             stride = g.w * g.h * 4; 
   6524          
   6525             if (out) {
   6526                out = (stbi_uc*) STBI_REALLOC( out, layers * stride ); 
   6527                if (delays) {
   6528                   *delays = (int*) STBI_REALLOC( *delays, sizeof(int) * layers ); 
   6529                }
   6530             } else {
   6531                out = (stbi_uc*)stbi__malloc( layers * stride ); 
   6532                if (delays) {
   6533                   *delays = (int*) stbi__malloc( layers * sizeof(int) ); 
   6534                }
   6535             }
   6536             memcpy( out + ((layers - 1) * stride), u, stride ); 
   6537             if (layers >= 2) {
   6538                two_back = out - 2 * stride; 
   6539             }
   6540 
   6541             if (delays) {
   6542                (*delays)[layers - 1U] = g.delay; 
   6543             }
   6544          }
   6545       } while (u != 0); 
   6546 
   6547       // free temp buffer; 
   6548       STBI_FREE(g.out); 
   6549       STBI_FREE(g.history); 
   6550       STBI_FREE(g.background); 
   6551 
   6552       // do the final conversion after loading everything; 
   6553       if (req_comp && req_comp != 4)
   6554          out = stbi__convert_format(out, 4, req_comp, layers * g.w, g.h);
   6555 
   6556       *z = layers; 
   6557       return out;
   6558    } else {
   6559       return stbi__errpuc("not GIF", "Image was not as a gif type."); 
   6560    }
   6561 }
   6562 
   6563 static void *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
   6564 {
   6565    stbi_uc *u = 0;
   6566    stbi__gif g;
   6567    memset(&g, 0, sizeof(g));
   6568 
   6569    u = stbi__gif_load_next(s, &g, comp, req_comp, 0);
   6570    if (u == (stbi_uc *) s) u = 0;  // end of animated gif marker
   6571    if (u) {
   6572       *x = g.w;
   6573       *y = g.h;
   6574 
   6575       // moved conversion to after successful load so that the same
   6576       // can be done for multiple frames. 
   6577       if (req_comp && req_comp != 4)
   6578          u = stbi__convert_format(u, 4, req_comp, g.w, g.h);
   6579    }
   6580 
   6581    // free buffers needed for multiple frame loading; 
   6582    STBI_FREE(g.history);
   6583    STBI_FREE(g.background); 
   6584 
   6585    return u;
   6586 }
   6587 
   6588 static int stbi__gif_info(stbi__context *s, int *x, int *y, int *comp)
   6589 {
   6590    return stbi__gif_info_raw(s,x,y,comp);
   6591 }
   6592 #endif
   6593 
   6594 // *************************************************************************************************
   6595 // Radiance RGBE HDR loader
   6596 // originally by Nicolas Schulz
   6597 #ifndef STBI_NO_HDR
   6598 static int stbi__hdr_test_core(stbi__context *s, const char *signature)
   6599 {
   6600    int i;
   6601    for (i=0; signature[i]; ++i)
   6602       if (stbi__get8(s) != signature[i])
   6603           return 0;
   6604    stbi__rewind(s);
   6605    return 1;
   6606 }
   6607 
   6608 static int stbi__hdr_test(stbi__context* s)
   6609 {
   6610    int r = stbi__hdr_test_core(s, "#?RADIANCE\n");
   6611    stbi__rewind(s);
   6612    if(!r) {
   6613        r = stbi__hdr_test_core(s, "#?RGBE\n");
   6614        stbi__rewind(s);
   6615    }
   6616    return r;
   6617 }
   6618 
   6619 #define STBI__HDR_BUFLEN  1024
   6620 static char *stbi__hdr_gettoken(stbi__context *z, char *buffer)
   6621 {
   6622    int len=0;
   6623    char c = '\0';
   6624 
   6625    c = (char) stbi__get8(z);
   6626 
   6627    while (!stbi__at_eof(z) && c != '\n') {
   6628       buffer[len++] = c;
   6629       if (len == STBI__HDR_BUFLEN-1) {
   6630          // flush to end of line
   6631          while (!stbi__at_eof(z) && stbi__get8(z) != '\n')
   6632             ;
   6633          break;
   6634       }
   6635       c = (char) stbi__get8(z);
   6636    }
   6637 
   6638    buffer[len] = 0;
   6639    return buffer;
   6640 }
   6641 
   6642 static void stbi__hdr_convert(float *output, stbi_uc *input, int req_comp)
   6643 {
   6644    if ( input[3] != 0 ) {
   6645       float f1;
   6646       // Exponent
   6647       f1 = (float) ldexp(1.0f, input[3] - (int)(128 + 8));
   6648       if (req_comp <= 2)
   6649          output[0] = (input[0] + input[1] + input[2]) * f1 / 3;
   6650       else {
   6651          output[0] = input[0] * f1;
   6652          output[1] = input[1] * f1;
   6653          output[2] = input[2] * f1;
   6654       }
   6655       if (req_comp == 2) output[1] = 1;
   6656       if (req_comp == 4) output[3] = 1;
   6657    } else {
   6658       switch (req_comp) {
   6659          case 4: output[3] = 1; /* fallthrough */
   6660          case 3: output[0] = output[1] = output[2] = 0;
   6661                  break;
   6662          case 2: output[1] = 1; /* fallthrough */
   6663          case 1: output[0] = 0;
   6664                  break;
   6665       }
   6666    }
   6667 }
   6668 
   6669 static float *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
   6670 {
   6671    char buffer[STBI__HDR_BUFLEN];
   6672    char *token;
   6673    int valid = 0;
   6674    int width, height;
   6675    stbi_uc *scanline;
   6676    float *hdr_data;
   6677    int len;
   6678    unsigned char count, value;
   6679    int i, j, k, c1,c2, z;
   6680    const char *headerToken;
   6681    STBI_NOTUSED(ri);
   6682 
   6683    // Check identifier
   6684    headerToken = stbi__hdr_gettoken(s,buffer);
   6685    if (strcmp(headerToken, "#?RADIANCE") != 0 && strcmp(headerToken, "#?RGBE") != 0)
   6686       return stbi__errpf("not HDR", "Corrupt HDR image");
   6687 
   6688    // Parse header
   6689    for(;;) {
   6690       token = stbi__hdr_gettoken(s,buffer);
   6691       if (token[0] == 0) break;
   6692       if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
   6693    }
   6694 
   6695    if (!valid)    return stbi__errpf("unsupported format", "Unsupported HDR format");
   6696 
   6697    // Parse width and height
   6698    // can't use sscanf() if we're not using stdio!
   6699    token = stbi__hdr_gettoken(s,buffer);
   6700    if (strncmp(token, "-Y ", 3))  return stbi__errpf("unsupported data layout", "Unsupported HDR format");
   6701    token += 3;
   6702    height = (int) strtol(token, &token, 10);
   6703    while (*token == ' ') ++token;
   6704    if (strncmp(token, "+X ", 3))  return stbi__errpf("unsupported data layout", "Unsupported HDR format");
   6705    token += 3;
   6706    width = (int) strtol(token, NULL, 10);
   6707 
   6708    *x = width;
   6709    *y = height;
   6710 
   6711    if (comp) *comp = 3;
   6712    if (req_comp == 0) req_comp = 3;
   6713 
   6714    if (!stbi__mad4sizes_valid(width, height, req_comp, sizeof(float), 0))
   6715       return stbi__errpf("too large", "HDR image is too large");
   6716 
   6717    // Read data
   6718    hdr_data = (float *) stbi__malloc_mad4(width, height, req_comp, sizeof(float), 0);
   6719    if (!hdr_data)
   6720       return stbi__errpf("outofmem", "Out of memory");
   6721 
   6722    // Load image data
   6723    // image data is stored as some number of sca
   6724    if ( width < 8 || width >= 32768) {
   6725       // Read flat data
   6726       for (j=0; j < height; ++j) {
   6727          for (i=0; i < width; ++i) {
   6728             stbi_uc rgbe[4];
   6729            main_decode_loop:
   6730             stbi__getn(s, rgbe, 4);
   6731             stbi__hdr_convert(hdr_data + j * width * req_comp + i * req_comp, rgbe, req_comp);
   6732          }
   6733       }
   6734    } else {
   6735       // Read RLE-encoded data
   6736       scanline = NULL;
   6737 
   6738       for (j = 0; j < height; ++j) {
   6739          c1 = stbi__get8(s);
   6740          c2 = stbi__get8(s);
   6741          len = stbi__get8(s);
   6742          if (c1 != 2 || c2 != 2 || (len & 0x80)) {
   6743             // not run-length encoded, so we have to actually use THIS data as a decoded
   6744             // pixel (note this can't be a valid pixel--one of RGB must be >= 128)
   6745             stbi_uc rgbe[4];
   6746             rgbe[0] = (stbi_uc) c1;
   6747             rgbe[1] = (stbi_uc) c2;
   6748             rgbe[2] = (stbi_uc) len;
   6749             rgbe[3] = (stbi_uc) stbi__get8(s);
   6750             stbi__hdr_convert(hdr_data, rgbe, req_comp);
   6751             i = 1;
   6752             j = 0;
   6753             STBI_FREE(scanline);
   6754             goto main_decode_loop; // yes, this makes no sense
   6755          }
   6756          len <<= 8;
   6757          len |= stbi__get8(s);
   6758          if (len != width) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("invalid decoded scanline length", "corrupt HDR"); }
   6759          if (scanline == NULL) {
   6760             scanline = (stbi_uc *) stbi__malloc_mad2(width, 4, 0);
   6761             if (!scanline) {
   6762                STBI_FREE(hdr_data);
   6763                return stbi__errpf("outofmem", "Out of memory");
   6764             }
   6765          }
   6766 
   6767          for (k = 0; k < 4; ++k) {
   6768             int nleft;
   6769             i = 0;
   6770             while ((nleft = width - i) > 0) {
   6771                count = stbi__get8(s);
   6772                if (count > 128) {
   6773                   // Run
   6774                   value = stbi__get8(s);
   6775                   count -= 128;
   6776                   if (count > nleft) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("corrupt", "bad RLE data in HDR"); }
   6777                   for (z = 0; z < count; ++z)
   6778                      scanline[i++ * 4 + k] = value;
   6779                } else {
   6780                   // Dump
   6781                   if (count > nleft) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("corrupt", "bad RLE data in HDR"); }
   6782                   for (z = 0; z < count; ++z)
   6783                      scanline[i++ * 4 + k] = stbi__get8(s);
   6784                }
   6785             }
   6786          }
   6787          for (i=0; i < width; ++i)
   6788             stbi__hdr_convert(hdr_data+(j*width + i)*req_comp, scanline + i*4, req_comp);
   6789       }
   6790       if (scanline)
   6791          STBI_FREE(scanline);
   6792    }
   6793 
   6794    return hdr_data;
   6795 }
   6796 
   6797 static int stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp)
   6798 {
   6799    char buffer[STBI__HDR_BUFLEN];
   6800    char *token;
   6801    int valid = 0;
   6802    int dummy;
   6803 
   6804    if (!x) x = &dummy;
   6805    if (!y) y = &dummy;
   6806    if (!comp) comp = &dummy;
   6807 
   6808    if (stbi__hdr_test(s) == 0) {
   6809        stbi__rewind( s );
   6810        return 0;
   6811    }
   6812 
   6813    for(;;) {
   6814       token = stbi__hdr_gettoken(s,buffer);
   6815       if (token[0] == 0) break;
   6816       if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
   6817    }
   6818 
   6819    if (!valid) {
   6820        stbi__rewind( s );
   6821        return 0;
   6822    }
   6823    token = stbi__hdr_gettoken(s,buffer);
   6824    if (strncmp(token, "-Y ", 3)) {
   6825        stbi__rewind( s );
   6826        return 0;
   6827    }
   6828    token += 3;
   6829    *y = (int) strtol(token, &token, 10);
   6830    while (*token == ' ') ++token;
   6831    if (strncmp(token, "+X ", 3)) {
   6832        stbi__rewind( s );
   6833        return 0;
   6834    }
   6835    token += 3;
   6836    *x = (int) strtol(token, NULL, 10);
   6837    *comp = 3;
   6838    return 1;
   6839 }
   6840 #endif // STBI_NO_HDR
   6841 
   6842 #ifndef STBI_NO_BMP
   6843 static int stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp)
   6844 {
   6845    void *p;
   6846    stbi__bmp_data info;
   6847 
   6848    info.all_a = 255;
   6849    p = stbi__bmp_parse_header(s, &info);
   6850    stbi__rewind( s );
   6851    if (p == NULL)
   6852       return 0;
   6853    if (x) *x = s->img_x;
   6854    if (y) *y = s->img_y;
   6855    if (comp) *comp = info.ma ? 4 : 3;
   6856    return 1;
   6857 }
   6858 #endif
   6859 
   6860 #ifndef STBI_NO_PSD
   6861 static int stbi__psd_info(stbi__context *s, int *x, int *y, int *comp)
   6862 {
   6863    int channelCount, dummy, depth;
   6864    if (!x) x = &dummy;
   6865    if (!y) y = &dummy;
   6866    if (!comp) comp = &dummy;
   6867    if (stbi__get32be(s) != 0x38425053) {
   6868        stbi__rewind( s );
   6869        return 0;
   6870    }
   6871    if (stbi__get16be(s) != 1) {
   6872        stbi__rewind( s );
   6873        return 0;
   6874    }
   6875    stbi__skip(s, 6);
   6876    channelCount = stbi__get16be(s);
   6877    if (channelCount < 0 || channelCount > 16) {
   6878        stbi__rewind( s );
   6879        return 0;
   6880    }
   6881    *y = stbi__get32be(s);
   6882    *x = stbi__get32be(s);
   6883    depth = stbi__get16be(s);
   6884    if (depth != 8 && depth != 16) {
   6885        stbi__rewind( s );
   6886        return 0;
   6887    }
   6888    if (stbi__get16be(s) != 3) {
   6889        stbi__rewind( s );
   6890        return 0;
   6891    }
   6892    *comp = 4;
   6893    return 1;
   6894 }
   6895 
   6896 static int stbi__psd_is16(stbi__context *s)
   6897 {
   6898    int channelCount, depth;
   6899    if (stbi__get32be(s) != 0x38425053) {
   6900        stbi__rewind( s );
   6901        return 0;
   6902    }
   6903    if (stbi__get16be(s) != 1) {
   6904        stbi__rewind( s );
   6905        return 0;
   6906    }
   6907    stbi__skip(s, 6);
   6908    channelCount = stbi__get16be(s);
   6909    if (channelCount < 0 || channelCount > 16) {
   6910        stbi__rewind( s );
   6911        return 0;
   6912    }
   6913    (void) stbi__get32be(s);
   6914    (void) stbi__get32be(s);
   6915    depth = stbi__get16be(s);
   6916    if (depth != 16) {
   6917        stbi__rewind( s );
   6918        return 0;
   6919    }
   6920    return 1;
   6921 }
   6922 #endif
   6923 
   6924 #ifndef STBI_NO_PIC
   6925 static int stbi__pic_info(stbi__context *s, int *x, int *y, int *comp)
   6926 {
   6927    int act_comp=0,num_packets=0,chained,dummy;
   6928    stbi__pic_packet packets[10];
   6929 
   6930    if (!x) x = &dummy;
   6931    if (!y) y = &dummy;
   6932    if (!comp) comp = &dummy;
   6933 
   6934    if (!stbi__pic_is4(s,"\x53\x80\xF6\x34")) {
   6935       stbi__rewind(s);
   6936       return 0;
   6937    }
   6938 
   6939    stbi__skip(s, 88);
   6940 
   6941    *x = stbi__get16be(s);
   6942    *y = stbi__get16be(s);
   6943    if (stbi__at_eof(s)) {
   6944       stbi__rewind( s);
   6945       return 0;
   6946    }
   6947    if ( (*x) != 0 && (1 << 28) / (*x) < (*y)) {
   6948       stbi__rewind( s );
   6949       return 0;
   6950    }
   6951 
   6952    stbi__skip(s, 8);
   6953 
   6954    do {
   6955       stbi__pic_packet *packet;
   6956 
   6957       if (num_packets==sizeof(packets)/sizeof(packets[0]))
   6958          return 0;
   6959 
   6960       packet = &packets[num_packets++];
   6961       chained = stbi__get8(s);
   6962       packet->size    = stbi__get8(s);
   6963       packet->type    = stbi__get8(s);
   6964       packet->channel = stbi__get8(s);
   6965       act_comp |= packet->channel;
   6966 
   6967       if (stbi__at_eof(s)) {
   6968           stbi__rewind( s );
   6969           return 0;
   6970       }
   6971       if (packet->size != 8) {
   6972           stbi__rewind( s );
   6973           return 0;
   6974       }
   6975    } while (chained);
   6976 
   6977    *comp = (act_comp & 0x10 ? 4 : 3);
   6978 
   6979    return 1;
   6980 }
   6981 #endif
   6982 
   6983 // *************************************************************************************************
   6984 // Portable Gray Map and Portable Pixel Map loader
   6985 // by Ken Miller
   6986 //
   6987 // PGM: http://netpbm.sourceforge.net/doc/pgm.html
   6988 // PPM: http://netpbm.sourceforge.net/doc/ppm.html
   6989 //
   6990 // Known limitations:
   6991 //    Does not support comments in the header section
   6992 //    Does not support ASCII image data (formats P2 and P3)
   6993 //    Does not support 16-bit-per-channel
   6994 
   6995 #ifndef STBI_NO_PNM
   6996 
   6997 static int      stbi__pnm_test(stbi__context *s)
   6998 {
   6999    char p, t;
   7000    p = (char) stbi__get8(s);
   7001    t = (char) stbi__get8(s);
   7002    if (p != 'P' || (t != '5' && t != '6')) {
   7003        stbi__rewind( s );
   7004        return 0;
   7005    }
   7006    return 1;
   7007 }
   7008 
   7009 static void *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
   7010 {
   7011    stbi_uc *out;
   7012    STBI_NOTUSED(ri);
   7013 
   7014    if (!stbi__pnm_info(s, (int *)&s->img_x, (int *)&s->img_y, (int *)&s->img_n))
   7015       return 0;
   7016 
   7017    *x = s->img_x;
   7018    *y = s->img_y;
   7019    if (comp) *comp = s->img_n;
   7020 
   7021    if (!stbi__mad3sizes_valid(s->img_n, s->img_x, s->img_y, 0))
   7022       return stbi__errpuc("too large", "PNM too large");
   7023 
   7024    out = (stbi_uc *) stbi__malloc_mad3(s->img_n, s->img_x, s->img_y, 0);
   7025    if (!out) return stbi__errpuc("outofmem", "Out of memory");
   7026    stbi__getn(s, out, s->img_n * s->img_x * s->img_y);
   7027 
   7028    if (req_comp && req_comp != s->img_n) {
   7029       out = stbi__convert_format(out, s->img_n, req_comp, s->img_x, s->img_y);
   7030       if (out == NULL) return out; // stbi__convert_format frees input on failure
   7031    }
   7032    return out;
   7033 }
   7034 
   7035 static int      stbi__pnm_isspace(char c)
   7036 {
   7037    return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r';
   7038 }
   7039 
   7040 static void     stbi__pnm_skip_whitespace(stbi__context *s, char *c)
   7041 {
   7042    for (;;) {
   7043       while (!stbi__at_eof(s) && stbi__pnm_isspace(*c))
   7044          *c = (char) stbi__get8(s);
   7045 
   7046       if (stbi__at_eof(s) || *c != '#')
   7047          break;
   7048 
   7049       while (!stbi__at_eof(s) && *c != '\n' && *c != '\r' )
   7050          *c = (char) stbi__get8(s);
   7051    }
   7052 }
   7053 
   7054 static int      stbi__pnm_isdigit(char c)
   7055 {
   7056    return c >= '0' && c <= '9';
   7057 }
   7058 
   7059 static int      stbi__pnm_getinteger(stbi__context *s, char *c)
   7060 {
   7061    int value = 0;
   7062 
   7063    while (!stbi__at_eof(s) && stbi__pnm_isdigit(*c)) {
   7064       value = value*10 + (*c - '0');
   7065       *c = (char) stbi__get8(s);
   7066    }
   7067 
   7068    return value;
   7069 }
   7070 
   7071 static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp)
   7072 {
   7073    int maxv, dummy;
   7074    char c, p, t;
   7075 
   7076    if (!x) x = &dummy;
   7077    if (!y) y = &dummy;
   7078    if (!comp) comp = &dummy;
   7079 
   7080    stbi__rewind(s);
   7081 
   7082    // Get identifier
   7083    p = (char) stbi__get8(s);
   7084    t = (char) stbi__get8(s);
   7085    if (p != 'P' || (t != '5' && t != '6')) {
   7086        stbi__rewind(s);
   7087        return 0;
   7088    }
   7089 
   7090    *comp = (t == '6') ? 3 : 1;  // '5' is 1-component .pgm; '6' is 3-component .ppm
   7091 
   7092    c = (char) stbi__get8(s);
   7093    stbi__pnm_skip_whitespace(s, &c);
   7094 
   7095    *x = stbi__pnm_getinteger(s, &c); // read width
   7096    stbi__pnm_skip_whitespace(s, &c);
   7097 
   7098    *y = stbi__pnm_getinteger(s, &c); // read height
   7099    stbi__pnm_skip_whitespace(s, &c);
   7100 
   7101    maxv = stbi__pnm_getinteger(s, &c);  // read max value
   7102 
   7103    if (maxv > 255)
   7104       return stbi__err("max value > 255", "PPM image not 8-bit");
   7105    else
   7106       return 1;
   7107 }
   7108 #endif
   7109 
   7110 static int stbi__info_main(stbi__context *s, int *x, int *y, int *comp)
   7111 {
   7112    #ifndef STBI_NO_JPEG
   7113    if (stbi__jpeg_info(s, x, y, comp)) return 1;
   7114    #endif
   7115 
   7116    #ifndef STBI_NO_PNG
   7117    if (stbi__png_info(s, x, y, comp))  return 1;
   7118    #endif
   7119 
   7120    #ifndef STBI_NO_GIF
   7121    if (stbi__gif_info(s, x, y, comp))  return 1;
   7122    #endif
   7123 
   7124    #ifndef STBI_NO_BMP
   7125    if (stbi__bmp_info(s, x, y, comp))  return 1;
   7126    #endif
   7127 
   7128    #ifndef STBI_NO_PSD
   7129    if (stbi__psd_info(s, x, y, comp))  return 1;
   7130    #endif
   7131 
   7132    #ifndef STBI_NO_PIC
   7133    if (stbi__pic_info(s, x, y, comp))  return 1;
   7134    #endif
   7135 
   7136    #ifndef STBI_NO_PNM
   7137    if (stbi__pnm_info(s, x, y, comp))  return 1;
   7138    #endif
   7139 
   7140    #ifndef STBI_NO_HDR
   7141    if (stbi__hdr_info(s, x, y, comp))  return 1;
   7142    #endif
   7143 
   7144    // test tga last because it's a crappy test!
   7145    #ifndef STBI_NO_TGA
   7146    if (stbi__tga_info(s, x, y, comp))
   7147        return 1;
   7148    #endif
   7149    return stbi__err("unknown image type", "Image not of any known type, or corrupt");
   7150 }
   7151 
   7152 static int stbi__is_16_main(stbi__context *s)
   7153 {
   7154    #ifndef STBI_NO_PNG
   7155    if (stbi__png_is16(s))  return 1;
   7156    #endif
   7157 
   7158    #ifndef STBI_NO_PSD
   7159    if (stbi__psd_is16(s))  return 1;
   7160    #endif
   7161 
   7162    return 0;
   7163 }
   7164 
   7165 #ifndef STBI_NO_STDIO
   7166 STBIDEF int stbi_info(char const *filename, int *x, int *y, int *comp)
   7167 {
   7168     FILE *f = stbi__fopen(filename, "rb");
   7169     int result;
   7170     if (!f) return stbi__err("can't fopen", "Unable to open file");
   7171     result = stbi_info_from_file(f, x, y, comp);
   7172     fclose(f);
   7173     return result;
   7174 }
   7175 
   7176 STBIDEF int stbi_info_from_file(FILE *f, int *x, int *y, int *comp)
   7177 {
   7178    int r;
   7179    stbi__context s;
   7180    long pos = ftell(f);
   7181    stbi__start_file(&s, f);
   7182    r = stbi__info_main(&s,x,y,comp);
   7183    fseek(f,pos,SEEK_SET);
   7184    return r;
   7185 }
   7186 
   7187 STBIDEF int stbi_is_16_bit(char const *filename)
   7188 {
   7189     FILE *f = stbi__fopen(filename, "rb");
   7190     int result;
   7191     if (!f) return stbi__err("can't fopen", "Unable to open file");
   7192     result = stbi_is_16_bit_from_file(f);
   7193     fclose(f);
   7194     return result;
   7195 }
   7196 
   7197 STBIDEF int stbi_is_16_bit_from_file(FILE *f)
   7198 {
   7199    int r;
   7200    stbi__context s;
   7201    long pos = ftell(f);
   7202    stbi__start_file(&s, f);
   7203    r = stbi__is_16_main(&s);
   7204    fseek(f,pos,SEEK_SET);
   7205    return r;
   7206 }
   7207 #endif // !STBI_NO_STDIO
   7208 
   7209 STBIDEF int stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp)
   7210 {
   7211    stbi__context s;
   7212    stbi__start_mem(&s,buffer,len);
   7213    return stbi__info_main(&s,x,y,comp);
   7214 }
   7215 
   7216 STBIDEF int stbi_info_from_callbacks(stbi_io_callbacks const *c, void *user, int *x, int *y, int *comp)
   7217 {
   7218    stbi__context s;
   7219    stbi__start_callbacks(&s, (stbi_io_callbacks *) c, user);
   7220    return stbi__info_main(&s,x,y,comp);
   7221 }
   7222 
   7223 STBIDEF int stbi_is_16_bit_from_memory(stbi_uc const *buffer, int len)
   7224 {
   7225    stbi__context s;
   7226    stbi__start_mem(&s,buffer,len);
   7227    return stbi__is_16_main(&s);
   7228 }
   7229 
   7230 STBIDEF int stbi_is_16_bit_from_callbacks(stbi_io_callbacks const *c, void *user)
   7231 {
   7232    stbi__context s;
   7233    stbi__start_callbacks(&s, (stbi_io_callbacks *) c, user);
   7234    return stbi__is_16_main(&s);
   7235 }
   7236 
   7237 #endif // STB_IMAGE_IMPLEMENTATION
   7238 
   7239 /*
   7240    revision history:
   7241       2.19  (2018-02-11) fix warning
   7242       2.18  (2018-01-30) fix warnings
   7243       2.17  (2018-01-29) change sbti__shiftsigned to avoid clang -O2 bug
   7244                          1-bit BMP
   7245                          *_is_16_bit api
   7246                          avoid warnings
   7247       2.16  (2017-07-23) all functions have 16-bit variants;
   7248                          STBI_NO_STDIO works again;
   7249                          compilation fixes;
   7250                          fix rounding in unpremultiply;
   7251                          optimize vertical flip;
   7252                          disable raw_len validation;
   7253                          documentation fixes
   7254       2.15  (2017-03-18) fix png-1,2,4 bug; now all Imagenet JPGs decode;
   7255                          warning fixes; disable run-time SSE detection on gcc;
   7256                          uniform handling of optional "return" values;
   7257                          thread-safe initialization of zlib tables
   7258       2.14  (2017-03-03) remove deprecated STBI_JPEG_OLD; fixes for Imagenet JPGs
   7259       2.13  (2016-11-29) add 16-bit API, only supported for PNG right now
   7260       2.12  (2016-04-02) fix typo in 2.11 PSD fix that caused crashes
   7261       2.11  (2016-04-02) allocate large structures on the stack
   7262                          remove white matting for transparent PSD
   7263                          fix reported channel count for PNG & BMP
   7264                          re-enable SSE2 in non-gcc 64-bit
   7265                          support RGB-formatted JPEG
   7266                          read 16-bit PNGs (only as 8-bit)
   7267       2.10  (2016-01-22) avoid warning introduced in 2.09 by STBI_REALLOC_SIZED
   7268       2.09  (2016-01-16) allow comments in PNM files
   7269                          16-bit-per-pixel TGA (not bit-per-component)
   7270                          info() for TGA could break due to .hdr handling
   7271                          info() for BMP to shares code instead of sloppy parse
   7272                          can use STBI_REALLOC_SIZED if allocator doesn't support realloc
   7273                          code cleanup
   7274       2.08  (2015-09-13) fix to 2.07 cleanup, reading RGB PSD as RGBA
   7275       2.07  (2015-09-13) fix compiler warnings
   7276                          partial animated GIF support
   7277                          limited 16-bpc PSD support
   7278                          #ifdef unused functions
   7279                          bug with < 92 byte PIC,PNM,HDR,TGA
   7280       2.06  (2015-04-19) fix bug where PSD returns wrong '*comp' value
   7281       2.05  (2015-04-19) fix bug in progressive JPEG handling, fix warning
   7282       2.04  (2015-04-15) try to re-enable SIMD on MinGW 64-bit
   7283       2.03  (2015-04-12) extra corruption checking (mmozeiko)
   7284                          stbi_set_flip_vertically_on_load (nguillemot)
   7285                          fix NEON support; fix mingw support
   7286       2.02  (2015-01-19) fix incorrect assert, fix warning
   7287       2.01  (2015-01-17) fix various warnings; suppress SIMD on gcc 32-bit without -msse2
   7288       2.00b (2014-12-25) fix STBI_MALLOC in progressive JPEG
   7289       2.00  (2014-12-25) optimize JPG, including x86 SSE2 & NEON SIMD (ryg)
   7290                          progressive JPEG (stb)
   7291                          PGM/PPM support (Ken Miller)
   7292                          STBI_MALLOC,STBI_REALLOC,STBI_FREE
   7293                          GIF bugfix -- seemingly never worked
   7294                          STBI_NO_*, STBI_ONLY_*
   7295       1.48  (2014-12-14) fix incorrectly-named assert()
   7296       1.47  (2014-12-14) 1/2/4-bit PNG support, both direct and paletted (Omar Cornut & stb)
   7297                          optimize PNG (ryg)
   7298                          fix bug in interlaced PNG with user-specified channel count (stb)
   7299       1.46  (2014-08-26)
   7300               fix broken tRNS chunk (colorkey-style transparency) in non-paletted PNG
   7301       1.45  (2014-08-16)
   7302               fix MSVC-ARM internal compiler error by wrapping malloc
   7303       1.44  (2014-08-07)
   7304               various warning fixes from Ronny Chevalier
   7305       1.43  (2014-07-15)
   7306               fix MSVC-only compiler problem in code changed in 1.42
   7307       1.42  (2014-07-09)
   7308               don't define _CRT_SECURE_NO_WARNINGS (affects user code)
   7309               fixes to stbi__cleanup_jpeg path
   7310               added STBI_ASSERT to avoid requiring assert.h
   7311       1.41  (2014-06-25)
   7312               fix search&replace from 1.36 that messed up comments/error messages
   7313       1.40  (2014-06-22)
   7314               fix gcc struct-initialization warning
   7315       1.39  (2014-06-15)
   7316               fix to TGA optimization when req_comp != number of components in TGA;
   7317               fix to GIF loading because BMP wasn't rewinding (whoops, no GIFs in my test suite)
   7318               add support for BMP version 5 (more ignored fields)
   7319       1.38  (2014-06-06)
   7320               suppress MSVC warnings on integer casts truncating values
   7321               fix accidental rename of 'skip' field of I/O
   7322       1.37  (2014-06-04)
   7323               remove duplicate typedef
   7324       1.36  (2014-06-03)
   7325               convert to header file single-file library
   7326               if de-iphone isn't set, load iphone images color-swapped instead of returning NULL
   7327       1.35  (2014-05-27)
   7328               various warnings
   7329               fix broken STBI_SIMD path
   7330               fix bug where stbi_load_from_file no longer left file pointer in correct place
   7331               fix broken non-easy path for 32-bit BMP (possibly never used)
   7332               TGA optimization by Arseny Kapoulkine
   7333       1.34  (unknown)
   7334               use STBI_NOTUSED in stbi__resample_row_generic(), fix one more leak in tga failure case
   7335       1.33  (2011-07-14)
   7336               make stbi_is_hdr work in STBI_NO_HDR (as specified), minor compiler-friendly improvements
   7337       1.32  (2011-07-13)
   7338               support for "info" function for all supported filetypes (SpartanJ)
   7339       1.31  (2011-06-20)
   7340               a few more leak fixes, bug in PNG handling (SpartanJ)
   7341       1.30  (2011-06-11)
   7342               added ability to load files via callbacks to accomidate custom input streams (Ben Wenger)
   7343               removed deprecated format-specific test/load functions
   7344               removed support for installable file formats (stbi_loader) -- would have been broken for IO callbacks anyway
   7345               error cases in bmp and tga give messages and don't leak (Raymond Barbiero, grisha)
   7346               fix inefficiency in decoding 32-bit BMP (David Woo)
   7347       1.29  (2010-08-16)
   7348               various warning fixes from Aurelien Pocheville
   7349       1.28  (2010-08-01)
   7350               fix bug in GIF palette transparency (SpartanJ)
   7351       1.27  (2010-08-01)
   7352               cast-to-stbi_uc to fix warnings
   7353       1.26  (2010-07-24)
   7354               fix bug in file buffering for PNG reported by SpartanJ
   7355       1.25  (2010-07-17)
   7356               refix trans_data warning (Won Chun)
   7357       1.24  (2010-07-12)
   7358               perf improvements reading from files on platforms with lock-heavy fgetc()
   7359               minor perf improvements for jpeg
   7360               deprecated type-specific functions so we'll get feedback if they're needed
   7361               attempt to fix trans_data warning (Won Chun)
   7362       1.23    fixed bug in iPhone support
   7363       1.22  (2010-07-10)
   7364               removed image *writing* support
   7365               stbi_info support from Jetro Lauha
   7366               GIF support from Jean-Marc Lienher
   7367               iPhone PNG-extensions from James Brown
   7368               warning-fixes from Nicolas Schulz and Janez Zemva (i.stbi__err. Janez (U+017D)emva)
   7369       1.21    fix use of 'stbi_uc' in header (reported by jon blow)
   7370       1.20    added support for Softimage PIC, by Tom Seddon
   7371       1.19    bug in interlaced PNG corruption check (found by ryg)
   7372       1.18  (2008-08-02)
   7373               fix a threading bug (local mutable static)
   7374       1.17    support interlaced PNG
   7375       1.16    major bugfix - stbi__convert_format converted one too many pixels
   7376       1.15    initialize some fields for thread safety
   7377       1.14    fix threadsafe conversion bug
   7378               header-file-only version (#define STBI_HEADER_FILE_ONLY before including)
   7379       1.13    threadsafe
   7380       1.12    const qualifiers in the API
   7381       1.11    Support installable IDCT, colorspace conversion routines
   7382       1.10    Fixes for 64-bit (don't use "unsigned long")
   7383               optimized upsampling by Fabian "ryg" Giesen
   7384       1.09    Fix format-conversion for PSD code (bad global variables!)
   7385       1.08    Thatcher Ulrich's PSD code integrated by Nicolas Schulz
   7386       1.07    attempt to fix C++ warning/errors again
   7387       1.06    attempt to fix C++ warning/errors again
   7388       1.05    fix TGA loading to return correct *comp and use good luminance calc
   7389       1.04    default float alpha is 1, not 255; use 'void *' for stbi_image_free
   7390       1.03    bugfixes to STBI_NO_STDIO, STBI_NO_HDR
   7391       1.02    support for (subset of) HDR files, float interface for preferred access to them
   7392       1.01    fix bug: possible bug in handling right-side up bmps... not sure
   7393               fix bug: the stbi__bmp_load() and stbi__tga_load() functions didn't work at all
   7394       1.00    interface to zlib that skips zlib header
   7395       0.99    correct handling of alpha in palette
   7396       0.98    TGA loader by lonesock; dynamically add loaders (untested)
   7397       0.97    jpeg errors on too large a file; also catch another malloc failure
   7398       0.96    fix detection of invalid v value - particleman@mollyrocket forum
   7399       0.95    during header scan, seek to markers in case of padding
   7400       0.94    STBI_NO_STDIO to disable stdio usage; rename all #defines the same
   7401       0.93    handle jpegtran output; verbose errors
   7402       0.92    read 4,8,16,24,32-bit BMP files of several formats
   7403       0.91    output 24-bit Windows 3.0 BMP files
   7404       0.90    fix a few more warnings; bump version number to approach 1.0
   7405       0.61    bugfixes due to Marc LeBlanc, Christopher Lloyd
   7406       0.60    fix compiling as c++
   7407       0.59    fix warnings: merge Dave Moore's -Wall fixes
   7408       0.58    fix bug: zlib uncompressed mode len/nlen was wrong endian
   7409       0.57    fix bug: jpg last huffman symbol before marker was >9 bits but less than 16 available
   7410       0.56    fix bug: zlib uncompressed mode len vs. nlen
   7411       0.55    fix bug: restart_interval not initialized to 0
   7412       0.54    allow NULL for 'int *comp'
   7413       0.53    fix bug in png 3->4; speedup png decoding
   7414       0.52    png handles req_comp=3,4 directly; minor cleanup; jpeg comments
   7415       0.51    obey req_comp requests, 1-component jpegs return as 1-component,
   7416               on 'test' only check type, not whether we support this variant
   7417       0.50  (2006-11-19)
   7418               first released version
   7419 */
   7420 
   7421 
   7422 /*
   7423 ------------------------------------------------------------------------------
   7424 This software is available under 2 licenses -- choose whichever you prefer.
   7425 ------------------------------------------------------------------------------
   7426 ALTERNATIVE A - MIT License
   7427 Copyright (c) 2017 Sean Barrett
   7428 Permission is hereby granted, free of charge, to any person obtaining a copy of
   7429 this software and associated documentation files (the "Software"), to deal in
   7430 the Software without restriction, including without limitation the rights to
   7431 use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
   7432 of the Software, and to permit persons to whom the Software is furnished to do
   7433 so, subject to the following conditions:
   7434 The above copyright notice and this permission notice shall be included in all
   7435 copies or substantial portions of the Software.
   7436 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
   7437 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
   7438 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
   7439 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
   7440 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
   7441 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
   7442 SOFTWARE.
   7443 ------------------------------------------------------------------------------
   7444 ALTERNATIVE B - Public Domain (www.unlicense.org)
   7445 This is free and unencumbered software released into the public domain.
   7446 Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
   7447 software, either in source code form or as a compiled binary, for any purpose,
   7448 commercial or non-commercial, and by any means.
   7449 In jurisdictions that recognize copyright laws, the author or authors of this
   7450 software dedicate any and all copyright interest in the software to the public
   7451 domain. We make this dedication for the benefit of the public at large and to
   7452 the detriment of our heirs and successors. We intend this dedication to be an
   7453 overt act of relinquishment in perpetuity of all present and future rights to
   7454 this software under copyright law.
   7455 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
   7456 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
   7457 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
   7458 AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
   7459 ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
   7460 WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
   7461 ------------------------------------------------------------------------------
   7462 */