Science

A New Image File Format Efficiently Stores Invisible Light Data (arstechnica.com)

An anonymous reader quotes a report from Ars Technica: Imagine working with special cameras that capture light your eyes can't even see -- ultraviolet rays that cause sunburn, infrared heat signatures that reveal hidden writing, or specific wavelengths that plants use for photosynthesis. Or perhaps using a special camera designed to distinguish the subtle visible differences that make paint colors appear just right under specific lighting. Scientists and engineers do this every day, and they're drowning in the resulting data. A new compression format called Spectral JPEG XL might finally solve this growing problem in scientific visualization and computer graphics. Researchers Alban Fichet and Christoph Peters of Intel Corporation detailed the format in a recent paper published in the Journal of Computer Graphics Techniques (JCGT). It tackles a serious bottleneck for industries working with these specialized images. These spectral files can contain 30, 100, or more data points per pixel, causing file sizes to balloon into multi-gigabyte territory -- making them unwieldy to store and analyze.

[...] The current standard format for storing this kind of data, OpenEXR, wasn't designed with these massive spectral requirements in mind. Even with built-in lossless compression methods like ZIP, the files remain unwieldy for practical work as these methods struggle with the large number of spectral channels. Spectral JPEG XL utilizes a technique used with human-visible images, a math trick called a discrete cosine transform (DCT), to make these massive files smaller. Instead of storing the exact light intensity at every single wavelength (which creates huge files), it transforms this information into a different form. [...]
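To make the DCT idea concrete, here is a minimal pure-Python sketch of the general technique: a smooth spectrum concentrates its energy in a few low-index DCT coefficients, so most coefficients can be dropped with little error. The toy spectrum and the 8-coefficient cutoff are illustrative only, not the paper's actual pipeline.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    N = len(x)
    scale = lambda k: math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    return [scale(k) * sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N)
                           for n in range(N)) for k in range(N)]

def idct(X):
    """Inverse of the orthonormal DCT-II above (a DCT-III)."""
    N = len(X)
    scale = lambda k: math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    return [sum(scale(k) * X[k] * math.cos(math.pi * (n + 0.5) * k / N)
                for k in range(N)) for n in range(N)]

# A smooth toy "spectrum": 64 samples of a broad reflectance-like curve.
spectrum = [0.5 + 0.4 * math.sin(2 * math.pi * i / 64) for i in range(64)]

coeffs = dct(spectrum)
# Keep only the first 8 of 64 coefficients (an 8x reduction), zero the rest.
truncated = coeffs[:8] + [0.0] * (len(coeffs) - 8)
recon = idct(truncated)
max_err = max(abs(a - b) for a, b in zip(spectrum, recon))
print(max_err)  # small, because smooth spectra compact into low frequencies
```

The same transform applied per pixel along the wavelength axis is what lets the format shed most of the spectral data volume while keeping smooth spectra nearly intact.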

According to the researchers, the massive file sizes of spectral images have been a real barrier to adoption in industries that would benefit from their accuracy. Smaller files mean faster transfer times, reduced storage costs, and the ability to work with these images more interactively without specialized hardware. The results reported by the researchers seem impressive -- with their technique, spectral image files shrink by 10 to 60 times compared to standard OpenEXR lossless compression, bringing them down to sizes comparable to regular high-quality photos. They also preserve key OpenEXR features like metadata and high dynamic range support.
The report notes that broader adoption "hinges on the continued development and refinement of the software tools that handle JPEG XL encoding and decoding."

Some scientific applications may also see JPEG XL's lossy approach as a drawback. "Some researchers working with spectral data might readily accept the trade-off for the practical benefits of smaller files and faster processing," reports Ars. "Others handling particularly sensitive measurements might need to seek alternative methods of storage."

Comments Filter:
  • by Hans Lehmann ( 571625 ) on Friday March 28, 2025 @08:28PM (#65266897)
    At first I thought, well duh, just use EXR since it can accommodate an arbitrary number of layers. Though they're usually used for matte layers, you could assign a layer per range of wavelength, that sort of thing.

    What the article seems to be getting at however, I think, is instead of just doing compression in the X & Y direction, also do compression in, I guess, the Z direction? i.e. across the spectrum of interest?
  • That's weird, why wasn't this already standard practice?

The idea of collecting the energy from multiple layers is DSP 101; I first read about it in graphics literature from the early nineties when I started, and multi-spectral images in astronomy are the standard example of where this is useful.

    Of course JPEG and many other related formats already do this with the RGB layers but with a hardcoded transform designed as a decent compromise between image perception and being easy to calculate.

    • Re: Ye olde? (Score:4, Interesting)

      by RightwingNutjob ( 1302813 ) on Friday March 28, 2025 @11:09PM (#65267157)

      It's not standard practice because for people who collect spectra, the interesting bits are precisely what's lost by compression.

      Compression by DCT saves space by eliminating high-frequency parts of the data. The high-frequency parts of a spectrum correspond precisely to absorption or emission lines that are often (but not always) the thing you want to measure with a hyperspectral imager to begin with.
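A quick illustration of that failure mode, with toy numbers: a narrow emission line on a flat baseline is nearly all high-frequency content, so discarding high-index DCT coefficients flattens it. The 64-channel spectrum and 8-coefficient cutoff below are made up for the sketch.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    N = len(x)
    scale = lambda k: math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    return [scale(k) * sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N)
                           for n in range(N)) for k in range(N)]

def idct(X):
    """Inverse of the orthonormal DCT-II above (a DCT-III)."""
    N = len(X)
    scale = lambda k: math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    return [sum(scale(k) * X[k] * math.cos(math.pi * (n + 0.5) * k / N)
                for k in range(N)) for n in range(N)]

# Flat baseline with a single narrow "emission line" at channel 30.
spectrum = [0.2] * 64
spectrum[30] = 1.0

coeffs = dct(spectrum)
truncated = coeffs[:8] + [0.0] * 56  # keep only the low-frequency terms
recon = idct(truncated)
print(spectrum[30], recon[30])  # the line loses most of its height
```

That collapse of the line amplitude is exactly the information a spectroscopist went to the trouble of measuring.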

      • Someone wants to promote a new image format and wants to use Slashdot as a shill? Probably. There usually isn't a good reason to put multi spectral data in the same file. The file gets too big. I've only seen it done in NITF files.

    • by nashv ( 1479253 )

      Nope. Storing any kind of scientific image in JPEG is universally considered a sign of being a moron.

      Source: I do microscopy - CLSM, Electron, Raman, FLIM for a living.

      • I'm explicitly not talking about storing this in JPEG - it's even mentioned as an example of other formats.

        But every format that does compression - even lossless - performs a number of transforms in every dimension, spatial, spectral etc. It's to make it easier to locate redundant information. If the transform and compression is reversible (for example delta-encoding neighbouring pixels and run-length coding), it's lossless.
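A minimal sketch of that reversible pipeline -- delta-encode neighbouring values, then run-length code the residuals -- using made-up pixel values:

```python
def delta_encode(xs):
    """Store the first value, then differences between neighbours."""
    return [xs[0]] + [xs[i] - xs[i - 1] for i in range(1, len(xs))]

def delta_decode(ds):
    """Undo delta encoding by cumulative summation."""
    out = [ds[0]]
    for d in ds[1:]:
        out.append(out[-1] + d)
    return out

def rle_encode(xs):
    """Collapse runs of equal values into (value, count) pairs."""
    runs = []
    for x in xs:
        if runs and runs[-1][0] == x:
            runs[-1][1] += 1
        else:
            runs.append([x, 1])
    return runs

def rle_decode(runs):
    """Expand (value, count) pairs back into the original sequence."""
    out = []
    for val, count in runs:
        out.extend([val] * count)
    return out

# Slowly varying toy scanline: deltas are mostly 0s and 1s, which RLE likes.
pixels = [10, 11, 12, 13, 13, 13, 13, 14, 15, 16]
encoded = rle_encode(delta_encode(pixels))
assert delta_decode(rle_decode(encoded)) == pixels  # fully reversible
```

Because every step is invertible, the round trip is exact -- which is precisely what distinguishes this kind of scheme from the lossy DCT quantization discussed above.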

        Transforming the channels with something like a principal component analysis would

        • by ceoyoyo ( 59147 )

The article is specifically about OpenEXR and adding a compression step that reduces dynamic range in high-frequency components along the wavelength axis. As someone said, for scientific data, those components are often what you're interested in, and you're unlikely to cram your ultra-expensive imaging data through lossy compression anyway. The paper is specifically about computer graphics rendering though, so probably useful. The result is then run through standard JPEG XL compression on the spatial dimensions.

The bitstream was informally frozen on 24 December 2020 with the release of version 0.2 of the libjxl reference software. The file format and core coding system were formally standardized on 13 October 2021 and 30 March 2022 respectively.

    Spectral JPEG isn't really new.

  • They compared how well it compresses relative to lossless, but is that compressed lossless, or uncompressed? And how does it affect 'doing science' with the data?

    Most non-earth science imaging data (astronomy and solar physics, but also medical and even some archival document scans) uses FITS (Flexible Image Transport System) or variations of it. You can then compress the data portion of the file, with most groups using Rice (aka Golomb) compression.

    It supports data cubes and higher dimensional data (you

    • I came to say "isn't the EXISTING standard for hyperspectral data FITS?" But you said it better.

      That's combining data from the long-wavelength radio pretty continuously up to gamma rays that are as likely to blow apart the nuclei of your detector as to be registered as a signal.

      You could make an argument the FITS is not a desktop-friendly format, and is more suited to command-line data processing pipelines. So? Who is actually going to be using hyperspectral data as hyperspectral data without being involv
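Rice (Golomb power-of-two) coding, mentioned above as the usual FITS data compressor, is simple enough to sketch: each non-negative value is split into a unary-coded quotient and a fixed-width remainder. The parameter k and the sample values here are illustrative.

```python
def rice_encode(n, k):
    """Encode one value: unary quotient, '0' terminator, k-bit remainder."""
    q, r = n >> k, n & ((1 << k) - 1)
    return "1" * q + "0" + format(r, "0{}b".format(k))

def rice_decode(bits, pos, k):
    """Decode one value starting at pos; return (value, next position)."""
    q = 0
    while bits[pos] == "1":
        q += 1
        pos += 1
    pos += 1  # skip the '0' terminator
    r = int(bits[pos:pos + k], 2)
    return (q << k) | r, pos + k

values = [3, 0, 7, 12, 1]  # e.g. small prediction residuals
k = 2
bitstream = "".join(rice_encode(v, k) for v in values)

decoded, pos = [], 0
while pos < len(bitstream):
    v, pos = rice_decode(bitstream, pos, k)
    decoded.append(v)
```

It is lossless and cheap, which is why it suits archival pipelines where, as noted above, the high-frequency content is the science.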
