Thursday, June 25, 2009

Something's rotten in the realm of watermarking?


2009 is a bad year for watermarking conferences. All of them (IH, MMSEC, IWDW, SPIE, ...) are facing low submission numbers. There are certainly too many yearly conferences, but no PC wants to kill its own child. Too bad! Clearly, publishing papers is no longer a problem.

Some think this is due to the brand new "yet another conference on content security", i.e. WIFS: THE conference that would wipe out all the others. But no! There were 120 submissions. Not enough for the organizers, who decided to restrain their ambition, as I have been told. WIFS might end up single-track without any poster session (enter the rumor mill!). Maybe they were expecting too much for a first edition.

Some think this is due to the financial crisis: researchers have no money for traveling. The Earth thanks them for the saved CO2. But no! The conferences of our cousins in crypto broke submission records in 2009.

So, what is rotten in the realm of watermarking? Is it time to move on to another realm? Which one?

But there is a glimmer of hope at the end of the dark 2009 tunnel. According to the latest Journal Citation Reports, the impact factor of the TIFS journal has jumped from 1.089 to 2.23. It is currently in the first third of the JCR ranking for electrical engineering journals, which is really good.

Conclusion: stop calling your travel agency, and write long, deep, and damn good journal articles!

Miss Cucumber switching the gossip radio off. 

Tuesday, June 16, 2009

Who judges? Who does science?

No need for links here. One day or another, we have all stumbled upon one of those papers in which the hidden payload is a binary bitmap logo. Maybe the habit is deeply rooted in one of the founding papers of the field.

Anyway. The point is: Are we doing science or not?

If we are really doing science, we should prefer to talk about quantities like capacity. In real life, with sound experiments, it boils down to computing or simulating the Bit Error Rate (BER).
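
To make the point concrete, here is a minimal sketch (in Python, with made-up parameters) of what "simulate the BER" means, for a toy additive spread-spectrum scheme facing an AWGN attack. It is an illustration, not anyone's actual benchmark.

```python
import numpy as np

rng = np.random.default_rng(0)

def ber_spread_spectrum(n_bits=10_000, spreading=64, alpha=1.0, noise_std=4.0):
    """Monte Carlo estimate of the BER for a toy additive spread-spectrum
    watermark under an AWGN attack (all parameters are illustrative)."""
    bits = rng.integers(0, 2, n_bits)                        # message bits
    carriers = rng.choice([-1.0, 1.0], (n_bits, spreading))  # one PN carrier per bit
    # Embed: each bit (mapped to +/-1) modulates its own carrier.  The host
    # is omitted: a blind correlator under AWGN sees it as extra noise anyway.
    watermark = alpha * (2 * bits[:, None] - 1) * carriers
    received = watermark + rng.normal(0.0, noise_std, watermark.shape)
    # Decode by correlating the received samples with the known carriers.
    decoded = (np.sum(received * carriers, axis=1) > 0).astype(int)
    return np.mean(decoded != bits)

print(f"Estimated BER: {ber_spread_spectrum():.4f}")
```

A single number like this can be plotted against the attack strength and compared across schemes, which no pair of logos will ever allow.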

But then people come around and say that a judge will recognize a logo more readily than the fact that some probability is quite low. We agree. A judge does not do science.

But it is always easier to make a covert channel with a given BER carry a given logo than it is to compare two data-hiding schemes based only on the visual appearance of the recovered logos.

We all aim at doing science. So please: No more logos. Please.

On image steganography (as people do it)

For sure, steganography is becoming more and more interesting.

But hey! What about all those papers dealing with bitmap images?

Do any of you, Esteemed Readers, ever share images in BMP or PGM format? Do you really trust these awesome steganographic rates?

Images in GIF format are encumbered by patents, and JPEG 2000, apart from Digital Cinema, is mostly unused on the Web. Old-school JPEG still rules.

Guess what? Due to the energy compaction property of the DCT, it is pretty certain that the actual steganographic rates are much lower than what is claimed for bitmap images.

And this is because of the popularity of JPEG, not for distortion reasons.

The real issue is not how many bits can be hidden in lena.pgm, but rather how many can be hidden in lena.jpg. Even better: how many bits can be hidden in a whole bunch of contents?
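
To give an order of magnitude, here is a back-of-the-envelope sketch, assuming the usual rule of thumb that spatial-domain LSB schemes may touch every pixel while JPEG-domain schemes may only touch non-zero quantized AC coefficients. The quality-50 luminance table and the smooth synthetic image are just for illustration.

```python
import numpy as np
from scipy.fft import dctn

# Standard JPEG luminance quantization table (quality ~50).
Q50 = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
])

def usable_coefficients(img):
    """Count non-zero quantized AC coefficients, i.e. the locations a
    typical JPEG-domain steganographic scheme is allowed to touch."""
    h, w = (d - d % 8 for d in img.shape)
    img = img[:h, :w].astype(float) - 128.0
    count = 0
    for i in range(0, h, 8):
        for j in range(0, w, 8):
            coeffs = dctn(img[i:i+8, j:j+8], norm='ortho')
            quantized = np.round(coeffs / Q50)
            count += np.count_nonzero(quantized) - (quantized[0, 0] != 0)
    return count

# A smooth synthetic "image": energy compaction sends most AC coefficients to zero.
x = np.linspace(0, 1, 512)
img = (127 + 100 * np.outer(np.sin(3 * x), np.cos(2 * x))).astype(np.uint8)
print("pixels (naive 1 bit/pixel budget):   ", img.size)
print("non-zero quantized AC coefficients:  ", usable_coefficients(img))
```

The gap between the two numbers is exactly the gap between lena.pgm claims and lena.jpg reality.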

Distortion-free 3D steganography!

Most of us think of data-hiding as a communication problem with side information at the encoder. This widely accepted view has led to dramatic improvements in data-hiding techniques over the years. As a direct consequence, any watermark can be seen as noise added to the host content.
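
For readers who have not met it, here is a minimal sketch of side-informed embedding in the spirit of scalar QIM (dither modulation); the parameter names and values are mine, not taken from any particular paper.

```python
import numpy as np

def qim_embed(host, bits, delta=8.0):
    """Scalar QIM: quantize each host sample onto the lattice selected by
    its message bit (the bit-1 lattice is shifted by delta/2)."""
    bits = np.asarray(bits, dtype=float)
    shift = bits * delta / 2.0
    return np.round((host - shift) / delta) * delta + shift

def qim_decode(received, delta=8.0):
    """Pick the lattice (bit 0 or bit 1) whose nearest point is closest."""
    d0 = np.abs(received - np.round(received / delta) * delta)
    d1 = np.abs(received - (np.round((received - delta / 2) / delta) * delta + delta / 2))
    return (d1 < d0).astype(int)

rng = np.random.default_rng(1)
host = rng.normal(0, 50, 10)            # host samples: the side information
bits = rng.integers(0, 2, 10)
marked = qim_embed(host, bits, delta=8.0)
print("embedding distortion per sample:", np.mean((marked - host) ** 2))
print("bits recovered:", np.array_equal(qim_decode(marked), bits))
```

Note that the embedder exploits the host (it quantizes it), and the watermark marked - host is indeed a small additive perturbation: that is the "noise added to the host content" mentioned above.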

These days, a new trend is emerging in 3D data-hiding. Such data is twofold: there is geometry and there is connectivity. Roughly speaking, geometry is a bunch of 3D Cartesian samples, and connectivity is the way the vertices connect to one another. Some people, including several of the most respected in the mesh processing area (see this paper by Bogomjakov, Gotsman and Isenburg) [1], want to hide information in the way the connectivity is described: since geometry is not affected, it leads to distortion-free data-hiding!

To us, it reads like "communications at zero power" and other "infinite-capacity channel" tall tales. Frightening. It sounds like a definitive breakthrough in information theory.

But wait!

There is a problem. The real question is whether connectivity should be considered useful for describing 3D data. The answer is not easy. There is already a host of work dealing with 3D point-cloud data and how to synthesize connectivity from scratch. But there is more: Isenburg has done extremely interesting work on a sort of "ghost geometry" that is present in the connectivity information alone (see his connectivity shapes).

Turn back to data-hiding as we know it: with regularly sampled host contents. The ones we love, the ones that need no description beyond the dimensions of the image or the duration of the song.

Now let's make the following crazy assumption: each and every pixel of an image is numbered and transmitted separately. The decoder now has to know the neighborhood of every pixel. Yes. And we no longer transmit triangles (as for 3D meshes), but quads. That's images with explicit connectivity.

And finally we can do the same thing: distortion-free data-hiding for images. Sounds weird, huh?

Interestingly enough, the paper by Bogomjakov et al. states that one needs a reference ordering of the vertices, so that the decoder can spot the difference with the transmitted connectivity and compute the hidden message. Similar ideas are found in so-called permutation watermarking. Therefore, there is actually some sort of distortion in their scheme. BTW, their scheme also has a capacity!
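
To see where that capacity comes from: with a public reference ordering of n vertices, the ordering actually transmitted can carry up to log2(n!) bits. The little Lehmer-code encoder below is my own toy illustration of permutation coding, not the scheme of Bogomjakov et al.

```python
import math

def encode_permutation(message, n):
    """Map an integer message in [0, n!) to an ordering of range(n),
    using the factorial number system (Lehmer code)."""
    assert 0 <= message < math.factorial(n)
    reference = list(range(n))        # the public reference ordering
    ordering = []
    for i in range(n, 0, -1):
        idx, message = divmod(message, math.factorial(i - 1))
        ordering.append(reference.pop(idx))
    return ordering

def decode_permutation(ordering):
    """Recover the integer message from the received ordering,
    given the same public reference ordering."""
    reference = list(range(len(ordering)))
    message = 0
    for pos, v in enumerate(ordering):
        idx = reference.index(v)
        message += idx * math.factorial(len(ordering) - 1 - pos)
        reference.pop(idx)
    return message

n = 12                                   # number of vertices in this toy example
print(f"{n} vertices carry up to log2({n}!) = {math.log2(math.factorial(n)):.1f} bits")
msg = 123456789
perm = encode_permutation(msg, n)
assert decode_permutation(perm) == msg   # lossless, but the reordering IS the "distortion"
```

The payload lives entirely in the deviation from the reference ordering, which is exactly why the scheme is neither distortion-free nor stealthy.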

Why don't people transmitting images send quads, with YUV/RGB values as the attribute data? Because everyone assumes this reference ordering of the pixels inside an image. Everyone is happy with YUV/RGB values only.

And since this reference ordering of the vertices is implicitly public and the decoder can spot the difference, an adversary should be able to detect the hidden information quite easily. So why is it called steganography?

So let's make it clear once and for all: these guys do data-hiding on graphs. They do not do distortion-free 3D steganography. For sure.



[1] Although Gotsman has already done some sort of Karhunen-Loève-Transform-based compression for 3D meshes... Either it is not as real-time as claimed (one needs the decoder to compute the basis vectors -- cost: several O(N^3) diagonalizations of ~500x500 Laplacian matrices), or it is not compression (one needs to transmit the basis vectors). The improvement presented here suffers from a problem akin to the Gibbs phenomenon.

Information Hiding 2009


Information Hiding is one of the oldest workshops on Data Hiding. The 11th edition took place in Darmstadt, June 8-10. The organizers were Stefan Katzenbeisser (TU Darmstadt) and Ahmad Sadeghi (Univ. Bochum).

One third of the papers dealt with steganography, one third with forensics (active or passive), and the rest with traitor tracing and exotic applications. This edition faced a low number of submissions; hence, some accepted papers were not so good.

The German community was very impressive and well organized. Several towns/universities clearly carry the label "IT Security" and/or "Multimedia Security" and get huge funding: Bochum, Darmstadt, Dresden, and Magdeburg.

Steganography is (at last) becoming a noble science. Papers are more and more theoretical (see the very interesting talks by Tomas Filler, Andrew Ker, and Rainer Böhme). Forensics, on the contrary, remains very ad hoc.

The most controversial talk was "Hardware-based public-key cryptography with public physical unclonable functions" by M. Potkonjak. From what I understood: take a chip implementing a network of XOR gates. The system has w binary inputs and w outputs. When the input switches from the all-zero word to a w-bit message M, many glitches appear at the output before it stabilizes. Indeed, these transient states of the output depend on the delay of each gate. Therefore, at a given time t (before stabilization), the output looks very random from one chip to another.

This could be used to identify the chip. But here, the authors propose to use it for asymmetric cryptography. Basically, Alice publishes the model of her chip (i.e. the set of delays). Bob simulates the scenario above using this model and sends the output C to Alice. Alice has the hardware, so she can run a brute-force attack to recover M. Eve must simulate in software like Bob, and must run a brute-force attack like Alice. However, software simulation is much, much slower, and the brute-force attack is intractable if the number of gates is large enough. This was quite a controversial talk: "Public-key cryptography relies on unproven conjectures, whereas here we resort to technological and physical laws preventing the manufacturing of fully identical systems." What a bold statement! Needless to say, the cryptographers in the room were coughing.
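
For the curious, here is a toy simulation of the mechanism as I understood it, not the authors' construction: every gate gets a chip-specific delay, and the output sampled before the network settles is reproducible on a given chip yet typically differs across chips. The topology and all parameters are made up.

```python
import numpy as np

def make_chip(w=8, layers=6, rng=None):
    """One 'chip' = one random draw of per-gate delays for a fixed layered
    XOR topology: gate (l, j) XORs wires j and j+1 of the previous layer."""
    if rng is None:
        rng = np.random.default_rng()
    return rng.uniform(0.8, 1.2, size=(layers, w))   # manufacturing variability

def sample_output(delays, message, t):
    """Output wires at time t, for an input switching from all zeros to
    `message` at time 0, with ideal XOR gates delayed gate by gate."""
    layers, w = delays.shape

    def value(layer, j, time):
        if layer < 0:                                # primary input wires
            return message[j] if time >= 0 else 0
        d = delays[layer, j]
        a = value(layer - 1, j, time - d)
        b = value(layer - 1, (j + 1) % w, time - d)
        return a ^ b

    return [value(layers - 1, j, t) for j in range(w)]

rng = np.random.default_rng(42)
chip_alice = make_chip(rng=rng)
chip_other = make_chip(rng=rng)          # a physically distinct chip
message = [1, 0, 1, 1, 0, 0, 1, 0]
t = 6.0                                  # sampled before the network settles

print("Alice's chip :", sample_output(chip_alice, message, t))
print("Alice again  :", sample_output(chip_alice, message, t))  # same chip: reproducible
print("another chip :", sample_output(chip_other, message, t))  # different delays: typically different
```

The asymmetry claimed in the talk would then come from the cost gap between owning the hardware (instant evaluation) and simulating the published delay model in software.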

Miss Cucumber, back from Darmstadt