Steganography & Watermarking
Hiding data
Paul Krzyzanowski
April 6, 2024
Introduction
Cryptography’s goal is to hide the contents of a message. Steganography’s goal is to hide the very existence of the message. Classic techniques included the use of invisible ink, writing a message on one’s head and allowing the hair to cover it, microdots, and carefully-clipped newspaper articles that together communicate the message.
A null cipher is one where the actual message is hidden among irrelevant data. For example, the message may comprise the first letter of each word (or each sentence, or every second letter, etc.).
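As a small illustration of the first-letter variant described above, the sketch below recovers a message from a cover sentence. The function name and the cover sentence are made up for this example, and it assumes every word begins with a letter that belongs to the message.

```python
def null_cipher_decode(cover_text):
    """Recover the hidden message from the first letter of each word."""
    return "".join(word[0] for word in cover_text.split()).upper()


# Hypothetical cover text: the first letters spell the hidden message.
cover = "Many eager elephants travel around the northern ocean often nowadays"
print(null_cipher_decode(cover))
```

A receiver who knows the rule (first letter of each word) reads the message directly; anyone else sees only an odd sentence.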
Chaffing and winnowing entails the transmission of a bunch of messages, of which only certain ones are legitimate. Each message carries an authentication tag (e.g., a MAC) computed with a key known only to the trusted parties. Intruders can see the messages but can’t validate the tags to distinguish the valid messages from the bogus ones.
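The idea can be sketched with an HMAC as the authentication tag. This is a minimal illustration, not a complete protocol: the packet layout (serial number, payload, tag), the function names, and the use of random bytes as chaff are all assumptions made for the example. In practice the chaff payloads would need to look as plausible as the real ones.

```python
import hashlib
import hmac
import os
import random


def make_tag(key, serial, payload):
    """HMAC-SHA256 over the serial number and payload."""
    message = serial.to_bytes(4, "big") + payload
    return hmac.new(key, message, hashlib.sha256).digest()


def add_chaff(key, parts, chaff_per_part=2):
    """Mix each real packet ("wheat") with bogus packets ("chaff")."""
    packets = []
    for serial, payload in enumerate(parts):
        packets.append((serial, payload, make_tag(key, serial, payload)))
        for _ in range(chaff_per_part):
            # Chaff: random payload with a random (invalid) tag.
            packets.append((serial, os.urandom(len(payload)), os.urandom(32)))
    random.shuffle(packets)
    return packets


def winnow(key, packets):
    """Keep only the packets whose tag validates, then reassemble in order."""
    wheat = [
        (serial, payload)
        for serial, payload, tag in packets
        if hmac.compare_digest(tag, make_tag(key, serial, payload))
    ]
    return b"".join(payload for _, payload in sorted(wheat))


key = os.urandom(32)
stream = add_chaff(key, [b"meet", b" at ", b"noon"])
print(winnow(key, stream))
```

Someone without the key sees three candidate packets for every serial number and has no way to tell which one is valid.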
Image steganography
Messages can be embedded into images. This is arguably the most common way of using steganography.
There are three common ways of hiding a message in an image:
Most image formats have fields for storing textual metadata in addition to the image. PNG files, for instance, have a text field, and JPEG, TIFF, and raw image formats support Exif (Exchangeable image file format) data fields. Strictly speaking, this shouldn’t be considered steganography, since these fields are well-known and hidden only in the sense that they are not part of the image. Nevertheless, they can be an effective way to transport data covertly and can be used to bypass content-filtering firewalls that may consider images to be harmless.
A straightforward method to hide a message in an image is to use the low-order bits (least significant bits) of its pixel values, where the user is unlikely to notice slight changes in color. This is known as LSB steganography. An image is a collection of RGB pixels. Changing the least significant bit of each color value shifts the color so slightly that nobody will notice the change in the image, so a message can be encoded simply by spreading the bits of the message among the least significant bits of the image.
A similar approach works in the frequency domain: apply a frequency-domain transformation, such as the Discrete Cosine Transform (DCT) that JPEG compression uses. The frequency domain maps the image as a collection ranging from high-frequency areas (e.g., “noisy” parts such as leaves, grass, and edges of things) through low-frequency areas (e.g., a clear blue sky). Changes to high-frequency areas will generally be unnoticed by humans; that’s why JPEG compression works. Because modifications in these regions are unnoticed, you can add the message into those areas and then transform the data back into the spatial (bitmap) domain. Now the message is spread throughout the higher-frequency parts of the image and can be extracted if you do the DCT again and know where to look for the message.
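The frequency-domain approach can be sketched on a single 8x8 grayscale block. This is one simple way to do it, not a method the text prescribes: the bit is stored in the parity of one quantized DCT coefficient. The coefficient position POS, the quantization step Q, and the function names are assumptions chosen for the example. A larger Q survives more distortion but changes the pixels more.

```python
import math

N = 8          # block size
Q = 16         # quantization step (assumed value)
POS = (3, 4)   # which DCT coefficient carries the bit (assumed choice)


def _basis(k, n):
    scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    return scale * math.cos(math.pi * (2 * n + 1) * k / (2 * N))


# Orthonormal DCT-II matrix: C[k][n]
C = [[_basis(k, n) for n in range(N)] for k in range(N)]


def dct2(block):
    """2-D DCT of an 8x8 block (spatial domain to frequency domain)."""
    return [[sum(C[u][x] * block[x][y] * C[v][y]
                 for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct2(coeffs):
    """Inverse 2-D DCT (frequency domain back to spatial domain)."""
    return [[sum(C[u][x] * coeffs[u][v] * C[v][y]
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def embed_bit(block, bit):
    """Force the chosen coefficient to a multiple of Q whose parity is `bit`."""
    coeffs = dct2(block)
    u, v = POS
    ratio = coeffs[u][v] / Q
    k = round(ratio)
    if k % 2 != bit:
        k += 1 if ratio >= k else -1  # nearest multiple with the right parity
    coeffs[u][v] = k * Q
    # Back to integer pixel values in the 0..255 range.
    return [[min(255, max(0, round(p))) for p in row] for row in idct2(coeffs)]


def extract_bit(block):
    """Redo the DCT and read the parity of the quantized coefficient."""
    u, v = POS
    return round(dct2(block)[u][v] / Q) % 2


# Hypothetical cover block: a smooth gradient of mid-range gray values.
cover = [[100 + 5 * x + 3 * y for y in range(N)] for x in range(N)]
stego = embed_bit(cover, 1)
print(extract_bit(stego))
```

Because the bit lives in one frequency coefficient, the change is spread across all 64 pixels of the block instead of sitting in any single pixel. The receiver must know POS and Q to know where to look.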
Audio Steganography
Similar to images, audio files can carry hidden data in the least significant bits of their samples. Also similar to images, audio steganography can take advantage of the same psychoacoustic analysis that audio compression algorithms use: place the bits in areas where human listeners simply won’t notice the distortion. Techniques like echo hiding, phase coding, and spread spectrum can embed data within audio signals without significantly altering the audio’s perceptible qualities.
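A minimal sketch of least-significant-bit embedding on audio follows. It assumes raw 16-bit signed little-endian PCM samples (an even number of bytes), stores one message bit per sample starting at the first sample, and assumes the receiver already knows the message length. The function names are illustrative only. The same bit manipulation applies to the color values of image pixels.

```python
import math
import struct


def embed_lsb(pcm, message):
    """Overwrite the lowest bit of successive samples with the message bits."""
    samples = list(struct.unpack("<%dh" % (len(pcm) // 2), pcm))
    bits = [(byte >> i) & 1 for byte in message for i in range(7, -1, -1)]
    if len(bits) > len(samples):
        raise ValueError("cover audio is too short for this message")
    for i, bit in enumerate(bits):
        samples[i] = (samples[i] & ~1) | bit  # clear the LSB, then set it
    return struct.pack("<%dh" % len(samples), *samples)


def extract_lsb(pcm, length):
    """Read `length` bytes back out of the sample LSBs."""
    samples = struct.unpack("<%dh" % (len(pcm) // 2), pcm)
    out = bytearray()
    for i in range(length):
        byte = 0
        for sample in samples[i * 8:(i + 1) * 8]:
            byte = (byte << 1) | (sample & 1)
        out.append(byte)
    return bytes(out)


# Hypothetical cover audio: 0.1 s of a 440 Hz tone sampled at 8 kHz.
tone = [int(10000 * math.sin(2 * math.pi * 440 * n / 8000)) for n in range(800)]
cover = struct.pack("<%dh" % len(tone), *tone)

stego = embed_lsb(cover, b"hidden")
print(extract_lsb(stego, 6))
```

Each sample changes by at most one step out of 65,536, which is far below what a listener can hear. The weakness is that any re-encoding or lossy compression of the audio destroys the low-order bits.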
In 2024, researchers at Meta created AudioSeal, a new technique to add and detect hidden watermarks in AI-generated speech. A specific goal is to make the watermark detectable even in short snippets of audio so that AI-generated speech used in deepfakes can be identified. It hides 32 bits of watermark data in one-second audio segments, ensuring that the watermark can be detected even if parts of the audio are cropped.
AudioSeal uses two neural networks: one to generate the watermark and another to detect it. It uses a training method to minimize the perceived distortion between the original and watermarked audio while maximizing the detection of the watermark. As part of the training, the audio is altered through various techniques (bandpass filter, boost audio, duck audio, echo, highpass filter, lowpass filter, pink noise, gaussian noise, slower, smooth, resample) to increase the likelihood that the watermark will survive the recoding or compression of the audio.
While AudioSeal produces the best results of any audio watermarking technology to date, it is still subject to adversarial attacks. Specifically, the more information about the algorithm is disclosed to attackers, the easier it is to mount an attack that will obscure the watermark. The authors propose keeping the training parameters secret. AudioSeal is freely available on GitHub.
Video and Network Steganography
Video files are largely a combination of audio and images (more commonly images and then motion vectors and changes to images). They provide a larger capacity for embedding more data.
Network steganography
Data can also be hidden within network communication. The non-hidden communication can be an innocent data stream. Network steganography can embed additional data in packet headers or timing intervals between packets.
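The timing-interval idea can be illustrated with a small simulation. No packets are sent here: the sketch only computes the gaps a sender would leave between otherwise innocent packets and shows how a receiver would turn observed gaps back into bits. The SHORT and LONG delay values, the threshold, and the function names are assumptions chosen for the example.

```python
import random

SHORT = 0.05   # seconds between packets for a 0 bit (assumed value)
LONG = 0.15    # seconds between packets for a 1 bit (assumed value)
THRESHOLD = (SHORT + LONG) / 2


def encode_delays(message):
    """Map each message bit to an inter-packet delay."""
    return [LONG if (byte >> i) & 1 else SHORT
            for byte in message for i in range(7, -1, -1)]


def decode_delays(delays):
    """Classify each observed gap as 0 or 1 and regroup the bits into bytes."""
    bits = [1 if delay > THRESHOLD else 0 for delay in delays]
    out = bytearray()
    for i in range(0, len(bits) - 7, 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


# Simulate network jitter of up to 20 ms on every gap.
rng = random.Random(0)
observed = [d + rng.uniform(-0.02, 0.02) for d in encode_delays(b"hi")]
print(decode_delays(observed))
```

The gap between SHORT and LONG has to be wide enough to survive real network jitter, which limits how much data such a channel can carry per second.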
Steganography for malware delivery and exfiltration
Steganography has become a useful mechanism for attackers to deliver malware because malicious data can be hidden in “innocent” content, such as an image, and neither detected nor blocked by content-inspecting firewalls or intrusion detection systems. Similarly, attackers can use steganography to exfiltrate data from an organization by uploading images, audio, or other non-suspicious data.
For example, a report on the SteganoAmor campaign was published in April 2024. The hacking group TA558 has been using a sophisticated method of delivering malware through the use of steganography, specifically targeting the hospitality and tourism sectors predominantly in Latin America. This method has been implicated in over 320 cyber attacks across various sectors and regions. The attacks exploit a known and old vulnerability in Microsoft Office’s Equation Editor, CVE-2017-11882, which has been patched since 2017 but still poses a threat to systems running outdated software versions.
The SteganoAmor campaign begins with phishing emails that appear benign but contain malicious document attachments, leveraging both Excel and Word formats. These documents exploit the CVE-2017-11882 vulnerability to download a Visual Basic Script from a legitimate online service upon being opened. When the script runs, it downloads a JPEG image from the internet. This image carries a hidden Base64-encoded payload. Subsequently, a PowerShell script embedded within the image downloads the final payload that is hidden in a text file, which then installs the malware. By using compromised SMTP servers, TA558 enhances the likelihood of their phishing emails bypassing standard email filters, as these messages are sent from legitimate domains. This campaign shows a blend of using old vulnerabilities and steganography to orchestrate targeted attacks.
Printers
An application of steganography is found in most modern color laser printers. This technique involves embedding a subtle pattern of nearly invisible yellow dots on every page printed. These dots, while typically undetectable to the naked eye, can be seen under specific lighting conditions or with the aid of magnifying equipment. The purpose of this steganographic method is to encode information directly onto the printed medium.
The encoded data within the yellow dot patterns typically includes the printer’s serial number and the date and time of the document’s printing. This allows each printed page to carry a unique identifier that can trace it back to its source printer, providing a valuable tool for tracking the origin of documents. For example, law enforcement agencies can use the information encoded in the dot patterns to trace counterfeit documents back to the printer that produced them, aiding in criminal investigations and the protection of sensitive information.
Watermarking
Steganography is closely related to watermarking, and the terms “steganography” and “watermarking” are often used interchangeably. Steganography can be thought of as invisible watermarking.
The primary goal of watermarking is to create an indelible imprint on a message such that an intruder cannot remove or replace the message. It is often used to assert ownership, authenticity, or encode DRM rules. The message may be, but does not have to be, invisible.
The goal of steganography is to allow primarily one-to-one communication while hiding the existence of a message. An intruder – someone who does not know what to look for – cannot even detect the message in the data.