A simple way to view documents safely

by Jacob Riggs
02-04-2020

Working in the media industry has shown me that a weakness of many journalists is their readiness to click on and open anything that promises something they want. This presents a security risk, one often compounded by the speed at which reporters must ingest information and make rapid editorial decisions. The reality is that journalism demands a commitment to discovery, fact-finding, and meeting strict publishing deadlines. Because of this, many non-technical journalists consider a routine of basic security habits a waste of their already limited time, which means those I know often rely on me to validate whether particular files are safe to open.

Validating whether a file is malicious requires a more laborious approach than simply viewing its content safely (it’s important to understand the distinction here). Like malware, legitimate files naturally differ in size and format, and attackers are always adapting their exploitation techniques to help their payloads evade detection. If the volume of data is significant, malware analysis can take a considerable amount of time. In most cases, simply being able to open a document in a safe and readable format strikes the necessary security balance, and there are tools that can help with this.

Enter dangerzone

Dangerzone is a simple open-source application that can safely open a variety of documents (PDFs, Microsoft Office, LibreOffice, images, etc.) in a sandboxed environment. It is a tool I have used on the fly as a substitute for Qubes, and one I would recommend to non-technical audiences who want to safely view the contents of common files themselves.

Dangerzone GUI

In short, dangerzone leverages Linux containers to sandbox files, flattens the content into images, then uses optical character recognition (OCR) to produce a searchable text layer, and outputs the result as a safe-to-view PDF. Dangerzone can be installed on Windows, Mac, and Linux, and is capable of converting the following document formats: .pdf, .docx, .doc, .xlsx, .xls, .pptx, .ppt, .odt, .ods, .odg, .jpg, .jpeg, .gif, .png, .tif, .tiff.
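
As a quick pre-check before handing a file over, the format list can be turned into a small lookup. This is only a sketch: the helper name `looks_convertible` is hypothetical and not part of dangerzone, and a file extension is merely a hint about the real content of a file, not proof of it.

```python
from pathlib import Path

# Extensions copied from the format list in this article.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".doc", ".xlsx", ".xls", ".pptx", ".ppt",
    ".odt", ".ods", ".odg", ".jpg", ".jpeg", ".gif", ".png", ".tif", ".tiff",
}


def looks_convertible(filename: str) -> bool:
    """Return True if the final file extension is in the supported list.

    Hypothetical helper for illustration; it inspects the name only and
    never opens the file.
    """
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

Only the last extension is considered, so a name such as `report.pdf.zip` is treated as a .zip archive and rejected.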

Please note, dangerzone is only a file converter. It does not perform malware analysis and is not a detection utility. Using dangerzone to convert a suspect file into a document you can view safely does not mean the original source file is clean.

How does it work?

Dangerzone uses Linux containers (two of them), which are sort of like quick, lightweight virtual machines that share the Linux kernel with their host. The easiest way to get containers running on Mac and Windows is by using Docker Desktop. So when you first install dangerzone, if you don’t already have Docker Desktop installed, it helps you download and install it. When dangerzone starts containers, it disables networking, and the only file it mounts is the suspicious document itself. So if a malicious document hacks the container, it doesn’t have access to your data and it can’t use the internet.
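
The isolation described above maps onto two ordinary Docker options: `--network none` to disable networking, and a single `-v` mount for the document. The sketch below only builds such a command as a list of arguments. The image name `example/converter`, the mount point `/tmp/input_file`, and the read-only `:ro` suffix are assumptions made for illustration, not dangerzone's actual invocation.

```python
from pathlib import Path


def build_sandbox_command(document: str, image: str = "example/converter") -> list[str]:
    """Build a `docker run` argument list for an isolated conversion container.

    The image name and the in-container mount point are hypothetical.
    """
    doc_path = Path(document).resolve()
    return [
        "docker", "run",
        "--rm",                                   # delete the container once it exits
        "--network", "none",                      # no network access inside the container
        "-v", f"{doc_path}:/tmp/input_file:ro",   # mount only the document (read-only is an assumption)
        image,
    ]
```

Passing the resulting list to `subprocess.run` would start the container. Nothing else from the host is mounted, so the document is the only host file the container can see.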

The first container:

Mounts a volume with the original document
Uses LibreOffice or GraphicsMagick to convert original document to a PDF
Uses poppler to split PDF into individual pages, and to convert those to PNGs
Uses GraphicsMagick to convert PNG pages to RGB pixel data
Stores RGB pixel data in separate volume

Then that container quits. A second container starts and:

Mounts a volume with the RGB pixel data
If OCR is enabled, GraphicsMagick converts RGB pixel data to PNGs, and Tesseract converts PNGs to searchable PDFs
Otherwise uses GraphicsMagick to convert RGB pixel data into flat PDFs
Uses poppler to merge PDF pages into a single multipage PDF
Uses ghostscript to compress final safe PDF
Stores safe PDF in separate volume

Then that container quits, and the user can open the newly created safe PDF.
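
The hand-off between the two containers works because raw RGB pixel data has no structure in which to hide macros, scripts, or embedded objects: it is just three bytes per pixel. The sketch below illustrates that idea under stated assumptions. It is not dangerzone's code (which uses GraphicsMagick for this step). The function `rgb_to_ppm` is hypothetical, and PPM is swapped in here only because its header is simple enough to write by hand.

```python
def rgb_to_ppm(width: int, height: int, pixels: bytes) -> bytes:
    """Wrap raw RGB pixel data in a binary PPM (P6) header.

    The buffer is rejected unless its length is exactly width * height * 3,
    so the receiving side never trusts anything beyond plain pixel values.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    expected = width * height * 3
    if len(pixels) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(pixels)}")
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    return header + pixels


# A 2x1 image: one red pixel followed by one blue pixel.
image = rgb_to_ppm(2, 1, bytes([255, 0, 0, 0, 0, 255]))
```

Whatever the original document contained, the second stage only ever receives dimensions and pixel values, which is why the rebuilt PDF is safe to view even when the source file is not.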

Credit: Micah Lee, https://tech.firstlook.media

Further points to consider

Converting a file alters it, so the converted copy cannot stand in for the original where the integrity of the original content matters.
Relying on the availability and expertise of third-party analysts can unnecessarily expose potentially confidential information to them.
Encrypted, password-protected, or compressed files are naturally obfuscated, which attackers may rely on to circumvent network, application, and anti-virus security controls.
Uploading potentially sensitive files to cloud-based file scanning services such as VirusTotal may risk loss of confidentiality. Such services automatically index file signatures into public databases as ‘artifacts’, which APTs may correlate against known signatures (such as leaked documents) to identify what source material you possess.
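
To make the correlation risk concrete: a file signature in this sense is typically a cryptographic hash of the file's bytes, so every copy of the same document yields the same value. The sketch below assumes SHA-256 as the hash (one common choice; scanning services may record several), and the helper name `file_sha256` is hypothetical.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never loaded into memory whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Because the digest depends only on the file's content, anyone who holds the same leaked document can compute the same value and match it against a public index, regardless of file name or who uploaded it.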

ABOUT THE AUTHOR

Jacob Riggs is a senior cyber security professional based in the UK with over a decade of experience working to improve the cyber security of various private, public, and third sector organisations. His contributions focus on expanding encryption tools, promoting crypto-anarchist philosophy, and pioneering projects centred on leveraging cryptography to protect the privacy and political freedoms of others.

E3FE 4B44 56F5 69BE 76C1 E169 E3C7 0A52 9AEF DB6F