A simple way to view documents safely
- by Jacob Riggs
- 02-04-2020
Working in the media industry has shown me that a weakness of many journalists is their urge to click on and open anything that promises something they want. This presents a security risk, one often compounded by the speed at which reporters must ingest information and make editorial decisions. Journalism demands a commitment to discovery, fact-finding, and meeting strict publishing deadlines. Because of this, many non-technical journalists regard a routine of basic security habits as a waste of their already limited time, which means people I know often rely on me to validate whether particular files are safe to open.
Validating whether a file is malicious requires a more laborious approach than simply viewing its content safely (it’s important to understand the distinction here). Legitimate files, like malware, differ in size and format, and attackers are always adapting their exploitation techniques to help their payloads evade detection. If the volume of data is significant, malware analysis can take a considerable amount of time. In most cases, simply being able to open a document in a safe and readable format strikes an adequate balance between security and speed, and there are tools that can help with this.
Enter dangerzone
Dangerzone is a simple open-source application that can safely open a variety of documents (PDFs, Microsoft Office, LibreOffice, images, etc.) in a sandbox environment. This is a tool I have used on the fly as a substitute for Qubes, and something I would recommend to non-technical audiences who want to safely view the contents of common files themselves.
In short, dangerzone uses Linux containers to sandbox files, flattens the content into images, then uses optical character recognition (OCR) to produce a searchable text layer, which is written out as a safe-to-view PDF. Dangerzone can be installed on Windows, Mac, and Linux, and is capable of converting the following document formats: .pdf, .docx, .doc, .xlsx, .xls, .pptx, .ppt, .odt, .ods, .odg, .jpg, .jpeg, .gif, .png, .tif, .tiff
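As a small illustration of the format list above, the following sketch checks whether a filename carries one of the listed extensions before it is handed to the tool. The function name `is_supported` is hypothetical and not part of dangerzone. The check looks only at the extension and says nothing about whether the file is safe or what it really contains.

```python
from pathlib import Path

# Extensions listed in the article as convertible by dangerzone.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".doc", ".xlsx", ".xls", ".pptx", ".ppt",
    ".odt", ".ods", ".odg", ".jpg", ".jpeg", ".gif", ".png", ".tif", ".tiff",
}


def is_supported(filename: str) -> bool:
    """Return True if the filename's extension is in the list above.

    Hypothetical helper for illustration only; the comparison is
    case-insensitive and ignores the actual file content.
    """
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, `is_supported("Report.PDF")` is true, while `is_supported("archive.zip")` is false.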
How does it work?
Dangerzone uses Linux containers (two of them), which are sort of like quick, lightweight virtual machines that share the Linux kernel with their host. The easiest way to get containers running on Mac and Windows is by using Docker Desktop. So when you first install dangerzone, if you don’t already have Docker Desktop installed, it helps you download and install it. When dangerzone starts containers, it disables networking, and the only file it mounts is the suspicious document itself. So if a malicious document hacks the container, it doesn’t have access to your data and it can’t use the internet.
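To make the isolation described above more concrete, here is a minimal sketch of how a container could be started with networking disabled and a single file mounted. It only builds the argument list for `docker run` and does not execute anything. The image name `example/converter` and the in-container path `/tmp/input_file` are placeholders chosen for illustration, and this is not necessarily how dangerzone itself invokes Docker.

```python
from pathlib import Path


def build_container_command(document: str, image: str = "example/converter") -> list[str]:
    """Build a `docker run` argument list for an isolated conversion container.

    The image name and the mount target are hypothetical placeholders.
    """
    doc = Path(document).resolve()
    return [
        "docker", "run",
        "--rm",                          # remove the container once it exits
        "--network", "none",             # no network access inside the container
        "-v", f"{doc}:/tmp/input_file",  # mount only the suspicious document
        image,
    ]


if __name__ == "__main__":
    print(" ".join(build_container_command("suspicious.docx")))
```

Because the document is the only mount and the network is set to `none`, code running inside the container has no route to the internet and cannot see any other host file.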
The first container:
- Mounts a volume with the original document
- Uses LibreOffice or GraphicsMagick to convert original document to a PDF
- Uses poppler to split PDF into individual pages, and to convert those to PNGs
- Uses GraphicsMagick to convert PNG pages to RGB pixel data
- Stores RGB pixel data in separate volume
Then that container quits. A second container starts and:
- Mounts a volume with the RGB pixel data
- If OCR is enabled, GraphicsMagick converts RGB pixel data to PNGs, and Tesseract converts PNGs to searchable PDFs
- Otherwise uses GraphicsMagick to convert RGB pixel data into flat PDFs
- Uses poppler to merge PDF pages into a single multipage PDF
- Uses ghostscript to compress the final safe PDF
- Stores safe PDF in separate volume
Then that container quits, and the user can open the newly created safe PDF.
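The two-container flow described above can be summarised as data. The sketch below models each stage as an ordered list of steps and shows how the OCR setting changes the second stage. It is a planning model only: the `Step` class and both function names are hypothetical, and no real tools are invoked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    tool: str
    action: str


def first_container_steps() -> list[Step]:
    """Steps run against the untrusted document, ending in raw pixel data."""
    return [
        Step("LibreOffice or GraphicsMagick", "convert original document to PDF"),
        Step("poppler", "split PDF into pages and convert pages to PNGs"),
        Step("GraphicsMagick", "convert PNG pages to RGB pixel data"),
    ]


def second_container_steps(ocr: bool) -> list[Step]:
    """Steps that rebuild a PDF from pixel data, with or without OCR."""
    if ocr:
        steps = [
            Step("GraphicsMagick", "convert RGB pixel data to PNGs"),
            Step("Tesseract", "convert PNGs to searchable PDFs"),
        ]
    else:
        steps = [Step("GraphicsMagick", "convert RGB pixel data to flat PDFs")]
    steps += [
        Step("poppler", "merge PDF pages into one multipage PDF"),
        Step("ghostscript", "compress the final safe PDF"),
    ]
    return steps


if __name__ == "__main__":
    for step in first_container_steps() + second_container_steps(ocr=True):
        print(f"{step.tool}: {step.action}")
```

The key design point is the hand-off between the stages: only RGB pixel data crosses from the first container to the second, so nothing from the original file format reaches the final PDF.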
Credit: Micah Lee, https://tech.firstlook.media
Further points to consider
- Converting a file alters it: because the content is flattened into images, the safe PDF may not preserve everything that was in the original file.
- Relying on third parties for analysis depends on their availability and expertise, and may unnecessarily expose potentially confidential information to them.
- Encrypted, password-protected, or compressed files are naturally obfuscated, which attackers may rely on to circumvent network, application, and anti-virus security controls.
- Uploading potentially sensitive files to cloud-based file scanning services such as VirusTotal may risk loss of confidentiality. Such services automatically index file signatures into public databases as ‘artifacts’, which APTs may correlate against known signatures (such as leaked documents) to identify what source material you possess.
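To illustrate why an indexed signature can reveal what you hold, the sketch below assumes that "file signature" means a cryptographic hash such as SHA-256 (an assumption; the article does not say which signatures are meant). Identical content always yields the same digest, so anyone who already has a copy of a document can compute its digest and look for a match. The function name `sha256_of_bytes` is hypothetical.

```python
import hashlib


def sha256_of_bytes(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    original = b"contents of a leaked memo"
    my_copy = b"contents of a leaked memo"
    edited = b"contents of a leaked memo."

    # Identical content gives identical digests, so the two can be correlated.
    print(sha256_of_bytes(original) == sha256_of_bytes(my_copy))
    # Changing a single byte gives a different digest.
    print(sha256_of_bytes(original) == sha256_of_bytes(edited))
```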