Efficient Data Cleaning, Data Organization and Management

THE OPPORTUNITY

An advanced exploration company inherited more than 1 million unstructured digital files from previous operators of a past-producing zinc mine. The files were stored on hard drives, CDs, magnetic tapes, and as scanned PDF documents. The company needed a systematic approach to organizing the data into a structure it would adopt going forward, as well as guidance on managing its spatial datasets.

THE CHALLENGE

The dataset contained numerous duplicates and versions of files with non-intuitive names, which made it nearly impossible to organize in a reasonable time frame. The company needed quick access to its technical files and was generating new files daily, so it required an immediate data organization solution. It also needed to locate priority files to support the client's critical milestones: building a drillhole database and spatial datasets.

THE ORIX SOLUTION

Orix combined expertise in software development, geomatics, and geology. Using proprietary scripts, it built a database that indexed and fingerprinted the entire dataset. This index was used to identify and remove true duplicates, along with any extraneous or deletable files, while taking domain-specific files and folder structures into account. As a result, approximately 150,000 unique, priority technical files were separated out. Orix also designed and applied a naming convention, organized the files accordingly, and built base Workspaces in MapInfo. The work was completed in two phases: once all historical data had been organized, scripts were applied in Phase 2 to identify newly created files.

THE RESULT

A final dataset of clean technical files, organized by deposit, became the foundation for technical file storage and structure as the client advanced the project to a Preliminary Economic Assessment.
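The indexing and fingerprinting step described above uses Orix's proprietary scripts, which are not published. The Python below is therefore only a minimal sketch of the general technique under stated assumptions:

- A file's "fingerprint" is assumed to be a SHA-256 hash of its bytes.
- The index is assumed to be an SQLite table.
- Files that share a fingerprint are treated as true (byte-identical) duplicates.
- For a Phase 2 pass, "new" files are assumed to be paths that are not yet in the index.

All function and table names (`fingerprint`, `build_index`, `duplicate_groups`, `unindexed_files`, `files`) are hypothetical.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path

CHUNK_SIZE = 1 << 20  # hash in 1 MiB chunks so very large files never load into memory at once


def fingerprint(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes (assumed fingerprint scheme)."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_index(root: Path, db: sqlite3.Connection) -> int:
    """Walk `root` and record path, size, and fingerprint for every file (hypothetical schema)."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS files "
        "(path TEXT PRIMARY KEY, size INTEGER, sha256 TEXT)"
    )
    count = 0
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        db.execute(
            "INSERT OR REPLACE INTO files VALUES (?, ?, ?)",
            (str(path), path.stat().st_size, fingerprint(path)),
        )
        count += 1
    db.commit()
    return count


def duplicate_groups(db: sqlite3.Connection) -> list[list[str]]:
    """Group indexed paths whose contents are byte-identical (same SHA-256)."""
    groups: dict[str, list[str]] = {}
    for sha, path in db.execute("SELECT sha256, path FROM files ORDER BY sha256, path"):
        groups.setdefault(sha, []).append(path)
    # Only fingerprints shared by two or more paths are duplicates.
    return [paths for paths in groups.values() if len(paths) > 1]


def unindexed_files(root: Path, db: sqlite3.Connection) -> list[str]:
    """Phase 2 helper (assumption): files under `root` whose paths are not yet indexed."""
    known = {row[0] for row in db.execute("SELECT path FROM files")}
    return sorted(str(p) for p in root.rglob("*") if p.is_file() and str(p) not in known)


if __name__ == "__main__":
    # Small demo on a throwaway directory with one duplicated file.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "tapes").mkdir()
        (root / "a.csv").write_text("hole_id,depth\nDH-01,120\n")
        (root / "tapes" / "a_copy.csv").write_text("hole_id,depth\nDH-01,120\n")
        (root / "b.csv").write_text("hole_id,depth\nDH-02,95\n")
        db = sqlite3.connect(":memory:")
        print(build_index(root, db), "files indexed")
        print(duplicate_groups(db))
        (root / "c.csv").write_text("new survey")  # simulate a file created after Phase 1
        print(unindexed_files(root, db))
        db.close()
```

A content hash catches copies that were renamed or moved between media, which filename comparison would miss. Files that differ by even one byte, such as a re-saved scan, are not flagged, so near-duplicates and successive versions still need review. The sketch also ignores domain-specific rules. For example, a MapInfo table spans several companion files that must be kept together, and rules of that kind would have to be layered on top.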