Unstructured Data Project
Large volumes of data in unstructured text formats, such as emails, PowerPoint presentations, audit reports, and instant messages, are usually very difficult to search or analyze. Board staff uses software tools that filter and categorize large amounts of unstructured text into readable information, which can then be searched for risk indicators that may lead to the detection of fraud.
For example, under the Single Audit Act of 1984 and Office of Management and Budget (OMB) Circular A-133, entities that receive federal funds and spend $500,000 or more in federal awards in a single year must be audited by an independent certified public accountant. Completed audits are submitted to a central data warehouse, the Federal Audit Clearinghouse (FAC). The audits do not follow a standard format, and each can run to hundreds of pages.
Early in 2013, all single audit reports and summary information from 2008 forward were transmitted from the FAC to the Board’s data warehouse. By searching for key words and phrases such as “debarment” or “conflict of interest,” Board analysts were able to organize the text into patterns that reveal past behavior by recipients of federal funds, behavior that might indicate future fraud. The analysis can also help Inspectors General (IGs) develop their annual audit plans. For instance, to support audit planning by the IG of the Department of Housing and Urban Development (HUD), Board staff ran unstructured text searches of past audits to identify potentially high-risk recipients of Hurricane Sandy funds.
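To illustrate the general idea of keyword-based flagging, here is a minimal sketch in Python. It assumes the audit reports have already been converted to plain text and keyed by recipient name; the function names, the sample recipients, and the ranking by total hit count are hypothetical choices for illustration, not a description of the software tools the Board actually used. Only the two phrases “debarment” and “conflict of interest” come from the text above.

```python
import re
from collections import Counter

# Example risk phrases taken from the text; a real list would be longer (assumption).
RISK_PHRASES = ["debarment", "conflict of interest"]


def build_pattern(phrases):
    """Compile one case-insensitive, whole-word regex for all phrases.

    Words within a phrase may be separated by any whitespace, including the
    line breaks that are common in text extracted from long PDF reports.
    """
    parts = [r"\s+".join(re.escape(word) for word in p.split()) for p in phrases]
    return re.compile(r"\b(" + "|".join(parts) + r")\b", re.IGNORECASE)


def normalize(text):
    """Lowercase a match and collapse internal whitespace to single spaces."""
    return " ".join(text.lower().split())


def scan_audits(audits, phrases=RISK_PHRASES):
    """Map each recipient to a Counter of risk-phrase hits in its audit text.

    `audits` is a hypothetical dict of recipient name -> report text.
    Recipients with no hits are omitted from the result.
    """
    pattern = build_pattern(phrases)
    hits = {}
    for recipient, text in audits.items():
        counts = Counter(normalize(m.group(1)) for m in pattern.finditer(text))
        if counts:
            hits[recipient] = counts
    return hits


def rank_recipients(hits):
    """Order flagged recipients by total hits (descending), then by name."""
    return sorted(hits, key=lambda r: (-sum(hits[r].values()), r))


# Hypothetical sample data for demonstration only.
audits = {
    "Recipient A": "The auditor noted a potential Conflict of\nInterest in procurement.",
    "Recipient B": "No findings.",
    "Recipient C": (
        "Subcontractor debarment was not checked; a conflict of interest "
        "was also found. Debarment status remains unverified."
    ),
}

hits = scan_audits(audits)
for recipient in rank_recipients(hits):
    print(recipient, dict(hits[recipient]))
```

A raw hit count is only a crude triage signal: a phrase such as “debarment” also appears in routine compliance language, so reports flagged this way would still need review by an analyst before any conclusion about risk is drawn.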
The ability to convert unstructured text into useful information opens up thousands of documents for analysis, providing additional methods for identifying fraud.