December 15th, 2008 by ahoog

NSRL (National Software Reference Library)

The National Software Reference Library (NSRL) is a collection of traceable software that NIST processes and makes available to the public (primarily law enforcement and forensic analysts) for use in investigations. Each file’s identifying information is stored in a database, and the following is computed and published in a Reference Data Set (RDS) every three months:

  1. Cryptographic hash values (MD5 and SHA-1) of the file’s content;
  2. File’s origin including the software package(s) containing the file and the manufacturer of the package;
  3. Original file name and file size.
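
As a minimal sketch of how the hash values, name and size in such a record could be computed for a single file, the following Python uses the standard `hashlib` module. The function name `file_record` and the dictionary keys are illustrative assumptions, not the NSRL's actual tooling or schema.

```python
import hashlib
import os


def file_record(path, chunk_size=1 << 20):
    """Return MD5, SHA-1, name and size for one file (illustrative only)."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
            size += len(chunk)
    return {
        "md5": md5.hexdigest().upper(),
        "sha1": sha1.hexdigest().upper(),
        "name": os.path.basename(path),
        "size": size,
    }
```

Both digests are updated in a single pass over the file, so each file only has to be read once.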

Traceability

There is often confusion over the types of software that the NSRL tracks. One important goal of the NSRL is that its information will be accepted in court. To that end, the NSRL maintains tight control over the software (originals are stored in a vault and can be used to prove traceability) and adheres to strict standards.

Known (vs. Known Good and Known Bad)

Also, the NSRL does not identify software as good or malicious but instead provides a simple automated file classification. The reasoning for this approach is that software can be malicious in some settings and not in others. If you are using the NSRL to reduce the number of files you must analyze during an investigation, it is important to review the “ApplicationType” field and only eliminate files you deem irrelevant. The NSRL FAQ nicely sums things up by stating the files in the NSRL database are “known” – NOT “known good” OR “known bad” – just “known application files.”
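
A minimal sketch of that review step, assuming the reference data has already been loaded into a mapping from SHA-1 hash to application type. The sample data, the function name `triage`, and the choice of which application types to treat as irrelevant are all hypothetical; the real RDS file layout should be checked against the NSRL documentation.

```python
def triage(evidence, known, ignorable_types):
    """Split evidence files into ones to skip and ones to examine.

    evidence: dict of file path -> SHA-1 hex digest
    known: dict of SHA-1 hex digest -> application type (hypothetical layout)
    ignorable_types: application types the examiner has judged irrelevant
    """
    skip, examine = [], []
    for path, sha1 in sorted(evidence.items()):
        app_type = known.get(sha1.upper())
        # A match only means "known"; skip the file only if the examiner
        # has judged that application type irrelevant to this case.
        if app_type is not None and app_type in ignorable_types:
            skip.append(path)
        else:
            examine.append(path)
    return skip, examine


# Hypothetical reference entries and evidence hashes, for illustration only.
known = {
    "A" * 40: "Operating System",
    "B" * 40: "Steganography",
}
evidence = {
    "/case/kernel.dll": "a" * 40,  # known, and of an ignorable type
    "/case/hide.exe": "b" * 40,    # known, but potentially relevant
    "/case/notes.txt": "c" * 40,   # not in the reference set
}
skip, examine = triage(evidence, known, {"Operating System"})
print(skip)     # ['/case/kernel.dll']
print(examine)  # ['/case/hide.exe', '/case/notes.txt']
```

Note that the known steganography tool stays in the list to examine: being in the reference set makes a file "known", not safe to ignore.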

NSRL and Hashkeeper

One final area to note is the difference between NIST’s NSRL and NDIC’s Hashkeeper. Again, the NSRL FAQ addresses this well, so the following is quoted from their website:

The NSRL RDS and the NDIC’s Hashkeeper are collections of File Identification Information (FII) which are typically used to identify computer files during forensic investigations of computer systems. The principal differences between the two collections are as follows:

  1. Provenance. All NSRL data is derived from purchased or donated software which is retained in a secure facility at NIST. NSRL FII is thus traceable to the original software. Hashkeeper is a central repository of FII donated by various sources, usually obtained by law enforcement during the course of forensic investigations of suspect systems. Hashkeeper FII is not traceable to its source.
  2. Court worthiness. The FII published by the NSRL is designed to be admissible as evidence in a U.S. court of law: NSRL data is traceable and verifiable back to the original software packages which are maintained under evidence locker conditions. The Hashkeeper collection is designed to be of use to forensic investigators but is not generally admissible as evidence in a court of law.
  3. Scope. NSRL data is strictly limited to that which can be traced to physical installation media which can be obtained and held as evidence at NIST. Hashkeeper data has a potentially broader scope, dependent only on the nature and quantity of FII that law enforcement officials decide to donate.
  4. Illicit file data. The NSRL is prohibited by law from obtaining and storing certain forms of illicit file data (e.g. child pornography images) and contains no FII on such data. The Hashkeeper collection is known to contain FII on illicit file data.
  5. Specificity of data. The NSRL uses five measures to identify each file (the SHA-1 cryptographic hash, the MD5 cryptographic hash, the CRC32 checksum, the file name and the file size), whereas Hashkeeper supplies only the file name and the MD5 cryptographic hash. There is thus a lower level of confidence in the Hashkeeper data.
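
To illustrate the specificity point in the list above, here is a minimal sketch that computes the five measures for a piece of content and reports which of them agree with a reference entry. The field names and the `matching_measures` helper are assumptions for illustration, not the schema of either collection.

```python
import hashlib
import zlib


def five_measures(name, data):
    """Compute the five identifying measures for in-memory content."""
    return {
        "sha1": hashlib.sha1(data).hexdigest(),
        "md5": hashlib.md5(data).hexdigest(),
        "crc32": format(zlib.crc32(data) & 0xFFFFFFFF, "08x"),
        "name": name,
        "size": len(data),
    }


def matching_measures(candidate, reference):
    """Return the names of the measures on which two records agree."""
    return [key for key in reference if candidate.get(key) == reference[key]]


# Hypothetical content: the same bytes stored under two different names.
reference = five_measures("setup.exe", b"installer bytes")
renamed = five_measures("renamed.exe", b"installer bytes")
print(matching_measures(renamed, reference))  # ['sha1', 'md5', 'crc32', 'size']
```

A renamed copy still agrees on four of the five measures, whereas a comparison limited to file name and MD5 would have only the MD5 left to rely on.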

2 comments to NSRL (National Software Reference Library)

  • Brian Deering

    HashKeeper, the predecessor of the NSRL, and the NSRL were created for different uses. HashKeeper was created to reduce the time required by those who conduct forensic examinations of seized hard drives.

    HashKeeper stores hash values of “known to be good” files, such as those from an install of a software package, and hash values from “known to be bad” files, such as those from child pornography, malware and planted files. When examiners compare the known to be bad hash values against the hash values from an unknown file system, they can quickly and reliably focus their attention on those files most likely to be probative. Matches against known to be good file hashes allow the examiner to reliably overlook those files on the unknown file system.

    HashKeeper was intended to be open so that investigators could share their efforts with other investigators much the way law enforcement has been sharing their efforts for the last hundred years or so.

    Provenance – HashKeeper was never intended to prove provenance. Provenance only becomes an issue in copyright crimes.

    Admissibility – This is a curious issue. When known to be good hash values are used to eliminate files from an examination admissibility is a non-issue. In the American legal system, and I suspect the majority of legal systems, no one is required to prove the provenance or worry about admissibility of things not presented in court. As for matches with known to be bad files the match alone is meaningless (the whole probability thing). The responsibility still rests with the examiner to show that what’s being introduced is probative.

    Specificity – HashKeeper started with MD5 hash values. MD5 is reliable. The likelihood of a false positive or false negative with MD5 is really, really, really small and then really, really smaller than that. Adding SHA-1 and CRC32 to the comparisons makes an already absurdly unlikely event more absurdly unlikely. The benefit is unclear.

  • 2cWdUu Thanks for good post
