DeltaCode Scoring
Delta Objects
A File-Level Comparison of Two Codebases
A Delta object represents the file-level comparison (i.e., the “delta”) of two codebases, typically
two versions of the same codebase, using ScanCode-generated JSON output files as input for the
comparison process.
Based on how the user constructs the command-line input, DeltaCode’s naming convention treats one codebase as the “new” codebase and the other as the “old” codebase::

    deltacode -n [path to the 'new' codebase] -o [path to the 'old' codebase] [...]
Basic Scoring
A DeltaCode codebase comparison produces a collection of file-level Delta objects. Depending on
the nature of the file-level change between the two codebases, each Delta object is characterized
as belonging to one of the categories listed below. Each category has an associated score intended
to convey its potential importance – from a license/copyright compliance perspective – to a
user’s analysis of the changes between the new and old codebases.
In descending order of importance, the categories are:
- added: A file has been added to the new codebase.
- modified: The file is contained in both the new and old codebase and has been modified (as reflected, among other things, by a change in the file’s sha1 attribute).
- moved: The file is contained in both the new and old codebase and has been moved but not modified.
- removed: A file has been removed from the old codebase.
- unmodified: The file is contained in both the new and old codebase and has not been modified or moved.
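As a rough illustration of how these categories partition the files, the sketch below compares two hypothetical ``{path: sha1}`` mappings. The function name and the input shape are assumptions made for this example and are not DeltaCode's actual API. The moved category is left out here because it depends on matching sha1 values across different paths.

```python
def categorize(new_files, old_files):
    """Assign a basic category to every path seen in either codebase.

    new_files and old_files are hypothetical {path: sha1} dicts.
    """
    categories = {}
    for path, sha1 in new_files.items():
        if path not in old_files:
            categories[path] = "added"
        elif old_files[path] != sha1:
            categories[path] = "modified"
        else:
            categories[path] = "unmodified"
    # Paths that exist only in the old codebase were removed.
    for path in old_files:
        if path not in new_files:
            categories[path] = "removed"
    return categories


new_files = {"a.c": "111", "b.c": "222", "c.c": "333"}
old_files = {"a.c": "111", "b.c": "999", "d.c": "444"}
result = categorize(new_files, old_files)
```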
Note
Files are determined to be moved by looping through the added and removed Delta objects and comparing their sha1 values.
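The matching step described in the note might look roughly like the following. This is a minimal sketch, not DeltaCode's implementation: the function name, the ``(path, sha1)`` tuple format, and the first-come pairing used when several files share a sha1 are all assumptions made for illustration.

```python
def find_moved(added, removed):
    """Pair added and removed files that share a sha1.

    added and removed are hypothetical lists of (path, sha1) tuples.
    Returns (moved, still_added, still_removed), where moved is a list
    of (old_path, new_path) pairs.
    """
    # Index removed files by sha1 so each added file is a single lookup.
    removed_by_sha1 = {}
    for path, sha1 in removed:
        removed_by_sha1.setdefault(sha1, []).append(path)

    moved, still_added = [], []
    for path, sha1 in added:
        candidates = removed_by_sha1.get(sha1)
        if candidates:
            # Assumption: pair with the first unmatched removed file.
            moved.append((candidates.pop(0), path))
        else:
            still_added.append((path, sha1))

    still_removed = [
        (path, sha1)
        for sha1, paths in removed_by_sha1.items()
        for path in paths
    ]
    return moved, still_added, still_removed


added = [("src/new/util.py", "abc"), ("src/extra.py", "def")]
removed = [("src/old/util.py", "abc"), ("src/gone.py", "xyz")]
moved, still_added, still_removed = find_moved(added, removed)
```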
The score of a Delta object characterized as added or modified may be increased based on
the detection of license- and/or copyright-related changes. See
License Additions and Changes and Copyright Holder Additions and Changes below.
Delta Object Fields and Values
Each Delta object includes the following fields and values:
- factors: One or more strings representing the factors that characterize the file-level comparison and resulting score, e.g., in JSON format::

      "factors": [
        "added",
        "license info added",
        "copyright info added"
      ],

- score: A number representing the magnitude/importance of the file-level change – the higher the score, the greater the change.
- new: The ScanCode-based file attributes (path, licenses, copyrights etc.) for the file in the codebase designated by the user as new.
- old: The ScanCode-based file attributes for the file in the codebase designated by the user as old.
Note that an added Delta object will have a new file but no old file, while a
removed Delta object will have an old file but not a new file. In each case, the
new and old keys will be present but the value for the missing file will be null.
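To make the shape of these fields concrete, here is a small sketch that builds an added Delta as a plain dictionary and serializes it. The helper ``make_delta``, the simplified file attributes, and the score value are all hypothetical; in particular, ``PLACEHOLDER_SCORE`` is not a number taken from DeltaCode.

```python
import json

PLACEHOLDER_SCORE = 1  # illustrative only, not DeltaCode's actual score


def make_delta(factors, score, new=None, old=None):
    """Build a Delta-like dict; a missing file is stored as None."""
    return {"factors": list(factors), "score": score, "new": new, "old": old}


added_delta = make_delta(
    factors=["added", "license info added"],
    score=PLACEHOLDER_SCORE,
    # Simplified stand-in for the ScanCode file attributes.
    new={"path": "src/a.c", "licenses": [{"key": "mit"}], "copyrights": []},
)

# The "old" key is still present and serializes to JSON null.
serialized = json.dumps(added_delta, indent=2)
print(serialized)
```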
License Additions and Changes
Certain file-level changes involving the license-related information in a Delta object will increase the object’s score.
- An added Delta object’s score will be increased:

  - If the new file contains one or more licenses (factors will include license info added).
  - If the new file contains any of the following Commercial/Copyleft license categories (factors will include, e.g., copyleft added):

    - ‘Commercial’
    - ‘Copyleft’
    - ‘Copyleft Limited’
    - ‘Free Restricted’
    - ‘Patent License’
    - ‘Proprietary Free’

- A modified Delta object’s score will be increased:

  - If the old file has at least one license and the new file has no licenses (factors will include license info removed).
  - If the old file has no licenses and the new file has at least one license (factors will include license info added).
  - If both the old file and new file have at least one license and the license keys are not identical (e.g., the old file includes an mit license and an apache-2.0 license and the new file includes only an mit license) (factors will include license change).
  - If any of the Commercial/Copyleft license categories listed above are found in the new file but not in the old file (factors will include, e.g., proprietary free added).
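The license rules above can be sketched as a function that derives the license-related factors. This is an illustration under stated assumptions rather than DeltaCode's own code: the function name, the simplified ``{"key": ..., "category": ...}`` license records, and the rule that every category-based factor is the lowercased category name followed by ``added`` are assumed here. Score increments are omitted because this section does not state their values.

```python
FLAGGED_CATEGORIES = [
    "Commercial", "Copyleft", "Copyleft Limited",
    "Free Restricted", "Patent License", "Proprietary Free",
]


def license_factors(category, new_licenses, old_licenses=()):
    """Return license-related factors for an added or modified Delta.

    Licenses are hypothetical dicts with "key" and "category" entries.
    """
    new_keys = {lic["key"] for lic in new_licenses}
    old_keys = {lic["key"] for lic in old_licenses}
    new_cats = {lic["category"] for lic in new_licenses}
    old_cats = {lic["category"] for lic in old_licenses}

    factors = []
    if category == "added":
        if new_keys:
            factors.append("license info added")
        flagged = new_cats
    elif category == "modified":
        if old_keys and not new_keys:
            factors.append("license info removed")
        elif new_keys and not old_keys:
            factors.append("license info added")
        elif old_keys != new_keys:  # both sides have licenses here
            factors.append("license change")
        # Only categories that are new to the file count.
        flagged = new_cats - old_cats
    else:
        return factors  # other categories receive no license factors

    for name in FLAGGED_CATEGORIES:
        if name in flagged:
            factors.append(name.lower() + " added")  # assumed naming
    return factors


mit = {"key": "mit", "category": "Permissive"}
apache = {"key": "apache-2.0", "category": "Permissive"}
gpl = {"key": "gpl-2.0", "category": "Copyleft"}
```

With these sample records, ``license_factors("modified", [mit], [mit, apache])`` yields only ``license change``, matching the mit/apache-2.0 example in the list above.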
Copyright Holder Additions and Changes
- An added Delta object’s score will be increased if the new file contains one or more copyright holders (factors will include copyright info added).
- A modified Delta object’s score will be increased:

  - If the old file has at least one copyright holder and the new file has no copyright holders (factors will include copyright info removed).
  - If the old file has no copyright holders and the new file has at least one (factors will include copyright info added).
  - If both the old file and new file have at least one copyright holder and the holders are not identical (factors will include copyright change).
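The copyright holder rules follow the same pattern and can be sketched in a few lines. As before, the function name and the use of plain lists of holder strings are assumptions for illustration, and holders are compared as sets, so ordering and duplicates are ignored.

```python
def copyright_factors(category, new_holders, old_holders=()):
    """Return copyright-related factors for an added or modified Delta.

    new_holders and old_holders are hypothetical lists of holder names.
    """
    new_set, old_set = set(new_holders), set(old_holders)
    if category == "added":
        return ["copyright info added"] if new_set else []
    if category != "modified":
        return []  # other categories receive no copyright factors
    if old_set and not new_set:
        return ["copyright info removed"]
    if new_set and not old_set:
        return ["copyright info added"]
    if old_set != new_set:  # both sides have holders, but they differ
        return ["copyright change"]
    return []


changed = copyright_factors("modified", ["Example Corp"], ["Example Inc"])
```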
Moved, Removed and Unmodified
As noted above in Basic Scoring, from a license/copyright compliance
perspective, the three least significant Delta categories are moved, removed and
unmodified.
In the current version of DeltaCode, each of these three categories is assigned a score of 0, with no options to increase that score depending on the content of the Delta object.
However, it is possible that both moved and removed will be assigned some non-zero score in
a future version. In particular, removed could be significant from a compliance viewpoint
where, for example, the removal of a file results in the removal of a Commercial/Copyleft license
obligation.