DeltaCode Scoring
Delta Objects
A File-Level Comparison of Two Codebases
A Delta object represents the file-level comparison (i.e., the “delta”) of two codebases, typically two versions of the same codebase, using ScanCode-generated JSON output files as input for the comparison process.
Based on how the user constructs the command-line input, DeltaCode’s naming convention treats one codebase as the “new” codebase and the other as the “old” codebase::
    deltacode -n [path to the 'new' codebase] -o [path to the 'old' codebase] [...]
Basic Scoring
A DeltaCode codebase comparison produces a collection of file-level Delta objects. Depending on the nature of the file-level change between the two codebases, each Delta object is characterized as belonging to one of the categories listed below. Each category has an associated score intended to convey its potential importance, from a license/copyright compliance perspective, to a user’s analysis of the changes between the new and old codebases.
In descending order of importance, the categories are:
added
: A file has been added to the new codebase.
modified
: The file is contained in both the new and old codebase and has been modified (as reflected, among other things, by a change in the file’s sha1 attribute).
moved
: The file is contained in both the new and old codebase and has been moved but not modified.
removed
: A file has been removed from the old codebase.
unmodified
: The file is contained in both the new and old codebase and has not been modified or moved.
Note
Files are determined to be moved by looping through the added and removed Delta objects and comparing their sha1 values: an added file and a removed file with the same sha1 are treated as a single moved file.
The score of a Delta object characterized as added or modified may be increased based on the detection of license- and/or copyright-related changes. See License Additions and Changes and Copyright Holder Additions and Changes below.
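The categories above, together with the sha1-based move detection described in the note, can be sketched in a few lines of Python. This is a minimal illustration, not DeltaCode's actual implementation. The function name `categorize` and the simplified input (a plain mapping of file path to sha1 for each codebase) are assumptions made for this example. Files that share a sha1 are paired one-to-one in path order, which is also a simplification.

```python
from collections import defaultdict


def categorize(new_files, old_files):
    """Group files into Delta categories from two {path: sha1} mappings.

    Hypothetical helper for illustration; not part of DeltaCode's API.
    """
    result = {"added": [], "modified": [], "moved": [], "removed": [], "unmodified": []}

    # Paths present in both codebases are either modified or unmodified.
    for path in sorted(new_files.keys() & old_files.keys()):
        if new_files[path] == old_files[path]:
            result["unmodified"].append(path)
        else:
            result["modified"].append(path)

    added = sorted(new_files.keys() - old_files.keys())
    removed = sorted(old_files.keys() - new_files.keys())

    # Index removed files by sha1 so added files can be matched against them.
    removed_by_sha1 = defaultdict(list)
    for path in removed:
        removed_by_sha1[old_files[path]].append(path)

    for path in added:
        candidates = removed_by_sha1.get(new_files[path])
        if candidates:
            # Same content at a different path: record as moved (old path, new path).
            result["moved"].append((candidates.pop(0), path))
        else:
            result["added"].append(path)

    # Removed files that were not paired with an added file stay removed.
    for paths in removed_by_sha1.values():
        result["removed"].extend(paths)
    result["removed"].sort()
    return result


# Example input with made-up paths and sha1 placeholders.
new = {"src/a.py": "aaa", "src/b.py": "bbb2", "lib/c.py": "ccc", "src/e.py": "eee"}
old = {"src/a.py": "aaa", "src/b.py": "bbb", "src/c.py": "ccc", "src/d.py": "ddd"}
deltas = categorize(new, old)
```

In this example `src/c.py` and `lib/c.py` share a sha1, so they are reported as one moved file rather than as one removed and one added file.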
Delta Object Fields and Values
Each Delta object includes the following fields and values:
factors
: One or more strings representing the factors that characterize the file-level comparison and resulting score, e.g., in JSON format::

    "factors": [ "added", "license info added", "copyright info added" ],

score
: A number representing the magnitude/importance of the file-level change; the higher the score, the greater the change.
new
: The ScanCode-based file attributes (path, licenses, copyrights, etc.) for the file in the codebase designated by the user as new.
old
: The ScanCode-based file attributes for the file in the codebase designated by the user as old.
Note that an added Delta object will have a new file but no old file, while a removed Delta object will have an old file but not a new file. In each case, the new and old keys will be present but the value for the missing file will be null.
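To make the field layout concrete, the following sketch builds an added Delta as a Python dictionary and serializes it to JSON. The helper name `make_delta`, the score value, and the simplified file attributes are all assumptions for illustration. Real ScanCode output carries many more attributes per file, and the real score is computed by DeltaCode.

```python
import json


def make_delta(factors, score, new, old):
    """Assemble the four Delta fields described above (hypothetical helper)."""
    return {"factors": factors, "score": score, "new": new, "old": old}


# Simplified, made-up file attributes for a file that exists only in the
# "new" codebase. The score passed below is an illustrative placeholder,
# not a value DeltaCode would necessarily assign.
new_file = {
    "path": "src/util.py",
    "licenses": [{"key": "mit"}],
    "copyrights": [{"holders": ["Example Corp."]}],
}

added_delta = make_delta(
    factors=["added", "license info added", "copyright info added"],
    score=1,  # placeholder
    new=new_file,
    old=None,  # an added Delta has no old file
)

# Python's None is written as JSON null, matching the behaviour described above.
serialized = json.dumps(added_delta, indent=2)
print(serialized)
```

A removed Delta would mirror this shape, with `new` set to `None` and `old` holding the file attributes.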
License Additions and Changes
Certain file-level changes involving the license-related information in a Delta object will increase the object’s score.
- An added Delta object’s score will be increased:
  - If the new file contains one or more licenses (factors will include license info added).
  - If the new file contains any of the following Commercial/Copyleft license categories (factors will include, e.g., copyleft added):
    - ‘Commercial’
    - ‘Copyleft’
    - ‘Copyleft Limited’
    - ‘Free Restricted’
    - ‘Patent License’
    - ‘Proprietary Free’
- A modified Delta object’s score will be increased:
  - If the old file has at least one license and the new file has no licenses (factors will include license info removed).
  - If the old file has no licenses and the new file has at least one license (factors will include license info added).
  - If both the old file and new file have at least one license and the license keys are not identical (e.g., the old file includes an mit license and an apache-2.0 license and the new file includes only an mit license) (factors will include license change).
  - If any of the Commercial/Copyleft license categories listed above are found in the new file but not in the old file (factors will include, e.g., proprietary free added).
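The license rules above can be expressed as a small function that returns the license-related factors for a Delta. This is a sketch under stated assumptions, not DeltaCode's own code. The name `license_factors`, the `FLAGGED_CATEGORIES` constant, and the representation of each license as a dictionary with `key` and `category` entries are hypothetical. The category factor is formed by lower-casing the category name and appending "added", following the examples in the text.

```python
# The Commercial/Copyleft license categories listed above.
FLAGGED_CATEGORIES = {
    "Commercial", "Copyleft", "Copyleft Limited",
    "Free Restricted", "Patent License", "Proprietary Free",
}


def license_factors(category, new_licenses, old_licenses=()):
    """Return license-related factors for an 'added' or 'modified' Delta.

    Each license is assumed to be a dict with 'key' and 'category' entries.
    """
    factors = []
    new_keys = {lic["key"] for lic in new_licenses}
    old_keys = {lic["key"] for lic in old_licenses}
    new_cats = {lic["category"] for lic in new_licenses}
    old_cats = {lic["category"] for lic in old_licenses}

    if category == "added":
        if new_keys:
            factors.append("license info added")
        flagged = new_cats & FLAGGED_CATEGORIES
    elif category == "modified":
        if old_keys and not new_keys:
            factors.append("license info removed")
        elif new_keys and not old_keys:
            factors.append("license info added")
        elif new_keys != old_keys:
            # Both sides have licenses, but the sets of keys differ.
            factors.append("license change")
        # Only categories that are new to the file count as additions.
        flagged = (new_cats - old_cats) & FLAGGED_CATEGORIES
    else:
        return factors  # moved, removed and unmodified are not adjusted

    factors.extend(f"{cat.lower()} added" for cat in sorted(flagged))
    return factors


mit = {"key": "mit", "category": "Permissive"}
apache = {"key": "apache-2.0", "category": "Permissive"}
gpl = {"key": "gpl-2.0", "category": "Copyleft"}

added_result = license_factors("added", [gpl])
changed_result = license_factors("modified", [mit], [mit, apache])
```

Here `added_result` contains both the general license factor and the category factor, while `changed_result` contains only the license change factor because no flagged category was introduced.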
Copyright Holder Additions and Changes
- An added Delta object’s score will be increased if the new file contains one or more copyright holders (factors will include copyright info added).
- A modified Delta object’s score will be increased:
  - If the old file has at least one copyright holder and the new file has no copyright holders (factors will include copyright info removed).
  - If the old file has no copyright holders and the new file has at least one (factors will include copyright info added).
  - If both the old file and new file have at least one copyright holder and the holders are not identical (factors will include copyright change).
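The copyright holder rules follow the same pattern as the license rules and can be sketched in the same way. The function name `copyright_factors` and the representation of holders as plain strings are assumptions made for this example. DeltaCode's own comparison may normalize holder names differently.

```python
def copyright_factors(category, new_holders, old_holders=()):
    """Return copyright-related factors for an 'added' or 'modified' Delta.

    Holders are assumed to be plain strings and are compared as sets.
    """
    new_set, old_set = set(new_holders), set(old_holders)

    if category == "added":
        return ["copyright info added"] if new_set else []
    if category != "modified":
        return []  # moved, removed and unmodified are not adjusted

    if old_set and not new_set:
        return ["copyright info removed"]
    if new_set and not old_set:
        return ["copyright info added"]
    if new_set != old_set:
        # Both files name holders, but not the same ones.
        return ["copyright change"]
    return []


holder_change = copyright_factors(
    "modified",
    new_holders=["Example Corp.", "Jane Doe"],
    old_holders=["Example Corp."],
)
```

In this example both files name at least one holder and the sets differ, so `holder_change` holds the copyright change factor.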
Moved, Removed and Unmodified
As noted above in Basic Scoring, from a license/copyright compliance perspective, the three least significant Delta categories are moved, removed and unmodified.
In the current version of DeltaCode, each of these three categories is assigned a score of 0, with no options to increase that score depending on the content of the Delta object.
However, it is possible that both moved and removed will be assigned some non-zero score in a future version. In particular, removed could be significant from a compliance viewpoint where, for example, the removal of a file results in the removal of a Commercial/Copyleft license obligation.