Add SPDX analysis utilities #38

Open
tbird20d wants to merge 1 commit into sony:main from tbird20d:main
@tbird20d

Contributor

Add some helper utilities to get information on the status of SPDX-License-Identifier lines in source files reported in esstra data.

'esstra-full-paths' is a brief utility to convert directories and filenames in 'esstra show' output into full paths (used by has-spdx-id.py). This utility could possibly be replaced with a command line argument to the 'esstra' utility to show full paths. I started looking at that, but the output handling in the 'esstra' utility was a bit more complex than I was expecting. That still might be the better way to handle this issue.
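To illustrate the conversion described above, here is a minimal sketch in Python. It is not the code in this pull request. It assumes that 'esstra show' prints one `Directory:` entry followed by `File:` entries for the files in that directory; that layout, the regular expressions, and the function name `full_paths` are assumptions made for illustration.

```python
import posixpath
import re

# Assumed layout of 'esstra show' output: a "Directory:" entry followed by
# one "File:" entry per source file in that directory. This is a guess at
# the format, not a documented contract.
DIR_RE = re.compile(r"^\s*-?\s*Directory:\s*(.+?)\s*$")
FILE_RE = re.compile(r"^\s*-?\s*File:\s*(.+?)\s*$")


def full_paths(lines):
    """Yield directory/file joined into one path for each File entry."""
    current_dir = None
    for line in lines:
        match = DIR_RE.match(line)
        if match:
            # Remember the most recent directory for the files that follow.
            current_dir = match.group(1)
            continue
        match = FILE_RE.match(line)
        if match and current_dir is not None:
            yield posixpath.join(current_dir, match.group(1))


if __name__ == "__main__":
    import sys

    # Read 'esstra show' output on stdin, print one full path per line.
    for path in full_paths(sys.stdin):
        print(path)
```

Under those assumptions, the script would be used as a filter, reading 'esstra show' output on standard input and writing one full path per line.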

'has-spdx-id.py' is a utility to scan a file or list of files and report if they have SPDX-License-Identifier lines. It can also generate a count of files with and without the desired SPDX line.
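The check described above can be sketched roughly as follows. This is an illustration of the idea, not the implementation of has-spdx-id.py. The names `has_spdx_id` and `count_spdx`, the `max_lines` limit, and the choice to count unreadable files as lacking the tag are all assumptions.

```python
import re
from itertools import islice

# Matches the tag followed by a non-empty license expression.
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*\S")


def has_spdx_id(path, max_lines=20):
    """Return True if the tag appears in the first max_lines lines.

    The max_lines limit is an assumption; the tag normally sits near
    the top of a file, so scanning the whole file is rarely needed.
    """
    try:
        with open(path, "r", encoding="utf-8", errors="replace") as handle:
            return any(SPDX_RE.search(line) for line in islice(handle, max_lines))
    except OSError:
        # Assumption: a file that cannot be read counts as "without".
        return False


def count_spdx(paths):
    """Return (files with the tag, files without the tag)."""
    with_id = 0
    without_id = 0
    for path in paths:
        if has_spdx_id(path):
            with_id += 1
        else:
            without_id += 1
    return with_id, without_id
```

Calling `count_spdx` on a list of full paths returns the two totals that a summary report would print.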

'esstra-to-spdx-list.sh' is a program to take esstra data (as reported by 'esstra show', or as saved into a standalone file), and pipe it through standard Linux utilities and has-spdx-id.py to generate a report on the status of source files (whether they have SPDX-License-Identifier lines or not).

These tools were used on the Linux kernel, using an Ubuntu 24.04 x86_64 kernel configuration for my Dell desktop machine. The preliminary results for just the Linux kernel binary file (vmlinux at the top level directory) were:
Files with SPDX lines: 6252
Files without SPDX lines: 577
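
As a rough reading of the two counts above, the share of files carrying an SPDX line can be computed directly. The variable names below are illustrative only; the numbers are the ones reported above.

```python
# Counts reported above for the vmlinux build.
with_spdx = 6252
without_spdx = 577

total = with_spdx + without_spdx
coverage = 100.0 * with_spdx / total

print(f"{total} files total, {coverage:.1f}% with SPDX lines")
# prints "6829 files total, 91.6% with SPDX lines"
```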

Note that this omitted other files that would normally be associated with a full kernel build, such as all kernel module sources, and the entry and decompressor code used in a compressed kernel image (such as bzImage).

@tbird20d

Contributor Author

I noticed a few typos in the code comments and the Usage text as I was preparing this pull request. These can be fixed with a follow-up patch if this is accepted. Note that it would be cleaner if 'esstra show' supported a '-f' option to show full paths instead of separate directories and files. That would eliminate the need for the 'esstra-full-paths' tool.

Add some helper utilities to get information on the status of
SPDX-License-Identifier lines in source files reported in esstra data.

'esstra-full-paths' is a brief utility to convert directories and
filenames in 'esstra show' output into full paths (used by has-spdx-id.py).
This utility could possibly be replaced with a command line argument
to the 'esstra' utility to show full paths.  I started looking at that
but the output handling in the 'esstra' utility was a bit more
complex than I was expecting.  This still might be the better way to
handle this issue.

'has-spdx-id.py' is a utility to scan a file or list of files and report
if they have SPDX-License-Identifier lines.  It can also generate a
count of files with and without the desired SPDX line.

'esstra-to-spdx-list.sh' is a program to take esstra data (as reported
by 'esstra show', or as saved into a standalone file), and pipe it
through standard Linux utilities and has-spdx-id.py to generate a report
on the status of source files (whether they have SPDX-License-Identifier
lines or not).

These tools were used on the Linux kernel, using an Ubuntu 24.04 x86_64 kernel
configuration for my Dell desktop machine.  The preliminary results for
just the Linux kernel binary file (vmlinux at the top level directory)
were:
  Files with SPDX lines: 6252
  Files without SPDX lines: 577

Note that this omitted other files that would normally be associated
with a full kernel build, such as all kernel module sources, and
the entry and decompressor code used in a compressed kernel image
(such as bzImage).

Signed-off-by: Tim Bird <tim.bird@sony.com>
@tbird20d force-pushed the main branch 2 times, most recently from 07e88ca to cb0274d on November 14, 2025 19:53
@NamaeTakuya removed their assignment on Nov 17, 2025
@SadakazuNagao

Collaborator

@tbird20d
Thank you so much for your helpful suggestions.

We agree with you that adding an option to esstra show to display full paths would be a better solution. We’ll look into this.

As for the tools that check for SPDX-License-Identifier lines, we see them as a way to help achieve the goal of identifying the license for each file.
Please give us a little time to think about how we can best integrate these tools into ESSTRA.
