This document provides a detailed, step-by-step example of using the scripts in the osmine and osdash Python modules, which are part of the OPEN-NEXT project's WP2.2 month-18 deliverable. It is an annex to the main documentation, which should be read first. The walkthrough covers identifying an open source hardware project's version control repository; adding the repository's information to an input file for the data-mining script; running the data-mining script; and producing a simple demonstration web dashboard to view the mined data. Along the way, it gives an overview of the key file types and locations used in these steps. The month-18 deliverable is intended for online version control platforms such as Wikifactory or GitHub, to help them add analytics and visualisation capabilities, rather than for general "end users".
This guide assumes working knowledge of GNU/Linux server administration, version control with Git, and running Python scripts. To follow along, you will need user-level access to such a server with at least Git (version 2.7 or later) and Python (version 3.8 or later) installed. You should also have working knowledge of command-line operations in a terminal. Additionally, you will need a GitHub account in good standing and a GitHub personal access token to work through the example below.
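If you are unsure whether a server meets these version requirements, the Python sketch below checks them. It is not part of the wp2.2_dev repository. The function names are hypothetical. It uses only the standard library and assumes that git, if installed, is on the PATH. On interpreters much older than 3.8 the script may fail before it reaches the version check, which is itself a sign that an upgrade is needed.

```python
# Illustrative sketch, not part of osmine/osdash: check the Python and Git versions this guide requires.
import re
import shutil
import subprocess
import sys

# Minimum versions stated in this guide.
MIN_PYTHON = (3, 8)
MIN_GIT = (2, 7)


def parse_git_version(output: str) -> tuple:
    """Return (major, minor) from output such as 'git version 2.34.1'."""
    match = re.search(r"git version (\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"Unrecognised git version string: {output!r}")
    return int(match.group(1)), int(match.group(2))


def check_prerequisites() -> list:
    """Return a list of problems found; an empty list means both requirements are met."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]} or later required, found {found}")
    if shutil.which("git") is None:
        problems.append("git was not found on the PATH")
    else:
        result = subprocess.run(["git", "--version"], capture_output=True, text=True, check=True)
        if parse_git_version(result.stdout) < MIN_GIT:
            problems.append(f"Git {MIN_GIT[0]}.{MIN_GIT[1]} or later required, found: {result.stdout.strip()}")
    return problems


if __name__ == "__main__":
    found_problems = check_prerequisites()
    print("\n".join(found_problems) if found_problems else "Python and Git versions look sufficient.")
```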
The core of the month-18 deliverable is a set of Python scripts that mine metadata from open source hardware version control repositories. The example repository used here is the GitHub repository of the Open Source Rover project, developed by the Jet Propulsion Laboratory (JPL) of the United States National Aeronautics and Space Administration (NASA). This rover is functionally similar to the ones that have been deployed on Mars. Using the Open Source Rover project as an example, this section notes some key characteristics of a GitHub-hosted repository that will be revisited in later sections.
-
Visit the homepage of the JPL Open Source Rover project in your web browser. Notice that this page presents information about the project, including interactive elements, but it is not where the actual design files of the rover are stored and managed. Instead, open source hardware (and software) projects often manage their source files (such as computer-aided design (CAD) files or code) using version control software called Git. The repositories containing these version-controlled files are then hosted on online platforms such as, but not limited to, GitHub.
Figure 1. Screenshot of the JPL Open Source Rover's home page which does not contain its actual design files.
-
The Git repository for the Open Source Rover is hosted on GitHub here:
https://github.com/nasa-jpl/open-source-rover
Open this link in your web browser and you should see the main page of the repository, which resembles the following:
Figure 2. Screenshot of the JPL Open Source Rover GitHub repository's main page.
-
A key advantage of using Git for version control is the ease with which one can record and revisit the history of changes made to files in a repository. Each recorded change is called a "commit". On the GitHub page opened in the previous step, click on the icon resembling a clock with a counterclockwise arrow to see the complete history of commits:
Figure 3. Link on GitHub repository page (circled in red) to view complete commit history.
-
You should now see a long list of commits made to the Open Source Rover repository, organised by date. As an example, scroll down and find commits made on "Mar 25, 2021". There should be only a single entry for this day that looks like this:
Figure 4. A typical commit - made on 25 March 2021 - in a GitHub repository commit history.
The title of this commit is "Merge pull request #240 from SConaway/pcb-assembly". Click on the title to reveal details of this commit. To save time scrolling through the repository history, here is a direct link to its details page:
https://github.com/nasa-jpl/open-source-rover/commit/ccfa9c138bc045937dc7db846930d613cf7469a9
-
This page provides detailed information on which changes were made to which files as part of the commit. For the purposes of this walkthrough, make note of (1) the user "apollokit", who made the commit; (2) the date, 25 March 2021; and (3) the commit hash "ccfa9c138bc045937dc7db846930d613cf7469a9" (a 40-character hexadecimal string that uniquely identifies this commit).
Figure 5. GitHub details page for a typical commit.
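The same commit details can also be retrieved from GitHub's REST API, which is roughly the kind of query a data-mining script makes. The sketch below is illustrative only and is not how osmine is implemented. It assumes the public endpoint GET /repos/{owner}/{repo}/commits/{sha} and uses only the standard library. The helper names fetch_commit and commit_summary are hypothetical.

```python
# Illustrative sketch, not osmine's implementation: query one commit via the GitHub REST API.
import json
import urllib.request
from typing import Optional

API_ROOT = "https://api.github.com"


def fetch_commit(owner: str, repo: str, sha: str, token: Optional[str] = None) -> dict:
    """Request GET /repos/{owner}/{repo}/commits/{sha} (requires network access)."""
    request = urllib.request.Request(
        f"{API_ROOT}/repos/{owner}/{repo}/commits/{sha}",
        headers={"Accept": "application/vnd.github+json"},
    )
    if token:
        # A personal access token raises the API rate limit; it is optional for public repositories.
        request.add_header("Authorization", f"token {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def commit_summary(payload: dict) -> dict:
    """Pick out the fields noted in this walkthrough from a commit API response."""
    # "author" is null when the commit's email address is not linked to a GitHub account.
    account = payload.get("author") or {}
    message_lines = payload["commit"]["message"].splitlines() or [""]
    return {
        "sha": payload["sha"],
        "login": account.get("login"),
        "date": payload["commit"]["author"]["date"],  # ISO 8601 timestamp string
        "title": message_lines[0],
    }
```

With network access, commit_summary(fetch_commit("nasa-jpl", "open-source-rover", "ccfa9c138bc045937dc7db846930d613cf7469a9")) should correspond to the details noted above. Unauthenticated requests are subject to GitHub's lower rate limit.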
-
So far, we have seen features built into the Git version control software. GitHub provides additional features on top of Git, such as the ability for participants in an open source project to open and close tickets discussing their work. On GitHub these are called "Issues", and they can be reached by clicking the link of the same name at the top of the page. Click on it:
Figure 6. Link to GitHub Issues (actual number of currently open issues, 30 in this example, may vary).
-
The GitHub Issues page shows topics currently under discussion by participants. At the time of writing, there are 30 open and 170 closed Issues for the Open Source Rover repository (actual numbers may vary by the time you see them):
Figure 7. GitHub Issues page (30 open and 170 closed Issues).
-
As an example, scroll down the page to find Issue #196, titled "DROK current+voltage sensor replacement", which was "opened on Jul 2, 2020 by Achllle". Click on the title to see the discussion thread. If it is not visible by the time you read this, use the following link:
https://github.com/nasa-jpl/open-source-rover/issues/196
You should now see the conversation:
Figure 8. Typical page of a GitHub Issue.
Again, note for later reference that Issue #196 was opened by user "Achllle" on 2 July 2020, is "Open" (at the time of writing), and has "5 participants" in the discussion. You can see the list of participants near the bottom of the right sidebar, or by noting each username that appears in the thread.
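Issue details can be retrieved in a similar way. The sketch below assumes the REST endpoints GET /repos/{owner}/{repo}/issues/{number} and GET /repos/{owner}/{repo}/issues/{number}/comments. It approximates the participant list as the Issue author plus everyone who commented, which may differ slightly from GitHub's own "participants" count. All function names are hypothetical and not part of osmine.

```python
# Illustrative sketch, not osmine's implementation: summarise an Issue via the GitHub REST API.
import json
import urllib.request
from typing import Optional

API_ROOT = "https://api.github.com"


def get_json(url: str, token: Optional[str] = None):
    """Fetch a GitHub REST API URL and decode its JSON body (requires network access)."""
    request = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    if token:
        request.add_header("Authorization", f"token {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def issue_summary(issue: dict, comments: list) -> dict:
    """Who opened the Issue, when, its state, and an approximate list of participants."""
    participants = [issue["user"]["login"]]
    for comment in comments:
        login = (comment.get("user") or {}).get("login")
        if login and login not in participants:
            participants.append(login)
    return {
        "title": issue["title"],
        "opened_by": issue["user"]["login"],
        "created_at": issue["created_at"],
        "state": issue["state"],
        "participants": participants,
    }


def fetch_issue_summary(owner: str, repo: str, number: int, token: Optional[str] = None) -> dict:
    base = f"{API_ROOT}/repos/{owner}/{repo}/issues/{number}"
    issue = get_json(base, token)
    # Only the first 100 comments are fetched; longer threads would need pagination.
    comments = get_json(f"{base}/comments?per_page=100", token)
    return issue_summary(issue, comments)
```

For example, fetch_issue_summary("nasa-jpl", "open-source-rover", 196) should report Achllle as the opener, assuming network access.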
The previous steps illustrate metadata commonly associated with a version control repository, including commits, GitHub Issues, and the authorship and timestamps of those activities. To mine this metadata programmatically with the Python scripts in the month-18 deliverable, we need the URL (uniform resource locator, often colloquially called an "address") of the repository:
https://github.com/nasa-jpl/open-source-rover/
With this information, we can prepare the input data for the data-mining module osmine.
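A GitHub repository URL encodes two pieces of information that API-based tools typically need: the owner (here nasa-jpl) and the repository name (here open-source-rover). The hypothetical helper below is not part of osmine. It shows one way to extract both parts using only the standard library.

```python
# Illustrative sketch, not part of osmine: split a GitHub repository URL into owner and name.
from urllib.parse import urlparse


def split_repo_url(url: str) -> tuple:
    """Return (owner, repository name) for a URL like https://github.com/<owner>/<repo>/."""
    parsed = urlparse(url.strip())
    if parsed.netloc.lower() not in ("github.com", "www.github.com"):
        raise ValueError(f"Not a GitHub URL: {url}")
    parts = [part for part in parsed.path.split("/") if part]
    if len(parts) < 2:
        raise ValueError(f"Expected https://github.com/<owner>/<repo>, got: {url}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):  # clone URLs end in .git
        repo = repo[: -len(".git")]
    return owner, repo


print(split_repo_url("https://github.com/nasa-jpl/open-source-rover/"))  # ('nasa-jpl', 'open-source-rover')
```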
-
The data-mining scripts accept a single input file containing a list of version control repository URLs. In this walkthrough, we will add one URL to this list, that of the JPL Open Source Rover project:
https://github.com/nasa-jpl/open-source-rover/
-
The list should be saved in CSV (comma-separated values) format. To create it, open a new spreadsheet in your spreadsheet software. In the first row (usually labelled row 1), enter the following column names from left to right, exactly as written, because they are case-sensitive: project, repo_url, repo_platform, and notes. In the row below that, enter JPL Open Source Rover, https://github.com/nasa-jpl/open-source-rover/, and GitHub in the first three columns. The last column, notes, can be left empty; it is for any other information you want to keep for your own reference. The repo_url and repo_platform columns are particularly important, because the data-mining scripts rely on them. Once you are done, the contents of your spreadsheet should look like the following:
Figure 9. Entering a list of repositories to mine into a spreadsheet to be saved as a CSV file.
-
This spreadsheet must be saved in the CSV format. To do so, save what you have as a new file named OSH-repos.csv. Be sure to choose CSV (comma-separated values) as the format/file type, and check that the file extension is .csv rather than a common spreadsheet extension such as .ods or .xlsx.
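If you prefer not to use spreadsheet software, Python's standard csv module can write the same file. This minimal sketch assumes only the four column names given above. It writes OSH-repos.csv to the current directory.

```python
# Illustrative sketch: write the osmine input list without a spreadsheet application.
import csv

# Column names expected by the data-mining scripts (case-sensitive), as described above.
FIELDNAMES = ["project", "repo_url", "repo_platform", "notes"]

rows = [
    {
        "project": "JPL Open Source Rover",
        "repo_url": "https://github.com/nasa-jpl/open-source-rover/",
        "repo_platform": "GitHub",
        "notes": "",  # optional, for your own reference
    },
]

# newline="" lets the csv module control line endings, as its documentation recommends.
with open("OSH-repos.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
```

To mine more repositories, add further dictionaries to rows, one per repository.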
-
In this example, we will run a local copy of the data-mining scripts (and in later steps, the demo dashboard) on your system. To begin, open a terminal window and navigate to the location where you would like to place these files.
-
Clone the GitHub repository containing the scripts onto your system by entering the following command:
git clone https://github.com/OPEN-NEXT/wp2.2_dev.git
-
Observe the contents of the wp2.2_dev directory that was just created and populated. You can use the ls command in your terminal, or open that directory in your graphical file manager, in which case you should see something like this:
Figure 10. Contents of the wp2.2_dev Git repository containing the data-mining scripts.
-
Place the CSV list containing the URL of the JPL Open Source Rover repository (the file you named OSH-repos.csv) into the input directory. Overwrite or remove any files already there.
-
The data directory contains sample output from the data-mining scripts. For the purposes of this walkthrough, remove everything in the data directory (there should be only one file). This way, in later steps we can examine the output from running the scripts against the JPL Open Source Rover repository.
-
Since the JPL Open Source Rover repository is hosted on GitHub and will be queried by the data-mining scripts, we need a GitHub personal access token. This token can be generated from your GitHub account. It is an alphanumeric string of 40 characters that resembles the following:
GNcAm5YWtUu66dq88LKS8R8D2Ck2UzikABSNqZyr
Instructions for generating this token can be found in the GitHub documentation.
-
Copy and paste that 40-character string into a plain text file and save it within the wp2.2_dev repository with the name token (no file extension needed). The file should contain exactly those 40 characters and nothing else. If successful, you should see this new file along with everything else:
Figure 11. GitHub personal access token saved in a file named token among other files in the wp2.2_dev repository.
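A stray space or newline in the token file is an easy mistake to make, because many editors add a final newline automatically. You may therefore want to check the file before running the scripts. The hypothetical helper below is not part of wp2.2_dev. It checks for exactly 40 characters, and it accepts underscores as well as letters and digits because newer GitHub tokens begin with a prefix such as ghp_. This walkthrough does not say whether osmine tolerates a trailing newline, so the check flags one.

```python
# Illustrative sketch, not part of wp2.2_dev: sanity-check the "token" file before running osmine.
from pathlib import Path

TOKEN_LENGTH = 40  # length of the GitHub personal access token described in this guide


def check_token_file(path: str = "token") -> list:
    """Return a list of problems with the token file; an empty list means it looks right."""
    token_file = Path(path)
    if not token_file.is_file():
        return [f"{path} does not exist or is not a file"]
    content = token_file.read_text(encoding="utf-8")
    problems = []
    if content != content.strip():
        problems.append("file has leading or trailing whitespace (for example a final newline)")
    token = content.strip()
    if len(token) != TOKEN_LENGTH:
        problems.append(f"expected {TOKEN_LENGTH} characters, found {len(token)}")
    # ASCII letters and digits, plus "_" for newer tokens that start with a prefix such as ghp_.
    if not all(ch.isascii() and (ch.isalnum() or ch == "_") for ch in token):
        problems.append("token contains characters other than ASCII letters, digits, or _")
    return problems


if __name__ == "__main__":
    print(check_token_file("token") or "token file looks fine")
```

Run it from the wp2.2_dev directory. It prints any problems it finds, or a short confirmation if there are none.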
You will also need to install the library dependencies of the data-mining scripts. They are listed in the file requirements.txt, which follows the standard pip requirements format. To install them, run this command in your terminal window from within the wp2.2_dev directory:
pip install --user -r requirements.txt
-
Now we can ask the Python module osmine (which contains the data-mining scripts) to retrieve metadata from the JPL Open Source Rover project's GitHub repository. Start by opening a terminal window in the wp2.2_dev directory.
-
Then run the osmine module to start the data-mining scripts with the following command:
python osmine -t=./token
Note: The -t argument passed to the osmine module tells it to use your GitHub personal access token stored in the file token.
-
If successful, the first lines of terminal output after executing the command should resemble the following (details may vary):
Figure 12. First lines of output from the data-mining module osmine.
-
Please wait. It may take several minutes or more (depending on the speed of your system's Internet connection) for the script to mine data from the JPL Open Source Rover repository hosted on GitHub. In the meantime, there should be additional output in the terminal that looks like the following (again, details will vary):
Figure 13. Typical terminal output when running osmine.
-
Once complete, the data-mining scripts within the osmine module will save the metadata of the JPL Open Source Rover repository into a file named mined_data.zip in the data subdirectory of wp2.2_dev. If you navigate to this directory in your graphical file manager, you should see it:
Figure 14. mined_data.zip in the data directory contains version control repository metadata mined by the osmine module.
-
To view the mined metadata from the JPL Open Source Rover repository, start by extracting the contents of the compressed file mined_data.zip. Usually, double-clicking on mined_data.zip in your graphical file manager reveals its contents. There should be exactly one file inside, named mined_data.json. Extract this file by dragging and dropping it to a location convenient for you.
Figure 15. mined_data.zip is a compressed archive containing a file, mined_data.json, which contains mined data from version control repositories.
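The extraction can also be scripted with Python's standard zipfile module. The sketch below assumes the archive is at data/mined_data.zip relative to the wp2.2_dev directory, as described above, and that it contains a member named mined_data.json. The name extract_mined_json is a hypothetical helper, not part of osmine.

```python
# Illustrative sketch, not part of osmine: extract mined_data.json from the output archive.
import zipfile
from pathlib import Path


def extract_mined_json(archive: str = "data/mined_data.zip", dest: str = ".") -> Path:
    """Extract mined_data.json from the osmine output archive and return the extracted path."""
    with zipfile.ZipFile(archive) as zf:
        member = next((name for name in zf.namelist() if Path(name).name == "mined_data.json"), None)
        if member is None:
            raise FileNotFoundError(f"mined_data.json not found in {archive}")
        # extract() recreates any folder structure stored in the archive under dest.
        return Path(zf.extract(member, path=dest))
```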
The file mined_data.json is a plain-text file with data organised in JavaScript Object Notation (JSON), a widely used international open standard for storing structured data. Because it is plain text, you can open it in a text editor of your choice to view its contents. Once it is open, the beginning of the file should resemble the following:
Figure 16. Beginning of mined_data.json opened in a text editor, showing basic information about the JPL Open Source Rover GitHub repository.
Take a moment to observe the contents of this JSON file. The first 15 or so lines contain basic metadata about the whole JPL Open Source Rover GitHub repository. Examples include the date and time it was first published (the published field, 29 June 2018), the license for the design files (the license field, Apache 2.0), and the repo_url, which you visited in your web browser at the beginning of this guide.
-
In previous steps, we visited the page of commit ccfa9c138bc045937dc7db846930d613cf7469a9, made by user apollokit on 25 March 2021. In your text editor, with mined_data.json open, search for the commit hash (the first five or so characters, ccfa9, should be enough). This should take you to the mined metadata for this commit, where you can confirm that it matches what you saw in your web browser:
Figure 17. Mined metadata for commit ccfa9c138bc045937dc7db846930d613cf7469a9 in the JPL Open Source Rover repository, highlighted in red.
-
We also visited GitHub Issue #196 in a web browser, and metadata about this Issue is also in mined_data.json. A quick way to find it is to search for the Issue title "DROK current+voltage sensor replacement" in your text editor, which should bring you to its entry as follows:
Figure 18. Mined metadata for GitHub Issue #196 in the JPL Open Source Rover repository, highlighted in red.
Comparing this with what was displayed on the GitHub page for Issue #196, we can confirm the following: this Issue's published date is 2 July 2020, it has five participants, and its attributedTo field is the user Achllle, who first opened it.
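Both searches can also be done programmatically. This walkthrough does not describe the exact structure of mined_data.json, so the sketch below makes no assumptions about it. It walks the whole JSON document and reports the location of every string value containing a search term, as a path like $.some_key[3].other_key. Only values are searched, not keys. The function names are hypothetical and not part of osmine.

```python
# Illustrative sketch, not part of osmine: schema-agnostic search through a JSON document.
import json
from typing import Any, Iterator, List, Tuple


def find_values(node: Any, needle: str, path: str = "$") -> Iterator[Tuple[str, str]]:
    """Yield (path, value) for every string in a JSON structure that contains needle."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_values(value, needle, f"{path}.{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_values(item, needle, f"{path}[{index}]")
    elif isinstance(node, str) and needle in node:
        yield path, node


def search_file(json_path: str, needle: str) -> List[Tuple[str, str]]:
    """Load a JSON file and return every matching (path, value) pair."""
    with open(json_path, encoding="utf-8") as f:
        return list(find_values(json.load(f), needle))
```

For example, search_file("mined_data.json", "ccfa9") and search_file("mined_data.json", "DROK current+voltage sensor replacement") should point to the entries shown in Figures 17 and 18. Fields such as published, participants, and attributedTo would then appear in the object enclosing each match.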
The data-mining scripts described and demonstrated in the previous sections represent the primary functionality of our OPEN-NEXT month-18 deliverable. This functionality provides the foundation on which an online dashboard could later be built and hosted by version control platforms such as Wikifactory or GitHub to present to their users. To illustrate this potential, we have created a basic, proof-of-concept demonstration in a module named osdash that summarises and displays some of the metadata obtained via the data-mining scripts.
-
With the terminal still open at the base of the wp2.2_dev directory, enter the following command to start the osdash module:
python osdash
This starts a local server using the metadata mined in previous steps and saved in the file data/mined_data.zip. Initialisation may take several seconds or more, after which you should see output like the following in the terminal:
Figure 19. Typical terminal output upon successful start of the test server of the proof-of-concept osdash dashboard.
Make note of the line that reads "Dash is running on http://127.0.0.1:21110/". This is the URL at which the demo dashboard can be accessed.
-
In a web browser on the same system where you ran the python osdash command, navigate to the URL http://127.0.0.1:21110. You should then see the main page of the demo dashboard:
Figure 20. Typical screenshot of the proof-of-concept dashboard module osdash opened in a web browser.
Take a moment to observe the different elements of this page. You should be able to see that this dashboard shows information derived from the metadata mined from the JPL Open Source Rover GitHub repository. At the time the above screenshot was taken, there had been 517 commits to this repository from 88 contributors. There were also 200 open and 170 closed "tickets". The more general term "ticket" is used here because the data-mining scripts in the osmine module are designed to pull metadata from repositories hosted on platforms other than GitHub (such as Wikifactory), and not all of those platforms call these discussions "Issues". In addition, the ForgeFed version control data model uses the term "ticket".
Below that is a graph titled "Commits over time". When this page is first loaded, it shows a bar chart of the monthly number of commits to this repository across its lifetime, from when it was first created to now.
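A chart like "Commits over time" reduces to grouping commit timestamps by calendar month. The sketch below illustrates that aggregation. It is not osdash's implementation, and it assumes the commit timestamps are available as ISO 8601 strings, the format the GitHub API uses. Months without commits are absent from the result, so a plotting step would need to fill them with zeros.

```python
# Illustrative sketch, not osdash's code: monthly commit counts for a "Commits over time" chart.
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable


def commits_per_month(timestamps: Iterable[str]) -> Dict[str, int]:
    """Count commits per calendar month (YYYY-MM) from ISO 8601 timestamp strings."""
    counts = Counter()
    for ts in timestamps:
        # Python 3.8's datetime.fromisoformat() rejects a trailing "Z", so rewrite it as +00:00.
        moment = datetime.fromisoformat(ts.replace("Z", "+00:00"))
        counts[moment.strftime("%Y-%m")] += 1
    # Sorting the "YYYY-MM" keys as strings also sorts them chronologically.
    return dict(sorted(counts.items()))
```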
The bottom-most section, titled "User activities", lists usernames with their number of commits and the number of tickets they participated in. Observe that the users apollokit and Achllle, noted in earlier steps, are among the most prolific contributors to this repository.
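Similarly, a table like "User activities" can be derived by counting, for each username, the commits they authored and the tickets they took part in. This is an illustrative sketch rather than osdash's code. It assumes the mined data has already been reduced to one username per commit and one list of participant usernames per ticket.

```python
# Illustrative sketch, not osdash's code: per-user commit and ticket counts.
from collections import Counter
from typing import Iterable, List, Tuple


def user_activity(
    commit_authors: Iterable[str], ticket_participants: Iterable[Iterable[str]]
) -> List[Tuple[str, int, int]]:
    """Return (username, commits, tickets) rows, most active committers first."""
    commits = Counter(commit_authors)
    tickets = Counter()
    for participants in ticket_participants:
        # Count each ticket at most once per user, even if they posted several times.
        tickets.update(set(participants))
    users = set(commits) | set(tickets)
    rows = [(user, commits[user], tickets[user]) for user in users]
    # Sort by commits, then tickets (both descending), then username for a stable order.
    return sorted(rows, key=lambda row: (-row[1], -row[2], row[0]))
```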
On the top left, notice a shaded box named "Choose repository". If the osmine module is used to mine metadata from multiple open source projects across repositories hosted on different platforms, those repositories will be selectable via the two menus "1. Select a project" and "2. Select a repository from...". In the current example, we have only mined data from the JPL Open Source Rover repository, so each menu has only one option.
There is a third item named "3. Customise timeframe to view". Here, you can use your mouse to drag two sliders to constrain the timeframe of the information shown. Try dragging both and observe how the page changes.
Figure 21. Time slider to filter shown data by time.
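The time slider effectively keeps only the events whose timestamps lie between two chosen dates. The sketch below shows that filter. The event dictionaries and their timestamp key are hypothetical stand-ins for the mined records, not osdash's data structures. Every timestamp, including the two bounds, must carry a time zone (such as a trailing Z) so that the values can be compared.

```python
# Illustrative sketch, not osdash's code: filter events to a chosen timeframe.
from datetime import datetime
from typing import Dict, List


def parse_timestamp(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp that includes a time zone (for example a trailing Z)."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def within_timeframe(events: List[Dict], start: str, end: str) -> List[Dict]:
    """Keep events whose hypothetical 'timestamp' value lies between start and end, inclusive."""
    lower, upper = parse_timestamp(start), parse_timestamp(end)
    return [event for event in events if lower <= parse_timestamp(event["timestamp"]) <= upper]
```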
-
As you filter the data shown using the time slider, here are some questions to consider: Does the rate of commits to this repository change over time? Are there certain users who consistently contribute to this project? Are there users who contributed only during a certain timeframe? What other questions could be asked of the mined data?
-
Once you are done, you can stop the test server by pressing Ctrl-C in the terminal window.
The walkthrough above provided detailed, step-by-step instructions on how to use the osmine Python module. This module is the primary component of the OPEN-NEXT project's WP2.2 month-18 deliverable: the ability to mine metadata from open source hardware version control repositories. This functionality forms the basis for post-month-18 development of an online dashboard. That dashboard would display information derived from this metadata to inform open source hardware projects about the health of their communities. The demo dashboard osdash currently illustrates this potential. All of this is designed to be hosted by platform providers such as Wikifactory.
This document is shared under the Creative Commons Attribution-ShareAlike 4.0 license.
SPDX-FileCopyrightText: 2021 Pen-Yuan Hsing
SPDX-License-Identifier: CC-BY-SA-4.0




















