User Guide¶
Contributor CI was created with the intention to help measure collaboration in one or more repositories or an organization over time. If you haven’t read Installation you should do that first.
Quick Start¶
# install contributor ci
$ pip install contributor-ci
# Export a GitHub Personal Access Token
$ export GITHUB_TOKEN=xxxxxxxxxxxxxx
# Generate a contributor-ci.yaml (recommended in an empty directory)
$ mkdir -p cci-vsoch
$ cci init vsoch
# See metric extractors available
$ cci list
# Run an extraction
$ cci extract repos
# Generate a contributor friendliness assessment template
$ cci cfa https://github.com/vsoch/salad
Configuration¶
Since the majority of the command line interactions should be automated, we
use a main configuration file called contributor-ci.yaml
, that is
looked for in the present working directory where you call the client.
A simple example is shown here:
# The output directory to create a structure of results
# defaults to $PWD/.cci if not set
outdir: null
# memberOrgs will be used to label associated members as "internal"
member_orgs:
- llnl
# all repos in these orgs are considered in your institution
orgs:
- cdat
- llnl
- mfem
# Additional repos to add to the set (possibly not under an org above)
repos:
- alpine-dav/ascent
- atomconsortium/ampl
- ceed/benchmarks
# do not include these repos in the assessment
exclude_repos:
- mfem/github-actions
This file should have the following fields:
Name | Description | Default | Required |
---|---|---|---|
member_orgs | A list of GitHub organizations that are core to your institution. If you are concerned about contibutions, everyone that is a member here is labeled as an internal contributor. | unset | yes |
orgs | A list of repos your organization members contribute to, but aren’t necessarily owned by your institution. | unset | yes |
repos | A list of loose repos to add to the ones that are discovered under the orgs already provided. | unset | false |
exclude_repos | One or more repos to exclude given that they are found anywhere. | unset | false |
outdir | An output directory (must exist) to save results. | $PWD/.cci | true |
editor | An editor to use for cci config edit | vim | false |
You can make this file manually if you like (e.g., copy paste and edit
the above) or you can use cci init <name>
to initialize one, either for
an organization or your username. To generate it for your username, you
can run init as follows:
$ export GITHUB_TOKEN=xxxxxxxxx
$ cci init user:vsoch
And for an organization of interest:
$ export GITHUB_TOKEN=xxxxxxxxx
$ cci init org:spack
The place where you run this init depends on your use case. If you intent to create a CCI interface, you might want to run this init in an empty directory. If you just want to run cci extract commands to generate data, it can be in an existing repository.
$ mkdir -p cci-vsoch
$ cd cci-vsoch
$ cci init user:vsoch
After having your file, if you want to create an interface that visualizes data and contributor friendliness assessments (CFAs), you should see User Interface (UI). For basic extraction or generation of CFAs, you should see Contributor Friendliness Assessment.
Commands¶
Once you have your configuration file, and exported a GitHub personal access token:
# Export a GitHub Personal Access Token
$ export GITHUB_TOKEN=xxxxxxxxxxxxxx
…the following commands are available! For any command, you can specify a custom configuration file or output directory:
$ cci --config-file <config-file> --out-dir <out-dir> <command> <args>
User Interface (UI)¶
The most straight forward thing you might want is a user interface to explore your repositories. Once you have generated your contributor-ci.yaml in an otherwise empty directory (and it’s recommended to edit it after generation to make sure it looks okay) creating an interface is as simple as running:
$ cci ui generate
By default, generation happens in the directory where you run the command. To change this:
$ cci ui generate ./docs
Whether you are generating or updating, if you want to also generate new files for the contributor friendliness assessment, you can add –cfa:
$ cci ui --cfa generate
Once you have an interface, it has a GitHub action that will run an update command on a nightly basis to generate new data for it. But you can also run this locally or manually. It takes the same argument for a directory or defaults to the present working directory.
$ cci ui update
And akin to generate, you can specify to generate new CFA files (default will not):
$ cci ui --cfa update
For update and the GitHub action, the default will be to run all extractors. However for large projects you might instead want to choose a random subset:
$ cci ui update random:4
Once you have your interface, it’s recommended to update your GITHUB_TOKEN
to a personal access token CCI_GITHUB_TOKEN
that will work for all extractors.
You are of course free to customize the interface to your pleasing. For example,
you will likely want to change the site baseurl (the name of your repository where
you will serve it), site metadata, and the sidebar highlight color:
name: "Contributor CI Software Portal"
author: "Contributor CI <vsoch@users.noreply.github.com>"
title: Contributor CI Software Portal
description: "Contributor CI Software Portal"
# Change this to your baseurl
baseurl: "/contributor-ci"
url: ""
# Change this to the color you want the sidebar to highlight to
# it defaults to a bright green to match contributor-ci
sidebar_highlight_color: "#00d100"
For content, the main page is located in pages/index.md
, and you can delete any graph that you
don’t want to show up by deleting the corresponding file from _graphs
.
If there is a graph that does not exist that you’d like, or another site
feature (e.g., posts or other content type) please open an issue.
Finally, you’ll want to push your interface to GitHub and ensure that
GitHub pages is turned on for the root or subfolder where you have your
site.
Visualizations¶
The interface (by default) will generate the following files, each linked
to a specific set of data and javascript files. If you want to remove any
particular visualization from your interface, you can simply delete the markdown
file. All javascript files are located in assets/js/extractors
Markdown | Description | Data | Javascript |
---|---|---|---|
activity_commits.md | A graph that shows activity across all repos (default branches) for a year (activity commits example). | activity_commits.js |
cci-activity_commits.json and cci-repos.json |
dependencies.md | A hierarchical DAG that shows depdencies (dependencies example). | dependencies.js |
cci-repo_dependencies.json and cci-dependencies.json |
languages.md | A donut circle graph that shows a breakdown of repository languages (languages example). | languages.js |
cci-languages.json |
licenses.md | A donut circle graph that shows a breakdown of repository licenses (licenses example). | licenses.js |
cci-repos.json |
member_repos.md | Counts of organization members and repositories (member repos example). | member_repos.js |
cci-internal-users.json and cci-member_repos.json |
pack_users.md | A zoomable pack hierarchy that shows organizations and contributors (pack users example). | pack_users.js |
cci-internal-users.json and cci-external-users.json |
repos_issues.md | Scatterplot showing repository closed. vs open issues (issues example). | repos_issues.js |
cci-repos.json |
repos.md | Scatterplot of repository creation history by year (creation history example). | creation_history.js |
cci-creation-history.json |
repos_pulls.md | Scatterplot showing repository open. vs merged pull requests (pulls example). | repos_pulls.js |
cci-repos.json |
repos_size.md | Bubble chart of repositories by popularity size (number of stars) (size example). | repos_size.js |
cci-repos.json |
star_history.md | Number of stars over time (stars example). | star_history.js |
cci-stars.json |
topics.md | Tag map of repository topics (topics example). | topics.js |
cci-topics.json |
Config¶
Contributor CI provides an easy way to interact with your configuration file,
the file contributor-ci.yaml
. First, to edit the file, you can do:
$ cci config edit
By default, the editor chosen is vim. If you add an editor
field
to that same file, you can choose an editor of your choice.
You can also quickly sort your file in the case that you made a bunch
of additions and want to ensure they are sorted. Note that sorting
happens automatically when you do an add or remove operation.
$ cci config sort
Next, you might want to add a repository or organization to a list. You can use add and remove to do this. You should provide the key first (e.g. member_orgs) followed by one more entries to add or remove.
$ cci config add member_orgs vsoch
$ cci config remove member_orgs vsoch
List¶
You likely want to start with an extraction.
An extraction means that you are extracting metadata for the current data,
and for your current set of repos. But first you need to know what your
options are! For this purpose you can use list
:
$ cci list
creation_history: extract creation history for repositories.
topics: extract repository topics.
repo_dependencies: extract repository dependencies.
languages: extract languages for a repository.
activity_commits: extract internal repository commit activity.
releases: extract repository releases.
stars: extract repository stars.
member_repos: extract repositories that belong to members not within org.
activity_lines: extract internal repository activity via lines of code.
dependencies: extract dependencies.
repo_metadata: gather repository metadata from several extractors.
repos: extract repository metrics.
users: extract user metrics for a repository.
repo_users: extract repositories worked on for external and internal users.
Extract¶
You next likely want to run an extractor. The default output directory used
will be a directory named .cci
for “contributor CI” in the present working
directory.
$ cci extract repos
Retrieving organization info for cdat
Checking GitHub API token... Token validated.
Auto-retry limit for requests set to 10.
Reading '/home/vanessa/Desktop/Code/contributor-ci/contributor_ci/main/extractors/collection/repos/org-repos-info.gql' ... File read!
Page 1
Sending GraphQL query...
Checking response...
HTTP STATUS 200 OK
When it finished, you can inspect the output in the present working directory “.cci” folder (unless you changed the path in the config or on the command line). It is a tree organized by year, month, and day:
$ tree .cci/
.cci/data/
└── 2021
└── 6
└── 13
└── cci-repos.json
You’ll notice that the extracted data is saved in a “data” subfolder. This is because there are other output types that can be saved here. You can also ask CCI to run more than one extractor at once:
$ cci --out-dir _data extract repos repo_metadata topics
Finally, if you want to change the output organization (which defaults to year/month/day
under the
data folder) you can add --save-format
:
$ cci --out-dir _data extract --save-format year/month repos repo_metadata topics
Note that since CCI uses its directories as a cache, changing the default save format will change this behavior to generate the data no matter what, as we cannot be confident when the data was actually generated. If you find that you don’t want this behavior, it’s recommended to run with the default save format and then clean up or organize the data directory as you see fit.
Extractors¶
The following extractors are available.
Name | Description | Depends On |
---|---|---|
repos | Extract repository metadata | none |
users | Extract internal and external contributors lists | repos |
repo_dependencies | Extract repository dependencies | repos |
dependencies | Extract dependency metadata | repo_dependencies |
releases | Extract releases for repositories | repos |
languages | Extract languages for repositories | repos |
activitycommits | Extract weekly number of repository commits to reflect activity | repos |
repo_users | Extract users and repositories contributed to (internal and external) | users |
creation_history | Extract creation history (first commit) of repositories | repos |
stars | Extract repository stars | repos |
member_repos | Extract repositories of members not associated with the organization | users |
topics | Extract repository topics | repos |
repo_metadata | Combine repository metadata across repps and topics extractors | repos, topics |
Contributor Friendliness Assessment¶
The Contributor Friendliness Assessment (CFA) is an effort to identify aspects of a repository that can be improved to make the repository more contributor friendly. The assessment derives a list of criteria to assess how easy it is to contribute to a project. This means arriving at a project repository and having an easy time going from knowing nothing to opening a pull request, and also how well the project attracts new contributors. Generally, we assess the repository for:
CFA-branding
: Does the project have branding?CFA-popularity
: How popular is the project?CFA-description
: Does the project have a clear description (What is it for)?CFA-need
: Does the project have a compelling set of use cases, or statement of need (Should I use it)? This is a fork in the visitor’s decision tree, because if the answer is yes they will continue exploring, otherwise they will not.CFA-license
: The GitHub repository has an OSI-approved open-source license.CFA-build
: Methods to build or install the software or service are clearly stated.CFA-examples
: Does the README.md have a quick example of usage?CFA-documentation
: Does the project have documentation?CFA-support
: Does the project make it easy to ask for help?CFA-developer
: Process and metadata is provided for the developer to understand and make changes.CFA-quality
: The code quality of the project.CFA-tests
: The project has testing.CFA-coverage
: The project reports code coverage.CFA-format
: The project adheres to a language specific format.CFA-outreach
: Is the project active at conferences or otherwise externally presented?
Each of the items above has a more detailed description, rationale, and list
of criteria – some of which are automated. Currently, the assessment
is under development so running the cfa
tool for a repository:
# Generate a contributor friendliness assessment template and print to terminal
$ cci cfa --terminal https://github.com/vsoch/salad
# Save to local .cci directory
$ cci cfa https://github.com/vsoch/salad
# Pipe into file
$ cci cfa --terminal https://github.com/LLNL/b-mpi3 > _cfa/cfa-LLNL-b-mpi3.md
For the latter, your cfa template (with some fields populated) will be saved to your .cci output directory, as specified in your config or on the command line:
$ tree .cci/cfa/
└── cfa-vsoch-salad.md
Will simply output the template to be filled in. This will be updated
with automation and allowing for save in the .cci
output folder, allowing
for creating new assessments, and updating previously created assessments.
We will also provide a GitHub action for generating assessment files
and opening a pull request when new repositories are found that have not
been assessed.
CFA Background¶
The author of CCI noticed that there are many good software projects, but they don’t do a good job of explaining use cases. She also noticed that small details like branding, documentation, and ease of use were hugely important variables for making it easy to contribute. You can imagine a sequence of events (a decision tree) that models a user interaction:
- Arrive at the repository.
- Assess project for branding and popularity.
- What does it do?
- Does it help with a problem that I have yes –> continue, no–> leave?
- Does it have a license that I like?
- Install / build the software to try out
- Look for a getting started guide or examples
- Make changes to the repository, sometimes look for contributing guide.
- Run local tests, formatting, etc.
- Open a pull request
GitHub Action¶
Contributor CI comes with a GitHub action that will be more developed as the library
is developed. Currently, you can use it to run one or more extractors for
a contributor-ci.yaml
in your repository. For example, let’s say we want to
run all extractors:
name: Contributor CI Extract
on:
schedule
# Every Sunday
- cron: 0 0 * * 0
jobs:
run:
runs-on: ubuntu-latest
steps:
- name: Checkout Actions Repository
uses: actions/checkout@v2
- name: Extract
uses: vsoch/contributor-ci@main
env:
CCI_GITHUB_TOKEN: ${{ secrets.CCI_GITHUB_TOKEN }}
with:
extract: repos
results_dir: .cci
config_file: contributor-ci.yaml
- name: Check that results exist
run: tree .cci
- name: Upload results
if: success()
uses: actions/upload-artifact@v2-preview
with:
name: cci-results
path: .cci
Note that CCI_GITHUB_TOKEN
is recommended to be a personal access token,
which is needed for some of the queries to look at organizations. If you just
need repository metadata, the standard GITHUB_TOKEN
provided in actions
will suffice. You can either save as an artifact as shown above, or just push directly to a branch:
- name: Push Results
run:
git config --global user.name "github-actions"
git config --global user.email "github-actions@users.noreply.github.com"
git add _cci
set +e
git status | grep modified
if [ $? -eq 0 ]; then
set -e
printf "Changes\n"
git commit -m "Automated push with new data results $(date '+%Y-%m-%d')" || exit 0
git push origin main
else
set -e
printf "No changes\n"
fi
You can also use a pull request action
to open a pull request instead. The action can also support generating
Contributor Friendliness Assessment (markdown) files. Since these might
warrant being populated into an interface, if you select a results_dir
here, the markdown files will explicitly be written there. If not, then
they will be written to the default in .cci/cfa
.
name: Contributor CI Update Contributor Friendliness Assessment
on:
schedule
# Every Sunday
- cron: 0 0 * * 0
jobs:
run:
runs-on: ubuntu-latest
steps:
- name: Checkout Actions Repository
uses: actions/checkout@v2
- name: Update CFAs
uses: vsoch/contributor-ci@main
env:
CCI_GITHUB_TOKEN: ${{ secrets.CCI_GITHUB_TOKEN }}
with:
cfa: true
results_dir: ./cfa
config_file: contributor-ci.yaml
- name: Check that results exist
run: tree ./cfa
Finally, you can ask to run more than one extractor, akin to how you can on the command line!
jobs:
extraction:
runs-on: ubuntu-latest
steps:
- name: Checkout Repository
uses: actions/checkout@v2
- name: Update Data
uses: vsoch/contributor-ci@main
env:
GITHUB_TOKEN: ${{ secrets.CCI_GITHUB_TOKEN }}
with:
results_dir: _data/
extract: repo_metadata topics languages releases stars activity_commits activity_lines