GraphKit is a collection of language-specific source code analyzers that output, in a standard data format, a project’s code definitions and cross-references. The long-term vision of GraphKit is to make dev tools (such as editors, build tools, package managers, linters, and code search) more powerful and easier to create by giving them a standard way to get information about a project’s source code.
The problem: Dev tools (such as editors, build tools, package managers, linters, and code search) must partially reimplement a language’s compiler or interpreter to get the information about a project’s source code that they need to do their job (such as autocompletion, jump-to-definition, documentation lookup, or dependency tracking). As a result, dev tools are hard to write and are often brittle and limited, especially for dynamic languages.
The solution: GraphKit’s goal is to provide a high-quality, standardized source analyzer for every popular language. Dev tools can get the information they need about a project’s code from a GraphKit source analyzer for the language, instead of implementing their own ad-hoc analysis. This will make dev tools more powerful and easier to create.
GraphKit has two parts: a common data format (called the source graph) that contains information about a project’s code, and a set of source analyzers (called graphers) that output this data.
The source graph: A project’s source graph describes every definition (of a type, variable, class, etc.) and maps every reference in the project’s source code files to the definition it targets.
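To make the idea of definitions and resolved references concrete, here is a minimal in-memory sketch. It is not the actual source graph format: the class names (`Def`, `Ref`, `SourceGraph`) and every field are assumptions chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Def:
    """A definition of a type, variable, class, etc. (hypothetical fields)."""
    path: str   # project-unique identifier of the definition
    kind: str   # e.g. "type", "variable", "class"
    file: str   # file that contains the definition
    start: int  # start offset of the definition in the file
    end: int    # end offset (exclusive)


@dataclass(frozen=True)
class Ref:
    """A reference in a source file, mapped to the definition it targets."""
    file: str
    start: int
    end: int
    target: str  # Def.path of the referenced definition


@dataclass
class SourceGraph:
    """Holds every definition and every resolved reference of one project."""
    defs: Dict[str, Def] = field(default_factory=dict)
    refs: List[Ref] = field(default_factory=list)

    def add_def(self, d: Def) -> None:
        self.defs[d.path] = d

    def add_ref(self, r: Ref) -> None:
        self.refs.append(r)

    def target_of(self, file: str, offset: int) -> Optional[Def]:
        """Jump-to-definition: return the Def referenced at an offset, if any."""
        for r in self.refs:
            if r.file == file and r.start <= offset < r.end:
                return self.defs.get(r.target)
        return None

    def refs_to(self, path: str) -> List[Ref]:
        """Find-usages: return all references that target a definition."""
        return [r for r in self.refs if r.target == path]
```

With a structure like this, an editor could answer jump-to-definition with `target_of` and find-usages with `refs_to` without analyzing the code itself.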
Graphers: For each supported language, GraphKit provides a source analyzer program called a grapher. A grapher takes a set of files or directories containing source code as input, performs various kinds of source analysis on the code (such as type checking and type inference), and outputs a data dump describing the code. The data format is described tentatively at jsg.
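The input-to-output shape of a grapher can be illustrated with a toy example. The sketch below is not one of the GraphKit graphers and does not produce the real output format: it uses Python’s standard `ast` module on a single source string, records only module-level functions and classes, and resolves references purely by name, with no type checking or type inference. The function name `graph_source` and the output keys are assumptions.

```python
import ast
import json
from typing import Dict, List

# Small sample input; a real grapher would read files or directories instead.
SAMPLE = '''def greet(name):
    """Say hello."""
    return "Hello, " + name

class Greeter:
    def run(self):
        return greet("world")

g = Greeter()
'''


def graph_source(filename: str, source: str) -> Dict[str, List[dict]]:
    """Return module-level defs and name-based refs for one source string."""
    tree = ast.parse(source, filename=filename)

    # Pass 1: collect module-level function and class definitions.
    defs: Dict[str, dict] = {}
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defs[node.name] = {
                "name": node.name,
                "kind": "class" if isinstance(node, ast.ClassDef) else "function",
                "file": filename,
                "line": node.lineno,
                "doc": ast.get_docstring(node) or "",
            }

    # Pass 2: treat every loaded name that matches a collected def as a ref.
    # This is name matching only; a local variable that shadows a def would
    # be resolved incorrectly, which real scope and type analysis avoids.
    refs: List[dict] = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load) and node.id in defs:
            refs.append({
                "file": filename,
                "line": node.lineno,
                "col": node.col_offset,
                "target": node.id,
            })
    refs.sort(key=lambda r: (r["line"], r["col"]))

    return {"defs": list(defs.values()), "refs": refs}


if __name__ == "__main__":
    # Emit the data dump as JSON on standard output.
    print(json.dumps(graph_source("sample.py", SAMPLE), indent=2))
```

Even this naive version shows the division of labor: the analysis runs once in the grapher, and downstream tools only consume the serialized definitions and references.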
Language support status
- Project: the project that contains the grapher for the language.
- Underlying: the existing type-checking or type-inference library that the grapher relies on, if any.
- Types?: does the grapher perform type checking (for statically typed languages) or type inference (for dynamic languages)?
- Defs?: does the grapher find all definitions in the code and output them?
- Refs?: does the grapher find and resolve source code references to definitions?
- Docs?: does the grapher find documentation attached to each definition?
- Output?: does the grapher output all of the source graph data?
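| Language | Project | Underlying | Types? | Defs? | Refs? | Docs? | Output? |
|---|---|---|---|---|---|---|---|
| Ruby | RubySonar | RubySonar & YARD | YES | YES | YES | YES | YES |
| Go | sourcegraph/gog (coming soon) | go/types | YES | YES | YES | YES | NO |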
The sponsor of this project, Sourcegraph, analyzes and indexes hundreds of thousands of open source projects to provide developers with global, semantic search for code, docs, and examples. Doing this well requires high-quality, comprehensive information about each project’s definitions, cross-references, dependencies, authorship, etc.
Sourcegraphers (including @yinwang0, @beyang, and @sqs) have already made significant contributions to the underlying open source projects that power Sourcegraph’s analysis, such as yinwang0/pysonar2 and marijnh/tern. GraphKit sits on top of these libraries, adding some more features and serializing the data into the source graph output format.
We welcome contributions, both to this GraphKit project (which defines the source graph format) and to the per-language graphers and analyzers.