Clarify Data Collection Consent in GitHub Pre-release License Terms #37405

@sohsatoh

What article on docs.github.com is affected?

https://docs.github.com/en/site-policy/github-terms/github-pre-release-license-terms#4-data-collection-and-usage

What part(s) of the article would you like to see updated?

If the contents of the blog post referenced below are accurate, the definition of the data collected under the pre-release terms should be more narrowly scoped.

Why the Docs Should Be Changed

  • Transparency: The current phrasing can lead to misunderstandings about what data is collected from users, potentially implying that any usage details may be recorded.

  • User Trust: Clarifying the language would help build user trust by ensuring that the data collection is strictly limited (e.g., to telemetry related to product usage and performance) or by directing users to more detailed documentation.

  • Compliance: Clearer language would support compliance with GitHub’s Data Protection Agreement and Privacy Statement by precisely defining the scope of data collection.

Expected Outcome or Behavior

  • A revised text that explicitly defines the types of data collected (e.g., telemetry data, performance metrics, and usage statistics) and the purposes for which that data is collected.
  • An update that includes either detailed descriptions in the clause itself or a clear link to additional documentation for further clarification.

Additional information

Reproducibility

The ambiguous clause appears wherever the pre-release license terms are referenced, so every user who reviews the terms encounters the same wording.

Other Context

In the GitHub Japan blog post “License term and Your Data”, the following statement appears (English translation first, then the original Japanese):

Section 4 of the GitHub Pre-release License Terms, titled “Data Collection and Use,” states under “a. Consent to Data Collection” that “the pre-release software may collect information about you and your use of the software, and send that information to GitHub.”
According to “b. Use of Collected Data,” GitHub uses “the collected data for analysis and measurement in order to understand how the pre-release software and related products are used,” which does not mean that the data will be used for model training.
The statement that “data about events generated during operation and usage information is collected” refers to information such as telemetry, which is used to understand how customers are using GitHub products and whether those products are working as intended. It is not data used for training models.

GitHub プレリリース ライセンス条項にある「4. データの収集と使用」において、「a. データ収集に対する同意」にあるように、「お客様およびお客様によるソフトウェアの使用状況に関する情報を収集し、その情報を GitHub に送信する場合があります。」
その情報とは、「b. 収集したデータの使用」にあるように「プレリリース版ソフトウェアおよび関連製品がどのように使用されているかを把握するために、収集されたデータを分析および測定に使用します。」とあり、これはモデルの学習のために使用するという意味ではありません。
「操作時に生成されたイベントに関するデータと使用状況の情報が収集されます。」というのは、テレメトリなど、お客様がGitHub製品をどのように使っているか、GitHub製品が意図した動きをしているかを知るための情報であり、それらはモデルの学習のために使用するものではありません。

However, the current wording in the Pre-release License Terms does not clearly limit the scope of collected data to telemetry or similar types of information, which may lead to legal ambiguity or misinterpretation.

If the information in the blog post is accurate, then the license terms should be updated to explicitly limit the definition of collected data. If the blog post is incorrect, then its content should be revised accordingly.
(While it’s unclear whether this blog post represents GitHub’s official stance, it is likely to be interpreted as such by the general public.)

Labels

content (This issue or pull request belongs to the Docs Content team), site policy (Content related to site policy)
