Add GitLab functionalities #1098

Closed
wants to merge 14 commits into from
2 changes: 2 additions & 0 deletions .env.example
@@ -1,3 +1,5 @@
OTA_ENGINE_SENDINBLUE_API_KEY='xkeysib-3f51c…'
OTA_ENGINE_SMTP_PASSWORD='password'
OTA_ENGINE_GITHUB_TOKEN=ghp_XXXXXXXXX
OTA_ENGINE_GITLAB_TOKEN=XXXXXXXXXX
OTA_ENGINE_GITLAB_RELEASES_TOKEN=XXXXXXXXXX
1 change: 1 addition & 0 deletions config/default.json
@@ -59,6 +59,7 @@
"dataset": {
"title": "sandbox",
"versionsRepositoryURL": "https://github.com/OpenTermsArchive/sandbox",
"versionsRepositoryURLGitLab": "https://gitlab.com/p2b/contrib-versions",
Member

The account p2b on GitLab is reserved. Do you own it? If not, it does not seem appropriate to send notifications to that account by default.

Contributor Author

The URL was only meant as an example, but this was our mistake: we did not realize it pointed to an existing account. It has been removed and replaced with an obviously fake URL. The change will be available when the pull request is updated.

"publishingSchedule": "30 8 * * MON"
}
}
33 changes: 26 additions & 7 deletions package-lock.json

Some generated files are not rendered by default.

3 changes: 2 additions & 1 deletion package.json
@@ -100,7 +100,8 @@
"swagger-jsdoc": "^6.2.8",
"swagger-ui-express": "^5.0.0",
"winston": "^3.3.3",
"winston-mail": "^2.0.0"
"winston-mail": "^2.0.0",
"axios": "^1.7.2"
Member

Adding a dependency is not a lightweight change as it introduces security risks and maintenance burden. It seems this dependency is used only for 3 HTTP calls. Please use the native http module or one of the existing dependencies.
Please also use npm install --save to maintain alphabetical order and avoid future churn.

Contributor Author

This is solved and will be available when the pull request is updated. Axios was removed and replaced with node-fetch (already used in the project).
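
As a rough illustration of the switch described above, the project lookup could be written with `fetch` instead of axios. This is only a sketch, not the code from the updated pull request: `getProjectId` and the `fetchImpl` parameter are hypothetical names, and it assumes either a runtime with a global `fetch` (Node.js 18+) or that node-fetch is passed in as `fetchImpl`.

```javascript
// Hypothetical helper: look up a GitLab project ID without axios.
const GITLAB_API_URL = 'https://gitlab.com/api/v4';

async function getProjectId({ owner, repo, token, fetchImpl = globalThis.fetch }) {
  // The GitLab API accepts the URL-encoded "owner/repo" path in place of a numeric ID.
  const projectPath = encodeURIComponent(`${owner}/${repo}`);
  const response = await fetchImpl(`${GITLAB_API_URL}/projects/${projectPath}`, {
    headers: { Authorization: `Bearer ${token}` },
  });

  // Unlike axios, fetch does not reject on HTTP error statuses, so check explicitly.
  if (!response.ok) {
    throw new Error(`GitLab API responded with status ${response.status}`);
  }

  const project = await response.json();

  return project.id;
}
```

Injecting `fetchImpl` keeps the helper testable without network access. Throwing on a non-2xx status also avoids silently continuing with a `null` project ID.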

},
"devDependencies": {
"@commitlint/cli": "^19.0.3",
65 changes: 65 additions & 0 deletions scripts/dataset/assets/README.templateGitLab.js
Member

This seems to be a complete duplicate of the GitHub template. Duplicating this file makes it more likely that the two copies will drift out of sync. Is there a reason for not simply importing the existing template? 🙂

Contributor Author

The only difference between that file and its GitHub counterpart is, in fact, a variable name.
To use a single template, we could:

  1. pass that variable as an input to the template and modify the release generation code to handle both the GitHub and GitLab cases
  2. use a single "versionsRepositoryURL" variable in the configuration file; this would make the publishing part inconsistent with the reporting part, where GitHub and GitLab use different configurations (reporter.githubIssues and reporter.gitlabIssues).

Please let us know if there is a preferred solution; otherwise we could proceed with option 1.
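
Option 1 could look roughly like the sketch below, where the shared template receives the repository URL as an argument instead of reading a platform-specific configuration key itself. The `versionsRepositoryURL` parameter is an assumption made for illustration, and the template text is abridged.

```javascript
const LOCALE = 'en-EN';
const DATE_OPTIONS = { year: 'numeric', month: 'long', day: 'numeric' };

// Hypothetical shared template body: the caller decides which repository URL to show.
function body({ servicesCount, firstVersionDate, lastVersionDate, versionsRepositoryURL }) {
  const first = firstVersionDate.toLocaleDateString(LOCALE, DATE_OPTIONS);
  const last = lastVersionDate.toLocaleDateString(LOCALE, DATE_OPTIONS);

  return `This dataset consolidates the contractual documents of ${servicesCount} service providers, in all their versions that were accessible online between ${first} and ${last}.

You can also explore all these versions interactively on [${versionsRepositoryURL}](${versionsRepositoryURL}).`;
}
```

Under this assumption, the GitHub publisher would pass the value of `dataset.versionsRepositoryURL` and the GitLab publisher the value of `dataset.versionsRepositoryURLGitLab`, so only the call sites differ and the template text lives in one place.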

@@ -0,0 +1,65 @@
import config from 'config';

const LOCALE = 'en-EN';
const DATE_OPTIONS = { year: 'numeric', month: 'long', day: 'numeric' };

export default function readme({ releaseDate, servicesCount, firstVersionDate, lastVersionDate }) {
return `# Open Terms Archive — ${title({ releaseDate })}

${body({ servicesCount, firstVersionDate, lastVersionDate })}`;
}

export function title({ releaseDate }) {
releaseDate = releaseDate.toLocaleDateString(LOCALE, DATE_OPTIONS);

const title = config.get('@opentermsarchive/engine.dataset.title');

return `${title} — ${releaseDate} dataset`;
}

export function body({ servicesCount, firstVersionDate, lastVersionDate }) {
firstVersionDate = firstVersionDate.toLocaleDateString(LOCALE, DATE_OPTIONS);
lastVersionDate = lastVersionDate.toLocaleDateString(LOCALE, DATE_OPTIONS);

const versionsRepositoryURLGitLab = config.get('@opentermsarchive/engine.dataset.versionsRepositoryURLGitLab');

return `This dataset consolidates the contractual documents of ${servicesCount} service providers, in all their versions that were accessible online between ${firstVersionDate} and ${lastVersionDate}.

This dataset is tailored for datascientists and other analysts. You can also explore all these versions interactively on [${versionsRepositoryURLGitLab}](${versionsRepositoryURLGitLab}).

It has been generated with [Open Terms Archive](https://opentermsarchive.org).

### Dataset format

This dataset represents each version of a document as a separate [Markdown](https://spec.commonmark.org/0.30/) file, nested in a directory with the name of the service provider and in a directory with the name of the terms type. The filesystem layout will look like below.

\`\`\`
├ README.md
├┬ Service provider 1 (e.g. Facebook)
│├┬ Terms type 1 (e.g. Terms of Service)
││├ YYYY-DD-MMTHH-MM-SSZ.md (e.g. 2021-08-01T01-03-12Z.md)
┆┆┆
││└ YYYY-DD-MMTHH-MM-SSZ.md (e.g. 2021-10-03T08-12-25Z.md)
┆┆
│└┬ Terms type X (e.g. Privacy Policy)
│ ├ YYYY-DD-MMTHH-MM-SSZ.md (e.g. 2021-05-02T03-02-15Z.md)
┆ ┆
│ └ YYYY-DD-MMTHH-MM-SSZ.md (e.g. 2021-11-14T12-36-45Z.md)
└┬ Service provider Y (e.g. Google)
├┬ Terms type 1 (e.g. Developer Terms)
│├ YYYY-DD-MMTHH-MM-SSZ.md (e.g. 2019-03-12T04-18-22Z.md)
┆┆
│└ YYYY-DD-MMTHH-MM-SSZ.md (e.g. 2021-12-04T22-47-05Z.md)
└┬ Terms type Z (e.g. Privacy Policy)
├ YYYY-DD-MMTHH-MM-SSZ.md (e.g. 2021-05-02T03-02-15Z.md)
└ YYYY-DD-MMTHH-MM-SSZ.md (e.g. 2021-11-14T12-36-45Z.md)
\`\`\`

### License

This dataset is made available under an [Open Database (OdBL) License](https://opendatacommons.org/licenses/odbl/1.0/) by Open Terms Archive Contributors.
`;
}
25 changes: 19 additions & 6 deletions scripts/dataset/index.js
@@ -6,6 +6,7 @@
import generateRelease from './export/index.js';
import logger from './logger/index.js';
import publishRelease from './publish/index.js';
import publishReleaseGitLab from './publishGitLab/index.js';

export async function release({ shouldPublish, shouldRemoveLocalCopy, fileName }) {
const releaseDate = new Date();
@@ -24,13 +25,25 @@

logger.info('Start publishing dataset…');

const releaseUrl = await publishRelease({
archivePath,
releaseDate,
stats,
});
if (typeof process.env.OTA_ENGINE_GITHUB_TOKEN !== 'undefined') {
const releaseUrl = await publishRelease({
archivePath,
releaseDate,
stats,
});

logger.info(`Dataset published to ${releaseUrl}`);
logger.info(`Dataset published to ${releaseUrl}`);
}

if (typeof process.env.OTA_ENGINE_GITLAB_RELEASES_TOKEN !== 'undefined') {
const releaseUrl = await publishReleaseGitLab({
archivePath,
releaseDate,
stats,
});

Check failure on line 44 in scripts/dataset/index.js

GitHub Actions / test (ubuntu-20.04, windows-latest, macos-latest)

Trailing spaces not allowed
logger.info(`Dataset published to ${releaseUrl}`);
}

if (!shouldRemoveLocalCopy) {
return;
102 changes: 102 additions & 0 deletions scripts/dataset/publishGitLab/index.js
@@ -0,0 +1,102 @@
import fsApi from 'fs';
import path from 'path';
import url from 'url';

import axios from 'axios';

Check failure on line 5 in scripts/dataset/publishGitLab/index.js

GitHub Actions / test (ubuntu-20.04, windows-latest, macos-latest)

There should be no empty line within import group

import config from 'config';
import dotenv from 'dotenv';

Check failure on line 8 in scripts/dataset/publishGitLab/index.js

GitHub Actions / test (ubuntu-20.04, windows-latest, macos-latest)

There should be no empty line within import group
//import { Octokit } from 'octokit';

Check failure on line 9 in scripts/dataset/publishGitLab/index.js

GitHub Actions / test (ubuntu-20.04, windows-latest, macos-latest)

Expected exception block, space or tab after '//' in comment
Member

Do not keep commented-out lines, please erase them.

Contributor Author

This is solved (along with other similar cases) and will be available when the pull request is updated.


import FormData from 'form-data';

import * as readme from '../assets/README.templateGitLab.js';

dotenv.config();

const gitlabAPIUrl = "https://gitlab.com/api/v4";

Check failure on line 17 in scripts/dataset/publishGitLab/index.js

GitHub Actions / test (ubuntu-20.04, windows-latest, macos-latest)

Strings must use singlequote
const gitlabUrl = "https://gitlab.com";

Check failure on line 18 in scripts/dataset/publishGitLab/index.js

GitHub Actions / test (ubuntu-20.04, windows-latest, macos-latest)

Strings must use singlequote

export default async function publishReleaseGitLab({
archivePath,
releaseDate,
stats,
}) {
let projectId = null;

// const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

const [owner, repo] = url

Check failure on line 29 in scripts/dataset/publishGitLab/index.js

GitHub Actions / test (ubuntu-20.04, windows-latest, macos-latest)

A space is required after '['
A space is required before ']'
.parse(config.get('@opentermsarchive/engine.dataset.versionsRepositoryURLGitLab'))
.pathname.split('/')
.filter((component) => component);

Check failure on line 32 in scripts/dataset/publishGitLab/index.js

GitHub Actions / test (ubuntu-20.04, windows-latest, macos-latest)

Unexpected parentheses around single function argument
const commonParams = { owner, repo };

try {
const repositoryPath = `${commonParams.owner}/${commonParams.repo}`;
const response = await axios.get(
`${gitlabAPIUrl}/projects/${encodeURIComponent(repositoryPath)}`,
{
headers: {

Check failure on line 40 in scripts/dataset/publishGitLab/index.js

GitHub Actions / test (ubuntu-20.04, windows-latest, macos-latest)

Unexpected line break after this opening brace
Authorization: `Bearer ${process.env.OTA_ENGINE_GITLAB_RELEASES_TOKEN}`,
},
},
);
projectId = response.data.id;
} catch (error) {
//logger.error(`🤖 Error while obtaining projectId: ${error}`);
projectId = null;
}

const tagName = `${path.basename(archivePath, path.extname(archivePath))}`; // use archive filename as Git tag

try {
// First, create the release
const releaseResponse = await axios.post(
`${gitlabAPIUrl}/projects/${projectId}/releases`,
{
ref: 'main',
tag_name: tagName,
name: readme.title({ releaseDate }),
description: readme.body(stats),
},
{
headers: {
Authorization: `Bearer ${process.env.OTA_ENGINE_GITLAB_RELEASES_TOKEN}`,
'Content-Type': 'application/json',
},
},
);

const releaseId = releaseResponse.data.commit.id;

// Then, upload the ZIP file as an asset to the release
const formData = new FormData();
formData.append('name', archivePath);
formData.append(
'url',
`${gitlabUrl}/${commonParams.owner}/${commonParams.repo}/-/archive/${tagName}/${archivePath}`,
);
formData.append('file', fsApi.createReadStream(archivePath), {
filename: path.basename(archivePath),
});

const uploadResponse = await axios.post(
`${gitlabAPIUrl}/projects/${projectId}/releases/${tagName}/assets/links`,
formData,
{
headers: {
...formData.getHeaders(),
Authorization: `Bearer ${process.env.OTA_ENGINE_GITLAB_RELEASES_TOKEN}`,
},
},
);

const releaseUrl = uploadResponse.data.direct_asset_url;

return releaseUrl;
} catch (error) {
console.error('Failed to create release or upload ZIP file:', error);
throw error;
}
}
19 changes: 19 additions & 0 deletions src/index.js
@@ -8,6 +8,7 @@ import Archivist from './archivist/index.js';
import logger from './logger/index.js';
import Notifier from './notifier/index.js';
import Reporter from './reporter/index.js';
import ReporterGitlab from './reporterGitlab/index.js';

const require = createRequire(import.meta.url);

@@ -65,13 +66,31 @@ export default async function track({ services, types, extractOnly, schedule })
} catch (error) {
logger.error('Cannot instantiate the Reporter module; it will be ignored:', error);
}
archivist.attach(reporter);
} else {
logger.warn('Configuration key "reporter.githubIssues.repositories.declarations" was not found; issues on the declarations repository cannot be created');
}
} else {
logger.warn('Environment variable "OTA_ENGINE_GITHUB_TOKEN" was not found; the Reporter module will be ignored');
}

if (process.env.OTA_ENGINE_GITLAB_TOKEN) {
if (config.has('@opentermsarchive/engine.reporter.gitlabIssues.repositories.declarations')) {
try {
const reporter = new ReporterGitlab(config.get('@opentermsarchive/engine.reporter'));

await reporter.initialize();
archivist.attach(reporter);
} catch (error) {
logger.error('Cannot instantiate the ReporterGitlab module; it will be ignored:', error);
}
} else {
logger.warn('Configuration key "reporter.gitlabIssues.repositories.declarations" was not found; issues on the declarations repository cannot be created');
}
} else {
logger.warn('Environment variable "OTA_ENGINE_GITLAB_TOKEN" was not found; the ReporterGitlab module will be ignored');
}

if (!schedule) {
await archivist.track({ services, types });
