Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

livecheck: add GithubLatest strategy #9381

Merged
merged 6 commits into from
Dec 7, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 12 additions & 3 deletions Library/Homebrew/livecheck/livecheck.rb
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,11 @@ module Livecheck
lolg.it
].freeze

STRATEGY_SYMBOLS_TO_SKIP_PREPROCESS_URL = [
:github_latest,
:page_match,
].freeze

UNSTABLE_VERSION_KEYWORDS = %w[
alpha
beta
Expand Down Expand Up @@ -381,14 +386,18 @@ def latest_version(formula, args:)
next
end

# Do not preprocess the URL when livecheck.strategy is set to :page_match
url = if livecheck_strategy == :page_match
# Only preprocess the URL when it's appropriate
url = if STRATEGY_SYMBOLS_TO_SKIP_PREPROCESS_URL.include?(livecheck_strategy)
Comment on lines +389 to +390
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

preprocess_url function (still) has the following bit:

if host.end_with?("github.com")
return url if path.match? %r{/releases/latest/?$}
owner, repo = path.delete_prefix("downloads/").split("/")
url = "#{scheme}://#{host}/#{owner}/#{repo}.git"
elsif host.end_with?(*GITEA_INSTANCES)

Shall we remove or make changes to it?

Copy link
Member

@samford samford Dec 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking is that /releases/latest URLs shouldn't ever be processed and keeping this guard in place ensures this continues to be true regardless of whether #preprocess_url is skipped for some strategies.

Removing this guard would mess up checks for any formulae that use a /releases/latest URL in a livecheck block (i.e., not updated to use strategy :github_latest). PageMatch checks only skip preprocessing when strategy :page_match is present in the livecheck block (i.e., checks that implicitly use PageMatch don't skip preprocessing). Homebrew/homebrew-core#66134 will address this for homebrew/core but other taps could be affected if we removed this.

There isn't any notable benefit to removing this simple guard but there are clear detriments under certain circumstances. With that in mind, I think we should simply keep this guard.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it - thank you! Perhaps it would be helpful to add a comment to that block above explaining which cases it catches (formulae in non-homebrew/core taps + <some conditions> ) -- otherwise it's not obvious why we still check for github.com.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I'm not the target audience, do you think something short like # Handles any `livecheck` blocks not using the `GithubLatest` strategy would be sufficient? This guard would also apply if a check that uses a /releases/latest URL mistakenly sneaks into homebrew/core, so I don't think we need to go into the details of when we're most likely to encounter this situation (e.g., non-core taps).

original_url
else
preprocess_url(original_url)
end

strategies = Strategy.from_url(url, livecheck_regex.present?)
strategies = Strategy.from_url(
url,
livecheck_strategy: livecheck_strategy,
regex_provided: livecheck_regex.present?,
)
strategy = Strategy.from_symbol(livecheck_strategy)
strategy ||= strategies.first
strategy_name = @livecheck_strategy_names[strategy]
Expand Down
22 changes: 16 additions & 6 deletions Library/Homebrew/livecheck/strategy.rb
Original file line number Diff line number Diff line change
Expand Up @@ -52,19 +52,28 @@ def from_symbol(symbol)
# Returns an array of strategies that apply to the provided URL.
#
# @param url [String] the URL to check for matching strategies
# @param regex_provided [Boolean] whether a regex is provided in a
# @param livecheck_strategy [Symbol] a {Strategy} symbol from the
# `livecheck` block
# @param regex_provided [Boolean] whether a regex is provided in the
# `livecheck` block
# @return [Array]
def from_url(url, regex_provided = nil)
def from_url(url, livecheck_strategy: nil, regex_provided: nil)
usable_strategies = strategies.values.select do |strategy|
# Ignore strategies with a priority of 0 or lower
next if strategy.const_defined?(:PRIORITY) && !strategy::PRIORITY.positive?
if strategy == PageMatch
# Only treat the `PageMatch` strategy as usable if a regex is
# present in the `livecheck` block
next unless regex_provided
elsif strategy.const_defined?(:PRIORITY) &&
!strategy::PRIORITY.positive? &&
from_symbol(livecheck_strategy) != strategy
# Ignore strategies with a priority of 0 or lower, unless the
# strategy is specified in the `livecheck` block
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We explicitly state on line 17 of this file that strategies with priorities of 0 or lower are ignored.
Shall we expand and say that "negative priorities raise an error (they don't at the moment, but they probably should), 0 priority is reserved for..., all normal strategies should use positive values below 10" ?

Why do we allow such strategies when they're specified in the livecheck block? I guess this comment should explain why we're doing this instead of or in addition to what it currently says.

Copy link
Member

@samford samford Dec 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the comment on line 15 mentions, the 0-10 range is informal. It's technically possible for us to go above or below this range if it becomes necessary in the future and that's why we don't strictly enforce the range in the code.

The only ideas that we strictly enforce in the code are:

  • The default priority is 5
  • Strategies with a non-positive priority (0 or below) are omitted unless specified in the livecheck block
  • Strategies that apply to a URL are arranged from highest priority to lowest

I do think this explanation can be improved a little (e.g., I didn't mention here that the way to apply a strategy with a non-positive priority is using strategy in a livecheck block), so I'll take care of that in a moment.

next
end

strategy.respond_to?(:match?) && strategy.match?(url)
end

usable_strategies << strategies[:page_match] if strategies.key?(:page_match) && regex_provided

# Sort usable strategies in descending order by priority, using the
# DEFAULT_PRIORITY when a strategy doesn't contain a PRIORITY constant
usable_strategies.sort_by do |strategy|
Expand All @@ -78,6 +87,7 @@ def from_url(url, regex_provided = nil)
require_relative "strategy/apache"
require_relative "strategy/bitbucket"
require_relative "strategy/git"
require_relative "strategy/github_latest"
require_relative "strategy/gnome"
require_relative "strategy/gnu"
require_relative "strategy/hackage"
Expand Down
73 changes: 73 additions & 0 deletions Library/Homebrew/livecheck/strategy/github_latest.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,73 @@
# typed: false
# frozen_string_literal: true

module Homebrew
module Livecheck
module Strategy
# The {GithubLatest} strategy identifies versions of software at
# github.com by checking a repository's "latest" release page.
#
# GitHub URLs take a few different formats:
#
# * `https://github.com/example/example/releases/download/1.2.3/example-1.2.3.tar.gz`
# * `https://github.com/example/example/archive/v1.2.3.tar.gz`
nandahkrishna marked this conversation as resolved.
Show resolved Hide resolved
nandahkrishna marked this conversation as resolved.
Show resolved Hide resolved
# * `https://github.com/downloads/example/example/example-1.2.3.tar.gz`
#
# A repository's `/releases/latest` URL normally redirects to a release
# tag (e.g., `/releases/tag/1.2.3`). When there isn't a "latest" release,
# it will redirect to the `/releases` page.
#
# This strategy should only be used when we know the upstream repository
# has a "latest" release and the tagged release is appropriate to use
# (e.g., "latest" isn't wrongly pointing to an unstable version, not
# picking up the actual latest version, etc.). The strategy can only be
# applied by using `strategy :github_latest` in a `livecheck` block.
#
# The default regex identifies versions like `1.2.3`/`v1.2.3` in `href`
# attributes containing the tag URL (e.g.,
# `/example/example/releases/tag/v1.2.3`). This is a common tag format
# but a modified regex can be provided in a `livecheck` block to override
# the default if a repository uses a different format (e.g.,
# `example-1.2.3`, `1.2.3d`, `1.2.3-4`, etc.).
#
# @api public
class GithubLatest
nandahkrishna marked this conversation as resolved.
Show resolved Hide resolved
NICE_NAME = "GitHub - Latest"

# A priority of zero causes livecheck to skip the strategy. We do this
# for {GithubLatest} so we can selectively apply the strategy using
# `strategy :github_latest` in a `livecheck` block.
PRIORITY = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, 0 priority was equivalent (specific) to the PageMatch strategy.
Now, we introduce another strategy with the same priority. This makes the way we write conditionals that process these priorities important.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original idea was that PRIORITY = 0 would cause a strategy to be omitted unless it's specified in a livecheck block (e.g., strategy :github_latest). I think the updated logic in Livecheck::Strategy#from_url more accurately represents this, as it now ensures the strategy can be applied when specified in a livecheck block.

Since PageMatch was the only strategy with a priority of zero up to this point and it was special-cased, I forgot to account for the from_symbol(livecheck_strategy) != strategy condition originally. Outside of this omission, we haven't treated PRIORITY = 0 as being equivalent to strategy == PageMatch in livecheck's code and we specifically reference PageMatch (or :page_match) where appropriate.

PageMatch is an exception to the above idea, rather than the rule. The PageMatch strategy will automatically apply if a livecheck block contains a url and regex, so it's not omitted like other zero-priority strategies. The reasons why it has a priority of zero is to be able to enforce the regex requirement before applying it and to also deprioritize it with respect to more specific strategies (e.g., those with the DEFAULT_PRIORITY of 5 or a higher PRIORITY).


I've thought about updating the #match? method in strategies to allow for contextual information to be passed in. I imagine this would either be the Livecheck object (i.e., the information in the formula's livecheck block) or maybe an optional options hash (to allow for more flexibility/extensibility in the future). With this setup, we could give PageMatch a priority of 1 (deprioritized but not omitted) and enforce the regex requirement in PageMatch#match?.

I believe this is a better setup overall, as we wouldn't violate the PRIORITY = 0 omission idea above and also it would allow us to keep this information in strategies instead of Strategy#from_url. I'll create a PR for this after I've finished the livecheck docs PR (and I allow myself to work on livecheck's internals again) and we can discuss it further there.


That said, I believe the related changes in this PR are sufficient for now but let me know if I've overlooked anything important.


nandahkrishna marked this conversation as resolved.
Show resolved Hide resolved
samford marked this conversation as resolved.
Show resolved Hide resolved
# The `Regexp` used to determine if the strategy applies to the URL.
URL_MATCH_REGEX = %r{//github\.com(?:/downloads)?(?:/[^/]+){2}}i.freeze

# Whether the strategy can be applied to the provided URL.
#
# @param url [String] the URL to match against
# @return [Boolean]
def self.match?(url)
URL_MATCH_REGEX.match?(url)
end

# Generates a URL and regex (if one isn't provided) and passes them
# to {PageMatch.find_versions} to identify versions in the content.
#
# @param url [String] the URL of the content to check
# @param regex [Regexp] a regex used for matching versions in content
# @return [Hash]
def self.find_versions(url, regex = nil)
%r{github\.com/(?:downloads/)?(?<username>[^/]+)/(?<repository>[^/]+)}i =~ url.sub(/\.git$/i, "")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked Homebrew/core, and the following packages use /downloads/ subfolder:

  • apktool.rb
  • btpd.rb
  • cssembed.rb
  • growly.rb
  • henplus.rb
  • jadx.rb
  • midgard2.rb
  • rpg.rb
  • syck.rb
  • tidyp.rb
  • wrk-trello.rb

My understanding is that /downloads folders have been deprecated for a while now -- see https://github.blog/2012-12-12-goodbye-uploads/. I wonder if we should remove downloads from here and update the above formulae to use proper download paths.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've already been through the /downloads/ formulae recently (as part of another PR) and I took care of the straightforward ones, so this is what remains. As you would guess, most of these are highly outdated and tend to have low install counts in our analytics. The repository uploads URLs continue to resolve, so updating these is low priority in comparison to work that can have a greater impact.

Can potentially be updated

  • apktool.rb: The /downloads/ URL is a resource from another GitHub repository that's used in the test block but I didn't see an alternative location for this particular file. The formula could be updated to use a different apk if a suitable replacement can be found.
  • btpd.rb: The formula would need to be updated to build using the 0.16 tag archive.
  • jadx.rb: This uses the same apk file as the apktool formula, so the situation is the same.
  • midgard2.rb: The formula would need to be updated to build using the 12.09.1 tag archive.
  • rpg.rb: The formula would need to be updated to build using the 0.3.0 tag archive. This release is from 2010, the repository is archived (I deprecated the formula in the past), and there have only been 12 installs over the past year, so it may not be worth the effort.
  • tidyp.rb: The formula would need to be updated to build using the 1.04 tag archive.

Unlikely that we can update

  • cssembed.rb: stable is a jar file and I don't see alternatives. The GitHub repository is archived (I deprecated the formula in the past), so it's not worth the effort.
  • growly.rb: The repository doesn't have any tags or releases (and we're not going to use the growly file from master). Besides that, the code hasn't been modified in nine years and the repository doesn't have a license file (missing license file for the repo ryankee/growly#7).
  • henplus.rb: The repository doesn't have any tags or releases and the formula is deprecated since it doesn't build. There's been some activity here and there in recent years but we're stuck unless they create a release.
  • syck.rb: This is one of the two formulae using a github.s3.amazonaws.com/downloads URL. There aren't any alternatives (to my knowledge) and it hasn't been updated in years (it's a project by why the lucky stiff and I don't imagine him coming back anytime soon).
  • wrk-trello.rb: This is one of the two formulae using a github.s3.amazonaws.com/downloads URL. There aren't any alternatives (to my knowledge) and it hasn't been updated in eight years.


# Example URL: `https://github.com/example/example/releases/latest`
page_url = "https://github.com/#{username}/#{repository}/releases/latest"

# The default regex is the same for all URLs using this strategy
regex ||= %r{href=.*?/tag/v?(\d+(?:\.\d+)+)["' >]}i

Homebrew::Livecheck::Strategy::PageMatch.find_versions(page_url, regex)
end
end
end
end
end
33 changes: 33 additions & 0 deletions Library/Homebrew/test/livecheck/strategy/github_latest_spec.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
# typed: false
# frozen_string_literal: true

require "livecheck/strategy/github_latest"

describe Homebrew::Livecheck::Strategy::GithubLatest do
subject(:github_latest) { described_class }

let(:github_release_artifact_url) {
"https://github.com/example/example/releases/download/1.2.3/example-1.2.3.tar.gz"
}
let(:github_tag_archive_url) { "https://github.com/example/example/archive/v1.2.3.tar.gz" }
let(:github_repository_upload_url) { "https://github.com/downloads/example/example/example-1.2.3.tar.gz" }
let(:non_github_url) { "https://brew.sh/test" }

describe "::match?" do
it "returns true if the argument provided is a GitHub release artifact URL" do
expect(github_latest.match?(github_release_artifact_url)).to be true
end

it "returns true if the argument provided is a GitHub tag archive URL" do
expect(github_latest.match?(github_tag_archive_url)).to be true
end

it "returns true if the argument provided is a GitHub repository upload URL" do
expect(github_latest.match?(github_repository_upload_url)).to be true
end

it "returns false if the argument provided is not a GitHub URL" do
expect(github_latest.match?(non_github_url)).to be false
end
end
end