Add beets==2 and drop beets<1.5 support, improve catalognum, albumtypes parsing #73

Merged
merged 83 commits into from
Nov 4, 2024
29f1e09
test_lib: report a single test_lib summary when running with xdist
snejus May 17, 2024
7398247
test_lib: By default test against the last commit
snejus May 20, 2024
65278cb
test_lib: add --fields arguments to report a subset of fields
snejus Oct 29, 2024
3369e4e
test_lib: use symlinks
snejus Nov 2, 2024
bfe0a41
Update githooks
snejus May 18, 2024
1ac296f
tests: update pytest flags
snejus May 20, 2024
0324489
Bump dev version
snejus Nov 2, 2024
650fa2e
internal: tidy up artists/catalognum clean up logic
snejus Nov 3, 2024
9bcf72d
album: use the name as it is when it is named by the albumartist
snejus May 17, 2024
0bdd003
album: require the album to span entire line when looking for a wild one
snejus May 18, 2024
756fec9
catalognum: support some new formats
snejus May 18, 2024
309b995
catalognum: parse catalogue numbers that contain label names in singl…
snejus May 18, 2024
0e4f86b
artist: use album artist for single track release with several remixes
snejus May 18, 2024
78d46ef
catalognum: reduce number of false short catnum positives
snejus May 18, 2024
eb9f461
media: ignore subscriptions, refactor MediaInfo
snejus May 18, 2024
fddce55
catalognum: refactor the logic, clarify the priority
snejus Nov 2, 2024
44e82da
catalognum: improve matching by label prefix
snejus May 18, 2024
8eeb634
catalognum: handle hash symbol in the desc
snejus May 19, 2024
b0706d8
catalognum: exclude label name
snejus May 20, 2024
9f45e4c
catalognum: exclude matches followed by a comma
snejus May 20, 2024
303c610
catalognum: skip matches like BBC6, NDR-5, WZY2.5
snejus May 20, 2024
0fa4173
catalognum: exclude match if followed by a single quote
snejus May 21, 2024
024ee92
catalognum: prevent albumartists becoming catalogue numbers
snejus May 21, 2024
46a8d04
catalognum: disallow spaces in most cases
snejus May 21, 2024
0f788ab
catalognum: [DRAKEN49] is an artist, not your average catalogue number
snejus May 21, 2024
43a6da8
catalognum: explicitly hardcode patterns for the most common variations
snejus May 21, 2024
4f67803
catalognum: fix old exclusion pattern which hid valid catalogue numbers
snejus May 22, 2024
09915f3
catalognum: catch RP suffix
snejus May 23, 2024
4c89c8b
internal: rename album to album_name
snejus May 22, 2024
aec85c8
internal: fix Match types again
snejus May 23, 2024
36977de
catalognum: separate the functionality into its own class
snejus May 23, 2024
ddf57ce
album: keep brackets in place when album name is not catalognum
snejus May 29, 2024
e8c7cdc
internal: remove redundant groups from catalognum patterns
snejus May 29, 2024
4898986
album: search all disctitles when looking for EP/LP album name
snejus May 29, 2024
f34cfbd
catalognum: move logic from metaguru to Catalognum
snejus May 30, 2024
da88303
albumtype: parse 2LP
snejus May 31, 2024
2ba81c7
albumtypes: improve EP/LP detection accuracy in the desc
snejus May 31, 2024
d1dabd0
internal: sort out types in helpers.py
snejus Oct 27, 2024
fa88160
album: do not remove EP or LP from the beginning
snejus Oct 28, 2024
298fca9
catalognum: parse SK11X015
snejus Jun 1, 2024
d039ce6
catalognum: match label name prefix without punctuation
snejus Jun 2, 2024
ef5329e
album: remove V.A from the beginning of the album name
snejus Jun 2, 2024
2f900f4
album: keep Free Download in front if followed by alphanum
snejus Jun 2, 2024
a8c28fa
cleanup: remove preview, free dl from track titles, albums
snejus Jun 2, 2024
e350ed6
cleanup: remove Name Your Price
snejus Jun 2, 2024
32409c6
title: remove leading track number from the title more reliably
snejus Jun 25, 2024
1751380
track_alt,title,artist: improve track_alt parsing, recover false posi…
snejus Jun 3, 2024
de2a118
Update test jsons
snejus Jun 25, 2024
fc304f8
internal: improve clarity of beets version checks
snejus Aug 20, 2024
d4abe18
internal: add types in conftest
snejus Aug 21, 2024
0b90bf9
tests: pretty print json failures and streamline testing
snejus Aug 22, 2024
7e5a9de
tests: format expected json data consistently
snejus Aug 23, 2024
7e3f983
Drop support for beets<1.5
snejus Aug 26, 2024
4796ca0
Add support for beets==2.0.0
snejus Sep 21, 2024
97325be
artist/title: ensure artist is first
snejus Oct 27, 2024
4825a9a
Fix a couple of lints
snejus Oct 17, 2024
ba560c1
catalognum: parse ST172
snejus Oct 28, 2024
150b576
albumtype: check for single before checking for compilations
snejus Oct 29, 2024
77aa75e
cleanup: remove HTML whitespace from the incoming data
snejus Oct 29, 2024
20c06be
albumtype: remove funky ep description parsing logic
snejus Oct 29, 2024
9c97182
albumtype: add either ep or lp to albumtypes, not both
snejus Oct 30, 2024
9f19a7d
catalognum: restrict ROAD6 format
snejus Oct 31, 2024
e18cff9
album: do not split album that has a year range
snejus Nov 1, 2024
28c02d9
album: keep albumartist in album when it is followed by a dot
snejus Nov 1, 2024
6291598
cleanup: remove just out from albums,titles
snejus Nov 1, 2024
3a21bde
cleanup: "- album" from album/track names
snejus Nov 1, 2024
0b7dd25
catalognum: parse TMF!12
snejus Nov 1, 2024
419b38a
artist/title: use all dashes, pipe symbol explicitly for splits
snejus Nov 1, 2024
d9b1b3d
http: allow redirects
snejus Nov 1, 2024
953d7e0
cleanup: remove selected by
snejus Nov 1, 2024
7df5ef9
catalognum: relax parsing from album name
snejus Nov 2, 2024
734de2e
internal: rename TrackNames to Names
snejus Nov 1, 2024
59a7bf6
title: constrain label removal
snejus Nov 2, 2024
8fce513
albumartist: remove note about remixer
snejus Nov 2, 2024
49e824a
title: recover wrongly split titles
snejus Nov 3, 2024
02d2bb5
albumtype: include remix when number of remixes is one less than tot …
snejus Nov 3, 2024
f7129a8
internal: fix REMIX_IN_TITLE pattern
snejus Nov 3, 2024
ee262f5
albumtype: prioritize the check for single with a single track
snejus Nov 4, 2024
7edf4a0
tracks: handle empty release
snejus Nov 4, 2024
6ff27fd
album: parse "Title :" header in the desc
snejus Nov 4, 2024
502891d
albumtype: check for string E.P for EP albumtype
snejus Nov 4, 2024
a921b6a
albumtype: identify LP in vinyl media
snejus Nov 4, 2024
58648d7
Up the version and update deps
snejus Nov 4, 2024
95 changes: 60 additions & 35 deletions .githooks/post-commit
@@ -1,43 +1,68 @@
#!/bin/bash
#!/bin/zsh
# Run tests across all test files, and report what has changed in a pretty way.
# Test metadata files are stored in $TEST_FILES_DIR (./jsons) directory: those
# are JSON files that contain Bandcamp release metadata.
#
# You can use 'url2json -s <release-url>' to create such testing file for your
# release of choice.
# It extracts the relevant metadata and saves it in the './jsons' directory under
# the filename that resembles that release URL (where slashes are replaced by
# underscores).

TESTS_DIR=lib_tests
setopt extendedglob nullglob
zmodload zsh/mapfile
autoload -Uz zmv

[ "$(git branch --show-current)" = dev ] || exit
TESTS_DIR=lib_tests

commit=$(git rev-parse --short HEAD)
previous_commit=$(git rev-parse --short HEAD~1)
[[ -d .git/rebase-merge ]] || [[ $(git branch --show-current) == dev ]] || exit

[[ -e $TESTS_DIR/$previous_commit ]] || exit
commit=$(git show --no-patch --format='%h %s')
after=${${=commit}[1]}
message=${commit#$after }
if git diff --quiet HEAD~1..HEAD ./beetsplug; then
print "$after: Source code has not changed, bye: $message"
exit
fi

git diff --quiet HEAD~1..HEAD ./beetsplug
committed_source_code=$?
git diff --quiet ./beetsplug
dirty_worktree=$?
if [[ -d .git/rebase-merge ]]; then
results=(${(f@)${mapfile[.git/rebase-merge/done]%$'\n'}})
current_commit=(${=results[-1]})
action=$current_commit[1]
rebase_head=${current_commit[2]:0:7}

if ((!committed_source_code)); then
cp -r "$TESTS_DIR/$previous_commit" "$TESTS_DIR/$commit"
cp "$TESTS_DIR/album-$previous_commit" "$TESTS_DIR/album-$commit"
cp "$TESTS_DIR/tracks-$previous_commit" "$TESTS_DIR/tracks-$commit"
exit
if [[ $action =~ (fixup|squash) ]]; then
print "Skipping commit rewrite: $message"
exit
fi
if [[ -d $TESTS_DIR/$rebase_head ]]; then
if [[ ! -s .git/rebase-merge/git-rebase-todo ]] && ( (( ${#${(M)${results}:#*(edit|fixup|squash)*}} )) || (( ! ${#${(M)${results}:#*reword*}} )) ); then
print "Current (latest) commit, testing against REBASE_HEAD $rebase_head to synchronise results: $message"
before=$rebase_head
else
zmv $TESTS_DIR/'(*)'$rebase_head $TESTS_DIR/'${1}'$after
if git diff --quiet $rebase_head..$after ./beetsplug; then
print "No source code changes in comparison to $rebase_head: $message"
exit
else
print "Testing previous against new commit: $message"
fi
fi
else
print "Results from base commit $rebase_head do not exist, testing against last commit $before: $message"
fi
else
print "Testing previous against new commit: $message"
fi

cp -r "$TESTS_DIR/$previous_commit" "$TESTS_DIR/$commit"
((dirty_worktree)) && git stash
pytest -p no:randomly -k 'lib and file' -s -n 4 --base "$previous_commit" --target "$commit"
((dirty_worktree)) && git stash pop

jq '[(input_filename | sub(".*/"; "")), if .name then {before: .} else {after: .} end]' jsons/* "$TESTS_DIR/$commit"/*.json |
jq -s '
group_by(.[0]) |
map(
[.[][1]] |
add |
select(.after.album) |
"\(.before.name) -> \(.after.album)"
) |
sort |
unique[]' -r >"$TESTS_DIR/album-$commit"
jq '(.tracks//[]) | map(if .track_alt then "\(.track_alt). " else "" end + .artist + " - " + .title)[]' "$TESTS_DIR/$commit"/*.json -r >"$TESTS_DIR/tracks-$commit"

git diff --unified=0 --no-index --color-words "$TESTS_DIR/album-$previous_commit" "$TESTS_DIR/album-$commit"
git diff --unified=0 --no-index --color-words "$TESTS_DIR/tracks-$previous_commit" "$TESTS_DIR/tracks-$commit"
(( $#before )) || {
folders=(lib_tests/^[a-z]##(/:t))
pat="^(${(j:|:)folders})"
before=($(git log main~3..HEAD~ --format=%h ./beetsplug | grep -E $pat -m1))
}

git diff --quiet ./beetsplug
dirty_worktree=$?
(( dirty_worktree )) && git stash
./test_lib $before $after
(( dirty_worktree )) && git stash pop --quiet
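
The header comment of this hook says that `url2json` names each test file after the release URL, with slashes replaced by underscores. A minimal Python sketch of that naming rule follows. The helper name `url_to_json_name`, the dropped URL scheme, and the `.json` extension are assumptions made for illustration; the actual `url2json` script may behave differently.

```python
from urllib.parse import urlsplit


def url_to_json_name(url: str) -> str:
    """Hypothetical helper: derive a test-file name from a release URL."""
    parts = urlsplit(url)
    # Assumption: drop the scheme, keep host and path, swap slashes for underscores.
    flat = (parts.netloc + parts.path).strip("/").replace("/", "_")
    return f"{flat}.json"


print(url_to_json_name("https://label.bandcamp.com/album/some-release"))
```

A flat file name like this keeps every release in a single directory, which is what the `jsons/*` glob in the older version of the hook relies on.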
14 changes: 0 additions & 14 deletions .githooks/post-rewrite

This file was deleted.

2 changes: 1 addition & 1 deletion .github/workflows/build.yml
@@ -7,7 +7,7 @@ jobs:
       fail-fast: false
       matrix:
         python: ["3.8", "3.9", "3.10", "3.11", "3.12"]
-        beets: ["1.4.9", "1.5.0", "1.6.0"]
+        beets: ["1.5.0", "1.6.0", "2.0.0"]
     steps:
       - uses: actions/checkout@v4
         with:
3 changes: 2 additions & 1 deletion .gitignore
@@ -7,5 +7,6 @@ dist
htmls
lib_tests
reports
jsons
*jsons
stubs
singletons
133 changes: 129 additions & 4 deletions CHANGELOG.md
@@ -1,7 +1,131 @@
## Unreleased
## [0.20.0] 2024-11-04

### Removed

- Drop support for `beets<1.5`.

### Fixed

- `album`:
- Keep remix artist in place within 'remix' parentheses, such as **Album (Artist
Remix)**.
- When a release has the same name as the album artist, do not clean/adjust it.
- Keep album artist in album when it is immediately followed by a dot.
- Do not remove **EP** or **LP** from the beginning of the album name.
- Remove **`V.A`** from the beginning of the album name, in the same way we remove
**`VA`**.
- Do not split album with a year range into `albumartist` and `album`.

- `cleanup`:
- Remove **`(... preview)`**, **`free dl`**, **`Name Your Price:`**, **`just out!`**,
**`- Album`**, **`(Selected by ...)`** from album and track names.
- Remove Unicode HTML whitespace from incoming data.

- `albumtype`:
- Identify **LP** / **album** type from vinyl media descriptions.
- Remove some funky description parsing logic responsible for multiple **ep** false
positives.
- Resolve either **ep** or **lp** to add to `albumtypes`, and never both.
- Check for **single** album type (expecting a single track only) before anything else.

- `artist`: handle remix releases with a single title and its remixes. Instead of trying
to determine the artist from the titles, detect such release and use the given album
artist.

- `catalognum`:
- Fix false positives:
- Exclude very short matches like **OP-1**, **SK-1** and **BBC6**.
- Exclude label name, and label name without spaces.
- Exclude matches followed by a comma. This excludes many artists from release
descriptions that happen to have names that look like catalogue numbers.
- Exclude matches followed by a single quote. This used to wrongly match vinyl disc
titles like **LABEL 12** here: `LABEL 12'' Black Vinyl`.
- Prevent album artists becoming catalogue numbers.
- Remove pattern responsible for many false positives that contain a space, like
**DOOM 3**, **ONLY 1** and **NIGHT 3** etc.
- No longer assume that artist **[DRAKEN49]** is a catalogue number.
- Instead of using a pattern like `[A-Z]+-[0-9]+` _(one or more capital letters, a
dash, one or more digits)_, explicitly specify how many letters and digits are
expected for the most common variations, like **TAR30**, **RM12012**, **HEY-101**
etc.

- `media`: ignore subscription type Bandcamp media format which returns a duplicate
digital media.

- `title`:
- Remove track number from the beginning of the title more reliably.
- Remove label name from anywhere if it is inside brackets, or from the end of the title
if preceded by a dash or a colon.

- `track`:
- `artist` / `title` / `track_alt`: Handle edge cases where `track_alt` is followed by a
single dash. Some instances were previously ignored.
- `artist` / `title` / `track_alt`: fix several artists and titles which had pieces
incorrectly identified as `track_alt`.
- `artist` / `title`: In releases where every track has the same title, check whether
this title may actually be the artist name. If so, move it to the artist field.
- `artist` / `title`: use characters **`[|-–—]`** explicitly for splitting artists and
titles.
- `artist` / `title`: recover some of the original titles which contain **`-`** and got
split into `artist` and `title`.
- Return an empty tracklist for [releases that have no tracks].

[releases that have no tracks]: https://seagrave.bandcamp.com/album/interlocked
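
The `catalognum` fixes above replace an open-ended pattern with explicit letter and digit counts, and reject matches followed by a comma or a single quote. The sketch below illustrates that idea only. `LOOSE`, `STRICT`, and `find_catalognum` are hypothetical names, and the chosen ranges (2 to 4 letters, 2 to 5 digits) are assumptions for the example, not the patterns used by the plugin.

```python
import re
from typing import Optional

# The open-ended shape mentioned in the changelog: accepts very short matches.
LOOSE = re.compile(r"[A-Z]+-[0-9]+")

# Assumed illustration: explicit counts, no spaces, and a negative lookahead
# that rejects a match followed by a comma or a single quote.
STRICT = re.compile(r"\b[A-Z]{2,4}-?[0-9]{2,5}\b(?![,'])")


def find_catalognum(text: str) -> Optional[str]:
    """Return the first strict match in the text, or None."""
    match = STRICT.search(text)
    return match.group() if match else None


for sample in ["TAR30", "RM12012", "HEY-101", "OP-1", "BBC6", "LABEL 12'' Black Vinyl"]:
    print(sample, "->", find_catalognum(sample))
```

Requiring at least two digits is what drops **OP-1** and **BBC6** here, while disallowing spaces keeps **LABEL 12** from matching at all.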

### Added
- Auto-Tagger: Wider search analogous to cli search query as a fallback for bad or missing meta data

- Auto-Tagger: Wider search analogous to cli search query as a fallback for bad or missing
meta data.
- Add artist list fields support for `beets==2.0.0`.

### Updated

- `album`:
- When album name is wrapped in brackets, **[ALBUM]**, keep the brackets in place.
- Uncover some release names in the description preceded by **Title :**.

- `albumartist`:
- Remove notes about remixes, like **(incl. ABC remix)** and similar.

- `albumtype`:
- Improve accuracy of identifying **EP** and **LP** release types from the description.
- Add the **remix** albumtype to the release when the remixed track count is one less
than the total track count.
- Check album names that may end with **E.P.** instead of **EP**.

- `catalognum`:
- Add support for new formats: **`UVB76-023`**, **`SOP 061-1233`**, **`a+w lp029`**,
**`SK11X015`**.
- Parse label-like catalogue numbers for singletons too.
- When searching for a catalogue number which is prefixed by the label name:
  1. Take two variations of the label name:
     1. The original one
     2. The one without **Records**, **Recordings**, **Productions**, **Music** endings
  2. Form prefixes from each variation:
     1. The original variation
     2. With punctuation and spaces removed
     3. Its acronym when it has multiple words
  3. Lastly, if the original label has multiple words, use the first word as another
     possible prefix.

  For example, for a label named **Diffuse Reality Records**, the plugin is able to
  recognize the following catalogue numbers (case insensitively):
  - **Diffuse Reality Records**001
  - **DiffuseRealityRecords**001
  - **DRR**001
  - **Diffuse Reality**001
  - **DiffuseReality**001
  - **DR**001
  - **Diffuse**001

- Parse catalogue number from the description when the header is followed by a hash
symbol, like **CAT#: ABC-123**.
- Properly catch catalogue number suffix **RP**.
- Relax the rule that looks for a catalogue number within brackets in the release title.

- `track`:
- For tracks named like **[Remixer] - Artist - Title** move the remixer to the end:
**Artist - Title [Remixer]**.
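
The label-prefix steps listed under `catalognum` in this section can be sketched roughly as follows. This is an illustration built only from the changelog wording: `label_prefixes` and `find_by_label_prefix` are hypothetical names, and the final regular expression (prefix immediately followed by digits) is an assumption, not the plugin's implementation.

```python
import re
from typing import List, Optional

# Endings dropped to form the second label variation (taken from the changelog).
ENDINGS = re.compile(r"\s+(?:Records|Recordings|Productions|Music)$", re.IGNORECASE)


def label_prefixes(label: str) -> List[str]:
    """Hypothetical helper: list candidate catalogue number prefixes for a label."""
    variations = [label, ENDINGS.sub("", label)]
    prefixes: List[str] = []
    for name in variations:
        words = name.split()
        prefixes.append(name)  # the variation as it is
        prefixes.append(re.sub(r"[\W_]+", "", name))  # punctuation and spaces removed
        if len(words) > 1:
            prefixes.append("".join(word[0] for word in words).upper())  # acronym
    words = label.split()
    if len(words) > 1:
        prefixes.append(words[0])  # first word of a multi-word label
    # Drop duplicates and empty strings while keeping the order.
    return [prefix for prefix in dict.fromkeys(prefixes) if prefix]


def find_by_label_prefix(text: str, label: str) -> Optional[str]:
    """Assumed matching rule: any prefix directly followed by digits, ignoring case."""
    prefixes = label_prefixes(label)
    if not prefixes:
        return None
    alternatives = "|".join(re.escape(prefix) for prefix in prefixes)
    match = re.search(rf"\b(?:{alternatives})\d+\b", text, re.IGNORECASE)
    return match.group() if match else None


print(label_prefixes("Diffuse Reality Records"))
print(find_by_label_prefix("Catalogue: drr001", "Diffuse Reality Records"))
```

For **Diffuse Reality Records** the helper yields the same seven prefixes, in the same order, as the example list in the changelog.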

## [0.19.3] 2024-10-17

@@ -12,7 +136,8 @@

### Fixed

- `exclude_extra_fields`: A typo that prevented exclude configurations from being applied correctly
- `exclude_extra_fields`: A typo that prevented exclude configurations from being applied
correctly

## [0.19.2] 2024-08-04

@@ -85,7 +210,6 @@
- consider **with** and **w/** as markers for collaborating artists
- remove **`bonus -`**
- `Artist - Title (bonus - something)` -> **`Artist - Title (something)`**

[album sent to us by the devil himself]: https://examine-archive.bandcamp.com/album/va-examine-archive-international-sampler-xmn01

## [0.17.2] 2023-08-09
@@ -1124,3 +1248,4 @@ Thanks @arogl for reporting each of the above!
[0.19.1]: https://github.com/snejus/beetcamp/releases/tag/0.19.1
[0.19.2]: https://github.com/snejus/beetcamp/releases/tag/0.19.2
[0.19.3]: https://github.com/snejus/beetcamp/releases/tag/0.19.3
[0.20.0]: https://github.com/snejus/beetcamp/releases/tag/0.20.0
8 changes: 4 additions & 4 deletions beetsplug/bandcamp/__init__.py
@@ -21,10 +21,10 @@
 import logging
 import re
 from contextlib import contextmanager
-from functools import lru_cache, partial
+from functools import partial
 from itertools import chain
 from operator import itemgetter
-from typing import TYPE_CHECKING, Any, Dict, Iterable, Iterator, List, Literal, Sequence
+from typing import TYPE_CHECKING, Any, Dict, Iterable, Iterator, List, Literal

 from beets import IncludeLazyConfig, config, library, plugins

Expand Down Expand Up @@ -64,10 +64,10 @@ class BandcampRequestsHandler:
_log: logging.Logger
config: IncludeLazyConfig

def _exc(self, msg_template: str, *args: Sequence[str]) -> None:
def _exc(self, msg_template: str, *args: object) -> None:
self._log.log(logging.WARNING, msg_template, *args, exc_info=True)

def _info(self, msg_template: str, *args: Sequence[str]) -> None:
def _info(self, msg_template: str, *args: object) -> None:
self._log.log(logging.DEBUG, msg_template, *args, exc_info=False)

def _get(self, url: str) -> str:
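
The annotation change in `_exc` and `_info` from `*args: Sequence[str]` to `*args: object` is worth a short illustration. An annotation on `*args` describes each positional argument, so the old hint claimed that every argument was itself a sequence of strings, while `logging` accepts arbitrary objects as format arguments. The sketch below uses hypothetical `ListHandler` and `DemoRequestsHandler` classes and an assumed logger name; only the `_info` body mirrors the diff.

```python
import logging
from typing import List


class ListHandler(logging.Handler):
    """Collect formatted messages in memory so the result can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


class DemoRequestsHandler:
    """Hypothetical stand-in for the class in the diff, reduced to logging."""

    def __init__(self, log: logging.Logger) -> None:
        self._log = log

    def _info(self, msg_template: str, *args: object) -> None:
        # Arguments are passed through untouched; logging formats them lazily.
        self._log.log(logging.DEBUG, msg_template, *args, exc_info=False)


log = logging.getLogger("beetcamp-demo")  # assumed name, for this example only
log.setLevel(logging.DEBUG)
log.propagate = False
capture = ListHandler()
log.addHandler(capture)

# A string and an integer: both are valid under `*args: object`.
DemoRequestsHandler(log)._info("Fetched %s in %d ms", "https://example.com", 42)
print(capture.messages)
```

With the previous `Sequence[str]` hint, a type checker would be expected to flag the integer argument in the last call even though it works at runtime.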