regenerate the mime tables from apache httpd; general refactorings #2042
Package publishing
Documentation at https://github.com/dart-lang/ecosystem/wiki/Publishing-automation.
PR Health
Breaking changes
| Package | Change | Current Version | New Version | Needed Version | Looking good? | 
|---|---|---|---|---|---|
| mime | Breaking | 2.0.0 | 2.0.0 | 3.0.0 | Got "2.0.0" expected >= "3.0.0" (breaking changes) |
This check can be disabled by tagging the PR with skip-breaking-check.
Changelog Entry ❗
| Package | Changed Files | 
|---|---|
| package:mime | pkgs/mime/lib/mime.dart pkgs/mime/lib/src/extension.dart pkgs/mime/lib/src/magic_number.dart pkgs/mime/lib/src/magic_numbers.dart pkgs/mime/lib/src/mime.dart pkgs/mime/lib/src/mime_shared.dart pkgs/mime/lib/src/mime_tables.g.dart pkgs/mime/lib/src/mime_type.dart pkgs/mime/lib/src/multipart_stream.dart pkgs/mime/lib/src/multipart_transformer.dart pkgs/mime/pubspec.yaml | 
Changes to files need to be accounted for in their respective changelogs.
This check can be disabled by tagging the PR with skip-changelog-check.
Coverage ⚠️ 
| File | Coverage | 
|---|---|
| pkgs/mime/lib/mime.dart | 💔 Not covered | 
| pkgs/mime/lib/src/magic_numbers.dart | 💚 100 % | 
| pkgs/mime/lib/src/mime.dart | 💚 83 % | 
| pkgs/mime/lib/src/mime_shared.dart | 💚 33 % | 
| pkgs/mime/lib/src/mime_type.dart | 💔 76 % ⬇️ 15 % | 
| pkgs/mime/lib/src/multipart_stream.dart | 💚 89 % | 
| pkgs/mime/lib/src/multipart_transformer.dart | 💚 100 % | 
| pkgs/mime/tool/generate_markdown.dart | 💔 Not covered | 
| pkgs/mime/tool/update_media_types.dart | 💔 Not covered | 
This check for test coverage is informational (issues shown here will not fail the PR).
This check can be disabled by tagging the PR with skip-coverage-check.
API leaks ✔️
The following packages contain symbols visible in the public API, but not exported by the library. Export these symbols or remove them from your publicly visible API.
| Package | Leaked API symbols | 
|---|---|
License Headers ✔️
```dart
// Copyright (c) 2025, the Dart project authors. Please see the AUTHORS file
// for details. All rights reserved. Use of this source code is governed by a
// BSD-style license that can be found in the LICENSE file.
```
| Files | 
|---|
| no missing headers | 
All source files should start with a license header.
Unrelated files missing license headers
| Files | 
|---|
| pkgs/bazel_worker/benchmark/benchmark.dart | 
| pkgs/bazel_worker/example/client.dart | 
| pkgs/bazel_worker/example/worker.dart | 
| pkgs/benchmark_harness/integration_test/perf_benchmark_test.dart | 
| pkgs/boolean_selector/example/example.dart | 
| pkgs/clock/lib/clock.dart | 
| pkgs/clock/lib/src/clock.dart | 
| pkgs/clock/lib/src/default.dart | 
| pkgs/clock/lib/src/stopwatch.dart | 
| pkgs/clock/lib/src/utils.dart | 
| pkgs/clock/test/clock_test.dart | 
| pkgs/clock/test/default_test.dart | 
| pkgs/clock/test/stopwatch_test.dart | 
| pkgs/clock/test/utils.dart | 
| pkgs/coverage/lib/src/coverage_options.dart | 
| pkgs/coverage/test/collect_coverage_config_test.dart | 
| pkgs/coverage/test/config_file_locator_test.dart | 
| pkgs/html/example/main.dart | 
| pkgs/html/lib/dom.dart | 
| pkgs/html/lib/dom_parsing.dart | 
| pkgs/html/lib/html_escape.dart | 
| pkgs/html/lib/parser.dart | 
| pkgs/html/lib/src/constants.dart | 
| pkgs/html/lib/src/encoding_parser.dart | 
| pkgs/html/lib/src/html_input_stream.dart | 
| pkgs/html/lib/src/list_proxy.dart | 
| pkgs/html/lib/src/query_selector.dart | 
| pkgs/html/lib/src/token.dart | 
| pkgs/html/lib/src/tokenizer.dart | 
| pkgs/html/lib/src/treebuilder.dart | 
| pkgs/html/lib/src/utils.dart | 
| pkgs/html/test/dom_test.dart | 
| pkgs/html/test/parser_feature_test.dart | 
| pkgs/html/test/parser_test.dart | 
| pkgs/html/test/query_selector_test.dart | 
| pkgs/html/test/selectors/level1_baseline_test.dart | 
| pkgs/html/test/selectors/level1_lib.dart | 
| pkgs/html/test/selectors/selectors.dart | 
| pkgs/html/test/support.dart | 
| pkgs/html/test/tokenizer_test.dart | 
| pkgs/pubspec_parse/test/git_uri_test.dart | 
| pkgs/stack_trace/example/example.dart | 
| pkgs/watcher/test/custom_watcher_factory_test.dart | 
| pkgs/yaml_edit/example/example.dart | 
```dart
  ),
];

Uint8List hex(String encoded) {
```
Consider dropping the hex and allowing the list to be constant.
Use string literals with `\x` hex escapes, like:
```dart
MagicNumber(
    'image/heif',
    '\x00\x00\x00\x00\x66\x74\x79\x70\x6D\x69\x66\x31',
    mask: '\x00\x00\x00\x00\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF',
),
```
Having the list be constant means you won't pay for converting every string to a `Uint8List` on first access.
Then work directly off the strings, using `codeUnitAt` instead of `List.[]`.
(I considered making the hex parsing lazy, but you're likely to access every element if you try to magic-number a random file.)
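A minimal sketch of what that could look like, assuming each string holds one byte per code unit and that `mask` (when present) has the same length as the pattern. The class shape and `mediaTypeFromHeader` are illustrative, not the PR's actual code:

```dart
/// Sketch: a magic-number entry stored as string literals (one code unit
/// per byte, 0x00-0xFF), so a whole table of them can be `const`.
class MagicNumber {
  final String mimeType;
  final String numbers;
  final String? mask;

  const MagicNumber(this.mimeType, this.numbers, {this.mask});

  /// True if [header] starts with [numbers] once each header byte is ANDed
  /// with the corresponding [mask] code unit (when a mask is given).
  bool matches(List<int> header) {
    if (header.length < numbers.length) return false;
    final m = mask;
    for (var i = 0; i < numbers.length; i++) {
      final byte = m == null ? header[i] : header[i] & m.codeUnitAt(i);
      if (byte != numbers.codeUnitAt(i)) return false;
    }
    return true;
  }
}

// A const list: no per-entry hex parsing happens at runtime.
const magicNumbers = <MagicNumber>[
  MagicNumber('application/pdf', '\x25\x50\x44\x46'), // "%PDF"
  MagicNumber(
    'image/heif',
    '\x00\x00\x00\x00\x66\x74\x79\x70\x6D\x69\x66\x31',
    mask: '\x00\x00\x00\x00\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF',
  ),
];

/// Returns the media type of the first matching entry, or null.
String? mediaTypeFromHeader(List<int> header) {
  for (final magic in magicNumbers) {
    if (magic.matches(header)) return magic.mimeType;
  }
  return null;
}
```

Because the list is `const`, the only runtime work left is the comparison loop, which reads `codeUnitAt` directly.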
```dart
  final result = Uint8List(encoded.length ~/ 2);
  for (var i = 0; i < result.length; i++) {
    final offset = i * 2;
    result[i] = int.parse(encoded.substring(offset, offset + 2), radix: 16);
```
I think we have a hex codec somewhere.
... found it: https://pub.dev/documentation/convert/latest/convert/HexCodec-class.html
Don't know if it's worth using. This code can definitely be optimized.
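If the helper stays, one dependency-free way to optimize it is to read code units directly instead of allocating a two-character substring and calling `int.parse` for every byte. This sketch keeps the existing `hex` name; the `_hexDigit` helper and the `FormatException` handling are assumptions, and the `HexCodec` linked above remains the off-the-shelf option:

```dart
import 'dart:typed_data';

/// Value of one ASCII hex digit (0-9, a-f, A-F); throws otherwise.
int _hexDigit(int codeUnit) {
  if (codeUnit >= 0x30 && codeUnit <= 0x39) return codeUnit - 0x30; // '0'-'9'
  final lower = codeUnit | 0x20; // folds 'A'-'F' onto 'a'-'f'
  if (lower >= 0x61 && lower <= 0x66) return lower - 0x61 + 10; // 'a'-'f'
  throw FormatException('Invalid hex digit', String.fromCharCode(codeUnit));
}

/// Decodes two hex digits per byte without substrings or int.parse.
Uint8List hex(String encoded) {
  if (encoded.length.isOdd) {
    throw FormatException('Odd number of hex digits', encoded);
  }
  final result = Uint8List(encoded.length ~/ 2);
  for (var i = 0; i < result.length; i++) {
    final high = _hexDigit(encoded.codeUnitAt(2 * i));
    final low = _hexDigit(encoded.codeUnitAt(2 * i + 1));
    result[i] = (high << 4) | low;
  }
  return result;
}
```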
```dart
/// than [magicNumbersMaxLength] bytes was provided, some magic-numbers won't
/// be matched against.
String? lookup(String path, {List<int>? headerBytes}) {
String? lookupMimeType(String path, {List<int>? headerBytes}) {
```
If we are adding new members, is now a good time to change the name to lookupMediaType?
```dart
    result = extensionToMime[ext];
    if (result != null) return result;
  }
  return null;
```
Can be reduced to:
```dart
return _extensionToMime[ext] ?? (_useDefault ? extensionToMime[ext] : null);
```
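For context, here is the reduced form inside a small hypothetical class, assuming `_extensionToMime` holds user-registered mappings, `_useDefault` controls the fallback, and a tiny `extensionToMime` map stands in for the generated default table. `Resolver` and `lookupByExtension` are placeholder names:

```dart
/// Stand-in for the generated default table (illustrative entries only).
const extensionToMime = {'txt': 'text/plain', 'html': 'text/html'};

class Resolver {
  final Map<String, String> _extensionToMime; // user-registered mappings
  final bool _useDefault; // whether to fall back to the default table

  Resolver(this._extensionToMime, {bool useDefault = true})
      : _useDefault = useDefault;

  /// Custom mappings win; otherwise consult the default table only when
  /// [_useDefault] is set.
  String? lookupByExtension(String ext) =>
      _extensionToMime[ext] ?? (_useDefault ? extensionToMime[ext] : null);
}
```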
```dart
  for (final entry in _extensionToMime.entries) {
    if (entry.value == mimeType) {
      return entry.key;
```
This returns the first entry with that media type as value.
Is that guaranteed to be the canonical one? (Probably, if it's a linked hash map and the canonical one was added first. But might be worth mentioning.)
I'm certain I saw wording to that effect in one of the mime registries I was looking at, but don't see that in the docs for the apache httpd mime list. Their mime registry may not have an idea of a canonical file ext. :(
That would be the android one: https://android.googlesource.com/platform/frameworks/base/+/769f04c8f03a/mime/java-res/android.mime.types
"defines a mapping from one MIME type to the first of the listed extensions,"
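If the first-listed extension is accepted as the default, one way to make that assumption explicit (and avoid a linear scan on every call) is to build the reverse map once with `putIfAbsent`. This relies on Dart's default `Map` being a `LinkedHashMap`, which iterates in insertion order; it only preserves whatever order the generator wrote, so it does not by itself make that extension canonical. `buildMimeToExtension` is a hypothetical helper:

```dart
/// Builds a media type -> extension map, keeping the first extension seen
/// for each media type. With a default (insertion-ordered) Map as input,
/// "first" means first in the generated table.
Map<String, String> buildMimeToExtension(Map<String, String> extensionToMime) {
  final result = <String, String>{};
  extensionToMime.forEach((ext, mime) {
    result.putIfAbsent(mime, () => ext); // later extensions are ignored
  });
  return result;
}
```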
```dart
  group('defaultExtensionMap', () {
    test('keys are lowercase', () {
      for (final key in defaultExtensionMap.keys) {
      for (final key in extensionToMime.keys) {
```
Consider not introducing "Mime" into the name (rather than "Media") when the original name didn't contain "Mime" at all. It's a free "Media"!
```dart
  final response = await http.get(Uri.parse(mimeTypesUrl));

  final lines = response.body
      .split('\n')
```
Use LineSplitter.
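A sketch of that, assuming the generator only needs the non-blank, non-comment lines of `mime.types` (the httpd file appears to list many unregistered types as `#`-prefixed lines). `LineSplitter.split` from `dart:convert` handles `\n`, `\r\n`, and `\r` endings; `mimeTypesLines` is a hypothetical helper name:

```dart
import 'dart:convert';

/// Yields the trimmed, non-empty, non-comment lines of a mime.types body.
/// Unlike split('\n'), LineSplitter does not leave a trailing '\r' behind
/// on files with Windows-style line endings.
Iterable<String> mimeTypesLines(String body) sync* {
  for (final line in LineSplitter.split(body)) {
    final trimmed = line.trim();
    if (trimmed.isEmpty || trimmed.startsWith('#')) continue;
    yield trimmed;
  }
}
```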
```dart
  final wsRegex = RegExp(' +');
  for (var line in lines) {
    line = line.replaceAll('\t', ' ').replaceAll(wsRegex, ' ');
```
That can just be `line = line.replaceAll(RegExp(r'[ \t]{2,}'), ' ');`
```dart
      'https://raw.githubusercontent.com/apache/httpd/refs/heads/trunk/docs/conf/mime.types';

  print('Reading from $mimeTypesUrl ...');
  final response = await http.get(Uri.parse(mimeTypesUrl));
```
I'd consider not having the script automatically fetch from someone else's server. If we have some scheduling accident, we might start causing significant load.
Consider instead downloading the file into the repo manually, and generating from that file.
Or at least have a default mode where you generate from the downloaded file, so you have to pass a flag to download a new one.
Then you won't repeatedly download while tinkering with the script.
(That's what I do in package:characters to not repeatedly fetch from the Unicode site. It requires human action to actually reach out to a server.)
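A sketch of the generate-from-local-file default, assuming a checked-in copy of `mime.types` and a hypothetical `--download` flag that is the only path to the network. It uses `dart:io`'s `HttpClient` rather than `package:http` to stay self-contained; the function names and error messages are illustrative:

```dart
import 'dart:convert';
import 'dart:io';

const mimeTypesUrl =
    'https://raw.githubusercontent.com/apache/httpd/refs/heads/trunk/docs/conf/mime.types';

/// Reads the checked-in copy. Never touches the network.
String readCachedMimeTypes(File cache) {
  if (!cache.existsSync()) {
    throw StateError('${cache.path} not found; re-run with --download.');
  }
  return cache.readAsStringSync();
}

/// Fetches a fresh copy and overwrites [cache]. Only reached when a human
/// explicitly passes --download.
Future<String> downloadMimeTypes(File cache) async {
  final client = HttpClient();
  try {
    final request = await client.getUrl(Uri.parse(mimeTypesUrl));
    final response = await request.close();
    if (response.statusCode != HttpStatus.ok) {
      throw HttpException('GET failed with ${response.statusCode}',
          uri: Uri.parse(mimeTypesUrl));
    }
    final body = await response.transform(utf8.decoder).join();
    cache.writeAsStringSync(body);
    return body;
  } finally {
    client.close();
  }
}

/// Generates from the local file by default; downloading requires the flag.
Future<String> loadMimeTypes(List<String> args, File cache) async {
  if (args.contains('--download')) return downloadMimeTypes(cache);
  return readCachedMimeTypes(cache);
}
```

The generator's `main(args)` would then call `loadMimeTypes(args, File(...))` with wherever the checked-in copy lives, so tinkering with the script never re-fetches the file.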
```dart
  for (var line in lines) {
    line = line.replaceAll('\t', ' ').replaceAll(wsRegex, ' ');

    final segments = line.split(' ');
```
And we can combine this with the split too, getting `final segments = line.split(RegExp(r'[\t ]+'));`
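Splitting once on `[\t ]+` also covers a lone tab between fields, which a `[ \t]{2,}` replacement alone would leave in place. A sketch of the per-line parse plus building the extension table, assuming comment and blank lines were already filtered out; `parseMimeTypesLine` and `buildExtensionTable` are hypothetical names, and the test inputs are made-up sample lines:

```dart
/// Any run of tabs and/or spaces separates fields.
final _fieldSeparator = RegExp(r'[\t ]+');

/// Parses one non-comment mime.types line into (media type, extensions).
/// A line with only a media type yields an empty extension list.
MapEntry<String, List<String>> parseMimeTypesLine(String line) {
  final segments = line.trim().split(_fieldSeparator);
  return MapEntry(segments.first, segments.sublist(1));
}

/// Builds an extension -> media type table; extension keys are lowercased.
Map<String, String> buildExtensionTable(Iterable<String> lines) {
  final table = <String, String>{};
  for (final line in lines) {
    final entry = parseMimeTypesLine(line);
    for (final ext in entry.value) {
      table[ext.toLowerCase()] = entry.key;
    }
  }
  return table;
}
```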
```markdown
| `audio/midi` | `mid` | `kar`, `midi`, `rmi` |
| `audio/mp4` | `m4a` | `m4b`, `mp4a` |
| `audio/mp4` | `m4a` | `mp4a` |
| `audio/mpeg` | `mpga` | `m2a`, `m3a`, `mp2`, `mp2a`, `mp3` |
```
See #2048
We might not want to trust that the first extension is the best default extension.
They seem to be ordered alphabetically, so the file is a mapping from extension to file type, not the other direction.
For some file types, some extensions are definitely better as default than others, like mp3 or ogg.
For others, it doesn't even make sense to have a default. See, for example, `application/x-msdownload`: those are completely different kinds of files, and assigning the wrong extension would be a mistake, but to a web server they are all things you want to download on Windows.
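One possible shape for that, assuming the generator keeps a small hand-maintained list of preferred defaults plus a list of types that get no default at all. `preferredExtension`, `noDefaultExtension`, and `defaultExtensionFor` are hypothetical, and the entries are only the examples mentioned above:

```dart
/// Hand-picked defaults that override "first extension in the file".
const preferredExtension = {
  'audio/mpeg': 'mp3',
  'audio/ogg': 'ogg',
};

/// Types whose extensions describe unrelated kinds of files, so no single
/// default would be correct.
const noDefaultExtension = {'application/x-msdownload'};

/// Picks the default extension for [mimeType] from its listed [extensions].
String? defaultExtensionFor(String mimeType, List<String> extensions) {
  if (noDefaultExtension.contains(mimeType)) return null;
  final preferred = preferredExtension[mimeType];
  // Only honor an override if the source data actually lists it.
  if (preferred != null && extensions.contains(preferred)) return preferred;
  return extensions.isEmpty ? null : extensions.first;
}
```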
@devoncarew is there still an intention to land this? Or is this superseded by #2063?
Yes; this PR was more exploratory.
(note that this isn't intended to land; it's more to discuss potential changes)
Regenerate the mime tables from the apache httpd mapping and other general refactorings:
- `lib/src/mime_tables.g.dart`
- `lib/src/mime.dart`
- `MimeTypeResolver`: have a similar API to the current two top-level library symbols, `lookupMimeType()` and `extensionFromMime()`

I think instead of looking to make the top-level `lookupMimeType()` and `extensionFromMime()` methods something you can add mime types to, we want to have everything accessed through a mime registry, and by default provide a global instance of that. The registry could be configurable at construction time. `MimeTypeResolver` is somewhat similar to that, except that you can mutate it after construction. If we were to use it as our mime registry, the global instance would be mutable, which seems less desirable.

Related to #2028; cc @lrhn.
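A minimal sketch of that registry shape, assuming configuration happens only at construction and there are no mutators afterwards. `MediaTypeRegistry`, `lookupMediaType`, `extensionFromMediaType`, and `defaultRegistry` are hypothetical names rather than package:mime API, and the two-entry base table stands in for the generated `mime_tables.g.dart`:

```dart
/// Immutable registry: configured once, so a shared global is safe.
class MediaTypeRegistry {
  final Map<String, String> _extensionToType;
  final Map<String, String> _typeToExtension;

  MediaTypeRegistry._(this._extensionToType, this._typeToExtension);

  /// Builds from [base] plus [extra]; [extra] wins on conflicts.
  factory MediaTypeRegistry({
    Map<String, String> base = const {},
    Map<String, String> extra = const {},
  }) {
    final byExtension = <String, String>{};
    for (final entry in [...base.entries, ...extra.entries]) {
      byExtension[entry.key.toLowerCase()] = entry.value;
    }
    // Reverse map keeps the first extension seen for each type.
    final byType = <String, String>{};
    byExtension.forEach((ext, type) => byType.putIfAbsent(type, () => ext));
    return MediaTypeRegistry._(
      Map<String, String>.unmodifiable(byExtension),
      Map<String, String>.unmodifiable(byType),
    );
  }

  String? lookupMediaType(String path) {
    final dot = path.lastIndexOf('.');
    // No extension if the last '/'-separated segment has no dot, or it trails.
    if (dot <= path.lastIndexOf('/') || dot == path.length - 1) return null;
    return _extensionToType[path.substring(dot + 1).toLowerCase()];
  }

  String? extensionFromMediaType(String mediaType) =>
      _typeToExtension[mediaType];
}

/// Global default instance (stand-in table).
final defaultRegistry = MediaTypeRegistry(base: const {
  'html': 'text/html',
  'txt': 'text/plain',
});
```

Callers who need extra types build their own instance (`MediaTypeRegistry(base: ..., extra: {...})`) instead of mutating the shared one.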