forked from twitter/twitter-text
Updates the twitter-text parsing library to v2.0. Important changes include:

- New configuration file and JSON format.
- Support for length calculations based on weighting ranges of code points.
- Updated Java, JavaScript, Ruby, and Objective-C implementations.
- Regular expressions used for parsing Tweets and calculating length are now similar across all languages.
- Conformance to RFC 1035 for domain names.
- Support for punycoded hostnames per RFC 3490.
- Domain labels restricted to a maximum length of 63 characters, conforming to RFC 1035.
- The overall URL length cannot be more than 4096 characters.
- Hyphens allowed in the middle of a non-ASCII hostname.
- Updated conformance tests.
- Deprecated the old v1 length calculation methods.
- Differentiating between the http and https shortened length for URLs has been deprecated (https is used for all t.co URLs).
- Updated TLD list.
- Updated emoji regexes.
- Many bug fixes.

299 changed files with 52,448 additions and 2,525 deletions.

# twitter-text Configuration

twitter-text 2.0 introduces a new configuration format as well as APIs
for interpreting this configuration. The configuration is a JSON
string (or file), and parsing APIs are provided in each of
twitter-text’s four reference languages.

## Format

The configuration is a JSON object with the following properties:

* `version` (required, integer, min value 0)
* `maxWeightedTweetLength` (required, integer, min value 0)
* `scale` (required, integer, min value 1)
* `defaultWeight` (required, integer, min value 0)
* `transformedURLLength` (integer, min value 0)
* `ranges` (array of range items)

A `range item` has the following properties:

* `start` (required, integer, min value 0)
* `end` (required, integer, min value 0)
* `weight` (required, integer, min value 0)
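
The property lists above can be read as a small schema. As a minimal sketch, assuming the configuration has already been parsed with `JSON.parse`, a hypothetical `validateConfig` helper (not part of the twitter-text API) could check the required properties and documented minimums like this:

```javascript
// Hypothetical validator mirroring the documented schema; it is not
// part of the twitter-text API.
function isIntAtLeast(value, min) {
  return Number.isInteger(value) && value >= min;
}

function validateConfig(config) {
  const errors = [];

  // Required top-level integers and their documented minimums.
  const required = { version: 0, maxWeightedTweetLength: 0, scale: 1, defaultWeight: 0 };
  for (const [key, min] of Object.entries(required)) {
    if (!isIntAtLeast(config[key], min)) errors.push(`${key} must be an integer >= ${min}`);
  }

  // Optional, but must be a valid integer when present.
  if ("transformedURLLength" in config && !isIntAtLeast(config.transformedURLLength, 0)) {
    errors.push("transformedURLLength must be an integer >= 0");
  }

  // Optional array of range items, each with required start, end, and weight.
  if ("ranges" in config) {
    if (!Array.isArray(config.ranges)) {
      errors.push("ranges must be an array");
    } else {
      config.ranges.forEach((item, i) => {
        for (const key of ["start", "end", "weight"]) {
          if (!isIntAtLeast((item || {})[key], 0)) errors.push(`ranges[${i}].${key} must be an integer >= 0`);
        }
      });
    }
  }
  return errors;
}

const v1 = JSON.parse(
  '{"version": 1, "maxWeightedTweetLength": 140, "scale": 1, "defaultWeight": 1, "transformedURLLength": 23, "ranges": []}'
);
console.log(validateConfig(v1));                       // []
console.log(validateConfig({ version: 2, scale: 0 })); // three errors
```

The helper only mirrors the rules listed above; for example, it does not check that `start` is no greater than `end`, since that constraint is not stated.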

## Parameters

### version

The version of the configuration. This is an integer that will
monotonically increase in future releases. The legacy configuration is
version 1; weighted code point ranges and 280-character “long” Tweets
are supported in version 2.

### maxWeightedTweetLength

The maximum weighted length of a Tweet. Legacy v1 Tweets had a
maximum length of 140, and all characters were weighted the same. In
the new configuration format, this is represented as a
`maxWeightedTweetLength` of 140, a `scale` of 1, and a `defaultWeight`
of 1 for all code points.
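
To make the relationship between these values concrete, the following sketch computes how many code points of one uniform weight fit within the limit. `capacity` is a hypothetical helper, and the calculation ignores any rounding an implementation may apply when dividing by `scale`:

```javascript
// Hypothetical helper: how many code points of a single weight fit in
// maxWeightedTweetLength. Rounding of the final division is ignored.
function capacity(config, weight) {
  // The limit expressed in weighted units rather than Tweet-length units.
  const budget = config.maxWeightedTweetLength * config.scale;
  return Math.floor(budget / weight);
}

// Values from the included version 1 and version 2 configurations.
const v1 = { maxWeightedTweetLength: 140, scale: 1, defaultWeight: 1 };
const v2 = { maxWeightedTweetLength: 280, scale: 100, defaultWeight: 200 };

console.log(capacity(v1, v1.defaultWeight)); // 140
console.log(capacity(v2, 100));              // 280 code points from a weight-100 range
console.log(capacity(v2, v2.defaultWeight)); // 140 code points at the default weight
```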
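
### scale

The Tweet length is the weighted length divided by `scale`. Scaling
lets weights be expressed as integers: with a `scale` of 100, a weight
of 100 counts as one character and a weight of 200 counts as two.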
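
For a Tweet without URLs, the calculation can be written out as follows, where $c_1, \dots, c_n$ are the Tweet's code points and $w(c)$ is the weight of a code point. Range bounds are assumed to be inclusive here:

```math
\text{length} = \frac{1}{\texttt{scale}} \sum_{i=1}^{n} w(c_i),
\qquad
w(c) =
\begin{cases}
\texttt{weight} & \text{if } \texttt{start} \le c \le \texttt{end} \text{ for a range item,} \\
\texttt{defaultWeight} & \text{otherwise.}
\end{cases}
```

For example, under the version 2 configuration (scale 100, weight 100 for code points 0 to 4351, default weight 200), `hello` has a length of 5 × 100 / 100 = 5, while the two CJK characters `日本` (U+65E5, U+672C) have a length of 2 × 200 / 100 = 4.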

### defaultWeight

The default weight applied to every code point. It is overridden for
code points that fall within one of the range items.
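
A minimal sketch of that lookup, using a hypothetical `weightOf` helper; it assumes range bounds are inclusive and that the first matching range wins if ranges overlap:

```javascript
// Hypothetical lookup: the weight of one code point under a configuration.
// Range bounds are treated as inclusive; the first matching range wins.
function weightOf(codePoint, config) {
  for (const { start, end, weight } of config.ranges || []) {
    if (codePoint >= start && codePoint <= end) return weight; // range overrides default
  }
  return config.defaultWeight; // no range item matched
}

// defaultWeight and ranges from the included version 2 configuration.
const v2 = {
  defaultWeight: 200,
  ranges: [
    { start: 0, end: 4351, weight: 100 },
    { start: 8192, end: 8205, weight: 100 },
    { start: 8208, end: 8223, weight: 100 },
    { start: 8242, end: 8247, weight: 100 },
  ],
};

console.log(weightOf(0x61, v2));   // 100: "a" (U+0061) is in 0-4351
console.log(weightOf(0x2014, v2)); // 100: em dash (U+2014 = 8212) is in 8208-8223
console.log(weightOf(0x65e5, v2)); // 200: U+65E5 matches no range
```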

### transformedURLLength

The length counted for each URL against the total weight of the
Tweet, regardless of the URL's actual length. In previous versions of
twitter-text, this was called the “shortened URL length.”
Differentiating between the http and https shortened lengths for URLs
has been deprecated (https is used for all t.co URLs). The default
value is 23.
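
The effect is that every URL costs the same, however long it is. A minimal sketch under three assumptions: URL positions are already known (passed as hypothetical `{ start, end }` UTF-16 offsets rather than found by twitter-text's own extractor), `transformedURLLength` is multiplied by `scale` before being added to the weighted total, and every other code point has `defaultWeight`, which holds for the version 1 configuration because its `ranges` list is empty:

```javascript
// Hypothetical helper. Assumes every non-URL code point has defaultWeight
// (true for the version 1 configuration, whose ranges list is empty).
function lengthWithUrls(text, urls, config) {
  const { scale, defaultWeight } = config;
  const urlLength = config.transformedURLLength ?? 23; // documented default
  let weighted = 0;
  let pos = 0;
  // urls: sorted, non-overlapping { start, end } UTF-16 offsets (end exclusive).
  for (const { start, end } of urls) {
    weighted += [...text.slice(pos, start)].length * defaultWeight; // code points before the URL
    weighted += urlLength * scale; // the URL, whatever its actual length
    pos = end;
  }
  weighted += [...text.slice(pos)].length * defaultWeight; // code points after the last URL
  return weighted / scale;
}

const v1 = { scale: 1, defaultWeight: 1, transformedURLLength: 23 };
const url = "https://example.com/a/rather/long/path/to/an/article";
const text = "Read this: " + url;
const urls = [{ start: text.indexOf(url), end: text.length }];

console.log(text.length);                    // 63 characters as typed
console.log(lengthWithUrls(text, urls, v1)); // 34: 11 for "Read this: " + 23 for the URL
```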

### ranges

An array of range items that describe ranges of Unicode code points
and the weight to apply to each code point in them. Each range is
defined by its `start`, `end`, and `weight`. A surrogate pair is
counted once, with the weight of the first code unit in the pair.
Note that certain graphemes are the result of joining code points
together, such as with a zero-width joiner; unlike a surrogate pair,
the length of such a grapheme is the sum of the weighted lengths of
all the code points it contains.
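
The surrogate-pair and joiner rules can be sketched as follows. The helpers are hypothetical; the weight of a surrogate pair is looked up by its first (high-surrogate) code unit, as described above, and range bounds are assumed inclusive:

```javascript
// Hypothetical weight lookup: first range containing the value wins.
function weightOf(value, config) {
  const range = (config.ranges || []).find((r) => value >= r.start && value <= r.end);
  return range ? range.weight : config.defaultWeight;
}

const isHigh = (u) => u >= 0xd800 && u <= 0xdbff;
const isLow = (u) => u >= 0xdc00 && u <= 0xdfff;

// Sums weights over UTF-16 code units, treating a surrogate pair as one
// unit weighted by its first (high) code unit.
function weightedLength(text, config) {
  let total = 0;
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    total += weightOf(unit, config);
    if (isHigh(unit) && i + 1 < text.length && isLow(text.charCodeAt(i + 1))) {
      i++; // skip the low surrogate: the pair has already been counted once
    }
  }
  return total;
}

const v2 = {
  scale: 100,
  defaultWeight: 200,
  ranges: [
    { start: 0, end: 4351, weight: 100 },
    { start: 8192, end: 8205, weight: 100 },
    { start: 8208, end: 8223, weight: 100 },
    { start: 8242, end: 8247, weight: 100 },
  ],
};

const samples = {
  "hi": "hi",
  "grinning face": "\u{1F600}",                                   // one surrogate pair
  "woman technologist (ZWJ sequence)": "\u{1F469}\u200D\u{1F4BB}", // pair + ZWJ + pair
};
for (const [name, text] of Object.entries(samples)) {
  console.log(name, weightedLength(text, v2) / v2.scale);
}
// hi 2
// grinning face 2
// woman technologist (ZWJ sequence) 5
```

With the version 2 ranges, the zero-width joiner (U+200D = 8205) falls in the 8192 to 8205 range and weighs 100, so the joined sequence counts 200 + 100 + 200 = 500 weighted units, a length of 5.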

## API

Each of the four reference language implementations provides a way to
read the JSON configuration.

### Java

```java
public static TwitterTextConfiguration configurationFromJson(@Nonnull String json, boolean isResource)
```

* `json`: the configuration string, or a file name in the config directory (see `isResource`).
* `isResource`: if `true`, `json` is treated as the file name of a configuration resource.

### JavaScript

Configurations are accessed via `twttr.txt.configs` (for example,
`twttr.txt.configs.version2`). The chosen configuration is passed as
an argument to `parseTweet`:

```js
twttr.txt.parseTweet(inputText, twttr.txt.configs.version2)
```

### Objective-C

The Objective-C implementation provides two methods for reading the
configuration, either from a JSON string or from a file resource.

```objective-c
+ (instancetype)configurationFromJSONResource:(NSString *)jsonResource;
+ (instancetype)configurationFromJSONString:(NSString *)jsonString;
```

The default parser configuration can also be set:

```objective-c
+ (void)setDefaultParserConfiguration:(TwitterTextConfiguration *)configuration;
```

The resource string names one of the two included configuration files
(which are referenced in the Xcode project).

### Ruby

Ruby provides the `Twitter::Configuration` class, with methods for
reading a configuration from a file or a string:

```ruby
def self.parse_string(string, options = {})
def self.parse_file(filename)
```

You can use `configuration_from_file()`, or initialize a configuration
with `Twitter::Configuration.new(config)`, where `config` is the
output of one of the two methods above.

The included version 1 configuration:

```json
{
  "version": 1,
  "maxWeightedTweetLength": 140,
  "scale": 1,
  "defaultWeight": 1,
  "transformedURLLength": 23,
  "ranges": []
}
```

The included version 2 configuration:

```json
{
  "version": 2,
  "maxWeightedTweetLength": 280,
  "scale": 100,
  "defaultWeight": 200,
  "transformedURLLength": 23,
  "ranges": [
    {
      "start": 0,
      "end": 4351,
      "weight": 100
    },
    {
      "start": 8192,
      "end": 8205,
      "weight": 100
    },
    {
      "start": 8208,
      "end": 8223,
      "weight": 100
    },
    {
      "start": 8242,
      "end": 8247,
      "weight": 100
    }
  ]
}
```
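
The decimal bounds in the version 2 ranges are easier to recognize as Unicode code points. This short sketch uses hypothetical helpers, `toUPlus` and `describe`, to print each range in U+ notation:

```javascript
// Hypothetical helpers: show a range item's decimal bounds as Unicode code points.
const toUPlus = (n) => "U+" + n.toString(16).toUpperCase().padStart(4, "0");
const describe = ({ start, end, weight }) => `${toUPlus(start)}-${toUPlus(end)} weight ${weight}`;

// The ranges from the version 2 configuration.
const v2Ranges = [
  { start: 0, end: 4351, weight: 100 },
  { start: 8192, end: 8205, weight: 100 },
  { start: 8208, end: 8223, weight: 100 },
  { start: 8242, end: 8247, weight: 100 },
];

v2Ranges.forEach((r) => console.log(describe(r)));
// U+0000-U+10FF weight 100
// U+2000-U+200D weight 100
// U+2010-U+201F weight 100
// U+2032-U+2037 weight 100
```

Read this way, the weight-100 ranges cover U+0000 to U+10FF (Basic Latin through Georgian), U+2000 to U+200D (spaces and zero-width characters, including the zero-width joiner), U+2010 to U+201F (dashes and quotation marks), and U+2032 to U+2037 (primes). Every other code point, including CJK ideographs and emoji, falls back to the default weight of 200.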