Initial commit of twitter-text 2.0
Updates the twitter-text parsing library v2.0. Important changes
include:

- New configuration file and JSON format.
- Support for length calculations based on weighting ranges of
  code points.
- Updates Java, JavaScript, Ruby, and Objective-C implementations.
- Regular expressions used for parsing Tweets and calculating length
  are now similar across all languages.
- Conform to RFC 1035 for domain names.
- Support for punycoded hostnames per RFC 3490.
- Domain labels restricted to a maximum length of 63 characters,
  conforming to RFC 1035.
- The overall URL length cannot be more than 4096 characters.
- Allow hyphens in the middle of a non-ASCII hostname.
- Updated conformance tests.
- Deprecates old v1 length calculation methods.
- Differentiating between the http and https shortened length for URLs
  has been deprecated (https is used for all t.co URLs).
- Update TLDs list.
- Update Emoji Regexes.
- Many bug fixes.
sudhee committed Dec 15, 2017
1 parent f81bbba commit 34dc1dd
Showing 299 changed files with 52,448 additions and 2,525 deletions.
9 changes: 2 additions & 7 deletions README.md
@@ -1,9 +1,7 @@
[![Build Status](https://img.shields.io/travis/twitter/twitter-text/master.svg)](https://travis-ci.org/twitter/twitter-text) [![Maven Central](https://img.shields.io/maven-central/v/com.twitter/twitter-text.svg)](http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22com.twitter%22%20AND%20a%3A%22twitter-text%22) [![Gem](https://img.shields.io/gem/v/twitter-text.svg)](https://rubygems.org/gems/twitter-text) [![npm](https://img.shields.io/npm/v/twitter-text.svg)](https://www.npmjs.com/package/twitter-text) [![CocoaPods](https://img.shields.io/cocoapods/v/twitter-text.svg)](http://cocoapods.org/?q=twitter-text) [![Bower](https://img.shields.io/bower/v/twitter-text.svg)](http://bower.io/search/?q=twitter-text)

twitter-text
============

This repo is a collection of libraries and conformance tests to standardize parsing of Tweet text. It synchronizes development, testing, creating issues, and pull requests for twitter-text's implementations and specification. These libraries are responsible for determining the quantity of characters in a Tweet and identifying and linking any url, @username, #hashtag, or $cashtag.

See implementations and conformance in this repo below:

@@ -13,11 +11,8 @@ See implementations and conformance in this repo below:
* [JavaScript](js)
* [Objective-C](objc)

:warning: Note that a new version of twitter-text will be released soon. See the [announcement on the Twitter Developer forums](https://twittercommunity.com/t/updating-the-character-limit-and-the-twitter-text-library/96425/2), and [documentation regarding the changes](https://developer.twitter.com/en/docs/developer-utilities/twitter-text).


## Copyright and License

Copyright 2014 Twitter, Inc and other contributors
Copyright 2017 Twitter, Inc and other contributors

Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0
134 changes: 134 additions & 0 deletions config/README.md
@@ -0,0 +1,134 @@
# twitter-text Configuration

twitter-text 2.0 introduces a new configuration format as well as APIs
for interpreting this configuration. The configuration is a JSON
string (or file) and the parsing APIs have been provided in each of
twitter-text’s four reference languages.

## Format

The configuration format is a JSON string. The JSON can have the following properties:

* `version` (required, integer, min value 0)
* `maxWeightedTweetLength` (required, integer, min value 0)
* `scale` (required, integer, min value 1)
* `defaultWeight` (required, integer, min value 0)
* `transformedURLLength` (integer, min value 0)
* `ranges` (array of range items)

A `range item` has the following properties:

* `start` (required, integer, min value 0)
* `end` (required, integer, min value 0)
* `weight` (required, integer, min value 0)
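
The property constraints listed above can be checked mechanically. The following is a minimal sketch, not part of twitter-text: `validateConfig` is a hypothetical helper name, and treating a missing `ranges` property as an empty array is an assumption.

```js
// Hypothetical helper (not a twitter-text API): checks a parsed configuration
// object against the property list above and returns a list of problems.
function validateConfig(config) {
  const errors = [];
  const isIntAtLeast = (value, min) => Number.isInteger(value) && value >= min;

  // [property name, minimum value, required?]
  const fields = [
    ['version', 0, true],
    ['maxWeightedTweetLength', 0, true],
    ['scale', 1, true],
    ['defaultWeight', 0, true],
    ['transformedURLLength', 0, false],
  ];
  for (const [name, min, required] of fields) {
    const value = config[name];
    if (value === undefined) {
      if (required) errors.push(`${name} is required`);
    } else if (!isIntAtLeast(value, min)) {
      errors.push(`${name} must be an integer >= ${min}`);
    }
  }

  // Assumption: an absent `ranges` property is treated as an empty array.
  const ranges = config.ranges === undefined ? [] : config.ranges;
  if (!Array.isArray(ranges)) {
    errors.push('ranges must be an array');
  } else {
    ranges.forEach((range, i) => {
      const item = range || {};
      for (const key of ['start', 'end', 'weight']) {
        if (!isIntAtLeast(item[key], 0)) {
          errors.push(`ranges[${i}].${key} must be an integer >= 0`);
        }
      }
    });
  }
  return errors;
}

const v1 = JSON.parse(
  '{"version":1,"maxWeightedTweetLength":140,"scale":1,' +
  '"defaultWeight":1,"transformedURLLength":23,"ranges":[]}'
);
console.log(validateConfig(v1)); // []
console.log(validateConfig({ version: 2, scale: 0 })); // three error messages
```

Returning a list of messages rather than throwing on the first problem makes it easier to report every issue in a hand-edited configuration at once.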

## Parameters

### version

The version for the configuration string. This is an integer that will
monotonically increase in future releases. The legacy version of the
string is version 1; weighted code point ranges and 280-character
“long” tweets are supported in version 2.

### maxWeightedTweetLength

The maximum length of the tweet, weighted. Legacy v1 tweets had a
maximum weighted length of 140 and all characters were weighted the
same. In the new configuration format, this is represented as a
maxWeightedTweetLength of 140 and a defaultWeight of 1 for all code
points.

### scale

The Tweet length is the weighted length divided by the scale
(`weighted length` / `scale`).
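
As a worked example of this division, the sketch below uses the values from the version 2 configuration (`scale` 100, `defaultWeight` 200). The helper name `tweetLength` is hypothetical, and comparing the result against `maxWeightedTweetLength` is an assumption about how the limit is applied.

```js
// Hypothetical helper: converts a raw sum of code point weights into the
// Tweet length by dividing by the configured scale.
function tweetLength(weightedSum, config) {
  return weightedSum / config.scale;
}

const v2 = { maxWeightedTweetLength: 280, scale: 100, defaultWeight: 200 };

// 10 code points with weight 100, plus 5 code points at the default weight.
const weightedSum = 10 * 100 + 5 * v2.defaultWeight; // 2000

console.log(tweetLength(weightedSum, v2)); // 20
// Assumption: the limit is compared against the scaled length.
console.log(tweetLength(weightedSum, v2) <= v2.maxWeightedTweetLength); // true
```

Keeping weights as integers and dividing once at the end lets the configuration express fractional per-character costs without floating-point weights.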

### defaultWeight

The default weight applied to all code points. Range items can
override it for specific ranges of code points.

### transformedURLLength

The length counted for each URL against the total weight of the
Tweet. In previous versions of twitter-text, this was called the
“shortened URL length.” Differentiating between the http and https
shortened length for URLs has been deprecated (https is used for all
t.co URLs). The default value is 23.
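
The effect of a fixed URL length can be sketched as follows. This is only an illustration under stated assumptions: `lengthWithUrls` is a hypothetical helper, the URL pattern is deliberately simplistic and is not the regular expression twitter-text uses, per-range weights are ignored, and multiplying `transformedURLLength` by `scale` (so that each URL contributes exactly that many characters to the final length) is an assumption.

```js
// Hypothetical helper, not a twitter-text API. Every URL counts as
// transformedURLLength, however long the URL actually is.
function lengthWithUrls(text, config) {
  // Simplistic stand-in for URL detection (assumption, not the real regex).
  const urlPattern = /https?:\/\/\S+/g;
  const urls = text.match(urlPattern) || [];
  const remainder = text.replace(urlPattern, '');

  // Code points outside URLs, counted at the default weight only.
  const textWeight = Array.from(remainder).length * config.defaultWeight;
  // Assumption: URL length is expressed in final characters, so scale it up.
  const urlWeight = urls.length * config.transformedURLLength * config.scale;

  return (textWeight + urlWeight) / config.scale;
}

const v1 = { scale: 1, defaultWeight: 1, transformedURLLength: 23 };
const text = 'Read https://example.com/a/very/long/path now';
console.log(lengthWithUrls(text, v1)); // 32 (9 other characters + 23)
```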

### ranges

An array of range items that describe ranges of Unicode code points
and the weight to apply to each code point. Each range is defined by
its start, end, and weight. Surrogate pairs have a length that is
equivalent to the length of the first code unit in the surrogate
pair. Note that certain graphemes are the result of joining code
points together, such as by a zero-width joiner; unlike a surrogate
pair, the length of such a grapheme will be the sum of the weighted
length of all included code points.
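
The range lookup described above can be sketched as below. This is a simplified illustration rather than the library's implementation: `weightFor` and `weightedTweetLength` are hypothetical names, `end` is assumed to be inclusive, and URL handling and any text normalization are left out. Following the surrogate pair statement above, the weight is looked up using the first code unit of each code point.

```js
// Range values copied from config/v2.json.
const v2 = {
  scale: 100,
  defaultWeight: 200,
  ranges: [
    { start: 0, end: 4351, weight: 100 },
    { start: 8192, end: 8205, weight: 100 },
    { start: 8208, end: 8223, weight: 100 },
    { start: 8242, end: 8247, weight: 100 },
  ],
};

// Hypothetical helper: weight of the first matching range, else the default.
// Assumption: both `start` and `end` are inclusive.
function weightFor(codeUnit, config) {
  const range = config.ranges.find(
    (r) => codeUnit >= r.start && codeUnit <= r.end
  );
  return range ? range.weight : config.defaultWeight;
}

// Hypothetical helper: sums the weights and applies the scale.
function weightedTweetLength(text, config) {
  let sum = 0;
  // for...of steps through code points, so a surrogate pair is visited once.
  for (const ch of text) {
    // charCodeAt(0) is the first UTF-16 code unit of the code point.
    sum += weightFor(ch.charCodeAt(0), config);
  }
  return sum / config.scale;
}

console.log(weightedTweetLength('hello', v2)); // 5
console.log(weightedTweetLength('\u65E5\u672C\u8A9E', v2)); // 6
// Three emoji joined by two zero-width joiners: 200 + 100 + 200 + 100 + 200.
console.log(
  weightedTweetLength('\u{1F468}\u200D\u{1F469}\u200D\u{1F467}', v2)
); // 8
```

The last call shows the grapheme case from the text: the joined sequence renders as one glyph, yet its length is the sum over all five code points.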

## API

Each of the four reference language implementations provides a way to
read the JSON configuration.

## Java

```java
public static TwitterTextConfiguration configurationFromJson(@Nonnull String json, boolean isResource)
```

* `json`: the configuration string or file name in the config directory (see `isResource`)
* `isResource`: if true, `json` refers to a file name for the configuration.

## JavaScript

Configurations are accessed via `twttr.txt.configs` (example:
`twttr.txt.configs.version2`). This config is passed as an argument
to `parseTweet`:

```js
twttr.txt.parseTweet(inputText, twttr.txt.configs.version2)
```

## Objective-C

The Objective-C implementation provides two methods for reading the
input, either from a string or a file resource.

```objective-c
+ (instancetype)configurationFromJSONResource:(NSString *)jsonResource;
+ (instancetype)configurationFromJSONString:(NSString *)jsonString;
```

The default configuration can also be set:

```objective-c
+ (void)setDefaultParserConfiguration:(TwitterTextConfiguration *)configuration;
```

The resource string refers to the two included configuration files
(which are referenced in the Xcode project).

## Ruby

Ruby provides the `Twitter::Configuration` class and means to read
from a file or string.

```ruby
def self.parse_string(string, options = {})
def self.parse_file(filename)
```

You can use `configuration_from_file()` or initialize a configuration
using `Twitter::Configuration.new(config)`, where `config` is the
output of one of the two above methods.

8 changes: 8 additions & 0 deletions config/v1.json
@@ -0,0 +1,8 @@
{
"version": 1,
"maxWeightedTweetLength": 140,
"scale": 1,
"defaultWeight": 1,
"transformedURLLength": 23,
"ranges": []
}
29 changes: 29 additions & 0 deletions config/v2.json
@@ -0,0 +1,29 @@
{
"version": 2,
"maxWeightedTweetLength": 280,
"scale": 100,
"defaultWeight": 200,
"transformedURLLength": 23,
"ranges": [
{
"start": 0,
"end": 4351,
"weight": 100
},
{
"start": 8192,
"end": 8205,
"weight": 100
},
{
"start": 8208,
"end": 8223,
"weight": 100
},
{
"start": 8242,
"end": 8247,
"weight": 100
}
]
}
4 changes: 2 additions & 2 deletions conformance/Rakefile
@@ -30,11 +30,11 @@ namespace :tlds do
file.write(yml.to_yaml)
end

File.open(repo_path("TldLists.java"), 'w') do |file|
File.open(repo_path("../java/src/main/java/com/twitter/twittertext/TldLists.java"), 'w') do |file|
file.write(<<-EOF
// Auto-generated by conformance/Rakefile
package com.twitter;
package com.twitter.twittertext;
import java.util.Arrays;
import java.util.List;