This dataset compiles common corporate entity designators that are appended to company names, such as 'Incorporated', 'Corporation', and 'Limited'. Both long forms and their abbreviations (e.g. 'Inc.', 'Corp.', 'Ltd.') are included.
The dataset is collated from:
plus data seen in the wild.
The data file company_designator.yml is a YAML hash/dictionary whose keys are long-form corporate designators, e.g. "Incorporated", "Limited", "Proprietary Limited".
Each entry is a hash with (some of) the following keys:
- 'lang' (required) - a 2-character ISO 639-1 language code, e.g. en, fr.
- 'abbr' (optional) - a list of zero or more abbreviations for this key, e.g. "Inc." for "Incorporated", "Ltd." for "Limited". If an abbreviation is ever written with periods, record it with them, e.g. L.L.C., not LLC. Libraries are expected to make the periods optional where appropriate.
- 'abbr_std' (optional) - the abbreviation that is considered standard/canonical for the entry.
- 'lead' (optional) - a boolean flag, set if this entry can also occur at the beginning of a name rather than only at the end.
- 'doc' (optional) - a string used to document the entry (ignored).
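As a rough illustration of how a library might treat periods as optional and honour the 'lead' flag, here is a minimal Python sketch. The sample entry and the helper names `abbr_pattern` and `strip_designator` are assumptions made for this example; they are not part of the dataset or of any existing library.

```python
import re

# Hypothetical entry, shaped like the structure described above.
# The values are illustrative and not copied from company_designator.yml.
ENTRY = {
    "lang": "en",
    "abbr": ["L.L.C."],
    "abbr_std": "L.L.C.",
}


def abbr_pattern(abbr):
    """Return regex source matching `abbr` with every period made optional."""
    parts = []
    for ch in abbr:
        parts.append(r"\.?" if ch == "." else re.escape(ch))
    return "".join(parts)


def strip_designator(name, long_form, entry):
    """Split a designator off `name`.

    Returns (remaining_name, matched_designator), or (name, None) when
    neither the long form nor any abbreviation is found.
    """
    alternatives = [re.escape(long_form)]
    alternatives += [abbr_pattern(a) for a in entry.get("abbr") or []]
    body = "(?:" + "|".join(alternatives) + ")"

    # Trailing designator, preceded by whitespace and/or a comma.
    m = re.search(r"[\s,]+(" + body + r")\s*$", name)
    if m:
        return name[:m.start()], m.group(1)

    # Leading designator, tried only when the entry sets the 'lead' flag.
    if entry.get("lead"):
        m = re.match(r"\s*(" + body + r")[\s,]+", name)
        if m:
            return name[m.end():], m.group(1)

    return name, None
```

For example, `strip_designator("Acme Widgets, LLC", "Limited Liability Company", ENTRY)` separates the name from the designator even though the abbreviation is stored as "L.L.C.". Matching here is case-sensitive, which is a simplification of this sketch.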
Long-form designator keys are unique, but abbreviations are not necessarily so: for example, both English 'Incorporated' and French 'Incorporée' use the abbreviation 'Inc.'.
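Because abbreviations can be shared, a reverse lookup from abbreviation to long form has to return a list rather than a single value. The following Python sketch shows one way to build such an index. The `DESIGNATORS` excerpt is hand-written for the example (using the entries mentioned above), and the function names `build_abbr_index` and `candidates` are hypothetical.

```python
from collections import defaultdict

# Hand-written excerpt in the shape of the data file, for illustration only.
DESIGNATORS = {
    "Incorporated": {"lang": "en", "abbr": ["Inc."]},
    "Incorporée": {"lang": "fr", "abbr": ["Inc."]},
    "Limited": {"lang": "en", "abbr": ["Ltd."]},
}


def build_abbr_index(designators):
    """Map each abbreviation to a list of (long_form, lang) pairs."""
    index = defaultdict(list)
    for long_form, entry in designators.items():
        # 'abbr' is optional, so treat a missing or empty value as no abbreviations.
        for abbr in entry.get("abbr") or []:
            index[abbr].append((long_form, entry["lang"]))
    return dict(index)


def candidates(index, abbr, lang=None):
    """Return long forms for `abbr`, optionally restricted to one language."""
    matches = index.get(abbr, [])
    if lang is not None:
        matches = [m for m in matches if m[1] == lang]
    return [long_form for long_form, _ in matches]
```

With this index, looking up "Inc." yields both 'Incorporated' and 'Incorporée', and passing `lang="fr"` narrows the result to the French entry.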
To run the data tests, first install the Perl prerequisites:
cpan YAML
cpan Locales
then run:
perl t/data.t
This dataset is licensed under the [Creative Commons Attribution-Share-Alike License 3.0](http://creativecommons.org/licenses/by-sa/3.0/). See the LICENCE file for the full licence text.
It includes material from the Wikipedia article "Types of business entity", also released under the [Creative Commons Attribution-Share-Alike License 3.0](http://creativecommons.org/licenses/by-sa/3.0/).