DESCRIPTION.md

This document describes the data format of the dataset provided by KKBOX.

Tables

train.csv

  • msno: user id
  • song_id: song id
  • source_system_tab: the name of the tab where the event was triggered. System tabs are used to categorize the functions of the KKBOX mobile apps. For example, the tab my library contains functions that manipulate local storage, and the tab search contains functions related to search.
  • source_screen_name: name of the layout a user sees.
  • source_type: the entry point from which a user first plays music in the mobile apps. An entry point could be album, online-playlist, song, etc.
  • target: the target variable. target=1 means that recurring listening event(s) were triggered within a month after the user’s very first observable listening event; target=0 otherwise.
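
A minimal sketch of how rows with the columns above might be read, using only Python's standard library. It assumes the file has a header row with exactly these column names; the helper name `read_train_rows` and the sample values are hypothetical, not taken from the dataset.

```python
import csv
import io

def read_train_rows(handle):
    """Yield one dict per row of train.csv, with target converted to int."""
    reader = csv.DictReader(handle)
    for row in reader:
        # target is 1 for a recurring listening event, 0 otherwise.
        row["target"] = int(row["target"])
        yield row

# Hypothetical sample data standing in for the real file.
sample = io.StringIO(
    "msno,song_id,source_system_tab,source_screen_name,source_type,target\n"
    "user_a,song_x,my library,Local playlist more,local-library,1\n"
    "user_b,song_y,search,Search,song,0\n"
)
rows = list(read_train_rows(sample))
```

For the real file, the same function could be called on `open("train.csv", newline="", encoding="utf-8")`.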

test.csv

  • id: row id (will be used for submission)
  • msno: user id
  • song_id: song id
  • source_system_tab: the name of the tab where the event was triggered. System tabs are used to categorize the functions of the KKBOX mobile apps. For example, the tab my library contains functions that manipulate local storage, and the tab search contains functions related to search.
  • source_screen_name: name of the layout a user sees.
  • source_type: the entry point from which a user first plays music in the mobile apps. An entry point could be album, online-playlist, song, etc.

sample_submission.csv

A sample submission file in the format that we expect you to submit.

  • id: same as id in test.csv
  • target: the target variable. target=1 means that recurring listening event(s) were triggered within a month after the user’s very first observable listening event; target=0 otherwise.
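
The two-column layout above could be produced as in the following sketch. It assumes a header row of `id,target` and comma-separated values; the helper name `write_submission` and the predictions are hypothetical.

```python
import csv
import io

def write_submission(handle, predictions):
    """Write (id, target) pairs in the two-column layout described above."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["id", "target"])
    for row_id, target in predictions:
        writer.writerow([row_id, target])

# Hypothetical predictions for test.csv rows 0 and 1.
buffer = io.StringIO()
write_submission(buffer, [(0, 1), (1, 0)])
submission_text = buffer.getvalue()
```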

songs.csv

Song metadata. Note that the data is in Unicode.

  • song_id
  • song_length: in ms
  • genre_ids: genre category. Some songs have multiple genres, which are separated by |
  • artist_name
  • composer
  • lyricist
  • language
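
As a small illustration of the two fields that need conversion, the sketch below splits a |-separated `genre_ids` value and converts `song_length` from milliseconds to seconds. The function names and sample values are hypothetical, and treating an empty field as "no genres" is an assumption.

```python
def split_genre_ids(raw):
    """Split a |-separated genre_ids field into a list of id strings."""
    if raw is None or raw.strip() == "":
        return []  # assumption: an empty field means no genre is recorded
    return [part.strip() for part in raw.split("|") if part.strip()]

def song_length_seconds(song_length_ms):
    """Convert song_length (given in ms) to seconds."""
    return int(song_length_ms) / 1000.0

single = split_genre_ids("465")
multiple = split_genre_ids("1259|2122")
missing = split_genre_ids("")
seconds = song_length_seconds("247640")
```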

members.csv

User information.

  • msno
  • city
  • bd: age. Note: this column has outlier values; please use your judgement.
  • gender
  • registered_via: registration method
  • registration_init_time: format %Y%m%d
  • expiration_date: format %Y%m%d
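
A hedged sketch of handling the date columns and the `bd` outliers follows. The `%Y%m%d` format comes from the description above; the plausible age range of 1 to 100 is only an assumed cutoff, and the function names and sample values are hypothetical.

```python
from datetime import datetime

def parse_yyyymmdd(value):
    """Parse a registration_init_time or expiration_date value (%Y%m%d)."""
    return datetime.strptime(str(value), "%Y%m%d").date()

def clean_age(bd, low=1, high=100):
    """Return bd as an int, or None when it falls outside an assumed range."""
    age = int(bd)
    return age if low <= age <= high else None

registered = parse_yyyymmdd("20110820")
expires = parse_yyyymmdd(20170920)  # integer input also works
membership_days = (expires - registered).days

ages = [clean_age(value) for value in ("24", "0", "1051")]
```

Whether out-of-range ages should be dropped, imputed, or kept as a separate category is a modelling decision that this sketch leaves open.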

song_extra_info.csv

  • song_id
  • song name - the name of the song.
  • isrc - International Standard Recording Code, which can in theory be used to identify a song. Note, however, that ISRCs generated by providers have not been officially verified, so the information in an ISRC, such as the country code and reference year, can be misleading or incorrect. Multiple songs can also share one ISRC, since a single recording may be re-published several times.
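
For reference, the standard ISRC layout is 12 characters: a two-letter country code, a three-character registrant code, a two-digit reference year, and a five-digit designation code. The sketch below splits a value into those fields. The helper name `parse_isrc` and the sample value are hypothetical, and, as noted above, the extracted country and year should not be treated as verified.

```python
import re

# Country (2 letters), registrant (3 alphanumerics), year (2 digits),
# designation (5 digits).
ISRC_PATTERN = re.compile(r"^([A-Z]{2})([A-Z0-9]{3})(\d{2})(\d{5})$")

def parse_isrc(isrc):
    """Split an ISRC into its four fields, or return None if malformed."""
    if not isrc:
        return None
    match = ISRC_PATTERN.match(isrc.replace("-", "").upper())
    if match is None:
        return None
    country, registrant, year_digits, designation = match.groups()
    return {
        "country": country,
        "registrant": registrant,
        "year_digits": year_digits,  # two digits only; the century is ambiguous
        "designation": designation,
    }

parsed = parse_isrc("TWUM71200043")  # hypothetical example value
malformed = parse_isrc("not-an-isrc")
```

Because several songs may share one ISRC, the parsed value is better suited as a feature or grouping key than as a unique song identifier.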