Cdiscount

  • /public/Cdiscount/category_names.csv - Shows the hierarchy of product classification. Each category_id has a corresponding level1, level2, and level3 name, in French. The category_id corresponds to the category tree down to its lowest level. This hierarchical data may be useful, but it is not necessary for building models and making predictions. All the absolutely necessary information is found in train.bson.
  • /public/Cdiscount/meta/meta_test.csv - Metadata extracted from the test.bson file. It records where each product's images are stored in test.bson, and we use it to load image batches into memory. Each row contains _id (the product id), offset and length (the position of the product's record in test.bson, so that its images can be retrieved), and num_pictures (how many images, 1-4, exist for this _id).
  • /public/Cdiscount/meta/meta_train.csv - Metadata extracted from the train.bson file. It records where each product's images are stored in train.bson, and we use it to load image batches into memory. Each row contains _id (the product id), category_id (the category we are predicting), offset and length (the position of the product's record in train.bson, so that its images can be retrieved), and num_pictures (how many images, 1-4, exist for this _id).
  • /public/Cdiscount/sample_submission.csv - Shows the correct format for submission. It is highly recommended that you zip your submission file before uploading for scoring.
  • /public/Cdiscount/test.bson - Contains a list of 1,768,182 products in the same format as train.bson, except that no category_id is included. The objective of the competition is to predict the correct category_id from the picture(s) of each product id (_id). All category_id values present in the Private Test split are also present in the Public Test split.
  • /public/Cdiscount/train.bson - Contains a list of 7,069,896 dictionaries, one per product. Each dictionary contains a product id (key: _id), the category id of the product (key: category_id), and between 1 and 4 images, stored in a list (key: imgs). The list holds one dictionary per image, in the format {'picture': b'...binary string...'}. The binary string is the image encoded as JPEG.
  • /public/Cdiscount/train_example.bson - Contains the first 100 records of train.bson so you can start exploring the data before downloading the entire set.
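
The offset and length columns in the meta files can be illustrated with a minimal sketch. It assumes that offset is a byte offset from the start of the .bson file and that length is the size in bytes of one product's BSON document; the description above does not state this explicitly. It relies only on the BSON rule that every document begins with its own total size as a little-endian int32. The function names index_bson and read_record_bytes are hypothetical and are not part of the dataset tooling.

```python
import struct
from typing import Iterator, Tuple


def index_bson(path: str) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) for each top-level document in a BSON file."""
    with open(path, "rb") as f:
        offset = 0
        while True:
            header = f.read(4)
            if len(header) < 4:
                break  # end of file reached
            # A BSON document starts with its total size (little-endian int32).
            (length,) = struct.unpack("<i", header)
            if length < 5:
                raise ValueError(f"invalid document size {length} at offset {offset}")
            yield offset, length
            offset += length
            f.seek(offset)  # jump to the next document without reading its body


def read_record_bytes(path: str, offset: int, length: int) -> bytes:
    """Read one raw BSON document, given its assumed byte offset and length."""
    if length < 5:
        raise ValueError("a BSON document is at least 5 bytes long")
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(length)
    # Sanity check: the size prefix must match the length from the meta file.
    if len(data) != length or struct.unpack("<i", data[:4])[0] != length:
        raise ValueError("offset/length do not point at a BSON document")
    return data
```

The bytes returned by read_record_bytes still have to be decoded into a dictionary with a BSON library before the _id, category_id and imgs keys can be accessed; one option, assuming PyMongo is installed, is its bson package. Reading by offset avoids scanning the whole file when only a batch of products is needed.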

According to competition rules:

Unless otherwise restricted under the Competition Specific Rules above, after your acceptance of these Rules, you may access and use the Competition Data for the purposes of the Competition, participation on Kaggle Website forums, academic research and education, and other non-commercial purposes.