Dataset - What are the Machine Learning best practices reported by practitioners on Stack Exchange?
Publisher
European Organization for Nuclear Research
Abstract
The data consist of the posts (questions and answers) retrieved by querying for posts related to the tag 'machine learning' and the phrase 'best practice(s)'. The data were used as the basis for a study, currently under review, on machine learning best practices as discussed by practitioners in question-and-answer communities such as Stack Exchange. The information from each type of post (i.e., questions and answers) is provided in multiple formats (i.e., .txt, .csv, and .xlsx).

<strong>Answers - Variables</strong>
<strong>AID</strong>: Unique identifier of the answer in the Q&A website.
<strong>ParentId</strong>: Unique identifier of the question associated with the answer in the Q&A website.
<strong>AcceptedAnswerId</strong>: When an answer is the most voted answer to the <em>ParentId</em> question and differs from the accepted answer, this field holds an identifier that differs from the <em>AID</em>. When the accepted answer had a <em>score</em> lower than 1, a -1 is assigned.
<strong>ABody</strong>: HTML text of the answer.
<strong>Score</strong>: Upvotes minus downvotes of the answer.
<strong>url_Answer</strong>: URL of the answer. The URL can be from different websites.
<strong>type</strong>: best or accepted. The value is accepted when the information belongs to the accepted answer of the <em>ParentId</em> question, and best when it belongs to the most voted answer of the <em>ParentId</em> question.
<strong>Date</strong>: Creation date of the answer.

<strong>Questions - Variables</strong>
<strong>QID</strong>: Unique identifier of the question in the Q&A website.
<strong>AcceptedAnswerId</strong>: Unique identifier of the accepted answer for a specific question in the Q&A website. When a question had a most voted answer different from the accepted one, and the accepted one had a negative score, a -1 was assigned to the <em>AcceptedAnswerId</em>.
<strong>BestAnswerId</strong>: Unique identifier of the most voted answer for a specific question in the Q&A website. When the most voted and accepted answers were the same, a -1 was assigned to the <em>BestAnswerId</em>.
<strong>Qtitle</strong>: Title of the question.
<strong>QBody</strong>: HTML text of the question.
<strong>Score</strong>: Upvotes minus downvotes of the question.
<strong>QTags</strong>: Tags associated with each question.
<strong>url_question</strong>: URL of the question. The question URL can be from different websites.
<strong>Date</strong>: Creation date of the question.

This dataset is a subset of the Stack Exchange dump of 03.2021 (https://archive.org/details/stackexchange_20210301), to which a series of filters was applied to obtain the data used in the study.
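
As an illustration of how the two tables relate, the following is a minimal sketch that links answers to their questions through <em>ParentId</em> and <em>QID</em> and treats -1 as "no value". The column names come from the variable descriptions above. The sample rows, the helper names (`load_rows`, `link_answers`), and the assumption that the .csv files use these names as header fields are hypothetical and are not taken from the dataset itself.

```python
import csv
import io

# Hypothetical sample rows standing in for the real .csv files.
# Only a few of the documented columns are shown.
QUESTIONS_CSV = """QID,AcceptedAnswerId,BestAnswerId,Qtitle,Score
1,10,-1,Example question A,5
2,-1,21,Example question B,3
"""

ANSWERS_CSV = """AID,ParentId,Score,type
10,1,7,accepted
21,2,4,best
"""

NO_VALUE = "-1"  # sentinel used in AcceptedAnswerId and BestAnswerId


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header fields."""
    return list(csv.DictReader(io.StringIO(text)))


def link_answers(questions, answers):
    """Group answers under their question and decode the -1 sentinel as None."""
    by_parent = {}
    for answer in answers:
        by_parent.setdefault(answer["ParentId"], []).append(answer)

    linked = {}
    for q in questions:
        accepted = q["AcceptedAnswerId"]
        best = q["BestAnswerId"]
        linked[q["QID"]] = {
            "accepted_id": None if accepted == NO_VALUE else accepted,
            "best_id": None if best == NO_VALUE else best,
            "answers": by_parent.get(q["QID"], []),
        }
    return linked


questions = load_rows(QUESTIONS_CSV)
answers = load_rows(ANSWERS_CSV)
linked = link_answers(questions, answers)
```

With the real files, the in-memory strings would be replaced by opening the corresponding .csv files. Identifiers are kept as strings here so that no assumption is made about how they are typed in the files.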