Incorrect Labels in Hadoop Log Data #56

Open
mmantyla opened this issue Dec 14, 2024 · 0 comments

Recently, I began working on a demo for our log analysis tool, LogDelta, using your Hadoop log dataset. While building the demo, I grew increasingly suspicious of certain labels in the Hadoop data. What started as a simple demo turned into a label investigation that required far more effort than I had anticipated.

I focused solely on the PageRank application, meaning that the WordCount application might still contain additional incorrect labels. Below are the identified incorrect labels along with their corresponding fixes:

| ID | Original Label | Fixed Label |
| --- | --- | --- |
| 1445144423722_0024 | Normal | Disk Full |
| 1445182159119_0017 | Machine Down | Normal |
| 1445062781478_0020 | Machine Down | Normal |
| 1445182151478_0015 | Machine Down | Disk Full |
| 1445182159119_0013 | Disk Full | Machine Down |
| 1445182159119_0011 | Disk Full | Machine Down |
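
For anyone who wants to apply these fixes locally, here is a minimal sketch. It assumes the labels have already been loaded into a plain dict mapping application ID to label; the names `CORRECTIONS` and `apply_corrections` are hypothetical, and reading the dataset's actual label file is deliberately left out, since its format is not described here.

```python
# Corrections copied from the table above: ID -> (original label, fixed label).
CORRECTIONS = {
    "1445144423722_0024": ("Normal", "Disk Full"),
    "1445182159119_0017": ("Machine Down", "Normal"),
    "1445062781478_0020": ("Machine Down", "Normal"),
    "1445182151478_0015": ("Machine Down", "Disk Full"),
    "1445182159119_0013": ("Disk Full", "Machine Down"),
    "1445182159119_0011": ("Disk Full", "Machine Down"),
}


def apply_corrections(labels):
    """Return a corrected copy of `labels` (hypothetical dict: ID -> label).

    IDs missing from `labels` are skipped. If an ID carries a label that is
    neither the expected original nor the fixed one, raise instead of
    silently overwriting it.
    """
    fixed = dict(labels)  # leave the caller's dict untouched
    for app_id, (original, corrected) in CORRECTIONS.items():
        if app_id not in fixed:
            continue
        current = fixed[app_id]
        if current == corrected:
            continue  # already fixed
        if current != original:
            raise ValueError(
                f"{app_id}: expected {original!r} or {corrected!r}, got {current!r}"
            )
        fixed[app_id] = corrected
    return fixed
```

Checking the current label before overwriting it guards against applying the fixes to a copy of the data whose labels differ from the ones I examined.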

If you're curious about how I reached these conclusions, the process is documented in a YouTube playlist.

  • The key part of the label correction is covered in the final video.
  • The earlier videos provide details on how the suspicions began to arise.
  • I have also shared the text script of the video, which includes some visuals.