THM day 2 - Advent of Cyber

The second day is about log analysis. I take a look at data science and its application in cyber security. The first part is an introduction to Jupyter Notebooks, Python, Pandas, and Matplotlib.

The data

I get access to a pre-configured Jupyter Notebook environment, which contains the following files I need for the challenges.

Network_traffic.csv
https://i.imgur.com/t6uDqkg.png

Workbook.ipynb
https://i.imgur.com/igy4q9M.png

The challenges

  1. How many packets were captured (looking at the PacketNumber)?
  2. What IP address sent the most amount of traffic during the packet capture?
  3. What was the most frequent protocol?

Challenge 1

df.count() shows the number of non-null values in each column; the value for the PacketNumber column is the number of captured packets.
https://i.imgur.com/kQc58Bw.png
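
As a minimal sketch of what df.count() reports, the snippet below builds a tiny stand-in DataFrame instead of loading the real CSV. The rows are made up for illustration; only the column names PacketNumber, Source and Protocol come from the challenge.

```python
import pandas as pd

# Hypothetical sample rows, not the real capture from the CSV file.
df = pd.DataFrame({
    "PacketNumber": [1, 2, 3, 4],
    "Source": ["10.0.0.1", "10.0.0.2", "10.0.0.1", None],
    "Protocol": ["TCP", "DNS", "TCP", "HTTP"],
})

# count() returns the number of non-null values per column.
per_column = df.count()

# Counting a single column answers "how many packets" directly.
packet_count = df["PacketNumber"].count()

# len() counts rows regardless of missing values.
row_count = len(df)

print(per_column)
print(packet_count, row_count)
```

Because count() skips missing values, columns can report different numbers; len(df) is a useful cross-check for the total row count.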

Challenge 2

df.groupby(['Source']).size() solves the challenge by counting the packets per source IP address. To find the column name, I looked in the CSV file.
https://i.imgur.com/c7vFlAb.png
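
The grouped result is not sorted by count, so with many addresses it can help to sort it or ask for the largest group directly. A small sketch on made-up rows (the IP addresses are hypothetical), assuming "most traffic" means the highest number of packets rather than bytes:

```python
import pandas as pd

# Hypothetical sample rows, not the real capture.
df = pd.DataFrame({
    "PacketNumber": [1, 2, 3, 4, 5],
    "Source": ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.1"],
})

# Number of packets per source address.
per_source = df.groupby("Source").size()

# Sort so the busiest sender is on top.
ranked = per_source.sort_values(ascending=False)

# Or fetch the address with the highest packet count directly.
top_ip = per_source.idxmax()

print(ranked)
print(top_ip)
```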

Challenge 3

df['Protocol'].value_counts() solves this challenge by counting how often each protocol occurs. I used the following Stack Overflow question for an explanation: https://stackoverflow.com/questions/35523635/extract-values-in-pandas-value-counts
https://i.imgur.com/Zk0LiAW.png
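
value_counts() sorts in descending order by default, so the first entry is the most frequent value. A short sketch of extracting it, again on made-up rows rather than the real capture:

```python
import pandas as pd

# Hypothetical sample rows, not the real capture.
df = pd.DataFrame({
    "Protocol": ["TCP", "DNS", "TCP", "HTTP", "TCP", "DNS"],
})

# Occurrences per protocol, most frequent first.
protocol_counts = df["Protocol"].value_counts()

# The index holds the protocol names, the values hold the counts.
most_frequent = protocol_counts.index[0]
most_frequent_count = protocol_counts.iloc[0]

print(protocol_counts)
print(most_frequent, most_frequent_count)
```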

Takeaways

  • Jupyter Notebooks are great: easy to share and execute, and easy to use for demonstrating PoCs
  • Pandas is good for analysing CSV files
  • Python is the best
  • Basics of Matplotlib