X ARF
DRAFT
Aim: Define and document
- A transformation of IntelMQ events into X-ARF format (emails).
- (later) a transformation of X-ARF format (emails) into IntelMQ events.
X-ARF is an email based format. The core unit is a single report in one of the available X-ARF schemas.
IntelMQ's events have harmonised internal values.
Thus our transformation only has to be defined for a single X-ARF report. All other X-ARF variants can be derived from it.
As a first example we consider Shadowserver-Botnet-drone data. (Attention: the description of the format on the Shadowserver site itself is sometimes outdated compared to the data that is actually sent.)
# EXAMPLE DATA -- IPs and ASNs were pseudonymized
"timestamp","ip","port","asn","geo","region","city","hostname","type","infection","url","agent","cc_ip","cc_port","cc_asn","cc_geo","cc_dns","count","proxy","application","p0f_genre","p0f_detail","machine_name","id","naics","sic","cc_naics","cc_sic","sector","cc_sector","ssl_cipher","family","tag","public_source"
"2016-07-24 00:00:01","198.51.100.4",,31334,"DE","BREMEN","BREMEN","198-51-100-4.example.net",,"bitdefender-ramnit",,,"198.51.100.182",,8075,"US",,,,,,,,,0,0,334111,357101,,"Communications",,,,
"2016-07-24 00:00:01","198.51.100.176",7960,3320,"DE","NORDRHEIN-WESTFALEN","BONN","198-51-100-176.example.net","udp","zeroaccess",,,"198.51.100.221",16471,22773,"US",,,,,,,,,0,0,517510,737415,,"Commercial Facilities",,,,
The current configuration of IntelMQ (as of 2017-02-08) will parse the above data into IntelMQ "events" like
# RESULT IN INTELMQ
# Dataset 1
{
"classification.identifier": "botnet",
"classification.taxonomy": "Malicious Code",
"classification.type": "botnet drone",
"destination.asn": 8075,
"destination.geolocation.cc": "US",
"destination.ip": "198.51.100.182",
"extra": "{\"cc_naics\": \"334111\", \"cc_sector\": \"Communications\", \"cc_sic\": \"357101\"}",
"feed.accuracy": 100.0,
"feed.name": "Botnet-Drone-Hadoop",
"feed.url": "file://localhost/tmp/sserver.csv",
"malware.name": "bitdefender-ramnit",
"raw": "THIS IS A VERY LONG BASE64 VALUE CONTAINING THE ORIGINAL CSV-ROW",
"source.asn": 31334,
"source.geolocation.cc": "DE",
"source.geolocation.city": "BREMEN",
"source.geolocation.region": "BREMEN",
"source.ip": "198.51.100.4",
"source.reverse_dns": "198-51-100-4.example.net",
"time.observation": "2017-02-07T08:14:05+00:00",
"time.source": "2016-07-24T00:00:01+00:00",
}
# Dataset 2
{
"classification.identifier": "botnet",
"classification.taxonomy": "Malicious Code",
"classification.type": "botnet drone",
"destination.asn": 22773,
"destination.geolocation.cc": "US",
"destination.ip": "198.51.100.221",
"destination.port": 16471,
"extra": "{\"cc_naics\": \"517510\", \"cc_sector\": \"Commercial Facilities\", \"cc_sic\": \"737415\"}",
"feed.accuracy": 100.0,
"feed.name": "Botnet-Drone-Hadoop",
"feed.url": "file://localhost/tmp/sserver.csv",
"malware.name": "zeroaccess",
"protocol.transport": "udp",
"raw": "THIS IS A VERY LONG BASE64 VALUE CONTAINING THE ORIGINAL CSV-ROW",
"source.asn": 3320,
"source.geolocation.cc": "DE",
"source.geolocation.city": "BONN",
"source.geolocation.region": "NORDRHEIN-WESTFALEN",
"source.ip": "198.51.100.176",
"source.port": 7960,
"source.reverse_dns": "198-51-100-176.example.net",
"time.observation": "2017-02-07T08:14:05+00:00",
"time.source": "2016-07-24T00:00:01+00:00",
}
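The correspondence between CSV columns and event fields can be read off the two examples above. The following is only a minimal sketch of that correspondence, not IntelMQ's actual parser: the names `COLUMN_TO_FIELD`, `INTEGER_FIELDS` and `rows_to_events` are hypothetical, the timestamp is assumed to be UTC, and the `classification.*`, `feed.*`, `extra`, `raw` and `time.observation` fields are left out.

```python
import csv
import io

# Column-to-field correspondence read off the example data above.
# Hypothetical illustration, not IntelMQ's parser configuration.
COLUMN_TO_FIELD = {
    "timestamp": "time.source",
    "ip": "source.ip",
    "port": "source.port",
    "asn": "source.asn",
    "geo": "source.geolocation.cc",
    "region": "source.geolocation.region",
    "city": "source.geolocation.city",
    "hostname": "source.reverse_dns",
    "type": "protocol.transport",
    "infection": "malware.name",
    "cc_ip": "destination.ip",
    "cc_port": "destination.port",
    "cc_asn": "destination.asn",
    "cc_geo": "destination.geolocation.cc",
}
INTEGER_FIELDS = {"source.port", "source.asn", "destination.port", "destination.asn"}


def rows_to_events(csv_text):
    """Turn CSV text (with header row) into a list of flat event dicts."""
    events = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        event = {}
        for column, field in COLUMN_TO_FIELD.items():
            value = row.get(column)
            if not value:
                continue  # empty or missing columns produce no field
            if field in INTEGER_FIELDS:
                value = int(value)
            elif field == "time.source":
                # "2016-07-24 00:00:01" -> "2016-07-24T00:00:01+00:00" (assumes UTC)
                value = value.replace(" ", "T") + "+00:00"
            event[field] = value
        events.append(event)
    return events
```

Empty columns are skipped, which is why Dataset 1 has no `source.port`, `destination.port` or `protocol.transport`.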
As we can see, this data reports malicious-code activity. We assume it is possible to map this to the X-ARF Report Type: Malware-Attack.
Known stable X-ARF schemas share a set of fields. We assume those fields to be the same across all X-ARF schemas and suggest clearly stating a common subset of fields in the next iteration of the X-ARF specification.
Reported-From reports@example.com
Report-ID UUID@example.com
Date time.source # This is the value of IntelMQ's time.source field, conversion to RFC 3339 not necessary
TLP none # This field cannot be determined yet. The integration of TLP into IntelMQ is under discussion: https://github.com/certtools/intelmq/issues/252
User-Agent IntelMQ-Mailgen # The User-Agent of the software generating the X-ARF report
Attachment none # If no attachment exists, this must be none
Version 0.2 # Most likely always 0.2, Version is optional
Occurences none # This field cannot be determined, it is optional
In addition to the common fields above, the malware-attack format contains the fields:
Category: abuse # This is a constant field, no Mapping to an IntelMQ Field is necessary
Report-Type: malware-attack # This is a constant field, no Mapping to an IntelMQ Field is necessary
Schema-URL: http://x-arf.org/schema/abuse_malware-attack_0.1.4.json
Source: source.ip # This is the value of the IntelMQ-Field source.ip
Source-Type: calculated_field # This field needs to be set to ipv4 or ipv6 depending on source.ip
Destination-System: none # Cannot be determined
Download-Link: none # Cannot be determined
Download-Port: none # Cannot be determined
Malware-MD5: malware.hash.md5 # This is the value of the IntelMQ-Field malware.hash.md5, it does not exist in shadowserver data
Antivirus-Result: none # Cannot be determined
Antivirus-Vendor: none # Cannot be determined
Feedback-Link: none # Cannot be determined
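Source-Type is the only calculated field in the map above. One possible way to derive it from source.ip, sketched with Python's standard `ipaddress` module (the function name `xarf_source_type` is a hypothetical helper, not part of IntelMQ):

```python
import ipaddress


def xarf_source_type(source_ip):
    """Return the X-ARF Source-Type ("ipv4" or "ipv6") for an IP address string."""
    # ip_address() raises ValueError for anything that is not a valid address.
    version = ipaddress.ip_address(source_ip).version
    return "ipv4" if version == 4 else "ipv6"
```

Invalid input raises `ValueError`, so a malformed source.ip is noticed before a report is generated.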
When mapping the aforementioned data according to these two maps, the following two datasets are the result (without X-ARF specific headers and MIME encoding / boundaries)
Dataset 1:
Schema-URL: http://x-arf.org/schema/abuse_malware-attack_0.1.4.json
Category: abuse
Report-Type: malware-attack
Reported-From: mail@example.com
Report-ID: TicketNumber#4711@example.com
User-Agent: IntelMQ-Mailgen
Date: 2016-07-24T00:00:01+00:00
Source: 198.51.100.4
Source-Type: ipv4
Attachment: none
Dataset 2:
Schema-URL: http://x-arf.org/schema/abuse_malware-attack_0.1.4.json
Category: abuse
Report-Type: malware-attack
Reported-From: mail@example.com
Report-ID: TicketNumber#0815@example.com
User-Agent: IntelMQ-Mailgen
Date: 2016-07-24T00:00:01+00:00
Source: 198.51.100.176
Source-Type: ipv4
Attachment: none
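The two maps can be sketched as a small Python function. This is only an illustration under assumptions: `event_to_xarf` and `render` are hypothetical names, the event is taken to be a flat dict as shown earlier, and fields whose value cannot be determined are simply omitted, as in the two datasets above.

```python
import ipaddress
import uuid

SCHEMA_URL = "http://x-arf.org/schema/abuse_malware-attack_0.1.4.json"


def event_to_xarf(event, reported_from, report_id=None):
    """Map an IntelMQ event (flat dict) to malware-attack report fields."""
    if report_id is None:
        # Default Report-ID of the form UUID@domain, using the sender's domain.
        domain = reported_from.split("@", 1)[1]
        report_id = "{}@{}".format(uuid.uuid4(), domain)
    source_ip = event["source.ip"]
    version = ipaddress.ip_address(source_ip).version
    report = {
        "Schema-URL": SCHEMA_URL,
        "Category": "abuse",               # constant
        "Report-Type": "malware-attack",   # constant
        "Reported-From": reported_from,
        "Report-ID": report_id,
        "User-Agent": "IntelMQ-Mailgen",
        "Date": event["time.source"],      # used as-is, no conversion
        "Source": source_ip,
        "Source-Type": "ipv4" if version == 4 else "ipv6",
        "Attachment": "none",
    }
    # Malware-MD5 is only emitted when the event actually carries the hash.
    if "malware.hash.md5" in event:
        report["Malware-MD5"] = event["malware.hash.md5"]
    return report


def render(report):
    """Render the report fields as "Key: value" lines, in insertion order."""
    return "\n".join("{}: {}".format(key, value) for key, value in report.items())
```

Calling `render(event_to_xarf(event, "mail@example.com", "TicketNumber#4711@example.com"))` with the first event should reproduce the lines of Dataset 1.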
We can see that some information is lost when going from an IntelMQ event to X-ARF as defined in abuse_malware-attack_0.1.4. Especially interesting would be:
- classification.type or other classification details not covered in Category and Report-Type
- malware.name
- destination.ip
- destination.port
- protocol.transport
- source.port
- source.reverse_dns
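Which fields of a given event are not carried over can be checked mechanically. A minimal sketch, assuming the malware-attack map only consumes source.ip, time.source and malware.hash.md5 as described earlier (`MAPPED_FIELDS` and `unmapped_fields` are hypothetical names):

```python
# Event fields consumed by the malware-attack map described earlier.
MAPPED_FIELDS = {"source.ip", "time.source", "malware.hash.md5"}


def unmapped_fields(event):
    """Return the sorted names of event fields that the map does not use."""
    return sorted(set(event) - MAPPED_FIELDS)
```

The result also includes fields that are dropped on purpose (for example internal ones), so it is a starting point for review rather than a list of required schema additions.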
Alternatively, considering abuse_bot-infection_0.1.0.json, we would still miss:
- classification.type or other classification details not covered in Category and Report-Type
- destination.port
- protocol.transport
- source.reverse_dns
Some values in the original report offer additional information, but can also be derived from others:
- destination.ip and time.observation determine
  - destination.asn
  - destination.geolocation.cc
- source.ip and time.observation determine
  - source.asn
  - source.geolocation.cc
  - source.geolocation.city
  - source.geolocation.region
Some values are internal to IntelMQ and others would usually be left out of a report to be sent externally:
- time.observation is the datetime when the report data entered IntelMQ.
- extra potentially contains internal information
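Leaving these values out before generating an external report could look like the following sketch. The set `INTERNAL_FIELDS` contains only the two fields named above and the function name `strip_internal` is hypothetical; whether further fields (for example raw or feed.url) should also be dropped is not decided here.

```python
# Fields named above as internal or potentially internal to IntelMQ.
INTERNAL_FIELDS = {"time.observation", "extra"}


def strip_internal(event):
    """Return a copy of the event without the internal fields."""
    return {key: value for key, value in event.items() if key not in INTERNAL_FIELDS}
```

The original event is not modified, so it can still be stored or processed internally with all fields.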
As some fields are missing, we've created an updated schema for X-ARF bot-infections. See: https://github.com/Intevation/xarf-schemata/blob/master/abuse_bot-infection_0.2.0_unstable.json
The changes to 0.1.0 are documented in https://github.com/Intevation/xarf-schemata/blob/master/abuse_bot-infection_0.2.0_unstable.json.README.md
To use this schema, one can use this URL: https://raw.githubusercontent.com/Intevation/xarf-schemata/master/abuse_bot-infection_0.2.0_unstable.json
Can we find or define other schemas that provide a better fit?
We've had the chance to see some real-world X-ARF messages using the abuse_bot-infection_0.1.0.json schema and the BULK format. The BULK format seems to carry a large amount of duplicated data, such as the email texts.
The X-ARF message within the BULK message carries some additional payload, like the destination port, in the data which is attached to the X-ARF message. Our proposed schema contains this data as a real X-ARF field, Destination-Port. Other data, like the IP of the destination, which is supported by the 0.1.0 schema (as destination), is left out of the X-ARF message itself but is included within the attachment.