Centralized logging is a common requirement, especially once you move past the classic monolith towards a distributed application or even microservices. A deployment model based on Docker or Kubernetes, with scaling and short-lived containers, increases that need even further.

So how do we get started with centralized logging? You might have heard of the ELK or Elastic Stack before, and we will go through the relevant parts.

The code samples, shell history, additional resources, and complete transcript are also included below to allow easy copy & paste or searching.

Code #

Java Application Part One #

The first AuthenticationEventListener:

@Component
@Scope(ConfigurableBeanFactory.SCOPE_PROTOTYPE) // More overhead, but a workaround for the singleton scope
public class AuthenticationEventListener
        implements ApplicationListener<AbstractAuthenticationEvent> {

    private static Logger logger = LoggerFactory.getLogger(AuthenticationEventListener.class);

    @Autowired
    private HttpServletRequest request;

    @Override
    public void onApplicationEvent(AbstractAuthenticationEvent authenticationEvent) {
        Authentication authentication = authenticationEvent.getAuthentication();

        // These would be duplicate events in the case of successful logins
        if ((authenticationEvent instanceof InteractiveAuthenticationSuccessEvent) ||
                (authenticationEvent instanceof SessionFixationProtectionEvent)) {
            logger.debug("Authentication event [{}]", authenticationEvent.getClass());
            return;
        }

        if (authentication.isAuthenticated()) {
            logger.info("[{}] logged in successfully", authentication.getName());
        } else {
            logger.warn("[{}] failed to log in with password [{}]", authentication.getName(),
                    authentication.getCredentials().toString().replaceAll(".", "*"));

            // Simulate a stacktrace
            if (authentication.getName().equals("admin")){
                throw new RuntimeException("There is no admin here...");
            }
        }

        // Clear MDC data for reuse of the thread
        MDC.clear();
    }
}

The Gradle dependency with comment:

implementation("co.elastic.logging:logback-ecs-encoder:0.1.3"){
    because 'we want to have structured logging'
}
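
For Maven-based builds, the equivalent dependency uses the same coordinates (only the build tool syntax differs):

<dependency>
    <groupId>co.elastic.logging</groupId>
    <artifactId>logback-ecs-encoder</artifactId>
    <version>0.1.3</version>
</dependency>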

The Logback appender:

<appender name="json" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${LOG_PATH:-./logs}/log.json</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
        <fileNamePattern>log.json.%d{yyyy-MM-dd}.gz</fileNamePattern>
        <maxHistory>7</maxHistory>
    </rollingPolicy>
    <encoder class="co.elastic.logging.logback.EcsEncoder">
        <serviceName>gs-securing-web</serviceName>
    </encoder>
</appender>
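
To actually produce output, the appender also needs to be attached to a logger in logback.xml. A minimal sketch, assuming the appender name from above and an INFO root level (not shown in the original snippet):

<root level="INFO">
    <appender-ref ref="json"/>
</root>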

Java Application Part Two #

The extended AuthenticationEventListener:

@Component
@Scope(ConfigurableBeanFactory.SCOPE_PROTOTYPE) // More overhead, but a workaround for the singleton scope
public class AuthenticationEventListener
        implements ApplicationListener<AbstractAuthenticationEvent> {

    private static Logger logger = LoggerFactory.getLogger(AuthenticationEventListener.class);

    @Autowired
    private HttpServletRequest request;

    @Override
    public void onApplicationEvent(AbstractAuthenticationEvent authenticationEvent) {
        Authentication authentication = authenticationEvent.getAuthentication();
        MDC.put("user.name", authentication.getName());
        MDC.put("url.full", request.getRequestURI());
        if(request.getHeader("x-forwarded-for") != null){
            MDC.put("source.ip", request.getHeader("x-forwarded-for"));
        } else {
            MDC.put("source.ip", request.getRemoteAddr());
        }

        // These would be duplicate events in the case of successful logins
        if ((authenticationEvent instanceof InteractiveAuthenticationSuccessEvent) ||
                (authenticationEvent instanceof SessionFixationProtectionEvent)) {
            logger.debug("Authentication event [{}]", authenticationEvent.getClass());
            return;
        }

        if (authentication.isAuthenticated()) {
            MDC.put("event.category", LogEvents.LOGIN_SUCCESS.name());
            logger.info("[{}] logged in successfully", authentication.getName());
        } else {
            MDC.put("event.category", LogEvents.LOGIN_FAILURE.name());
            logger.warn("[{}] failed to log in with password [{}]", authentication.getName(),
                    authentication.getCredentials().toString().replaceAll(".", "*"));

            // Simulate a stacktrace
            if (authentication.getName().equals("admin")){
                throw new RuntimeException("There is no admin here...");
            }
        }
    }
}
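
One thing to keep in mind: MDC stores its values in a thread-local, and servlet containers reuse threads, so the context should be cleared once the event has been handled (the first listing does this with MDC.clear()). A minimal sketch of how the listener body could be wrapped; handleAuthenticationEvent is a hypothetical helper holding the logic shown above, not part of the original code:

@Override
public void onApplicationEvent(AbstractAuthenticationEvent authenticationEvent) {
    try {
        // Hypothetical helper containing the MDC.put(...) and logging calls from the listing above
        handleAuthenticationEvent(authenticationEvent);
    } finally {
        // Clear the MDC so the next request handled by this thread starts with a clean context
        MDC.clear();
    }
}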

The LogEvents enum gives us a central place for managing the possible values:

public enum LogEvents {
    LOGIN_SUCCESS,
    LOGIN_FAILURE,
}

Filebeat for the Java Applications #

filebeat.inputs:
- type: log
  paths:
    - /opt/logs/*.json
  json:
    keys_under_root: true
    overwrite_keys: true
    add_error_key: true

cloud.id: "<cloud-id>"
cloud.auth: "elastic:<password>"

This one deserves a more in-depth explanation:

  • type: log tells Filebeat to collect log files, and paths lists what to monitor (files, folders, or wildcards).
  • If you don’t set keys_under_root: true, all the fields from the JSON log file are stored under a json. prefix in Elasticsearch. We want to keep them at the top level, as in the original log line.
  • With overwrite_keys: true your fields from the JSON log file can overwrite autogenerated fields from Filebeat. This is important so that the actual @timestamp from your log file is used and not the time when Filebeat collected the line (which could be slightly delayed).
  • add_error_key: true isn’t strictly necessary, but helps us find entries that cannot be correctly unmarshalled as JSON.
  • cloud.id and cloud.auth are provided by Elastic Cloud. For other outputs, like a self-managed Elasticsearch cluster, please consult the documentation; a sketch of that alternative follows after this list.
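
If you are not using Elastic Cloud, the two cloud.* settings can be replaced with a regular Elasticsearch output. A minimal sketch; host and credentials are placeholders rather than values from the original setup:

output.elasticsearch:
  hosts: ["https://localhost:9200"]
  username: "elastic"
  password: "<password>"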

nginx #

Only the server block proxying to the Java application and skipping the TLS configuration:

server {
    charset utf-8;
    listen 443 ssl;
    server_name _;

    # Proxy to the Java application
    location / {
        proxy_pass       http://127.0.0.1:8080;
        proxy_redirect   off;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Host $server_name;
    }
}
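
On the Filebeat side, the access and error logs of this nginx instance are picked up by the nginx module that gets enabled in the shell history below. Once enabled, /etc/filebeat/modules.d/nginx.yml looks roughly like this; the defaults are assumed here rather than copied from the demo machine:

- module: nginx
  access:
    enabled: true
  error:
    enabled: true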

Shell History #

curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.4.2-amd64.deb
sudo dpkg -i filebeat-7.4.2-amd64.deb
sudo vi /etc/filebeat/filebeat.yml
less /var/log/nginx/access.log
ls /etc/filebeat/modules.d/
sudo filebeat modules enable nginx
sudo filebeat setup
sudo service filebeat start
sudo service filebeat status
cd /opt/
java -jar app-v1.jar
less /opt/logs/log.json
sudo vi /etc/filebeat/filebeat.yml
sudo service filebeat restart
sudo service filebeat status
java -jar app-v2.jar
less /opt/logs/log.json
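
Not captured in the recorded session, but mentioned in the transcript: Filebeat also ships with commands to validate the configuration and the connection to the configured output before restarting the service:

sudo filebeat test config
sudo filebeat test output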

Resources #

Transcript #

WEBVTT

Quick introduction 00:00:01.770 –> 00:00:11.940 Hi, I am Philipp. Today I want to talk a bit about logging. Logging is something that we all need. And that's often kind of more complicated than you want it to be.

00:00:12.420 –> 00:00:29.730 And especially with something like microservices, or at least highly distributed services or containers or Kubernetes. Logging and especially centralized logging is getting ever more important. So let's dive into that for a bit, let me share my screen.

00:00:33.750 –> 00:00:34.530 Here we go.

00:00:35.970 –> 00:00:39.750 Maybe you've already heard of the ELK or Elastic Stack.

00:00:42.300 –> 00:00:50.160 What is this all about? Or sometimes I even get the question: Okay, I know what the ELK Stack is and I want to use it for logging of my application.

00:00:50.520 –> 00:01:01.620 So let's take a step back and see what we have in here. The end goal is the Elastic Stack. And we'll skip over the architecture diagram because we'll get back to that in a moment.

00:01:02.040 –> 00:01:12.720 It started off with search. So if you're searching on GitHub, Wikipedia, Stack Overflow behind that little search box. There's always Elasticsearch doing the search and it's

00:01:13.110 –> 00:01:22.590 probably the most widely used search engine that you could run yourself. And after some time people figured out that logs are kind of a related problem.

00:01:23.070 –> 00:01:30.510 Because storing logs is not the interesting part. What you always want is you want to be able to find something relevant in your logs like

00:01:31.320 –> 00:01:44.430 a specific error for the user or you want to see which stack traces did we have recently. All of those are basically search problems that you want to solve. So people and we, the company behind the ELK Stack,

00:01:45.000 –> 00:01:51.210 built this entire thing here. You can see ELK is made up of three components. Elasticsearch that stores the data.

00:01:51.870 –> 00:02:05.340 Logstash that, as it's in the name already, gets the logs and prepares them for storing, which we dive into a little later. And then Kibana to visualize all of that and to give you a nice UI to interact with that.

00:02:06.030 –> 00:02:19.500 All three combined, taking the first letters make up the so called ELK Stack and that is widely used. There is just one more component that's a bit newer and some people don't know about yet is Beats.

00:02:20.130 –> 00:02:36.270 So this one here could be the logo with Beats, so Beats as in the lightweight shipper or agent that we have. And when adding B to ELK. This is what you get out: BELK or ELKB.

00:02:36.660 –> 00:02:40.110 And you can see, it's a bee, it's an elk. It's a combination of everything.

00:02:40.560 –> 00:02:48.600 However, since we're always about scaling and this wouldn't scale if we add more products we kind of tried to call it Elastic Stack now.

00:02:48.930 –> 00:03:02.820 But if you say BELK or ELK Stack we will also totally understand. And now coming back to the initial diagram that we had: This is what the stack looks like today. So what you get here, basically, is you have like an agent.

00:03:05.280 –> 00:03:16.740 You have a lightweight shipper. This is what you run on the system you want to monitor. For example, if you want to monitor some log files, then you get Filebeat and it's basically a tail -f

00:03:17.370 –> 00:03:28.650 over the network with some extra power, but we will also talk about that. And then Beats can either forward data to good old Logstash to parse and enrich the data.

00:03:29.010 –> 00:03:39.090 Or it could forward it directly to Elasticsearch as well. And we'll take a look at that as well. And then Kibana can visualize what is going on in there.

Filebeat Modules 00:03:39.780 –> 00:03:47.250 So to get started, and make this more concrete. Let's start with something we call Filebeat Modules. Filebeat Modules are

00:03:47.730 –> 00:03:59.940 things we noticed are commonly used and we wanted to make the setup for those easier. And that's why we get started with this one here, for example. So I want to start with nginx.

00:04:00.630 –> 00:04:10.650 Looking at the nginx data, here is the nginx module. Basically what this gives you is: It knows you want to monitor one specific operating system. For example, in my example Ubuntu.

00:04:12.150 –> 00:04:20.880 On my Ubuntu, I have installed nginx with the default settings, so it knows where does the log file live and also how to work with that log file.

00:04:21.360 –> 00:04:29.430 And then nginx, or rather the nginx module, will take care of how to get and prepare the data to store that in Elasticsearch and

00:04:30.120 –> 00:04:38.820 I just scroll over all the installation commands because we'll try out the relevant ones anyway. And at the very end, you will also get

00:04:39.330 –> 00:04:48.750 the dashboards, which are prepared so you can just start using those and you can just get wild with all the data you have.

00:04:49.080 –> 00:05:05.790 So let's get started with that. So the easiest way to work with that is Elastic Cloud. It's our cloud service where you can just sign up and get an Elasticsearch and Kibana instance without doing anything else. So let me quickly sign in here.

00:05:13.200 –> 00:05:20.310 And we do support two factor authentication. So let me quickly add that one here as well.

00:05:31.080 –> 00:05:37.980 So I assume here that you already have an Elastic Cloud account. If you don't just sign up for the free 14 day trial.

00:05:38.610 –> 00:05:47.130 And then you're at the same step where I am, basically. So what I want to do now is I want to create a new deployment. And in this new deployment

00:05:47.760 –> 00:06:00.210 you can say, okay, I want to have a "logging" use case. Then you can pick the cloud provider where you want to run this. We support all three main cloud providers plus a lot of the regions they support. For example

00:06:00.570 –> 00:06:08.160 I'm running my stuff in Frankfurt and then you can also pick out the version. I stick to the current stable one.

00:06:08.670 –> 00:06:13.500 And then one of the scenarios of what you want to run in your cluster. Also, I stick to the default

00:06:13.920 –> 00:06:23.940 because that one is fine for our use case. And then you can just create the deployment and it will take something about three minutes or so until your entire deployment will be up and running.

00:06:24.420 –> 00:06:34.020 And you can use that. So right now, as I said, three minutes is what it estimates that this would take. The one thing that is important here is the password.

00:06:34.560 –> 00:06:50.130 This one I copy right away because we need that in a couple of steps. While this is booting up, let me show you how else you could get the stack. So if you go to our site, you have this try free here. If you go to try free. You can either

00:06:51.090 –> 00:06:56.580 launch a cloud service or you can download the components. So looking at Elasticsearch for example.

00:06:57.150 –> 00:07:10.230 We provide binaries for all common operating systems. We're also tying into Homebrew for Mac. For example, we support Chef, Puppet, and Ansible. We have Docker images, we have a Helm Chart or Kubernetes Operator.

00:07:10.830 –> 00:07:18.900 Any one of those is a good method to get started with our stack. But the easiest one is still the cloud service. That's why I'll stick to that one.

00:07:19.290 –> 00:07:28.080 So what we could download any of these binaries and start with that I will stick to the cloud service for today. So let's see where we are.

00:07:28.740 –> 00:07:34.410 So you can see here right now it's trying to boot up the three components in my stack that I'm getting here.

00:07:34.920 –> 00:07:43.830 And Elasticsearch has just finished booting so that one is already ready and usable and in the background. At the same time, we have

00:07:44.370 –> 00:07:51.900 Kibana setting up as well for visualizing and then we have APM, the one for tracing or performance monitoring, but

00:07:52.200 –> 00:08:04.440 we skip that one for today. We keep it for another time. So what we're waiting for now is to have Kibana up and running as well. And then we can dive into that and get started. Okay.

00:08:05.100 –> 00:08:11.430 Looks like my services are all up and running. So let's head over to deployment overview of this one here.

00:08:11.760 –> 00:08:24.030 To see what we have running and you have the endpoints here, what I want to do is I want to launch Kibana right away. So we head over to that one. It adds the right URL. We just need to log in. Now,

00:08:24.780 –> 00:08:36.180 do you remember the logins? It was elastic for the username and the password I copy because that is randomly generated. Let's paste that in here. Hopefully, it works.

00:08:37.800 –> 00:08:41.190 And then we're good to go. So

00:08:42.120 –> 00:08:53.610 while Kibana is loading. What does the current version look like: It's more on the white-ish side again. We used to experiment with different colors and by this, you can normally tell which major version of Kibana you're using.

00:08:54.060 –> 00:09:09.540 So this is what it looks like today. And the nice thing is with this starting splash screen here, it actually guides you through the different use cases and how to get started. So let's add some log data since we were talking about nginx already.

00:09:11.100 –> 00:09:19.980 Here it suggests multiple things that you could collect and we walk you through the right steps to set it up so I head over to nginx.

00:09:21.270 –> 00:09:42.840 And for nginx it just lets you pick the right binary that you can run. So for example, here you can see this is what we run, we get the binary for Filebeat and I'm running on Ubuntu. So that's the one I want to have. So let's actually do that. I have my Linux instance here.

00:09:44.190 –> 00:09:48.150 Let's fetch the binary and then let's install this one.

00:09:53.580 –> 00:10:03.390 Easy. That's pretty much everything we need for the installation. For the configuration there's one thing we need to do here: We need to add the cloud

00:10:04.200 –> 00:10:12.780 instance and the authentication tokens here. So the cloud ID connects your Filebeat to this cloud cluster.

00:10:13.110 –> 00:10:30.510 And you don't need to worry about any URLs or anything that's in the cloud ID in here. The cloud authentication just gives you the username and then we need to paste in the right password to get going with that. And that's in /etc/filebeat/filebeat.yml. So let's

00:10:32.550 –> 00:10:32.820 See.

00:10:36.600 –> 00:10:36.990 Here.

00:10:38.340 –> 00:10:39.180 To the very end.

00:10:55.890 –> 00:10:57.390 What we want to add here is

00:10:59.970 –> 00:11:08.400 I want to add the password here. So this is all that should be needed to connect that Filebeat instance to that cloud instance.

00:11:20.610 –> 00:11:22.590 Okay, so this worked.

00:11:24.000 –> 00:11:33.570 Let's see, what are the next steps. Next up, we want to enable the Filebeat Module. Before we do that, let's take a look at the actual data that we want to collect so

00:11:34.140 –> 00:11:46.050 let's open another browser tab and the service that we're running is running at xeraa.wtf because that's always the right domain for any demo.

00:11:46.530 –> 00:11:58.590 Okay, so now we're getting a 502 Bad Gateway, because only the nginx is running. And I have an application that will be running behind that. But we haven't started that yet. So the 502 is the error code or

00:11:58.860 –> 00:12:16.800 status code that we were expecting. Given this, this is good. So this is what is triggering one of the log events that we're interested in. So let's have a look at the access log that we want to then process. So that's in /var/log/nginx/access.log.

00:12:18.570 –> 00:12:33.240 Go to the most recent one, and maybe this one was me right now. So you can see we have an IP address. We have a date and a timestamp. We have the HTTP verb. We also see the response code, which was a 502.

00:12:33.900 –> 00:12:42.480 How many bytes were sent. And for example, the user agent. Lots of good information and you want to collect that and see what is up with the system.

00:12:43.530 –> 00:12:57.690 What would have been the good old classic way to do that? You would have taken Logstash. You would have pointed Logstash at this specific file and then said, okay, this is the right regular expression or Grok pattern to parse this one.

00:12:59.940 –> 00:13:04.530 Who likes writing regular expressions? Probably not that many people.

00:13:05.610 –> 00:13:12.180 I always say it's a bit like the Stockholm Syndrome. Once you get used to writing regular expressions, at some point, you start liking it.

00:13:12.510 –> 00:13:20.100 Or it's a bit like job security, sometimes because they're kind of hard to write and then much harder to read still afterward. It's a bit like Perl.

00:13:20.670 –> 00:13:30.720 So we figured out that you probably don't want to do that every time you want to collect that file. And nginx is very widely used, so this is why we have the Filebeat Modules.

00:13:31.140 –> 00:13:40.410 And they will do all of that in the background for you. So they will figure out where's that log file and how to parse it, and then how to import the dashboards.

00:13:40.770 –> 00:13:51.390 So we can leave that. And the next step will be to enable the Filebeat Module. So just to give you a quick overview of what the Filebeat Modules look like.

00:13:53.520 –> 00:13:56.550 It's in /etc/filebeat/modules.d/

00:13:57.810 –> 00:14:15.000 These are all the Filebeat Modules that you have and you can see all of them are disabled by default. What we want to do now is, following what we had in Kibana. We then just copy the right command to enable the Filebeat Module. Let's copy the thing

00:14:18.870 –> 00:14:31.140 and then you can see nginx lost the disabled flag after it and it will also have a collection of the right files enabled in it now by default. So that will work to then fetch

00:14:31.740 –> 00:14:39.720 the Filebeat log files. Next up, we need to run the setup and after that, we can start Filebeat.

00:14:41.340 –> 00:14:50.310 Let's run the setup that is also loading our dashboards. So Kibana gets prepared by the Filebeat for all the relevant plugins that it needs to install.

00:14:51.450 –> 00:15:01.800 And once that is done, that might take one or two moments, we can start Filebeat and then start looking at what we have started to collect here.

00:15:02.880 –> 00:15:04.350 So let's give it a moment.

00:15:05.550 –> 00:15:10.380 Only be a couple of more seconds, and then we're ready to get started.

00:15:18.810 –> 00:15:22.620 Let's see, only a couple of more moments, hopefully.

00:15:24.540 –> 00:15:27.090 And it's still loading, loading, loading.

00:15:28.140 –> 00:15:42.900 While we do that we could also take a look at some other configuration files. But I think we will also keep that for later because it looks like we're almost there. So what we wanted to do now is, we want to do sudo service filebeat start.

00:15:44.100 –> 00:15:55.080 We want to see that this is alive, which is generally a good idea. I hope we didn't make any mistakes here, but you can run the same command with status.

00:15:55.920 –> 00:16:06.450 And it will then tell you that is active and running. Eleven seconds ago is not that long ago, maybe it's still crashing because of a misconfiguration, but normally you would see any problems here very quickly.

00:16:06.900 –> 00:16:13.770 Okay. Let's run it again, you can see, okay, it's still running. And we see some output here. So this is probably good.

00:16:14.580 –> 00:16:31.800 Heading back to Kibana. This one has even a nice way to check for data in the cluster and it tells us okay, I found some data that has been generated by the nginx module. So this looks all good. Let's head over to the dashboard right away.

00:16:34.650 –> 00:16:47.670 This won't have too much data because of the last fifteen minutes that we're looking at. I've just done this one request, I think, but maybe some other bots were crawling the site. So maybe we see a bit more data here.

00:16:48.120 –> 00:16:55.320 So looking at that you can see: Okay, we have a couple of requests. Most of them are 500s here. Here we have a 300.

00:16:56.220 –> 00:17:09.090 That is probably a redirect from HTTP to HTTPS. We can see which operating systems and user agents have been trying to connect, how much traffic we had, and also where our traffic is coming from. So we

00:17:09.480 –> 00:17:21.390 see we're getting some traffic from the US, which is curious, whereas I am here. Let's see how good that data is. Or actually, let's take a step back and think like where does this data come from?

00:17:22.260 –> 00:17:32.550 This is the GeoIP lookup. So every request had an IP address and then we try to map the IP address to a geolocation. Sometimes this works well, sometimes it's a bit off.

00:17:32.880 –> 00:17:49.680 It's always interesting to see how well it is working today. So let's dive in and see where we are. I'm based in Vienna, Austria, so this looks pretty good today. Let me dive in a little further, to see how good it actually is.

00:17:51.390 –> 00:18:05.430 It's not perfect. I'm a little further up to the north, but the general area is pretty good. Like this is only off by a few kilometers. So you could then also filter down to all users just from this area and then would only see my requests for example.

00:18:06.990 –> 00:18:20.100 This is a nice dashboard that we can see. But let's take a quick look at the underlying data of what is happening behind the scenes here. So I head over to Discover which shows us the raw requests that we're getting.

00:18:21.000 –> 00:18:32.400 So in the last 15 minutes, we will have had only a couple of requests. You can see we have 27 requests that we have here. Let me open up one of those, so we can see what happened.

00:18:33.480 –> 00:18:45.180 So you can see this is the Filebeat index. This is the metadata where we store everything. Then we have the agent that is the Filebeat that has collected that information.

00:18:45.750 –> 00:18:52.920 Then we have also enriched it with some cloud information. So this here, for example, shows where this is running on which cloud provider. This is running in

00:18:53.640 –> 00:19:00.690 eu-west-1a from Amazon. So this is their data center. You can see the instance ID and the instance type.

00:19:01.110 –> 00:19:09.090 And then you can see the event that we have collected. So you can see it's the event.module is nginx. Next is the nginx data.

00:19:09.450 –> 00:19:21.000 And it has parsed this request. And you can see it was a GET request that got the response of a 301 so it was redirected and you can see the source address where this

00:19:21.300 –> 00:19:35.820 request was coming from. And then we have the GeoIP lookup here, which shows us where the request was coming from. And you can see this was some bot or spider that was collecting data from various

00:19:36.540 –> 00:19:46.560 web addresses. Which is not us, but it's just another request that we have collected. So this is the Filebeat and this is all working well. So

00:19:47.010 –> 00:19:55.650 Next up, you might be interested in something a bit more related to your application because probably you have your own application that you want to monitor.

Spring Boot application 00:19:56.190 –> 00:20:03.090 One that I see very frequently is Spring Boot. And Spring Boot in the Java ecosystem is very widely used and powerful.

00:20:03.360 –> 00:20:10.170 So that's what I want to use for my demo today. It will translate to most other Java projects pretty easily as well.

00:20:10.560 –> 00:20:18.960 And more or less similar to many other programming languages as well. We just use one very simple example of a Spring Boot application.

00:20:19.470 –> 00:20:29.940 So what I did was I took one Spring Boot example for security and added some minor logging logic to it.

00:20:30.450 –> 00:20:40.710 There is the spring-guides/gs-securing-web, which is a simple login application that you can use. I'm picking logins as an example because those are

00:20:41.130 –> 00:20:53.310 very widely used. Almost any application out there will have a login so everybody will understand the general problem and what is happening. What I have added to that application is, first:

00:20:54.420 –> 00:20:57.450 What it was generally doing: logging with the classic log line.

00:20:58.530 –> 00:21:11.370 Then I added one dependency here which outputs log files in a structured format. JSON for us since Elasticsearch is also storing JSON, we're outputting the logs directly to JSON

00:21:11.820 –> 00:21:20.640 So again, we don't have to write any regular expressions or parse and fight that. That will make your life much easier and is generally what we recommend.

00:21:21.300 –> 00:21:27.840 So by adding that dependency, we then have an encoder in our backend to output those log files.

00:21:28.110 –> 00:21:41.700 Let's jump to that encoder or logging backend that we have here. So I'm using Logback, which is one of the common choices. Log4j or Log4j2 would be other common choices.

00:21:42.720 –> 00:21:55.650 Our encoder supports all three of them. And we'll just stick to Logback for today's example. So what you have here is: I'm using rolling file appender. So this is what you would probably use with any classic log file.

00:21:56.760 –> 00:22:09.630 The rolling policy is we roll a new file every day and keep them seven days. The only thing that is different from what most of your log files will do is this encoder here.

00:22:10.110 –> 00:22:15.630 This is all I need to do to actually output my data into JSON.

Elastic Common Schema 00:22:16.320 –> 00:22:27.000 So this is saying: I'm using our ECS encoder, where ECS stands for Elastic Common Schema, and it's just a naming convention of what we expect fields to be named.

00:22:27.540 –> 00:22:39.030 And then it will output that as JSON. The naming convention helps us so that in our UI some fields will be correlated the right way and you don't need to worry about what to call which specific field.

00:22:39.900 –> 00:22:49.710 And thanks to outputting in JSON, you don't need to write any regular expressions to parse. Also, you don't need to worry about any multi-line statements like stacktraces

00:22:50.130 –> 00:23:08.910 because this will be encoded in a single line. And then you don't have to worry about how to make multiple statements out of one stacktrace or how to keep them together — what you generally want to do. So this will just output a JSON log file in the right way for us to consume.

00:23:10.410 –> 00:23:22.980 I have then added a listener to log something. And what this listener is doing, the relevant part is done here from line 37 onwards. Basically, what we do is:

00:23:23.340 –> 00:23:32.040 When somebody logs in successfully, I will log a message: Username, whatever your username was, logged in successfully. Otherwise,

00:23:32.760 –> 00:23:43.050 I'm logging: Okay, some user logged in or didn't log in successfully. This is the username and then I have a blanked out password for that user. Also to make it a bit more interesting

00:23:43.380 –> 00:23:53.610 we can throw in a stacktrace. So if somebody tries to log in with the user "admin", then we'll throw a runtime exception. Just to throw in an exception as well because

00:23:54.030 –> 00:23:57.330 it will happen at some moment in your application. We want to see how those are handled.

00:23:57.810 –> 00:24:10.740 So this is all I added to a very simple login application from Spring Boot. You can get the rest of the code in either my repository or the original repository from the Spring guides.

00:24:11.100 –> 00:24:26.130 But it's just doing the regular Spring Boot or Spring Boot Security magic. So you can rely on that. So just to point out to the logging library that we're using here. This is the repository for this one.

00:24:27.990 –> 00:24:32.940 It ties into our stack very well. And as I've said outputs the right fields.

00:24:33.960 –> 00:24:46.710 And just describes what its features are, how it's behaving performance-wise and, for example, how it's mapping the different fields, and what names we kind of expect. So this is the work or the legwork that this is doing for you.

00:24:48.060 –> 00:24:49.500 Okay, so…

00:24:50.610 –> 00:24:57.990 Let's see what we have here. We change into this folder where I have prepared the application already.

00:24:59.610 –> 00:25:07.350 So we have an app version one. This is the one where we want to get started with and later on we will move on to an app version two.

00:25:07.830 –> 00:25:18.210 Like in any good cooking show. Basically the code I've shown you is already deployed here and we just need to start it up so you don't see me chop the onions like in any cooking show.

00:25:18.510 –> 00:25:30.330 It's just prepared already and we'll jump to the interesting part. So let's see what happens when I start this one so we do a java -jar and then we just do app-v1.jar.

00:25:31.230 –> 00:25:39.510 And this will start our Spring Boot application. It might take a couple of seconds until it's started up and bound to port 8080.

00:25:39.990 –> 00:25:54.870 And then nginx is proxying to that application. So any request will be going through nginx that also terminates TLS. And in the background, I just run my Java application and we get to that output here.


00:25:57.870 –> 00:26:03.780 Let's see what we have here. Or actually, let's start a new window.

00:26:06.930 –> 00:26:15.510 Log in as well here and let's take a quick look at the actual log file that we're generating. Or actually, let's do one more request. Let's see.

00:26:17.100 –> 00:26:27.240 So hopefully this will load. Okay, it works now. Great. Now we can switch to the login. For the login, we can just try "hello" and "world", which will fail.

00:26:28.440 –> 00:26:35.940 We don't want to save this and it tells us: Well, this is an invalid combination. If you use "user" and "password",

00:26:39.570 –> 00:26:53.280 this works. And then your user is signed in. And we can log out as well. Then we could generate a stack trace. But I'll keep that for a little later. So now that we have added a second window, let's head over to our folder and

00:26:54.510 –> 00:27:01.380 look at the logs. We have one file log.json. This is the one we want. Let's jump to the very end.

00:27:01.980 –> 00:27:12.000 Here you can see this is one of the log events that we have generated. You can see our "user" logged in successfully and right before that you can see

00:27:12.930 –> 00:27:22.800 "user" also logged in successfully. But here, we have to have the "hello" user that failed to login. So "hello" user failed to log in and this would come out to "world".

00:27:23.370 –> 00:27:32.940 So this is nicely prepared in JSON, you see a timestamp, a log level, the actual message, the thread where this is running, and the logger — so the classpath

00:27:33.810 –> 00:27:45.390 where this is running. So we can just pick that up and put it into Elasticsearch and then you would see it in Kibana. Now we actually need to do some configuration for

00:27:46.290 –> 00:27:54.900 Filebeat because now it needs to pick up the log file that we have generated. So what we need is — let me head over to the documentation.

00:27:55.170 –> 00:28:10.290 What you need is an input configuration. So if you head to the Filebeat configuration on the "Configuring Filebeat >> Configure inputs". You see the multiple inputs that we support. Like by now those are quite a few. The one we want to have is "log".

00:28:11.760 –> 00:28:21.000 And "log" basically specifies that you have the type: log and then you have a path — one or more paths that you want to collect.

00:28:21.570 –> 00:28:36.990 And we will go for /opt/*.json because those are the relevant files that we want to collect today. So let's head to the configuration of Filebeat. Do you remember the path for that? It was

00:28:38.220 –> 00:28:38.760 easy:

00:28:41.310 –> 00:28:41.580 /etc/filebeat/filebeat.yml

00:28:43.830 –> 00:28:47.070 And let's jump to the beginning of the file.

00:28:50.670 –> 00:28:52.290 So you can see we have

00:28:53.460 –> 00:28:58.920 an input. We have a log type. We have enabled: false. Let's change that to true.

00:29:00.960 –> 00:29:14.190 Then you see we have a path. And the path is the one we wanted to have is first: Be careful because we were logging in a JSON file and then it was /opt/logs/.

00:29:15.480 –> 00:29:18.000 So this is what we want to collect

00:29:18.960 –> 00:29:30.990 But there's one other thing we need to watch out for here. It's that we're logging to JSON and we want to collect JSON. So we need to tell Filebeat to interpret it as JSON right away.

00:29:31.260 –> 00:29:40.140 And then not store it as one log line into Elasticsearch, but actually JSON. So what you need to do is, going back to the configuration.

00:29:41.460 –> 00:29:49.080 Search for JSON and here you can see: You can tell it that we have JSON and the two settings that I want to add are those.

00:29:49.410 –> 00:29:58.170 So you can see keys_under_root: it is now a JSON message and I want to put that directly into our log event and not under another key called json.

00:29:58.680 –> 00:30:10.350 And if something with parsing the JSON goes wrong. I want to log an error message of that as well. Or add an error message to the document. So these are the two things we want to add to this one.

00:30:10.800 –> 00:30:24.270 Now indentation is always fun in YAML. You want to have this on the same layer as this one here. So you want to move

00:30:26.700 –> 00:30:27.510 two spaces in.

00:30:28.560 –> 00:30:36.540 And then you'll paste it. So you're at the same layer as path, we want to add those statements.

00:30:38.880 –> 00:30:44.580 Hopefully, we've done everything right. Let's restart Filebeat so it actually picks it up.

00:30:50.160 –> 00:31:03.270 A couple of seconds to start up and then we can check the status again and see if we have done everything right. There would also be a command to check the configuration, but to keep it simple, we just rely on the status here.

00:31:06.480 –> 00:31:11.940 So you can see it's been running 16 seconds. Probably this is good.

00:31:13.080 –> 00:31:27.360 So let's see what we have in here. So let's head back over to Discover and we have a lot of events. Let's refresh and ideally it will already have more events now that logins can be coming in.

00:31:27.900 –> 00:31:38.940 Because now all the old JSON events will be collected. So what you want to do is, right now we have a mix of nginx and our own application, right here.

00:31:39.330 –> 00:31:46.290 What you can do is you can filter those out. So, for example, one of the things we have was "event.module".

00:31:46.950 –> 00:31:58.590 And we can say "is not" and then it already suggested the possible values: nginx. So we can exclude the nginx logs for now because we might not be interested in them.

00:31:58.830 –> 00:32:07.830 Because I want to focus on the Java application. Now you can see we have 21 log events of our Java application. Let's open one up and see what we have here.

00:32:08.550 –> 00:32:18.000 And you see we have the metadata from before again and now in "log.", we see what we have here. So here we have an "INFO" statement. And this was OK.

00:32:18.900 –> 00:32:32.940 No active profile falling back to default profiles. It's a Java log message. So this is what we want. It got all the right pieces so we don't need to do any parsing. You can see it's in the main method.

00:32:33.510 –> 00:32:37.650 You can see the log level. You can see the message, but it's probably not the one we want.

00:32:38.100 –> 00:32:51.360 Maybe what we want is in log level. Let's look for login messages, maybe we have some login messages here. You could either do that in the previous more structured way I've shown you, or you could just start typing here.

00:32:52.170 –> 00:33:02.640 And then we can just say log.level: and then it already gives you the options. So let's say we are interested in "WARNING" messages right now. So let's filter down to "WARN".

00:33:03.870 –> 00:33:08.850 And now you see: Okay, we have two. Maybe we want to look at those. What do we have here?

00:33:10.620 –> 00:33:11.340 Here.

00:33:13.560 –> 00:33:14.280 Okay.

00:33:16.500 –> 00:33:24.900 Use with default password encoder is considered unsafe. That is a bit unfortunate. The other warning message that we have here

00:33:25.650 –> 00:33:42.090 is that "hello" failed to log in with "password". At last, we have found that message because that was the one I was after. So here, this was our failed login that we have triggered. Now, we could also try the stacktrace. So "admin", for example, I tried to login.

00:33:44.160 –> 00:33:45.930 In here, let's

00:33:48.450 –> 00:33:50.730 remove this one here.

00:33:52.830 –> 00:33:55.110 I think this one should be an "ERROR".

00:33:56.130 –> 00:33:57.270 Let's give it a go.

00:34:02.250 –> 00:34:04.050 And we have one error here.

00:34:06.150 –> 00:34:09.540 And you can see, okay, what happened here: We have an error.

00:34:10.620 –> 00:34:20.460 And this is exactly our runtime exception. So you can see here, that one threw an exception. Let's look at the actual stacktrace. This is this one here.

00:34:21.450 –> 00:34:26.310 The log message that we have is: "There is no admin here…" and then we have the entire stacktrace in here.

00:34:26.790 –> 00:34:38.340 And all of this is taken care of for you and you don't need to worry about how to parse it, how to work with multi-line statements. All of that is working for you out of the box. Okay. Are we done?

Moving from logs to events 00:34:38.970 –> 00:34:46.800 Not quite. You saw that there was a version two of my application. I wanted to show you some more tricks of what you could do in version two.

00:34:47.340 –> 00:34:54.240 Let's have a look at the code first. So the general idea of the changes I've thrown into this one here is:

00:34:54.660 –> 00:35:01.230 All of this looks very familiar. The only thing that I have changed is, I have added this MDC information.

00:35:01.680 –> 00:35:10.140 So what is MDC, for example, is Mapped Diagnostic Context. So it's basically structured information that you can add to a log event.

00:35:10.710 –> 00:35:18.030 Why would you want to have that? For example, you saw we had a log message that a user managed to log in, or a failed login.

00:35:18.540 –> 00:35:24.810 And you probably want to have that in a bit more structured way so you could, for example, just search for all the failed user logins.

00:35:25.140 –> 00:35:39.030 And you probably don't want to do that as a full-text search because you could just filter down on one specific field. So what I'm doing here with this MDC information is I'm adding this structured information to every log event. So we can then slice and dice it the right way.

00:35:40.830 –> 00:35:52.500 For example, I'm extracting the username. I'm getting the remote address. And by the way, I've by now fixed this code and I will push the changes so that it also works with our reverse proxy data which showed the right

00:35:53.010 –> 00:36:02.430 IP address that I'm using. We're also logging the end-point where the request is coming from, which might be relevant. If you get lots of errors for one specific endpoint at some point.

00:36:02.940 –> 00:36:21.090 And then I add "LOGIN_SUCCESS" or "LOGIN_FAILURE", depending on the success state of my login event. Okay, so back to my application. Let's terminate this one here and let's switch over to our version two application.

00:36:22.200 –> 00:36:29.640 This will again take a couple of seconds to actually start up and then we can start running requests against that one as well.

00:36:30.690 –> 00:36:34.530 So let's give it a try. So here

00:36:35.760 –> 00:36:38.670 with "foo" "bar", which should fail.

00:36:39.720 –> 00:36:42.690 We can also use "user" and "password".

00:36:44.730 –> 00:36:51.720 And this one is working again. Okay, did anything change in our log file? Let's have a quick look at our log file again.

00:36:56.040 –> 00:37:03.000 Jump to the end and you can see how the log events are a bit more like events and not just log entries.

00:37:03.360 –> 00:37:11.820 So the general idea here is we have the same messages as before, but we have extracted some more meaningful information. For example, you can see we have a "LOGIN_SUCCESS".

00:37:12.120 –> 00:37:21.630 You have the username that was successful in this example. We have my IP address, who tried to login. So for example, if somebody tries to brute-force a specific user account

00:37:22.110 –> 00:37:27.240 or from a specific IP address for the requests, you could isolate that IP address and block it.

00:37:27.660 –> 00:37:32.340 Or you could figure out: Oh, somebody is trying to brute force the user, maybe we should lock their account.

00:37:32.790 –> 00:37:53.250 Or you could see that we have weird requests for one specific endpoint and then you can take care of that. So let's take a look at what we have in Kibana here. And actually, let's switch this around so you can see — this is in a log that I have already collected. Let's

00:37:54.600 –> 00:37:56.730 throw away this filter here.

00:37:57.750 –> 00:38:01.020 Updated and then see how many log events we have.

00:38:03.390 –> 00:38:14.550 So now we have 43 events. This was when we started the application the first time, and now I've restarted it again. And this is the second time. And now, for example, you can already see the "LOGIN_SUCCESS" here.

00:38:16.560 –> 00:38:30.390 If I scroll down to the relevant information: "LOGIN_SUCCESS", the IP address, the URL, and the user. Something that's good to point out right away is that we have this little yellow warning sign here. This is basically

00:38:31.290 –> 00:38:39.180 when Kibana first saw the data, this field wasn't there yet. And it doesn't know about how to properly use and visualize it.

00:38:39.450 –> 00:38:47.790 It wants to refresh this so called Index Pattern in the background to use the data. So let's quickly do that since it's something you might run into every now and then.

00:38:48.090 –> 00:38:58.020 So first off, it's important to know that every index has a so-called Index Pattern and the default one for log files is "filebeat-*". So this one here.

00:38:58.380 –> 00:39:03.900 And what you want to do now is you want to refresh that one. So it knows by default

00:39:04.770 –> 00:39:11.970 1500 something log fields, but not the custom ones that I've just added. So what you want to do is you want to click this refresh here.

00:39:12.540 –> 00:39:23.040 This might take a little while, it will refresh and instead of 83 we will probably have something like 87 or so fields at the end, so 1587 or so.

00:39:23.430 –> 00:39:35.400 So this might take a little while and then it will update. Then it will tell us how many of those we have now. So let's take a quick look if we have to force it to refresh.

00:39:37.380 –> 00:39:44.970 Okay, now we're at 88, so it added five more fields that it hadn't seen before. And going back to the log data.

00:39:46.590 –> 00:39:58.080 And Discover. Now, the little exclamation marks in yellow should be gone and we can use all the fields in any way we see fit. So let's open this one up here again.

00:39:58.890 –> 00:40:08.160 And you can see, for example, now I could include or exclude all the successful logins. Here with the plus, I would only see the "LOGIN_SUCCESS".

00:40:08.430 –> 00:40:23.790 With the minus, I would exclude those or we could just pull out that column here. Let's do that. We toggle the column and then it could just show me: This was a "LOGIN_SUCCESS" for example here and we only have this one login success in our data.

00:40:30.240 –> 00:40:48.390 You could also, for example, filter down on: Okay, this is where this event was being generated. Or we have a service name, and here it's "gs-securing-web", and this is the one that I want to use now because we have another view that will make your life easier here. So we call it the Log UI.

00:40:51.810 –> 00:41:03.240 And this is a bit more like tail -f for what you're used to. It's streaming your logs, as they come in and you don't have to refresh or anything, but it can also filter down.

00:41:03.600 –> 00:41:20.040 By default, it will show you all the logs that come in. You can turn on streaming, but you can also filter down to log events that you are interested in. So for example, I was interested in service.

00:41:21.960 –> 00:41:26.220 That name this one here. And let's see.

00:41:27.960 –> 00:41:42.510 Ideally, it auto-completes already what we have here. Now I can filter down to just those. And you can see here, user logged in successfully and we could, for example, expand that event here.

00:41:44.520 –> 00:42:02.010 And then we would have all the fields and any of these fields, by the way, you could drop into that filter as well to just filter down on a specific subset of data that you want to have. So for example, I could try to close this one we keep streaming and we

00:42:04.170 –> 00:42:05.040 add some more

00:42:06.630 –> 00:42:07.530 log events.

00:42:16.680 –> 00:42:17.910 Let's go with user

00:42:20.340 –> 00:42:21.030 again.

00:42:25.410 –> 00:42:29.220 So we have generated some more events and hopefully

00:42:30.360 –> 00:42:39.300 you can see right now the users that we have been using, those just keep streaming in and you can just watch the right part of your application.

00:42:39.990 –> 00:42:51.300 Or how it's behaving and how your service is keeping up. You can just follow along as stuff happens. So this would make your life a lot easier to follow with your log events. Cool.

00:42:52.350 –> 00:43:04.350 I think that's pretty much it for today. And I hope you've got a better idea of how to make your logging life much easier, especially with either Filebeat Modules or things that we have to configure.

00:43:04.800 –> 00:43:11.850 Or for example Spring Boot, where you can just tie in with our logging library. And then you just need to take that and then you're done.

00:43:12.630 –> 00:43:19.320 So what's next. If you want to go further, something that you might be interested in, for example, is how to get metrics. Like

00:43:20.040 –> 00:43:31.020 about your operating system. The JVM. Maybe from your application just emit the rate of events that you collected. You could also get for example container information or Kubernetes

00:43:31.590 –> 00:43:40.800 data like if you're running in those environments. You could also be interested in more performance data with APM or tracing where you connect to your application

00:43:41.460 –> 00:43:51.030 with an agent to collect those pieces of information. Those are all things that we could explore in the future. But for today, let's stick to the login use case.

Visualize the structured information in your logs 00:43:51.480 –> 00:44:01.260 One final thing that we could do, since we have all these nice log events, is take a look at how many successful and unsuccessful logins we have had over time.

00:44:01.530 –> 00:44:14.280 So just going back to Discover to take one final look at the fields that we have. So we can then use those to see the ratio of successful and unsuccessful login events that I've generated today.

00:44:17.550 –> 00:44:35.190 Luckily, it always keeps the filter until you delete it. So, we will see in a moment how successful or unsuccessful we were. So the field we are interested in is "labels.event.category". And then we have "LOGIN_SUCCESS" and "LOGIN_FAILURE". So let's head over to Visualize.

00:44:38.040 –> 00:44:52.710 Create a new visualization here. Let's say we want a Horizontal Bar chart and I now pick the Index Pattern which is "filebeat-*" which I want to visualize. And then in the Index Pattern here:

00:44:53.670 –> 00:45:05.970 At first, it tells me how many log events we have, but I can group that into Buckets where I say for example I want to split it up on a specific Term.

00:45:06.870 –> 00:45:20.910 And the Term I am interested in is the "labels.event.category" and we only get the top five events but I only have successful and unsuccessful logins right now. So it shouldn't matter. And then you could just run that.

00:45:23.400 –> 00:45:33.150 And then you see, okay, we have three failures and four success messages in the last 15 minutes and then you could expand that to 30 minutes.

00:45:33.450 –> 00:45:45.030 Probably it doesn't change too much now because I've not been running this app for too long. But here you can see these are all the successful and unsuccessful events that we have been collecting.

00:45:45.480 –> 00:45:56.190 And then you can just see what is the rate. And then you could, for example, even split that out into specific usernames, so we could split those series and say: I want to have a

00:45:57.690 –> 00:45:58.320 Term.

00:45:59.610 –> 00:46:11.250 And we want to go for the username. And the username that I had in mind — so again, it's "labels.event.username". And we want, let's say the top 10 usernames that we've been using.

00:46:14.340 –> 00:46:23.940 And you can see all the successful ones were on a single user. So that's good. And the failed ones we have admin and this "asdfo".

00:46:24.210 –> 00:46:31.890 So those were the unsuccessful logins that we had. And then you could figure out. Oh, somebody's trying to brute-force this one account with a lot of unsuccessful logins.

00:46:32.250 –> 00:46:50.010 Maybe we should lock that user. So you can get a lot of extra value out of your logs if you keep them structured. And in the future, we can dive more into what you can do with metrics here, by the way, but let's finish up for today. Thanks a lot for listening, and let me stop sharing.

00:46:51.270 –> 00:47:05.340 So thanks for listening and I hope you learned something. If you have any questions, just let me know. Or if you want to know more about a specific area, also let me know and we can do a follow-up. Thanks so much, and good luck with your logs.