[pull] master from apache:master #57

Merged 4 commits on Dec 25, 2024

Changes from all commits
1 change: 1 addition & 0 deletions LICENSE-binary
@@ -262,6 +262,7 @@ org.eclipse.jetty:jetty-proxy
org.apache.logging.log4j:log4j-1.2-api
org.apache.logging.log4j:log4j-api
org.apache.logging.log4j:log4j-core
org.apache.logging.log4j:log4j-layout-template-json
org.apache.logging.log4j:log4j-slf4j-impl
org.yaml:snakeyaml
io.dropwizard.metrics:metrics-core
13 changes: 12 additions & 1 deletion conf/log4j2.xml.template
@@ -16,7 +16,7 @@
~ limitations under the License.
-->

<!-- Provide log4j2.xml.template to fix `ERROR Filters contains invalid attributes "onMatch", "onMismatch"`, see KYUUBI-2247 -->
<!-- Provide log4j2.xml.template to fix `ERROR Filters contains invalid attributes "onMatch", "onMismatch"`, see KYUUBI #2247 -->
<!-- Extra logging related to initialization of Log4j.
Set to debug or trace if log4j initialization is failing. -->
<Configuration status="INFO">
@@ -57,6 +57,17 @@
</Policies>
<DefaultRolloverStrategy max="10"/>
</RollingFile>
<!-- Kafka appender with Elastic Common Schema (ECS) JSON template layout
<Kafka name="kafka" topic="ecs-json-logs" syncSend="false">
<JsonTemplateLayout>
<EventTemplateAdditionalField key="app" value="kyuubi"/>
<EventTemplateAdditionalField key="cluster" value="kyuubi-cluster"/>
<EventTemplateAdditionalField key="host" value="${hostName}"/>
</JsonTemplateLayout>
<Property name="bootstrap.servers" value="kafka-1:9092,kafka-2:9092,kafka-3:9092"/>
<Property name="compression.type" value="gzip"/>
</Kafka>
-->
</Appenders>
<Loggers>
<Root level="INFO">
1 change: 1 addition & 0 deletions dev/dependencyList
@@ -126,6 +126,7 @@ kubernetes-model-storageclass/6.13.1//kubernetes-model-storageclass-6.13.1.jar
log4j-1.2-api/2.24.2//log4j-1.2-api-2.24.2.jar
log4j-api/2.24.2//log4j-api-2.24.2.jar
log4j-core/2.24.2//log4j-core-2.24.2.jar
log4j-layout-template-json/2.24.2//log4j-layout-template-json-2.24.2.jar
log4j-slf4j-impl/2.24.2//log4j-slf4j-impl-2.24.2.jar
logging-interceptor/3.12.12//logging-interceptor-3.12.12.jar
metrics-core/4.2.26//metrics-core-4.2.26.jar
21 changes: 11 additions & 10 deletions docs/configuration/settings.md
@@ -399,16 +399,17 @@ You can configure the Kyuubi properties in `$KYUUBI_HOME/conf/kyuubi-defaults.co

### Metrics

| Key | Default | Meaning | Type | Since |
|---------------------------------|------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|-------|
| kyuubi.metrics.console.interval | PT5S | How often should report metrics to console | duration | 1.2.0 |
| kyuubi.metrics.enabled | true | Set to true to enable kyuubi metrics system | boolean | 1.2.0 |
| kyuubi.metrics.json.interval | PT5S | How often should report metrics to JSON file | duration | 1.2.0 |
| kyuubi.metrics.json.location | metrics | Where the JSON metrics file located | string | 1.2.0 |
| kyuubi.metrics.prometheus.path | /metrics | URI context path of prometheus metrics HTTP server | string | 1.2.0 |
| kyuubi.metrics.prometheus.port | 10019 | Prometheus metrics HTTP server port | int | 1.2.0 |
| kyuubi.metrics.reporters | PROMETHEUS | A comma-separated list for all metrics reporters<ul> <li>CONSOLE - ConsoleReporter which outputs measurements to CONSOLE periodically.</li> <li>JMX - JmxReporter which listens for new metrics and exposes them as MBeans.</li> <li>JSON - JsonReporter which outputs measurements to json file periodically.</li> <li>PROMETHEUS - PrometheusReporter which exposes metrics in Prometheus format.</li> <li>SLF4J - Slf4jReporter which outputs measurements to system log periodically.</li></ul> | set | 1.2.0 |
| kyuubi.metrics.slf4j.interval | PT5S | How often should report metrics to SLF4J logger | duration | 1.2.0 |
| Key                                                | Default    | Meaning                                                                                                                                                                                                                                                                                                                                                                                                                                | Type     | Since  |
|----------------------------------------------------|------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|--------|
| kyuubi.metrics.console.interval                    | PT5S       | How often to report metrics to the console                                                                                                                                                                                                                                                                                                                                                                                             | duration | 1.2.0  |
| kyuubi.metrics.enabled                             | true       | Set to true to enable the Kyuubi metrics system                                                                                                                                                                                                                                                                                                                                                                                        | boolean  | 1.2.0  |
| kyuubi.metrics.json.interval                       | PT5S       | How often to report metrics to the JSON file                                                                                                                                                                                                                                                                                                                                                                                           | duration | 1.2.0  |
| kyuubi.metrics.json.location                       | metrics    | Where the JSON metrics file is located                                                                                                                                                                                                                                                                                                                                                                                                 | string   | 1.2.0  |
| kyuubi.metrics.prometheus.labels.instance.enabled  | false      | Whether to add an instance label to Prometheus metrics                                                                                                                                                                                                                                                                                                                                                                                 | boolean  | 1.10.2 |
| kyuubi.metrics.prometheus.path                     | /metrics   | URI context path of the Prometheus metrics HTTP server                                                                                                                                                                                                                                                                                                                                                                                 | string   | 1.2.0  |
| kyuubi.metrics.prometheus.port                     | 10019      | Prometheus metrics HTTP server port                                                                                                                                                                                                                                                                                                                                                                                                    | int      | 1.2.0  |
| kyuubi.metrics.reporters                           | PROMETHEUS | A comma-separated list of metrics reporters<ul> <li>CONSOLE - ConsoleReporter, which periodically outputs measurements to the console.</li> <li>JMX - JmxReporter, which listens for new metrics and exposes them as MBeans.</li> <li>JSON - JsonReporter, which periodically outputs measurements to a JSON file.</li> <li>PROMETHEUS - PrometheusReporter, which exposes metrics in Prometheus format.</li> <li>SLF4J - Slf4jReporter, which periodically outputs measurements to the system log.</li></ul> | set      | 1.2.0  |
| kyuubi.metrics.slf4j.interval                      | PT5S       | How often to report metrics to the SLF4J logger                                                                                                                                                                                                                                                                                                                                                                                        | duration | 1.2.0  |
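
The new `kyuubi.metrics.prometheus.labels.instance.enabled` knob slots in alongside the existing Prometheus reporter settings. As a minimal sketch, a `kyuubi-defaults.conf` enabling the instance label could look like this (all keys and values are taken from the table above; the combination shown is illustrative, not a recommendation):

```properties
# Enable the metrics system with the Prometheus reporter (these two are the defaults).
kyuubi.metrics.enabled=true
kyuubi.metrics.reporters=PROMETHEUS

# Expose metrics at http://<host>:10019/metrics (defaults shown explicitly).
kyuubi.metrics.prometheus.port=10019
kyuubi.metrics.prometheus.path=/metrics

# New in 1.10.2: add an instance label to the exported Prometheus metrics.
kyuubi.metrics.prometheus.labels.instance.enabled=true
```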

### Operation

49 changes: 49 additions & 0 deletions docs/monitor/logging.md
@@ -130,6 +130,55 @@ For example, we can disable the console appender and enable the file appender li

Then everything goes to `log/dummy.log`.

#### Sending Structured Logs to Kafka

Log4j2 has a built-in [KafkaAppender](https://logging.apache.org/log4j/2.x/manual/appenders/message-queue.html#KafkaAppender)
that sends log messages to an Apache Kafka topic with just a few lines of configuration, and it also provides a built-in
[JSON Template Layout](https://logging.apache.org/log4j/2.x/manual/json-template-layout.html) that encodes
`LogEvent`s into structured JSON messages following the structure described by a template.

For example, we can configure the Kyuubi server to send structured logs to the Kafka topic `ecs-json-logs`:

```xml
<Configuration status="INFO">
<Appenders>
<Kafka name="kafka" topic="ecs-json-logs" syncSend="false">
<JsonTemplateLayout>
<EventTemplateAdditionalField key="app" value="kyuubi"/>
<EventTemplateAdditionalField key="cluster" value="kyuubi-cluster"/>
<EventTemplateAdditionalField key="host" value="${hostName}"/>
</JsonTemplateLayout>
<Property name="bootstrap.servers" value="kafka-1:9092,kafka-2:9092,kafka-3:9092"/>
<Property name="compression.type" value="gzip"/>
</Kafka>
</Appenders>
<Loggers>
<Root level="INFO">
<AppenderRef ref="kafka"/>
</Root>
</Loggers>
</Configuration>
```

Each structured log message then looks like this:

```json
{
"@timestamp": "2024-12-24T18:53:01.030Z",
"ecs.version": "1.2.0",
"log.level": "INFO",
"message": "Service[KyuubiServer] is started.",
"process.thread.name": "main",
"log.logger": "org.apache.kyuubi.server.KyuubiServer",
"app": "kyuubi",
"cluster": "kyuubi-cluster",
"host": "hadoop-master1.orb.local"
}
```

Note: this feature may require additional jars on the Kyuubi server's classpath, e.g. `log4j-layout-template-json`
for the JSON Template Layout and the Kafka client library for the `KafkaAppender`. Please read the Log4j2 docs and
make sure those jars are in place before enabling it.

## Logs of Spark SQL Engine

The Spark SQL Engine is one type of Kyuubi engine and is also a typical Spark application.
@@ -25,6 +25,7 @@ import org.apache.spark.sql.catalyst.planning.ScanOperation
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.SparkPlan
import org.apache.spark.sql.execution.datasources.{CatalogFileIndex, HadoopFsRelation, InMemoryFileIndex, LogicalRelation}
import org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanRelation
import org.apache.spark.sql.types.StructType

import org.apache.kyuubi.sql.KyuubiSQLConf
@@ -230,6 +231,39 @@ case class MaxScanStrategy(session: SparkSession)
              logicalRelation.catalogTable)
          }
        }
      case ScanOperation(
          _,
          _,
          relation @ DataSourceV2ScanRelation(_, _, _, _)) =>
        val table = relation.relation.table
        if (table.partitioning().nonEmpty) {
          val partitionColumnNames = table.partitioning().map(_.describe())
          val stats = relation.computeStats()
          lazy val scanFileSize = stats.sizeInBytes
          if (maxFileSizeOpt.exists(_ < scanFileSize)) {
            throw new MaxFileSizeExceedException(
              s"""
                 |SQL job scan file size in bytes: $scanFileSize
                 |exceeds the table scan maxFileSize limit ${maxFileSizeOpt.get}.
                 |You should optimize your SQL logic according to the partition structure
                 |or narrow the query scope, e.g. by p_date. Details below:
                 |Table: ${table.name()}
                 |Partition Structure: ${partitionColumnNames.mkString(",")}
                 |""".stripMargin)
          }
        } else {
          val stats = relation.computeStats()
          lazy val scanFileSize = stats.sizeInBytes
          if (maxFileSizeOpt.exists(_ < scanFileSize)) {
            throw new MaxFileSizeExceedException(
              s"""
                 |SQL job scan file size in bytes: $scanFileSize
                 |exceeds the table scan maxFileSize limit ${maxFileSizeOpt.get}.
                 |Details below:
                 |Table: ${table.name()}
                 |""".stripMargin)
          }
        }
      case _ =>
    }
  }
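
For reference, a hedged sketch of how this guard is exercised; the conf key `spark.sql.watchdog.maxFileSize` and the table name below are assumptions for illustration, not taken from this diff:

```scala
// Hedged usage sketch: the conf key and table below are assumptions for illustration.
import org.apache.spark.sql.SparkSession

object MaxScanDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("max-scan-strategy-demo")
      .getOrCreate()

    // Assumed watchdog conf limiting the total bytes a single table scan may read.
    spark.sql("SET spark.sql.watchdog.maxFileSize=10g")

    // With this change, a DataSourceV2ScanRelation (e.g. an Iceberg table) is also
    // checked: if relation.computeStats().sizeInBytes exceeds the budget, planning
    // fails fast with MaxFileSizeExceedException instead of launching the scan.
    spark.sql("SELECT * FROM v2_catalog.db.events WHERE p_date = '2024-12-24'").show()
  }
}
```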