头歌 (EduCoder) Big Data Answers (Personal Use)

2024-06-19 06:04

This article collects my answers to the 头歌 big data exercises (for personal use). It may serve as a reference for anyone working through the same problems.

Level 1

# Command line
start-all.sh
nohup hive --service metastore &

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.functions._

object cleandata {
  def main(args: Array[String]): Unit = {
    // Create the Spark session
    val spark = SparkSession.builder()
      .appName("HiveSupport")
      .master("local[*]")
      .config("spark.sql.warehouse.dir", "hdfs://127.0.0.1:9000/opt/hive/warehouse")
      .config("hive.metastore.uris", "thrift://127.0.0.1:9083")
      .config("dfs.client.use.datanode.hostname", "true")
      .enableHiveSupport()
      .getOrCreate()
    //############# Begin ############
    // Create the Hive database daobidata
    spark.sql("create database daobidata")
    // Create the Hive tables inside it
    spark.sql("use daobidata")
    // Create the diedata table
    spark.sql("create table if not exists diedata(bianh int,com_name string," +
      "com_addr string,cat string,se_cat string,com_des string,born_data string," +
      "death_data string,live_days int,financing string,total_money int,death_reason string," +
      "invest_name string,ceo_name string,ceo_des string" +
      ")row format delimited fields terminated by ',';")
    // Load the local datadie.csv file into the Hive table diedata
    spark.sql("load data local inpath '/data/workspace/myshixun/data/datadie.csv' into table diedata;")
    // Clean the diedata table: drop rows with empty values, keep the first
    // (main) closure reason, and derive the founding year and the closure year
    // from the founding and closure dates
    val c1 = spark.table("diedata").na.drop("any").distinct()
      .withColumn("death_reason", split(col("death_reason"), " ")(0))
      .withColumn("bornyear", split(col("born_data"), "/")(0))
      .withColumn("deathyear", split(col("death_data"), "/")(0))
    c1.createOrReplaceTempView("c1")
    // Create the die_data table
    spark.sql("create table if not exists die_data(bianh int,com_name string," +
      "com_addr string,cat string,se_cat string,com_des string,born_data string," +
      "death_data string,live_days int,financing string,total_money int,death_reason string," +
      "invest_name string,ceo_name string,ceo_des string,bornyear string,deathyear string" +
      ")row format delimited fields terminated by ',';")
    // Write the cleaned data into die_data
    spark.sql("insert overwrite table die_data select * from c1")
    //############# End ##############
    spark.stop()
  }
}
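The cleaning rules in the job above can be sketched in plain Python to make the logic easier to follow. This is only an illustration, not the Spark job itself: the function name `clean_rows` and the sample rows are hypothetical, `None` stands in for a Hive NULL, and unlike Spark's `distinct()` this sketch keeps the input order.

```python
def clean_rows(rows):
    """Drop incomplete or duplicate rows, then derive the main reason and the years."""
    seen = set()
    cleaned = []
    for row in rows:
        # Like na.drop("any"): skip the row if any field is missing.
        if any(value is None for value in row.values()):
            continue
        # Like distinct(): skip rows that are exact duplicates.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        out = dict(row)
        # Keep only the first (main) reason of a space-separated list.
        out["death_reason"] = row["death_reason"].split(" ")[0]
        # The year is the first component of a date such as 2015/3/1.
        out["bornyear"] = row["born_data"].split("/")[0]
        out["deathyear"] = row["death_data"].split("/")[0]
        cleaned.append(out)
    return cleaned


# Hypothetical sample rows, invented for illustration only.
sample_rows = [
    {"com_name": "A", "born_data": "2015/3/1", "death_data": "2019/8/20",
     "death_reason": "cash_flow competition"},
    {"com_name": "A", "born_data": "2015/3/1", "death_data": "2019/8/20",
     "death_reason": "cash_flow competition"},
    {"com_name": "B", "born_data": None, "death_data": "2018/1/5",
     "death_reason": "policy"},
]

for row in clean_rows(sample_rows):
    print(row["com_name"], row["death_reason"], row["bornyear"], row["deathyear"])
```

Only the first sample row survives: the second is a duplicate and the third has a missing founding date.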

Level 2

import org.apache.spark.sql.{SaveMode, SparkSession}

object citydiedata {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("SparkCleanJob").master("local[*]").getOrCreate()
    //************* Begin **************
    // Read the data: comma-delimited, first row is the header, not data
    val df1 = spark.read.option("delimiter", ",").option("header", true).csv("/data/workspace/myshixun/die_data.csv")
    df1.createOrReplaceTempView("df1")
    // Spark SQL: top 5 cities by number of closed companies
    val df = spark.sql("select df1.com_addr as com_addr,count(df1.com_addr) as saddr from df1 group by df1.com_addr order by saddr desc limit 5")
      .repartition(1)
      .write
      // Connect to the database
      .format("jdbc")
      .option("url", "jdbc:mysql://127.0.0.1:3306/diedata?useUnicode=true&characterEncoding=utf-8")
      .option("driver", "com.mysql.jdbc.Driver")
      // Name of the target table
      .option("dbtable", "addr")
      // User name
      .option("user", "root")
      // Database password
      .option("password", "123123")
      // Append rows without altering the table structure
      .mode(SaveMode.Append)
      .save()
    //************ End ***********
    spark.stop()
  }
}

import org.apache.spark.sql.{SaveMode, SparkSession}

object industrydata {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("SparkCleanJob").master("local[*]").getOrCreate()
    //########## Begin ############
    // Read the data: comma-delimited, first row is the header, not data
    val df1 = spark.read.option("delimiter", ",").option("header", true).csv("/data/workspace/myshixun/die_data.csv")
    df1.createOrReplaceTempView("df1")
    // Spark SQL: top 10 industries by number of closed companies
    val df = spark.sql("select df1.cat as industry,count(df1.cat) as catindustry from df1 group by df1.cat order by catindustry desc limit 10 ")
      .repartition(1)
      .write
      // Connect to the database
      .format("jdbc")
      .option("url", "jdbc:mysql://127.0.0.1:3306/diedata?useUnicode=true&characterEncoding=utf-8")
      .option("driver", "com.mysql.jdbc.Driver")
      // Name of the target table
      .option("dbtable", "industry")
      .option("user", "root")
      .option("password", "123123")
      // Append rows without altering the table structure
      .mode(SaveMode.Append)
      .save()
    //############ End ###########
    spark.stop()
  }
}

import org.apache.spark.sql.{SaveMode, SparkSession}

object closedown {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("SparkCleanJob").master("local[*]").getOrCreate()
    //############ Begin ###########
    // Read the data: comma-delimited, first row is the header, not data
    val df1 = spark.read.option("delimiter", ",").option("header", true).csv("/data/workspace/myshixun/die_data.csv")
    df1.createOrReplaceTempView("df1")
    // Spark SQL: group by the closure-reason field and count how often each main reason occurs
    val df = spark.sql("select df1.death_reason as death_reason,count(df1.death_reason) as dreason from df1 group by df1.death_reason order by dreason desc")
      .repartition(1)
      .write
      // Connect to the database
      .format("jdbc")
      // Database URL
      .option("url", "jdbc:mysql://127.0.0.1:3306/diedata?useUnicode=true&characterEncoding=utf-8")
      .option("driver", "com.mysql.jdbc.Driver")
      // Name of the target table
      .option("dbtable", "cldown")
      .option("user", "root")
      .option("password", "123123")
      // Append rows without altering the table structure
      .mode(SaveMode.Append)
      .save()
    //############ End ###########
    spark.stop()
  }
}

import org.apache.spark.sql.{SaveMode, SparkSession}

object comfinanc {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("SparkCleanJob").master("local[*]").getOrCreate()
    //############ Begin ###########
    // Read the data: comma-delimited, first row is the header, not data
    val df1 = spark.read.option("delimiter", ",").option("header", true).csv("/data/workspace/myshixun/die_data.csv")
    df1.createOrReplaceTempView("df1")
    // Spark SQL: closures by industry sub-category
    // (the task text says top 20; the query below keeps 10 rows)
    val df = spark.sql("select df1.se_cat as se_cat,count(df1.se_cat) as countsecat from df1 group by df1.se_cat order by countsecat desc limit 10")
      .repartition(1)
      .write
      // Connect to the database
      .format("jdbc")
      .option("url", "jdbc:mysql://127.0.0.1:3306/diedata?useUnicode=true&characterEncoding=utf-8")
      .option("driver", "com.mysql.jdbc.Driver")
      // Name of the target table
      .option("dbtable", "secat")
      .option("user", "root")
      .option("password", "123123")
      // Append rows without altering the table structure
      .mode(SaveMode.Append)
      .save()
    // Spark SQL: financing status of the closed companies
    val d1 = spark.sql("select df1.financing as financing,count(df1.financing) as countfinanc from df1 group by df1.financing order by countfinanc desc")
      .repartition(1)
      .write
      // Connect to the database
      .format("jdbc")
      .option("url", "jdbc:mysql://127.0.0.1:3306/diedata?useUnicode=true&characterEncoding=utf-8")
      .option("driver", "com.mysql.jdbc.Driver")
      // Name of the target table
      .option("dbtable", "financing")
      .option("user", "root")
      .option("password", "123123")
      // Append rows without altering the table structure
      .mode(SaveMode.Append)
      .save()
    //########## End #########
    spark.stop()
  }
}

import org.apache.spark.sql.{SaveMode, SparkSession}

object yeardata {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("SparkCleanJob").master("local[*]").getOrCreate()
    //############ Begin ###########
    // Read the data: comma-delimited, first row is the header, not data
    val df1 = spark.read.option("delimiter", ",").option("header", true).csv("/data/workspace/myshixun/die_data.csv")
    df1.createOrReplaceTempView("df1")
    // Count how many companies were founded in each year
    val d1 = spark.sql("select df1.bornyear as bornyear,count(df1.bornyear) as byear from df1 group by df1.bornyear order by bornyear desc limit 10")
      .repartition(1)
      .write
      // Connect to the database
      .format("jdbc")
      .option("url", "jdbc:mysql://127.0.0.1:3306/diedata?useUnicode=true&characterEncoding=utf-8")
      .option("driver", "com.mysql.jdbc.Driver")
      // Name of the target table
      .option("dbtable", "bornyear")
      .option("user", "root")
      .option("password", "123123")
      // Append rows without altering the table structure
      .mode(SaveMode.Append)
      .save()
    // Count how many companies closed in each year
    val d2 = spark.sql("select df1.deathyear as deathyear,count(df1.deathyear) as dyear from df1 group by df1.deathyear order by deathyear desc limit 10")
      .repartition(1)
      .write
      // Connect to the database
      .format("jdbc")
      // Database URL
      .option("url", "jdbc:mysql://127.0.0.1:3306/diedata?useUnicode=true&characterEncoding=utf-8")
      .option("driver", "com.mysql.jdbc.Driver")
      // Name of the target table
      .option("dbtable", "deathyear")
      .option("user", "root")
      .option("password", "123123")
      // Append rows without altering the table structure
      .mode(SaveMode.Append)
      .save()
    //############# End ############
    spark.stop()
  }
}
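The Level 2 jobs all follow the same "group by, count, order by count descending, limit N" pattern. As a rough plain-Python sketch of that pattern (the helper name `top_n` and the city list are hypothetical, and tie-breaking here follows first appearance, which Spark does not guarantee):

```python
from collections import Counter


def top_n(values, n=None):
    """Count occurrences and return (value, count) pairs, largest count first."""
    # SQL count(col) ignores NULLs, so None is skipped here as well.
    counts = Counter(v for v in values if v is not None)
    # most_common(None) returns every pair; most_common(n) keeps the first n.
    return counts.most_common(n)


# Hypothetical values for the com_addr column.
cities = ["Beijing", "Shanghai", "Beijing", "Shenzhen", "Beijing", "Shanghai", None]

for city, count in top_n(cities, 2):
    print(city, count)
```

Passing `n=None` corresponds to the queries without a `limit` clause, such as the closure-reason count.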

Level 3

from app import db


class diedata(db.Model):
    __tablename__ = "addr"
    #**************** Begin ************#
    ID = db.Column(db.Integer, primary_key=True)  # Row number, primary key
    com_addr = db.Column(db.String(255))  # City
    saddr = db.Column(db.Integer)  # Number of closed companies
    #************* End *************#

from flask import render_template
from app.views import index
from app import db
from app.model.models import diedata


@index.route("/city")
def index1():
    selectdata = db.session.query(diedata.com_addr).all()
    selectdata1 = db.session.query(diedata.saddr).all()
    list1 = []
    list2 = []
    #********** Begin **********#
    # Collect the top 5 cities by number of closed companies
    for k in selectdata:
        data = {
            "com_addr": k.com_addr,
        }
        list1.append(data)
    for i in selectdata1:
        list2.append(i[0])
    return render_template("test3.html", com_addr=list1, saddr=list2)
    #*********** End ***********#

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>城市倒闭企业统计情况</title>
    <script type="text/javascript" src="../static/js/echarts-all-3.js" ></script>
</head>
<body>
<!-- Prepare a DOM container -->
<div id="main" style="width: 1500px;height: 650px;"></div>
<script>
    var myChart = echarts.init(document.getElementById('main'));
    //*********** Begin ***************
    com_addr=[]
    {% for a in com_addr %}
    com_addr.push('{{ a.com_addr }}');
    {% endfor %}
    var saddr={{saddr|tojson}};
    option = {
        title: {
            text: '城市倒闭企业top5展示图',
            left: 'center'
        },
        legend: {
            data: ['城市倒闭企业个数'], // Legend marker above the chart; the name must match the series name
            align: 'right', // Legend alignment: left, right, or centered (centered if unset)
            right: 10,
        },
        xAxis: {
            type: 'category',
            data: com_addr
        },
        yAxis: {
            type: 'value',
            name: '倒闭个数',
            axisLabel: {
                formatter: '{value} 个'
            }
        },
        series: [{
            data: saddr,
            type: 'bar',
            name: '城市倒闭企业个数',
            itemStyle: {
                normal: {
                    color: 'blue',
                    lineStyle: {color: 'blue'},
                    label: {show: true}
                }
            }
        }]
    };
    myChart.setOption(option);
    //************ End ***************
</script>
</body>
</html>
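The `/city` view above issues one query per column and relies on both results coming back in the same order. A possible alternative, sketched here in plain Python, is to fetch both columns in one query and split the rows into the two lists the chart needs. The helper name `to_axis_data` is hypothetical, and the tuples stand in for the rows that a two-column query such as `db.session.query(diedata.com_addr, diedata.saddr).all()` would return.

```python
def to_axis_data(rows):
    """Split (label, value) rows into category labels and series values."""
    labels = [label for label, _ in rows]
    values = [value for _, value in rows]
    return labels, values


# Hypothetical rows standing in for the query result.
rows = [("Beijing", 30), ("Shanghai", 22), ("Shenzhen", 15)]

com_addr, saddr = to_axis_data(rows)
print(com_addr)
print(saddr)
```

Because each label and its count come from the same row, the x-axis categories and the bar heights cannot drift out of step.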

from app import db


class diedata(db.Model):
    __tablename__ = "industrydata"
    #************* Begin ************
    ID = db.Column(db.Integer, primary_key=True)  # Row number, primary key
    industry = db.Column(db.String(255))  # Industry name
    catindustry = db.Column(db.Integer)  # Number of closures in the industry
    #************* End ************

from flask import render_template
from app.views import index
from app import db
from app.model.models import diedata


@index.route("/industry")
def index1():
    #************* Begin ************
    selectdata = db.session.query(diedata.industry).all()
    selectdata1 = db.session.query(diedata.catindustry).all()
    list1 = []
    list2 = []
    for k in selectdata:
        data = {
            "industry": k.industry,
        }
        list1.append(data)
    for i in selectdata1:
        list2.append(i[0])
    return render_template("test3.html", industry=list1, catindustry=list2)
    #************* End *************

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>城市倒闭企业统计情况</title>
    <script type="text/javascript" src="../static/js/echarts-all-3.js" ></script>
</head>
<body>
<!-- Prepare a DOM container -->
<div id="main" style="width: 1500px;height: 650px;"></div>
<script>
    var myChart = echarts.init(document.getElementById('main'));
    //************* Begin ************
    industry=[]
    {% for a in industry %}
    industry.push('{{ a.industry }}');
    {% endfor %}
    var catindustry={{catindustry|tojson}};
    option = {
        title: {
            text: '行业企业倒闭top10折线图',
            left: 'center'
        },
        legend: {
            data: ['行业企业倒闭数'], // Legend marker above the chart; the name must match the series name
            align: 'right', // Legend alignment: left, right, or centered (centered if unset)
            right: 10,
        },
        xAxis: {
            type: 'category',
            name: '行业分类',
            axisLabel: {
                formatter: '{value}'
            },
            data: industry
        },
        yAxis: {
            type: 'value',
            name: '行业企业倒闭数',
            axisLabel: {
                formatter: '{value} 个'
            }
        },
        series: [{
            name: '行业企业倒闭数',
            data: catindustry,
            type: 'line',
            smooth: true,
            label: {show: true},
            itemStyle: {
                normal: {
                    color: 'green',
                    lineStyle: {color: 'green'},
                    label: {show: true}
                }
            }
        }]
    };
    myChart.setOption(option);
    //************* End ************
</script>
</body>
</html>

from app import db


class diedata(db.Model):
    __tablename__ = "closedown"
    ############ Begin ###########
    ID = db.Column(db.Integer, primary_key=True)  # Row number, primary key
    death_reason = db.Column(db.String(255))  # Closure reason
    dreason = db.Column(db.Integer)  # Count per closure reason
    ############ End ###########

from flask import render_template
from app.views import index
from app import db
from app.model.models import diedata


@index.route("/deathreason")
def index1():
    selectdata = db.session.query(diedata.death_reason, diedata.dreason).all()
    list1 = []
    ############# Begin ############
    for k in selectdata:
        data = {
            "name": k.death_reason,
            "value": k.dreason
        }
        list1.append(data)
    return render_template("test3.html", datas=list1)
    ############# End ############

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>企业倒闭的原因</title>
    <script type="text/javascript" src="../static/js/echarts-all-3.js" ></script>
</head>
<body>
<!-- Prepare a DOM container -->
<div id="main" style="width: 1500px;height: 650px;"></div>
<script>
    var myChart = echarts.init(document.getElementById('main'));
    //########### Begin #############
    var datas={{datas|tojson}};
    option = {
        title: {
            text: '企业倒闭原因结果统计图',
            left: 'center'
        },
        legend: {
            top: 'bottom',
            data: datas
        },
        tooltip: {
            trigger: 'item',
            formatter: '{b} : {c} ({d}%)'
        },
        toolbox: {
            show: true
        },
        series: [{
            type: 'pie',
            radius: [50, 250],
            center: ['50%', '50%'],
            roseType: 'area',
            itemStyle: {
                borderRadius: 8
            },
            data: datas
        }]
    };
    myChart.setOption(option);
    //########### End #############
</script>
</body>
</html>
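The pie chart above receives its data as a list of objects with `name` and `value` keys. The following plain-Python sketch shows that shape and roughly what the serialized payload looks like. The helper name `to_pie_data` and the sample rows are hypothetical, and `json.dumps` is used here only as an approximation of what the template's `tojson` filter emits.

```python
import json


def to_pie_data(rows):
    """Turn (reason, count) rows into a list of name/value items for a pie series."""
    return [{"name": reason, "value": count} for reason, count in rows]


# Hypothetical rows standing in for the (death_reason, dreason) query result.
rows = [("cash_flow", 5), ("competition", 3)]

pie_items = to_pie_data(rows)
# ensure_ascii=False keeps non-ASCII reason names readable in the output.
payload = json.dumps(pie_items, ensure_ascii=False)
print(payload)
```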

from app import db


class diedata(db.Model):
    __tablename__ = "secat"
    ############## Begin ###########
    ID = db.Column(db.Integer, primary_key=True)  # Row number, primary key
    se_cat = db.Column(db.String(255))  # Sub-category
    countsecat = db.Column(db.Integer)  # Number of closures in the sub-category
    ############## End ############


class diedata1(db.Model):
    __tablename__ = "financing"
    ############## Begin ###########
    ID = db.Column(db.Integer, primary_key=True)  # Row number, primary key
    financing = db.Column(db.String(255))  # Financing stage name
    countfinanc = db.Column(db.Integer)  # Count per financing stage
    ############## End ############

from flask import render_template
from app.views import index
from app import db
from app.model.models import diedata
from app.model.models import diedata1


@index.route("/fincat")
def index1():
    selectdata = db.session.query(diedata.se_cat).all()
    selectdata1 = db.session.query(diedata.countsecat).all()
    selectdata2 = db.session.query(diedata1.financing).all()
    selectdata3 = db.session.query(diedata1.countfinanc).all()
    list1 = []
    list2 = []
    list3 = []
    list4 = []
    ############## Begin ###########
    for i in selectdata:
        data = {
            "se_cat": i.se_cat,
        }
        list1.append(data)
    for j in selectdata1:
        list2.append(j[0])
    for x in selectdata2:
        data = {
            "financing": x.financing,
        }
        list3.append(data)
    for y in selectdata3:
        list4.append(y[0])
    return render_template("test3.html", se_cat=list1, countsecat=list2, financing=list3, countfinanc=list4)
    ############## End ###########

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>企业融资以及细分领域倒闭企业数据</title>
    <script type="text/javascript" src="../static/js/echarts-all-3.js" ></script>
</head>
<body>
<!-- Prepare a DOM container -->
<div id="main" style="width: 1500px;height: 650px;"></div>
<script>
    var myChart = echarts.init(document.getElementById('main'));
    //############## Begin ###########
    se_cat=[]
    {% for a in se_cat %}
    se_cat.push('{{ a.se_cat }}');
    {% endfor %}
    var countsecat={{countsecat|tojson}};
    financing=[]
    {% for b in financing %}
    financing.push('{{ b.financing }}');
    {% endfor %}
    var countfinanc={{countfinanc|tojson}};
    option = {
        title: [
            {left: 'center', text: '细分领域企业倒闭数'},
            {top: '55%', left: 'center', text: '企业融资情况'}
        ],
        tooltip: {
            trigger: 'axis'
        },
        legend: {
            data: ['细分领域', '融资'],
            left: 10
        },
        xAxis: [
            {data: se_cat},
            {data: financing, gridIndex: 1}
        ],
        yAxis: [
            {},
            {gridIndex: 1}
        ],
        grid: [
            {bottom: '60%'},
            {top: '60%'}
        ],
        series: [
            {
                name: '细分领域',
                type: 'bar',
                showSymbol: true,
                data: countsecat,
                label: {show: true},
                itemStyle: {
                    normal: {
                        color: 'red',
                        lineStyle: {color: 'red'},
                        label: {show: true}
                    }
                }
            },
            {
                name: '融资',
                type: 'line',
                showSymbol: true,
                data: countfinanc,
                xAxisIndex: 1,
                yAxisIndex: 1,
                label: {show: true},
                itemStyle: {
                    normal: {
                        color: 'green',
                        lineStyle: {color: 'green'},
                        label: {show: true}
                    }
                }
            }
        ]
    };
    myChart.setOption(option);
    //############## End ###########
</script>
</body>
</html>

from app import db


class diedata(db.Model):
    __tablename__ = "bornyear"
    ########### Begin ##########
    ID = db.Column(db.Integer, primary_key=True)  # Row number, primary key
    bornyear = db.Column(db.String(255))  # Founding year
    byear = db.Column(db.Integer)  # Count
    ########### End ##########


class diedata1(db.Model):
    __tablename__ = "deathyear"
    ########### Begin ##########
    ID = db.Column(db.Integer, primary_key=True)  # Row number, primary key
    deathyear = db.Column(db.String(255))  # Closure year
    dyear = db.Column(db.Integer)  # Count
    ########### End ##########

from flask import render_template
from app.views import index
from app import db
from app.model.models import diedata
from app.model.models import diedata1


@index.route("/ydata")
def index1():
    ########### Begin ##########
    selectdata = db.session.query(diedata.bornyear, diedata.byear).all()
    selectdata1 = db.session.query(diedata1.deathyear, diedata1.dyear).all()
    list1 = []
    list2 = []
    list3 = []
    list4 = []
    for x in selectdata:
        list1.append(str(x[0]) + '年')
        list2.append(x[1])
    for j in selectdata1:
        list3.append(str(j[0]) + '年')
        list4.append(j[1])
    ############ End ############
    return render_template("test3.html", bornyear=list1, byear=list2, deathyear=list3, dyear=list4)

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>企业成立年份和倒闭年份</title>
    <script type="text/javascript" src="../static/js/echarts-all-3.js" ></script>
</head>
<body>
<!-- Prepare a DOM container -->
<div id="main" style="width: 1500px;height: 650px;"></div>
<script>
    //########### Begin ###########
    var myChart = echarts.init(document.getElementById('main'));
    var bornyear={{bornyear|tojson}};
    var byear={{byear|tojson}};
    var deathyear={{deathyear|tojson}};
    var dyear={{dyear|tojson}};
    option = {
        title: [
            {left: 'center', text: '企业成立年份柱状图'},
            {top: '55%', left: 'center', text: '企业倒闭年份柱状图'}
        ],
        tooltip: {
            trigger: 'axis'
        },
        legend: {
            data: ['成立年份', '倒闭年份'],
            left: 10
        },
        xAxis: [
            {data: bornyear},
            {data: deathyear, gridIndex: 1}
        ],
        yAxis: [
            {},
            {gridIndex: 1}
        ],
        grid: [
            {bottom: '60%'},
            {top: '60%'}
        ],
        series: [
            {
                name: '成立年份',
                type: 'bar',
                showSymbol: true,
                data: byear,
                label: {show: true},
                itemStyle: {
                    normal: {
                        color: 'red',
                        lineStyle: {color: 'red'},
                        label: {show: true}
                    }
                }
            },
            {
                name: '倒闭年份',
                type: 'bar',
                showSymbol: true,
                data: dyear,
                xAxisIndex: 1,
                yAxisIndex: 1,
                label: {show: true},
                itemStyle: {
                    normal: {
                        color: 'green',
                        lineStyle: {color: 'green'},
                        label: {show: true}
                    }
                }
            }
        ]
    };
    myChart.setOption(option);
    //########### End ###########
</script>
</body>
</html>
