
Machine Learning in 100 Days, Day 2406: Recurrent Neural Networks (RNN) for NLP


Disabling Embeds in WordPress: stop links in posts from auto-embedding as summary cards


In current versions of WordPress, when you insert a link to another site that also runs WordPress, the link is automatically turned into an embedded reference that shows a summary of the target page, with images and even video, as shown in the screenshot below.

In most cases this feature is simply unnecessary, and it has several real pitfalls.

If your post contains such an embed, a visitor browsing the article loads not only your page but also the referenced page. That is fine if the reference points to your own site; if it points to someone else's site, or to one hosted overseas, the page becomes very slow to load, and the visitor can't even see what the URL actually is.

My personal recommendation is to disable this feature.

If you don't want to disable it entirely and only occasionally want a specific link not to be embedded, simply leave off the http:// prefix when you enter the URL.

Here are two ways to disable the feature:

1. Use the Disable Embeds plugin

Just search for it in the plugin directory in the WordPress admin and install it.

2. Disable it with code

function disable_embeds_init() {
    /* @var WP $wp */
    global $wp;

    // Remove the embed query var.
    $wp->public_query_vars = array_diff( $wp->public_query_vars, array(
        'embed',
    ) );

    // Remove the REST API endpoint.
    remove_action( 'rest_api_init', 'wp_oembed_register_route' );

    // Turn off oEmbed auto discovery.
    add_filter( 'embed_oembed_discover', '__return_false' );

    // Don't filter oEmbed results.
    remove_filter( 'oembed_dataparse', 'wp_filter_oembed_result', 10 );

    // Remove oEmbed discovery links.
    remove_action( 'wp_head', 'wp_oembed_add_discovery_links' );

    // Remove oEmbed-specific JavaScript from the front-end and back-end.
    remove_action( 'wp_head', 'wp_oembed_add_host_js' );
    add_filter( 'tiny_mce_plugins', 'disable_embeds_tiny_mce_plugin' );

    // Remove all embeds rewrite rules.
    add_filter( 'rewrite_rules_array', 'disable_embeds_rewrites' );
}

add_action( 'init', 'disable_embeds_init', 9999 );

/**
 * Removes the 'wpembed' TinyMCE plugin.
 *
 * @since 1.0.0
 *
 * @param array $plugins List of TinyMCE plugins.
 * @return array The modified list.
 */
function disable_embeds_tiny_mce_plugin( $plugins ) {
    return array_diff( $plugins, array( 'wpembed' ) );
}

/**
 * Remove all rewrite rules related to embeds.
 *
 * @since 1.2.0
 *
 * @param array $rules WordPress rewrite rules.
 * @return array Rewrite rules without embeds rules.
 */
function disable_embeds_rewrites( $rules ) {
    foreach ( $rules as $rule => $rewrite ) {
        if ( false !== strpos( $rewrite, 'embed=true' ) ) {
            unset( $rules[ $rule ] );
        }
    }
    return $rules;
}

/**
 * Remove embeds rewrite rules on plugin activation.
 *
 * @since 1.2.0
 */
function disable_embeds_remove_rewrite_rules() {
    add_filter( 'rewrite_rules_array', 'disable_embeds_rewrites' );
    flush_rewrite_rules();
}

register_activation_hook( __FILE__, 'disable_embeds_remove_rewrite_rules' );

/**
 * Flush rewrite rules on plugin deactivation.
 *
 * @since 1.2.0
 */
function disable_embeds_flush_rewrite_rules() {
    remove_filter( 'rewrite_rules_array', 'disable_embeds_rewrites' );
    flush_rewrite_rules();
}

register_deactivation_hook( __FILE__, 'disable_embeds_flush_rewrite_rules' );

Add the code above to your theme's functions.php file and embeds will be disabled.

Since the code is fairly long, you can also save it as a separate PHP file and include that file instead, so functions.php doesn't get cluttered.

A ready-made copy of this PHP file can be downloaded here.

Lanzou Cloud: https://www.lanzous.com/i51k5ne

How do you include this PHP file from functions.php?

// Stop loading wp-embed.min.js
include (TEMPLATEPATH . '/includes/disable_embeds.php');

/includes/disable_embeds.php is the path where you uploaded your PHP file; adjust it to wherever you actually placed the file rather than copying this line verbatim.

Machine Learning in 100 Days, Day 2406: Recurrent Neural Networks (RNN) for NLP

Note: this post follows Hands-On Machine Learning with Scikit-Learn and TensorFlow; all copyrights and rights of interpretation belong to the author and the translation team. I am only transcribing the material and adding my own notes.

We are now in Part II, Deep Learning.

Chapter 14: Recurrent Neural Networks

Recurrent neural networks can analyze time-series data such as stock prices and tell you when to buy or sell. In autonomous driving systems, they can anticipate car trajectories and help avoid accidents.

Unlike the networks discussed so far, which only work on fixed-length inputs, recurrent neural networks can work on sequences of arbitrary length.

For example, they can take sentences, documents, or audio samples as input, which makes them extremely useful in natural language processing systems such as automatic translation, speech-to-text, or sentiment analysis (for example, reading movie reviews and extracting how the reviewer feels about the film).

Moreover, the predictive power of RNNs gives them a surprising capacity for creativity.

You can ask one to predict the next few notes of a melody, randomly pick one of those notes and play it, then ask the network for the most likely next note, play that, and so on.

In the same way, RNNs can generate sentences, image captions, and much more.

This chapter of the tutorial covers the following points:

  • The basic concepts behind recurrent neural networks.
  • The main problems RNNs face (the vanishing/exploding gradients discussed in Chapter 11) and the techniques widely used to fight them: LSTM and GRU cells.
  • How to implement a recurrent neural network in TensorFlow; finally we will look at the architecture of a machine translation system.

11. NLP

Today, most state-of-the-art NLP applications (machine translation, automatic summarization, parsing, sentiment analysis, and so on) are based, at least in part, on RNNs. In this section the tutorial takes a quick look at what a machine translation model looks like. TensorFlow's excellent Word2Vec and Seq2Seq tutorials are a very good introduction to the topic.

What this section actually builds is a Word2Vec word embedding.

The data used here is the archive at http://mattmahoney.net/dc/text8.zip; you can download it yourself and then read it with Python.

from six.moves import urllib
import os
import zipfile

words_path = r"./datasets/words"
words_url = r"http://mattmahoney.net/dc/text8.zip"

def fetch_words_data(words_url=words_url, words_path=words_path):
    os.makedirs(words_path, exist_ok=True)
    zip_path = os.path.join(words_path, "words.zip")
    # Only download the archive if it is not already on disk.
    if not os.path.exists(zip_path):
        urllib.request.urlretrieve(words_url, zip_path)
    with zipfile.ZipFile(zip_path) as f:
        data = f.read(f.namelist()[0])
    return data.decode("ascii").split()

# Because this runs in PyCharm rather than Jupyter, the original post commented out
# the download lines on the second run; with the os.path.exists guard above the
# archive is only downloaded when it is missing.
words = fetch_words_data()
# print(words[:5])

Building the vocabulary dictionary

from collections import Counter, deque
import numpy as np

words = fetch_words_data()
# print(words[:5])
vocabulary_size = 50000
vocabulary = [("UNK", None)] + Counter(words).most_common(vocabulary_size - 1)
vocabulary = np.array([word for word, _ in vocabulary])
dictionary = {word: code for code, word in enumerate(vocabulary)}
data = np.array([dictionary.get(word, 0) for word in words])

data_index = 0  # reading cursor into the corpus, advanced by generate_batch

def generate_batch(batch_size, num_skips, skip_window):
    global data_index
    assert batch_size % num_skips == 0
    assert num_skips <= 2 * skip_window
    batch = np.ndarray(shape=[batch_size], dtype=np.int32)
    labels = np.ndarray(shape=[batch_size, 1], dtype=np.int32)
    span = 2 * skip_window + 1  # [ skip_window centre skip_window ]
    buffer = deque(maxlen=span)
    for _ in range(span):
        buffer.append(data[data_index])
        data_index = (data_index + 1) % len(data)
    for i in range(batch_size // num_skips):
        target = skip_window  # target label at the center of the buffer
        targets_to_avoid = [skip_window]
        for j in range(num_skips):
            while target in targets_to_avoid:
                target = np.random.randint(0, span)
            targets_to_avoid.append(target)
            batch[i * num_skips + j] = buffer[skip_window]
            labels[i * num_skips + j, 0] = buffer[target]
        buffer.append(data[data_index])
        data_index = (data_index + 1) % len(data)
    return batch, labels

np.random.seed(42)
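
As a quick sanity check (my own addition, not part of the original tutorial), you can draw one small batch and print the centre/context word pairs that generate_batch produces; each centre word should be paired with words that sit immediately around it in the text:

# Minimal sanity check for generate_batch (illustrative, not from the tutorial).
data_index = 0  # reset the reading cursor for reproducibility
batch, labels = generate_batch(batch_size=8, num_skips=2, skip_window=1)
for center, context in zip(batch, labels[:, 0]):
    print(vocabulary[center], "->", vocabulary[context])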

Building the model

import tensorflow as tf

batch_size = 128
embedding_size = 128  # dimension of the embedding vectors
skip_window = 1       # how many words to consider left and right
num_skips = 2         # how many times to reuse an input to generate a label

# The tutorial picks a random validation set of 16 frequent words
# to sample nearest neighbours from.
valid_size = 16
valid_window = 100
valid_examples = np.random.choice(valid_window, valid_size, replace=False)
num_sampled = 64      # number of negative examples to sample
learning_rate = 0.01

train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])
valid_dataset = tf.constant(valid_examples, dtype=tf.int32)

vocabulary_size = 50000
embedding_size = 150  # note: this overrides the 128 set above

init_embeds = tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0)
embeddings = tf.Variable(init_embeds)
train_inputs = tf.placeholder(tf.int32, shape=[None])
embed = tf.nn.embedding_lookup(embeddings, train_inputs)

# Construct the variables for the NCE loss
nce_weights = tf.Variable(
    tf.truncated_normal([vocabulary_size, embedding_size],
                        stddev=1.0 / np.sqrt(embedding_size)))
nce_biases = tf.Variable(tf.zeros([vocabulary_size]))

# Compute the average NCE loss for the batch.
# tf.nn.nce_loss: NCE is an approximation of the full softmax; we use it because
# evaluating a softmax over all 50,000 vocabulary words at every step would be far too expensive.
loss = tf.reduce_mean(
    tf.nn.nce_loss(nce_weights, nce_biases, train_labels, embed,
                   num_sampled, vocabulary_size))

optimizer = tf.train.AdamOptimizer(learning_rate)
training_op = optimizer.minimize(loss)

# Use cosine similarity to measure how close the words are to one another.
norm = tf.sqrt(tf.reduce_sum(tf.square(embeddings), axis=1, keepdims=True))
normalized_embeddings = embeddings / norm
valid_embeddings = tf.nn.embedding_lookup(normalized_embeddings, valid_dataset)
similarity = tf.matmul(valid_embeddings, normalized_embeddings, transpose_b=True)

init = tf.global_variables_initializer()
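
For intuition, here is the same cosine-similarity step written in plain NumPy on a tiny made-up matrix (a sketch of my own, not from the tutorial); the last few TensorFlow ops above do exactly this to the real embedding matrix:

# Plain-NumPy illustration of the cosine-similarity computation (toy data only).
import numpy as np
emb = np.random.rand(5, 3)                              # tiny fake embedding matrix
norm = np.sqrt((emb ** 2).sum(axis=1, keepdims=True))
normalized = emb / norm                                 # every row now has unit length
sim = normalized[[0, 2]] @ normalized.T                 # cosine similarity of rows 0 and 2 to all rows
print(sim.shape)                                        # (2, 5): one row per query word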

Training the model

num_steps = 10001

with tf.Session() as session:
    init.run()
    average_loss = 0
    for step in range(num_steps):
        print("\rIteration: {}".format(step), end="\t")
        batch_inputs, batch_labels = generate_batch(batch_size, num_skips, skip_window)
        feed_dict = {train_inputs: batch_inputs, train_labels: batch_labels}
        # We perform one update step by evaluating the training op (including it
        # in the list of returned values for session.run())
        _, loss_val = session.run([training_op, loss], feed_dict=feed_dict)
        average_loss += loss_val
        if step % 2000 == 0:
            if step > 0:
                average_loss /= 2000
            # The average loss is an estimate of the loss over the last 2000 batches.
            print("Average loss at step ", step, ": ", average_loss)
            average_loss = 0
        # Note that this is expensive (~20% slowdown if computed every 500 steps)
        if step % 10000 == 0:
            sim = similarity.eval()
            for i in range(valid_size):
                valid_word = vocabulary[valid_examples[i]]
                top_k = 8  # number of nearest neighbors
                nearest = (-sim[i, :]).argsort()[1:top_k + 1]
                log_str = "Nearest to %s:" % valid_word
                for k in range(top_k):
                    close_word = vocabulary[nearest[k]]
                    log_str = "%s %s," % (log_str, close_word)
                print(log_str)
    final_embeddings = normalized_embeddings.eval()

os.makedirs("./tf_logs", exist_ok=True)  # make sure the output directory exists
np.save("./tf_logs/my_final_embeddings.npy", final_embeddings)
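
As a small follow-up (my own sketch, not part of the tutorial), the saved matrix can be reloaded later and queried with plain NumPy; "five" is just an arbitrary example word here:

# Reload the saved embeddings and look up the nearest neighbours of one word.
import numpy as np
final_embeddings = np.load("./tf_logs/my_final_embeddings.npy")
query = dictionary.get("five", 0)                  # index of the query word (0 = UNK)
sims = final_embeddings @ final_embeddings[query]  # rows are already unit-length
nearest = (-sims).argsort()[1:9]                   # skip the word itself
print([vocabulary[i] for i in nearest])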

At the first iteration you can see that the loss is around 284, and the nearest neighbours of the 16 sample words are not very meaningful.

After 10,001 iterations the loss has dropped to about 26, and the nearest neighbours of the 16 sample words look much more reasonable.

Iteration: 0 Average loss at step 0 : 284.065673828125

Nearest to people: gave, replaces, accept, shelves, pinnacle, grandfather, encompass, ctbt,

Nearest to can: ratites, spenser, aai, paine, neanderthal, jodie, fed, lewinsky,

Nearest to than: newborn, bolivian, harness, nineties, lpc, masochism, simplifying, cassady,

Nearest to its: austere, norsemen, cantigas, hermione, flockhart, quackery, apr, quaestor,

Nearest to their: dilemmas, holbach, anisotropic, imposes, slavic, buckley, chola, rivest,

Nearest to have: consumes, drogheda, resignation, knitted, traitors, sandro, bremer, azeotrope,

Nearest to six: cornelius, chagas, parrot, mckenzie, immunization, kombinate, literature, delgado,

Nearest to over: menachem, udf, chukotka, haste, kamal, edition, receiving, magnetism,

Nearest to two: songwriter, cultured, imitates, cheka, mpa, heracleidae, given, colon,

Nearest to UNK: conferring, hole, metrolink, zeno, macrovision, trash, maduro, sporadic,

Nearest to was: cruisers, and, boudinot, moabites, struggling, fractal, superstardom, kinship,

Nearest to his: alkalis, sellers, licensee, libertine, hackers, herbivorous, parthenon, breed,

Nearest to b: duval, cannabis, libretto, divider, blythe, bloemfontein, terry, preserved,

Nearest to this: vaccinations, alpinus, emancipated, toaster, gorilla, io, ther, undergoing,

Nearest to use: inertial, hypertension, devotions, brokered, crumbs, discrepancy, polyatomic, vor,

Nearest to one: preview, duo, bout, sets, etruscan, chaplin, jedi, cryonicists,

Iteration: 2000 Average loss at step 2000 : 131.6964364299774

Iteration: 4000 Average loss at step 4000 : 62.64465307974815

Iteration: 6000 Average loss at step 6000 : 41.10564466834068

Iteration: 8000 Average loss at step 8000 : 31.070515101790427

Iteration: 10000 Average loss at step 10000 : 26.000642070174216

Nearest to people: abbots, that, oath, dilation, stream, UNK, bellows, astatine,

Nearest to can: ginsberg, is, may, evolutionary, therefore, nonfiction, that, to,

Nearest to than: much, kleine, possess, the, or, a, irate, charges,

Nearest to its: the, carbonate, tarmac, rn, symbolic, secluded, seismology, nsu,

Nearest to their: the, propositional, atomists, decreed, antigua, gangsta, counterparts, astatine,

Nearest to have: and, has, hebrides, been, aberdeen, axon, tableland, be,

Nearest to six: seven, one, four, five, eight, zero, nine, two,

Nearest to over: utopian, beers, jabir, screens, plural, gangsta, originates, airships,

Nearest to two: zero, three, one, four, five, seven, nine, six,

Nearest to UNK: and, the, one, cosmonaut, altaic, of, astatine, bicycle,

Nearest to was: and, actinium, aberdeenshire, had, aggression, ceased, kierkegaard, overseas,

Nearest to his: the, absurd, orange, and, alhazred, atomism, plutonium, explosive,

Nearest to b: one, art, six, eight, seven, indelible, bicycle, dragster,

Nearest to this: the, that, asteraceae, morphism, willing, whale, a, aquarius,

Nearest to use: arrest, and, morphisms, winfield, aziz, quantum, ataxia, carrot,

Nearest to one: nine, seven, eight, four, five, six, two, three,

Plotting the word embeddings

import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_with_labels(low_dim_embs, labels):
    assert low_dim_embs.shape[0] >= len(labels), "More labels than embeddings"
    plt.figure(figsize=(18, 18))  # in inches
    for i, label in enumerate(labels):
        x, y = low_dim_embs[i, :]
        plt.scatter(x, y)
        plt.annotate(label,
                     xy=(x, y),
                     xytext=(5, 2),
                     textcoords='offset points',
                     ha='right',
                     va='bottom')

# Reduce the first 500 embeddings to 2-D with t-SNE and plot them with their words.
tsne = TSNE(perplexity=30, n_components=2, init='pca', n_iter=5000)
plot_only = 500
low_dim_embs = tsne.fit_transform(final_embeddings[:plot_only, :])
labels = [vocabulary[i] for i in range(plot_only)]
plot_with_labels(low_dim_embs, labels)

I didn't get to see this figure myself: I went down for dinner, and when I came back the run had finished but no figure was displayed, so the plot shown here is borrowed from the tutorial.
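
If the plot window never appears (for example when the code runs as a plain script in PyCharm), one simple workaround is to save the figure to a file as well; a minimal sketch, assuming the matplotlib import above and an output path of my own choosing:

# Save the scatter plot to disk so the result survives even without a display.
plot_with_labels(low_dim_embs, labels)
plt.savefig("./tf_logs/tsne_embeddings.png", dpi=150, bbox_inches="tight")
plt.show()  # optional; blocks until the window is closed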

