I am stuck on the SIGSEGV problem while trying to do training on a mobile device (Nexus 7), can you help me out?
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
Yes. I added a couple of lines to the TensorFlowInferenceInterface.java
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
Ubuntu 16.04
- TensorFlow installed from (source or binary):
Source
- TensorFlow version (use command below):
1.2.1
- Python version:
2.7.12
- Bazel version (if compiling from source):
0.52
- CUDA/cuDNN version:
-
- GPU model and memory:
-
- Exact command to reproduce:
-
Describe the problem
I am trying to find out the performance of doing training on mobile devices.
I got the Fatal signal 11 (SIGSEGV) when I am trying to work with a graph including a couple of convolutional layers, FC layers, and cross entropy operation.
Here is what I did:
I modified the TensorFlowInferenceInterface.java by adding a function for initialization all nodes; then I was able to reproduce the example in the following link in Android.
https://tebesu.github.io/posts/Training-a-TensorFlow-graph-in-C++-API
I try to do the training with a more complicated graph (including a couple of convolutional layers, FC layers and cross entropy operation). I can do the training with this graph using the C++ API.
I first solved the No OpKernel problem. The error message is as follows.
No OpKernel was registered to support Op 'SparseSoftmaxCrossEntropyWithLogits' with these attrs.
I add the sparse_xent_op.h and the sparse_xent_op.cc files to the BUILD file locates at /tensorflow/tensorflow/core/kernels/.
I modify the list_op_files.txt locally, adding the sparse_xent_op operator and, rebuild the Tensorflow source.
The No OpKernel error disappears, but the program stops at runner.runAndFetchMetadata() and throws me an error:
A/libc: Fatal signal 11 (SIGSEGV), code 1, fault addr 0x0 in tid 15173 (inference)
Source code / logs
TensorFlowInferenceInterface.java:
public void runTarget(String[] outputNames){
Log.d(TAG, "start of the runTarget");
for (String t : outputNames) {
runner.addTarget(t);
}
Log.d(TAG, "finished adding target");
runner.runAndFetchMetadata();
Log.d(TAG,"runAndFetchMetadata");
runner = sess.runner();
Log.d(TAG,"sess.runner");
}
Corresponding Logcat messages:
08-06 16:25:01.900 15099-15173/org.tensorflow.demo D/TensorFlowInferenceInterface: start of the runTarget
08-06 16:25:01.900 15099-15173/org.tensorflow.demo D/TensorFlowInferenceInterface: finished adding target
08-06 16:25:04.576 15099-15173/org.tensorflow.demo A/libc: Fatal signal 11 (SIGSEGV), code 1, fault addr 0x0 in tid 15173 (inference)
Crash dump result:
********** Crash dump: **********
Build fingerprint: 'google/razor/flo:6.0/MRA58K/2256973:user/release-keys'
pid: 5414, tid: 5478, name: inference >>> org.tensorflow.demo <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0
Stack frame #00 pc 00017664 /system/lib/libc.so (__memcpy_base+91)
Stack frame #1 pc 0072a203 /data/app/org.tensorflow.demo-2/lib/arm/libtensorflow_inference.so
Stack frame #2 pc 006eea49 /data/app/org.tensorflow.demo-2/lib/arm/libtensorflow_inference.so
Stack frame #3 pc 006d59d3 /data/app/org.tensorflow.demo-2/lib/arm/libtensorflow_inference.so
Stack frame #4 pc 006ed9c3 /data/app/org.tensorflow.demo-2/lib/arm/libtensorflow_inference.so
Stack frame #5 pc 006df991 /data/app/org.tensorflow.demo-2/lib/arm/libtensorflow_inference.so
Stack frame #6 pc 006dffb5 /data/app/org.tensorflow.demo-2/lib/arm/libtensorflow_inference.so
Stack frame #7 pc 0009eff9 /data/app/org.tensorflow.demo-2/lib/arm/libtensorflow_inference.so
Stack frame #8 pc 0009f403 /data/app/org.tensorflow.demo-2/lib/arm/libtensorflow_inference.so
Stack frame #9 pc 00099085 /data/app/org.tensorflow.demo-2/lib/arm/libtensorflow_inference.so (Java_org_tensorflow_Session_run+920)
Stack frame #10 pc 000eaa29 /system/lib/libart.so (art_quick_generic_jni_trampoline+40)
Stack frame #11 pc 000e6331 /system/lib/libart.so (art_quick_invoke_stub_internal+64)
Stack frame #12 pc 00402663 /system/lib/libart.so (art_quick_invoke_static_stub+170)
Stack frame #13 pc 001009f4 [stack:5478]
I am stuck on the SIGSEGV problem while trying to do training on a mobile device (Nexus 7), can you help me out?
System information
Yes. I added a couple of lines to the TensorFlowInferenceInterface.java
Ubuntu 16.04
Source
1.2.1
2.7.12
0.52
Describe the problem
I am trying to find out the performance of doing training on mobile devices.
I got the Fatal signal 11 (SIGSEGV) when I am trying to work with a graph including a couple of convolutional layers, FC layers, and cross entropy operation.
Here is what I did:
I modified the TensorFlowInferenceInterface.java by adding a function for initialization all nodes; then I was able to reproduce the example in the following link in Android.
https://tebesu.github.io/posts/Training-a-TensorFlow-graph-in-C++-API
I try to do the training with a more complicated graph (including a couple of convolutional layers, FC layers and cross entropy operation). I can do the training with this graph using the C++ API.
I first solved the No OpKernel problem. The error message is as follows.
No OpKernel was registered to support Op 'SparseSoftmaxCrossEntropyWithLogits' with these attrs.
I add the sparse_xent_op.h and the sparse_xent_op.cc files to the BUILD file locates at /tensorflow/tensorflow/core/kernels/.
I modify the list_op_files.txt locally, adding the sparse_xent_op operator and, rebuild the Tensorflow source.
The No OpKernel error disappears, but the program stops at runner.runAndFetchMetadata() and throws me an error:
A/libc: Fatal signal 11 (SIGSEGV), code 1, fault addr 0x0 in tid 15173 (inference)
Source code / logs
TensorFlowInferenceInterface.java:
public void runTarget(String[] outputNames){
Log.d(TAG, "start of the runTarget");
for (String t : outputNames) {
runner.addTarget(t);
}
Corresponding Logcat messages:
08-06 16:25:01.900 15099-15173/org.tensorflow.demo D/TensorFlowInferenceInterface: start of the runTarget
08-06 16:25:01.900 15099-15173/org.tensorflow.demo D/TensorFlowInferenceInterface: finished adding target
08-06 16:25:04.576 15099-15173/org.tensorflow.demo A/libc: Fatal signal 11 (SIGSEGV), code 1, fault addr 0x0 in tid 15173 (inference)
Crash dump result:
********** Crash dump: **********
Build fingerprint: 'google/razor/flo:6.0/MRA58K/2256973:user/release-keys'
pid: 5414, tid: 5478, name: inference >>> org.tensorflow.demo <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0
Stack frame #00 pc 00017664 /system/lib/libc.so (__memcpy_base+91)
Stack frame #1 pc 0072a203 /data/app/org.tensorflow.demo-2/lib/arm/libtensorflow_inference.so
Stack frame #2 pc 006eea49 /data/app/org.tensorflow.demo-2/lib/arm/libtensorflow_inference.so
Stack frame #3 pc 006d59d3 /data/app/org.tensorflow.demo-2/lib/arm/libtensorflow_inference.so
Stack frame #4 pc 006ed9c3 /data/app/org.tensorflow.demo-2/lib/arm/libtensorflow_inference.so
Stack frame #5 pc 006df991 /data/app/org.tensorflow.demo-2/lib/arm/libtensorflow_inference.so
Stack frame #6 pc 006dffb5 /data/app/org.tensorflow.demo-2/lib/arm/libtensorflow_inference.so
Stack frame #7 pc 0009eff9 /data/app/org.tensorflow.demo-2/lib/arm/libtensorflow_inference.so
Stack frame #8 pc 0009f403 /data/app/org.tensorflow.demo-2/lib/arm/libtensorflow_inference.so
Stack frame #9 pc 00099085 /data/app/org.tensorflow.demo-2/lib/arm/libtensorflow_inference.so (Java_org_tensorflow_Session_run+920)
Stack frame #10 pc 000eaa29 /system/lib/libart.so (art_quick_generic_jni_trampoline+40)
Stack frame #11 pc 000e6331 /system/lib/libart.so (art_quick_invoke_stub_internal+64)
Stack frame #12 pc 00402663 /system/lib/libart.so (art_quick_invoke_static_stub+170)
Stack frame #13 pc 001009f4 [stack:5478]